fix: trim large events in Athena querier #35402

Merged: 3 commits merged into master on Dec 7, 2023

@nklaassen nklaassen commented Dec 6, 2023

Fixes #35161

Large events in the Athena audit backend are now trimmed both before they are stored and before they are returned from a query. Trimming uses the existing TrimToMaxSize implementation for each event type, which the Dynamo and File backends already use.
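
As a rough illustration of the trimming contract described above, here is a minimal sketch. The `trimmable` interface, the `execEvent` type, and the `trimToFit` helper are hypothetical simplifications invented for this example; they are not Teleport's actual event types, and the real TrimToMaxSize implementations differ per event type.

```go
package main

import (
	"errors"
	"fmt"
)

// trimmable is a hypothetical, simplified stand-in for an audit event
// that knows its own size and how to shrink itself.
type trimmable interface {
	Size() int
	TrimToMaxSize(maxSize int) trimmable
}

// execEvent is a made-up event with one small field and one
// potentially large field.
type execEvent struct {
	User    string
	Command string
}

// Size approximates the encoded size as the sum of the field lengths.
func (e execEvent) Size() int { return len(e.User) + len(e.Command) }

// TrimToMaxSize returns a copy whose large field is cut down so the
// event fits in maxSize where possible. It slices bytes, so it may
// split a multi-byte UTF-8 character; a real implementation would be
// more careful.
func (e execEvent) TrimToMaxSize(maxSize int) trimmable {
	if e.Size() <= maxSize {
		return e
	}
	budget := maxSize - len(e.User)
	if budget < 0 {
		budget = 0
	}
	e.Command = e.Command[:budget]
	return e
}

// trimToFit attempts to trim and reports an error if the event is
// still too large, since trimming only shrinks the large field.
func trimToFit(ev trimmable, maxSize int) (trimmable, error) {
	trimmed := ev.TrimToMaxSize(maxSize)
	if trimmed.Size() > maxSize {
		return nil, errors.New("event is still too large after trimming")
	}
	return trimmed, nil
}

func main() {
	ev := execEvent{User: "alice", Command: "tar czf backup.tgz /very/long/path"}
	trimmed, err := trimToFit(ev, 16)
	fmt.Println(trimmed, err)
}
```

Trimming is an attempt rather than a guarantee in this sketch: if the fields that cannot be trimmed already exceed the limit, the caller gets an error instead of a silently oversized event.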

The other backends typically trim each event before storing it: for Dynamo this is due to the 400 KB item size limit, and for the file backend it is due to the 64 KiB bufio.MaxScanTokenSize.
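
To show why bufio.MaxScanTokenSize acts as a limit for line-oriented storage, the following sketch reads a single line back with a default `bufio.Scanner`. The helper names `scanLine` and `lineTooLong` are made up for this example and are not part of the file backend.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// scanLine builds one line of n bytes and reads it back with a
// default bufio.Scanner, returning any scanner error.
func scanLine(n int) error {
	sc := bufio.NewScanner(strings.NewReader(strings.Repeat("x", n) + "\n"))
	for sc.Scan() {
		// Discard the token; only the error matters here.
	}
	return sc.Err()
}

// lineTooLong reports whether a line of n bytes is rejected by the
// scanner with bufio.ErrTooLong.
func lineTooLong(n int) bool {
	return errors.Is(scanLine(n), bufio.ErrTooLong)
}

func main() {
	fmt.Println(lineTooLong(1024))
	fmt.Println(lineTooLong(2 * bufio.MaxScanTokenSize))
}
```

A scanner with the default buffer cannot return a token larger than `bufio.MaxScanTokenSize`, so an event serialized as one longer line would fail to read back unless it was trimmed first or the scanner was given a larger buffer.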

There is no hard limit on the size of events stored in Parquet files in S3, but the publisher has been using a 2 GiB limit so far. With this change, we attempt to trim events to 2 GiB before writing them (if we haven't already run out of memory) instead of just failing.

The querier has also been using a 1 MiB limit, returning an empty result whenever it encountered an event larger than that. With this change, we attempt to trim the event to 1 MiB before returning it.
The 1 MiB limit ultimately stems from the 4 MB max gRPC message size.
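
The sketch below illustrates, under simplifying assumptions, why trimming lets paging make progress where an oversized event previously produced an empty result. Events are plain strings, the limits are scaled down to a few bytes, and `trim`, `nextPage`, `maxEventSize`, and `maxPageSize` are hypothetical names chosen for this example, not the querier's actual code.

```go
package main

import "fmt"

// Hypothetical limits, scaled down so the example is easy to follow.
const (
	maxEventSize = 8  // stands in for the 1 MiB per-event limit
	maxPageSize  = 20 // stands in for a response budget under the gRPC limit
)

// trim shortens a payload to at most max bytes.
func trim(payload string, max int) string {
	if len(payload) <= max {
		return payload
	}
	return payload[:max]
}

// nextPage collects events beginning at index start until the page
// budget is used up, and returns the index to resume from. Because
// every event is trimmed to maxEventSize, and maxEventSize is no
// larger than maxPageSize, the first event always fits and the
// returned index always advances.
func nextPage(events []string, start int) (page []string, next int) {
	used := 0
	next = start
	for next < len(events) {
		ev := trim(events[next], maxEventSize)
		if used+len(ev) > maxPageSize {
			break
		}
		page = append(page, ev)
		used += len(ev)
		next++
	}
	return page, next
}

func main() {
	events := []string{"small", "an-oversized-event-payload", "tiny"}
	for start := 0; start < len(events); {
		page, next := nextPage(events, start)
		fmt.Println(page)
		start = next
	}
}
```

The key property in this sketch is that the per-event limit is smaller than the page budget, so a single large event can no longer stall the loop.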

We could just trim to 1 MiB in the publisher, but I'd prefer to preserve as much of the event data as possible in case we improve the querying story for large events in the future (and in case the user wants to query the events directly from S3).
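
A small sketch of the two-stage design described above: a loose limit applied at write time and a tight limit applied at read time. The constant and function names are assumptions for illustration only, and payloads are raw byte slices rather than real events.

```go
package main

import "fmt"

// Limits taken from the description; the names are hypothetical.
const (
	publisherLimit int64 = 2 << 30 // 2 GiB, applied before writing to S3
	querierLimit   int64 = 1 << 20 // 1 MiB, applied before returning a result
)

// trimTo cuts a payload down to at most limit bytes.
func trimTo(payload []byte, limit int64) []byte {
	if int64(len(payload)) <= limit {
		return payload
	}
	return payload[:limit]
}

func main() {
	raw := make([]byte, 3<<20) // a 3 MiB event payload

	// Write path: the payload is well under 2 GiB, so it is stored whole.
	stored := trimTo(raw, publisherLimit)

	// Read path: only the copy returned to the caller is trimmed.
	returned := trimTo(stored, querierLimit)

	fmt.Println(len(stored), len(returned))
}
```

Because only the returned copy is trimmed, the full payload stays available in storage for direct queries against S3 or for a future query path that can handle larger events.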

#35440 adds metrics for sizes of all events emitted and how many were trimmed.

Changelog: fix querying of large audit events with Athena backend

@github-actions github-actions bot added audit-log Issues related to Teleports Audit Log size/md labels Dec 6, 2023
@gravitational gravitational deleted a comment from github-actions bot Dec 6, 2023
@gravitational gravitational deleted a comment from github-actions bot Dec 6, 2023
Review thread on lib/events/athena/querier.go (outdated, resolved)
nklaassen and others added 2 commits December 6, 2023 14:11
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
@nklaassen nklaassen added this pull request to the merge queue Dec 7, 2023
Merged via the queue into master with commit 935bb62 Dec 7, 2023
33 checks passed
@nklaassen nklaassen deleted the nklaassen/s3-large-events branch December 7, 2023 02:11
@public-teleport-github-review-bot

@nklaassen See the table below for backport results.

| Branch | Result |
|---|---|
| branch/v14 | Create PR |

nklaassen added a commit that referenced this pull request Jan 26, 2024
Backport #35402 to branch/v13
github-merge-queue bot pushed a commit that referenced this pull request Jan 26, 2024
* [v13] fix: trim large events in Athena querier

Backport #35402 to branch/v13

* [v13] feat: add metrics for event sizes

Backport #35440 to branch/v13

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Labels
audit-log Issues related to Teleports Audit Log backport/branch/v14 size/md
Development

Successfully merging this pull request may close these issues.

S3 audit event paging gets stuck on single large event
3 participants