fix: trim large events in Athena querier #35402
Merged
Conversation
Fixes #35161

Large events queried from the Athena audit backend will now be trimmed before they are stored and before they are returned from a query, using the existing TrimToMaxSize implementations for each event type already used by the DynamoDB and file backends.

The other backends typically trim the event before storing it: for DynamoDB this is due to the 400 KB item size limit; for the file backend it's due to the 64 KiB bufio.MaxScanTokenSize.

There is no hard limit on events stored in Parquet files in S3, but we've been using a 2 GiB limit in the publisher so far. With this change we will attempt to trim events to 2 GiB before writing them (if we haven't already run out of memory) instead of just failing.

We've also been using a 1 MiB limit in the querier and returning an empty result when a larger event is encountered. With this change we will attempt to trim the event to 1 MiB before returning it. The 1 MiB limit ultimately stems from the 4 MB max gRPC message size.

We could trim to 1 MiB in the publisher instead, but I'd prefer to preserve as much of the event data as possible in case we improve the querying story for large events in the future (and in case the user wants to query the events directly from S3).
nklaassen force-pushed the nklaassen/s3-large-events branch from a16b4af to 06a3847 on December 6, 2023 18:56
rosstimothy approved these changes on Dec 6, 2023
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
mdwn approved these changes on Dec 7, 2023
public-teleport-github-review-bot removed review requests for fheinecke and fspmarshall on December 7, 2023 01:18
@nklaassen See the table below for backport results.
nklaassen added a commit that referenced this pull request on Jan 26, 2024:
Backport #35402 to branch/v13
github-merge-queue bot pushed a commit that referenced this pull request on Jan 26, 2024:
* [v13] fix: trim large events in Athena querier (Backport #35402 to branch/v13)
* [v13] feat: add metrics for event sizes (Backport #35440 to branch/v13)
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
#35440 adds metrics for sizes of all events emitted and how many were trimmed.
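Event-size metrics of this kind are typically cumulative histograms plus a trimmed-event counter. The sketch below is a toy, stdlib-only illustration of that shape; the actual change in #35440 presumably uses the Prometheus client library, and the bucket bounds and names here are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// sizeHistogram is a toy, Prometheus-style cumulative histogram:
// counts[i] counts events whose size is <= buckets[i] bytes.
type sizeHistogram struct {
	buckets []int // upper bounds in bytes, ascending
	counts  []int
	trimmed int // events that had to be trimmed
}

func newSizeHistogram(bounds ...int) *sizeHistogram {
	sort.Ints(bounds)
	return &sizeHistogram{buckets: bounds, counts: make([]int, len(bounds))}
}

// observe records one event of the given serialized size, and whether
// it was trimmed to fit a limit.
func (h *sizeHistogram) observe(size int, wasTrimmed bool) {
	for i, ub := range h.buckets {
		if size <= ub {
			h.counts[i]++ // cumulative: every bucket at or above size
		}
	}
	if wasTrimmed {
		h.trimmed++
	}
}

func main() {
	h := newSizeHistogram(1<<10, 1<<20, 1<<30) // 1 KiB, 1 MiB, 1 GiB
	h.observe(512, false)
	h.observe(5<<20, true) // a 5 MiB event that was trimmed
	fmt.Println(h.counts, h.trimmed) // prints "[1 1 2] 1"
}
```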
Changelog: fix querying of large audit events with Athena backend