Restructure for better large bucket support (Fixes #138, #128, #100, #80, #54, and #14) #249

Open. Wants to merge 30 commits into base: main


@eherot eherot commented Nov 2, 2023

(This is a reworking of the now-defunct #84)

What this changes

  • Use a synchronous queue to process events in parallel
  • Start sending events as soon as they are ingested instead of scanning the entire bucket first
  • Switch to aws-sdk v3
  • Add extensive logging
  • Enable the SDK's start_after parameter to fetch only new events (useful when object keys sort alphabetically in chronological order, as with S3 access logs)
  • Limit the batch size of S3 requests
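
The queue, start_after, and batch-size points above can be illustrated with a minimal sketch. It is not the code from this PR: `stream_keys`, `fetch_page`, `FAKE_FETCH`, and `ALL_KEYS` are hypothetical names, and Ruby's core `SizedQueue` is swapped in as a stand-in for whatever queue class the PR actually uses. An in-memory lambda replaces the S3 listing call so the example runs without AWS credentials.

```ruby
# Hypothetical sketch, not the PR's implementation.
# `fetch_page` stands in for an S3 listing call: given a marker and a limit,
# it returns up to `limit` keys that sort strictly after the marker.
def stream_keys(fetch_page, start_after: nil, batch_size: 1000, workers: 4, queue_size: 100, &handler)
  queue = SizedQueue.new(queue_size) # bounded: the lister blocks when workers fall behind
  processed = Queue.new

  pool = Array.new(workers) do
    Thread.new do
      # A nil sentinel tells the worker to stop.
      while (key = queue.pop)
        processed << handler.call(key)
      end
    end
  end

  marker = start_after
  loop do
    page = fetch_page.call(marker, batch_size)
    break if page.empty?
    page.each { |key| queue << key } # hand keys off immediately, no full bucket scan
    marker = page.last               # resume the next listing after the last key seen
  end

  workers.times { queue << nil }
  pool.each(&:join)

  results = []
  results << processed.pop until processed.empty?
  [results, marker]
end

# In-memory stand-in for a bucket whose keys sort in chronological order (assumption).
ALL_KEYS = %w[2023-11-01.log 2023-11-02.log 2023-11-03.log 2023-11-04.log 2023-11-05.log].freeze
FAKE_FETCH = lambda do |marker, limit|
  ALL_KEYS.select { |k| marker.nil? || k > marker }.first(limit)
end
```

Calling `stream_keys(FAKE_FETCH, start_after: "2023-11-02.log", batch_size: 2) { |k| k.upcase }` processes only the keys after the marker, two per listing call, and returns the last key seen so a later run can resume from it. Against a real bucket, `fetch_page` could presumably wrap the aws-sdk v3 call `list_objects_v2` with its `start_after` and `max_keys` parameters and map the response's `contents` to keys.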

May or should fix a number of open issues, either directly or by replacing the problem code entirely.

Thanks for contributing to Logstash! If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/

@eherot eherot force-pushed the improve-large-bucket-performance branch 2 times, most recently from 8822db0 to ce1edd1 Compare November 2, 2023 21:25
@eherot eherot force-pushed the improve-large-bucket-performance branch from ce1edd1 to c32c13f Compare November 3, 2023 17:41
@eherot eherot force-pushed the improve-large-bucket-performance branch from 3bcd04f to a38b42a Compare November 3, 2023 18:50
@eherot eherot force-pushed the improve-large-bucket-performance branch 15 times, most recently from e83ec3d to bbbf9c1 Compare November 5, 2023 22:29
@eherot eherot force-pushed the improve-large-bucket-performance branch from 119d4a6 to 9d85332 Compare November 5, 2023 22:55
@eherot eherot force-pushed the improve-large-bucket-performance branch from fc28c9b to 209e919 Compare January 29, 2024 17:21
@eherot eherot force-pushed the improve-large-bucket-performance branch from 209e919 to 68081ff Compare January 29, 2024 17:33
@eherot eherot force-pushed the improve-large-bucket-performance branch from 68081ff to 05b4a65 Compare January 29, 2024 17:36
@eherot eherot force-pushed the improve-large-bucket-performance branch from beaa532 to 2aae3bc Compare February 6, 2024 19:29
@roaksoax

Hi @eherot

The Logstash S3 input is now shipped as part of https://github.com/logstash-plugins/logstash-integration-aws rather than as an individual plugin. Would you want to re-target your PR against it?

@eherot (Author) commented May 15, 2024

Thanks for the heads up. Let me see how much work that's going to be...
