
[Filebeat] AWS S3 direct listing input: states registry purged only on "stored" and not on "error" #33513

Closed · opened by aspacca on Nov 1, 2022 · 2 comments · Fixed by #33722
Assignees: aspacca
Labels: bug, Team:Cloud-Monitoring (Label for the Cloud Monitoring team)

Comments

aspacca (Contributor) commented on Nov 1, 2022

In the AWS S3 direct listing input we keep the state of the listed S3 objects in the registry, in order to decide whether an S3 object seen in the current listing has to be ingested or was already ingested and can be skipped.

We have some logic to purge ingested S3 objects based on a "commit timestamp", so that the registry does not grow indefinitely.

We apply the commit timestamp comparison (and eventually purge the entries) only to S3 objects that are marked with state.Stored = true.

We overlooked the fact that an S3 object could instead be marked with state.Error = true.

This means that S3 objects where an error occurred during ingestion won't be purged from the registry and could be ingested again (potentially not the whole file, but at least some of the events it contains).

We should extend the purge condition so that entries marked with state.Error = true are also eligible, i.e. purge when state.Stored OR state.Error is set and the commit timestamp check passes; see the sketch below.
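A minimal sketch of the condition in Go (simplified and paraphrased; the real logic lives in the input's states handling under x-pack/filebeat/input/awss3, and the type and field names here are stand-ins, not the actual Filebeat code):

```go
package main

import (
	"fmt"
	"time"
)

// state is a simplified stand-in for the per-object entry that the AWS S3
// direct listing input keeps in the registry.
type state struct {
	Bucket string
	Key    string
	Stored bool // ingestion completed successfully
	Error  bool // ingestion failed
}

// isPurgeable reports whether a registry entry can be removed, given the
// "commit timestamp" of the listing cycle and the entry's own timestamp.
//
// Buggy version (only successfully stored objects were considered):
//
//	return s.Stored && lastModified.Before(commitTS)
//
// Fixed version: entries that ended in an error are purged as well,
// otherwise they linger in the registry indefinitely.
func isPurgeable(s state, lastModified, commitTS time.Time) bool {
	return (s.Stored || s.Error) && lastModified.Before(commitTS)
}

func main() {
	commitTS := time.Now()
	old := commitTS.Add(-time.Hour)

	failed := state{Bucket: "my-bucket", Key: "logs/app.ndjson", Error: true}
	fmt.Println(isPurgeable(failed, old, commitTS)) // true with the fix, false before it
}
```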

aspacca added the bug label and self-assigned this on Nov 1, 2022
aspacca added the Team:Cloud-Monitoring label on Nov 1, 2022
andrewkroh (Member) commented:

> and could be ingested again (potentially not the whole file, but at least some of the events it contains)

I'm curious how the partial read works. Does it retry and resume from a byte offset stored in the registry? Or does it read the whole file and assume that Elasticsearch will deduplicate based on _id?

aspacca (Contributor, Author) commented on Nov 2, 2022

> I'm curious how the partial read works. Does it retry and resume from a byte offset stored in the registry? Or does it read the whole file and assume that Elasticsearch will deduplicate based on _id?

It's not handled at the moment, so the input will read the whole file again, and Elasticsearch will indeed deduplicate based on _id, but that was not properly intended: it's something I forgot to manage. We could probably store the failing offset in the registry, do a minimal number of retries, and then just purge the file. I'm not sure how this aligns with at-least-once delivery: any suggestions? A sketch of the idea follows.
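A hypothetical sketch of that idea in Go (none of these fields or names exist in Filebeat today; objectState, FailedOffset, Retries, and maxRetries are invented for illustration):

```go
package main

import "fmt"

// objectState is a hypothetical extension of the registry entry; the
// FailedOffset and Retries fields do not exist in Filebeat today.
type objectState struct {
	Bucket       string
	Key          string
	Stored       bool
	Error        bool
	FailedOffset int64 // byte offset reached before the last failure
	Retries      int   // retries attempted so far
}

// maxRetries is an arbitrary retry budget for this sketch.
const maxRetries = 3

// nextAction decides what to do with an object on the next listing cycle.
func nextAction(s objectState) string {
	switch {
	case s.Stored:
		return "skip" // already fully ingested
	case s.Error && s.Retries >= maxRetries:
		return "purge" // retry budget exhausted; drop the registry entry
	case s.Error:
		return "resume" // re-read starting at s.FailedOffset
	default:
		return "ingest" // new object, read from the beginning
	}
}

func main() {
	s := objectState{Key: "logs/app.ndjson", Error: true, FailedOffset: 4096, Retries: 1}
	fmt.Println(nextAction(s)) // "resume"
}
```

Note that resuming from FailedOffset only preserves at-least-once delivery if the offset is advanced strictly after the corresponding events have been acknowledged by the output; otherwise events between the acknowledged point and the recorded offset could be lost.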
