
awscloudwatch input drops data #38918

Closed
faec opened this issue Apr 13, 2024 · 1 comment · Fixed by #38953

faec (Contributor) commented Apr 13, 2024

The awscloudwatch input can skip data in its target log groups, with severity depending on configuration and log size. This seems to apply to all platforms and all versions at least since 8.0.

Easy reproduction:

  • Set number_of_workers to 1 (the default)
  • Set start_position to beginning (the default)
  • Set log_group_name_prefix to a value matching 2 or more log groups

The first matching log group will ingest data starting from the beginning, but all other log groups will only include data from after ingestion began.

This loss continues during ingestion: events from any time span will only include data from at most one log group at a time.
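
For concreteness, a minimal Filebeat config sketch matching these steps, assuming the aws-cloudwatch input's standard options; the region and prefix values are hypothetical placeholders:

```yaml
filebeat.inputs:
  - type: aws-cloudwatch
    # Hypothetical values; the prefix must match 2 or more log groups.
    region_name: us-east-1
    log_group_name_prefix: /aws/lambda/
    # Both of the following are the defaults, shown explicitly.
    number_of_workers: 1
    start_position: beginning
```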

More finicky reproduction with a single log group:

  • Set number_of_workers to 1 (the default)
  • Set start_position to beginning (the default)
  • Target a single log group with a significant amount of past data (enough that ingesting it takes significantly longer than scan_frequency; optionally set scan_frequency to 1s to make this easier)

Data added to the log group between the start of ingestion and the completion of the first scan will be skipped.
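
A sketch of the single-group variant, again with a hypothetical group name, lowering scan_frequency to 1s as suggested above:

```yaml
filebeat.inputs:
  - type: aws-cloudwatch
    # Hypothetical values; the group should hold enough backlog that the
    # initial scan takes much longer than scan_frequency to complete.
    region_name: us-east-1
    log_group_name: /hypothetical/large-backlog-group
    number_of_workers: 1        # default
    start_position: beginning   # default
    scan_frequency: 1s
```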

@faec faec added bug Team:Elastic-Agent Label for the Agent team Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team labels Apr 13, 2024
@faec faec self-assigned this Apr 13, 2024
elasticmachine (Collaborator) commented

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@faec faec added Team:Cloud-Monitoring Label for the Cloud Monitoring team and removed Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team labels Apr 15, 2024
@faec faec added Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team and removed Team:Cloud-Monitoring Label for the Cloud Monitoring team labels Apr 15, 2024
faec added a commit that referenced this issue Apr 23, 2024
Fix a bug in cloudwatch worker allocation that could cause data loss (#38918).

The previous behavior wasn't really tested, since worker tasks were computed in cloudwatchPoller's polling loop, which required live AWS connections. So in addition to the basic logical fix, I did some refactoring to cloudwatchPoller that makes the task iteration visible to unit tests.
mergify bot pushed a commit that referenced this issue Apr 23, 2024
Fix a bug in cloudwatch worker allocation that could cause data loss (#38918).

(cherry picked from commit deece39)
faec added a commit that referenced this issue Apr 24, 2024
Fix a bug in cloudwatch worker allocation that could cause data loss (#38918).

(cherry picked from commit deece39)

Co-authored-by: Fae Charlton <fae.charlton@elastic.co>