Fix cloudwatch metricset collecting duplicate data points #27248

Merged
merged 2 commits into from Aug 5, 2021

Conversation

kaiyan-sheng
Contributor

@kaiyan-sheng kaiyan-sheng commented Aug 5, 2021

What does this PR do?

When a CloudWatch data point lands exactly on the start time or end time of a collection period, it gets collected twice. This PR fixes the bug by moving the start time forward by one second, so that the end time of one collection period no longer overlaps with the start time of the next.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Aug 5, 2021
@kaiyan-sheng kaiyan-sheng self-assigned this Aug 5, 2021
@kaiyan-sheng kaiyan-sheng added backport-v7.14.0 Automated backport with mergify backport-v7.15.0 Automated backport with mergify Team:Integrations Label for the Integrations team labels Aug 5, 2021
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Aug 5, 2021
@elasticmachine
Collaborator

elasticmachine commented Aug 5, 2021

💚 Build Succeeded

Build stats

  • Start Time: 2021-08-05T03:49:38.803+0000

  • Duration: 76 min 30 sec

  • Commit: 5ef0239

Test stats 🧪

Test Results
Failed 0
Passed 2591
Skipped 231
Total 2822

💚 Flaky test report

Tests succeeded.


@aspacca aspacca merged commit b8cbdee into elastic:master Aug 5, 2021
mergify bot pushed a commit that referenced this pull request Aug 5, 2021
* Fix cloudwatch metricset collecting duplicate data points

* add changelog

(cherry picked from commit b8cbdee)
mergify bot pushed a commit that referenced this pull request Aug 5, 2021
* Fix cloudwatch metricset collecting duplicate data points

* add changelog

(cherry picked from commit b8cbdee)
aspacca pushed a commit that referenced this pull request Aug 5, 2021
…27253)

* Fix cloudwatch metricset collecting duplicate data points

* add changelog

(cherry picked from commit b8cbdee)

Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
aspacca pushed a commit that referenced this pull request Aug 5, 2021
…27252)

* Fix cloudwatch metricset collecting duplicate data points

* add changelog

(cherry picked from commit b8cbdee)

Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
@kaiyan-sheng kaiyan-sheng deleted the fix_duplicate_metrics branch August 8, 2021 21:28
@exekias
Contributor

exekias commented Aug 9, 2021

While I'm +1 on this change, I think it won't fix all the cases we are seeing. We double the query period to ensure we always get the same data, but we are seeing that this can result in intermittent data duplication when the latency is big enough.

I wonder if we should invest in better handling of these latency situations and reduce the range we request to the period. Thoughts @kaiyan-sheng? Some things we could do:

  • Always apply a latency of 1 full period (this will, of course, add a delay to the results)
  • Detect when a query returns no items and automatically increase latency there (with some sensible maximum)
  • Leave this to the user and document how they should use latency, provide good warnings in the logs

@kaiyan-sheng
Contributor Author

Yep, thanks @exekias. This is just a temporary workaround for the bug. I'm working on a permanent fix for this issue:

  1. Instead of checking twice the period size time frame, we only need to check one period length time frame. For example: if current timestamp is 2021-08-09 21:24:18.31 and period is 5m, then startTime=2021-08-09 21:21:48, endTime=2021-08-09 21:26:48.31
  2. I'm also thinking of caching the previous collection timestamp to make sure no duplicate data is collected.

@exekias
Contributor

exekias commented Aug 10, 2021

  • Instead of checking twice the period size time frame, we only need to check one period length time frame. For example: if current timestamp is 2021-08-09 21:24:18.31 and period is 5m, then startTime=2021-08-09 21:21:48, endTime=2021-08-09 21:26:48.31

Does it make sense to look for dates in the future? Any concerns about known delays on the CloudWatch side?

Labels
backport-v7.14.0 Automated backport with mergify backport-v7.15.0 Automated backport with mergify bug Team:Integrations Label for the Integrations team
Development

Successfully merging this pull request may close these issues.

[Metricbeat] GetStartTimeEndTime potential causing duplicate data
4 participants