Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynatrace v2] Ensure synchronization when taking snapshots #3615

Merged

Conversation

pirgeo
Copy link
Contributor

@pirgeo pirgeo commented Feb 1, 2023

We found a timing bug that can appear when using the Dynatrace v2 exporter, usually only when recording data at high volumes. In previous versions, recording a snapshot was not a synchronized operation. That means that it was possible to write data to a DynatraceSummary, while the snapshot was being created. In very rare cases this can lead to invalid data. This PR moves the synchronization while taking a snapshot to the innermost level, and ensures that no writes can happen while a Snapshot is produced.

Edit: As requested below, here is some more context:

This error is really hard to reproduce, since it only happens in a very limited number of cases. One case would be: A counter recording only ones, but a lot of them. The resulting export would have min=1, max=1, sum=999 and count=1000. It means that between reading the sum (999) and the count (1000), another 1 was recorded. This is inconsistent data, and is therefore rejected with an error message like:

inconsistent gauge fields: min <= avg <= max doesn't hold: min: 1.000000, max: 1.000000, sum: 500467.000000, count: 500468, avg: 0.999998, tolerance: 0.000001

With this change, it is possible that there is a very small delay when recording a new value at the exact time a snapshot is taken.

@pirgeo
Copy link
Contributor Author

pirgeo commented Feb 8, 2023

@shakuzen or @jonatan-ivanov, could you take a look? I would love to get this into the next patch release, if possible 😄

Copy link
Member

@jonatan-ivanov jonatan-ivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I applied the parameter change suggestion, the build is broken and I can't fix it on the branch but I think I can fix it with applying these suggestions. :)

@jonatan-ivanov jonatan-ivanov added bug A general bug registry: dynatrace A Dynatrace Registry related issue labels Feb 9, 2023
@jonatan-ivanov jonatan-ivanov added this to the 1.9.8 milestone Feb 9, 2023
@jonatan-ivanov jonatan-ivanov merged commit 67a409a into micrometer-metrics:1.9.x Feb 9, 2023
@jonatan-ivanov
Copy link
Member

@pirgeo Thank you for fixing this!
Could you please do two more things?

  1. Could you please add a note in the description that with this change recordings can be blocked during publishing for a very small amount of time while we get and reset the values for the snapshot?
  2. Could you please add some example error message that Dynatrace will produce if this happens so users who get it and search on it might discover this issue?

@pirgeo
Copy link
Contributor Author

pirgeo commented Feb 9, 2023

@jonatan-ivanov done! Should I send another PR for adding the missing test that you pointed out?

Edit: Created a follow up, see here: #3626

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug registry: dynatrace A Dynatrace Registry related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants