Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

histograms: Reset ASAP after resolution reduction #1248

Closed
beorn7 opened this issue Apr 8, 2023 · 1 comment · Fixed by #1367
Closed

histograms: Reset ASAP after resolution reduction #1248

beorn7 opened this issue Apr 8, 2023 · 1 comment · Fixed by #1367

Comments

@beorn7
Copy link
Member

beorn7 commented Apr 8, 2023

While analysing real-world production data, I had an insight:

If we hit the configured NativeHistogramMaxBucketNumber before the configured NativeHistogramMinResetDuration has passed since the last reset, we'll reduce the resolution instead. So far so good.

However, once we have reduced the resolution, it is much less likely to ever hit NativeHistogramMaxBucketNumber again. Therefore, we might very well stay at the reduced resolution forever (i.e. for the lifetime of the binary). It would be better to reset the histogram immediately after NativeHistogramMinResetDuration has passed, because this will reinstate the originally configured resolution.

More context: In the real-world production scenario I have analysed, it is indeed a very rare even that the resolution has to be reduced, mostly triggered by short peaks of more intense traffic. However, because of the reason explained above, it essentially means that a single occurrence will cause the affected binary to expose a reduced-resolution histogram for the rest of its lifetime. Since histograms are often aggregated together, this needlessly reduced the resolution of the aggregated results with quite some blast radius.

@beorn7 beorn7 added this to the Native Histograms milestone Apr 8, 2023
@beorn7 beorn7 self-assigned this Apr 8, 2023
@beorn7
Copy link
Member Author

beorn7 commented Apr 8, 2023

/cc @codesome for academic reasons and @fstab and @csmarchbanks because you are working on other instrumentation libraries.

beorn7 added a commit that referenced this issue Apr 25, 2023
Fixes #1248.

Signed-off-by: beorn7 <beorn@grafana.com>
beorn7 added a commit that referenced this issue Oct 13, 2023
Fixes #1248. See issue description for all the details.

Signed-off-by: beorn7 <beorn@grafana.com>
bwplotka pushed a commit that referenced this issue Oct 19, 2023
#1367)

Fixes #1248. See issue description for all the details.

Signed-off-by: beorn7 <beorn@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant