[v22.2.x] metrics: ensure consistent histogram upper bounds #9219

Merged 1 commit on Mar 7, 2023

VladLazar
Contributor

Backport of PR #9192

Mostly clean backport. There was a conflict in CMakeLists.txt, but the logic changes applied cleanly.

In order to publish histogram metrics, we convert from our internal HDR
histogram representation to Seastar's histogram type. Seastar expects
the histograms of a given metric to always have the same number of
buckets, with identical upper bounds.

If that's not the case, Seastar will throw whilst handling the request
to fetch metrics. The visible behaviour depends on the Redpanda version
where the issue occurs. For v22.2.x, the connection for the metrics
request is killed and the metrics are therefore truncated. For later
versions, Redpanda crashes because the exception is thrown on a
noexcept code-path.
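
The invariant described above can be sketched as a small check. This is only an illustration: `published_histogram` and `same_layout` are hypothetical names, not Seastar or Redpanda APIs, and the real check inside Seastar may look different.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a published histogram: one upper bound per bucket.
struct published_histogram {
    std::vector<uint64_t> upper_bounds;
};

// Returns true only if both snapshots of a metric have the same number of
// buckets and identical upper bounds, which is the property the consumer
// of the histograms is assumed to rely on.
bool same_layout(const published_histogram& a, const published_histogram& b) {
    return a.upper_bounds == b.upper_bounds;
}
```

Two snapshots whose last bucket differs by even one unit fail this check, which corresponds to the mismatch described above.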

This commit ensures that the upper bounds of Seastar histograms
generated from the same HDR histogram are always the same. Previously,
we used `hdr_iter_log_init.iterated_to` as the upper bound for the
bucket. However, the C HDR histogram implementation does not update
this field under certain conditions (possibly due to a bug). To fix
this, we keep track of the value we've iterated to outside of the
library's iterator.
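
A minimal sketch of that idea, assuming logarithmic iteration with a fixed first bucket width and base. The helper name and its parameters are hypothetical and this is not the code from the commit; it only shows that bounds derived from the iteration parameters cannot vary with the recorded data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: derives the upper bound of every logarithmic bucket
// purely from the iteration parameters, so the result never depends on which
// values were recorded or on the library iterator's internal state.
// The caller is assumed to pick parameters that do not overflow uint64_t.
std::vector<uint64_t> log_bucket_upper_bounds(
  uint64_t first_bucket_width, uint64_t log_base, std::size_t bucket_count) {
    assert(first_bucket_width > 0 && log_base > 1);
    std::vector<uint64_t> bounds;
    bounds.reserve(bucket_count);
    // Tracked here, outside of any library iterator.
    uint64_t iterated_to = first_bucket_width;
    for (std::size_t i = 0; i < bucket_count; ++i) {
        bounds.push_back(iterated_to);
        iterated_to *= log_base;
    }
    return bounds;
}
```

For example, `log_bucket_upper_bounds(10, 2, 4)` yields the bounds 10, 20, 40 and 80 on every call, whatever the histogram contains.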

I noticed a couple other issues with the conversion logic and they're
also fixed by this commit:
* We used `hdr_iter_log_init.iterated_to` for the upper bound of the
  Seastar histogram bucket. Internally, HDR histogram uses a
  representation similar to floating point numbers, which means that it
  can't actually represent every value in the range, and it treats
  spans (buckets) of values as equivalent. Therefore, the upper bound we
  reported was not correct. This has been fixed by reporting the highest
  equivalent value instead.
* The smallest discernible value is a configurable knob of HDR
  histogram. When that's set, it doesn't make sense to start iterating
  over the histogram from values smaller than that. We previously did;
  this has been fixed by taking the maximum of the requested start value
  and the configured smallest discernible value.
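
Both points can be illustrated with a toy model. The sketch below is not the HDR histogram algorithm: it assumes a simplified scheme that keeps only the top `significant_bits` bits of a value, and all names (`toy_precision`, `effective_start`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model of limited precision: only the top `significant_bits` bits of a
// value are kept, so values differing in the dropped bits are equivalent.
struct toy_precision {
    unsigned significant_bits;

    // Number of low-order bits the model discards for `value`.
    unsigned dropped_bits(uint64_t value) const {
        assert(significant_bits >= 1 && significant_bits <= 64);
        unsigned width = 0;
        for (uint64_t v = value; v != 0; v >>= 1) {
            ++width;
        }
        return width > significant_bits ? width - significant_bits : 0;
    }

    // Smallest value the model cannot tell apart from `value`.
    uint64_t lowest_equivalent(uint64_t value) const {
        const unsigned d = dropped_bits(value);
        return (value >> d) << d;
    }

    // Largest value the model cannot tell apart from `value`; this is the
    // kind of bound that is safe to report as a bucket's upper bound.
    uint64_t highest_equivalent(uint64_t value) const {
        const unsigned d = dropped_bits(value);
        return lowest_equivalent(value) + ((uint64_t{1} << d) - 1);
    }
};

// Never start iterating below the smallest value the histogram can discern.
uint64_t effective_start(uint64_t requested_start, uint64_t min_discernible) {
    return std::max(requested_start, min_discernible);
}
```

With three significant bits, the values 96 through 111 are indistinguishable in this model, so a bucket nominally ending at 100 would report 111 as its upper bound.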

(cherry picked from commit bb55761)
@jcsp jcsp merged commit 6023a01 into redpanda-data:v22.2.x Mar 7, 2023