Allow using native histogram version of latency metrics in Mimir #7154

Open
Tracked by #7229
krajorama opened this issue Jan 17, 2024 · 0 comments

krajorama (Contributor) commented Jan 17, 2024

Is your feature request related to a problem? Please describe.

The classic histogram cortex_request_duration_seconds_(bucket|count|sum) now has a native histogram version called cortex_request_duration_seconds. Users should be able to drop the classic histogram series and use only the native histogram, which has better quality and performance.

The solution should cover all latency-related metrics in Mimir. This is tracked in #5020.

Describe the solution you'd like

Update recording rules and alerts so that they can use the new native histograms. The end user decides which histogram versions to scrape.

Update dashboards so that they can show either classic or native histograms, selected by a dashboard variable.
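As a rough illustration of switching on a dashboard variable, here is a Go sketch that builds the PromQL for the same panel from either version of cortex_request_duration_seconds. Classic histograms are queried through the _bucket, _sum and _count series (keeping the le label when aggregating buckets), while the native histogram is queried by its bare name, with histogram_sum and histogram_count extracting the sum and count. The function names, the useNative flag (standing in for the dashboard variable) and the example label selector are hypothetical; this is not how Mimir's dashboards or rules are actually generated.

```go
package main

import "fmt"

// Hypothetical helpers: they only build PromQL strings for illustration.
// useNative stands in for a dashboard variable that selects the histogram type.

// latencyQuantileQuery returns a quantile query over either histogram version.
func latencyQuantileQuery(useNative bool, metric, selector, window string, q float64) string {
	if useNative {
		// Native histogram: one series per label set holds all buckets,
		// so no "le" label is needed when aggregating.
		return fmt.Sprintf("histogram_quantile(%g, sum(rate(%s{%s}[%s])))", q, metric, selector, window)
	}
	// Classic histogram: one <metric>_bucket series per bucket, so the "le"
	// label must be kept when aggregating.
	return fmt.Sprintf("histogram_quantile(%g, sum by (le) (rate(%s_bucket{%s}[%s])))", q, metric, selector, window)
}

// averageLatencyQuery returns the mean latency over the window.
func averageLatencyQuery(useNative bool, metric, selector, window string) string {
	if useNative {
		// Native histogram: sum and count are extracted from the histogram samples.
		return fmt.Sprintf("sum(histogram_sum(rate(%[1]s{%[2]s}[%[3]s]))) / sum(histogram_count(rate(%[1]s{%[2]s}[%[3]s])))", metric, selector, window)
	}
	// Classic histogram: sum and count live in separate _sum and _count series.
	return fmt.Sprintf("sum(rate(%[1]s_sum{%[2]s}[%[3]s])) / sum(rate(%[1]s_count{%[2]s}[%[3]s]))", metric, selector, window)
}

func main() {
	const metric = "cortex_request_duration_seconds"
	const selector = `job="distributor"` // example selector, chosen for illustration only
	for _, useNative := range []bool{false, true} {
		fmt.Println(latencyQuantileQuery(useNative, metric, selector, "5m", 0.99))
		fmt.Println(averageLatencyQuery(useNative, metric, selector, "5m"))
	}
}
```

Because both variants answer the same question, a panel can keep one definition and only swap the expression, which is what makes a single dashboard variable sufficient.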

Describe alternatives you've considered

We considered many alternatives:

  • Combine the classic and native queries with OR in PromQL: this can produce unexpected results during the migration, because the range vector selectors under rate() work on very little data.
  • Duplicate the whole dashboard: this makes it inconvenient to look at old data.
  • Allow setting a timestamp that tells the PromQL queries when to use which data: this can be inconvenient to set up.
  • Show both classic and native histograms at the same time: the implementation became too complex.
  • Require users to keep both histograms for their whole retention period: this is still recommended, but may be too expensive.
  • Implement automatic detection or translation in Prometheus or Grafana: the implementation became too complex.
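For illustration, the first alternative (combining the queries with OR) could look like the Go sketch below, which only builds the query string; the helper name orFallbackQuery is hypothetical. Both sides aggregate away all labels, so at every evaluation step where the native side returns a value, PromQL's or operator discards the classic value. One plausible way the migration problem shows up: shortly after the native series appears, rate() on it sees only a few samples inside the window, yet its result still replaces the classic result, which is computed over the full window.

```go
package main

import "fmt"

// orFallbackQuery sketches the rejected alternative: prefer the native
// histogram and fall back to the classic one via PromQL's "or" operator.
// Both sides aggregate away all labels, so wherever the native side returns
// a sample at an evaluation step, "or" drops the classic sample for that step.
func orFallbackQuery(metric, window string, q float64) string {
	native := fmt.Sprintf("histogram_quantile(%g, sum(rate(%s[%s])))", q, metric, window)
	classic := fmt.Sprintf("histogram_quantile(%g, sum by (le) (rate(%s_bucket[%s])))", q, metric, window)
	return native + " or " + classic
}

func main() {
	fmt.Println(orFallbackQuery("cortex_request_duration_seconds", "5m", 0.99))
}
```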

Additional context

https://grafana.com/docs/mimir/v2.11.x/send/native-histograms/
https://grafana.com/docs/mimir/v2.11.x/visualize/native-histograms/

krajorama changed the title from "Allow using native histogram version of cortex_request_duration_seconds" to "Allow using native histogram version of latency metrics in Mimir" on Feb 26, 2024
krajorama added a commit that referenced this issue Mar 20, 2024
Allow switching between basing status on classic or native version
of cortex_request_duration_seconds.

Related to #7154

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
krajorama added a commit that referenced this issue Apr 8, 2024
* dashboards: overview: use native histograms in status

Allow switching between basing status on classic or native version
of cortex_request_duration_seconds.

Related to #7154

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>