Prometheus enhancement #1120

amnonh · 2022-06-30T09:27:58Z

Motivation:
Histograms are the Prometheus killer. They are big to send over the
network and are big to store in Prometheus.
Histograms' size is always an issue, but with Seastar, it becomes even trickier.
Typically, each shard would collect its histograms, so the overall data
is multiplied by the number of shards.

This series addresses the need to report quantile information like latency
without generating massive metrics reports.

A summary is a Prometheus metric type that holds a quantile summary (i.e.
p95, p99).

The downside of summaries is that they cannot be aggregated, which is
needed for a distributed system (i.e., calculate the p99 latency of a cluster).

The series adds four tools for Prometheus performance:

Add summaries.
Optionally, remove empty metrics. It's common to register metrics for
optional services. It is now possible to mark those metrics as
skip_when_empty and they will not be reported if they were never used.
Allow aggregating metrics. The most common case is reporting per-node
histogram instead of per shard. For example, for multi-nodes quantile calculation,
we need a per-node histogram. It is now possible to mark a registered
metric for aggregation. The metrics layer will aggregate this
metric based on a list of labels. (Typically, this will be by shard,
but it could be any other combination of labels).
Reuse the stringstream instead of recreating an object on each
iteration.

V6
changed the name of the skip_when_empty variable to should_skip_when_empty to make the compiler happy.

V5
Main changes:

split the aggregate and skip_on_empty into two patches
Use bool_class for skip_when_empty this also allows using the parenthesis
operator to mark that a metric should use skip_when_empty.
Fix some typos in the comments
Add noexcept when needed
Change "\n" to '\n' when possible.

V4
Aggregation and skip_when_empty are now generalized in a sense that it's
a per metric configuration and can be used on any metric, not just
histogram.

The 'aggregate by' now accept labels instead of string which makes the API
cleaner.

The stringstream that is used inside Prometheus text reporting is now
reused instead of recreating on each iteration.

The aggregator object in Prometheus was rename and some comments were
added to make its use clearer.

V3
The main issue with the previous version was that it was too specific.
This version breaks the functionality and pushes some of the logic to
whoever uses the metrics layer.

Summaries are now just another kind of histogram, with no special magic
around them and no need for their own type.

A user can use any combination of labels for aggregation.
It's currently only applied to histograms, but we can do it with other
metrics if it will be helpful.

In the current implementation, to report a summary per shard and a single
histogram per node, the user of the metrics layer would register two metrics,
one for the summary and one for the histogram.

This patch add support for the summary type on the metrics layer. A summary is a different kind of histogram, it's buckets are percentile so the reporting layer (i.e. Prometheus for example) would know to report it correctly. Signed-off-by: Amnon Heiman <amnon@scylladb.com>

This patch adds a missing part to how histograms are being aggregated, it needs to aggregate the sum and count as well. Fixes scylladb#1114 Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Skip when empty, means that metrics that are not in used, will not be reported. A common scenario is that a user register a metric, but that metric is never used. The most common case is histogram and summary but it is also valid for counters. This patch adds an option to mark a metric with skip_when_empty. When done so, if a metric was never used (true for histogram, counters and summaries) it will not be reported. Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Aggregate by labels is a mechanism for reporting aggregated results. Most commonly it is used to report one histogram per node instead of per shard, but it can be used for counters, and gauges. This patch adds an option to mark a metric with a vector of labels. That vector will be part of the metric meta-data so the reporting layer would be able to aggregate over it.

This patch adds multiple functionality to Prometheus reporting: 1. Add summary reporting. Summaries are used in seastar to report aggregated percentile information (for example p95 and p99) The main usage is to report per-shard summary of a latency histograms. 2. Support aggregated metrics. With an aggregated metrics, Prometheus would aggregate multiple metrics based on labels and would report the result. Usually this would be for reporting a single latency histogram per node instead of per shard. But it could be used for counters and gauge as well. 3. Skip empty counters, histograms and summaries. It's a common practice to register lots of metrics even if they are not being used. Histograms have a huge effect on performance, so not reporting an empty histogram is a great performance boost both for the application and for the Prometheus server. This is true for Summaries and Counters as well, marking a metric with skip_when_empty would mean Prometheus will not report the metric if it was never used. 4. As an optimization, the stringstream that is used per metric is reused and clear insted of recreated. Signed-off-by: Amnon Heiman <amnon@scylladb.com>

avikivity · 2022-06-30T09:50:29Z

Please use more descriptive subjects.

nyh · 2022-06-30T11:32:19Z

Also please don't include the version history as part of the commit message - just describe what you ended up with.
I'll fix both problems and commit.

amnonh added 5 commits June 23, 2022 15:58

metrics.cc: missing count and sum when aggregating histograms

093536f

This patch adds a missing part to how histograms are being aggregated, it needs to aggregate the sum and count as well. Fixes scylladb#1114 Signed-off-by: Amnon Heiman <amnon@scylladb.com>

nyh merged commit ce78808 into scylladb:master Jun 30, 2022

amnonh mentioned this pull request Mar 12, 2024

Adding Metrics family config #2121

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prometheus enhancement #1120

Prometheus enhancement #1120

amnonh commented Jun 30, 2022

avikivity commented Jun 30, 2022

nyh commented Jun 30, 2022

Prometheus enhancement #1120

Prometheus enhancement #1120

Conversation

amnonh commented Jun 30, 2022

avikivity commented Jun 30, 2022

nyh commented Jun 30, 2022