Add metrics aggregation capabilities #26

This patch adds support for the summary type to the metrics layer. A summary is a different kind of histogram: its buckets are percentiles, so the reporting layer (e.g. Prometheus) knows how to report it correctly.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
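The following is a minimal, self-contained sketch of what "buckets that are percentiles" can mean. It assumes a Prometheus-style histogram with cumulative bucket counts and estimates fixed quantiles by linear interpolation inside the bucket where the target rank falls. The names `bucket`, `estimate_quantile`, `summary_point`, and `make_summary_points` are hypothetical and are not seastar's API or its actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical histogram bucket: an upper bound and the cumulative number
// of samples whose value is <= that bound (Prometheus-style "le" buckets).
struct bucket {
    double upper_bound;
    uint64_t cumulative_count;
};

// Estimate the q-th quantile (0 < q <= 1) by linear interpolation inside the
// bucket that contains the target rank. Returns 0 for an empty histogram.
double estimate_quantile(const std::vector<bucket>& buckets, double q) {
    assert(q > 0.0 && q <= 1.0);
    if (buckets.empty() || buckets.back().cumulative_count == 0) {
        return 0.0;
    }
    const double rank = q * static_cast<double>(buckets.back().cumulative_count);
    double lower_bound = 0.0;
    uint64_t lower_count = 0;
    for (const auto& b : buckets) {
        if (static_cast<double>(b.cumulative_count) >= rank) {
            const uint64_t in_bucket = b.cumulative_count - lower_count;
            if (in_bucket == 0) {
                return b.upper_bound;
            }
            const double fraction = (rank - static_cast<double>(lower_count)) /
                                    static_cast<double>(in_bucket);
            return lower_bound + fraction * (b.upper_bound - lower_bound);
        }
        lower_bound = b.upper_bound;
        lower_count = b.cumulative_count;
    }
    return buckets.back().upper_bound;
}

// A summary reports a fixed set of quantiles instead of raw buckets.
struct summary_point {
    double quantile;
    double value;
};

std::vector<summary_point> make_summary_points(const std::vector<bucket>& buckets,
                                               const std::vector<double>& quantiles) {
    std::vector<summary_point> out;
    out.reserve(quantiles.size());
    for (double q : quantiles) {
        out.push_back({q, estimate_quantile(buckets, q)});
    }
    return out;
}
```

For example, with buckets `{10, 50}, {20, 90}, {40, 100}` the p95 rank (95) falls halfway into the last bucket, giving an estimate of 30. Interpolation is only one possible choice; an implementation could equally report the bucket's upper bound.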

This patch adds a missing part of histogram aggregation: in addition to the buckets, the sum and count must be aggregated as well.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
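A minimal sketch of the fix described above, assuming every shard's histogram uses the same bucket boundaries. The `histogram` struct and `aggregate_into` function are hypothetical illustrations, not seastar's types. The point is that summing only the buckets would leave `sample_count` and `sample_sum` describing a single shard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-shard histogram: cumulative bucket counts plus the total
// number of samples and the sum of all sampled values.
struct histogram {
    uint64_t sample_count = 0;
    double sample_sum = 0.0;
    std::vector<uint64_t> buckets; // cumulative counts, one per upper bound
};

// Merge `other` into `acc`. Bucket boundaries are assumed identical across
// shards, so buckets are added element-wise; count and sum are added too.
void aggregate_into(histogram& acc, const histogram& other) {
    if (acc.buckets.empty()) {
        acc.buckets.resize(other.buckets.size(), 0);
    }
    assert(acc.buckets.size() == other.buckets.size());
    for (std::size_t i = 0; i < other.buckets.size(); ++i) {
        acc.buckets[i] += other.buckets[i];
    }
    acc.sample_count += other.sample_count;
    acc.sample_sum += other.sample_sum;
}
```

Without the last two additions, the aggregated `_count` and `_sum` series would be inconsistent with the aggregated buckets, and any mean computed as sum divided by count would be wrong.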

Aggregate labels are a mechanism for reporting aggregated results. Most commonly, they allow reporting one histogram per node instead of one per shard. This patch adds an option to mark a metric with a vector of labels. That vector becomes part of the metric metadata, so the reporting layer can aggregate over those labels.

Skip when empty means that metrics that are not in use will not be reported. A common scenario is that a user registers a metric that is never used. This happens most often with histograms and summaries, but it can also happen with counters. This patch adds an option to mark a metric with skip_when_empty. When it is set, a metric that was never used (this applies to histograms, counters, and summaries) will not be reported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
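To illustrate what a reporting layer can do with the aggregate-labels vector, here is a small sketch that sums series which agree on every label except the aggregated ones. The `series` struct, the `aggregate_by` function, and the use of a label named `shard` are assumptions made for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using labels = std::map<std::string, std::string>;

// One reported sample of a counter or gauge, e.g. one per shard.
struct series {
    labels lbls;
    double value;
};

// Sum all series that share every label except those in `aggregate_labels`.
// The aggregated labels are dropped from the output key, so N per-shard
// series collapse into one series per remaining label combination.
std::map<labels, double> aggregate_by(const std::vector<series>& input,
                                      const std::vector<std::string>& aggregate_labels) {
    std::map<labels, double> out;
    for (const auto& s : input) {
        labels key = s.lbls;
        for (const auto& name : aggregate_labels) {
            key.erase(name);
        }
        out[key] += s.value;
    }
    return out;
}
```

Aggregating over `{"shard"}` turns per-shard series `{shard="0", type="read"}` and `{shard="1", type="read"}` into a single `{type="read"}` series holding their sum. For histograms, the same grouping would apply to each bucket, the count, and the sum.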

This patch adds several features to Prometheus reporting:

1. Summary reporting. Summaries are used in seastar to report aggregated percentile information (for example, p95 and p99). The main use is reporting a per-shard summary of a latency histogram.
2. Aggregated metrics. For an aggregated metric, the Prometheus reporting aggregates multiple metrics based on their labels and reports the result. Usually this is used to report a single latency histogram per node instead of one per shard, but it can be used for counters and gauges as well.
3. Skipping empty counters, histograms, and summaries. It is common practice to register many metrics even if they are never used. Histograms have a large effect on performance, so not reporting an empty histogram is a significant performance boost for both the application and the Prometheus server. The same is true for summaries and counters: marking a metric with skip_when_empty means Prometheus will not report it while it is empty.
4. As an optimization, the stringstream used per metric is reused and cleared instead of being recreated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
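The sketch below shows how items 1, 3, and 4 could fit together when rendering the Prometheus text exposition format: a summary is written as `quantile`-labelled samples plus `_sum` and `_count`, a metric marked skip_when_empty is omitted when it has no samples, and a single `std::ostringstream` is reset with `str("")` and `clear()` instead of being constructed per metric. The `summary_metric` struct and `render` function are hypothetical and are not the seastar Prometheus exporter's code.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory summary: precomputed (quantile, value) pairs plus
// the total count and sum of the underlying samples.
struct summary_metric {
    std::string name;
    std::vector<std::pair<double, double>> quantiles;
    uint64_t sample_count = 0;
    double sample_sum = 0.0;
    bool skip_when_empty = false;
};

// Render summaries in the Prometheus text exposition format, reusing one
// ostringstream across all metrics.
std::string render(const std::vector<summary_metric>& metrics) {
    std::string out;
    std::ostringstream s;
    for (const auto& m : metrics) {
        if (m.skip_when_empty && m.sample_count == 0) {
            continue; // never used: emit nothing for this metric
        }
        s.str("");  // drop the previous metric's text
        s.clear();  // reset any error state flags
        s << "# TYPE " << m.name << " summary\n";
        for (const auto& p : m.quantiles) {
            s << m.name << "{quantile=\"" << p.first << "\"} " << p.second << "\n";
        }
        s << m.name << "_sum " << m.sample_sum << "\n";
        s << m.name << "_count " << m.sample_count << "\n";
        out += s.str();
    }
    return out;
}
```

Reusing the stream avoids constructing and destroying a stream object (and its locale and buffer state) for every metric; since reporting iterates over many metrics on each scrape, that per-metric cost adds up.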

Commits on Jun 24, 2022