Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics aggregation capabilities #26

Merged
merged 4 commits into from
Jun 27, 2022

Conversation

VladLazar
Copy link

@VladLazar VladLazar commented Jun 20, 2022

This PR contains a set of patches from the seastar mailing list (patch set and repo). They add aggregations
support to the seastar metrics subsystem. This PR used to be based on an older version of the upstream patch set,
but Ben fed back our findings and the new version matches our needs as is.

Mailing List Cover Letter

Motivation:
Histograms are the Prometheus killer. They are big to send over the
network and are big to store in Prometheus.
Histograms' size is always an issue, but with Seastar, it becomes even trickier.
Typically, each shard would collect its histograms, so the overall data
is multiplied by the number of shards.

This series addresses the need to report quantile information like latency
without generating massive metrics reports.

A summary is a Prometheus metric type that holds a quantile summary (i.e.
p95, p99).

The downside of summaries is that they cannot be aggregated, which is
needed for a distributed system (i.e., calculate the p99 latency of a cluster).

The series adds four tools for Prometheus performance:

  1. Add summaries.
  2. Optionally, remove empty metrics. It's common to register metrics for
    optional services. It is now possible to mark those metrics as
    skip_when_empty and they will not be reported if they were never used.
  3. Allow aggregating metrics. The most common case is reporting per-node
    metrics instead of per shard. For example, for multi-nodes quantile calculation,
    we need a per-node histogram. It is now possible to mark a registered
    metric for aggregation. The metrics layer will aggregate this
    metric based on a list of labels. (Typically, this will be by shard,
    but it could be any other combination of labels).
  4. Reuse the stringstream instead of recreating an object on each
    iteration.

force push:

  • split into smaller commits
  • use sm::label strong type instead of std::string for specifying aggregation labels
  • reuse string stream when converting aggregated metrics to text

force push:

  • Remove aggregation_labels argument from metric creation functions (make_<metric_type>).

force push:

  • Use the write_counter function in the commit it's introduced

force push:

  • Use strong label type sm::label in setter of aggregation labels

force push:

  • Amnon's latest version of the patch set includes everything we need and is functionally
    equivalent to what we had on this branch before (it also allows for skipping reporting of
    empty metrics configurably).

@VladLazar VladLazar changed the title Add metrics aggrecation capabilities Add metrics aggregation capabilities Jun 20, 2022
@BenPope BenPope requested review from jcsp, BenPope and dotnwat June 20, 2022 18:14
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be easier to review if the last commit was split up a bit and the comments improved.

src/core/prometheus.cc Outdated Show resolved Hide resolved
@BenPope
Copy link
Member

BenPope commented Jun 21, 2022

The patchset brought in optimises histograms by not outputting them if they aren't used. I wonder if we always want that. Maybe it should be configurable.

@VladLazar
Copy link
Author

The patchset brought in optimises histograms by not outputting them if they aren't used. I wonder if we always want that. Maybe it should be configurable.

Didn't you remove that in this commit (the value.is_empty() check)?

@BenPope
Copy link
Member

BenPope commented Jun 21, 2022

The patchset brought in optimises histograms by not outputting them if they aren't used. I wonder if we always want that. Maybe it should be configurable.

Didn't you remove that in this commit (the value.is_empty() check)?

Looks like it. But it was also printing the headers twice. Lets consider this later.

@VladLazar
Copy link
Author

This would be easier to review if the last commit was split up a bit and the comments improved.

I split it up and expanded on the commit messages in this force push.

include/seastar/core/metrics.hh Outdated Show resolved Hide resolved
src/core/prometheus.cc Outdated Show resolved Hide resolved
This patch add support for the summary type on the metrics layer.

A summary is a different kind of histogram, it's buckets are percentile
so the reporting layer (i.e. Prometheus for example) would know to
report it correctly.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds a missing part to how histograms are being aggregated,
it needs to aggregate the sum and count as well.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Aggregate labels are a mechanism for reporting aggregated results. Most
commonly it allows to report one histogram per node instead of per
shard.

This patch adds an option to mark a metric with a vector of labels.
That vector will be part of the metric meta-data so the reporting
layer would be able to aggregate over it.

Skip when empty, means that metrics that are not in used, will not be
reported.

A common scenario is that user register a metrics, but that metrics is
never used.
The most common case is histogram and summary but it it can also happen
with counters.

This patch adds an option to mark a metric with skip_when_empty.
When done so, if a metric was never used (true for histogram, counters
and summary) it will not be reported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds multiple functionality to Prometheus reporting:
1. Add summary reporting. Summaries are used in seastar to report aggregated
   percentile information (for example p95 and p99)
   The main usage is to report per-shard summary of a latency
   histograms.
2. Support aggregated metrics. With an aggregated metrics,
   Prometheus would aggregate multiple metrics based on labels and
   would report the result. Usually this would be for reporting a single
   latency histogram per node instead of per shard. But it could be used
   for counters and gauge as well.
3. Skip empty counters, histograms and summaries. It's a common practice to
   register lots of metrics even if they are not being used.
   Histograms have a huge effect on performance, so not reporting an empty
   histogram is a great performance boost both for the application and
   for the Prometheus server.
   This is true for Summaries and Counters as well, marking a metrics
   with skip_when_empty would mean Prometheus will not report those
   metrics.
4. As an optimization, the stringstream that is used per metric is
   reused and clear insted of recreated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this. Sent some minor nits to the mailing list.

@BenPope BenPope merged commit da64789 into redpanda-data:v22.2.x Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants