
Aggregate metrics to reduce cardinality #5166

Merged: 10 commits merged into redpanda-data:dev from metrics-aggregation on Jul 4, 2022

Conversation

@VladLazar (Contributor) commented Jun 20, 2022

Note that this PR depends on the following seastar PR for aggregation capabilities:
redpanda-data/seastar#26

Cover letter

Previously, the '/metrics' Prometheus endpoint had very high cardinality due to the large
number of metrics published and the fact that many of them use multiple labels (each
label is essentially one dimension). This causes problems because metrics scrapers need
to ingest large amounts of data, and it drives up costs if any upstream service bills
per metric.

This set of patches introduces a new configuration option, 'aggregate_metrics', which specifies
whether metrics should be aggregated by default. This configuration option is then used in the
various probes.
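
For illustration, the probes follow a pattern along these lines. This is a minimal sketch rather than the exact Redpanda code: the probe class, group name, metric name, and '_requests' counter are hypothetical, while config::shard_local_cfg().aggregate_metrics(), sm::shard_label, and the aggregate() member function are the pieces this PR and the linked seastar PR rely on.

```cpp
#include <seastar/core/metrics.hh>
#include <seastar/core/metrics_registration.hh>

#include <vector>

// "config/configuration.h" from the Redpanda tree provides shard_local_cfg()
// (path assumed for this sketch).

namespace ss = seastar;
namespace sm = ss::metrics;

// Hypothetical probe showing the approach used across the real probes.
class example_probe {
public:
    void setup_metrics() {
        // When 'aggregate_metrics' is enabled, hand seastar a list of labels
        // to aggregate over (those dimensions are summed away in the exported
        // series); otherwise pass an empty list and keep full cardinality.
        auto aggregate_labels
          = config::shard_local_cfg().aggregate_metrics()
              ? std::vector<sm::label>{sm::shard_label}
              : std::vector<sm::label>{};

        _metrics.add_group(
          "example",
          {sm::make_counter(
             "requests_total",
             [this] { return _requests; },
             sm::description("Total requests handled"))
             .aggregate(aggregate_labels)});
    }

private:
    uint64_t _requests{0};
    sm::metric_groups _metrics;
};
```

With aggregation enabled, seastar sums the matching series across the listed labels, so for example a per-shard counter is exported as a single summed time series without the shard label.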

Cardinality Reduction

This section compares the cardinality of the aggregated and non-aggregated metrics in a couple of
scenarios.

Scenario 1: No user-added topics or partitions
Number of metrics published without aggregation: 23525
Number of metrics published with aggregation: 3386

Scenario 2: 50 topics with 3 partitions each on a single broker
Number of metrics published without aggregation: 30725
Number of metrics published with aggregation: 7626

In both cases the number of published metrics is reduced by several orders of magnitude.

Release notes

  • A new configuration option has been introduced: 'aggregate_metrics'. When enabled,
    Prometheus metrics on the '/metrics' endpoint are aggregated over selected labels by summation,
    which reduces the cardinality of the metrics published when scraping the endpoint.

Force push log

force push:

  • use sm::label for specifying the aggregation labels (required due to change in seastar PR)

force push:

  • clang format latency_probe.h

force push:

  • use the aggregate member function to match the changes in seastar PR
  • fix typo in config flag description

force push:

  • removed defaulted parameters from metric creation calls
  • fixed a formatting nit
  • use the correct labels for aggregating in pandaproxy

force push:

  • update compaction test that was relying on partition label that's now aggregated

force push:

  • python formatting

force push:

  • revert to explicit aggregations

force push:

  • Expand on description of aggregate_metrics config option
  • Make aggregate_metrics false by default as we're unsure if it breaks any of the existing dashboards yet

force push:

  • Format the 'aggregate_metrics' config description

force push:

  • Don't aggregate compaction_ratio by partition, as it's an average and adding it up doesn't make sense.
    Aggregation by shard is fine, as it just drops the label here (see the sketch below).
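
Continuing the earlier sketch with the same assumed names, the compaction_ratio case from the last force push could look roughly like this; only the choice of aggregation labels is the point here.

```cpp
// For plain counters it is fine to sum across both shard and partition.
auto counter_aggregate_labels
  = config::shard_local_cfg().aggregate_metrics()
      ? std::vector<sm::label>{sm::shard_label, sm::label("partition")}
      : std::vector<sm::label>{};

// compaction_ratio is an average, so summing it across partitions would be
// meaningless; aggregate it over the shard label only, which effectively
// just drops that label.
auto ratio_aggregate_labels
  = config::shard_local_cfg().aggregate_metrics()
      ? std::vector<sm::label>{sm::shard_label}
      : std::vector<sm::label>{};
```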

@VladLazar force-pushed the metrics-aggregation branch 2 times, most recently from a423c92 to 83ab25c on June 21, 2022 13:26
@VladLazar marked this pull request as ready for review on June 21, 2022 13:27
@VladLazar requested review from nk-87 and removed the request for a team on June 21, 2022 13:27
@LenaAn (Contributor) left a comment

Overall LGTM but let's wait for redpanda-data/seastar#26 to be merged

Resolved review threads (outdated):
  • src/v/config/configuration.cc
  • src/v/archival/probe.cc
  • src/v/pandaproxy/probe.cc
  • src/v/net/probes.cc
  • src/v/storage/probe.cc
@Lazin (Contributor) left a comment

Archival part LGTM

@BenPope previously approved these changes Jun 24, 2022

@BenPope (Member) left a comment

LGTM

I'd like more reviews from the owners of the subsystems. And we need to wait for the merge of the seastar fix.

@BenPope dismissed their stale review on June 24, 2022 12:03

It's not ready to merge, and needs more reviews.

@c4milo (Member) commented Jun 24, 2022

This is great, thank you! One question that popped up in my mind is: what's the pattern of growth? Is it 2542 timeseries per partition? So for a 50k partition cluster, should we expect 127m metrics per scrape interval? If so, it is still super high IMO.

@VladLazar (Contributor, Author) commented Jun 24, 2022

This is great, thank you! One question that popped up in my mind is: what's the pattern of growth? Is it 2542 timeseries per partition? So for a 50k partition cluster, should we expect 127m metrics per scrape interval? If so, it is still super high IMO.

I think the number would be a lot better actually. I counted the types of labels for each metric we publish:

  • shard: 63 (i.e. 63 unique metrics have only the shard label)
  • namespace, topic: 32: (i.e. 32 unique metrics have only the namespace and topic labels)
  • namespace, topic, partition: 13 (i.e. 13 unique metrics have only the namespace, topic and partition labels)
  • omitted the rest, as I don't think they're big contributors

So, based on these numbers, a cluster with 10k topics and 50k partitions running on 16 shards should publish roughly
10k * 32 + 50k * 13 + 16 * 64 ~= 682k metrics when aggregating. This estimate might be off, perhaps by a factor of 2,
due to omitting some label sets.
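
Restated as a rough formula, using the per-label-set counts above and ignoring the smaller label sets:

    aggregated series ≈ 32 * (topics) + 13 * (partitions) + 63 * (shards)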

@BenPope (Member) commented Jun 24, 2022

  • namespace, topic: 32: (i.e. 32 unique metrics have only the namespace and topic labels)
  • namespace, topic, partition: 13 (i.e. 13 unique metrics have only the namespace, topic and partition labels)

This consists primarily of my not particularly exhaustive first pass; there may be more we can do (maybe in another PR).

Suggestions welcome.

@jcsp (Contributor) commented Jun 24, 2022

namespace, topic, partition: 13 (i.e. 13 unique metrics have only the namespace, topic and partition labels)

Weren't there only ~2 metrics in the plan that were per partition? It's clearly still a huge improvement, but I want to check we're being super conservative about adding anything per-partition in the primary endpoint.

@BenPope (Member) commented Jun 24, 2022

namespace, topic, partition: 13 (i.e. 13 unique metrics have only the namespace, topic and partition labels)

Weren't there only ~2 metrics in the plan that were per partition? It's clearly still a huge improvement, but I want to check we're being super conservative about adding anything per-partition in the primary endpoint.

This is aggregation for the old endpoint. See #5165 for the new endpoint, which currently has 2 partition metrics and 2 topic metrics, but there is a group metric that is per partition, too, which isn't in that PR yet.

@VladLazar force-pushed the metrics-aggregation branch 3 times, most recently from 8f89f05 to 3a7d59e on June 28, 2022 18:34
@VladLazar requested a review from BenPope on June 29, 2022 13:46
, aggregate_metrics(
*this,
"aggregate_metrics",
"Enable default aggregations for metrics",
A reviewer (Contributor) commented:

It would be useful to specify that this is specifically for Prometheus metrics (as opposed to the properties relating to the "metrics reporter") and to say whether it's for the internal endpoint, the public endpoint, or both.
Could also maybe add something in parentheses to indicate what we mean by "aggregations" here (like examples of labels we sum across).

@VladLazar (Author) replied:

I've expanded on the description. Let me know if you think other tweaks are required.

"aggregate_metrics",
"Enable default aggregations for metrics",
{.needs_restart = needs_restart::yes},
true)
A reviewer (Contributor) commented:

Usually we default behavioral changes in external interfaces to off -- in this instance I guess it depends on whether it breaks any existing dashboards in practice or not. If it does, then this should probably be false by default.

@VladLazar (Author) replied:

I don't expect it to break stuff, but I'm not certain yet. We could have it set to false and enable it on the cloud side once we're more confident.

@VladLazar (Author) replied:

I defaulted to false for now.

@jcsp (Contributor) commented Jun 29, 2022

Please populate release notes with a description of the new configuration property added in this PR -- even for tunables, a configuration property is worth calling out in the notes.

Resolved review threads (outdated):
  • src/v/raft/consensus.cc
  • src/v/ssx/metrics.h
@VladLazar force-pushed the metrics-aggregation branch 2 times, most recently from 8f89f05 to 58e0af6 on June 30, 2022 17:27
BenPope and others added 8 commits on June 30, 2022 18:35 (each signed off by Ben Pope <ben@redpanda.com>):

  • Aggregate over shard by default
  • Aggregate over shard by default
  • Aggregate over shard by default
  • Aggregate over shard by default
  • Aggregate latency over shard and method by default
  • Aggregate over shard by default
  • Aggregate over shard, partition by default
Vlad Lazar added 2 commits on July 1, 2022 10:52 (each signed off by Ben Pope <ben@redpanda.com>):

  • Aggregate over shard, partition by default. Also update a compaction test that was relying on the partition label.
  • Aggregate over shard, method by default. This accounts for pandaproxy and schema registry, so an additional service label will be added (later).
@BenPope (Member) left a comment

LGTM

@VladLazar merged commit c5c9050 into redpanda-data:dev on Jul 4, 2022
@dotnwat (Member) left a comment

@VladLazar excellent PR, really easy to read through.

Scenario 2: 50 topics with 3 partitions each on a single broker
Number of metrics published without aggregation: 30725
Number of metrics published with aggregation: 7626

In both cases the number of published metrics is reduced by several orders of magnitude.

In the cover letter you mention multiple orders of magnitude, but I only really see one here in the example. Am I missing something?

@@ -86,13 +87,18 @@ class failure_probes;
std::vector<ss::metrics::label_instance> labels{
service_label("{{service_name}}"),
method_label("{{method.name}}")};
auto aggregate_labels
= config::shard_local_cfg().aggregate_metrics()
? std::vector<sm::label>{sm::shard_label, method_label}
A reviewer (Member) commented:

that's neat

= config::shard_local_cfg().aggregate_metrics()
? std::vector<sm::label>{sm::shard_label, sm::label("partition")}
: std::vector<sm::label>{};
;
A reviewer (Member) commented:

Is this an extra semi-colon here on the line by itself?

Comment on lines +80 to +81
{sm::make_total_bytes(
"written_bytes",
A reviewer (Member) commented:

Note for next time: in cases like this, where one line of change causes clang-format to change a lot, it makes reviewing difficult. To deal with this, you can split the commit which makes the change into two commits, where the second one only applies the formatting update. That way the diff is much easier to review.
