Aggregate metrics to reduce cardinality #5166
Conversation
Force-pushed a423c92 to 83ab25c
Overall LGTM but let's wait for redpanda-data/seastar#26 to be merged
Force-pushed 83ab25c to 8f49e23
Archival part LGTM
Force-pushed 8f49e23 to 2ea343d
LGTM
I'd like more reviews from the owners of the subsystems. And we need to wait for the merge of the seastar fix.
It's not ready to merge, and needs more reviews.
This is great, thank you! One question that popped up in my mind is: what's the pattern of growth? Is it 2542 time series per partition? So for a 50k-partition cluster, we should expect 127M metrics per scrape interval? If so, that is still super high IMO.
I think the number would be a lot better, actually. I counted the types of labels for each metric we publish.
So, based on these numbers, a cluster with 10k topics and 50k partitions running on 16 shards should publish roughly:
This consists primarily of my not-particularly-exhaustive first pass; there may be more we can do (maybe in another PR). Suggestions welcome.
Weren't there only ~2 metrics in the plan that were per-partition? It's clearly still a huge improvement, but I want to check that we're being super conservative about adding anything per-partition to the primary endpoint.
This is aggregation for the old endpoint. See #5165 for the new endpoint, which currently has 2 partition metrics and 2 topic metrics; there is also a group metric that is per-partition, which isn't in that PR yet.
Force-pushed 8f89f05 to 3a7d59e
src/v/config/configuration.cc (Outdated)
, aggregate_metrics(
  *this,
  "aggregate_metrics",
  "Enable default aggregations for metrics",
It would be useful to specify that this is specifically for prometheus metrics (as opposed to the properties relating to "metrics reporter") and say whether it's for the internal endpoint, public endpoint, or both.
Could also maybe add something in parentheses to indicate what we mean by "aggregations" here (like examples of labels we sum across).
I've expanded on the description. Let me know if you think other tweaks are required.
src/v/config/configuration.cc (Outdated)
"aggregate_metrics",
"Enable default aggregations for metrics",
{.needs_restart = needs_restart::yes},
true)
Usually we default behavioral changes in external interfaces to off -- in this instance I guess it depends on whether it breaks any existing dashboards in practice. If it does, then this should probably be `false` by default.
I don't expect it to break stuff, but I'm not certain yet. We could have it set to `false` and enable it on the cloud side once we're more confident.
I defaulted to `false` for now.
Please populate the release notes with a description of the new configuration property added in this PR -- even for tunables, a configuration property is worth calling out in the notes.
Force-pushed 8f89f05 to 58e0af6
Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate over shard by default Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate over shard by default Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate over shard by default Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate over shard by default Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate latency over shard and method by default Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate over shard by default Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate over shard, partition by default Signed-off-by: Ben Pope <ben@redpanda.com>
Force-pushed 58e0af6 to a53e634
Aggregate over shard, partition by default. Also update a compaction test that was relying on the partition label. Signed-off-by: Ben Pope <ben@redpanda.com>
Aggregate over shard, method by default. This accounts for pandaproxy and schema registry, so an additional service label will be added (later). Signed-off-by: Ben Pope <ben@redpanda.com>
Force-pushed a53e634 to 1a27b58
LGTM
@VladLazar excellent PR, really easy to read through.
Scenario 2: 50 topics with 3 partitions each on a single broker
Number of metrics published without aggregation: 30725
Number of metrics published with aggregation: 7626
In both cases the number of published metrics is reduced by several orders of magnitude.

In the cover letter you mention multiple orders of magnitude, but I only really see one here in the example. Am I missing something?
@@ -86,13 +87,18 @@ class failure_probes;
std::vector<ss::metrics::label_instance> labels{
  service_label("{{service_name}}"),
  method_label("{{method.name}}")};
auto aggregate_labels
  = config::shard_local_cfg().aggregate_metrics()
      ? std::vector<sm::label>{sm::shard_label, method_label}
that's neat
= config::shard_local_cfg().aggregate_metrics()
  ? std::vector<sm::label>{sm::shard_label, sm::label("partition")}
  : std::vector<sm::label>{};
;
Is this an extra semicolon here on the line by itself?
{sm::make_total_bytes(
  "written_bytes",
Note for next time: in cases like this, where one line of change causes clang-format to reformat a lot of code, reviewing becomes difficult. To deal with this, you can split the commit which makes the change into two commits, where the second one only applies the formatting update. That way the diff is much easier to review.
Note that this PR depends on the following seastar PR for aggregation capabilities:
redpanda-data/seastar#26
Cover letter
Previously, the `/metrics` Prometheus endpoint was very high in cardinality due to the high number of metrics published and the fact that many of them use multiple labels (each label is essentially one dimension). This causes problems because metrics scrapers need to ingest large amounts of data, and it leads to high costs if any upstream service bills per metric.
This set of patches introduces a new configuration option, `aggregate_metrics`, that specifies whether metrics should be aggregated by default. This configuration option is then used in various probes.
Cardinality Reduction
This section compares the cardinality of the aggregated and non-aggregated metrics in a couple of scenarios.
Scenario 1: No user-added topics or partitions
Number of metrics published without aggregation: 23525
Number of metrics published with aggregation: 3386
Scenario 2: 50 topics with 3 partitions each on a single broker
Number of metrics published without aggregation: 30725
Number of metrics published with aggregation: 7626
In both cases the number of published metrics is reduced substantially (by roughly 7x and 4x, respectively).
Release notes
Prometheus metrics on the `/metrics` endpoint are aggregated based on their labels by summation. This reduces the cardinality of the metrics published when scraping the `/metrics` endpoint.
Force push log
- force push: use `sm::label` for specifying the aggregation labels (required due to change in seastar PR)
- force push: `latency_probe.h`
- force push: `aggregate` member function to match the changes in seastar PR
- force push:
- force push:
- force push:
- force push:
- force push: add `aggregate_metrics` config option; `aggregate_metrics` false by default as we're unsure if it breaks any of the existing dashboards yet
- force push:
- force push: don't aggregate `compaction_ratio` by partition as it's an average and adding it up doesn't make sense. Aggregation by shard is fine as it just drops the label here.