[feature] Introduce metrics aggregation by labelNames #825

Open
1 task done
tjiuming opened this issue Dec 8, 2022 · 1 comment

@tjiuming

tjiuming commented Dec 8, 2022

Currently, the Prometheus client has no metrics aggregation: the exposed metrics are the raw, per-label-set data.
For example:

Counter c = Counter.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").create();
c.labels("a1", "b1", "c1").inc();
c.labels("a1", "b1", "c2").inc();
c.labels("a1", "b2", "c3").inc();
c.labels("a1", "b2", "c4").inc();

The exposed metrics are as follows:

metrics_name_total{cluster="a1", namespace="b1", topic="c1"} 1
metrics_name_total{cluster="a1", namespace="b1", topic="c2"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c3"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c4"} 1

In some cases, however, we want to expose the metrics at a custom level. For example, at the cluster level:

metrics_name_total{cluster="a1"} 4

or at the [cluster, namespace] level:

metrics_name_total{cluster="a1", namespace="b1"} 2
metrics_name_total{cluster="a1", namespace="b2"} 2

If this request is feasible, it would greatly reduce the load on the Prometheus server. It would also benefit the client, because the response body becomes smaller.
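The grouping described above can be sketched in plain Java, independent of the client library. This is only an illustration of the idea: `LabelAggregation` and `sumBy` are hypothetical names, not part of the Prometheus Java client, and the samples are modelled as a simple map from label values to a number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Prometheus Java client. */
class LabelAggregation {

  /**
   * Sums sample values grouped by a subset of the metric's label names.
   *
   * @param labelNames all label names of the metric, in declaration order
   * @param samples    sample values keyed by their label values (same order as labelNames)
   * @param groupBy    the label names to keep; all other labels are dropped
   * @return totals keyed by the retained label values, in groupBy order
   */
  static Map<List<String>, Double> sumBy(
      List<String> labelNames, Map<List<String>, Double> samples, List<String> groupBy) {
    // Resolve each retained label name to its position in the full label list.
    int[] positions = new int[groupBy.size()];
    for (int i = 0; i < positions.length; i++) {
      positions[i] = labelNames.indexOf(groupBy.get(i));
      if (positions[i] < 0) {
        throw new IllegalArgumentException("Unknown label name: " + groupBy.get(i));
      }
    }

    Map<List<String>, Double> totals = new LinkedHashMap<>();
    for (Map.Entry<List<String>, Double> sample : samples.entrySet()) {
      List<String> key = new ArrayList<>(positions.length);
      for (int position : positions) {
        key.add(sample.getKey().get(position));
      }
      // Samples that share the retained label values collapse into one series.
      totals.merge(key, sample.getValue(), Double::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    List<String> labelNames = Arrays.asList("cluster", "namespace", "topic");
    Map<List<String>, Double> samples = new LinkedHashMap<>();
    samples.put(Arrays.asList("a1", "b1", "c1"), 1.0);
    samples.put(Arrays.asList("a1", "b1", "c2"), 1.0);
    samples.put(Arrays.asList("a1", "b2", "c3"), 1.0);
    samples.put(Arrays.asList("a1", "b2", "c4"), 1.0);

    System.out.println(sumBy(labelNames, samples, Arrays.asList("cluster")));
    System.out.println(sumBy(labelNames, samples, Arrays.asList("cluster", "namespace")));
  }
}
```

With the four samples from the example, `main` prints `{[a1]=4.0}` for the cluster level and `{[a1, b1]=2.0, [a1, b2]=2.0}` for the [cluster, namespace] level, matching the outputs shown above.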

Implementation

Aggregator

We can introduce two aggregators, SUM and AVG.
For COUNTER, GAUGE, and HISTOGRAM we can apply the SUM aggregator; for SUMMARY we can apply both the AVG and SUM aggregators.
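One possible shape for the two aggregators is a small enum. The sketch below is an assumption made for illustration: the `Aggregator` type and its `apply` method are hypothetical and do not exist in the Prometheus Java client.

```java
import java.util.List;

/** Hypothetical aggregator type; the name and method are assumptions, not client API. */
enum Aggregator {
  SUM {
    @Override
    double apply(List<Double> values) {
      double total = 0.0;
      for (double value : values) {
        total += value;
      }
      return total;
    }
  },
  AVG {
    @Override
    double apply(List<Double> values) {
      if (values.isEmpty()) {
        throw new IllegalArgumentException("Cannot average an empty group");
      }
      return SUM.apply(values) / values.size();
    }
  };

  /** Collapses the values of all series that fall into one group into a single value. */
  abstract double apply(List<Double> values);
}
```

For instance, `Aggregator.SUM.apply(Arrays.asList(1.0, 1.0, 1.0, 1.0))` yields the cluster-level value 4.0 used in the examples of this issue.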

Gauge:

Gauge g = Gauge.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").create();
g.labels("a1", "b1", "c1").inc();
g.labels("a1", "b1", "c2").inc();
g.labels("a1", "b2", "c3").inc();
g.labels("a1", "b2", "c4").inc();

The original data:

metrics_name{cluster="a1", namespace="b1", topic="c1"} 1
metrics_name{cluster="a1", namespace="b1", topic="c2"} 1
metrics_name{cluster="a1", namespace="b2", topic="c3"} 1
metrics_name{cluster="a1", namespace="b2", topic="c4"} 1

Aggregated at the cluster level:

metrics_name{cluster="a1"} 4

Aggregated at the [cluster, namespace] level:

metrics_name{cluster="a1", namespace="b1"} 2
metrics_name{cluster="a1", namespace="b2"} 2

Counter:

Counter c = Counter.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").create();
c.labels("a1", "b1", "c1").inc();
c.labels("a1", "b1", "c2").inc();
c.labels("a1", "b2", "c3").inc();
c.labels("a1", "b2", "c4").inc();

The original data:

metrics_name_total{cluster="a1", namespace="b1", topic="c1"} 1
metrics_name_total{cluster="a1", namespace="b1", topic="c2"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c3"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c4"} 1

Aggregated at the cluster level:

metrics_name_total{cluster="a1"} 4

Aggregated at the [cluster, namespace] level:

metrics_name_total{cluster="a1", namespace="b1"} 2
metrics_name_total{cluster="a1", namespace="b2"} 2

Histogram:

Histogram h = Histogram.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").buckets(100, 200, 500).create();
h.labels("a1", "b1", "c1").observe(50);
h.labels("a1", "b1", "c1").observe(150);
h.labels("a1", "b1", "c1").observe(400);
h.labels("a1", "b1", "c2").observe(50);
h.labels("a1", "b1", "c2").observe(150);
h.labels("a1", "b1", "c2").observe(400);
h.labels("a1", "b2", "c3").observe(50);
h.labels("a1", "b2", "c3").observe(150);
h.labels("a1", "b2", "c3").observe(400);
h.labels("a1", "b2", "c4").observe(50);
h.labels("a1", "b2", "c4").observe(150);
h.labels("a1", "b2", "c4").observe(400);

The original data:

metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b1",topic="c1",} 3.0
metrics_name_sum{cluster="a1",namespace="b1",topic="c1",} 600.0

metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b1",topic="c2",} 3.0
metrics_name_sum{cluster="a1",namespace="b1",topic="c2",} 600.0

metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b2",topic="c3",} 3.0
metrics_name_sum{cluster="a1",namespace="b2",topic="c3",} 600.0

metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b2",topic="c4",} 3.0
metrics_name_sum{cluster="a1",namespace="b2",topic="c4",} 600.0

Aggregated at the cluster level:

metrics_name_bucket{cluster="a1",le="100.0",} 4.0
metrics_name_bucket{cluster="a1",le="200.0",} 8.0
metrics_name_bucket{cluster="a1",le="500.0",} 12.0
metrics_name_bucket{cluster="a1",le="+Inf",} 12.0
metrics_name_count{cluster="a1",} 12.0
metrics_name_sum{cluster="a1",} 2400.0

Aggregated at the [cluster, namespace] level:

metrics_name_bucket{cluster="a1",namespace="b2",le="100.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b2",le="200.0",} 4.0
metrics_name_bucket{cluster="a1",namespace="b2",le="500.0",} 6.0
metrics_name_bucket{cluster="a1",namespace="b2",le="+Inf",} 6.0
metrics_name_count{cluster="a1",namespace="b2",} 6.0
metrics_name_sum{cluster="a1",namespace="b2",} 1200.0

metrics_name_bucket{cluster="a1",namespace="b1",le="100.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b1",le="200.0",} 4.0
metrics_name_bucket{cluster="a1",namespace="b1",le="500.0",} 6.0
metrics_name_bucket{cluster="a1",namespace="b1",le="+Inf",} 6.0
metrics_name_count{cluster="a1",namespace="b1",} 6.0
metrics_name_sum{cluster="a1",namespace="b1",} 1200.0
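The histogram numbers above follow from adding the cumulative bucket counts and the sums element by element, which only works when all series share the same bucket boundaries. A minimal sketch of that step in plain Java is shown below; `HistogramAggregation`, `Snapshot`, and `sumByPrefix` are hypothetical names rather than client API, and grouping is done on a leading prefix of the label values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Prometheus Java client. */
class HistogramAggregation {

  /** Cumulative bucket counts (the last entry is the +Inf bucket) plus the sum of observations. */
  static final class Snapshot {
    final double[] cumulativeBuckets;
    final double sum;

    Snapshot(double[] cumulativeBuckets, double sum) {
      this.cumulativeBuckets = cumulativeBuckets.clone();
      this.sum = sum;
    }

    /** The +Inf bucket equals the total number of observations. */
    double count() {
      return cumulativeBuckets[cumulativeBuckets.length - 1];
    }
  }

  /**
   * Groups series by the first {@code keep} label values and adds up their
   * buckets and sums. Assumes every series uses the same bucket boundaries.
   */
  static Map<List<String>, Snapshot> sumByPrefix(Map<List<String>, Snapshot> series, int keep) {
    Map<List<String>, Snapshot> result = new LinkedHashMap<>();
    for (Map.Entry<List<String>, Snapshot> entry : series.entrySet()) {
      List<String> key = new ArrayList<>(entry.getKey().subList(0, keep));
      Snapshot current = entry.getValue();
      Snapshot previous = result.get(key);
      if (previous == null) {
        result.put(key, current);
        continue;
      }
      if (previous.cumulativeBuckets.length != current.cumulativeBuckets.length) {
        throw new IllegalArgumentException("Series must share the same bucket layout");
      }
      // Cumulative counts stay cumulative when added position by position.
      double[] buckets = new double[current.cumulativeBuckets.length];
      for (int i = 0; i < buckets.length; i++) {
        buckets[i] = previous.cumulativeBuckets[i] + current.cumulativeBuckets[i];
      }
      result.put(key, new Snapshot(buckets, previous.sum + current.sum));
    }
    return result;
  }
}
```

Feeding in the four series from the example, each with buckets [1, 2, 3, 3] and sum 600, gives [4, 8, 12, 12] with sum 2400 for `keep = 1`, and two groups of [2, 4, 6, 6] with sum 1200 for `keep = 2`.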

Summary

Unlike the metric types above, SUMMARY is a special case.
For metrics_name_count and metrics_name_sum, we have to use the SUM aggregator.
For the time series that carry a quantile label, however, I think the AVG aggregator is the best choice.
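A rough sketch of this mixed strategy, in plain Java, is given below. The names `SummaryAggregation`, `Snapshot`, and `merge` are hypothetical and not part of the client. The sketch assumes every series in a group reports the same set of quantiles, and it uses an unweighted mean, so the result is only an approximation of the true quantile of the combined observations.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of the Prometheus Java client. */
class SummaryAggregation {

  /** One summary series: quantile -> value, plus its _count and _sum. */
  static final class Snapshot {
    final Map<Double, Double> quantiles;
    final double count;
    final double sum;

    Snapshot(Map<Double, Double> quantiles, double count, double sum) {
      this.quantiles = new TreeMap<>(quantiles);
      this.count = count;
      this.sum = sum;
    }
  }

  /** Merges one group of series: SUM for count and sum, AVG for each quantile. */
  static Snapshot merge(List<Snapshot> group) {
    if (group.isEmpty()) {
      throw new IllegalArgumentException("Cannot merge an empty group");
    }
    Map<Double, Double> quantileTotals = new TreeMap<>();
    double count = 0.0;
    double sum = 0.0;
    for (Snapshot snapshot : group) {
      count += snapshot.count;
      sum += snapshot.sum;
      for (Map.Entry<Double, Double> quantile : snapshot.quantiles.entrySet()) {
        quantileTotals.merge(quantile.getKey(), quantile.getValue(), Double::sum);
      }
    }
    // Unweighted mean per quantile; assumes each series reports every quantile.
    Map<Double, Double> averaged = new TreeMap<>();
    for (Map.Entry<Double, Double> total : quantileTotals.entrySet()) {
      averaged.put(total.getKey(), total.getValue() / group.size());
    }
    return new Snapshot(averaged, count, sum);
  }
}
```

For example, merging two series whose 0.5 quantiles are 10 and 20, with counts 3 and 5 and sums 30 and 100, produces a 0.5 quantile of 15, a count of 8, and a sum of 130.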

  • I'm willing to submit the PR
@dhoard
Collaborator

dhoard commented Feb 12, 2023

@tjiuming For the group by label name / Counter scenario, you can write your own Collector and register it. This should also work for other types of aggregation/summation.

Example JUnit test / CounterGroupByCollector

package io.prometheus.client;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.util.ArrayList;
import java.util.List;

public class CounterGroupByCollectorTest {

  CollectorRegistry registry;
  Counter counter;
  CounterGroupByCollector counterGroupByCollector;

  @Before
  public void setUp() {
    registry = new CollectorRegistry();
    counter = Counter.build("metrics_name", "metrics_name help").labelNames("cluster", "namespace", "topic").create();
    counterGroupByCollector = new CounterGroupByCollector(counter);
  }

  @Test
  public void test() {
    counter.labels("a1", "b1", "c1").inc();
    counter.labels("a1", "b1", "c2").inc();
    counter.labels("a1", "b2", "c3").inc();
    counter.labels("a1", "b2", "c4").inc();

    System.out.println("Group by \"cluster\"...");
    counterGroupByCollector.groupBy("cluster");

    List<Collector.MetricFamilySamples> mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }

    System.out.println("---");
    System.out.println("No group by...");
    counterGroupByCollector.groupBy(null);

    mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }

    System.out.println("---");
    System.out.println("Group by \"cluster\", \"namespace\"...");
    counterGroupByCollector.groupBy("cluster", "namespace");

    mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }

    System.out.println("---");
    System.out.println("Group by \"cluster\", \"namespace\", \"topic\"...");
    counterGroupByCollector.groupBy("cluster", "namespace", "topic");

    mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }
  }

  public static class CounterGroupByCollector extends Collector {

    private Counter counter;
    private String[] groupByLabelNames;

    public CounterGroupByCollector(Counter counter) {
      this.counter = counter;
    }

    public void groupBy(String ... labelNames) {
      if ((labelNames == null) || (labelNames.length == 0)) {
        synchronized (this) {
          groupByLabelNames = null;
        }

        return;
      }

      if (labelNames.length > counter.labelNames.size()) {
        throw new IllegalArgumentException("Group by labels name contains more labels than Counter");
      }

      List<String> labelNameList = toList(labelNames);
      List<String> counterLabelNameList = counter.labelNames;

      for (int i = 0; i < labelNameList.size(); i++) {
        if (!labelNameList.get(i).equals(counterLabelNameList.get(i))) {
          throw new IllegalArgumentException("Group by label names must be a leading prefix of the Counter label names, in order");
        }
      }

      synchronized (this) {
        this.groupByLabelNames = labelNames;
      }
    }

    @Override
    public List<MetricFamilySamples> collect() {
      String[] localGroupByLabelNames;
      synchronized (this) {
        localGroupByLabelNames = groupByLabelNames;
      }

      if (localGroupByLabelNames == null) {
        return counter.collect();
      }

      Counter localCounter =
              Counter
                      .build("metrics_name", "metrics_name help")
                      .labelNames(localGroupByLabelNames).create();

      List<Collector.MetricFamilySamples> mfs = counter.collect();
      for (Collector.MetricFamilySamples samples : mfs) {
        for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
          if (sample.name.endsWith("_total")) {
            String[] labelValues = sample.labelValues.subList(0, localGroupByLabelNames.length).toArray(new String[localGroupByLabelNames.length]);
            localCounter.labels(labelValues).inc(sample.value);
          }
        }
      }

      return localCounter.collect();
    }
  }

  private static List<String> toList(String ... values) {
    List<String> list = new ArrayList<String>(values.length);
    for (String value : values) {
      list.add(value);
    }
    return list;
  }
}
