Add metrics aggregation capabilities #26

Merged 4 commits on Jun 27, 2022

Commits on Jun 24, 2022

  1. metrics.hh: Support SUMMARY type

    This patch adds support for the summary type to the metrics layer.
    
    A summary is a different kind of histogram: its buckets are
    percentiles, so the reporting layer (for example, Prometheus) knows
    how to report it correctly.
    
    Signed-off-by: Amnon Heiman <amnon@scylladb.com>
    amnonh authored and Vlad Lazar committed Jun 24, 2022
    ccdd278
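To make the idea of percentile buckets concrete, here is a minimal sketch of how a summary could be rendered in the Prometheus text exposition format. The types `quantile_value` and `summary_sample` and the function `render_summary` are hypothetical names chosen for illustration; they are not Seastar's definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; these are not Seastar's definitions.
struct quantile_value {
    double quantile;  // for example 0.95 for p95
    double value;     // observed value at that quantile
};

struct summary_sample {
    std::vector<quantile_value> quantiles;
    double sum = 0;
    uint64_t count = 0;
};

// Render one summary in the Prometheus text exposition format:
// one line per quantile, followed by the _sum and _count series.
std::string render_summary(const std::string& name, const summary_sample& s) {
    std::ostringstream out;
    out << "# TYPE " << name << " summary\n";
    for (const auto& q : s.quantiles) {
        assert(q.quantile >= 0.0 && q.quantile <= 1.0);
        out << name << "{quantile=\"" << q.quantile << "\"} " << q.value << "\n";
    }
    out << name << "_sum " << s.sum << "\n";
    out << name << "_count " << s.count << "\n";
    return out.str();
}
```

Each quantile becomes its own labeled series, which is what distinguishes a summary from a histogram with cumulative `le` buckets.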
  2. metrics.cc: missing count and sum when aggregating histograms

    This patch adds a missing part of histogram aggregation: the sum
    and count must be aggregated along with the buckets.
    
    Signed-off-by: Amnon Heiman <amnon@scylladb.com>
    amnonh authored and Vlad Lazar committed Jun 24, 2022
    893e2de
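The fix described in this commit can be sketched as follows. The `bucket` and `histogram` types below are hypothetical stand-ins written for this example, not code copied from metrics.cc, and the sketch assumes both histograms share the same bucket layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not copied from metrics.cc.
struct bucket {
    double upper_bound;
    uint64_t count;
};

struct histogram {
    uint64_t sample_count = 0;
    double sample_sum = 0;
    std::vector<bucket> buckets;

    // Merge another histogram with the same bucket layout into this one.
    histogram& operator+=(const histogram& other) {
        if (buckets.empty()) {
            buckets = other.buckets;
        } else if (!other.buckets.empty()) {
            assert(buckets.size() == other.buckets.size());
            for (std::size_t i = 0; i < buckets.size(); ++i) {
                buckets[i].count += other.buckets[i].count;
            }
        }
        // Easy to forget: the totals must be merged as well as the buckets.
        sample_count += other.sample_count;
        sample_sum += other.sample_sum;
        return *this;
    }
};
```

Without the last two additions, the merged histogram would report correct bucket counts but a sum and count taken from only one of its inputs.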
  3. metrics: support aggregate by labels and skip_when_empty

    Aggregate labels are a mechanism for reporting aggregated results. Most
    commonly, they allow reporting one histogram per node instead of one
    per shard.
    
    This patch adds an option to mark a metric with a vector of labels.
    That vector becomes part of the metric metadata so that the reporting
    layer can aggregate over it.
    
    skip_when_empty means that metrics that are not in use will not be
    reported.
    
    A common scenario is that a user registers a metric, but that metric
    is never used. This is most common with histograms and summaries, but
    it can also happen with counters.
    
    This patch adds an option to mark a metric with skip_when_empty.
    When it is set, a metric that was never used (this applies to
    histograms, counters and summaries) will not be reported.
    
    Signed-off-by: Amnon Heiman <amnon@scylladb.com>
    amnonh authored and Vlad Lazar committed Jun 24, 2022
    9c84d52
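A minimal sketch of the two options, assuming per-shard counter samples identified by label sets. The names `sample` and `aggregate` and the map-based representation are assumptions made for this example and do not reflect how Seastar stores metrics.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using labels = std::map<std::string, std::string>;

// Hypothetical per-shard counter sample; not Seastar's representation.
struct sample {
    labels lbls;
    uint64_t value;
};

// Sum samples that become identical once the aggregate labels are dropped.
// With skip_when_empty set, series whose total is zero are omitted.
std::map<labels, uint64_t> aggregate(const std::vector<sample>& samples,
                                     const std::vector<std::string>& aggregate_labels,
                                     bool skip_when_empty) {
    std::map<labels, uint64_t> totals;
    for (const auto& s : samples) {
        labels key = s.lbls;
        for (const auto& name : aggregate_labels) {
            key.erase(name);  // for example, drop "shard"
        }
        totals[key] += s.value;
    }
    if (skip_when_empty) {
        for (auto it = totals.begin(); it != totals.end();) {
            if (it->second == 0) {
                it = totals.erase(it);
            } else {
                ++it;
            }
        }
    }
    return totals;
}
```

Aggregating over a `shard` label collapses the per-shard series into one series per remaining label set, and the skip pass then drops any series that never recorded a value.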
  4. prometheus.cc: Optimize reporting

    This patch adds several features to Prometheus reporting:
    1. Add summary reporting. Summaries are used in seastar to report
       aggregated percentile information (for example p95 and p99).
       The main use is to report a per-shard summary of a latency
       histogram.
    2. Support aggregated metrics. For an aggregated metric, the
       Prometheus layer aggregates multiple metrics based on labels and
       reports the result. Usually this is used to report a single
       latency histogram per node instead of one per shard, but it can
       be used for counters and gauges as well.
    3. Skip empty counters, histograms and summaries. It's a common
       practice to register lots of metrics even if they are not being
       used. Histograms have a large effect on performance, so not
       reporting an empty histogram is a significant performance gain
       both for the application and for the Prometheus server.
       The same is true for summaries and counters: marking a metric
       with skip_when_empty means it will not be reported while it is
       empty.
    4. As an optimization, the stringstream that is used per metric is
       reused and cleared instead of recreated.
    
    Signed-off-by: Amnon Heiman <amnon@scylladb.com>
    amnonh authored and Vlad Lazar committed Jun 24, 2022
    3f35820
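The stringstream optimization in item 4 can be illustrated with standard C++ alone. This is a sketch of the general technique, not the code in prometheus.cc; the function name `format_metrics` and the name/value pair input are assumptions for the example.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Format each metric through one reused stream instead of constructing
// a new std::stringstream per metric.
std::vector<std::string> format_metrics(
        const std::vector<std::pair<std::string, double>>& metrics) {
    std::vector<std::string> lines;
    std::stringstream ss;  // constructed once, outside the loop
    for (const auto& m : metrics) {
        ss.str(std::string());  // drop the previous contents
        ss.clear();             // reset any error or eof flags
        ss << m.first << ' ' << m.second;
        lines.push_back(ss.str());
    }
    return lines;
}
```

Both calls are needed when reusing a stream: `str()` replaces the buffer, while `clear()` only resets the state flags and leaves the contents untouched.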