Add a way to control the aggregation type for the SelectSeries API #2758

aleks-p · 2023-11-24T17:58:44Z

Adds an optional "aggregation" param to the SelectSeries API, which lets callers switch between "sum" (the default), "avg", and "first" (used mostly internally). Other aggregation functions can be added in the future. Right now it only affects the time series, but I will look into how much work it would take (and how expensive it would be) to apply it to the flamegraph as well.
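To make the three modes concrete, here is a minimal Go sketch of what each one does to a set of values that land on the same point. The `aggregate` function and the string mode names are hypothetical and exist only for illustration; they are not the code added by this PR (the commit list mentions that the final API uses an enum for the aggregation type).

```go
package main

import (
	"errors"
	"fmt"
)

// aggregate reduces the values that fall onto one point of a time series.
// The mode names mirror the "sum", "avg" and "first" options described above.
// This function is a hypothetical illustration, not Pyroscope code.
func aggregate(mode string, values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errors.New("no values to aggregate")
	}
	var sum float64
	for _, v := range values {
		sum += v
	}
	switch mode {
	case "", "sum": // an empty mode falls back to the default, "sum"
		return sum, nil
	case "avg":
		return sum / float64(len(values)), nil
	case "first":
		return values[0], nil
	default:
		return 0, fmt.Errorf("unknown aggregation %q", mode)
	}
}

func main() {
	values := []float64{8, 4}
	for _, mode := range []string{"sum", "avg", "first"} {
		v, err := aggregate(mode, values)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %v\n", mode, v)
	}
}
```

With the values 8 and 4, "sum" yields 12, "avg" yields 6, and "first" yields 8.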

A PR in Grafana will follow where we expose this in the Explore view like this:

aleks-p · 2023-11-24T18:36:51Z

pkg/model/series.go

 	}
 	return j + 1
 }
+
+type TimeSeriesAggregator interface {


The name could be a bit confusing, since we also have:

pyroscope/pkg/distributor/aggregator/aggregator.go

Line 11 in 9e55dd1

type Aggregator[T any] struct {

I think their purpose is slightly different but putting that in the name is hard. Any thoughts @kolesnikovae?

kolesnikovae · 2023-11-27T04:54:22Z

I don't have a strong opinion, as these types are not supposed to be used in the same context. The key difference is that distributor/aggregator operates on real-time data streams, therefore we could call it e.g. StreamAggregator

api/ingester/v1/ingester.proto

kolesnikovae · 2023-11-27T05:53:37Z

pkg/model/series.go

+func (a *avgTimeSeriesAggregator) Add(ts int64, value float64) {
+	a.ts = ts
+	a.sum += value
+	a.count++
+}


I'd like to make sure I fully understand the aggregation and its relation to grouping. I'm sorry if I got it wrong :)

Given two series (with just one value each):

{pod=a, endpoint=x}: { ts_1, 8 }
{pod=a, endpoint=y}: { ts_1, 4 }

Given the query avg (group) by pod, what would the result be: {pod=a} 12 or {pod=a} 6?

My understanding is that the aggregator gets two values at input (from seriesBuilder):

{pod=a}: { ts_1, 4 }, { ts_1, 8 }

These are then averaged, and the result is 6. I'm not sure this is what users expect (even though it is literally the average value).

It will be {pod=a} 6, and I am not sure either, though summing will often make no sense as well, depending on the profile type. There are two aspects to this aggregation:

1. Reducing samples for the exact same timestamp (happens in series.go, previously either "first" or "sum")
2. Reducing samples within a step interval (happens in querier.go, previously always "sum")

Honestly, I am not sure we need 1. Leaving duplicate samples in and letting 2 handle the aggregation would produce more reliable results (currently, in 2 we will at times compute an avg of avg).

Totally agree, we should only do 2 to avoid avg of avg. Maybe only when it's not a sum.
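The "avg of avg" concern in this thread can be shown with a small Go sketch. It assumes three samples inside one step interval, two of which share a timestamp. The `sample` type and the `twoStageAvg` and `singleStageAvg` helpers are hypothetical names for this illustration only and do not correspond to functions in series.go or querier.go.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair used only in this sketch.
type sample struct {
	ts    int64
	value float64
}

// avg returns the arithmetic mean of values, or 0 for an empty slice.
func avg(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	var sum float64
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

// twoStageAvg first averages samples that share a timestamp (aspect 1),
// then averages the per-timestamp results within the step (aspect 2).
func twoStageAvg(samples []sample) float64 {
	byTS := map[int64][]float64{}
	var order []int64 // first-seen timestamp order, for determinism
	for _, s := range samples {
		if _, ok := byTS[s.ts]; !ok {
			order = append(order, s.ts)
		}
		byTS[s.ts] = append(byTS[s.ts], s.value)
	}
	perTS := make([]float64, 0, len(order))
	for _, ts := range order {
		perTS = append(perTS, avg(byTS[ts]))
	}
	return avg(perTS)
}

// singleStageAvg keeps duplicates and averages every sample in the step once.
func singleStageAvg(samples []sample) float64 {
	values := make([]float64, 0, len(samples))
	for _, s := range samples {
		values = append(values, s.value)
	}
	return avg(values)
}

func main() {
	samples := []sample{{ts: 1, value: 4}, {ts: 1, value: 8}, {ts: 2, value: 10}}
	fmt.Println("avg of avg:", twoStageAvg(samples))
	fmt.Println("single avg:", singleStageAvg(samples))
}
```

For these samples the two-stage result is 8 (the average of 6 and 10), while averaging all three samples once gives 22/3, roughly 7.33. The two-stage version weights each timestamp equally instead of each sample, which is why retaining duplicates and aggregating once gives the more reliable average.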

pkg/model/series.go

cyriltovena

LGTM

I don't know if the next step is really to expose this in the UI. I think I would rather default to a good value based on the profile type for now. WDYT?

aleks-p · 2023-11-30T21:45:04Z

I've simplified things a bit:

- series merger only does "sum" or "retain" (we no longer discard duplicates)
- the "first value" aggregator is gone (it was only used in the series merger, but not anymore)

This shouldn't change the behavior for sums, but will make averages a bit better (no avg of avg).
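As a rough illustration of the "sum" versus "retain" behavior described above, here is a minimal Go sketch. The `point` type and `mergePoints` function are hypothetical, and the sketch assumes its input is already sorted by timestamp; the actual series merger in this PR may be structured differently.

```go
package main

import "fmt"

// point is a hypothetical (timestamp, value) pair used only in this sketch.
type point struct {
	ts    int64
	value float64
}

// mergePoints walks points that are assumed to be sorted by timestamp.
// With sum=true, points sharing a timestamp collapse into one summed point.
// With sum=false, duplicates are retained so that a later stage (for example
// an "avg" over a step interval) sees every original sample.
func mergePoints(points []point, sum bool) []point {
	out := make([]point, 0, len(points))
	for _, p := range points {
		if sum && len(out) > 0 && out[len(out)-1].ts == p.ts {
			out[len(out)-1].value += p.value
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	in := []point{{ts: 1, value: 8}, {ts: 1, value: 4}, {ts: 2, value: 10}}
	fmt.Println("sum:   ", mergePoints(in, true))
	fmt.Println("retain:", mergePoints(in, false))
}
```

In "sum" mode the two points at timestamp 1 become a single point with value 12; in "retain" mode all three points pass through unchanged.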

For the frontend part, I will hide the dropdown for now and have it make the decision for "avg" or "sum" behind the scenes.


aleks-p requested a review from a team as a code owner November 24, 2023 17:58

aleks-p mentioned this pull request Nov 24, 2023

Pyroscope: Add query option for time series aggregation grafana/grafana#78659

Closed

3 tasks

kolesnikovae reviewed Nov 27, 2023

cyriltovena reviewed Nov 27, 2023

aleks-p requested a review from a team as a code owner November 27, 2023 12:51

aleks-p removed the request for review from a team November 27, 2023 14:21

cyriltovena approved these changes Nov 30, 2023

aleks-p force-pushed the feat/select-series-control-aggregation branch from 71f4b36 to 3d7c0de Compare November 30, 2023 21:39

aleks-p added 10 commits December 4, 2023 08:31

e6a1756 Add mergeFunction to the SelectSeries API
b05a593 Rename mergeFunction to aggregation, make it optional
b3faf3c Rename accumulator to aggregator
7fd393a Use aggregation param in series merger as well
4e867e5 Switch to an enum for the time series aggregation type
4e291a7 Restore removed doc
b5537cf Add aggregation to frontend state
62bbd4d Fix formatting
b8217c1 Simplify series aggregation:
    - perform only sum aggregation in series merger
    - retain duplicate samples in series merger (when sum=false)
    - remove "first value" aggregator
cc46b5f Fix issues after merge with main

aleks-p force-pushed the feat/select-series-control-aggregation branch from 3d7c0de to cc46b5f Compare December 4, 2023 12:57

f4d4cf9 Simplify test

aleks-p merged commit b9966bd into main Dec 4, 2023
19 checks passed

aleks-p deleted the feat/select-series-control-aggregation branch December 4, 2023 13:33

cyriltovena mentioned this pull request Dec 8, 2023

Feedbacks on Grafana UI #2804

Open
