
feat: get series labels from store gateway #2431

Merged: 60 commits merged into main from sg-labels on Sep 29, 2023

Conversation

Contributor

@bryanhuhta bryanhuhta commented Sep 20, 2023

Related: #2230
Related: https://github.com/grafana/pyroscope-app-plugin/issues/70

Previously series were only queried from the head block in the ingester. This adds a time window to the Series RPC and queries the store gateway if necessary.

This PR also adds a rudimentary UI update to capture the selected window start/end and pass it along with the RPC. Edit: I reverted this in c72f5c6 to reduce the complexity of this PR. It will be added in a follow-up PR where more time and energy can be spent on a "non-rudimentary" implementation.

@bryanhuhta bryanhuhta self-assigned this Sep 20, 2023
@bryanhuhta bryanhuhta marked this pull request as ready for review September 20, 2023 22:04
@bryanhuhta bryanhuhta requested a review from a team as a code owner September 20, 2023 22:04
@bryanhuhta bryanhuhta changed the title Get series labels from store gateway feat: get series labels from store gateway Sep 20, 2023
Collaborator

@kolesnikovae kolesnikovae left a comment

Looks great!

Next we need to update PhlareDB.Series to use the new implementation

Comment on lines 208 to 209
if req.Msg.Start > req.Msg.End {
return nil, connect.NewError(connect.CodeInvalidArgument, errors.New("start must be before end"))
Collaborator

We should also validate the time range in the query-frontend handler, like this: https://github.com/grafana/pyroscope/blob/main/pkg/frontend/frontend_select_series.go#L37-L43

Contributor Author

In d8df865, I moved the validation from the querier to the query-frontend.

Should we do validation in both places?

Collaborator

I think doing it in the query-frontend should be enough. We only need to make sure the req.Msg.Start > req.Msg.End condition is actually validated there.
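
For context, here is a minimal sketch of the kind of check being discussed, written as a standalone helper rather than the actual query-frontend handler code (the helper name and package are assumptions; the error construction mirrors the snippet quoted above, and the connect import path may differ depending on the connect-go version in the repo):

```go
package validation

import (
	"errors"

	"connectrpc.com/connect" // import path may differ by connect-go version
)

// validateTimeRange is a hypothetical helper showing the check discussed
// above: reject requests whose start timestamp is after the end timestamp.
func validateTimeRange(startMs, endMs int64) error {
	if startMs > endMs {
		return connect.NewError(
			connect.CodeInvalidArgument,
			errors.New("start must be before end"),
		)
	}
	return nil
}
```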

pkg/phlaredb/head_queriers.go (resolved)
pkg/phlaredb/block_querier.go (resolved)
Comment on lines 945 to 949
// TODO(bryan) do we need to close this?
err := b.Open(ctx)
if err != nil {
return nil, err
}
Collaborator

Open is a very expensive operation, therefore we do want to keep blocks open. Unfortunately, at the moment we don't have a straightforward caching / eviction policy, which may easily become a problem. We will change this in the future.

Contributor

Potentially we also want it to have a state where only the TSDB index is there and the parquet files are not even looked at.

Contributor Author

It makes sense to try to keep things open for as long as possible if opens are expensive. I like @simonswine's idea of doing a partial open vs. loading the entire block.
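
To make the partial-open idea a bit more concrete, here is a purely hypothetical sketch (none of these types exist in the codebase in this form): the TSDB index is opened eagerly since it is small and serves Series/label queries, while the parquet readers are opened lazily only when profile data is actually requested.

```go
package blocksketch

import (
	"context"
	"sync"
)

// Placeholder types for illustration only; the real block querier types differ.
type tsdbIndex struct{}    // small: answers series/label queries
type parquetFiles struct{} // large: needed only for profile data

// partialBlock opens the TSDB index up front and defers the expensive parquet
// open until profile data is requested.
type partialBlock struct {
	idx *tsdbIndex

	profOnce sync.Once
	prof     *parquetFiles
	profErr  error
}

// openIndexOnly stands in for downloading/mmapping just the index section of a block.
func openIndexOnly(_ context.Context) (*partialBlock, error) {
	return &partialBlock{idx: &tsdbIndex{}}, nil
}

// profiles lazily opens the parquet readers the first time they are needed,
// so a Series() call never pays for them.
func (b *partialBlock) profiles(_ context.Context) (*parquetFiles, error) {
	b.profOnce.Do(func() {
		b.prof, b.profErr = &parquetFiles{}, nil // the real, expensive open would happen here
	})
	return b.prof, b.profErr
}
```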

@bryanhuhta
Contributor Author

bryanhuhta commented Sep 26, 2023

I tested these changes in dev (using 02ec939) and the behavior of querying from the store gateway works as intended. We also maintain legacy query behavior if no interval is sent.

I also ran a quick performance check and the results were, well, substandard at best. I made 20 requests with a variety of intervals to the query frontend and collected the server-side request duration. I tested the existing query pattern, the new one with only synchronous requests (basically a best-case scenario), and the new query pattern with asynchronous requests (the "typical" scenario).

| Interval | Old avg. (ms) | New avg. sync (ms) | New avg. async (ms) |
| --- | --- | --- | --- |
| 1h | 215.35 | 177.85 | 390.84 |
| 6h | 218.00 | 253.25 | 427.29 |
| 24h | 200.45 | 2837.50 | 6582.29 |
| 2d | 183.35 | 5199.00 | 12021.38 |
| 7d | 195.60 | 15423.00 | 13842.05 |
| 14d | 196.15 | 27824.16 | 23781.07 |

[chart: request duration vs. interval for the old and new query patterns]

As the interval increases, the response time increases exponentially in the synchronous case (with an R² of 0.998 according to Google Sheets). The asynchronous case increases much like the synchronous one, but tapers off towards the end. This, however, is because the pods OOM and crash, cancelling requests and producing artificially low request times.

With the new pattern, any request interval above 24h is almost guaranteed to make pods crash and requests fail. When such a request does complete, the application is more or less unusable due to the high latency. Below a 24h interval, the request latency is largely unnoticeable.

I suspect the OOMs are a result of opening an unbounded number of blocks and scanning them for series labels (though that's just a hunch). We could mitigate this by reducing parallelism on a single pod, but that would in turn increase the total request duration. Another obvious solution would be to cache the series labels so we can avoid opening a block altogether, but caching is always easier to propose than it is to implement.

At this point, the question is: do we need to come up with a strategy to mitigate the cost of hitting the store gateway before or after we merge this PR?

I suspect these numbers are just too high and the experience would be too poor to accept the change in its current form.

@kolesnikovae
Collaborator

Cross-posting from Slack:

I'm surprised the performance is so low. The TSDB index is downloaded into memory entirely on the first access to the block and kept there forever (sort of an in-memory cache). I'm wondering if the performance of subsequent queries is any better. Judging by the flamegraph, it is not, since we spend most of the time decoding data (in memory):

[flamegraph screenshot: most of the time is spent decoding series data]

We should:

  1. As we have all the data in memory, the query should take no longer than tens of milliseconds. Most likely, there is a better way to iterate the series (rather than decoding series by series). If that does not help, I think we may want to keep the decoded data in memory. Our indices are tiny, a few thousand series each, so this should not be much worse than what we have now.
  2. Implement block compaction and block deduplication. We process lots of duplicate data (3-6x). @cyriltovena and @simonswine are working on it.
  3. Make the cache more robust. Maybe pre-load some blocks into memory on start (so it won't happen at the first query) and evict old TSDB indices.

I guess we could try to fix this before merging.

  1. The very first thing I'd do is late materialisation: when we iterate series via index.(*Decoder).Series, all label keys and values are resolved with reader.lookupSymbol, which contributes most of the overall CPU consumption. I believe we should only resolve symbol references after we have collected and deduplicated them, so that the lookup is performed just once for each unique symbol (a rough sketch of the idea follows this list).
  2. Only fetch the labels listed in the request, rather than doing it afterwards with model.Labels.WithLabels (which is responsible for 5%).
  3. Optimize the allocations. index.(*Decoder).Series makes lots of allocations, but I fail to explain them (line numbers can be found in the pprof profile you can get with profilecli). I'm 99% sure we can eliminate 99.99% of them.
  4. Skip the chunk-related part. We don't use it.
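
To illustrate suggestion 1 (late materialisation), here is a minimal standalone sketch, not the actual index reader code: collect raw symbol references first, then resolve each unique reference exactly once. The symbolRef type and lookupSymbol function are stand-ins for the real reader internals.

```go
package main

import "fmt"

type symbolRef uint32

// lookupSymbol stands in for reader.lookupSymbol: in the real code this
// decodes a symbol table entry, which is the expensive part being avoided.
func lookupSymbol(ref symbolRef) string {
	return fmt.Sprintf("symbol-%d", ref)
}

// resolveUnique resolves each distinct reference once, instead of once per
// series/label occurrence.
func resolveUnique(refs []symbolRef) map[symbolRef]string {
	resolved := make(map[symbolRef]string, len(refs))
	for _, r := range refs {
		if _, ok := resolved[r]; ok {
			continue // duplicate reference: no extra lookup
		}
		resolved[r] = lookupSymbol(r)
	}
	return resolved
}

func main() {
	// Many series share the same label name/value references, so lookups drop
	// from O(series x labels) to O(unique symbols).
	refs := []symbolRef{1, 2, 1, 3, 2, 1}
	fmt.Println(resolveUnique(refs))
}
```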

pkg/phlaredb/block_querier.go (outdated, resolved)
pkg/phlaredb/block_querier.go (outdated, resolved)
pkg/phlaredb/block_querier_test.go (outdated, resolved)
pkg/phlaredb/block_querier_test.go (resolved)
pkg/querier/querier.go (resolved)
pkg/phlaredb/block_querier.go (outdated, resolved)
pkg/phlaredb/block_querier.go (outdated, resolved)
pkg/phlaredb/block_querier.go (outdated, resolved)

var labelsSet []*typesv1.Labels
var lock sync.Mutex
group, ctx := errgroup.WithContext(ctx)
Contributor

We probably should bound the amount of parallelism with some constant.

Contributor Author

I set the parallelism constant to 50 in 28d98bb; 50 is only a SWAG (a rough first guess).
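
For context, here is a minimal sketch of bounding the fan-out with errgroup's SetLimit. Whether the PR uses SetLimit or another mechanism isn't shown here, and the querier signature below is simplified for illustration.

```go
package querysketch

import (
	"context"
	"sync"

	"golang.org/x/sync/errgroup"
)

const maxSeriesQueryParallelism = 50 // mirrors the constant discussed above

// collectSeries fans out to the queriers with at most 50 concurrent calls per pod.
func collectSeries(ctx context.Context, queriers []func(context.Context) ([]string, error)) ([]string, error) {
	var (
		out  []string
		lock sync.Mutex
	)
	group, ctx := errgroup.WithContext(ctx)
	group.SetLimit(maxSeriesQueryParallelism) // cap in-flight block queries

	for _, q := range queriers {
		q := q // capture the loop variable (needed before Go 1.22)
		group.Go(func() error {
			labels, err := q(ctx)
			if err != nil {
				return err
			}
			lock.Lock()
			out = append(out, labels...)
			lock.Unlock()
			return nil
		})
	}
	if err := group.Wait(); err != nil {
		return nil, err
	}
	return out, nil
}
```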

pkg/phlaredb/block_querier.go (outdated, resolved)
@bryanhuhta
Contributor Author

After implementing some of the suggestions raised by @kolesnikovae and @simonswine, here are the new performance metrics:

| Interval | Old avg. (ms) | New (v1) avg. (ms) | New (current) avg. (ms) |
| --- | --- | --- | --- |
| 1h | 215.35 | 177.85 | 118.00 |
| 6h | 218.00 | 253.25 | 141.97 |
| 24h | 200.45 | 2837.50 | 506.45 |
| 2d | 183.35 | 5199.00 | 938.38 |
| 7d | 195.60 | 15423.00 | 2400.54 |
| 14d | 196.15 | 27824.16 | 4908.47 |

[chart: request duration vs. interval for the old, v1, and current implementations]

As you can see, there are significant decreases in response time compared to the first iteration. Even at quite long intervals, the response time hovers around 5s (while not optimal, still manageable). More typical intervals have an adequate request duration, broadly staying under 1s up to 2d.

@kolesnikovae had a number of other suggestions we could pursue to improve performance, but at this point, I think these numbers are healthy enough to ship and iterate on.

@kolesnikovae
Collaborator

Great work @bryanhuhta! I think you're right, we should merge it now and only optimise it if we see an actual need.


for _, q := range queriers {
group.Go(util.RecoverPanic(func() error {
labels, err := q.Series(ctx, req)
Contributor

For later: if we change that API to send back the fingerprint ID, we should gain an even bigger perf boost.

Contributor

@cyriltovena cyriltovena left a comment

LGTM

Something to investigate that you could use later:

func (r *Reader) Series(id storage.SeriesRef, lbls *phlaremodel.Labels, chks *[]ChunkMeta) (uint64, error) {

Contributor

@simonswine simonswine left a comment

LGTM. Thanks @bryanhuhta for your patience going through all of our feedback.

Before you merge (sorry), I would like you to change the start/stop time unit from seconds to milliseconds, which is what the other query endpoints use.
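
For illustration, a tiny sketch of the unit convention being asked for, with hypothetical variable names: start and end travel as Unix timestamps in milliseconds rather than seconds.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	end := time.Now()
	start := end.Add(-6 * time.Hour)

	// Milliseconds, matching the other query endpoints, rather than start.Unix() seconds.
	startMs := start.UnixMilli()
	endMs := end.UnixMilli()
	fmt.Println(startMs, endMs)
}
```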

pkg/frontend/frontend_series.go (outdated, resolved)
pkg/frontend/frontend_series.go (outdated, resolved)
@bryanhuhta bryanhuhta merged commit 945eb83 into main Sep 29, 2023
16 checks passed
@bryanhuhta bryanhuhta deleted the sg-labels branch September 29, 2023 15:59