active series: abort streaming on context cancelled #7387

flxbk · 2024-02-14T18:02:03Z

What this PR does

This makes sharded active series queries abort response streaming once the request context is cancelled. Without this, processing would continue in the background even after requests are cancelled by the client.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

ortuman · 2024-02-14T18:13:33Z

pkg/frontend/querymiddleware/shard_active_series.go

-					return true
-				})
-				items <- item
+				select {


I think you could drop the select statement and simply check ctx.Err() at the beginning of each iteration. According to the context package documentation, Err always returns a non-nil error once the Done channel is closed.

flxbk · 2024-02-15

I'm a bit worried about the performance implications of calling ctx.Err() on every iteration. What do you think about the periodic check I added in 2040eb8?

narqo · 2024-02-15

Sorry for the post-merge comment, but I wanted to note something on this one:

I'm a bit worried about the performance implications of calling ctx.Err() on every iteration

We actually have benchmarks for this code path, which we can execute with

% go test ./pkg/frontend/querymiddleware/ -run XX -bench 'BenchmarkActiveSeriesMiddlewareMergeResponses/encoding=none'

Comparing an if inside the loop with allocating a ticker per response, the former adds much less overhead (+1% vs. +7% in sec/op) ✌️

% benchstat ~/tmp/{1,2,3}.txt
⋯
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/frontend/querymiddleware
                                                                     │ /Users/v/tmp/1.txt │         /Users/v/tmp/2.txt          │         /Users/v/tmp/3.txt          │
                                                                     │       sec/op       │   sec/op     vs base                │   sec/op     vs base                │
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-2-4         7.400µ ± 0%   7.911µ ± 1%  +6.91% (p=0.000 n=10)    7.482µ ± 0%  +1.11% (p=0.000 n=10)

                                                                     │ /Users/v/tmp/1.txt │         /Users/v/tmp/2.txt          │         /Users/v/tmp/3.txt          │
                                                                     │        B/op        │     B/op      vs base               │     B/op      vs base               │
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-2-4        1.539Ki ± 0%   1.977Ki ± 0%  +28.46% (p=0.000 n=10)  1.586Ki ± 0%  +3.05% (p=0.000 n=10)

                                                                     │ /Users/v/tmp/1.txt │         /Users/v/tmp/2.txt          │         /Users/v/tmp/3.txt          │
                                                                     │     allocs/op      │  allocs/op   vs base                │  allocs/op   vs base                │
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-2-4          34.00 ± 0%     40.00 ± 0%  +17.65% (p=0.000 n=10)    34.00 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

Note: 1.txt holds the results for the original code, 2.txt for the merged version with the ticker, and 3.txt for a version with if ctx.Err() != nil inside the loop.

flxbk · 2024-02-16

Thanks, see the follow-up in #7396.

replay · 2024-02-15T10:31:49Z

pkg/frontend/querymiddleware/shard_active_series.go

+				default:
+					item := labelBuilderPool.Get().(*labels.Builder)
+					it.ReadMapCB(func(iterator *jsoniter.Iterator, s string) bool {
+						item.Set(s, iterator.ReadString())


How many iterations does one call to it.ReadMapCB typically have to perform?
If it's a small number (hundreds), then I think the current solution is fine. If it can sometimes be a large number (millions), then I'd consider putting something like this into the function that you pass to it.ReadMapCB(), instead of only doing the check outside of it:

select {
case <-ticker.C:
	// check if ctx canceled; if so, return false to stop iterating
default:
}
item.Set(s, iterator.ReadString())
return true

flxbk · 2024-02-15

ReadMapCB iterates over the label set of one series, so it should typically be 40 or fewer iterations.

active series: abort streaming on context cancelled

31c632b

flxbk marked this pull request as ready for review February 14, 2024 18:02

flxbk requested a review from a team as a code owner February 14, 2024 18:02

ortuman reviewed Feb 14, 2024


Periodically check for canceled context

2040eb8

replay reviewed Feb 15, 2024


replay approved these changes Feb 15, 2024


changelog

a75ad77

flxbk merged commit da0cc34 into main Feb 15, 2024
28 checks passed

flxbk deleted the felix/active-series-context-cancellation branch February 15, 2024 11:11
