tailsamplingprocessor: Optimize tag mutator memory allocations #27889

brancz · 2023-10-20T15:50:34Z

Description:

Because each tailSamplingSpanProcessor instance is never called
concurrently by the ticker worker (it's a 1-to-1 relationship), we can
safely reuse a slice for the tag mutators used in makeDecision.
Additionally, the tag mutators themselves were causing a lot of
allocations, and since they are static, we created constants for them,
preventing allocations on each execution of makeDecision.
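
The two techniques described above can be illustrated with a minimal sketch. It is not the code from this PR: the `mutator` type, the `record` helper, `newProcessor`, and the `sampledTrue`/`sampledFalse` names are hypothetical stand-ins chosen so the example runs with the standard library alone. Only `makeDecision` and `mutatorsBuf` are names that appear in this PR.

```go
package main

import "fmt"

// mutator is a hypothetical stand-in for a metrics tag mutator.
type mutator struct{ key, value string }

// Static mutators are built once at package level instead of on every call.
var (
	sampledTrue  = mutator{key: "sampled", value: "true"}
	sampledFalse = mutator{key: "sampled", value: "false"}
)

// processor owns a scratch buffer that is reused across calls. This is only
// safe because a single goroutine (the ticker worker) calls makeDecision.
type processor struct {
	mutatorsBuf []mutator
}

// newProcessor allocates the one-element buffer exactly once.
func newProcessor() *processor {
	return &processor{mutatorsBuf: make([]mutator, 1)}
}

// record is a hypothetical stand-in for recording a measurement with tags.
func record(tags []mutator) string {
	return tags[0].key + "=" + tags[0].value
}

// makeDecision overwrites slot 0 of the reused buffer rather than building
// a fresh slice literal on each invocation.
func (p *processor) makeDecision(sampled bool) string {
	if sampled {
		p.mutatorsBuf[0] = sampledTrue
	} else {
		p.mutatorsBuf[0] = sampledFalse
	}
	return record(p.mutatorsBuf)
}

func main() {
	p := newProcessor()
	fmt.Println(p.makeDecision(true))
	fmt.Println(p.makeDecision(false))
}
```

The trade-off is that the buffer is shared mutable state, so this pattern would need a lock or a per-goroutine buffer if the processor were ever called concurrently.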

This improved the makeDecision benchmark by ~31%.

benchstat old.txt new.txt
name         old time/op  new time/op  delta
Sampling-10  51.8µs ± 1%  35.7µs ± 1%  -30.94%  (p=0.008 n=5+5)

Testing: Unit tests unchanged; added a benchmark
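
For readers unfamiliar with Go benchmarks, the following is a rough sketch of what a benchmark of this shape can look like. It does not reproduce the benchmark added in this PR: the `makeDecision` function here is a hypothetical stand-in, and only the benchmark name `Sampling` is taken from the benchstat output. `testing.Benchmark` is used so the sketch runs as an ordinary program; in a repository the function would normally live in a `_test.go` file and be run with `go test -bench`.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from optimizing the benchmarked call away.
var sink int

// makeDecision is a hypothetical stand-in for the code under test.
// It writes into a caller-provided buffer instead of allocating.
func makeDecision(buf []string, sampled bool) int {
	if sampled {
		buf[0] = "sampled=true"
	} else {
		buf[0] = "sampled=false"
	}
	return len(buf[0])
}

// BenchmarkSampling measures time and allocations per call.
func BenchmarkSampling(b *testing.B) {
	buf := make([]string, 1) // allocated once, outside the timed loop
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink += makeDecision(buf, i%2 == 0)
	}
}

func main() {
	// Run the benchmark programmatically and print time and memory stats.
	res := testing.Benchmark(BenchmarkSampling)
	fmt.Println(res.String(), res.MemString())
}
```

Saving the output of several runs before and after a change to `old.txt` and `new.txt` is what allows `benchstat` to produce a comparison like the one shown above.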

Documentation: Perf improvement so no documentation changes needed.

This was all based on production profiling data at Polar Signals running the collector. Here is a snapshot of the original profiling data we started with: https://pprof.me/52a7fab/

Judging by the production profiling data, a 31% improvement on the makeDecision codepath should translate into roughly a 6% baseline CPU improvement for our production deployment of the OpenTelemetry Collector.
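
One way to read this estimate, assuming the saving scales linearly with the share of CPU time spent in makeDecision. Both the linear model and the symbol $f$ for that share are assumptions introduced here, not figures from the profiling data:

```latex
0.31 \cdot f \approx 0.06
\quad\Longrightarrow\quad
f \approx \frac{0.06}{0.31} \approx 0.19
```

Under that assumption, the two quoted numbers are consistent with makeDecision accounting for roughly a fifth of the collector's CPU time before the change.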

The profiling data after improving: https://pprof.me/58c0e84/

This improvement was done as part of the Let's Profile Livestream where we optimize popular open-source projects live: https://www.youtube.com/watch?v=vkMQRjiNTHM

jpkrohling

I followed this live, and I'm good with the changes as long as the CI is passing (I'm not sure the tests were executed).

https://www.youtube.com/watch?v=vkMQRjiNTHM

jiekun · 2023-10-21T03:21:43Z

This looks cool. I jumped into this PR from the YouTube video as well. Nice job.

jpkrohling · 2023-10-24T12:14:46Z

A test is failing in a place that is related to the changes:

--- FAIL: TestTraceIntegrity (0.00s)
panic: runtime error: index out of range [0] with length 0 [recovered]
	panic: runtime error: index out of range [0] with length 0

goroutine 15 [running]:
testing.tRunner.func1.2({0x112df00, 0xc00016a5a0})
	/opt/hostedtoolcache/go/1.20.10/x64/src/testing/testing.go:1526 +0x372
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.20.10/x64/src/testing/testing.go:1529 +0x650
panic({0x112df00, 0xc00016a5a0})
	/opt/hostedtoolcache/go/1.20.10/x64/src/runtime/panic.go:890 +0x263
github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor.(*tailSamplingSpanProcessor).makeDecision(0xc000326180, {0x1, 0x2, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...)
	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/tailsamplingprocessor/processor.go:305 +0x14b9
github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor.(*tailSamplingSpanProcessor).samplingPolicyOnTick(0xc000326180)

Anyone willing to pick this up? I can probably take a look eventually, but not right now.

jiekun · 2023-10-25T01:29:05Z

I appreciate the optimizations made by the original author.

Fixing the unit test requires only a very small change, but since I cannot push to the original branch, I have opened a separate PR, #28597, which could be merged after #27889. @brancz, feel free to add those fix lines to the original branch instead, and I will delete the new PR ^_^
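
The failure mode in the stack trace can be reproduced with a small sketch. It assumes, based on the fix commit message in this PR, that the tests built the processor struct directly and left the new `mutatorsBuf` field unset. The `processor` type, the `string` element type, and the `recover` wrapper are hypothetical simplifications, not the processor's real code.

```go
package main

import "fmt"

// processor is a hypothetical stand-in; the element type is simplified to string.
type processor struct {
	mutatorsBuf []string
}

// makeDecision writes to slot 0 of the reused buffer. The panic is converted
// to an error here only so that both cases can run in one program.
func (p *processor) makeDecision() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	p.mutatorsBuf[0] = "sampled=true" // panics when the slice is nil or empty
	return nil
}

func main() {
	// A struct literal that omits the new field leaves mutatorsBuf nil,
	// so indexing slot 0 is out of range.
	broken := &processor{}
	fmt.Println(broken.makeDecision())

	// Initializing the field with a length of 1 avoids the panic.
	fixed := &processor{mutatorsBuf: make([]string, 1)}
	fmt.Println(fixed.makeDecision())
}
```

Routing test setup through the same constructor as production code is one way to keep such fields from being missed when a struct gains new state.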

jpkrohling · 2023-10-26T08:25:22Z

I think I was able to add your commits to this PR, @jiekun. Thank you!

codecov · 2023-10-26T08:47:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Files	Coverage Δ
processor/tailsamplingprocessor/processor.go	`90.54% <100.00%> (+0.25%)`	⬆️

... and 46 files with indirect coverage changes

…telemetry#27889)

**Description:**

Since each `tailSamplingSpanProcessor`'s instance is not concurrently called by the ticker worker (it's a 1-to-1 relationship) we can safely reuse a slice for the tag mutators used in `makeDecision`. Additionally the tag mutators themselves were causing a lot of allocations and since they are static, we created constants for them preventing allocations on each execution of `makeDecision`.

This improved the `makeDecision` benchmark by ~31%.

```
benchstat old.txt new.txt
name         old time/op  new time/op  delta
Sampling-10  51.8µs ± 1%  35.7µs ± 1%  -30.94%  (p=0.008 n=5+5)
```

**Testing:** Unit tests unchanged; added a benchmark

**Documentation:** Perf improvement so no documentation changes needed.

This was all based on production profiling data at Polar Signals running the collector. Here is a snapshot of the original profiling data we started with: https://pprof.me/52a7fab/

Judging by the production profiling data, a 31% improvement on the `makeDecision` codepath, should translate roughly into a 6% baseline CPU improvement our production deployment of the opentelemetry collector.

The profiling data after improving: https://pprof.me/58c0e84/

This improvement was done as part of the Let's Profile Livestream where we optimize popular open-source projects live: https://www.youtube.com/watch?v=vkMQRjiNTHM

---------

Co-authored-by: Jiekun <zhujiekun@52tt.com>

brancz added 2 commits October 20, 2023 17:39

tailsamplingprocessor: Create benchmark for makeDecision

2d377ea

brancz requested review from jpkrohling and a team as code owners October 20, 2023 15:50

github-actions bot assigned evan-bradley Oct 20, 2023

github-actions bot added the processor/tailsampling Tail sampling processor label Oct 20, 2023

.chloggen: Add changelog item

dca4276

jpkrohling approved these changes Oct 20, 2023

fix: [test] tailSamplingSpanProcessor structs in ut are missing new f…

7b4aac3

…ield mutatorsBuf and panic in CICD. Added this field to all struct in ut.

jiekun mentioned this pull request Oct 25, 2023

tailsamplingprocessor: Fix UT issue after mutator memory allocations PR #28597

Closed

fix: [test] fixed lint in tailprocessor

a82f3ea

jpkrohling approved these changes Oct 26, 2023

jpkrohling merged commit acae6fe into open-telemetry:main Oct 26, 2023
163 of 164 checks passed

github-actions bot added this to the next release milestone Oct 26, 2023

jiekun mentioned this pull request Dec 13, 2023

REQUEST: New membership for @jiekun open-telemetry/community#1835

Closed

6 tasks
