Reduce allocations in extendReuseSlice growth path#6863
Conversation
There was a problem hiding this comment.
Pull request overview
Optimizes extendReuseSlice in the vParquet5 encoding to reduce allocation overhead on the slice growth path, targeting a hot allocation site during trace-to-parquet conversion used in WAL/block creation.
Changes:
- Replaces
append(...make...)growth logic with a singlemake+copyinextendReuseSlice. - Updates the micro-benchmark to exercise a more varied resize pattern using
[]Attribute.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
tempodb/encoding/vparquet5/schema.go |
Implements the new one-allocation growth strategy for extendReuseSlice. |
tempodb/encoding/vparquet5/schema_test.go |
Adjusts the benchmark to resize a reusable buffer across a sequence of sizes. |
| in = in[:cap(in)] | ||
| return append(in, make([]T, sz-len(in))...) | ||
| out := make([]T, sz) | ||
| copy(out, in) |
There was a problem hiding this comment.
The new growth path allocates out := make([]T, sz) and only copys len(in) elements, which changes behavior vs the previous implementation when len(in) < cap(in) and sz > cap(in) (previously it effectively preserved/copy-pasted elements up to cap(in)). Could we add a unit test that covers this scenario (e.g., shrink a slice to a smaller len, then grow past cap) to lock in the intended semantics and prevent regressions?
| copy(out, in) | |
| copy(out, in[:cap(in)]) |
There was a problem hiding this comment.
[...] that's wrong for our use case — callers always overwrite all elements (traceToParquet). The existing test already covers the
len < cap < szcase and expects zero-filled elements. I'll add a test for clarity though.
d4295fe to
c627e5a
Compare
Replace append+make pattern with a single make+copy when the slice needs to grow. The old code allocated a temporary slice via make then appended it, causing two allocations. The new code does one allocation and copies existing data directly.
c627e5a to
4d37321
Compare
| { | ||
| // len < cap < sz: slice was shrunk then grown past cap | ||
| sz: 6, | ||
| in: append(make([]int, 0, 4), 1, 2), | ||
| expected: []int{1, 2, 0, 0, 0, 0}, |
There was a problem hiding this comment.
The new len < cap < sz test case uses a zero-initialized backing array (append(make(...), 1, 2)), so it would pass both the old and new extendReuseSlice implementations and doesn’t actually lock in the intended semantic change (discarding any stale elements beyond len(in)). Could this be adjusted to populate the underlying array beyond len with non-zero values, then reslice down and extend, so the test would fail under the previous behavior? Same idea applies to the equivalent tests in vparquet3/vparquet4.
There was a problem hiding this comment.
Not worth. The old code preserving elements between len and cap was an accidental side effect of the append pattern, not intentional. Those elements are stale data that callers will overwrite anyway (traceToParquetWithMapping fills every position).
javiermolinar
left a comment
There was a problem hiding this comment.
I like the new approach, cleaner, more predictable and safer
| endif | ||
|
|
||
| FILES_TO_FMT=$(shell find . -type d \( -path ./vendor -o -path ./tools/vendor -o -path ./opentelemetry-proto -o -path ./vendor-fix \) -prune -o -name '*.go' -not -name "*.pb.go" -not -name '*.y.go' -not -name '*.gen.go' -print) | ||
| FILES_TO_FMT=$(shell find . -type d \( -path ./vendor -o -path ./tools/vendor -o -path ./opentelemetry-proto -o -path ./vendor-fix -o -path ./.claude \) -prune -o -name '*.go' -not -name "*.pb.go" -not -name '*.y.go' -not -name '*.gen.go' -print) |
There was a problem hiding this comment.
It was causing trouble when running make fmt due to how claude creates worktrees in .claude.
What this PR does:
Replaces the
append+makepattern inextendReuseSlicewith a singlemake+copywhen the slice needs to grow. The old code created a temporary slice viamake([]T, sz-len(in))and thenappend-ed it, causing two heap allocations on the growth path. The new code performs one allocation and copies existing data directly.Internal continuous profiling shows
extendReuseSliceaccounts for ~11.8% of total allocated bytes (memory:alloc_space), called fromtraceToParquetWithMappingduring WAL block writes and block creation. While the growth path is infrequent per-buffer in steady state, the aggregate impact across many tenant instances is significant.Benchmark results
Which issue(s) this PR fixes:
N/A — found via continuous profiling.
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]