db/seg: arena-based MatchFinder (patricia trie) #20136
Conversation
Pull request overview
This PR introduces an arena/flat representation of the Patricia trie to reduce pointer chasing and GC pressure in substring matching during segment compression, and wires the new matcher into the compression pipeline.
Changes:
- Extracted match post-processing into a shared deduplicateMatches helper.
- Added FlatTree (arena-based Patricia trie) + MatchFinder3 to traverse the flattened structure.
- Switched pattern-coverage compression to use MatchFinder3 and added correctness/benchmark coverage for the flat implementation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| db/seg/patricia/patricia.go | Factors out match sorting/filtering into deduplicateMatches and uses it from MatchFinder2. |
| db/seg/patricia/patricia_flat.go | Adds FlatTree (flattened arena) and MatchFinder3 implementation. |
| db/seg/patricia/patricia_flat_test.go | Adds correctness tests + benchmarks comparing MatchFinder2 vs MatchFinder3. |
| db/seg/parallel_compress.go | Uses flattened trie + MatchFinder3 in compression workers and single-worker path. |
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
db/seg/sais/sais.go:38
Sais dereferences buf unconditionally (*buf = ...). If a caller passes a nil *[]int32 (easy to do since buf is a pointer), this will panic, and the current doc/README doesn't state that buf must be non-nil. Consider either explicitly documenting the non-nil requirement or handling buf == nil by falling back to an internal scratch allocation.
// Sais computes the suffix array of data into sa, using *buf as reusable scratch space.
// buf is grown as needed. Callers should preserve *buf across calls to amortize allocations:
// without it, recurse_32 allocates ~len(data)/4 ints on every call.
func Sais(data []byte, sa []int32, buf *[]int32) error {
n := len(data)
if n != len(sa) {
panic("sais: len(data) != len(sa)")
}
if n <= 1 {
if n == 1 {
sa[0] = 0
}
return nil
}
clear(sa)
// Pre-size buf to n/2 so recurse_32's "len(tmp) < numLMS" check never triggers.
// numLMS is at most n/2, so a buf of n/2 ints is sufficient for all recursion levels.
needed := max(512, n/2)
*buf = growslice32(*buf, needed)
sais_8_32(data, 256, sa, *buf)
yperbasis
left a comment
This is a non-trivial performance change and it should target main instead of release/3.4 (don't want big changes that close to release).
07a4421 to 963d329
yperbasis
left a comment
Issues
- (Medium) PatriciaTree not released after flattening
In parallel_compress.go:257, after ft := pt.Flatten(), the original pt (and its entire heap-allocated node tree) stays alive for the rest of the function. Since pt is never used again after this point, zeroing
it would let the GC reclaim the pointer-based tree while the flat arena is in use:
ft := pt.Flatten()
pt = patricia.PatriciaTree{} // release heap nodes
This matters because compression runs on large dictionaries (64K patterns) and the pointer tree can consume significant memory alongside the flat arena.
- (Low) First tailLen == 0 check is dead code
In unfold() (both MF2 and MF3), the first if tailLen == 0 block (between the side == 2 block and the tail computation) is unreachable. The carry-over state from a previous unfold() call or loop iteration never
has tailLen == 0 with side != 2. This isn't a bug, but it's worth noting since MF3 faithfully ports dead code from MF2. Could be cleaned up in a follow-up, potentially with a panic guard for clarity.
- (Low) FlatTree.values still uses []any
The flatNode arena is fully GC-invisible (no pointers), but FlatTree.values []any still holds interface headers that the GC must scan. In practice this is minor since the values slice has one entry per
dictionary pattern (not per node), and the patterns themselves are already heap-allocated. But if maximum GC reduction is the goal, this could be replaced with a typed value table or index into the existing
code2pattern slice.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
yperbasis
left a comment
Missing fuzz test for MF3 — FuzzLongestMatch (patricia_fuzz_test.go:86) compares MF1 vs MF2 but doesn't test MF3. Given the complexity of the unfold/fold state machine and the subtle reachability analysis
above, adding MF3 to the fuzz comparison would provide strong confidence. Something like:
ft := pt.Flatten()
mf3 := NewMatchFinder3(ft)
m3 := mf3.FindLongestMatches(data)
// compare m3 against m1 or m2
This is the most important gap — the deterministic tests are good, but fuzz testing is what caught edge cases between MF1 and MF2 originally.
sais API consolidation is clean — The old 2-arg Sais(data, sa) was only used in sais_test.go. All production callers (parallel_compress.go, patricia.go) already used SaisWithBuf. Renaming SaisWithBuf → Sais is
the right call — one API, always with scratch reuse.
deduplicateMatches extraction — Minor: the new helper drops the if i != j guard that avoided self-assignment in the original. Functionally identical, just does a few redundant matches[j] = m writes early on.
Fine.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Bench shows ~20% speedup on 64K dict size.
Reason: during large merges (large-file word-level compression), MatchFinder.unfold is a bottleneck.