Skip to content

db/seg: arena-based MatchFinder (patricia trie)#20136

Merged
yperbasis merged 16 commits intomainfrom
alex/patricia_34
Apr 11, 2026
Merged

db/seg: arena-based MatchFinder (patricia trie)#20136
yperbasis merged 16 commits intomainfrom
alex/patricia_34

Conversation

@AskAlexSharov
Copy link
Copy Markdown
Collaborator

@AskAlexSharov AskAlexSharov commented Mar 24, 2026

  1. Node struct is cache-unfriendly (biggest win potential)

  type node struct {
      val any     // 16 bytes (interface = type ptr + data ptr)
      n0  *node   // 8 bytes
      n1  *node   // 8 bytes
      p0  uint32  // 4 bytes
      p1  uint32  // 4 bytes
  }                // = 40 bytes per node

  Every mf2.top.n0, mf2.top.p0 etc. is a pointer chase to a heap-scattered node. With thousands of nodes, this hammers L1/L2 cache.

  Fix: Flatten into a []node arena, replace *node with uint32 index:

  type node struct {
      p  [2]uint32  // paths (indexed by bit)
      n  [2]uint32  // child indices (0 = nil)
      val uint32    // index into code2pattern, 0 = no value
  }                 // = 20 bytes, no pointers → GC-invisible

  This gives contiguous memory, eliminates GC scanning, and halves node size.

Bench showing 20% faster on 64K dict size

Reason - during large merge (large file word-level compression). MatchFinder.unfold is a bottleneck:
Screenshot 2026-03-24 at 22 41 24

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an arena/flat representation of the Patricia trie to reduce pointer chasing and GC pressure in substring matching during segment compression, and wires the new matcher into the compression pipeline.

Changes:

  • Extracted match post-processing into a shared deduplicateMatches helper.
  • Added FlatTree (arena-based Patricia trie) + MatchFinder3 to traverse the flattened structure.
  • Switched pattern-coverage compression to use MatchFinder3 and added correctness/benchmark coverage for the flat implementation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
db/seg/patricia/patricia.go Factors out match sorting/filtering into deduplicateMatches and uses it from MatchFinder2.
db/seg/patricia/patricia_flat.go Adds FlatTree (flattened arena) and MatchFinder3 implementation.
db/seg/patricia/patricia_flat_test.go Adds correctness tests + benchmarks comparing MatchFinder2 vs MatchFinder3.
db/seg/parallel_compress.go Uses flattened trie + MatchFinder3 in compression workers and single-worker path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/seg/patricia/patricia_flat.go Outdated
Comment thread db/seg/patricia/patricia_flat_test.go Outdated
Comment thread db/seg/patricia/patricia_flat_test.go Outdated
Comment thread db/seg/patricia/patricia_flat_test.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/seg/patricia/patricia_flat.go Outdated
Comment thread db/seg/patricia/patricia_flat_test.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

db/seg/sais/sais.go:38

  • Sais dereferences buf unconditionally (*buf = ...). If a caller passes a nil *[]int32 (easy to do since buf is a pointer), this will panic, and the current doc/README doesn’t state that buf must be non-nil. Consider either explicitly documenting the non-nil requirement or handling buf == nil by falling back to an internal scratch allocation.
// Sais computes the suffix array of data into sa, using *buf as reusable scratch space.
// buf is grown as needed. Callers should preserve *buf across calls to amortize allocations:
// without it, recurse_32 allocates ~len(data)/4 ints on every call.
func Sais(data []byte, sa []int32, buf *[]int32) error {
	n := len(data)
	if n != len(sa) {
		panic("sais: len(data) != len(sa)")
	}
	if n <= 1 {
		if n == 1 {
			sa[0] = 0
		}
		return nil
	}
	clear(sa)

	// Pre-size buf to n/2 so recurse_32's "len(tmp) < numLMS" check never triggers.
	// numLMS is at most n/2, so a buf of n/2 ints is sufficient for all recursion levels.
	needed := max(512, n/2)
	*buf = growslice32(*buf, needed)
	sais_8_32(data, 256, sa, *buf)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/seg/parallel_compress.go
Comment thread db/seg/patricia/patricia_flat.go
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a non-trivial performance change and it should target main instead of release/3.4 (don't want big changes that close to release).

@yperbasis yperbasis requested a review from awskii April 7, 2026 09:45
@AskAlexSharov AskAlexSharov changed the base branch from release/3.4 to main April 8, 2026 09:11
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues

  1. (Medium) PatriciaTree not released after flattening

In parallel_compress.go:257, after ft := pt.Flatten(), the original pt (and its entire heap-allocated node tree) stays alive for the rest of the function. Since pt is never used again after this point, zeroing
it would let the GC reclaim the pointer-based tree while the flat arena is in use:

ft := pt.Flatten()
pt = patricia.PatriciaTree{} // release heap nodes

This matters because compression runs on large dictionaries (64K patterns) and the pointer tree can consume significant memory alongside the flat arena.

  1. (Low) First tailLen == 0 check is dead code

In unfold() (both MF2 and MF3), the first if tailLen == 0 block (between the side == 2 block and the tail computation) is unreachable. The carry-over state from a previous unfold() call or loop iteration never
has tailLen == 0 with side != 2. This isn't a bug, but it's worth noting since MF3 faithfully ports dead code from MF2. Could be cleaned up in a follow-up, potentially with a panic guard for clarity.

  1. (Low) FlatTree.values still uses []any

The flatNode arena is fully GC-invisible (no pointers), but FlatTree.values []any still holds interface headers that the GC must scan. In practice this is minor since the values slice has one entry per
dictionary pattern (not per node), and the patterns themselves are already heap-allocated. But if maximum GC reduction is the goal, this could be replaced with a typed value table or index into the existing
code2pattern slice.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/seg/sais/sais.go
Comment thread db/seg/sais/sais.go
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing fuzz test for MF3 — FuzzLongestMatch (patricia_fuzz_test.go:86) compares MF1 vs MF2 but doesn't test MF3. Given the complexity of the unfold/fold state machine and the subtle reachability analysis
above, adding MF3 to the fuzz comparison would provide strong confidence. Something like:

ft := pt.Flatten()
mf3 := NewMatchFinder3(ft)
m3 := mf3.FindLongestMatches(data)
// compare m3 against m1 or m2

This is the most important gap — the deterministic tests are good, but fuzz testing is what caught edge cases between MF1 and MF2 originally.

sais API consolidation is clean — The old 2-arg Sais(data, sa) was only used in sais_test.go. All production callers (parallel_compress.go, patricia.go) already used SaisWithBuf. Renaming SaisWithBuf → Sais is
the right call — one API, always with scratch reuse.

deduplicateMatches extraction — Minor: the new helper drops the if i != j guard that avoided self-assignment in the original. Functionally identical, just does a few redundant matches[j] = m writes early on.
Fine.

@yperbasis yperbasis changed the title [wip] seg: arena-based MatchFinder (patricia trie) db/seg: arena-based MatchFinder (patricia trie) Apr 9, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@AskAlexSharov AskAlexSharov added this pull request to the merge queue Apr 10, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 10, 2026
@yperbasis yperbasis added this pull request to the merge queue Apr 11, 2026
Merged via the queue into main with commit 57ff445 Apr 11, 2026
35 checks passed
@yperbasis yperbasis deleted the alex/patricia_34 branch April 11, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants