db: EliasFano: Seek perf on big sequences#19788
Conversation
There was a problem hiding this comment.
Pull request overview
This PR is part of optimizing HistoryRange() usage by improving EliasFano seek performance on large sequences, especially when the distribution makes plain binary search less efficient.
Changes:
- Replaced pure binary search in
EliasFanoseek paths with an interpolation-guess + exponential bracketing + binary search approach for upper-bits probing. - Reworked/expanded benchmarks to cover
Seekacross multiple sequence sizes and addedGet2/DoubleEliasFano access benchmarks. - Hardened torrent data-file path derivation by using
strings.TrimSuffix(..., ".torrent")instead of fixed-length slicing.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
db/recsplit/eliasfano32/elias_fano.go |
Adds searchUpperForward/searchUpperReverse helpers and wires them into forward/reverse seek paths to reduce upper() probes. |
db/recsplit/eliasfano32/elias_fano_seek_bench_test.go |
Updates benchmarks to better reflect the new seek behavior and adds new access benchmarks. |
db/integrity/torrent_verify.go |
Switches .torrent stripping logic to TrimSuffix for correctness/robustness. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
645ff7e to
301060f
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| n := int(hiIdx - lo) | ||
| if n <= 0 { | ||
| return lo |
There was a problem hiding this comment.
In searchUpperReverse, the n <= 0 fast path returns lo, but when lo == hiIdx == ef.count this corresponds to the “no solution” case (e.g., hi < upper(0)), where the correct offset should be ef.count+1 (matching the previous sort.Search(int(ef.count+1), ...) behavior). Returning ef.count causes searchReverse to still probe idx=0, touching underlying data on guaranteed-miss queries and changing the function’s documented semantics. Consider returning lo+1 here (mirroring searchUpperForward) so misses return count+1 and the caller loop is skipped.
| return lo | |
| return lo + 1 |
…tion fast path (#20566) Fixes CR comment from #19788. In `searchUpperReverse`, the `n <= 0` fast path (when `hiIdx == lo`) returned `lo`, causing `searchReverse` to still probe `idx=0` on guaranteed-miss queries. The correct return is `lo + 1`, mirroring `searchUpperForward` and ensuring the caller loop is skipped entirely when there is no solution.
Story: part of
HistoryRange()optimizations forerigon snapshots check-commitment-hist-at-blk-rangecommand.search in logarithmically-distributed array works not well.
so, can