
Tx index for Parquet receipt store #3222

Merged
jewei1997 merged 9 commits into main from tx-index-for-parquet on Apr 10, 2026

Conversation

@jewei1997
Contributor

Describe your changes and provide context

Create a tx index storing (tx hash -> block number) mappings so the Parquet receipt store can serve getReceiptByHash queries quickly.
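A minimal sketch of the index surface this adds; the method names below appear in the review discussion further down, but the signatures are assumptions, not the PR's actual API:

```go
package receipt

import "github.com/ethereum/go-ethereum/common"

// TxHashIndex is a hedged sketch of the (tx hash -> block number) index:
// lookups consult the index first, so only one parquet file needs scanning.
type TxHashIndex interface {
	// IndexBlock records every tx hash in the block under blockNumber.
	IndexBlock(blockNumber uint64, txHashes []common.Hash) error
	// GetBlockNumber returns (block, found, err); found=false means a miss.
	GetBlockNumber(txHash common.Hash) (uint64, bool, error)
	// PruneBefore drops index entries for blocks below blockNumber.
	PruneBefore(blockNumber uint64) error
	Close() error
}
```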

Testing performed to validate your change

unit tests

@github-actions

github-actions Bot commented Apr 9, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Apr 10, 2026, 8:05 PM |

@codecov

codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 76.27737% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.03%. Comparing base (1739f29) to head (5cc02d2).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-db/ledger_db/receipt/tx_hash_index.go | 72.93% | 21 Missing and 15 partials ⚠️ |
| sei-db/ledger_db/receipt/parquet_store.go | 70.58% | 14 Missing and 11 partials ⚠️ |
| sei-db/config/receipt_config.go | 83.33% | 1 Missing and 1 partial ⚠️ |
| sei-db/ledger_db/parquet/reader.go | 94.59% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3222      +/-   ##
==========================================
+ Coverage   59.00%   59.03%   +0.02%     
==========================================
  Files        2065     2066       +1     
  Lines      169362   169617     +255     
==========================================
+ Hits        99931   100128     +197     
- Misses      60671    60703      +32     
- Partials     8760     8786      +26     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain-pr | 76.04% <76.27%> (?) |
| sei-db | 70.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-db/ledger_db/parquet/store.go | 69.00% <100.00%> (+0.67%) ⬆️ |
| sei-db/config/receipt_config.go | 77.35% <83.33%> (+1.16%) ⬆️ |
| sei-db/ledger_db/parquet/reader.go | 80.24% <94.59%> (+1.90%) ⬆️ |
| sei-db/ledger_db/receipt/parquet_store.go | 68.10% <70.58%> (+1.43%) ⬆️ |
| sei-db/ledger_db/receipt/tx_hash_index.go | 72.93% <72.93%> (ø) |

Comment thread on sei-db/config/receipt_config.go (outdated, collapsed)

Comment thread on sei-db/ledger_db/parquet/reader.go:
func (r *Reader) fileForBlock(blockNumber uint64) string {
r.mu.RLock()
defer r.mu.RUnlock()

Contributor

fileForBlock() acquires r.mu.RLock() to read closedReceiptFiles, but it does not hold pruneMu; getReceiptByTxHashFromFiles acquires pruneMu.RLock() separately.

Between these two lock acquisitions, a concurrent prune could delete the file that fileForBlock just returned, and the query would then fail on a missing file. The existing GetReceiptByTxHash avoids this by acquiring pruneMu first and then snapshotting files; GetReceiptByTxHashInBlock should follow the same pattern.
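A minimal sketch of that pattern, assuming the field and helper names shown in this thread (getReceiptByTxHashFromFilesLocked appears later in the diff); an illustration, not the final code:

```go
func (r *Reader) GetReceiptByTxHashInBlock(ctx context.Context, txHash common.Hash, blockNumber uint64) (*ReceiptResult, error) {
	// Hold pruneMu across both file resolution and the read, so a
	// concurrent prune cannot delete the file in between.
	r.pruneMu.RLock()
	defer r.pruneMu.RUnlock()

	file := r.fileForBlock(blockNumber) // takes r.mu internally; pruneMu is already held
	return r.getReceiptByTxHashFromFilesLocked(ctx, txHash, []string{file})
}
```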

return err
}

if s.txHashIndex != nil {
Contributor

The index update happens after WriteReceipts here. If a crash happens in between, the parquet data could exist while the index entry doesn't, and WAL replay doesn't seem to reindex these receipts either.

Contributor

Was this discussed on a call? I'm also curious about how we handle a crash here.

Contributor

+1. Can we make the index update part of the same durable/recoverable flow, or otherwise rebuild missing index entries on startup from closed parquet files?
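One possible shape for the rebuild option, as a hedged sketch; every helper here (LastIndexedBlock, latestClosedBlock, hashesForBlock) is hypothetical:

```go
// rebuildIndexOnStartup re-derives missing index entries from closed parquet
// files, covering any window where receipts were written but a crash
// prevented the index update.
func (s *parquetReceiptStore) rebuildIndexOnStartup(ctx context.Context) error {
	start, err := s.txHashIndex.LastIndexedBlock() // hypothetical watermark
	if err != nil {
		return err
	}
	for block := start + 1; block <= s.latestClosedBlock(); block++ {
		hashes, err := s.hashesForBlock(ctx, block) // read back from parquet
		if err != nil {
			return err
		}
		if err := s.txHashIndex.IndexBlock(block, hashes); err != nil {
			return err
		}
	}
	return nil
}
```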

Contributor Author

We do WAL crash recovery here:

if s.txHashIndex != nil {

Comment thread on sei-db/ledger_db/receipt/tx_hash_index.go (outdated, collapsed)
Comment thread on sei-db/ledger_db/parquet/store.go (collapsed)
@yzang2019 requested a review from Kbhat1 on April 10, 2026, 16:05
Comment on lines +331 to +332
// should contain blockNumber, falling back to a full scan on miss.
func (r *Reader) GetReceiptByTxHashInBlock(ctx context.Context, txHash common.Hash, blockNumber uint64) (*ReceiptResult, error) {
Contributor

> falling back to a full scan on miss.

Could this be a potential DOS attack vector? If somebody is sending you requests for transactions that don't exist, will that cause us to do lots of full scans?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should add limits at the RPC layer, similar to what we do for eth_getLogs today.
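For illustration only (not part of this PR): one cheap way to cap the damage is bounding concurrent full scans with a semaphore. The wrapper name is hypothetical; getReceiptByTxHashFromFiles with a nil file list is from the diff below:

```go
// fullScanSlots bounds concurrent index-miss full scans so a burst of
// unknown-hash queries cannot pile up unbounded DuckDB scans.
var fullScanSlots = make(chan struct{}, 4) // capacity is a tunable guess

func (r *Reader) getReceiptByTxHashLimited(ctx context.Context, txHash common.Hash) (*ReceiptResult, error) {
	select {
	case fullScanSlots <- struct{}{}:
		defer func() { <-fullScanSlots }()
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	return r.getReceiptByTxHashFromFiles(ctx, txHash, nil) // nil = scan all files
}
```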

Comment on lines +374 to +378
func (r *Reader) getReceiptByTxHashFromFiles(ctx context.Context, txHash common.Hash, files []string) (*ReceiptResult, error) {
r.pruneMu.RLock()
defer r.pruneMu.RUnlock()
return r.getReceiptByTxHashFromFilesLocked(ctx, txHash, files)
}
Contributor

The only call sites I can see pass nil for the list of files. Is the files parameter actually needed?

@cody-littley (Contributor) left a comment

LLM review left the following comments:

Bug Report: Branch vs Main

Branch commits: 5 commits (tx hash index for parquet receipt store + pruning race fixes)
Scope: 21 changed files in sei-db/
Clean areas: parquet/reader.go, parquet/store.go, config changes, and test changes — no bugs found.


Bug 1 (High): Iterator leak in PruneBefore

File: sei-db/ledger_db/receipt/tx_hash_index.go, lines 146–190

The Pebble iterator opened at line 146 has no defer close. The batch gets a deferred Close() (line 155), but the iterator is only closed explicitly at line 188 on the happy path. There are four early-return error paths (lines 169, 173, 178, 185) that all leak it.

A leaked Pebble iterator pins its memtable/sstable snapshot, blocking compaction and growing memory. Since the pruner retries on a timer, repeated transient I/O errors would accumulate leaked iterators.

Fix: Add defer iter.Close() right after the NewIter nil-error check at line 152, mirroring the batch pattern.
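A sketch of that fix, with the surrounding code paraphrased from the report; key bounds and field names are assumptions:

```go
iter, err := i.db.NewIter(&pebble.IterOptions{
	LowerBound: reversePrefix(0),           // hypothetical "b"+0 bound
	UpperBound: reversePrefix(blockNumber), // hypothetical "b"+blockNumber bound
})
if err != nil {
	return err
}
defer iter.Close() // now closed on all four early-return paths, not just the happy path

batch := i.db.NewBatch()
defer batch.Close() // unchanged: the batch already used this pattern
```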


Bug 2 (High): Resource leak when replayWAL() fails during construction

File: sei-db/ledger_db/receipt/parquet_store.go, lines 70–72

If the tx-hash index backend is Pebble, the constructor opens the Pebble DB (line 51) and starts the pruner goroutine (line 63) before calling replayWAL() at line 70. If replayWAL() fails, the error return leaks all three resources:

  1. store (parquet.Store) — open DuckDB connection, parquet writers, its own prune goroutine
  2. idx (PebbleTxHashIndex) — open Pebble database with file locks
  3. pruner goroutine — running in background, holding a reference to the index

Note the contrast with the error paths inside the switch (lines 53, 66) which correctly call store.Close(). The Pebble file lock prevents re-opening the index on retry.

Fix: Call wrapper.Close() before returning the error (it already handles all three teardowns).
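A sketch of the fix at the end of the constructor, using the names from the report; the exact constructor shape is an assumption:

```go
// After the switch has opened the store, opened the index, and started the pruner:
if err := wrapper.replayWAL(); err != nil {
	// wrapper.Close() already tears down all three resources (parquet store,
	// Pebble index, pruner goroutine) and releases the Pebble file lock,
	// so a retry can re-open the index.
	_ = wrapper.Close()
	return nil, err
}
return wrapper, nil
```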


Bug 3 (Medium): IndexBlock overwrite leaves a stale reverse-index entry

File: sei-db/ledger_db/receipt/tx_hash_index.go, lines 119–139

When the same tx hash is indexed at a new block number, IndexBlock overwrites the primary key (h + txHash → newBlock) and writes a new reverse key (b + newBlock + txHash), but never deletes the old reverse key (b + oldBlock + txHash).

Scenario:

  1. IndexBlock(100, [A]) → writes h+A → 100 and b+100+A → []
  2. IndexBlock(200, [A]) → writes h+A → 200 and b+200+A → [] — but b+100+A remains
  3. PruneBefore(150) → scans [b+0, b+150), finds b+100+A, deletes h+A
  4. GetBlockNumber(A) → returns (0, false, nil) even though the receipt lives at block 200

The receipt is still in parquet, so the query falls back to a full DuckDB scan — no data loss, but the index silently degrades to full-scan performance for every overwritten-then-pruned hash.

Fix: In IndexBlock, read the existing value of h + txHash before overwriting. If it exists and differs, delete the old b + oldBlock + txHash entry in the same batch.
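A sketch of that fix inside IndexBlock, assuming the key layout from the report ("h"+txHash and "b"+block+txHash) and hypothetical hashKey/reverseKey helpers:

```go
// Before overwriting, check whether txHash is already indexed at another block.
oldVal, closer, err := i.db.Get(hashKey(txHash))
switch {
case err == nil:
	oldBlock := binary.BigEndian.Uint64(oldVal)
	_ = closer.Close()
	if oldBlock != blockNumber {
		// Delete the stale reverse key in the same batch, so a later
		// PruneBefore cannot remove the live h+txHash entry through it.
		if err := batch.Delete(reverseKey(oldBlock, txHash), nil); err != nil {
			return err
		}
	}
case errors.Is(err, pebble.ErrNotFound):
	// First time this hash is indexed: nothing to clean up.
default:
	return err
}
```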


Bug 4 (Medium): txHashIndexPruner.Stop() panics on double call

File: sei-db/ledger_db/receipt/tx_hash_index.go, lines 252–254

Stop() calls close(p.stopCh) with no sync.Once guard. A second call panics with "close of closed channel". This is reachable because parquetReceiptStore.Close() checks s.indexPruner != nil but never nils it out after stopping. A deferred cleanup + explicit Close() (a normal Go pattern) would crash the node.

Contrast with PebbleTxHashIndex.Close() at line 199, which correctly uses closeOnce.

Fix: Add a sync.Once to Stop(), consistent with the rest of the file.
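A sketch of the guarded Stop, mirroring the closeOnce pattern already used by PebbleTxHashIndex.Close(); the struct's other fields are elided:

```go
import "sync"

type txHashIndexPruner struct {
	stopCh   chan struct{}
	stopOnce sync.Once
	// ... other fields unchanged ...
}

// Stop is now idempotent: a deferred cleanup plus an explicit Close() no
// longer panics with "close of closed channel".
func (p *txHashIndexPruner) Stop() {
	p.stopOnce.Do(func() { close(p.stopCh) })
}
```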


Bug 5 (Low): Pruner starts before WAL replay, creating a race window

File: sei-db/ledger_db/receipt/parquet_store.go, lines 63 and 70

The pruner goroutine is started at line 63 and immediately executes a prune cycle (the sleep comes after the prune in the loop body). WAL replay doesn't start until line 70. If the WAL replays index entries for blocks that the pruner concurrently decides are pruneable, the two goroutines race on the same Pebble keys.

In practice the window is narrow (the pruner targets old blocks while the WAL replays recent ones), but it's an unnecessary race that's trivially avoided.

Fix: Move pruner.Start() to after replayWAL() returns successfully.
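A sketch of the reordering in the constructor, with the same hypothetical shape as the Bug 2 sketch above:

```go
// Replay first, so the index reflects everything in the WAL before any
// prune cycle runs; only then start the pruner.
if err := wrapper.replayWAL(); err != nil {
	_ = wrapper.Close()
	return nil, err
}
pruner.Start() // moved from before replayWAL: the first cycle can no longer race replay
return wrapper, nil
```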

@jewei1997 added this pull request to the merge queue on Apr 10, 2026
Merged via the queue into main with commit df8c2bb Apr 10, 2026
39 checks passed
@jewei1997 deleted the tx-index-for-parquet branch on April 10, 2026, 21:17
jewei1997 added a commit that referenced this pull request Apr 13, 2026
Create a tx index to store (tx hash -> block number) to allow parquet to
quickly serve getReceiptByHash queries.

unit tests