Conversation
Codecov Report ❌
Patch coverage is … Additional details and impacted files:
@@ Coverage Diff @@
## main #3222 +/- ##
==========================================
+ Coverage 59.00% 59.03% +0.02%
==========================================
Files 2065 2066 +1
Lines 169362 169617 +255
==========================================
+ Hits 99931 100128 +197
- Misses 60671 60703 +32
- Partials 8760 8786 +26
func (r *Reader) fileForBlock(blockNumber uint64) string {
	r.mu.RLock()
	defer r.mu.RUnlock()
fileForBlock() acquires r.mu.RLock() to read closedReceiptFiles, but it does not hold pruneMu. getReceiptByTxHashFromFiles acquires pruneMu.RLock() separately.
Between these two locks, a concurrent prune could delete the file that fileForBlock just returned, and the query would then fail on a missing file. The existing GetReceiptByTxHash avoids this by acquiring pruneMu first and then snapshotting the files; GetReceiptByTxHashInBlock should follow the same pattern.
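A minimal sketch of that ordering, assuming fileForBlock and getReceiptByTxHashFromFilesLocked keep their current signatures (the empty-string miss check is an assumption about fileForBlock's contract):

```go
// Sketch only: hold pruneMu across the whole lookup so a concurrent prune
// cannot delete a file between resolving it and reading it.
func (r *Reader) GetReceiptByTxHashInBlock(ctx context.Context, txHash common.Hash, blockNumber uint64) (*ReceiptResult, error) {
	r.pruneMu.RLock()
	defer r.pruneMu.RUnlock()

	// fileForBlock still takes r.mu internally; assumed to return "" on a miss.
	if file := r.fileForBlock(blockNumber); file != "" {
		if res, err := r.getReceiptByTxHashFromFilesLocked(ctx, txHash, []string{file}); err == nil && res != nil {
			return res, nil
		}
	}
	// Fall back to a full scan over all closed files, still under pruneMu.
	return r.getReceiptByTxHashFromFilesLocked(ctx, txHash, nil)
}
```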
	return err
}

if s.txHashIndex != nil {
The index update happens after WriteReceipts here. If a crash happens in between, the parquet data could exist while the index entry doesn't, and WAL replay doesn't seem to reindex these receipts either.
Was this discussed on a call? I'm also curious about how we handle a crash here.
+1 Can we make the index update part of the same durable/recoverable flow, or otherwise rebuild missing index entries on startup from closed parquet files?
We do WAL crash recovery here
// should contain blockNumber, falling back to a full scan on miss.
func (r *Reader) GetReceiptByTxHashInBlock(ctx context.Context, txHash common.Hash, blockNumber uint64) (*ReceiptResult, error) {
"falling back to a full scan on miss."
Could this be a potential DoS attack vector? If somebody is sending requests for transactions that don't exist, will that cause us to do lots of full scans?
yeah, I think we should add limits at the RPC layer, similar to what we do for eth_getLogs today
func (r *Reader) getReceiptByTxHashFromFiles(ctx context.Context, txHash common.Hash, files []string) (*ReceiptResult, error) {
	r.pruneMu.RLock()
	defer r.pruneMu.RUnlock()
	return r.getReceiptByTxHashFromFilesLocked(ctx, txHash, files)
}
Every call site I can find passes nil for the list of files. Is the files parameter actually needed?
cody-littley left a comment:
LLM review left the following comments:
Bug Report: Branch vs Main
Branch commits: 5 commits (tx hash index for parquet receipt store + pruning race fixes)
Scope: 21 changed files in sei-db/
Clean areas: parquet/reader.go, parquet/store.go, config changes, and test changes — no bugs found.
Bug 1 (High): Iterator leak in PruneBefore
File: sei-db/ledger_db/receipt/tx_hash_index.go, lines 146–190
The Pebble iterator opened at line 146 has no defer close. The batch gets a deferred Close() (line 155), but the iterator is only closed explicitly at line 188 on the happy path. There are four early-return error paths (lines 169, 173, 178, 185) that all leak it.
A leaked Pebble iterator pins its memtable/sstable snapshot, blocking compaction and growing memory. Since the pruner retries on a timer, repeated transient I/O errors would accumulate leaked iterators.
Fix: Add defer iter.Close() right after the NewIter nil-error check at line 152, mirroring the batch pattern.
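A sketch of the suggested fix, assuming the pruner's iterator and batch are opened roughly as described above (bounds and field names are illustrative):

```go
// Open the iterator over the reverse-index range to prune.
iter, err := p.db.NewIter(&pebble.IterOptions{LowerBound: lower, UpperBound: upper})
if err != nil {
	return err
}
// Close on every return path, not only the happy path, so transient errors
// retried by the pruner timer don't accumulate pinned snapshots.
defer iter.Close()

batch := p.db.NewBatch()
defer batch.Close()
```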
Bug 2 (High): Resource leak when replayWAL() fails during construction
File: sei-db/ledger_db/receipt/parquet_store.go, lines 70–72
If the tx-hash index backend is Pebble, the constructor opens the Pebble DB (line 51) and starts the pruner goroutine (line 63) before calling replayWAL() at line 70. If replayWAL() fails, the error return leaks all three resources:
- store (parquet.Store) — open DuckDB connection, parquet writers, its own prune goroutine
- idx (PebbleTxHashIndex) — open Pebble database with file locks
- pruner goroutine — running in the background, holding a reference to the index
Note the contrast with the error paths inside the switch (lines 53, 66) which correctly call store.Close(). The Pebble file lock prevents re-opening the index on retry.
Fix: Call wrapper.Close() before returning the error (it already handles all three teardowns).
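A sketch of the suggested error path, assuming the constructor has already assembled the wrapper and its Close() tears down the store, the Pebble index, and the pruner (names are illustrative):

```go
// If WAL replay fails after the store, index, and pruner are already live,
// reuse the wrapper's own teardown instead of leaking all three.
if err := wrapper.replayWAL(); err != nil {
	_ = wrapper.Close()
	return nil, fmt.Errorf("replay receipt WAL: %w", err)
}
return wrapper, nil
```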
Bug 3 (Medium): IndexBlock overwrite leaves a stale reverse-index entry
File: sei-db/ledger_db/receipt/tx_hash_index.go, lines 119–139
When the same tx hash is indexed at a new block number, IndexBlock overwrites the primary key (h + txHash → newBlock) and writes a new reverse key (b + newBlock + txHash), but never deletes the old reverse key (b + oldBlock + txHash).
Scenario:
1. IndexBlock(100, [A]) → writes h+A → 100 and b+100+A → []
2. IndexBlock(200, [A]) → writes h+A → 200 and b+200+A → [] — but b+100+A remains
3. PruneBefore(150) → scans [b+0, b+150), finds b+100+A, deletes h+A
4. GetBlockNumber(A) → returns (0, false, nil) even though the receipt lives at block 200
The receipt is still in parquet, so the query falls back to a full DuckDB scan — no data loss, but the index silently degrades to full-scan performance for every overwritten-then-pruned hash.
Fix: In IndexBlock, read the existing value of h + txHash before overwriting. If it exists and differs, delete the old b + oldBlock + txHash entry in the same batch.
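A sketch of that change inside IndexBlock, assuming 8-byte big-endian block values and hypothetical primaryKey/reverseKey helpers for the h+txHash and b+block+txHash layouts described above:

```go
for _, txHash := range txHashes {
	pk := primaryKey(txHash) // hypothetical helper: "h" || txHash
	// Look up any existing mapping so its reverse entry can be dropped.
	old, closer, err := i.db.Get(pk)
	switch {
	case err == nil:
		oldBlock := binary.BigEndian.Uint64(old)
		_ = closer.Close()
		if oldBlock != blockNumber {
			// Delete the stale reverse key in the same batch, so a later
			// PruneBefore can no longer remove the fresh h+txHash mapping.
			if err := batch.Delete(reverseKey(oldBlock, txHash), nil); err != nil {
				return err
			}
		}
	case errors.Is(err, pebble.ErrNotFound):
		// First time this hash is indexed; nothing to clean up.
	default:
		return err
	}

	var val [8]byte
	binary.BigEndian.PutUint64(val[:], blockNumber)
	if err := batch.Set(pk, val[:], nil); err != nil {
		return err
	}
	if err := batch.Set(reverseKey(blockNumber, txHash), nil, nil); err != nil {
		return err
	}
}
```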
Bug 4 (Medium): txHashIndexPruner.Stop() panics on double call
File: sei-db/ledger_db/receipt/tx_hash_index.go, lines 252–254
Stop() calls close(p.stopCh) with no sync.Once guard. A second call panics with "close of closed channel". This is reachable because parquetReceiptStore.Close() checks s.indexPruner != nil but never nils it out after stopping. A deferred cleanup + explicit Close() (a normal Go pattern) would crash the node.
Contrast with PebbleTxHashIndex.Close() at line 199, which correctly uses closeOnce.
Fix: Add a sync.Once to Stop(), consistent with the rest of the file.
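A sketch of the guard, assuming the pruner struct can grow a sync.Once field alongside its existing stopCh:

```go
type txHashIndexPruner struct {
	stopCh   chan struct{}
	stopOnce sync.Once
	// ...other fields unchanged
}

// Stop is safe to call more than once; only the first call closes stopCh.
func (p *txHashIndexPruner) Stop() {
	p.stopOnce.Do(func() {
		close(p.stopCh)
	})
}
```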
Bug 5 (Low): Pruner starts before WAL replay, creating a race window
File: sei-db/ledger_db/receipt/parquet_store.go, lines 63 and 70
The pruner goroutine is started at line 63 and immediately executes a prune cycle (the sleep comes after the prune in the loop body). WAL replay doesn't start until line 70. If the WAL replays index entries for blocks that the pruner concurrently decides are pruneable, the two goroutines race on the same Pebble keys.
In practice the window is narrow (the pruner targets old blocks while the WAL replays recent ones), but it's an unnecessary race that's trivially avoided.
Fix: Move pruner.Start() to after replayWAL() returns successfully.
Describe your changes and provide context
Create a tx index to store (tx hash -> block number) to allow parquet to quickly serve getReceiptByHash queries.
Testing performed to validate your change
unit tests