Parquet crash testing unit testing hooks by jewei1997 · Pull Request #3028 · sei-protocol/sei-chain

jewei1997 · 2026-03-05T21:52:40Z

Describe your changes and provide context

This PR adds test-only fault-injection hooks to the parquet receipt store so we can simulate crashes at specific points in the write pipeline and validate recovery behavior. The hooks cover the key stages of persistence: after WAL write, before parquet flush, after parquet flush, after closing writers during file rotation, and after WAL clear during rotation.
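For readers unfamiliar with the pattern, here is a minimal sketch of how test-only hook points can be threaded through a write path. It is not the code from this PR: the `HookPoint` constants, the `Hooks` map, and `writeBlock` are hypothetical names chosen for illustration, and the pipeline is reduced to two stages.

```go
package main

import (
	"errors"
	"fmt"
)

// HookPoint names a stage where a test may inject a fault (hypothetical names).
type HookPoint string

const (
	AfterWALWrite HookPoint = "after-wal-write"
	BeforeFlush   HookPoint = "before-parquet-flush"
	AfterFlush    HookPoint = "after-parquet-flush"
)

var errInjectedCrash = errors.New("injected crash")

// Hooks maps a hook point to a test-supplied callback.
// A nil map means no hooks are installed, which is the production default.
type Hooks map[HookPoint]func() error

// fire runs the callback registered for p, if any. Reading a nil map is safe in Go.
func (h Hooks) fire(p HookPoint) error {
	if fn := h[p]; fn != nil {
		return fn()
	}
	return nil
}

// writeBlock walks a simplified pipeline and reports which stages completed
// before a hook aborted it.
func writeBlock(h Hooks) (done []string, err error) {
	done = append(done, "wal-write")
	if err = h.fire(AfterWALWrite); err != nil {
		return done, err
	}
	if err = h.fire(BeforeFlush); err != nil {
		return done, err
	}
	done = append(done, "parquet-flush")
	if err = h.fire(AfterFlush); err != nil {
		return done, err
	}
	return done, nil
}

func main() {
	// Abort just before the flush: the WAL write has happened, the flush has not.
	h := Hooks{BeforeFlush: func() error { return errInjectedCrash }}
	done, err := writeBlock(h)
	fmt.Println(done, err)
}
```

With no hooks installed every `fire` call is a no-op, which is how a design like this can leave production behavior unchanged.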

It also adds a SimulateCrash() helper that intentionally abandons the store without the normal flush/finalization path, which lets the tests mimic abrupt process termination and then reopen the same store directory to verify recovery.

On top of that, this PR adds parquet receipt crash-recovery coverage that:

- verifies recovery at each hook point, including file-rotation scenarios
- runs randomized multi-crash stress tests to ensure WAL-committed blocks remain readable after reopen
- verifies concurrent readers can still read committed receipts and logs while writes are artificially slowed

The goal is to increase confidence in parquet receipt durability and crash-recovery behavior without changing normal production behavior outside of tests.
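The invariant behind the stress test (a block committed to the WAL stays readable after a crash and reopen) can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not the PR's test: `disk`, `store`, `open`, and `runStress` are hypothetical, the "disk" is plain memory, and crashes only happen between writes rather than in the middle of a flush.

```go
package main

import (
	"fmt"
	"math/rand"
)

// disk models state that survives a crash (stand-in for the WAL and parquet files).
type disk struct {
	wal     []uint64 // blocks committed to the WAL but not yet flushed
	flushed []uint64 // blocks already written to the columnar file
}

// store models the live process; buffer is lost when the process dies.
type store struct {
	d      *disk
	buffer []uint64
}

// open rebuilds the in-memory buffer by replaying the WAL.
func open(d *disk) *store {
	s := &store{d: d}
	s.buffer = append(s.buffer, d.wal...)
	return s
}

// write commits the block to the WAL first, then buffers it in memory.
func (s *store) write(b uint64) {
	s.d.wal = append(s.d.wal, b)
	s.buffer = append(s.buffer, b)
}

// flush moves buffered blocks to the flushed set and clears the WAL.
func (s *store) flush() {
	s.d.flushed = append(s.d.flushed, s.buffer...)
	s.buffer = nil
	s.d.wal = nil
}

// has reports whether block b is readable from flushed data or the buffer.
func (s *store) has(b uint64) bool {
	for _, x := range s.d.flushed {
		if x == b {
			return true
		}
	}
	for _, x := range s.buffer {
		if x == b {
			return true
		}
	}
	return false
}

// runStress writes blocks 1..blocks, randomly flushing or crashing after each
// write, and returns how many committed blocks are missing after a final reopen.
func runStress(seed int64, blocks int) int {
	rng := rand.New(rand.NewSource(seed))
	d := &disk{}
	s := open(d)
	for b := uint64(1); b <= uint64(blocks); b++ {
		s.write(b)
		switch rng.Intn(3) {
		case 0:
			s.flush()
		case 1:
			s = open(d) // crash: abandon the in-memory store, reopen from disk
		}
	}
	s = open(d)
	missing := 0
	for b := uint64(1); b <= uint64(blocks); b++ {
		if !s.has(b) {
			missing++
		}
	}
	return missing
}

func main() {
	fmt.Println("missing blocks:", runStress(42, 100))
}
```

Because every write reaches the WAL before the buffer, replaying the WAL on reopen restores exactly the blocks that had not been flushed, so no committed block should go missing in this model.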

Testing performed to validate your change

go test ./sei-db/ledger_db/receipt -run 'TestCrashRecoveryAtEachHookPoint|TestCrashRecoveryStress|TestSlowFlushWithConcurrentReads' -count=1

github-actions · 2026-03-05T21:53:39Z

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`✅ passed`	Mar 9, 2026, 1:49 PM

codecov · 2026-03-05T21:56:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.27%. Comparing base (6cb9631) to head (51fabfe).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3028      +/-   ##
==========================================
- Coverage   58.33%   58.27%   -0.06%     
==========================================
  Files        2079     2077       -2     
  Lines      171723   171262     -461     
==========================================
- Hits       100168    99810     -358     
+ Misses      62630    62563      -67     
+ Partials     8925     8889      -36

Flag	Coverage Δ
sei-chain-pr	`74.24% <100.00%> (?)`
sei-db	`70.41% <ø> (ø)`


Files with missing lines	Coverage Δ
sei-db/ledger_db/parquet/store.go	`69.66% <100.00%> (+4.98%)`	⬆️

... and 37 files with indirect coverage changes


blindchaser · 2026-03-06T18:05:43Z

sei-db/ledger_db/parquet/store.go

+// file descriptors and locks so the test process can reopen the same directory.
+func (s *Store) SimulateCrash() {
+	if s.pruneStop != nil {
+		close(s.pruneStop)


Set pruneStop to nil after the close? Otherwise Close() will close the already-closed channel a second time, which panics in Go.
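A minimal sketch of the guard being suggested, using a hypothetical `worker` type rather than the real `Store` (whose other fields and `Close()` body are not shown in this thread). The nil check is not goroutine-safe on its own; if the two methods can race, a `sync.Once` would be the usual alternative.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for a store that owns a background goroutine.
type worker struct {
	stop chan struct{} // closed to signal the background goroutine to exit
}

// stopBackground closes the stop channel at most once by clearing the field
// after the close, so a later call sees nil and does nothing.
func (w *worker) stopBackground() {
	if w.stop != nil {
		close(w.stop)
		w.stop = nil
	}
}

// SimulateCrash abandons the worker without the normal shutdown path.
func (w *worker) SimulateCrash() { w.stopBackground() }

// Close is the normal shutdown path; it must tolerate a prior SimulateCrash.
func (w *worker) Close() error {
	w.stopBackground()
	return nil
}

func main() {
	w := &worker{stop: make(chan struct{})}
	w.SimulateCrash()
	// Without the nil assignment this second close would panic.
	fmt.Println("close error:", w.Close())
}
```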

blindchaser · 2026-03-06T18:06:17Z

sei-db/ledger_db/receipt/parquet_crash_test.go

+// be recoverable via WAL replay.
+func TestCrashRecoveryStress(t *testing.T) {
+	seed := int64(42)
+	t.Logf("random seed: %d (change to reproduce a specific run)", seed)


It looks like the seed is always 42, not random?
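One common way to get both randomness and reproducibility is to take the seed from an environment variable when it is set and fall back to the clock otherwise. The sketch below assumes a hypothetical `CRASH_TEST_SEED` variable and a hypothetical `pickSeed` helper; neither appears in the PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// pickSeed returns the integer parsed from env, or fallback when env is
// empty or not a valid base-10 int64.
func pickSeed(env string, fallback int64) int64 {
	if v, err := strconv.ParseInt(env, 10, 64); err == nil {
		return v
	}
	return fallback
}

func main() {
	// CRASH_TEST_SEED is an assumed variable name, used here for illustration only.
	seed := pickSeed(os.Getenv("CRASH_TEST_SEED"), time.Now().UnixNano())
	fmt.Printf("random seed: %d (set CRASH_TEST_SEED=%d to reproduce)\n", seed, seed)
}
```

Logging the chosen seed keeps a failing run reproducible, while unset runs explore a different crash schedule each time.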


jewei1997 added 7 commits March 5, 2026 16:29

parquet crash testing

6c3d139

crash testing on node

61f949a

Fix parquet crash hook defaults

dc46d3e

parquet magic bytes fixes

112989c

cleanup

2c68b31

add crash testing unit tests

e2bee3f

Fix parquet crash test imports

e3870cb

jewei1997 added the non-app-hash-breaking label Mar 5, 2026

jewei1997 marked this pull request as ready for review March 6, 2026 12:54

jewei1997 added 2 commits March 6, 2026 07:54

Merge branch 'main' into STO-378/parquet-crash-testing

109e86a

fix lint

413f99e

blindchaser reviewed Mar 6, 2026

blindchaser approved these changes Mar 6, 2026

yzang2019 approved these changes Mar 6, 2026

Merge branch 'main' into STO-378/parquet-crash-testing

51fabfe

jewei1997 enabled auto-merge (squash) March 9, 2026 13:48

Kbhat1 approved these changes Mar 9, 2026

jewei1997 merged commit aabf783 into main Mar 9, 2026
39 checks passed

jewei1997 deleted the STO-378/parquet-crash-testing branch March 9, 2026 14:06

jewei1997 merged 10 commits into main from STO-378/parquet-crash-testing

4 participants
