Expected behavior
The database can be opened from a checkpoint with wal_recovery_mode=kAbsoluteConsistency.
Actual behavior
Due to a few data races, the active WAL file is sometimes copied in an inconsistent state.
Opening the database from the checkpoint then fails with one of these errors when wal_recovery_mode=kAbsoluteConsistency:
Corruption: truncated record body
Corruption: error reading trailing data
Steps to reproduce the behavior
Initially I wrote this heavy and flaky test, which sometimes reproduces this issue:
I have also written more precise unit tests using sync points, which I'll include in my PR together with a suggested fix.
Conditions to reproduce are:
wal_size_for_flush is non-zero, so the WAL file gets copied during checkpoint;
while checkpoint is in progress, there are write operations happening in the background;
wal_recovery_mode = WALRecoveryMode::kAbsoluteConsistency when opening DB from the checkpoint.
This happens because the size of the active WAL file is captured at an arbitrary moment while writes are still in flight:
The truncated record body error happens when the WAL file size is captured right after a WritableFileWriter flush that was triggered because the in-memory buffer had no space left for new data, so only part of the record is on disk.
The error reading trailing data error happens when a WAL record is split into multiple physical records and the WAL file size was captured before the last fragment was written.
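To make the two failure modes concrete, here is a minimal, self-contained sketch of a fragmented log layout. It is a simplified model rather than RocksDB's actual reader or writer: the names `AppendRecord`, `ClassifyPrefix`, and `PrefixState` are hypothetical, the block size is shrunk to 32 bytes (the real WAL uses 32 KiB blocks), and checksums are left as zeros. It only illustrates how a file size captured mid-write yields a prefix that a strict reader would likely reject.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified model: fixed-size blocks; each physical record carries a 7-byte
// header (4-byte checksum, 2-byte little-endian length, 1-byte type).
constexpr std::size_t kBlockSize = 32;  // assumption: tiny block for the demo
constexpr std::size_t kHeaderSize = 7;
enum RecordType : std::uint8_t { kFull = 1, kFirst = 2, kMiddle = 3, kLast = 4 };

// Hypothetical classification of a WAL prefix of a given captured size.
enum class PrefixState { kClean, kTruncatedRecord, kMissingLastFragment };

// Appends one logical record, splitting it into physical records (fragments)
// whenever it does not fit into the remainder of the current block.
void AppendRecord(std::vector<std::uint8_t>& wal, const std::string& payload) {
  std::size_t pos = 0;
  bool first = true;
  do {
    std::size_t left_in_block = kBlockSize - wal.size() % kBlockSize;
    if (left_in_block < kHeaderSize) {
      wal.insert(wal.end(), left_in_block, 0);  // pad the unusable block tail
      left_in_block = kBlockSize;
    }
    const std::size_t avail = left_in_block - kHeaderSize;
    const std::size_t n = std::min(avail, payload.size() - pos);
    const bool last = (pos + n == payload.size());
    const RecordType type =
        first && last ? kFull : first ? kFirst : last ? kLast : kMiddle;
    wal.insert(wal.end(), 4, 0);  // checksum placeholder (not computed here)
    wal.push_back(static_cast<std::uint8_t>(n & 0xff));
    wal.push_back(static_cast<std::uint8_t>(n >> 8));
    wal.push_back(type);
    wal.insert(wal.end(), payload.begin() + pos, payload.begin() + pos + n);
    pos += n;
    first = false;
  } while (pos < payload.size());
}

// Walks the first `captured_size` bytes of `wal` the way a strict reader
// might, and reports whether the prefix ends on a logical record boundary.
PrefixState ClassifyPrefix(const std::vector<std::uint8_t>& wal,
                           std::size_t captured_size) {
  std::size_t off = 0;
  bool in_fragmented_record = false;
  while (off < captured_size) {
    const std::size_t left_in_block = kBlockSize - off % kBlockSize;
    if (left_in_block < kHeaderSize) {  // skip block-tail padding
      off += left_in_block;
      continue;
    }
    if (captured_size - off < kHeaderSize) return PrefixState::kTruncatedRecord;
    const std::size_t len = static_cast<std::size_t>(wal[off + 4]) |
                            (static_cast<std::size_t>(wal[off + 5]) << 8);
    const std::uint8_t type = wal[off + 6];
    if (captured_size - off - kHeaderSize < len) {
      return PrefixState::kTruncatedRecord;  // body cut off mid-record
    }
    in_fragmented_record = (type == kFirst || type == kMiddle);
    off += kHeaderSize + len;
  }
  // Ending after a first or middle fragment means the last one is missing.
  return in_fragmented_record ? PrefixState::kMissingLastFragment
                              : PrefixState::kClean;
}
```

In this model, after appending a 10-byte record followed by a 40-byte record, a size captured at 12 bytes cuts the first record's body (analogous to truncated record body), while a size captured at 32 bytes ends right after the first fragment of the second record (analogous to error reading trailing data). Only sizes that fall on a logical record boundary, such as 17 or 78, classify as clean.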