Compaction causes data inconsistency when using snapshots #320
When using snapshots, background compaction causes data inconsistency. Existing records can go missing, and records or values that previously existed can reappear. A test program that reproduces the issue: https://gist.github.com/specialforest/fa136b660bdd672e3d4b
The error occurs when leveldb runs a compaction while several records with the same key but different sequence numbers exist, which can happen when snapshots are in play. The compaction algorithm can choose a file such that the latest record is pushed from level i to level i+1; a subsequent get then retrieves the older record from level i, which is incorrect. I see two ways to fix this: (i) always compact a file set that is closed with respect to records with the same key, e.g. if files f1 and f2 both have records with key k1, compact f1 and f2 together; or (ii) when considering f1 for compaction, check that no older record of any key k in f1 exists at level i. I believe it is sufficient to check the boundary elements of f1. If the criterion isn't met, try to compact the previous file instead (I assume you sort on key and sequence number, with increasing sequence numbers appearing later in the sequence of files at a level).
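To make the failure mode and fix (i) concrete, here is a toy model in Python. It is not leveldb code: `TableFile`, `get`, `compact`, and `close_over_user_keys` are hypothetical names invented for this sketch, and the model makes two simplifying assumptions. A get returns the newest record from the shallowest level that contains the key, and a compaction simply moves files down one level without dropping any records (as if a snapshot pinned every old version).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (user_key, sequence, value); a larger sequence number means a newer write.
Record = Tuple[str, int, str]


@dataclass
class TableFile:  # hypothetical stand-in for a table file at one level
    name: str
    records: List[Record]


def get(levels: List[List[TableFile]], key: str) -> Optional[str]:
    """Search levels top-down; the first level holding the key wins."""
    for files in levels:
        hits = [r for f in files for r in f.records if r[0] == key]
        if hits:
            return max(hits, key=lambda r: r[1])[2]  # newest in that level
    return None


def compact(levels: List[List[TableFile]], level: int, chosen: List[TableFile]) -> None:
    """Move the chosen files from `level` to `level + 1`, keeping all records."""
    names = {f.name for f in chosen}
    levels[level] = [f for f in levels[level] if f.name not in names]
    levels[level + 1] = levels[level + 1] + list(chosen)


def close_over_user_keys(level_files: List[TableFile], chosen: List[TableFile]) -> List[TableFile]:
    """Fix (i): grow `chosen` until no file left behind shares a user key with it."""
    chosen = list(chosen)
    changed = True
    while changed:
        changed = False
        keys = {r[0] for f in chosen for r in f.records}
        for f in level_files:
            if f not in chosen and any(r[0] in keys for r in f.records):
                chosen.append(f)
                changed = True
    return chosen


def build_levels():
    # Index 0 plays the role of level i, index 1 the role of level i+1.
    # Both versions of k1 survive because a snapshot still needs ("k1", 5).
    f1 = TableFile("f1", [("a", 9, "a-val"), ("k1", 10, "new")])
    f2 = TableFile("f2", [("k1", 5, "old"), ("z", 7, "z-val")])
    return [[f1, f2], []], f1


# Compacting f1 alone leaves the older k1 record in the shallower level.
levels, f1 = build_levels()
before = get(levels, "k1")
compact(levels, 0, [f1])
stale = get(levels, "k1")

# With fix (i), f2 is pulled into the same compaction because it shares k1.
levels, f1 = build_levels()
inputs = close_over_user_keys(levels[0], [f1])
compact(levels, 0, inputs)
fixed = get(levels, "k1")

print(before, stale, fixed)
```

In this model the first compaction makes the get return the stale value, while the closed file set keeps the newest record visible. Fix (ii) would instead reject f1 as a compaction input once it sees that f2 starts with the same user key as f1 ends with.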