-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition causing double deletion of ssts #3638
Conversation
Summary: Two threads running DBImpl::FindObsoleteFiles and DBImpl::PurgeObsoleteFiles concurrently can cause double deletion of a file. Test Plan: Write a unit test called double_deletefile_test to reproduce the following race condition. - Thread A runs `DBImpl::FindObsoleteFiles` which returns the deleted file1. - Some external entity installs a new Version, perhaps due to compaction/flush finishing. - Thread B runs `DBImpl::FindObsoleteFiles` with `doing_the_full_scan=true`. It also finds file1 is eligible for deletion because it's not in the current Version and is present in the DB directory. - Thread A runs `DBImpl::PurgeObsoleteFiles` which deletes file1. - Thread B runs `DBImpl::PurgeObsoleteFiles` which also tries to delete file1, but fails. Reviewers: andrewkr Subscribers: Tasks: T26799498 Tags:
Summary: Two threads running DBImpl::FindObsoleteFiles and DBImpl::PurgeObsoleteFiles concurrently can cause double deletion of a file. Test Plan: Write a unit test called double_deletefile_test to reproduce the following race condition. - Thread A runs `DBImpl::FindObsoleteFiles` which returns the deleted file1. - Some external entity installs a new Version, perhaps due to compaction/flush finishing. - Thread B runs `DBImpl::FindObsoleteFiles` with `doing_the_full_scan=true`. It also finds file1 is eligible for deletion because it's not in the current Version and is present in the DB directory. - Thread A runs `DBImpl::PurgeObsoleteFiles` which deletes file1. - Thread B runs `DBImpl::PurgeObsoleteFiles` which also tries to delete file1, but fails. Reviewers: andrewkr Subscribers: Tasks: T26799498 Tags:
@riversand963 has updated the pull request. View: changes |
@riversand963 has updated the pull request. View: changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good, a few comments.
db/double_deletefile_test.cc
Outdated
} | ||
}; | ||
|
||
TEST_F(DoubleDeleteFileTest, RaceForObsoleteFileDeletion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we either (1) make this file named more generally (e.g., it could be for all obsolete file tests); or (2) move this test into "db/db_test2.cc"? It's because ideally we don't have a separate file dedicated to each test. If we go with (1), we can follow-up with moving other obsolete file tests into this file (there are more spread across various places).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
db/double_deletefile_test.cc
Outdated
CheckFileTypeCounts(options_.wal_dir, 1, 0, 0); | ||
|
||
SyncPoint::GetInstance()->LoadDependency({ | ||
{"DBImpl::BackgroundCallCompaction:2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of our tests do follow this naming convention. But I find tests more readable if we use self-descriptive names, like "DBImpl::BackgroundCallCompaction:FoundObsoleteFiles" for this one, which would indicate it happens right after the call to FindObsoleteFiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
db/version_set.h
Outdated
@@ -938,6 +942,8 @@ class VersionSet { | |||
std::vector<FileMetaData*> obsolete_files_; | |||
std::vector<std::string> obsolete_manifests_; | |||
|
|||
std::unordered_map<uint64_t, bool> ssts_grabbed_for_purge_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the bool
for? can this be an unordered_set
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comment in DBImpl#PurgeObsoleteFiles
above the declaration of sst_live_map
local variable:
// Now, convert live list to an unordered map, WITHOUT mutex held;
// set is slow.
The bool
has no use here, at least for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a typo from 2459f7e#diff-71a34e2e4e097b44064da7d3a2b68bd8R638.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2459f7e#diff-71a34e2e4e097b44064da7d3a2b68bd8R638 explicitly changes unordered_set
to unordered_map
for performance reason. You mean this change is a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, got it, I assumed the value of the map is used but it isn't. Siying might be able to clarify the motivation of that change.
Either way, once we have vector or autovector in JobContext and VersionStorageInfo, it won't matter as much which representation is chosen.
db/version_set.h
Outdated
@@ -858,6 +858,10 @@ class VersionSet { | |||
std::vector<std::string>* manifest_filenames, | |||
uint64_t min_pending_output); | |||
|
|||
bool IsGrabbedForPurge(uint64_t file_number) const; | |||
void MarkAsGrabbedForPurge(uint64_t file_number); | |||
void MarkAsPurged(uint64_t file_number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about how we can verify the map is properly cleaned up. One idea is assert the map is empty in DBImpl::CloseHelper
after pending_purge_obsolete_files_
reaches zero. Would it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DONE.
db/db_impl_compaction_flush.cc
Outdated
PurgeObsoleteFiles(job_context); | ||
TEST_SYNC_POINT("DBImpl::BackgroundCallCompaction:3"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sync point with the same name above is probably unneeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
BTW, you can use the "Import to Phabricator" button, which will create an internal phabricator diff and run other tests on it. |
Cool, I'll do it now. In the meantime, will look into the comments. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@riversand963 has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@riversand963 has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Only minor comments.
db/db_impl.h
Outdated
@@ -1384,6 +1388,9 @@ class DBImpl : public DB { | |||
bool HaveManualCompaction(ColumnFamilyData* cfd); | |||
bool MCOverlap(ManualCompactionState* m, ManualCompactionState* m1); | |||
|
|||
bool ShouldIPurge(uint64_t file_number) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "I" for? would ShouldPurge
be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I" stands for this thread, but I think I can remove it.
db/obsolete_files_test.cc
Outdated
SyncPoint::GetInstance()->EnableProcessing(); | ||
|
||
DBImpl* dbi = reinterpret_cast<DBImpl*>(db_); | ||
port::Thread userThread([&]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use lowercase underscore-separated variable names per google style guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which variables should follow google style guide, and which should follow Java style? For example, SyncPoint::GetInstance()->EnableProcessing()
vs. user_thread
?
db/db_impl_compaction_flush.cc
Outdated
// and db mutex (mutex_) should already be held. This function performs a | ||
// linear scan of an vector (files_grabbed_for_purge_) in search of a | ||
// certain element. We expect FindObsoleteFiles with full scan to occur once | ||
// every 10 hours, and the size of the vector is small. Therefore, the cost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe mention the 10 hours is the default value to avoid confusion, since it won't be true for everyone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@riversand963 has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Possible interleaved execution of background compaction thread calling
FindObsoleteFiles (no full scan) / PurgeObsoleteFiles
and user thread callingFindObsoleteFiles (full scan) / PurgeObsoleteFiles
can lead to race condition on which RocksDB attempts to delete a file twice. The second attempt will fail and returnIO error
. This may occur to other files, but this PR targets sst.Also add a unit test to verify that this PR fixes the issue.
The newly added unit test
obsolete_files_test
has a test case for this scenario, implemented inObsoleteFilesTest#RaceForObsoleteFileDeletion
.TestSyncPoint
s are used to coordinate the interleaving theuser_thread
and background compaction thread. They execute as followsWhen
user_thread
invokesFindObsoleteFiles
with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Thenuser_thread
will see an IO error when trying to delete these files inPurgeObsoleteFiles
because background compaction thread has already deleted the file inPurgeObsoleteFiles
.To fix this, we make RocksDB remember which (SST) files have been found by threads after calling
FindObsoleteFiles
(seeDBImpl#files_grabbed_for_purge_
). Therefore, when another thread callsFindObsoleteFiles
with full scan, it will not collect such files.This has also been reported in ISSUE 3573.
@ajkr could you take a look and comment? Thanks!