Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition causing double deletion of ssts #3638

Closed
wants to merge 9 commits into from

Conversation

riversand963
Copy link
Contributor

@riversand963 riversand963 commented Mar 21, 2018

Possible interleaved execution of background compaction thread calling FindObsoleteFiles (no full scan) / PurgeObsoleteFiles and user thread calling FindObsoleteFiles (full scan) / PurgeObsoleteFiles can lead to race condition on which RocksDB attempts to delete a file twice. The second attempt will fail and return IO error. This may occur to other files, but this PR targets sst.
Also add a unit test to verify that this PR fixes the issue.

The newly added unit test obsolete_files_test has a test case for this scenario, implemented in ObsoleteFilesTest#RaceForObsoleteFileDeletion. TestSyncPoints are used to coordinate the interleaving the user_thread and background compaction thread. They execute as follows

timeline              user_thread                background_compaction thread
t1   |                                          FindObsoleteFiles(full_scan=false)
t2   |     FindObsoleteFiles(full_scan=true)
t3   |                                          PurgeObsoleteFiles
t4   |     PurgeObsoleteFiles
     V

When user_thread invokes FindObsoleteFiles with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Then user_thread will see an IO error when trying to delete these files in PurgeObsoleteFiles because background compaction thread has already deleted the file in PurgeObsoleteFiles.
To fix this, we make RocksDB remember which (SST) files have been found by threads after calling FindObsoleteFiles (see DBImpl#files_grabbed_for_purge_). Therefore, when another thread calls FindObsoleteFiles with full scan, it will not collect such files.

This has also been reported in ISSUE 3573.

@ajkr could you take a look and comment? Thanks!

Yanqin Jin added 4 commits March 16, 2018 15:35
Summary: Two threads running DBImpl::FindObsoleteFiles and
DBImpl::PurgeObsoleteFiles concurrently can cause double deletion of a file.

Test Plan:
Write a unit test called double_deletefile_test to reproduce the following race condition.

- Thread A runs `DBImpl::FindObsoleteFiles` which returns the deleted file1.
- Some external entity installs a new Version, perhaps due to compaction/flush finishing.
- Thread B runs `DBImpl::FindObsoleteFiles` with `doing_the_full_scan=true`. It also finds
file1 is eligible for deletion because it's not in the current Version and is present in the DB directory.
- Thread A runs `DBImpl::PurgeObsoleteFiles` which deletes file1.
- Thread B runs `DBImpl::PurgeObsoleteFiles` which also tries to delete file1, but fails.

Reviewers: andrewkr

Subscribers:

Tasks: T26799498

Tags:
Summary: Two threads running DBImpl::FindObsoleteFiles and
DBImpl::PurgeObsoleteFiles concurrently can cause double deletion of a file.

Test Plan:
Write a unit test called double_deletefile_test to reproduce the following race condition.

- Thread A runs `DBImpl::FindObsoleteFiles` which returns the deleted file1.
- Some external entity installs a new Version, perhaps due to compaction/flush finishing.
- Thread B runs `DBImpl::FindObsoleteFiles` with `doing_the_full_scan=true`. It also finds
file1 is eligible for deletion because it's not in the current Version and is present in the DB directory.
- Thread A runs `DBImpl::PurgeObsoleteFiles` which deletes file1.
- Thread B runs `DBImpl::PurgeObsoleteFiles` which also tries to delete file1, but fails.

Reviewers: andrewkr

Subscribers:

Tasks: T26799498

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. View: changes

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. View: changes

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, a few comments.

}
};

TEST_F(DoubleDeleteFileTest, RaceForObsoleteFileDeletion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we either (1) make this file named more generally (e.g., it could be for all obsolete file tests); or (2) move this test into "db/db_test2.cc"? It's because ideally we don't have a separate file dedicated to each test. If we go with (1), we can follow-up with moving other obsolete file tests into this file (there are more spread across various places).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

CheckFileTypeCounts(options_.wal_dir, 1, 0, 0);

SyncPoint::GetInstance()->LoadDependency({
{"DBImpl::BackgroundCallCompaction:2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of our tests do follow this naming convention. But I find tests more readable if we use self-descriptive names, like "DBImpl::BackgroundCallCompaction:FoundObsoleteFiles" for this one, which would indicate it happens right after the call to FindObsoleteFiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

db/version_set.h Outdated
@@ -938,6 +942,8 @@ class VersionSet {
std::vector<FileMetaData*> obsolete_files_;
std::vector<std::string> obsolete_manifests_;

std::unordered_map<uint64_t, bool> ssts_grabbed_for_purge_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the bool for? can this be an unordered_set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comment in DBImpl#PurgeObsoleteFiles above the declaration of sst_live_map local variable:

// Now, convert live list to an unordered map, WITHOUT mutex held;
// set is slow.

The bool has no use here, at least for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2459f7e#diff-71a34e2e4e097b44064da7d3a2b68bd8R638 explicitly changes unordered_set to unordered_map for performance reason. You mean this change is a typo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, got it, I assumed the value of the map is used but it isn't. Siying might be able to clarify the motivation of that change.

Either way, once we have vector or autovector in JobContext and VersionStorageInfo, it won't matter as much which representation is chosen.

db/version_set.h Outdated
@@ -858,6 +858,10 @@ class VersionSet {
std::vector<std::string>* manifest_filenames,
uint64_t min_pending_output);

bool IsGrabbedForPurge(uint64_t file_number) const;
void MarkAsGrabbedForPurge(uint64_t file_number);
void MarkAsPurged(uint64_t file_number);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about how we can verify the map is properly cleaned up. One idea is assert the map is empty in DBImpl::CloseHelper after pending_purge_obsolete_files_ reaches zero. Would it work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE.

PurgeObsoleteFiles(job_context);
TEST_SYNC_POINT("DBImpl::BackgroundCallCompaction:3");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sync point with the same name above is probably unneeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ajkr
Copy link
Contributor

ajkr commented Mar 23, 2018

BTW, you can use the "Import to Phabricator" button, which will create an internal phabricator diff and run other tests on it.

@riversand963
Copy link
Contributor Author

Cool, I'll do it now. In the meantime, will look into the comments. Thanks!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only minor comments.

db/db_impl.h Outdated
@@ -1384,6 +1388,9 @@ class DBImpl : public DB {
bool HaveManualCompaction(ColumnFamilyData* cfd);
bool MCOverlap(ManualCompactionState* m, ManualCompactionState* m1);

bool ShouldIPurge(uint64_t file_number) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "I" for? would ShouldPurge be sufficient?

Copy link
Contributor Author

@riversand963 riversand963 Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I" stands for this thread, but I think I can remove it.

SyncPoint::GetInstance()->EnableProcessing();

DBImpl* dbi = reinterpret_cast<DBImpl*>(db_);
port::Thread userThread([&]() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use lowercase underscore-separated variable names per google style guide

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which variables should follow google style guide, and which should follow Java style? For example, SyncPoint::GetInstance()->EnableProcessing() vs. user_thread?

// and db mutex (mutex_) should already be held. This function performs a
// linear scan of an vector (files_grabbed_for_purge_) in search of a
// certain element. We expect FindObsoleteFiles with full scan to occur once
// every 10 hours, and the size of the vector is small. Therefore, the cost
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mention the 10 hours is the default value to avoid confusion, since it won't be true for everyone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants