-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deflake DeleteSchedulerTest.StartBGEmptyTrashMultipleTimes #9706
Conversation
The designed sync point may not be hit if trash file is generated faster than deleting. Then the file will be deleted directly instead of waiting for background trash empty thread to do it. Increase SstFileManager Trash/DB ratio to avoid that. Test Plan: `gtest-parallel ./delete_scheduler_test --gtest_filter=DeleteSchedulerTest.StartBGEmptyTrashMultipleTimes -r 10000 -w 100` It was likely to happen on one of the host.
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
file/delete_scheduler_test.cc
Outdated
// If trash file is generated faster than deleting, delete_scheduler will | ||
// delete it directly instead of waiting for background trash empty thread to | ||
// clean it. Set the ratio higher to avoid that. | ||
sst_file_mgr_->SetMaxTrashDBRatio(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like the chosen ratio 10
depends on the number of iterations 5
? If there are more iterations, we will be generating more trash. Suppose trash empty thread lags behind, then it will lag behind even more with more iterations. Therefore, we need to increase the trash/db ratio even more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, we should use a const variable for that.
It should only depend on the dummy files number inside of each run, which is 10:
rocksdb/file/delete_scheduler_test.cc
Lines 455 to 456 in 5cdc390
// Generate 10 dummy files and move them to trash | |
for (int i = 0; i < 10; i++) { |
Each run should be indecent, because it
delete_scheduler_->WaitForEmptyTrash();
(and the test assumes all deleted file should go to trash).
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
// Move files to trash, wait for empty trash, start again | ||
for (int run = 1; run <= 5; run++) { | ||
// Generate 10 dummy files and move them to trash | ||
for (int i = 0; i < 10; i++) { | ||
for (int i = 0; i < kTestFileNum; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor: maybe the comment in line 456 should also be updated to
// Generate kTestFileNum dummy files
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
The designed sync point may not be hit if trash file is generated faster
than deleting. Then the file will be deleted directly instead of waiting
for background trash empty thread to do it.
Increase SstFileManager Trash/DB ratio to avoid that.
Test Plan:
gtest-parallel ./delete_scheduler_test --gtest_filter=DeleteSchedulerTest.StartBGEmptyTrashMultipleTimes -r 10000 -w 100
It was likely to happen on one of the host.