reset refitting_level_ flag to false in error paths #7403

Closed
wants to merge 11 commits into from

ramvadiv
Contributor

Reset refitting_level_ flag to false in error paths in DBImpl::ReFitLevel()
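
To make the failure mode concrete, here is a minimal sketch of the pattern this PR addresses. It is a toy model, not the RocksDB code: `ToyDB`, its string-based error reporting, and the specific error condition (refusing to move files out of level 0) are assumptions made for illustration. Only the `refitting_level_` name and the idea of clearing it on every error return come from the PR.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a DB object. Hypothetical; this is NOT DBImpl.
class ToyDB {
 public:
  // Returns an empty string on success or an error message on failure.
  std::string ReFitLevel(int level, int target_level) {
    if (refitting_level_) {
      // Only one refit may run at a time.
      return "another refit is in progress";
    }
    refitting_level_ = true;

    // Assumed error condition, chosen only to have an early return.
    if (level == 0 && target_level > 0) {
      // The fix: clear the flag before leaving through an error path.
      // Without this line, every later call would hit the check above.
      refitting_level_ = false;
      return "cannot move files out of level 0";
    }

    // ... the real work of moving files would happen here ...

    refitting_level_ = false;
    return "";
  }

  bool refitting() const { return refitting_level_; }

 private:
  bool refitting_level_ = false;
};

// Usage: a failed refit must not block the next one.
inline void DemoFailedRefitDoesNotBlock() {
  ToyDB db;
  assert(!db.ReFitLevel(0, 2).empty());  // error path is taken
  assert(!db.refitting());               // flag was cleared on the way out
  assert(db.ReFitLevel(2, 1).empty());   // a later call succeeds
}
```

If the reset inside the error branch is removed, the flag stays set after the first failure and the second call is rejected by the "in progress" check, which is the stuck state described above.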

@ajkr
Contributor

ajkr commented Sep 18, 2020

I restarted the failed jobs.

I've been trying to deflake this test for a while, so I will try fixing this failure:

[----------] Global test environment set-up.
[----------] 1 test from DBCompactionTestWithParam/DBCompactionTestWithParam
[ RUN      ] DBCompactionTestWithParam/DBCompactionTestWithParam.FlushAfterIntraL0CompactionCheckConsistencyFail/0
db/db_compaction_test.cc:5349: Failure
Expected equality of these values:
  i + 1
    Which is: 10
  NumTableFilesAtLevel(0)
    Which is: 5
[  FAILED  ] DBCompactionTestWithParam/DBCompactionTestWithParam.FlushAfterIntraL0CompactionCheckConsistencyFail/0, where GetParam() = (1, true) (202 ms)
[----------] 1 test from DBCompactionTestWithParam/DBCompactionTestWithParam (202 ms total)

This one's surprising. Maybe it is flaky:

[ RUN      ] DBOptionsTest.ChangeCompression
Failed to check Status 0x7ffc15331600
#0   ./db_options_test() [0x6bdd4f] rocksdb::port::PrintStack(int)	/home/travis/build/facebook/rocksdb/port/stack_trace.cc:121	
#1   ./db_options_test() [0x409d56] rocksdb::Status::~Status()	/home/travis/build/facebook/rocksdb/./include/rocksdb/status.h:43	
#2   ./db_options_test() [0x445613] rocksdb::Status::~Status()	??:?	
#3   ./db_options_test() [0x43c9d8] testing::AssertionResult::AssertionResult<bool>(bool const&, testing::internal::EnableIf<!testing::internal::ImplicitlyConvertible<bool, testing::AssertionResult>::value>::type*)	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest.h:19710 (discriminator 2)	
#4   ./db_options_test() [0x49db97] void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3959	
#5   ./db_options_test() [0x493f06] testing::internal::UnitTestImpl::os_stack_trace_getter()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6935	
#6   ./db_options_test() [0x494185] testing::Test::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3965	
#7   ./db_options_test() [0x494355] testing::TestInfo::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4124	
#8   ./db_options_test() [0x494835] testing::TestCase::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6685	
#9   ./db_options_test() [0x49e0d7] bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3959	
#10  ./db_options_test() [0x494b23] testing::UnitTest::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6245	
#11  ./db_options_test() [0x40d0ed] main	/home/travis/build/facebook/rocksdb/db/db_options_test.cc:938	
#12  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x1476d1eba830] ??	??:0	
#13  ./db_options_test() [0x427979] _start	??:?	
Received signal 6 (Aborted)
#0   /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x1476d1ecf428] ??	??:0	
#1   /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x1476d1ed102a] ??	??:0	
#2   ./db_options_test() [0x409d5b] rocksdb::Status::~Status() [clone .part.41]	db_options_test.cc:?	
#3   ./db_options_test() [0x445613] rocksdb::Status::~Status()	??:?	
#4   ./db_options_test() [0x43c9d8] testing::AssertionResult::AssertionResult<bool>(bool const&, testing::internal::EnableIf<!testing::internal::ImplicitlyConvertible<bool, testing::AssertionResult>::value>::type*)	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest.h:19710 (discriminator 2)	
#5   ./db_options_test() [0x49db97] void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3959	
#6   ./db_options_test() [0x493f06] testing::internal::UnitTestImpl::os_stack_trace_getter()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6935	
#7   ./db_options_test() [0x494185] testing::Test::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3965	
#8   ./db_options_test() [0x494355] testing::TestInfo::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4124	
#9   ./db_options_test() [0x494835] testing::TestCase::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6685	
#10  ./db_options_test() [0x49e0d7] bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3959	
#11  ./db_options_test() [0x494b23] testing::UnitTest::Run()	/home/travis/build/facebook/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6245	
#12  ./db_options_test() [0x40d0ed] main	/home/travis/build/facebook/rocksdb/db/db_options_test.cc:938	
#13  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x1476d1eba830] ??	??:0	
#14  ./db_options_test() [0x427979] _start	??:?	
/bin/bash: line 1: 11020 Aborted                 (core dumped) ./$t
Makefile:1011: recipe for target 'check_some' failed

For testing, I actually did not find many tests for manual compaction failure scenarios (maybe that's why there are so many bugs). Here's one I wrote recently that might be a good starting point: https://github.com/ajkr/rocksdb/blob/46cf49c896dc4a53bee44c5ba807205610125e24/db/db_compaction_test.cc#L5445-L5537. I think the key difference is that you want the L0->L1 compaction to be issued before the ReFitLevel() that does an L2->L1 refit, so that the refitting compaction fails. Then, without this PR, any subsequent refitting compaction would also fail. After this PR, they should be able to succeed.

@ramvadiv
Contributor Author

This test runs fine on my devserver but seems to fail pretty consistently on TravisCI. I will investigate more before pushing this through.

@ajkr
Contributor

ajkr commented Sep 18, 2020

BTW, it's a bit hidden under all the test output, but I just want to make sure it's noticed that the last paragraph of my comment above has a suggestion for how to test this bug fix. Admittedly, the suggestion requires the most effort per additional line covered that I can imagine, but we are really lacking in CompactRange() failure tests (I could only find that one :/ ), so in terms of important feature coverage it'd be a big step forward.

@ajkr
Contributor

ajkr commented Sep 18, 2020

Sorry for the spam. I again forgot to mention that we should add a release note in HISTORY.md, in the "Bug fixes" section under "Unreleased".

@ramvadiv
Contributor Author

Andrew - thanks a lot for the pointers/guidance on the test case. I have added a new test that exercises the ReFitLevel() error path first and then retests the healthy path. I ran it with the old ReFitLevel() code and confirmed it fails as expected, then reran it with the fix to confirm it passes.
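
The shape of such a test (fail first, then retry) can be sketched without RocksDB. The snippet below is a hypothetical toy, not the test added in this PR: `ToyRefitter`, the `reset_on_error` switch that stands in for old versus fixed code, and the `inject_failure` parameter are all assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical toy; models only the busy flag, not real compaction.
struct ToyRefitter {
  bool reset_on_error;  // false models the old code, true models the fix
  bool busy = false;

  explicit ToyRefitter(bool reset) : reset_on_error(reset) {}

  // Returns true on success. inject_failure forces the error path.
  bool Refit(bool inject_failure) {
    if (busy) return false;  // rejected: a refit appears to be running
    busy = true;
    if (inject_failure) {
      if (reset_on_error) busy = false;  // fixed behavior
      return false;                      // old code leaves busy == true
    }
    busy = false;
    return true;
  }
};

// Exercise the error path first, then the healthy path.
// Returns whether the second (healthy) refit succeeded.
inline bool HealthyRefitSucceedsAfterFailure(bool reset_on_error) {
  ToyRefitter r(reset_on_error);
  bool first = r.Refit(/*inject_failure=*/true);
  assert(!first);  // the first call is expected to fail either way
  (void)first;     // avoid an unused-variable warning when NDEBUG is set
  return r.Refit(/*inject_failure=*/false);
}
```

With `reset_on_error` set to false the second call is rejected, which mirrors running the new test against the old code; with it set to true the second call goes through.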

@ramvadiv ramvadiv force-pushed the fix_refit_level_error_path branch 3 times, most recently from 1a6cbff to 9e65dda on September 24, 2020 08:56
Contributor

@facebook-github-bot facebook-github-bot left a comment

@ramvadiv has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ramvadiv has updated the pull request. You must reimport the pull request before landing.

Contributor

@ajkr ajkr left a comment

Looks great! Just a suggestion on the release note.

HISTORY.md Outdated
CompactRangeOptions cro;
cro.change_level = true;
cro.target_level = 1;
ASSERT_NOK(dbfull()->CompactRange(cro, &begin, &end));

Nice, I didn't notice the failure can be triggered without any sync points.

@ramvadiv ramvadiv deleted the fix_refit_level_error_path branch September 28, 2020 18:55
@facebook-github-bot
Contributor

@ramvadiv merged this pull request in c203e01.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
Reset refitting_level_ flag to false in error paths in DBImpl::ReFitLevel()

Pull Request resolved: facebook#7403

Reviewed By: ajkr

Differential Revision: D23909028

Pulled By: ramvadiv

fbshipit-source-id: 521ad9aadc1b734bef9ef9119d1e1ee1fa8126e9