Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: fix breakage caused by the compaction threads value change (#197) #198

Merged
merged 1 commit into from
Oct 24, 2022

Conversation

isaac-io
Copy link
Contributor

In #194 the default value for background compaction threads was changed to 8 (from 1). This caused some tests to fail, but they were fixed. Alas, the fixes seems to not be complete, as under stress some tests still fail.

Make it so the tests are truly fixed now. This can be checked by running them in a loop with the machine overloaded:

$ while ./db_compaction_test --gtest_filter=DBCompactionTestWithParam/DBCompactionTestWithParam.PartialCompactionFailure/1; do sleep 1; done

And

$ while ./deletefile_test --gtest_filter=DeleteFileTest.BackgroundPurgeCopyOptions; do sleep 1; done

@isaac-io isaac-io self-assigned this Oct 23, 2022
@isaac-io isaac-io linked an issue Oct 23, 2022 that may be closed by this pull request
Copy link
Contributor

@Yuval-Ariel Yuval-Ariel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Yuval-Ariel
Copy link
Contributor

CI failed, looks like theres another such test :
2122 : 1666534254.761 0.105 0 803 1 0 ./t/run-db_range_del_test-DBRangeDelTest.TableEvictedDuringScan >& t/log-run-db_range_del_test-DBRangeDelTest.TableEvictedDuringScan || bash -c "cat t/log-run-db_range_del_test-DBRangeDelTest.TableEvictedDuringScan; exit $?"

but this seems more related to the setting of purges as low priority. iterator deletion timing?

@isaac-io
Copy link
Contributor Author

CI failed, looks like theres another such test : 2122 : 1666534254.761 0.105 0 803 1 0 ./t/run-db_range_del_test-DBRangeDelTest.TableEvictedDuringScan >& t/log-run-db_range_del_test-DBRangeDelTest.TableEvictedDuringScan || bash -c "cat t/log-run-db_range_del_test-DBRangeDelTest.TableEvictedDuringScan; exit $?"

but this seems more related to the setting of purges as low priority. iterator deletion timing?

Doesn't seem to be the case. The iterator is not created with background purge, and in any case I don't see why it would get stuck because of purges. There's nothing blocking them. Also, this test is about snapshots, and that should not block purges (it only uses the iterator after the compaction finished, so there shouldn't be anything to purge anyway).

I'm also unable to reproduce it locally, even under stress, so this seems like a spurious error. I'll trigger the unit tests run again just to be sure.

@Yuval-Ariel
Copy link
Contributor

heres the error:
[ RUN ] DBRangeDelTest.TableEvictedDuringScan
db/db_range_del_test.cc:659: Failure
Expected: (NumTableFilesAtLevel(1)) > (1), actual: 1 vs 1
[ FAILED ] DBRangeDelTest.TableEvictedDuringScan (45 ms)
seems unrelated to this change so once the CI passes its good to go

@isaac-io
Copy link
Contributor Author

heres the error: [ RUN ] DBRangeDelTest.TableEvictedDuringScan db/db_range_del_test.cc:659: Failure Expected: (NumTableFilesAtLevel(1)) > (1), actual: 1 vs 1 [ FAILED ] DBRangeDelTest.TableEvictedDuringScan (45 ms) seems unrelated to this change so once the CI passes its good to go

Oh. That's actually probably caused by the changed in #194 as well. The test has two levels (L0 and L1) with compaction trigger and stop write trigger set to 4. It then creates 25 L0 files, which should compact to 1 or more L1 files. However, the test require there to be more than a single file in L1, and here timing matters, because it might be that the call to TEST_WaitForCompact() is delayed enough to catch a second concurrently scheduled compaction, causing it to arrive at a state where only a single file is left at L1. Nice catch! I'll push a fix shortly.

@isaac-io isaac-io force-pushed the 197-unit-tests-intermittently-failing-on-main branch 2 times, most recently from c9cdded to b0a56cd Compare October 23, 2022 17:59
@isaac-io
Copy link
Contributor Author

@isaac-io
Copy link
Contributor Author

Since this is a test-only change, I don't think that we need to wait for the full CI to pass. Thoughts, @assaf-speedb?

In #194 the default value for background compaction threads was changed
to 8 (from 1). This caused some tests to fail, but they were fixed. Alas,
the fixes seems to not be complete, as under stress some tests still fail.

Make it so the tests are truly fixed now. This can be checked by running
them in a loop with the machine overloaded:

```
$ while ./db_compaction_test --gtest_filter=DBCompactionTestWithParam/DBCompactionTestWithParam.PartialCompactionFailure/1; do sleep 1; done
```

And

```
$ while ./deletefile_test --gtest_filter=DeleteFileTest.BackgroundPurgeCopyOptions; do sleep 1; done
```
@isaac-io isaac-io force-pushed the 197-unit-tests-intermittently-failing-on-main branch from b0a56cd to 55d6894 Compare October 24, 2022 18:19
@isaac-io isaac-io merged commit a54d76c into main Oct 24, 2022
@isaac-io isaac-io deleted the 197-unit-tests-intermittently-failing-on-main branch October 24, 2022 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unit tests intermittently failing on main
2 participants