Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db_stress begin tracking expected state after verification #9470

Closed
wants to merge 2 commits into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Jan 31, 2022

Previously we enabled tracking expected state changes during
FinishInitDb(), as soon as the DB was opened. This meant tracing was
enabled during the VerifyDb() in the setup phase (pre-OperateDb()). This cost extra CPU by requiring
DBImpl::trace_mutex_ to be acquired on each read operation. It was
unnecessary since we know there are no expected state changes during the
setup phase. So, this PR delays tracking expected state changes
until the end of the setup phase.

Test Plan:

Measured this PR reduced the VerifyDb() time in the setup phase 76% (387 -> 92 seconds) with
-disable_wal=1 (i.e., expected state tracking enabled).

  • benchmark command: ./db_stress -max_key=100000000 -ops_per_thread=1 -destroy_db_initially=1 -expected_values_dir=/dev/shm/dbstress_expected/ -db=/dev/shm/dbstress/ --clear_column_family_one_in=0 --disable_wal=1 --reopen=0
  • without this PR, VerifyDb() in the setup phase takes 387 seconds:
2022/01/30-21:43:04  Initializing worker threads
Crash-recovery verification passed :)
2022/01/30-21:49:31  Starting database operations
  • with this PR, VerifyDb() in the setup phase takes 92 seconds
2022/01/30-21:59:06  Initializing worker threads
Crash-recovery verification passed :)
2022/01/30-22:00:38  Starting database operations

Previously we enabled tracking expected state changes during
`FinishInitDb()`, as soon as the DB was opened. This meant tracing was
enabled during `VerifyDb()`. This cost extra CPU by requiring
`DBImpl::trace_mutex_` to be acquired on each read operation. It was
unnecessary since we know there are no expected state changes during the
`VerifyDb()` phase. So, this PR delays tracking expected state changes
until after the `VerifyDb()` phase has completed.

Test Plan:

Measured this PR reduced `VerifyDb()` 76% (387 -> 92 seconds) with
`-disable_wal=1` (i.e., expected state tracking enabled).

- benchmark command: `./db_stress -max_key=100000000 -ops_per_thread=1 -destroy_db_initially=1 -expected_values_dir=/dev/shm/dbstress_expected/ -db=/dev/shm/dbstress/ --clear_column_family_one_in=0 --disable_wal=1 --reopen=0`
- without this PR, `VerifyDb()` takes 387 seconds:

```
2022/01/30-21:43:04  Initializing worker threads
Crash-recovery verification passed :)
2022/01/30-21:49:31  Starting database operations
```

- with this PR, `VerifyDb()` takes 92 seconds

```
2022/01/30-21:59:06  Initializing worker threads
Crash-recovery verification passed :)
2022/01/30-22:00:38  Starting database operations
```
@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@riversand963
Copy link
Contributor

riversand963 commented Jan 31, 2022

There is an option verify_db_one_in which allows the worker threads to perform verification while still executing OperateDb(). Should this be checked?

UPDATE:
there is also another option continuous_verification_interval which starts another thread doing verification.

@ajkr
Copy link
Contributor Author

ajkr commented Jan 31, 2022

Sorry let me update the description. I only meant to make the VerifyDb() in the setup phase faster as that one frequently takes longer than the whole crash interval (two minutes in our CI's blackbox mode), causing entire db_stress runs to not mutate the DB at all.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ajkr
Copy link
Contributor Author

ajkr commented Jan 31, 2022

Thanks for the review!

ajkr added a commit to ajkr/rocksdb that referenced this pull request Jan 31, 2022
Despite attempts to optimize `db_stress` setup phase (i.e.,
pre-`OperateDb()`) latency in facebook#9470 and facebook#9475, it still always took tens
of seconds. Since we still aren't able to setup a 100M key `db_stress`
efficiently, we should reduce the number of keys. This PR reduces it 4x
while increasing `value_size_mult` 4x (from its default value of 8) so
that memtables and SST files fill up similarly quickly.

Also disabled bzip2 compression since we'll probably never use it and
I noticed many CI runs spending majority of CPU on bzip2 decompression.
facebook-github-bot pushed a commit that referenced this pull request Jan 31, 2022
Summary:
Despite attempts to optimize `db_stress` setup phase (i.e.,
pre-`OperateDb()`) latency in #9470 and #9475, it still always took tens
of seconds. Since we still aren't able to setup a 100M key `db_stress`
quickly, we should reduce the number of keys. This PR reduces it 4x
while increasing `value_size_mult` 4x (from its default value of 8) so
that memtables and SST files fill at a similar rate compared to before this PR.

Also disabled bzip2 compression since we'll probably never use it and
I noticed many CI runs spending majority of CPU on bzip2 decompression.

Pull Request resolved: #9476

Reviewed By: siying

Differential Revision: D33898520

Pulled By: ajkr

fbshipit-source-id: 855021784ad9664f2be5bce21f0339a1cf93230d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants