Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db_bench should use a good seed when --seed is not set or set to 0 #9740

Closed
wants to merge 1 commit into from

Conversation

mdcallag
Copy link
Contributor

Summary:

This is for #9737

I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better than expected results because cache hit rates are great because
multiple invocations of db_bench used the same value for --seed or did not set it,
and then all used 0. The result is that all see the same sequence of keys.

Others have done the same. The problem is worse in that it is easy to miss and the result is a benchmark with results that are misleading.

A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).

With this change the actual seed is printed when it was 0 at process start:
Set seed to 1647992570365606 because --seed was 0

Test Plan:

Perf results:

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
readrandom : 6.469 micros/op 154583 ops/sec; 17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
readrandom : 6.565 micros/op 152321 ops/sec; 16.9 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
readrandom : 6.461 micros/op 154777 ops/sec; 17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
readrandom : 6.525 micros/op 153244 ops/sec; 17.0 MB/s (4000000 of 4000000 found)

Reviewers:

Subscribers:

Tasks:

Tags:

Summary:

This is for facebook#9737

I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better than expected results because cache hit rates are great because
multiple invocations of db_bench used the same value for --seed or did not set it,
and then all used 0. The result is that all see the same sequence of keys.

A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).

With this change the actual seed is printed when it was 0 at process start:
  Set seed to 1647992570365606 because --seed was 0

Test Plan:

Perf results:

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
  readrandom   :       6.469 micros/op 154583 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
  readrandom   :       6.565 micros/op 152321 ops/sec;   16.9 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
  readrandom   :       6.461 micros/op 154777 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
  readrandom   :       6.525 micros/op 153244 ops/sec;   17.0 MB/s (4000000 of 4000000 found)

Reviewers:

Subscribers:

Tasks:

Tags:
@@ -8097,6 +8098,15 @@ int db_bench_tool(int argc, char** argv) {
FLAGS_use_existing_db |= FLAGS_readonly;
#endif // ROCKSDB_LITE

if (!FLAGS_seed) {
uint64_t now = FLAGS_env->GetSystemClock()->NowMicros();
seed_base = static_cast<int64_t>(now);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it going to change result of performance result: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can for benchmarks that have been using it incorrectly. And by incorrect I usually mean:

  • buffered IO is used
  • database is larger than memory
  • db_bench run multiple times

In this case the first runs warm up the OS page cache, and the following tests can benefit from great cache hit rates. This is usually a mistake. Tests that want it can just do --seed=X where X is != 0 but constant for all db_bench runs.

tools/benchmark.sh and tools/regression_test.sh avoid this problem via --seed=$( date +%s )

Copy link
Contributor

@jay-zhuang jay-zhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@facebook-github-bot
Copy link
Contributor

@mdcallag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Mar 28, 2022
Summary:
These should have been part of the original PRs that changed db_bench, but I forgot to do that.
The PRs are:
* #9740
* #9733

Pull Request resolved: #9759

Test Plan: No test needed.

Reviewed By: jay-zhuang

Differential Revision: D35159553

Pulled By: mdcallag

fbshipit-source-id: b44d075527309ee0bd4c5a92e5dd94ebf72f363e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants