-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db_bench should use a good seed when --seed is not set or set to 0 #9740
Conversation
Summary: This is for facebook#9737 I have wasted more than a few hours running db_bench benchmarks where --seed was not set and getting better than expected results because cache hit rates are great because multiple invocations of db_bench used the same value for --seed or did not set it, and then all used 0. The result is that all see the same sequence of keys. A good way to avoid this is to set it to the equivalent of gettimeofday() when either --seed is not set or it is set to 0 (the default). With this change the actual seed is printed when it was 0 at process start: Set seed to 1647992570365606 because --seed was 0 Test Plan: Perf results: ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 readrandom : 6.469 micros/op 154583 ops/sec; 17.1 MB/s (4000000 of 4000000 found) ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0 readrandom : 6.565 micros/op 152321 ops/sec; 16.9 MB/s (4000000 of 4000000 found) ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1 readrandom : 6.461 micros/op 154777 ops/sec; 17.1 MB/s (4000000 of 4000000 found) ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2 readrandom : 6.525 micros/op 153244 ops/sec; 17.0 MB/s (4000000 of 4000000 found) Reviewers: Subscribers: Tasks: Tags:
@@ -8097,6 +8098,15 @@ int db_bench_tool(int argc, char** argv) { | |||
FLAGS_use_existing_db |= FLAGS_readonly; | |||
#endif // ROCKSDB_LITE | |||
|
|||
if (!FLAGS_seed) { | |||
uint64_t now = FLAGS_env->GetSystemClock()->NowMicros(); | |||
seed_base = static_cast<int64_t>(now); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it going to change result of performance result: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can for benchmarks that have been using it incorrectly. And by incorrect I usually mean:
- buffered IO is used
- database is larger than memory
- db_bench run multiple times
In this case the first runs warm up the OS page cache, and the following tests can benefit from great cache hit rates. This is usually a mistake. Tests that want it can just do --seed=X where X is != 0 but constant for all db_bench runs.
tools/benchmark.sh and tools/regression_test.sh avoid this problem via --seed=$( date +%s )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@mdcallag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: These should have been part of the original PRs that changed db_bench, but I forgot to do that. The PRs are: * #9740 * #9733 Pull Request resolved: #9759 Test Plan: No test needed. Reviewed By: jay-zhuang Differential Revision: D35159553 Pulled By: mdcallag fbshipit-source-id: b44d075527309ee0bd4c5a92e5dd94ebf72f363e
Summary:
This is for #9737
I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better than expected results because cache hit rates are great because
multiple invocations of db_bench used the same value for --seed or did not set it,
and then all used 0. The result is that all see the same sequence of keys.
Others have done the same. The problem is worse in that it is easy to miss and the result is a benchmark with results that are misleading.
A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).
With this change the actual seed is printed when it was 0 at process start:
Set seed to 1647992570365606 because --seed was 0
Test Plan:
Perf results:
./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
readrandom : 6.469 micros/op 154583 ops/sec; 17.1 MB/s (4000000 of 4000000 found)
./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
readrandom : 6.565 micros/op 152321 ops/sec; 16.9 MB/s (4000000 of 4000000 found)
./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
readrandom : 6.461 micros/op 154777 ops/sec; 17.1 MB/s (4000000 of 4000000 found)
./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
readrandom : 6.525 micros/op 153244 ops/sec; 17.0 MB/s (4000000 of 4000000 found)
Reviewers:
Subscribers:
Tasks:
Tags: