db_bench should use a good seed when --seed is not set or set to 0 #9740

mdcallag · 2022-03-23T16:45:02Z

Summary:

This is for #9737

I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better-than-expected results. Cache hit rates were great because
multiple invocations of db_bench either used the same value for --seed or did not set it,
in which case all of them used 0. As a result, every invocation saw the same sequence of keys.

Others have done the same. The problem is made worse by being easy to miss, and the outcome is a benchmark with misleading results.
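To make the failure mode concrete, here is a minimal sketch of why a shared seed yields a shared key sequence. It is not db_bench code: `GenerateKeys` is a hypothetical helper, and `std::mt19937_64` stands in for db_bench's own random number classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a benchmark's key generator. db_bench uses its
// own random classes; std::mt19937_64 is only an illustration here.
// Assumes num_keys > 0.
std::vector<uint64_t> GenerateKeys(uint64_t seed, std::size_t count,
                                   uint64_t num_keys) {
  std::mt19937_64 rng(seed);  // All randomness derives from the seed.
  std::vector<uint64_t> keys;
  keys.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    keys.push_back(rng() % num_keys);  // Map the random value to a key id.
  }
  return keys;
}
```

Two calls such as `GenerateKeys(0, 1000, 1000000)` return identical vectors, which is the situation two db_bench processes are in when both run with the default seed.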

A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).

With this change the actual seed is printed when it was 0 at process start:
Set seed to 1647992570365606 because --seed was 0
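The fallback described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `ResolveSeed` is a hypothetical helper, and `std::chrono::system_clock` stands in for the `FLAGS_env->GetSystemClock()->NowMicros()` call that the change itself uses.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: returns the seed to use for a given --seed value.
// A nonzero value is kept as-is; 0 (the default) is replaced by the current
// time in microseconds since the epoch, and the chosen value is printed.
int64_t ResolveSeed(int64_t flag_seed) {
  if (flag_seed != 0) {
    return flag_seed;  // The user asked for a specific, repeatable seed.
  }
  const auto since_epoch = std::chrono::system_clock::now().time_since_epoch();
  const int64_t micros =
      std::chrono::duration_cast<std::chrono::microseconds>(since_epoch)
          .count();
  std::fprintf(stdout, "Set seed to %lld because --seed was 0\n",
               static_cast<long long>(micros));
  return micros;
}
```

Printing the chosen value matters: a run that produced an interesting result can be repeated later by passing that value explicitly via --seed.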

Test Plan:

Perf results:

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
readrandom : 6.469 micros/op 154583 ops/sec; 17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
readrandom : 6.565 micros/op 152321 ops/sec; 16.9 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
readrandom : 6.461 micros/op 154777 ops/sec; 17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
readrandom : 6.525 micros/op 153244 ops/sec; 17.0 MB/s (4000000 of 4000000 found)



jay-zhuang · 2022-03-24T16:09:24Z

tools/db_bench_tool.cc

@@ -8097,6 +8098,15 @@ int db_bench_tool(int argc, char** argv) {
  FLAGS_use_existing_db |= FLAGS_readonly;
 #endif  // ROCKSDB_LITE

+  if (!FLAGS_seed) {
+    uint64_t now = FLAGS_env->GetSystemClock()->NowMicros();
+    seed_base = static_cast<int64_t>(now);


Is this going to change the results published on the Performance Benchmarks page: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks

mdcallag · 2022-03-24

It can, for benchmarks that have been using db_bench incorrectly. By incorrect I usually mean that all of the following hold:

* buffered IO is used

* the database is larger than memory

* db_bench is run multiple times

In this case the first runs warm up the OS page cache, and the following tests benefit from great cache hit rates. This is usually a mistake. Tests that want this behavior can pass --seed=X, where X is nonzero but constant across all db_bench runs.

tools/benchmark.sh and tools/regression_test.sh avoid this problem via --seed=$( date +%s )
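The warm-up effect described above can be illustrated with a toy model. This is a sketch under loud assumptions, not a model of the real OS page cache: `SecondRunHitRate` is a hypothetical function, the "cache" is simply the set of keys the first run touched (no eviction, keys rather than pages), and `std::mt19937_64` stands in for db_bench's random classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>

// Toy model: the first run "warms the cache" with every key it reads, then
// the second run reads keys using its own seed. Returns the fraction of
// second-run reads that land on keys the first run already touched.
// Assumes num_keys > 0.
double SecondRunHitRate(uint64_t first_seed, uint64_t second_seed,
                        std::size_t reads, uint64_t num_keys) {
  std::mt19937_64 first(first_seed);
  std::unordered_set<uint64_t> cached;
  for (std::size_t i = 0; i < reads; ++i) {
    cached.insert(first() % num_keys);  // First run populates the cache.
  }

  std::mt19937_64 second(second_seed);
  std::size_t hits = 0;
  for (std::size_t i = 0; i < reads; ++i) {
    if (cached.count(second() % num_keys) > 0) {
      ++hits;  // Second run read a key that is already cached.
    }
  }
  return reads == 0 ? 0.0 : static_cast<double>(hits) / reads;
}
```

With equal seeds every second-run read is a hit in this model. With different seeds and a key space much larger than the number of reads, the hit rate falls to roughly the fraction of the key space the first run touched.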

jay-zhuang

LGTM

facebook-github-bot · 2022-03-25T15:46:31Z

@mdcallag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: These should have been part of the original PRs that changed db_bench, but I forgot to do that. The PRs are:
* #9740
* #9733

Pull Request resolved: #9759

Test Plan: No test needed.

Reviewed By: jay-zhuang

Differential Revision: D35159553

Pulled By: mdcallag

fbshipit-source-id: b44d075527309ee0bd4c5a92e5dd94ebf72f363e

mdcallag requested a review from jay-zhuang March 23, 2022 16:45

facebook-github-bot added the CLA Signed label Mar 23, 2022

jay-zhuang reviewed Mar 24, 2022

jay-zhuang approved these changes Mar 25, 2022

facebook-github-bot closed this in 1a130fa Mar 25, 2022

mdcallag mentioned this pull request Mar 25, 2022

Update HISTORY for db_bench changes #9759

Closed

mdcallag mentioned this pull request Mar 31, 2022

Use a good seed when --seed is not set or set to 0 #9737

Closed
