Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a good seed when --seed is not set or set to 0 #9737

Closed
mdcallag opened this issue Mar 22, 2022 · 3 comments
Closed

Use a good seed when --seed is not set or set to 0 #9737

mdcallag opened this issue Mar 22, 2022 · 3 comments
Assignees

Comments

@mdcallag
Copy link
Contributor

I have wasted more than a few hours running db_bench benchmarks where --seed was not set and getting better than expected results because cache hit rates are great because multiple invocations of db_bench used the same value for --seed or did not set it, and then all used 0. The result is that all see the same sequence of keys.

A good way to avoid this is to set it to the equivalent of gettimeofday() when either --seed is not set or it is set to 0 (the default).

@mdcallag mdcallag self-assigned this Mar 22, 2022
mdcallag added a commit to mdcallag/rocksdb-1 that referenced this issue Mar 23, 2022
Summary:

This is for facebook#9737

I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better than expected results because cache hit rates are great because
multiple invocations of db_bench used the same value for --seed or did not set it,
and then all used 0. The result is that all see the same sequence of keys.

A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).

With this change the actual seed is printed when it was 0 at process start:
  Set seed to 1647992570365606 because --seed was 0

Test Plan:

Perf results:

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
  readrandom   :       6.469 micros/op 154583 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
  readrandom   :       6.565 micros/op 152321 ops/sec;   16.9 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
  readrandom   :       6.461 micros/op 154777 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
  readrandom   :       6.525 micros/op 153244 ops/sec;   17.0 MB/s (4000000 of 4000000 found)

Reviewers:

Subscribers:

Tasks:

Tags:
facebook-github-bot pushed a commit that referenced this issue Mar 25, 2022
…9740)

Summary:
This is for #9737

I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better than expected results because cache hit rates are great because
multiple invocations of db_bench used the same value for --seed or did not set it,
and then all used 0. The result is that all see the same sequence of keys.

Others have done the same. The problem is worse in that it is easy to miss and the result is a benchmark with results that are misleading.

A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).

With this change the actual seed is printed when it was 0 at process start:
  Set seed to 1647992570365606 because --seed was 0

Pull Request resolved: #9740

Test Plan:
Perf results:

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
  readrandom   :       6.469 micros/op 154583 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
  readrandom   :       6.565 micros/op 152321 ops/sec;   16.9 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
  readrandom   :       6.461 micros/op 154777 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
  readrandom   :       6.525 micros/op 153244 ops/sec;   17.0 MB/s (4000000 of 4000000 found)

Reviewed By: jay-zhuang

Differential Revision: D35145361

Pulled By: mdcallag

fbshipit-source-id: 2b35b153ccec46b27d7c9405997523555fc51267
@ajkr
Copy link
Contributor

ajkr commented Mar 30, 2022

Is it fixed so can be closed?

@mdcallag
Copy link
Contributor Author

Fixed by #9740

@mdcallag
Copy link
Contributor Author

Is it fixed so can be closed?

yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants