fillseq is 17% slower from force_consistency_checks (PR 7446) #9354

Closed
mdcallag opened this issue Jan 4, 2022 · 2 comments
Labels: performance (Issues related to performance that may or may not be bugs), regression

mdcallag commented Jan 4, 2022

PR 7446 (git hash ddbc5da) enables force_consistency_checks by default. With that change, fillseq throughput is ~17% lower on a fast server (many cores, fast SSD).

The 17% slowdown (700k/s vs 579k/s) is for the average throughput of a test that inserts 800M KV pairs in 20 minutes. When I look at the insert rate for the last 200 seconds of the test, the slowdown is 25% (598k/s vs 450k/s).

5f33436 is the diff before ddbc5da and 5997f6b is the diff after.

User CPU increased by ~10% with this diff (see u_cpu of 3.7 vs 3.4 in the table below). The (elapsed, user, system) times from db_bench, as measured by /usr/bin/time, are listed next; the user CPU time increases from 3375.20 to 3657.10 seconds.

1143.54 3375.20 164.56  5f33436
1382.85 3657.10 165.47  ddbc5da
1383.60 3663.79 164.34  5997f6b

Benchmark results for the diff in question and the diffs immediately before and after it:

ops_sec - operations per second
mb_sec - ops_sec * size-of-operation-in-MB
db_size - database size
c_wgb - GB written by compaction
w_amp - Write-amplification as (bytes written by compaction / bytes written by memtable flush)
c_mbps - Average write rate for compaction
c_secs - Wall clock seconds doing compaction
usec_op - Microseconds per operation
p50, p99, p99.9, p99.99 - 50th, 99th, 99.9th, 99.99th percentile response time in usecs
pmax - max response time in usecs
uptime - RocksDB uptime in seconds
stall% - Percentage of time writes are stalled
Nstall - Number of stalls
u_cpu - #seconds/1000 of user CPU
s_cpu - #seconds/1000 of system CPU
test - Name of test
date - Date/time of test
version - RocksDB version
job_id - User-provided job ID
githash - git hash for the version tested

ops_sec	mb_sec	db_size	c_wgb	w_amp	c_mbps	c_secs	usec_op	p50	p99	p99.9	p99.99	pmax	uptime	stall%	Nstall	u_cpu	s_cpu	test	date	version	job_id	githash
700223  280.5   175GB   175.4   1.0     157.51  1913.4  1.4     0.5     3       8       1211    17368   1140    1.2     279     3.4     0.2     fillseq.wal_disabled.v400       2022-01-3T18:55:30      6.13    5f33436
579102  232.0   176GB   175.6   1.0     130.25  1931.8  1.7     0.5     3       8       1973    43971   1380    0.8     224     3.7     0.2     fillseq.wal_disabled.v400       2022-01-3T19:14:54      6.13    ddbc5da
578665  231.8   175GB   175.4   1.0     130.17  1929.4  1.7     0.5     3       8       1966    49903   1380    0.7     181     3.7     0.2     fillseq.wal_disabled.v400       2022-01-3T19:38:18      6.13    5997f6b

The reproduction is:
/usr/bin/time -f '%e %U %S' -o bm.lc.nt16.cm1.d0/1146.ddbc5da/benchmark_fillseq.wal_disabled.v400.log.time numactl --interleave=all ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/m/rx --wal_dir=/data/m/rx --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --seed=1641266093

mdcallag added the bug and regression labels Jan 4, 2022
pdillinger removed the bug label Jan 5, 2022
pdillinger (Contributor) commented:

Here's a quicker reproducer:

./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -disable_wal=1 -write_buffer_size=1000000

force_consistency_checks=true (new default) -> 725360 ops/s
force_consistency_checks=false (old default) -> 783072 ops/s

I can see that, around the end of the above run, CheckConsistency is using about 7% of db_bench CPU. In Meta production, CPU for CheckConsistency is around 0.001% of RocksDB CPU cycles. That negligible production impact indicated that the cost was well worth the benefit (catching the kinds of logical errors seen before, before they corrupt data again).

I intend to soften the language "almost no cost in extra CPU" for the option, but I don't think this is a high-priority issue without something more compelling than an artificial benchmark. If I understand things correctly, an argument that fillseq is artificial is that the recommended, efficient way to populate large amounts of data is external file generation and ingestion.

Nor do I consider this a bug, because we made an intentional trade-off choice in default behavior. In this case, the old behavior is still available (though not in #9351).
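
For reference, a minimal sketch of keeping the old behavior through the C++ options API (the database path below is an arbitrary example, not taken from the benchmark setup):

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Opt back out of the extra LSM consistency checks that PR 7446 enabled by
  // default (trades away the extra corruption detection for write-path CPU).
  options.force_consistency_checks = false;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/force_cc_demo", &db);
  assert(s.ok());
  // ... load data ...
  delete db;
  return 0;
}
```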

Yes, it would be nice to fix the scalability problems of force_consistency_checks and to have a nice, simple story around benchmark performance, but I don't believe either of those is a top priority. Clever workarounds or mitigations would be nice, but the ones I've thought of would notably degrade the completeness of the check (e.g., skip the check if only adding files, as in flush, or skip the check if it already ran in the last n seconds); a sketch of the latter idea follows.
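
To illustrate only the time-based idea (this is not RocksDB code; the class and method names are made up for the sketch), a throttle could look roughly like this, with the obvious downside that anything going wrong during the skipped window stays undetected until the next full check:

```cpp
#include <chrono>

// Hypothetical helper, not part of RocksDB: run an expensive check at most
// once every `min_interval`, skipping it otherwise to save write-path CPU.
class ThrottledCheck {
 public:
  explicit ThrottledCheck(std::chrono::seconds min_interval)
      : min_interval_(min_interval) {}

  // Returns true if the expensive consistency check should run now.
  bool ShouldRun() {
    const auto now = std::chrono::steady_clock::now();
    if (now - last_run_ < min_interval_) {
      return false;  // Ran recently; skip (this is the completeness trade-off).
    }
    last_run_ = now;
    return true;
  }

 private:
  std::chrono::seconds min_interval_;
  std::chrono::steady_clock::time_point last_run_{};  // epoch => first call runs
};
```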

pdillinger added a commit to pdillinger/rocksdb that referenced this issue Jan 6, 2022
mdcallag added the performance label Jan 10, 2022
facebook-github-bot pushed a commit that referenced this issue Jan 19, 2022

Summary:
In response to #9354, this PR adds a way for users to "opt out"
of extra checks that can impact peak write performance, which
currently only includes force_consistency_checks. I considered including
some other options but did not see a db_bench performance difference.

Also clarify in comment for force_consistency_checks that it can "slow
down saturated writing."

Pull Request resolved: #9363

Test Plan:
basic coverage in unit tests

Using my perf test in #9354 comment, I see

force_consistency_checks=true -> 725360 ops/s
force_consistency_checks=false -> 783072 ops/s

Reviewed By: mrambacher

Differential Revision: D33636559

Pulled By: pdillinger

fbshipit-source-id: 25bfd006f4844675e7669b342817dd4c6a641e84

mdcallag commented Feb 1, 2022

Thank you. If it isn't there already, I will eventually add an option to db_bench so this check can optionally be disabled, making it easier for me to spot regressions across versions.
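
A hypothetical sketch of what such a db_bench flag could look like (the flag name and wiring below are assumptions for illustration, not existing db_bench code):

```cpp
#include <gflags/gflags.h>

#include "rocksdb/options.h"

// Hypothetical flag, mirroring how db_bench wires other options via gflags.
DEFINE_bool(force_consistency_checks, true,
            "Set ColumnFamilyOptions::force_consistency_checks");

// Somewhere in the benchmark's options setup:
static void ApplyConsistencyCheckFlag(rocksdb::Options* options) {
  // Passing -force_consistency_checks=false would restore the pre-PR-7446
  // default and make cross-version comparisons easier.
  options->force_consistency_checks = FLAGS_force_consistency_checks;
}
```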

Closing because I don't think there is more for you to do.

mdcallag closed this as completed Feb 1, 2022