fillseq is 17% slower from force_consistency_checks (PR 7446) #9354

Closed
mdcallag opened this issue Jan 4, 2022 · 2 comments
Labels: performance (Issues related to performance that may or may not be bugs), regression

mdcallag commented Jan 4, 2022

PR 7446 (git hash ddbc5da) enables force_consistency_checks by default. With that change, fillseq throughput is ~17% lower on a fast server (many cores, fast SSD).

The 17% slowdown (700k/s vs 579k/s) is for the average throughput of a test that inserts 800M KV pairs in 20 minutes. When I look at the insert rate for the last 200 seconds of the test, the slowdown is 25% (598k/s vs 450k/s).

5f33436 is the diff before ddbc5da and 5997f6b is the diff after.

User CPU increased by ~10% with this diff (see u_cpu of 3.7 vs 3.4 in the table below). The (elapsed, user, system) times from db_bench, as measured by /usr/bin/time, are listed next; the user CPU time increases from 3375.20 to 3657.10 seconds.

1143.54 3375.20 164.56  5f33436
1382.85 3657.10 165.47  ddbc5da
1383.60 3663.79 164.34  5997f6b

Benchmark results for the diff in question and the diffs immediately before and after it:

ops_sec - operations per second
mb_sec - ops_sec * size-of-operation-in-MB
db_size - database size
c_wgb - GB written by compaction
w_amp - Write-amplification as (bytes written by compaction / bytes written by memtable flush)
c_mbps - Average write rate for compaction
c_secs - Wall clock seconds doing compaction
usec_op - Microseconds per operation
p50, p99, p99.9, p99.99 - 50th, 99th, 99.9th, 99.99th percentile response time in usecs
pmax - max response time in usecs
uptime - RocksDB uptime in seconds
stall% - Percentage of time writes are stalled
Nstall - Number of stalls
u_cpu - #seconds/1000 of user CPU
s_cpu - #seconds/1000 of system CPU
test - Name of test
date - Date/time of test
version - RocksDB version
job_id - User-provided job ID
githash - git hash for the version tested

ops_sec	mb_sec	db_size	c_wgb	w_amp	c_mbps	c_secs	usec_op	p50	p99	p99.9	p99.99	pmax	uptime	stall%	Nstall	u_cpu	s_cpu	test	date	version	job_id	githash
700223  280.5   175GB   175.4   1.0     157.51  1913.4  1.4     0.5     3       8       1211    17368   1140    1.2     279     3.4     0.2     fillseq.wal_disabled.v400       2022-01-3T18:55:30      6.13    5f33436
579102  232.0   176GB   175.6   1.0     130.25  1931.8  1.7     0.5     3       8       1973    43971   1380    0.8     224     3.7     0.2     fillseq.wal_disabled.v400       2022-01-3T19:14:54      6.13    ddbc5da
578665  231.8   175GB   175.4   1.0     130.17  1929.4  1.7     0.5     3       8       1966    49903   1380    0.7     181     3.7     0.2     fillseq.wal_disabled.v400       2022-01-3T19:38:18      6.13    5997f6b

The reproduction is:
/usr/bin/time -f '%e %U %S' -o bm.lc.nt16.cm1.d0/1146.ddbc5da/benchmark_fillseq.wal_disabled.v400.log.time numactl --interleave=all ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/m/rx --wal_dir=/data/m/rx --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --seed=1641266093

mdcallag added the bug and regression labels Jan 4, 2022
pdillinger removed the bug label Jan 5, 2022
pdillinger (Contributor) commented:

Here's a quicker reproducer:

./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -disable_wal=1 -write_buffer_size=1000000

force_consistency_checks=true (new default) -> 725360 ops/s
force_consistency_checks=false (old default) -> 783072 ops/s

I can see that, around the end of the above run, CheckConsistency is using about 7% of db_bench CPU. In Meta production, CPU for CheckConsistency is around 0.001% of RocksDB CPU cycles. That negligible production impact indicated that the cost was well worth the benefit (catching the kinds of logical errors seen before, before they corrupt data again).

I intend to soften the language "almost no cost in extra CPU" for the option, but I don't think this is a high-priority issue without something more compelling than an artificial benchmark. If I understand things correctly, an argument that fillseq is artificial is that the recommended, efficient way to populate large amounts of data is external file generation and ingestion.

Nor do I consider this a bug, because we made an intentional trade-off choice in default behavior. In this case, the old behavior is still available (though not in #9351).
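
For reference, a minimal sketch of keeping the old behavior through the C++ options API (the database path below is an arbitrary example, not taken from the benchmark setup):

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Opt back out of the extra LSM consistency checks that PR 7446 enabled by
  // default (trades away the extra corruption detection for write-path CPU).
  options.force_consistency_checks = false;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/force_cc_demo", &db);
  assert(s.ok());
  // ... load data ...
  delete db;
  return 0;
}
```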

Yes, it would be nice to fix the scalability problems of force_consistency_checks and to have a nice, simple story around benchmark performance, but I don't believe either of those is a top priority. Clever workarounds or mitigations would be nice, but the ones I've thought of would notably degrade the completeness of the check (e.g., skip the check if only adding files, as in flush, or skip the check if it already ran in the last n seconds); a sketch of the latter idea follows.
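
To illustrate only the time-based idea (this is not RocksDB code; the class and method names are made up for the sketch), a throttle could look roughly like this, with the obvious downside that anything going wrong during the skipped window stays undetected until the next full check:

```cpp
#include <chrono>

// Hypothetical helper, not part of RocksDB: run an expensive check at most
// once every `min_interval`, skipping it otherwise to save write-path CPU.
class ThrottledCheck {
 public:
  explicit ThrottledCheck(std::chrono::seconds min_interval)
      : min_interval_(min_interval) {}

  // Returns true if the expensive consistency check should run now.
  bool ShouldRun() {
    const auto now = std::chrono::steady_clock::now();
    if (now - last_run_ < min_interval_) {
      return false;  // Ran recently; skip (this is the completeness trade-off).
    }
    last_run_ = now;
    return true;
  }

 private:
  std::chrono::seconds min_interval_;
  std::chrono::steady_clock::time_point last_run_{};  // epoch => first call runs
};
```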

pdillinger added a commit to pdillinger/rocksdb that referenced this issue Jan 6, 2022
mdcallag added the performance label Jan 10, 2022
facebook-github-bot pushed a commit that referenced this issue Jan 19, 2022

Summary:
In response to #9354, this PR adds a way for users to "opt out"
of extra checks that can impact peak write performance, which
currently only includes force_consistency_checks. I considered including
some other options but did not see a db_bench performance difference.

Also clarify in comment for force_consistency_checks that it can "slow
down saturated writing."

Pull Request resolved: #9363

Test Plan:
basic coverage in unit tests

Using my perf test in #9354 comment, I see

force_consistency_checks=true -> 725360 ops/s
force_consistency_checks=false -> 783072 ops/s

Reviewed By: mrambacher

Differential Revision: D33636559

Pulled By: pdillinger

fbshipit-source-id: 25bfd006f4844675e7669b342817dd4c6a641e84

mdcallag commented Feb 1, 2022

Thank you. If it isn't there already, I will eventually add an option to db_bench so this check can optionally be disabled, making it easier for me to spot regressions across versions.
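
A hypothetical sketch of what such a db_bench flag could look like (the flag name and wiring below are assumptions for illustration, not existing db_bench code):

```cpp
#include <gflags/gflags.h>

#include "rocksdb/options.h"

// Hypothetical flag, mirroring how db_bench wires other options via gflags.
DEFINE_bool(force_consistency_checks, true,
            "Set ColumnFamilyOptions::force_consistency_checks");

// Somewhere in the benchmark's options setup:
static void ApplyConsistencyCheckFlag(rocksdb::Options* options) {
  // Passing -force_consistency_checks=false would restore the pre-PR-7446
  // default and make cross-version comparisons easier.
  options->force_consistency_checks = FLAGS_force_consistency_checks;
}
```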

Closing because I don't think there is more for you to do.

mdcallag closed this as completed Feb 1, 2022