Import range query improvements #13

Yuval-Ariel · 2022-06-15T13:30:15Z

Why:
Seek is a commonly used operation in RocksDB, so improving it can significantly improve the overall performance of the application.
The goal is to improve short seek performance when there is a large number of levels that don’t overlap much.

What:
Improve the creation of range filters so that they hold fewer irrelevant items, which reduces seek times.

ayulas · 2022-06-26T12:33:18Z

To use the spdb range query builder, set use_spdb_query_builder to true.

erez-speedb · 2022-07-24T08:40:04Z

No improvements, no degradation.

Short test
https://admin.speedb.io/performance?items=Z11dnyIhlJtJqGjvsfjS&items=eqd36PbjSouloFRQQM2e&colors=%23F06292&colors=%2300796b
Small obj
https://admin.speedb.io/performance?items=v2QiIvfovkv3t6sogZyc&items=efkS8brnN9uNii7vjhTA&colors=%23F06292&colors=%2300796b
Large Obj
https://admin.speedb.io/performance?items=NT67puzyBrSIIBk4MhIp&items=8XHUSwkEKQRY3slFXvWo&colors=%23F06292&colors=%2300796b

isaac-io · 2022-08-02T13:17:43Z

We need a db_bench scenario to showcase the best case where this might show an improvement (e.g. with universal compaction).

isaac-io · 2022-08-16T12:50:25Z

Currently some of the unit tests are failing. We want to show a scenario with improvement first before we commit to fixing them.

Yuval-Ariel · 2022-09-04T07:52:21Z

The test plan is as follows:

Compare Import range query improvements #13 (df98b70) with v7.2.2 (f2f26b1).
Use an IO-bound workload so that additional seeks show up in the ops/sec.
Combine seeks with writes so that the number of levels (internal iterators) is not constant. (The problem with running seeks without writes is that one run could have 10 internal iterators while the other has 11 throughout the test, which would bias the results.)
Use a setup with many sorted runs where the target seek key is sometimes out of range, since the purpose of the optimization is to save unneeded seeks when the target key is not in the range of the internal iterator, e.g. universal compaction and seeking to 10X the num used in the fillup.
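To make the last point concrete, here is a minimal Python sketch of the idea behind the optimization. It is a toy model, not the Speedb or RocksDB merging iterator: `SortedRun`, `merged_seek`, and the `skip_out_of_range` switch are hypothetical names introduced only for illustration. Each sorted run knows its largest key, and a seek whose target lies past that key can skip the run entirely.

```python
from bisect import bisect_left


class SortedRun:
    """Toy stand-in for one internal (child) iterator over a sorted run."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumed non-empty
        self.seeks = 0            # how many times this run was actually searched

    @property
    def largest(self):
        return self.keys[-1]

    def seek(self, target):
        """Return the first key >= target, or None if there is none."""
        self.seeks += 1
        i = bisect_left(self.keys, target)
        return self.keys[i] if i < len(self.keys) else None


def merged_seek(runs, target, skip_out_of_range):
    """Return the smallest key >= target across all runs."""
    best = None
    for run in runs:
        if skip_out_of_range and target > run.largest:
            continue  # no key in this run can satisfy the seek
        key = run.seek(target)
        if key is not None and (best is None or key < best):
            best = key
    return best


# Three runs; only the last one can contain keys >= 500.
runs = [SortedRun(range(0, 100)), SortedRun(range(50, 150)), SortedRun(range(900, 1000))]
print(merged_seek(runs, 500, skip_out_of_range=True))   # 900
print(sum(r.seeks for r in runs))                       # 1 child seek performed

baseline = [SortedRun(range(0, 100)), SortedRun(range(50, 150)), SortedRun(range(900, 1000))]
print(merged_seek(baseline, 500, skip_out_of_range=False))  # 900
print(sum(r.seeks for r in baseline))                       # 3 child seeks performed
```

Both variants return the same key. The difference is only in how many child seeks are performed, which is why the benefit should be visible only when many runs are out of range for the target.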

fillup cmd:

./db_bench --benchmarks=fillseq --level0_file_num_compaction_trigger=8 --compression_type=None --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_flushes=4 --max_background_compactions=12 --max_write_buffer_number=4 --db=/data/ --num=150000000 --num_levels=40 --key_size=50 --value_size=1000 --block_size=8192 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --memtablerep=skip_list --open_files=-1 --compaction_style=1 --universal_compression_size_percent=80 --universal_min_merge_width=2 --universal_max_merge_width=20 --universal_size_ratio=1 --universal_max_size_amplification_percent=200 --universal_allow_trivial_move=1 --universal_compression_size_percent=-1 --sync=0 --threads=1 --seed=1649982947

1st seek cmd:

./db_bench --benchmarks=seekrandomwhilewriting --compression_type=None --cache_size=10737418240 --delayed_write_rate=536870912 --level0_file_num_compaction_trigger=8 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_flushes=4 --max_background_compactions=1 --max_write_buffer_number=4 --db=/data/ --num=1500000000 --num_levels=40 --key_size=50 --value_size=1000 --block_size=8192 --write_buffer_size=16777216 --target_file_size_base=16777216 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --report_interval_seconds=5 --memtablerep=skip_list --open_files=-1 --compaction_style=1 --universal_compression_size_percent=80 --universal_min_merge_width=2 --universal_max_merge_width=20 --universal_size_ratio=1 --universal_max_size_amplification_percent=200 --universal_allow_trivial_move=1 --universal_compression_size_percent=-1 --use_existing_db=1 --sync=0 --threads=10 --seed=1649982947 -duration=900
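As a rough sanity check on the IO-bound goal, the Python sketch below compares the raw size of the fillup data with the block cache size from the seek command. It assumes the on-disk size is approximately num * (key_size + value_size) with no compression, and it ignores per-entry and per-block overhead as well as the data written during the seek run, so it is only an estimate. The function names are hypothetical.

```python
NUM_KEYS = 150_000_000        # --num in the fillup command
KEY_SIZE = 50                 # --key_size
VALUE_SIZE = 1000             # --value_size
CACHE_SIZE = 10_737_418_240   # --cache_size in the seek command (10 GiB)


def raw_data_bytes(num_keys, key_size, value_size):
    """Approximate uncompressed payload size, ignoring format overhead."""
    return num_keys * (key_size + value_size)


def cache_fraction(cache_bytes, data_bytes):
    """Upper bound on the share of the raw data that fits in the block cache."""
    return cache_bytes / data_bytes


data = raw_data_bytes(NUM_KEYS, KEY_SIZE, VALUE_SIZE)
print(f"raw data: {data / 2**30:.1f} GiB")
print(f"cache covers at most {cache_fraction(CACHE_SIZE, data):.1%} of the raw data")
```

Under these assumptions the fillup writes roughly 147 GiB of raw data and the cache can hold about 7% of it, so most random seeks should miss the cache.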

results:

1st

13:___ 250.735 micros/op 39880 ops/sec; (375342 of 3604999 found)
base:_ 239.131 micros/op 39492 ops/sec; (389445 of 3699999 found)

2nd

same as 1st seek cmd with modifications --threads=2 -duration=1800
13:___ 222.335 micros/op 8995 ops/sec; (1062929 of 8111999 found)
base:_ 227.083 micros/op 8806 ops/sec; (1030695 of 7948999 found)

3rd

same as 1st seek cmd with modifications --num=4500000000 (30X fillup num) -open_files=27000
(open_files was set because the run crashed with too many open files; the limit is 30000)
13:___ 301.155 micros/op 33202 ops/sec; (168579 of 2998999 found)
base:_ 276.143 micros/op 36209 ops/sec; (178650 of 3246999 found)
8% worse

4th

repetition of 3rd
13:___ 429.774 micros/op 22918 ops/sec; (121471 of 2080999 found)
base:_ 341.864 micros/op 29246 ops/sec; (153157 of 2634999 found)
21% worse

5th

another rep of 3rd
13:___ 321.832 micros/op 31068 ops/sec; (162358 of 2775999 found)
base:_ 671.295 micros/op 10180 ops/sec; (78182 of 1340999 found)
200% better

6th

another rep but 1hr duration
13:___ 480.577 micros/op 20806 ops/sec; (469218 of 7483999 found)
base:_ 449.047 micros/op 20678 ops/sec; (504404 of 8018999 found)

7th

another rep but 10hr duration with --perf_level=5 --statistics=1
13:___ 1154.879 micros/op 8658 ops/sec; (3330073 of 31190999 found)
base:_ 1299.046 micros/op 7244 ops/sec; (2912547 of 27711999 found)
20% better
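The percentages quoted above can be reproduced from the db_bench summary lines. The sketch below parses one result line in the format shown in this comment and computes the relative change in ops/sec of #13 against base. `parse_result` and `relative_change` are hypothetical helper names, and the regular expression assumes exactly the line layout used here.

```python
import re

# Matches e.g. "301.155 micros/op 33202 ops/sec; (168579 of 2998999 found)"
LINE_RE = re.compile(
    r"(?P<micros>[\d.]+) micros/op (?P<ops>\d+) ops/sec; "
    r"\((?P<found>\d+) of (?P<total>\d+) found\)"
)


def parse_result(line):
    """Extract the numeric fields from one db_bench summary line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    return {
        "micros_per_op": float(m.group("micros")),
        "ops_per_sec": int(m.group("ops")),
        "found": int(m.group("found")),
        "total": int(m.group("total")),
    }


def relative_change(candidate_ops, base_ops):
    """Percent change of candidate vs. base; positive means candidate is faster."""
    return (candidate_ops - base_ops) / base_ops * 100.0


# Lines copied from the 3rd run above.
cand = parse_result("13:___ 301.155 micros/op 33202 ops/sec; (168579 of 2998999 found)")
base = parse_result("base:_ 276.143 micros/op 36209 ops/sec; (178650 of 3246999 found)")
print(f"{relative_change(cand['ops_per_sec'], base['ops_per_sec']):+.1f}%")  # -8.3%
```

Applying the same calculation to every pair makes the spread easier to compare, since the 1st, 2nd and 6th runs have no percentage attached in the text.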

perf results:

13 performance.xlsx
The first 3 results (-1, -2, -3) are not from the above experiment but from similar ones.
The -10hr result is the 7th rep from above.
The results are confusing: the variance is very high in certain metrics, and the trend in the 10hr run sometimes contradicts the first 3.
We expected seek_child_seek_time and count to be lower in #13, but that doesn't seem to be the case, which raises doubts about how effective the setup was.
One metric that is consistent across all 4 tests is db_mutex_lock_nanos, which is about 23% higher in #13.

next step

further discussion is required regarding these results

isaac-io · 2022-09-13T13:05:29Z

Should only be relevant for Hybrid compaction.

Yuval-Ariel added the performance label Jun 15, 2022

Yuval-Ariel assigned ayulas Jun 22, 2022

Yuval-Ariel mentioned this issue Jun 23, 2022

#13: merge_iterator: range query improvements #26

Closed

ayulas assigned erez-speedb and Yuval-Ariel Jun 26, 2022

Yuval-Ariel removed their assignment Jun 27, 2022

bosmatt added the enhancement New feature or request label Jun 28, 2022

erez-speedb removed their assignment Jul 3, 2022

isaac-io assigned assaf-speedb and unassigned ayulas Jul 17, 2022

assaf-speedb assigned ayulas and unassigned assaf-speedb Jul 19, 2022

isaac-io mentioned this issue Jul 20, 2022

23 spdb write flow #60

Closed

osnatbm assigned Yuval-Ariel and unassigned ayulas Jul 21, 2022

osnatbm assigned udi-speedb and unassigned udi-speedb Jul 28, 2022

isaac-io closed this as completed Sep 13, 2022

udi-speedb mentioned this issue Nov 10, 2022

db_bench: Fix a RocksDB bug in the cleanup of a Benchmark when using multiple DBs (-num_multi_db) #234

Closed

udi-speedb mentioned this issue Dec 7, 2022

Generate a concise report / csv file conveniently listing the failures in make check #288

Open

erez-speedb mentioned this issue Dec 20, 2022

Improve spdb memtable performance #298

Closed

Yuval-Ariel mentioned this issue Jan 17, 2023

ZSTD stress asan error #367

Open

ayulas mentioned this issue Aug 16, 2023

deadlock in write _thread linkone function during stall #637

Closed
