Improvement for upper/lower bound #5059

zhangjinpeng87 · 2019-03-11T02:29:51Z

Current behavior when scanning a range with lower and upper bounds set:
Fetch a key from the memtable/SST/block, compare the key with the lower/upper bound, and return invalid if the key exceeds the bound. So there is one comparison for each key.

Improvement suggestion:
When an SST is completely covered by the lower and upper bounds, the keys returned from this SST don't need to be compared with the bounds.
When a block is completely covered by the lower and upper bounds, the keys returned from this block don't need to be compared with the bounds.

This can improve scan speed.
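
The block-level half of the suggestion can be sketched as follows. This is a minimal illustration and not RocksDB code: `Block`, `ScanResult`, `ScanPerKey`, and `ScanPerBlock` are hypothetical names, keys are plain strings, and only an exclusive upper bound on a forward scan is modeled (a lower bound would be handled symmetrically).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a data block: non-empty, keys sorted ascending.
struct Block {
  std::vector<std::string> keys;
};

struct ScanResult {
  std::size_t keys_returned = 0;
  std::size_t comparisons = 0;  // comparisons against the upper bound
};

// Baseline: every key is compared with the exclusive upper bound.
ScanResult ScanPerKey(const std::vector<Block>& blocks,
                      const std::string& upper) {
  ScanResult r;
  for (const Block& b : blocks) {
    for (const std::string& k : b.keys) {
      ++r.comparisons;
      if (k >= upper) return r;
      ++r.keys_returned;
    }
  }
  return r;
}

// Sketch of the suggestion: one comparison on the block's largest key
// decides whether the per-key checks can be skipped for that block.
ScanResult ScanPerBlock(const std::vector<Block>& blocks,
                        const std::string& upper) {
  ScanResult r;
  for (const Block& b : blocks) {
    assert(!b.keys.empty());
    ++r.comparisons;
    if (b.keys.back() < upper) {
      // Whole block is below the bound: return its keys without checks.
      r.keys_returned += b.keys.size();
      continue;
    }
    // The bound falls inside this block: fall back to per-key checks.
    for (const std::string& k : b.keys) {
      ++r.comparisons;
      if (k >= upper) return r;
      ++r.keys_returned;
    }
  }
  return r;
}
```

With many keys per block, the per-block variant pays roughly one comparison per block plus per-key checks only in the block that contains the bound.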

@yiwu-arbug

anand1976 · 2019-03-12T17:50:55Z

This is a good suggestion. Thanks.
Are you interested in submitting a PR for it?

yiwu-arbug · 2019-03-12T18:25:02Z

I plan to try the idea this month. Will update.

ajkr · 2019-03-12T20:58:55Z

@yiwu-arbug Another thing I noticed is that we're actually doing two comparisons for each key that comes from an SST. The first is here:

rocksdb/table/block_based_table_reader.cc

Lines 2505 to 2511 in 8a1ecd1

    // Check upper bound on the current key
    bool reached_upper_bound =
        (read_options_.iterate_upper_bound != nullptr &&
         block_iter_points_to_real_block_ && block_iter_.Valid() &&
         icomp_.user_comparator()->Compare(ExtractUserKey(block_iter_.key()),
                                           *read_options_.iterate_upper_bound) >=
             0);

and then again here:

rocksdb/db/db_iter.cc

Lines 454 to 457 in ca89ac2

    if (iterate_upper_bound_ != nullptr &&
        user_comparator_->Compare(ikey_.user_key, *iterate_upper_bound_) >= 0) {
      break;
    }

I feel the former one is unnecessary.
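
To make the cost of the duplicated check concrete, here is a small sketch that counts comparator calls with and without an inner (table-level) check in front of the outer (DBIter-like) one. `CountingComparator` and `CountInBound` are hypothetical helpers for illustration only; they are not RocksDB classes and do not reproduce the real iterator stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical comparator wrapper that counts calls; not a RocksDB class.
struct CountingComparator {
  std::size_t calls = 0;
  int Compare(const std::string& a, const std::string& b) {
    ++calls;
    return a.compare(b);
  }
};

// Returns how many keys fall below the exclusive upper bound.
// inner_check mimics a table-level bound check that runs before the
// outer check on the same key.
std::size_t CountInBound(const std::vector<std::string>& sorted_keys,
                         const std::string& upper, bool inner_check,
                         CountingComparator* cmp) {
  assert(cmp != nullptr);
  std::size_t returned = 0;
  for (const std::string& key : sorted_keys) {
    if (inner_check && cmp->Compare(key, upper) >= 0) break;  // inner layer
    if (cmp->Compare(key, upper) >= 0) break;                 // outer layer
    ++returned;
  }
  return returned;
}
```

Both variants return the same keys; with the inner check enabled, every in-bound key is simply compared twice.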

yiwu-arbug · 2019-03-21T18:36:22Z

I did a quick prototype, and the number of key comparisons is reduced by 1/3 for long range scans when I apply both (a) avoiding rechecking the upper bound when it is greater than the block's largest key, and (b) avoiding the upper-bound check at the point @ajkr points out. I also found that RocksDB is not updating perf_context.user_key_comparison_count correctly for range scans. I'll work on a PR and share more concrete benchmark results soon.

yiwu-arbug · 2019-03-22T22:08:09Z

Patch to fix perf_context.user_key_comparison_count: #5098

yiwu-arbug · 2019-03-23T22:04:13Z

Fixing duplicated per-key upper bound check: #5101

siying · 2019-03-26T20:37:38Z

Whether this generates a net saving depends on how likely the check in the table reader is to actually invalidate the iterator. For example, in your primary key point-lookup operation, it's more likely that the upper bound check will keep most SST files from being put into the iterator heap. Any iterator put into the iterator heap will generate more than one comparison. I'm sure that in the case you mentioned, the long scan, you'll see a smaller number of comparisons. But how about putting it together and running your whole system? Also, the number of comparisons is not the right counter to measure the gain; CPU is a better one. When you have this optimization applied to your whole system, will you see CPU savings?

yiwu-arbug · 2019-03-26T22:30:49Z

The actual improvement and db_bench result: #5111

yiwu-arbug · 2019-03-26T22:35:37Z

@siying Appreciate your comment! For background, the optimization targets queries like "select count(*)" on relatively small tables in our system (so that the table can still be cached in memory, but the scan is long enough). Sorry I didn't provide enough test results, as it took me a while to get them. But they are in #5111. db_bench does show CPU savings for a particular workload. Sure, for the whole system the net saving would be small, but it is not insignificant, since the change is relatively simple.

siying · 2019-03-27T19:59:24Z

How about a compromise: we check the upper bound in block based table after Seek(), and not after Next()?

yiwu-arbug · 2019-03-27T22:17:20Z

Do you mind explaining a little bit more? DBIter needs to check the upper bound on Next() to make sure the key is within bound. The proposed optimization is to use block boundary keys to reduce this per-key comparison. It is mostly useful for relatively longer range scans, where the iterator moves past the block it was initially positioned at by Seek().

siying · 2019-03-27T22:19:39Z

@yiwu-arbug I mean we do the per-key check inside the block-based table only in Seek(), and remove it for Next(). This way relatively longer range scans will be fast, and the cases where most levels can be filtered out by per-key filtering can also be fast.
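
A rough sketch of this compromise, under simplifying assumptions: `TableIter` and `Scan` below are hypothetical stand-ins (not the RocksDB classes), keys are plain strings, and the bound is exclusive. The table-level iterator checks the bound once in Seek(), so a file that starts out of bound is invalid immediately, while Next() leaves the per-key check to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-file iterator over sorted keys; not a RocksDB class.
class TableIter {
 public:
  TableIter(std::vector<std::string> sorted_keys, std::string upper)
      : keys_(std::move(sorted_keys)), upper_(std::move(upper)) {}

  // Seek() checks the upper bound once, so a file whose first candidate
  // key is already out of bound never looks valid to the caller.
  void Seek(const std::string& target) {
    pos_ = 0;
    while (pos_ < keys_.size() && keys_[pos_] < target) ++pos_;
    out_of_bound_ = pos_ < keys_.size() && keys_[pos_] >= upper_;
  }

  // Next() does no bound check; the caller is responsible for it.
  void Next() {
    assert(Valid());
    ++pos_;
  }

  bool Valid() const { return !out_of_bound_ && pos_ < keys_.size(); }

  const std::string& key() const {
    assert(Valid());
    return keys_[pos_];
  }

 private:
  std::vector<std::string> keys_;
  std::string upper_;
  std::size_t pos_ = 0;
  bool out_of_bound_ = false;
};

// DBIter-like caller: the only per-key bound check on Next() lives here.
std::vector<std::string> Scan(TableIter* it, const std::string& start,
                              const std::string& upper) {
  std::vector<std::string> out;
  for (it->Seek(start); it->Valid(); it->Next()) {
    if (it->key() >= upper) break;
    out.push_back(it->key());
  }
  return out;
}
```

In a merging setup, iterators that are invalid right after Seek() would not need to enter the heap at all, which is the filtering effect the Seek()-time check is meant to preserve.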

yiwu-arbug · 2019-03-27T23:04:23Z

The proposed change doesn't touch the Seek() flow at all. Its capability to filter out levels remains the same.

yiwu-arbug · 2019-03-27T23:09:24Z

@siying I get what you mean. I'll update #5101 to address your comment.

yiwu-arbug · 2019-04-02T22:06:34Z

Blocked on fixing an issue with format_version>=3 (see #5101 (comment)). The fix for that issue is blocked by #5139, which by itself is a major issue.

edited: #5139 is a non-issue.

yiwu-arbug · 2019-04-02T23:33:01Z

Retrying PR5101: #5142

yiwu-arbug mentioned this issue Mar 14, 2019

Improvement for upper/lower bound tikv/rust-rocksdb#279

Closed

yiwu-arbug closed this as completed Oct 16, 2019
