Skip to content

Commit

Permalink
#42: Range in form tbl.key > const doesn't work in reverse column family
Browse files Browse the repository at this point in the history
Summary:
Fix issue #42. Also issue #43: Ranges on t.key > const are not handled
efficiently:

Make ha_rocksdb::index_read_map() do the right thing when invoked with
find_flag=HA_READ_KEY_OR_NEXT, both for forward and reverse column
families.

Detailed explanation of actions is provided in rocksdb-range-access.txt

Test Plan: mtr t/rocksdb_range.test, used gcov to check the new code is covered

Reviewers: maykov, hermanlee4, jonahcohen, jtolmer, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D35103
  • Loading branch information
spetrunia authored and jtolmer committed Jan 5, 2016
1 parent 1376d9b commit 81b1e07
Show file tree
Hide file tree
Showing 4 changed files with 190 additions and 60 deletions.
52 changes: 52 additions & 0 deletions mysql-test/r/rocksdb_range.result
Original file line number Diff line number Diff line change
Expand Up @@ -106,4 +106,56 @@ select * from t3 where pk>=1000000;
pk a b
select * from t2 where pk>=1000000;
pk a b
#
# #42: Range in form tbl.key > const doesn't work in reverse column family
#
explain
select count(*) from t2 force index (a) where a>0;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 range a a 4 NULL 10 Using where; Using index
select count(*) from t2 force index (a) where a>0;
count(*)
990
explain
select count(*) from t2 force index (a) where a>99;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 range a a 4 NULL 10 Using where; Using index
select count(*) from t2 force index (a) where a>99;
count(*)
0
select * from t2 where pk>1000000;
pk a b
select * from t3 where pk>1000000;
pk a b
explain
select count(*) from t2 force index (a) where a=2 and pk>25;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 range a a 8 NULL 10 Using where; Using index
select count(*) from t2 force index (a) where a=2 and pk>25;
count(*)
4
select * from t2 force index (a) where a>-10 and a < 1;
pk a b
0 0 0
1 0 1
2 0 2
3 0 3
4 0 4
5 0 5
6 0 6
7 0 7
8 0 8
9 0 9
select * from t3 force index (a) where a>-10 and a < 1;
pk a b
0 0 0
1 0 1
2 0 2
3 0 3
4 0 4
5 0 5
6 0 6
7 0 7
8 0 8
9 0 9
drop table t0,t1,t2,t3;
22 changes: 22 additions & 0 deletions mysql-test/t/rocksdb_range.test
Original file line number Diff line number Diff line change
Expand Up @@ -70,5 +70,27 @@ insert into t3 select * from t2;
select * from t3 where pk>=1000000;
select * from t2 where pk>=1000000;

--echo #
--echo # #42: Range in form tbl.key > const doesn't work in reverse column family
--echo #
explain
select count(*) from t2 force index (a) where a>0;
select count(*) from t2 force index (a) where a>0;

explain
select count(*) from t2 force index (a) where a>99;
select count(*) from t2 force index (a) where a>99;

select * from t2 where pk>1000000;
select * from t3 where pk>1000000;

explain
select count(*) from t2 force index (a) where a=2 and pk>25;
select count(*) from t2 force index (a) where a=2 and pk>25;


select * from t2 force index (a) where a>-10 and a < 1;
select * from t3 force index (a) where a>-10 and a < 1;

drop table t0,t1,t2,t3;

101 changes: 41 additions & 60 deletions storage/rocksdb/ha_rocksdb.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2696,7 +2696,8 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
keypart_map);

if (find_flag == HA_READ_PREFIX_LAST_OR_PREV ||
find_flag == HA_READ_PREFIX_LAST)
find_flag == HA_READ_PREFIX_LAST ||
find_flag == HA_READ_AFTER_KEY)
{
/* See below */
kd->successor(sec_key_packed_tuple, packed_size);
Expand All @@ -2715,7 +2716,7 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
greater than the lookup tuple.
*/
setup_index_scan(kd, &slice, use_all_keys, is_ascending(kd, find_flag));
bool move_forward= true;
//bool move_forward= true;


switch (find_flag) {
Expand Down Expand Up @@ -2753,11 +2754,45 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
}
break;
}
case HA_READ_BEFORE_KEY:
{
//move_forward= false;
/* We want to read the record that's right *before* the given key. */
if (!scan_it->Valid())
{
/*
All the values in the database are smaller than our key. Two cases
- our index is the last in db. Its last value is a match
- our index has no records (in that case we will get a record from
our index and detect it below)
*/
scan_it->SeekToLast();
}
else
{
/*
RocksDB iterator is positioned at "the first key in the source that
at or past target".
We need to step one key back, so that we're at the last key that is
before the target.
If the passed key is greater than the max. value that is found in the
table, then iterator is pointing at the *first* record in subsequent
table/index.
*/
scan_it->Prev();
}
/* fall through */
}
case HA_READ_AFTER_KEY:
case HA_READ_KEY_OR_NEXT:
{
/*
We are looking for the first record such that
index_tuple >= lookup_tuple.
index_tuple $GT lookup_tuple
with HA_READ_AFTER_KEY, $GT = '>',
with HA_READ_KEY_OR_NEXT, $GT = '>='
*/
if (key_descr[active_index]->is_reverse_cf)
{
Expand All @@ -2768,8 +2803,9 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
else
{
/*
When using a full extended key tuple: we should not step back if
we've found an exact match.
We should step back
- when not using full extended key
- when using full extended key and when we've got an exact match
*/
rkey= scan_it->key();
if (!using_full_key || !kd->value_matches_prefix(rkey, slice))
Expand All @@ -2794,61 +2830,6 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
}
break;
}
case HA_READ_BEFORE_KEY:
{
move_forward= false;
/* We want to read the record that's right *before* the given key. */
if (!scan_it->Valid())
{
/*
All the values in the database are smaller than our key. Two cases
- our index is the last in db. Its last value is a match
- our index has no records (in that case we will get a record from
our index and detect it below)
*/
scan_it->SeekToLast();
}
else
{
/*
RocksDB iterator is positioned at "the first key in the source that
at or past target".
We need to step one key back, so that we're at the last key that is
before the target.
If the passed key is greater than the max. value that is found in the
table, then iterator is pointing at the *first* record in subsequent
table/index.
*/
scan_it->Prev();
}
/* fall through */
}
case HA_READ_AFTER_KEY:
{
bool in_key;
bool have_row;
/*
Walk forward until we've found a record that is not equal to the lookup
tuple, but still belongs to this index.
*/
while ((have_row= scan_it->Valid()))
{
rkey= scan_it->key();
if (!(in_key= kd->covers_key(rkey.data(), rkey.size())) ||
kd->cmp_full_keys(rkey.data(), rkey.size(),
slice.data(), slice.size(),
n_used_parts))
break;

if (move_forward)
scan_it->Next();
else
scan_it->Prev();
}
if (!have_row || !in_key)
rc= HA_ERR_END_OF_FILE;
break;
}
case HA_READ_KEY_OR_PREV:
{
if (!scan_it->Valid())
Expand Down
75 changes: 75 additions & 0 deletions storage/rocksdb/rocksdb-range-access.txt
Original file line number Diff line number Diff line change
Expand Up @@ -139,3 +139,78 @@ RocksDB calls:

if (it->Valid() && kd->covers_key(..))
return record.

== HA_READ_AFTER_KEY, forward CF ==

This is finding min(key) such that key > lookup_key.

Suppose lookup_key = kv-bbb

( kv )-aaa-pk
# ( kv )-bbb
( kv )-bbb-pk1 <--- Seek("kv-bbb") will put us here. We need to
( kv )-bbb-pk2 get to the value that is next after 'bbb'.
( kv )-bbb-pk3
( kv )-bbb-pk4
( kv )-bbb-pk5
( kv )-ccc-pkN <-- That is, we need to be here.

However, we don't know that the next value is kv-ccc. Instead, we seek to the
first value that strictly greater than 'kv-bbb'. It is Successor(kv-bbb).

It doesn't matter if we're using a full extended key or not.

RocksDB calls:

Seek(Successor(kv-bbb));
if (it->Valid() && kd->covers_key(it.key()))
return record;

Note that the code is the same as with HA_READ_KEY_OR_NEXT, except that
we seek to Successor($lookup_key) instead of $lookup_key itself.

== HA_READ_AFTER_KEY, backward CF ==

Suppose, the lookup key is 'kv-bbb':

(kv+1)-xxx-pk
( kv )-ccc-pk7
( kv )-ccc-pk6 <-- We need to be here.
# Successor(kv-bbb) <-- We get here when we call Seek(Successor(kv-bbb))
( kv )-bbb-pk5 and we will need to call Prev() (*)
( kv )-bbb-pk4
( kv )-bbb-pk3
( kv )-bbb-pk2
( kv )-bbb-pk1
# ( kv )-bbb <-- We would get here if we called Seek(kv-bbb).
( kv )-aaa-pk

(*) - unless Successor(kv-bbb)=(kv-ccc), and Seek(kv-ccc) hits the row. In
that case, we won't need to call Prev().

RocksDB calls:

Seek(Successor(kv-bbb));
if (!it->Valid())
{
/*
We may get EOF if rows with 'kv-bbb' (below the Successor... line in the
diagram) do not exist. This doesn't mean that rows with values kv-ccc
do not exist.
*/
it->SeekToLast();
}
else
{
if (!using_full_key ||
!kd->value_matches_prefix(...))
{
it->Prev();
}
}

if (it->Valid() && kd->covers_key(...))
return record.

Note that the code is the same as with HA_READ_KEY_OR_NEXT, except that
we seek to Successor($lookup_key) instead of $lookup_key itself.

0 comments on commit 81b1e07

Please sign in to comment.