Stats for false positive rate of full filtesr #3681

maysamyabandeh · 2018-04-05T16:21:19Z

Adds two stats to allow us measuring the false positive rate of full filters:

The total count of positives: rocksdb.bloom.filter.full.positive
The total count of true positives: rocksdb.bloom.filter.full.true.positive
Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches.

Tested by:
$ ./db_bench -benchmarks=fillrandom -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10
$ ./db_bench -benchmarks="readwhilewriting" -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log
$ grep filter.full /tmp/full.log
rocksdb.bloom.filter.full.positive COUNT : 3613989
rocksdb.bloom.filter.full.true.positive COUNT : 3521270
rocksdb.bloom.filter.useful COUNT : 7935349

correction: using FPR=FP/(FP+N) it gives 1.15%

facebook-github-bot

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

yiwu-arbug · 2018-04-05T18:00:21Z

table/block_based_table_reader.cc

@@ -2062,6 +2062,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
  if (!FullFilterKeyMayMatch(read_options, filter, key, no_io)) {
    RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_USEFUL);
  } else {
+    RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_FULL_POSITIVE);


Shall we log both of BLOOM_FILTER_FULL_POSITIVE and BLOOM_FILTER_FULL_TRUE_POSITIVE only when filter exists, i.e. filter != nullptr?

Good point! But one could view absence of the filter as a kind of filter that always says the key might exist. Moreover since our production traffic runs with filters anyway I would rather avoid the cost of another if-then-else branch here.

The BLOOM_FILTER_USEFUL counter was useful to me at one time to figure out the bloom filter is not enabled (when it =0). I think we should make the filter counters consistently being 0 if filter non-exists.

Is not it still useful? BLOOM_FILTER_USEFUL will be still 0 if there is no filter set.

but it will look confusing if one don't set bloom filter but still see non-zero filter counters..

sure. I changed that to update the new stats only if filter exist and only if it full filter.

yiwu-arbug · 2018-04-05T18:07:28Z

table/block_based_table_reader.cc

@@ -2110,6 +2112,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
          if (!ParseInternalKey(biter.key(), &parsed_key)) {
            s = Status::Corruption(Slice());
          }
+          found = true;


Here biter can seek to a key after key. Maybe we need to check found inside SaveValue.

SaveValue returns false if the key is not visible yet, or it needs the next value for merge, ... Independent of the return value of SaveValue, the fact that the key is present means that the bloom filter was correct about the presence of the key. The bloom filter's job is not to tell us if the key is usable as well.

Yeah, but biter can seek to a key after key when key does not exist. So at this line you are still not sure if the key exist. If you look at SaveValue the first thing it does is check if the keys are equal.

thanks for updating! how about this?

oh i missed this comment. fixed that by counting it only if the key matches the user key.

facebook-github-bot · 2018-04-05T19:21:20Z

@maysamyabandeh has updated the pull request. View: changes, changes since last import

facebook-github-bot

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2018-04-05T21:27:11Z

@maysamyabandeh has updated the pull request. View: changes, changes since last import

yiwu-arbug

LGTM!

yiwu-arbug · 2018-04-05T21:29:00Z

table/get_context.cc

  assert((state_ != kMerge && parsed_key.type != kTypeMerge) ||
         merge_context_ != nullptr);
  if (ucmp_->Equal(parsed_key.user_key, user_key_)) {
+    *matched = true;


assert(matched != nullptr) ?

facebook-github-bot · 2018-04-05T21:35:09Z

@maysamyabandeh has updated the pull request. View: changes, changes since last import

facebook-github-bot

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot

@maysamyabandeh is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: Adds two stats to allow us measuring the false positive rate of full filters: - The total count of positives: rocksdb.bloom.filter.full.positive - The total count of true positives: rocksdb.bloom.filter.full.true.positive Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches. Closes #3680 Tested by: $ ./db_bench -benchmarks=fillrandom -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10 $ ./db_bench -benchmarks="readwhilewriting" -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log $ grep filter.full /tmp/full.log rocksdb.bloom.filter.full.positive COUNT : 3628593 rocksdb.bloom.filter.full.true.positive COUNT : 3536026 which gives the false positive rate of 2.5% Closes #3681 Differential Revision: D7517570 Pulled By: maysamyabandeh fbshipit-source-id: 630ab1a473afdce404916d297035b6318de4c052

full filter false positive rate

08a57e2

maysamyabandeh requested review from yiwu-arbug and ajkr April 5, 2018 16:21

facebook-github-bot added the CLA Signed label Apr 5, 2018

facebook-github-bot reviewed Apr 5, 2018

View reviewed changes

yiwu-arbug reviewed Apr 5, 2018

View reviewed changes

apply comments

cacb7c8

facebook-github-bot reviewed Apr 5, 2018

View reviewed changes

count true positive only when the key really matches

cbef3f3

yiwu-arbug approved these changes Apr 5, 2018

View reviewed changes

assert matched

3751b93

facebook-github-bot reviewed Apr 5, 2018

View reviewed changes

facebook-github-bot closed this in 6718267 Apr 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stats for false positive rate of full filtesr #3681

Stats for false positive rate of full filtesr #3681

maysamyabandeh commented Apr 5, 2018 •

edited

Loading

facebook-github-bot left a comment

yiwu-arbug Apr 5, 2018

maysamyabandeh Apr 5, 2018

yiwu-arbug Apr 5, 2018

maysamyabandeh Apr 5, 2018

yiwu-arbug Apr 5, 2018 •

edited

Loading

maysamyabandeh Apr 5, 2018

yiwu-arbug Apr 5, 2018 •

edited

Loading

maysamyabandeh Apr 5, 2018

yiwu-arbug Apr 5, 2018

yiwu-arbug Apr 5, 2018

maysamyabandeh Apr 5, 2018

facebook-github-bot commented Apr 5, 2018

facebook-github-bot left a comment

facebook-github-bot commented Apr 5, 2018

yiwu-arbug left a comment

yiwu-arbug Apr 5, 2018

facebook-github-bot commented Apr 5, 2018

facebook-github-bot left a comment

facebook-github-bot left a comment

Stats for false positive rate of full filtesr #3681

Stats for false positive rate of full filtesr #3681

Conversation

maysamyabandeh commented Apr 5, 2018 • edited Loading

facebook-github-bot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yiwu-arbug Apr 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yiwu-arbug Apr 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Apr 5, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Apr 5, 2018

yiwu-arbug left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Apr 5, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

facebook-github-bot left a comment

Choose a reason for hiding this comment

maysamyabandeh commented Apr 5, 2018 •

edited

Loading

yiwu-arbug Apr 5, 2018 •

edited

Loading

yiwu-arbug Apr 5, 2018 •

edited

Loading