Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats for false positive rate of full filtesr #3681

Closed
wants to merge 4 commits into from

Conversation

maysamyabandeh
Copy link
Contributor

@maysamyabandeh maysamyabandeh commented Apr 5, 2018

Adds two stats to allow us measuring the false positive rate of full filters:

  • The total count of positives: rocksdb.bloom.filter.full.positive
  • The total count of true positives: rocksdb.bloom.filter.full.true.positive
    Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches.

Closes #3680

Tested by:
$ ./db_bench -benchmarks=fillrandom -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10
$ ./db_bench -benchmarks="readwhilewriting" -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log
$ grep filter.full /tmp/full.log
rocksdb.bloom.filter.full.positive COUNT : 3613989
rocksdb.bloom.filter.full.true.positive COUNT : 3521270
rocksdb.bloom.filter.useful COUNT : 7935349

correction: using FPR=FP/(FP+N) it gives 1.15%

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -2062,6 +2062,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
if (!FullFilterKeyMayMatch(read_options, filter, key, no_io)) {
RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_USEFUL);
} else {
RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_FULL_POSITIVE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we log both of BLOOM_FILTER_FULL_POSITIVE and BLOOM_FILTER_FULL_TRUE_POSITIVE only when filter exists, i.e. filter != nullptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! But one could view absence of the filter as a kind of filter that always says the key might exist. Moreover since our production traffic runs with filters anyway I would rather avoid the cost of another if-then-else branch here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BLOOM_FILTER_USEFUL counter was useful to me at one time to figure out the bloom filter is not enabled (when it =0). I think we should make the filter counters consistently being 0 if filter non-exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is not it still useful? BLOOM_FILTER_USEFUL will be still 0 if there is no filter set.

Copy link
Contributor

@yiwu-arbug yiwu-arbug Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it will look confusing if one don't set bloom filter but still see non-zero filter counters..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. I changed that to update the new stats only if filter exist and only if it full filter.

@@ -2110,6 +2112,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
if (!ParseInternalKey(biter.key(), &parsed_key)) {
s = Status::Corruption(Slice());
}
found = true;
Copy link
Contributor

@yiwu-arbug yiwu-arbug Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here biter can seek to a key after key. Maybe we need to check found inside SaveValue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SaveValue returns false if the key is not visible yet, or it needs the next value for merge, ... Independent of the return value of SaveValue, the fact that the key is present means that the bloom filter was correct about the presence of the key. The bloom filter's job is not to tell us if the key is usable as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but biter can seek to a key after key when key does not exist. So at this line you are still not sure if the key exist. If you look at SaveValue the first thing it does is check if the keys are equal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating! how about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i missed this comment. fixed that by counting it only if the key matches the user key.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

assert((state_ != kMerge && parsed_key.type != kTypeMerge) ||
merge_context_ != nullptr);
if (ucmp_->Equal(parsed_key.user_key, user_key_)) {
*matched = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert(matched != nullptr) ?

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maysamyabandeh is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

maysamyabandeh pushed a commit that referenced this pull request Apr 12, 2018
Summary:
Adds two stats to allow us measuring the false positive rate of full filters:
- The total count of positives: rocksdb.bloom.filter.full.positive
- The total count of true positives: rocksdb.bloom.filter.full.true.positive
Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches.

Closes #3680

Tested by:
$ ./db_bench -benchmarks=fillrandom  -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10
$ ./db_bench -benchmarks="readwhilewriting"  -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log
$ grep filter.full /tmp/full.log
rocksdb.bloom.filter.full.positive COUNT : 3628593
rocksdb.bloom.filter.full.true.positive COUNT : 3536026
which gives the false positive rate of 2.5%
Closes #3681

Differential Revision: D7517570

Pulled By: maysamyabandeh

fbshipit-source-id: 630ab1a473afdce404916d297035b6318de4c052
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants