-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stats for false positive rate of full filtesr #3681
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
table/block_based_table_reader.cc
Outdated
@@ -2062,6 +2062,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key, | |||
if (!FullFilterKeyMayMatch(read_options, filter, key, no_io)) { | |||
RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_USEFUL); | |||
} else { | |||
RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_FULL_POSITIVE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we log both of BLOOM_FILTER_FULL_POSITIVE and BLOOM_FILTER_FULL_TRUE_POSITIVE only when filter exists, i.e. filter != nullptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! But one could view absence of the filter as a kind of filter that always says the key might exist. Moreover since our production traffic runs with filters anyway I would rather avoid the cost of another if-then-else branch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BLOOM_FILTER_USEFUL counter was useful to me at one time to figure out the bloom filter is not enabled (when it =0). I think we should make the filter counters consistently being 0 if filter non-exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is not it still useful? BLOOM_FILTER_USEFUL will be still 0 if there is no filter set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it will look confusing if one don't set bloom filter but still see non-zero filter counters..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. I changed that to update the new stats only if filter exist and only if it full filter.
table/block_based_table_reader.cc
Outdated
@@ -2110,6 +2112,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key, | |||
if (!ParseInternalKey(biter.key(), &parsed_key)) { | |||
s = Status::Corruption(Slice()); | |||
} | |||
found = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here biter can seek to a key after key
. Maybe we need to check found
inside SaveValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SaveValue returns false if the key is not visible yet, or it needs the next value for merge, ... Independent of the return value of SaveValue, the fact that the key is present means that the bloom filter was correct about the presence of the key. The bloom filter's job is not to tell us if the key is usable as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but biter
can seek to a key after key
when key
does not exist. So at this line you are still not sure if the key exist. If you look at SaveValue
the first thing it does is check if the keys are equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating! how about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i missed this comment. fixed that by counting it only if the key matches the user key.
@maysamyabandeh has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@maysamyabandeh has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
table/get_context.cc
Outdated
assert((state_ != kMerge && parsed_key.type != kTypeMerge) || | ||
merge_context_ != nullptr); | ||
if (ucmp_->Equal(parsed_key.user_key, user_key_)) { | ||
*matched = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(matched != nullptr) ?
@maysamyabandeh has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maysamyabandeh is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Adds two stats to allow us measuring the false positive rate of full filters: - The total count of positives: rocksdb.bloom.filter.full.positive - The total count of true positives: rocksdb.bloom.filter.full.true.positive Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches. Closes #3680 Tested by: $ ./db_bench -benchmarks=fillrandom -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10 $ ./db_bench -benchmarks="readwhilewriting" -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log $ grep filter.full /tmp/full.log rocksdb.bloom.filter.full.positive COUNT : 3628593 rocksdb.bloom.filter.full.true.positive COUNT : 3536026 which gives the false positive rate of 2.5% Closes #3681 Differential Revision: D7517570 Pulled By: maysamyabandeh fbshipit-source-id: 630ab1a473afdce404916d297035b6318de4c052
Adds two stats to allow us measuring the false positive rate of full filters:
Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches.
Closes #3680
Tested by:
$ ./db_bench -benchmarks=fillrandom -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10
$ ./db_bench -benchmarks="readwhilewriting" -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log
$ grep filter.full /tmp/full.log
rocksdb.bloom.filter.full.positive COUNT : 3613989
rocksdb.bloom.filter.full.true.positive COUNT : 3521270
rocksdb.bloom.filter.useful COUNT : 7935349
correction: using FPR=FP/(FP+N) it gives 1.15%