-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/consolidate legacy Bloom implementation details #5784
Conversation
ed10447
to
f22eab9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good refactoring. Separating the core bloom filter part and the logic wrapping around it will make the code more maintainable. I just have one comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
So I accidentally pulled this in already in PR 5780 when I "force-pushed the pdillinger:plain-table-reader-logic branch from 4f6c433 to 8b824fb yesterday" for a rebase and/or formatting amend. Grrr |
f22eab9
to
ba3e5e3
Compare
@pdillinger has updated the pull request. Re-import the pull request |
Summary: Refactoring to consolidate implementation details of legacy Bloom filters. This helps to organize and document some related, obscure code. Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to compile and run unit tests for non-native cache line size. (Fixed a related test failure in db_properties_test.) Test Plan: make check, including Recently added Bloom schema unit tests (in ./plain_table_db_test && ./bloom_test), and including with TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the schema tests with temporary fault injection into new implementations.
ba3e5e3
to
7e88095
Compare
@pdillinger has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pdillinger merged this pull request in 6862624. |
Summary: Example: using the tool before and after PR facebook#5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 + Dry run ns/op: 22.2301 + Single filter ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Also, tolerate DEFINE_uint32 where not provided by gflags
Summary: Example: using the tool before and after PR facebook#5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 + Dry run ns/op: 22.2301 + Single filter ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Also, tolerate DEFINE_uint32 where not provided by gflags
Summary: Example: using the tool before and after PR facebook#5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 + Dry run ns/op: 22.2301 + Single filter ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Also, tolerate DEFINE_uint32 where not provided by gflags
Summary: Example: using the tool before and after PR #5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): Before: - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 After: + Dry run ns/op: 22.2301 + Single filter run ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Pull Request resolved: #5825 Differential Revision: D17804987 Pulled By: pdillinger fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5
Summary: Refactoring to consolidate implementation details of legacy Bloom filters. This helps to organize and document some related, obscure code. Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to compile and run unit tests for non-native cache line size. (Fixed a related test failure in db_properties_test.) Pull Request resolved: facebook#5784 Test Plan: make check, including Recently added Bloom schema unit tests (in ./plain_table_db_test && ./bloom_test), and including with TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the schema tests with temporary fault injection into new implementations. Some performance testing with modified unit tests suggest a small to moderate improvement in speed. Differential Revision: D17381384 Pulled By: pdillinger fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df
Summary: Example: using the tool before and after PR facebook#5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): Before: - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 After: + Dry run ns/op: 22.2301 + Single filter run ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Pull Request resolved: facebook#5825 Differential Revision: D17804987 Pulled By: pdillinger fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5
Summary: Refactoring to consolidate implementation details of legacy
Bloom filters. This helps to organize and document some related,
obscure code.
Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to
compile and run unit tests for non-native cache line size. (Fixed a
related test failure in db_properties_test.)
Test Plan: make check, including Recently added Bloom schema unit tests
(in ./plain_table_db_test && ./bloom_test), and including with
TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the
schema tests with temporary fault injection into new implementations.
Some performance testing with modified unit tests suggest a small to moderate
improvement in speed.