Refine Ribbon configuration, improve testing, add Homogeneous #7879
Conversation
Summary: TODO

Test Plan: unit test updates included
// // When true, enables a special "homogeneous" filter implementation that
// // is slightly faster to construct, and never fails to construct though
// // FP rate can quickly explode in cases where corresponding
// // non-homogeneous filter would fail (or nearly fail?) to construct.
// // For smaller filters, you can configure with ConstructionFailureChance
// // smaller than desired FP rate to largely counteract this effect.
// // TODO: configuring Homogeneous Ribbon for arbitrarily large filters
// // based on data from OptimizeHomogAtScale
// static constexpr bool kHomogeneous;
Why not just have a special hash function that always returns 0?
I don't know exactly what you mean, but for sharing the same general algorithms, GetResultRowFromHash always returns 0 when kHomogeneous. Regardless of kHomogeneous, we work from a 64-bit hash of the input key for start position and coefficient row block, being smart about how we re-use/re-mix that hash info.
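For illustration only, a minimal standalone sketch of that idea: one 64-bit key hash feeds a start position (via a fastrange-style scaling), a re-mixed coefficient row, and a result row that is forced to zero in the homogeneous case. The helper names and remix constant here are hypothetical stand-ins, not the actual RocksDB functions, and the code assumes the GCC/Clang unsigned __int128 extension.

#include <cstdint>
#include <iostream>

constexpr bool kHomogeneous = true;  // toggle to compare the two behaviors

// fastrange-style mapping of a 64-bit hash into [0, num_starts)
uint64_t GetStart(uint64_t hash, uint64_t num_starts) {
  return static_cast<uint64_t>(
      (static_cast<unsigned __int128>(hash) * num_starts) >> 64);
}

// Re-mix the same hash for a coefficient row; set the low bit so the row is
// never all-zero (a common Ribbon convention).
uint64_t GetCoeffRow(uint64_t hash) {
  return (hash * 0x9E3779B97F4A7C15ULL) | 1;  // golden-ratio remix
}

// Result row: always zero when homogeneous, otherwise more remixed hash bits.
uint32_t GetResultRow(uint64_t hash) {
  if (kHomogeneous) return 0;
  return static_cast<uint32_t>(hash >> 56);  // e.g. 8 result bits
}

int main() {
  uint64_t hash = 0x123456789ABCDEF0ULL;  // stand-in for a hashed key
  std::cout << "start=" << GetStart(hash, /*num_starts=*/1000)
            << " coeff_row=0x" << std::hex << GetCoeffRow(hash)
            << " result_row=" << std::dec << GetResultRow(hash) << "\n";
}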
My question was why have a special flag kHomogeneous, as it's just a special case of Ribbon where the hash function always returns 0.
This flag also controls whether we populate free variable rows in the solution pseudorandomly or simply as zeros.
I would agree with this: in RocksDB, we generally prefer dynamic if-based configuration to static template-based configuration. This is good for simplicity, easier testing, avoiding explosion of compiled code size, etc. We do not optimize performance at all software engineering costs.
If this were a case of implementing a known algorithm for use in RocksDB, where we would just implement the subset of features we want, I would generally prefer that approach, like I did with the FastLocalBloom. But this is a new algorithm and I wanted to create a generic reference implementation. We aren't yet sure which subset of features we will want to keep for long-term use in RocksDB. And even if we aren't sensitive to performance down to the ns, some people are, and we want to show the best numbers we reasonably can for a paper.
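As a side note, here is a tiny self-contained contrast of the two styles being discussed (names are hypothetical, not RocksDB code): the template parameter makes the flag a compile-time constant, so the dead branch disappears but every setting is a separate instantiation, while the dynamic version compiles once and branches at runtime.

#include <cstdint>
#include <iostream>

// Static: one instantiation per setting, branch resolved at compile time.
template <bool kHomogeneous>
uint32_t ResultRowStatic(uint64_t hash) {
  return kHomogeneous ? 0 : static_cast<uint32_t>(hash >> 56);
}

// Dynamic: a single compiled function with a runtime branch.
uint32_t ResultRowDynamic(uint64_t hash, bool homogeneous) {
  return homogeneous ? 0 : static_cast<uint32_t>(hash >> 56);
}

int main() {
  uint64_t h = 0xDEADBEEFCAFEF00DULL;
  std::cout << ResultRowStatic<true>(h) << " "
            << ResultRowDynamic(h, /*homogeneous=*/false) << "\n";
}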
util/ribbon_impl.h (Outdated)
@@ -7,7 +7,9 @@

#include <cmath>

#include "port/lang.h"  // for FALLTHROUGH_INTENDED
Is this needed? It seems FALLTHROUGH_INTENDED is not used here.
static uint32_t GetNumToAdd(
    uint32_t num_slots,
    ConstructionFailureChance max_failure = kDefaultFailureChance) {
  switch (max_failure) {
    default:
      assert(false);
      FALLTHROUGH_INTENDED;
    case kOneIn20: {
      using H1 = BandingConfigHelper1TS<kOneIn20, TypesAndSettings>;
      return H1::GetNumToAdd(num_slots);
    }
    case kOneIn2: {
      using H1 = BandingConfigHelper1TS<kOneIn2, TypesAndSettings>;
      return H1::GetNumToAdd(num_slots);
    }
    case kOneIn1000: {
      using H1 = BandingConfigHelper1TS<kOneIn1000, TypesAndSettings>;
      return H1::GetNumToAdd(num_slots);
    }
  }
}
Is this function needed?
Not at the moment, but I could see it being useful for when the space-time trade-off is selected dynamically. (Part of trying to build a general API.)
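A hedged sketch of that use case (all names and calibration factors below are hypothetical stand-ins, not the RocksDB API): a ConstructionFailureChance that arrives as a runtime value, e.g. from a user option, dispatches through one switch to per-chance templated helpers, mirroring the snippet quoted above.

#include <cassert>
#include <cstdint>
#include <iostream>

enum ConstructionFailureChance { kOneIn2, kOneIn20, kOneIn1000 };

// Stand-in for the per-chance templated helpers; the real calibration data
// lives in util/ribbon_config.
template <ConstructionFailureChance kCfc>
struct FakeHelper {
  static uint32_t GetNumToAdd(uint32_t num_slots) {
    // made-up factors: a lower failure chance needs more slack per slot
    double f = kCfc == kOneIn2 ? 0.96 : kCfc == kOneIn20 ? 0.95 : 0.94;
    return static_cast<uint32_t>(num_slots * f);
  }
};

uint32_t GetNumToAdd(uint32_t num_slots, ConstructionFailureChance chance) {
  switch (chance) {
    case kOneIn2:
      return FakeHelper<kOneIn2>::GetNumToAdd(num_slots);
    case kOneIn20:
      return FakeHelper<kOneIn20>::GetNumToAdd(num_slots);
    case kOneIn1000:
      return FakeHelper<kOneIn1000>::GetNumToAdd(num_slots);
  }
  assert(false);
  return 0;
}

int main() {
  // the trade-off arrives as a runtime value rather than a template parameter
  ConstructionFailureChance chance = kOneIn20;
  std::cout << GetNumToAdd(100000, chance) << "\n";
}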
util/ribbon_config.cc (Outdated)
template <ConstructionFailureChance kCfc, uint64_t kCoeffBits, bool kUseSmash,
          bool kHomogeneous>
uint32_t BandingConfigHelper1MaybeSupported<
    kCfc, kCoeffBits, kUseSmash, kHomogeneous,
    true /* kIsSupported */>::GetNumSlots(uint32_t num_to_add) {
  using Data = detail::BandingConfigHelperData<kCfc, kCoeffBits, kUseSmash>;

  if (num_to_add == 0) {
    return 0;
  }
  if (kHomogeneous) {
    // Reverse of above in GetNumToAdd
    num_to_add += 8;
  }
  double log2_num_to_add = std::log(num_to_add) * 1.4426950409;
  uint32_t approx_log2_slots = static_cast<uint32_t>(log2_num_to_add + 0.5);
  assert(approx_log2_slots <= 32);  // help clang-analyze

  double lower_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots);
  double upper_num_to_add;
  if (approx_log2_slots == 0 || lower_num_to_add == /* unsupported */ 0) {
    // Return minimum non-zero slots in standard implementation
    return kUseSmash ? kCoeffBits : 2 * kCoeffBits;
  } else if (num_to_add < lower_num_to_add) {
    upper_num_to_add = lower_num_to_add;
    --approx_log2_slots;
    lower_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots);
  } else {
    upper_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots + 1);
  }

  assert(num_to_add >= lower_num_to_add);
  assert(num_to_add < upper_num_to_add);

  double upper_portion =
      (num_to_add - lower_num_to_add) / (upper_num_to_add - lower_num_to_add);

  double lower_num_slots = 1.0 * (uint64_t{1} << approx_log2_slots);

  // Interpolation, round up
  return static_cast<uint32_t>(upper_portion * lower_num_slots +
                               lower_num_slots + 0.999999999);
}

template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ false,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ true,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ false,
                                                   /*hm*/ true, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ true,
                                                   /*hm*/ true, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ false,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ true,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ false,
                                                   /*hm*/ true, /*sup*/ true>;
Templates can be hard to read for a n00b like me. I'm wondering why they're preferred for ConfigHelper?
I may have gone too far in trying to minimize space (compiled code size) & time (looking up the data). Currently all the data is about 2KB, so I probably shouldn't worry about that. I'll revise.
Actually, I've tried cutting down on the templates while hiding as much implementation detail in the .cc file as I can. In order to interpret the TypesAndSettings, to avoid people mixing up settings during configuration, you still need a lot of boilerplate in the .h file to translate from template parameters to dynamic values passed to a function in the .cc file. And then more boilerplate in the .cc file to share those settings between the various helper functions. I'm finding it not attractive enough to throw out this version, where we at least get time & space optimization for all the boilerplate.
I'll add some more comments about template instantiation.
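For readers puzzling over the GetNumSlots snippet quoted above, here is a worked example of the interpolation with made-up calibration numbers (the real ones live in BandingConfigHelperData): suppose 2^16 slots are calibrated for 60000 entries and 2^17 slots for 121000; for num_to_add = 90000 the code interpolates between the two power-of-two slot counts.

#include <cstdint>
#include <iostream>

int main() {
  const double lower_num_to_add = 60000.0;   // hypothetical, for 2^16 slots
  const double upper_num_to_add = 121000.0;  // hypothetical, for 2^17 slots
  const uint32_t approx_log2_slots = 16;
  const double num_to_add = 90000.0;

  double upper_portion =
      (num_to_add - lower_num_to_add) / (upper_num_to_add - lower_num_to_add);
  double lower_num_slots = 1.0 * (uint64_t{1} << approx_log2_slots);

  // Same round-up interpolation as in the quoted code
  uint32_t num_slots = static_cast<uint32_t>(
      upper_portion * lower_num_slots + lower_num_slots + 0.999999999);
  std::cout << "num_slots=" << num_slots << "\n";  // prints num_slots=97767
}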
util/ribbon_test.cc (Outdated)
    : 6U + static_cast<uint32_t>(kCoeffBits / 16) +
          std::max(log2_thoroughness, uint32_t{5});
Question: why is this a reasonable number of entries to add?
I'll add some comments
util/ribbon_test.cc (Outdated)

const uint32_t log2_min_add =
    static_cast<uint32_t>(ROCKSDB_NAMESPACE::FloorLog2(
        static_cast<uint32_t>(0.85 * SimpleSoln::RoundUpNumSlots(1))));
why 0.85 here?
I'll add some comments
util/ribbon_test.cc (Outdated)
// pick a power of two scale uniformly, with a minimum so
// that minimum size is not over-tested due to rounding up
uint32_t log2_add =
    static_cast<uint32_t>(3.14159 * i) % (log2_max_add - log2_min_add) +
    log2_min_add;
Why is this a "power of two scale picked uniformly"? Isn't it evenly distributed between [log2_min_add, log2_max_add]?
I'll extend the comment, perhaps simplify.
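A small standalone demonstration of what the quoted snippet is doing, with hypothetical bounds: the exponent log2_add is spread (pseudo-)uniformly over [log2_min_add, log2_max_add), so the actual counts 1 << log2_add are log-uniform, i.e. each power-of-two size range is exercised about equally often rather than the largest sizes dominating.

#include <cstdint>
#include <iostream>

int main() {
  const uint32_t log2_min_add = 4;   // hypothetical bounds for illustration
  const uint32_t log2_max_add = 11;
  for (uint32_t i = 0; i < 8; ++i) {
    uint32_t log2_add =
        static_cast<uint32_t>(3.14159 * i) % (log2_max_add - log2_min_add) +
        log2_min_add;
    std::cout << "i=" << i << " log2_add=" << log2_add
              << " num_to_add~=" << (uint32_t{1} << log2_add) << "\n";
  }
}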
// TODO: unit tests for small filter FP rates
// Not a real test, but a tool to understand Homogeneous Ribbon
// behavior (TODO: configuration APIs & tests)
TYPED_TEST(RibbonTypeParamTest, OptimizeHomogAtScale) {
As it's not a real unit test, should it be disabled?
To me DISABLED means "TODO: fix and re-enable this test." There's nothing to fix and re-enable here.
As it's not a unit test, the benefit of having DISABLED would be removing it from the test run output. But anyway, it's okay either way.
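For reference, the GoogleTest convention being discussed, as a generic sketch (not RocksDB code): prefixing the test name with DISABLED_ keeps it out of normal runs, and it can still be run explicitly with --gtest_also_run_disabled_tests.

#include <gtest/gtest.h>

// Skipped by default because of the DISABLED_ prefix; run explicitly with
// --gtest_also_run_disabled_tests when you want the tool-style output.
TEST(RibbonToolExample, DISABLED_OptimizeHomogAtScale) {
  // exploratory / tool-style code would go here
}

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}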
I did a benchmark for the new filter: https://github.com/jay-zhuang/rocksdb/pull/5/files#diff-05e2b61c9411c11e9832b1eb0defc90d480e3d45ff4a2d08417abe1dbabeccccR56
Here are the benchmark result details. Do these match your expectations?
👍
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pdillinger has updated the pull request. You must reimport the pull request before landing.
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pdillinger merged this pull request in a8b3b9a.
Summary: This change only affects non-schema-critical aspects of the production candidate Ribbon filter. Specifically, it refines the choice of internal configuration parameters based on inputs. The changes are minor enough that the schema tests in bloom_test, some of which depend on this, are unaffected. There are also some minor optimizations and refactorings.

This would be a schema change for "smash" Ribbon, to fix some known issues with small filters, but "smash" Ribbon is not accessible in public APIs. Unit test CompactnessAndBacktrackAndFpRate updated to test small and medium-large filters. Run with --thoroughness=100 or so for much better detection power (not appropriate for continuous regression testing).

Homogeneous Ribbon:
This change adds internally a Ribbon filter variant we call Homogeneous Ribbon, in collaboration with Stefan Walzer. The expected "result" value for every key is zero, instead of computed from a hash. Entropy for queries not to be false positives comes from free variables ("overhead") in the solution structure, which are populated pseudorandomly. Construction is slightly faster for not tracking result values, and never fails. Instead, FP rate can jump up whenever and wherever entries are packed too tightly. For small structures, we can choose overhead to make this FP rate jump unlikely, as seen in updated unit test CompactnessAndBacktrackAndFpRate.

Unlike standard Ribbon, Homogeneous Ribbon seems to scale to an arbitrary number of keys when accepting an FP rate penalty for small pockets of high FP rate in the structure. For example, 64-bit Ribbon with 8 solution columns and 10% allocated space overhead for slots seems to achieve about 10.5% space overhead vs. the information-theoretic minimum based on its observed FP rate with expected pockets of degradation. (FP rate is close to 1/256.) If targeting a higher FP rate with fewer solution columns, Homogeneous Ribbon can be even more space efficient, because the penalty from degradation is relatively smaller. If targeting a lower FP rate, Homogeneous Ribbon is less space efficient, as more allocated overhead is needed to keep the FP rate impact of degradation relatively under control. The new OptimizeHomogAtScale tool in ribbon_test helps to find these optimal allocation overheads for different numbers of solution columns and Ribbon widths, with 128-bit Ribbon apparently cutting space overheads in half vs. 64-bit.

Other misc item specifics:
* Ribbon APIs in util/ribbon_config.h now provide configuration data for not just 5% construction failure rate (95% success), but also 50% and 0.1%.
* Note that the Ribbon structure does not exhibit "threshold" behavior as the standard Xor filter does, so there is a roughly fixed space penalty to cut construction failure rate in half. Thus, there isn't really an "almost sure" setting.
* Although we can extrapolate settings for large filters, we don't have a good formula for configuring smaller filters (< 2^17 slots or so), and efforts to summarize with a formula have failed. Thus, small data is hard-coded from the updated FindOccupancy tool.
* Enhances ApproximateNumEntries for public API Ribbon using more precise data (new API GetNumToAdd), thus a more accurate but not perfect reversal of CalculateSpace. (bloom_test updated to expect the greater precision)
* Move EndianSwapValue from coding.h to coding_lean.h to keep Ribbon code easily transferable from RocksDB
* Add some missing 'const' to member functions
* Small optimization to 128-bit BitParity
* Small refactoring of BandingStorage in ribbon_alg.h to support Homogeneous Ribbon
* CompactnessAndBacktrackAndFpRate now has an "expand" test: on construction failure, a possible alternative to re-seeding hash functions is simply to increase the number of slots (allocated space overhead) and try again with essentially the same hash values. (Start locations will be different roundings of the same scaled hash values--because fastrange not mod.) This seems to be as effective or more effective than re-seeding, as long as we increase the number of slots (m) by roughly m += m/w where w is the Ribbon width. This way, there is effectively an expansion by one slot for each ribbon-width window in the banding. (This approach assumes that getting "bad data" from your hash function is as unlikely as it naturally should be, e.g. no adversary.)
* 32-bit and 16-bit Ribbon configurations are added to ribbon_test for understanding their behavior, e.g. with FindOccupancy. They are not considered useful at this time and not tested with CompactnessAndBacktrackAndFpRate.

Pull Request resolved: facebook#7879

Test Plan: unit test updates included

Reviewed By: jay-zhuang

Differential Revision: D26371245

Pulled By: pdillinger

fbshipit-source-id: da6600d90a3785b99ad17a88b2a3027710b4ea3a
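To illustrate the "expand" note in the summary above (a minimal sketch, not RocksDB code; it assumes the GCC/Clang unsigned __int128 extension): because start locations come from a fastrange-style scaling of the hash into [0, m) rather than hash % m, growing m by roughly m/w just re-scales every key's start location proportionally, so entries keep their relative order and spacing and construction can be retried with essentially the same hash values.

#include <cstdint>
#include <iostream>

// fastrange-style mapping of a 64-bit hash into [0, range)
uint64_t FastRange64(uint64_t hash, uint64_t range) {
  return static_cast<uint64_t>(
      (static_cast<unsigned __int128>(hash) * range) >> 64);
}

int main() {
  const uint64_t hashes[] = {0x1111111111111111ULL, 0x7777777777777777ULL,
                             0xCCCCCCCCCCCCCCCCULL};
  const uint64_t w = 64;    // Ribbon width (illustrative)
  uint64_t m = 1000;        // original number of slots
  uint64_t m2 = m + m / w;  // expanded per the m += m/w suggestion
  for (uint64_t h : hashes) {
    // starts shift only slightly and keep their relative order and spacing
    std::cout << "start at m=" << m << ": " << FastRange64(h, m)
              << ", at m=" << m2 << ": " << FastRange64(h, m2) << "\n";
  }
}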