Refine Ribbon configuration, improve testing, add Homogeneous #7879

Closed

@pdillinger pdillinger commented Jan 18, 2021

Summary: This change only affects non-schema-critical aspects of the production candidate Ribbon filter. Specifically, it refines choice of internal configuration parameters based on inputs. The changes are minor enough that the schema tests in bloom_test, some of which depend on this, are unaffected. There are also some minor optimizations and refactorings.

This would be a schema change for "smash" Ribbon, to fix some known issues with small filters, but "smash" Ribbon is not accessible in public APIs. Unit test CompactnessAndBacktrackAndFpRate updated to test small and medium-large filters. Run with --thoroughness=100 or so for much better detection power (not appropriate for continuous regression testing).

Homogeneous Ribbon:
This change internally adds a Ribbon filter variant we call Homogeneous Ribbon, developed in collaboration with Stefan Walzer. The expected "result" value for every key is zero, instead of being computed from a hash. The entropy that keeps queries from being false positives comes from free variables ("overhead") in the solution structure, which are populated pseudorandomly. Construction is slightly faster because result values are not tracked, and it never fails. Instead, the FP rate can jump up whenever and wherever entries are packed too tightly. For small structures, we can choose overhead to make this FP rate jump unlikely, as seen in the updated unit test CompactnessAndBacktrackAndFpRate.

Unlike standard Ribbon, Homogeneous Ribbon seems to scale to an arbitrary number of keys when accepting an FP rate penalty for small pockets of high FP rate in the structure. For example, 64-bit Ribbon with 8 solution columns and 10% allocated space overhead for slots seems to achieve about 10.5% space overhead vs. the information-theoretic minimum, based on its observed FP rate with expected pockets of degradation. (FP rate is close to 1/256.) If targeting a higher FP rate with fewer solution columns, Homogeneous Ribbon can be even more space efficient, because the penalty from degradation is relatively smaller. If targeting a lower FP rate, Homogeneous Ribbon is less space efficient, as more allocated overhead is needed to keep the FP rate impact of degradation relatively under control. The new OptimizeHomogAtScale tool in ribbon_test helps to find these optimal allocation overheads for different numbers of solution columns and Ribbon widths, with 128-bit Ribbon apparently cutting space overheads in half vs. 64-bit.
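
The arithmetic behind the 10% vs. 10.5% figures can be sketched as follows. This is only an illustration, assuming "space overhead" means bits per key divided by the information-theoretic minimum of log2(1/FP rate) bits per key, minus one. The function names are made up for this example and are not part of the change.

```cpp
#include <cassert>
#include <cmath>

// Bits per key for a filter storing `solution_columns` bits per slot with
// (1 + slot_overhead) slots allocated per key.
double BitsPerKey(int solution_columns, double slot_overhead) {
  assert(solution_columns > 0 && slot_overhead >= 0.0);
  return solution_columns * (1.0 + slot_overhead);
}

// Space overhead relative to the information-theoretic minimum of
// log2(1 / fp_rate) bits per key (assumed definition, see lead-in).
double SpaceOverhead(double bits_per_key, double fp_rate) {
  assert(fp_rate > 0.0 && fp_rate < 1.0);
  return bits_per_key / std::log2(1.0 / fp_rate) - 1.0;
}

// FP rate implied by a given bits-per-key budget and overall space overhead.
double ImpliedFpRate(double bits_per_key, double space_overhead) {
  return std::exp2(-bits_per_key / (1.0 + space_overhead));
}
```

With 8 solution columns and 10% slot overhead this gives 8.8 bits per key. At an FP rate of exactly 1/256 the overhead would be 10%. An overall overhead of 10.5% corresponds to an FP rate slightly above 1/256 (roughly 1/250), which matches the "close to 1/256" remark.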

Other misc item specifics:

  • Ribbon APIs in util/ribbon_config.h now provide configuration data for not just 5% construction failure rate (95% success), but also 50% and 0.1%.
    • Note that the Ribbon structure does not exhibit "threshold" behavior as standard Xor filter does, so there is a roughly fixed space penalty to cut construction failure rate in half. Thus, there isn't really an "almost sure" setting.
    • Although we can extrapolate settings for large filters, we don't have a good formula for configuring smaller filters (< 2^17 slots or so), and efforts to summarize with a formula have failed. Thus, small data is hard-coded from updated FindOccupancy tool.
  • Enhances ApproximateNumEntries for public API Ribbon using more precise data (new API GetNumToAdd), thus a more accurate but not perfect reversal of CalculateSpace. (bloom_test updated to expect the greater precision)
  • Move EndianSwapValue from coding.h to coding_lean.h to keep Ribbon code easily transferable from RocksDB
  • Add some missing 'const' to member functions
  • Small optimization to 128-bit BitParity
  • Small refactoring of BandingStorage in ribbon_alg.h to support Homogeneous Ribbon
  • CompactnessAndBacktrackAndFpRate now has an "expand" test: on construction failure, a possible alternative to re-seeding hash functions is simply to increase the number of slots (allocated space overhead) and try again with essentially the same hash values. (Start locations will be different roundings of the same scaled hash values--because fastrange not mod.) This seems to be as effective or more effective than re-seeding, as long as we increase the number of slots (m) by roughly m += m/w where w is the Ribbon width. This way, there is effectively an expansion by one slot for each ribbon-width window in the banding. (This approach assumes that getting "bad data" from your hash function is as unlikely as it naturally should be, e.g. no adversary.)
  • 32-bit and 16-bit Ribbon configurations are added to ribbon_test for understanding their behavior, e.g. with FindOccupancy. They are not considered useful at this time and not tested with CompactnessAndBacktrackAndFpRate.
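
The "expand" strategy from the list above can be sketched roughly like this. It is a simplified illustration, not the code in ribbon_test: `FastRange32Sketch`, `ExpandNumSlots`, `BandWithExpansion`, and the `try_banding` callback are hypothetical names, and the real code derives the number of start positions from the slot count and Ribbon width rather than using the slot count directly.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-and-shift range reduction ("fastrange"): maps a 32-bit hash onto
// [0, range). The same hash lands on a differently rounded position when
// range grows, unlike with modulo.
inline uint32_t FastRange32Sketch(uint32_t hash, uint32_t range) {
  return static_cast<uint32_t>((static_cast<uint64_t>(hash) * range) >> 32);
}

// One expansion step: m += m / w, i.e. about one extra slot per
// ribbon-width window.
inline uint32_t ExpandNumSlots(uint32_t num_slots, uint32_t ribbon_width) {
  assert(ribbon_width > 0);
  return num_slots + num_slots / ribbon_width;
}

// Retry banding with the same hashes but more slots. `try_banding` is a
// hypothetical callback returning true if banding succeeds for that size.
// Returns the slot count that worked, or 0 if every attempt failed.
template <typename TryBanding>
uint32_t BandWithExpansion(uint32_t num_slots, uint32_t ribbon_width,
                           int max_attempts, TryBanding try_banding) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (try_banding(num_slots)) {
      return num_slots;
    }
    num_slots = ExpandNumSlots(num_slots, ribbon_width);
  }
  return 0;
}
```

For example, with m = 6400 and w = 64, one step gives 6500 slots, and a hash of 0x80000000 moves from position 3200 to position 3250.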

Test Plan: unit test updates included

@pdillinger pdillinger marked this pull request as ready for review January 20, 2021 19:25
Comment on lines +56 to +64
// // When true, enables a special "homogeneous" filter implementation that
// // is slightly faster to construct, and never fails to construct though
// // FP rate can quickly explode in cases where corresponding
// // non-homogeneous filter would fail (or nearly fail?) to construct.
// // For smaller filters, you can configure with ConstructionFailureChance
// // smaller than desired FP rate to largely counteract this effect.
// // TODO: configuring Homogeneous Ribbon for arbitrarily large filters
// // based on data from OptimizeHomogAtScale
// static constexpr bool kHomogeneous;
Contributor

Why not just have a special hash function always return 0?

Contributor Author

I don't know exactly what you mean, but for sharing the same general algorithms, GetResultRowFromHash always returns 0 when kHomogeneous. Regardless of kHomogeneous, we work from a 64-bit hash of the input key for start position and coefficient row block, being smart about how we re-use/re-mix that hash info.

Contributor

My question was why we need a special flag kHomogeneous, since it's just a special case of Ribbon in which the hash function always returns 0.

Contributor Author

This flag also controls whether we populate free variable rows in the solution pseudorandomly or simply as zeros.

I would agree with this: in RocksDB, we generally prefer dynamic if-based configuration to static template-based configuration. This is good for simplicity, easier testing, avoiding explosion of compiled code size, etc. We do not optimize performance at all software engineering costs.

If this were a case of implementing a known algorithm for use in RocksDB, where we would just implement the subset of features we want, I would generally prefer that approach, like I did with the FastLocalBloom. But this is a new algorithm and I wanted to create a generic reference implementation. We aren't yet sure which subset of features we will want to keep for long-term use in RocksDB. And even if we aren't sensitive to performance down to the ns, some people are, and we want to show the best numbers we reasonably can for a paper.
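
As a rough illustration of the two things the flag controls in this thread (the result row and how free-variable rows are filled), here is a minimal sketch. Only the name GetResultRowFromHash comes from the discussion. `SketchSettings`, `FreeVariableRow`, the 8-bit result type, the choice of hash bits, and the xorshift generator are all assumptions made for this example.

```cpp
#include <cstdint>

template <bool kHomogeneous>
struct SketchSettings {
  // Expected "result" bits for a key: always zero when homogeneous,
  // otherwise taken from the hash (here, hypothetically, the top 8 bits).
  static uint8_t GetResultRowFromHash(uint64_t hash) {
    if (kHomogeneous) {
      return 0;
    }
    return static_cast<uint8_t>(hash >> 56);
  }

  // Value stored in a free-variable row of the solution: pseudorandom when
  // homogeneous, zero otherwise. The generator is one xorshift64 step and
  // stands in for whatever the real implementation uses.
  static uint8_t FreeVariableRow(uint64_t* rng_state) {
    if (!kHomogeneous) {
      return 0;
    }
    uint64_t x = *rng_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *rng_state = x;
    return static_cast<uint8_t>(x);
  }
};
```

A static flag like this lets the compiler drop the unused branch in each instantiation, which is the performance argument made above. A dynamic `if` would behave the same but keep both paths in one compiled function.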

@@ -7,7 +7,9 @@

#include <cmath>

#include "port/lang.h" // for FALLTHROUGH_INTENDED
Contributor

Is this needed? Seems FALLTHROUGH_INTENDED is not used here.

Comment on lines +132 to +152
static uint32_t GetNumToAdd(
    uint32_t num_slots,
    ConstructionFailureChance max_failure = kDefaultFailureChance) {
  switch (max_failure) {
    default:
      assert(false);
      FALLTHROUGH_INTENDED;
    case kOneIn20: {
      using H1 = BandingConfigHelper1TS<kOneIn20, TypesAndSettings>;
      return H1::GetNumToAdd(num_slots);
    }
    case kOneIn2: {
      using H1 = BandingConfigHelper1TS<kOneIn2, TypesAndSettings>;
      return H1::GetNumToAdd(num_slots);
    }
    case kOneIn1000: {
      using H1 = BandingConfigHelper1TS<kOneIn1000, TypesAndSettings>;
      return H1::GetNumToAdd(num_slots);
    }
  }
}
Contributor

Is this function needed?

Contributor Author

Not at the moment, but I could see it being useful for when space-time trade-off is selected dynamically. (Part of trying to build a general API.)

Comment on lines 396 to 453
template <ConstructionFailureChance kCfc, uint64_t kCoeffBits, bool kUseSmash,
          bool kHomogeneous>
uint32_t BandingConfigHelper1MaybeSupported<
    kCfc, kCoeffBits, kUseSmash, kHomogeneous,
    true /* kIsSupported */>::GetNumSlots(uint32_t num_to_add) {
  using Data = detail::BandingConfigHelperData<kCfc, kCoeffBits, kUseSmash>;

  if (num_to_add == 0) {
    return 0;
  }
  if (kHomogeneous) {
    // Reverse of above in GetNumToAdd
    num_to_add += 8;
  }
  double log2_num_to_add = std::log(num_to_add) * 1.4426950409;
  uint32_t approx_log2_slots = static_cast<uint32_t>(log2_num_to_add + 0.5);
  assert(approx_log2_slots <= 32);  // help clang-analyze

  double lower_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots);
  double upper_num_to_add;
  if (approx_log2_slots == 0 || lower_num_to_add == /* unsupported */ 0) {
    // Return minimum non-zero slots in standard implementation
    return kUseSmash ? kCoeffBits : 2 * kCoeffBits;
  } else if (num_to_add < lower_num_to_add) {
    upper_num_to_add = lower_num_to_add;
    --approx_log2_slots;
    lower_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots);
  } else {
    upper_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots + 1);
  }

  assert(num_to_add >= lower_num_to_add);
  assert(num_to_add < upper_num_to_add);

  double upper_portion =
      (num_to_add - lower_num_to_add) / (upper_num_to_add - lower_num_to_add);

  double lower_num_slots = 1.0 * (uint64_t{1} << approx_log2_slots);

  // Interpolation, round up
  return static_cast<uint32_t>(upper_portion * lower_num_slots +
                               lower_num_slots + 0.999999999);
}

template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ false,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ true,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ false,
                                                   /*hm*/ true, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ true,
                                                   /*hm*/ true, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ false,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ true,
                                                   /*hm*/ false, /*sup*/ true>;
template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ false,
                                                   /*hm*/ true, /*sup*/ true>;
Contributor

Templates can be hard to read for a n00b like me. I'm wondering why they're preferred for ConfigHelper?

Contributor Author

I may have gone too far in trying to minimize space (compiled code size) & time (looking up the data). Currently all the data is about 2KB, so I probably shouldn't worry about that. I'll revise.

Contributor Author

Actually, I've tried cutting down on the templates while hiding as much implementation detail in the .cc file as I can. In order to interpret the TypesAndSettings, to avoid people mixing up settings during configuration, you still need a lot of boilerplate in the .h file to translate from template parameters to dynamic values passed to a function in the .cc file. And then more boilerplate in the .cc file to share those settings between the various helper functions. I'm finding it not attractive enough to throw out this version, where we at least get time & space optimization for all the boilerplate.

I'll add some more comments about template instantiation.
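
For readers who find the templated version hard to follow, the interpolation in GetNumSlots can be sketched without templates. This is a simplified stand-in, not the real helper: `SketchNumToAddForPow2` replaces Data::GetNumToAddForPow2 with a made-up rule (90% of slots usable at every size), and the special cases for homogeneous filters and unsupported sizes are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical replacement for the hard-coded table: how many entries can
// be added to a filter with 2^log2_slots slots.
double SketchNumToAddForPow2(uint32_t log2_slots) {
  return 0.9 * std::ldexp(1.0, static_cast<int>(log2_slots));
}

// Finds the two neighboring power-of-two sizes whose capacities bracket
// num_to_add, then interpolates linearly between them and rounds up.
uint32_t SketchGetNumSlots(uint32_t num_to_add) {
  if (num_to_add == 0) {
    return 0;
  }
  uint32_t approx_log2_slots = static_cast<uint32_t>(
      std::log2(static_cast<double>(num_to_add)) + 0.5);

  double lower_num_to_add = SketchNumToAddForPow2(approx_log2_slots);
  double upper_num_to_add;
  if (num_to_add < lower_num_to_add) {
    // Rounded the exponent up too far; step down one power of two.
    upper_num_to_add = lower_num_to_add;
    --approx_log2_slots;
    lower_num_to_add = SketchNumToAddForPow2(approx_log2_slots);
  } else {
    upper_num_to_add = SketchNumToAddForPow2(approx_log2_slots + 1);
  }
  assert(num_to_add >= lower_num_to_add);
  assert(num_to_add < upper_num_to_add);

  double upper_portion =
      (num_to_add - lower_num_to_add) / (upper_num_to_add - lower_num_to_add);
  double lower_num_slots = std::ldexp(1.0, static_cast<int>(approx_log2_slots));

  // Interpolation, round up
  return static_cast<uint32_t>(upper_portion * lower_num_slots +
                               lower_num_slots + 0.999999999);
}
```

With the made-up 90% rule, asking for 900 entries interpolates between 512 and 1024 slots and yields 1000 slots.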

Comment on lines 423 to 424
: 6U + static_cast<uint32_t>(kCoeffBits / 16) +
std::max(log2_thoroughness, uint32_t{5});
Contributor

Question: why is this a reasonable number of entries to add?

Contributor Author

I'll add some comments


const uint32_t log2_min_add =
    static_cast<uint32_t>(ROCKSDB_NAMESPACE::FloorLog2(
        static_cast<uint32_t>(0.85 * SimpleSoln::RoundUpNumSlots(1))));
Contributor

why 0.85 here?

Contributor Author

I'll add some comments

Comment on lines 472 to 476
// pick a power of two scale uniformly, with a minimum so
// that minimum size is not over-tested due to rounding up
uint32_t log2_add =
    static_cast<uint32_t>(3.14159 * i) % (log2_max_add - log2_min_add) +
    log2_min_add;
Contributor

Why is this described as picking a power-of-two scale uniformly? Isn't it evenly distributed over [log2_min_add, log2_max_add]?

Contributor Author

I'll extend the comment, perhaps simplify.
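
One way to read the expression under discussion: it spreads the exponent log2_add roughly evenly over [log2_min_add, log2_max_add), so the resulting sizes are spread evenly on a log scale rather than a linear one. The small sketch below tallies which exponents the expression picks. `CountLog2AddChoices` is a helper invented for this illustration and is not part of the test.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts how often each exponent in [log2_min_add, log2_max_add) is chosen
// by the expression from the test, for i = 0 .. iterations - 1.
std::vector<uint32_t> CountLog2AddChoices(uint32_t log2_min_add,
                                          uint32_t log2_max_add,
                                          uint32_t iterations) {
  assert(log2_max_add > log2_min_add);
  std::vector<uint32_t> counts(log2_max_add - log2_min_add, 0);
  for (uint32_t i = 0; i < iterations; ++i) {
    uint32_t log2_add =
        static_cast<uint32_t>(3.14159 * i) % (log2_max_add - log2_min_add) +
        log2_min_add;
    ++counts[log2_add - log2_min_add];
  }
  return counts;
}
```

Note that the modulo makes log2_max_add itself an exclusive upper bound: the largest exponent chosen is log2_max_add - 1.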

// TODO: unit tests for small filter FP rates
// Not a real test, but a tool to understand Homogeneous Ribbon
// behavior (TODO: configuration APIs & tests)
TYPED_TEST(RibbonTypeParamTest, OptimizeHomogAtScale) {
Contributor

As it's not a unit test, should it be disabled?

Contributor Author

To me DISABLED means "TODO: fix and re-enable this test." There's nothing to fix and re-enable here.

Contributor

As it's not a unit test, the benefit of marking it DISABLED would be removing it from the test run output. But anyway, it's okay either way.

jay-zhuang commented Jan 25, 2021

I benchmarked the new filter: https://github.com/jay-zhuang/rocksdb/pull/5/files#diff-05e2b61c9411c11e9832b1eb0defc90d480e3d45ff4a2d08417abe1dbabeccccR56
Here are the findings:

  • ➕ Confirms the ~30% space savings with the default 10 bits_per_key setting.
  • ➕ More space savings with higher bits_per_key (~34% with 20 bits_per_key).
  • The build speed is slower (~5x for 1M keys).
  • ➕ The FP rate is almost the same as expected or even better.
  • Positive query time is increased by about 80%.
  • Negative query time is increased by about 50%.
  • ➕ Ribbon has better negative query performance, which is good for us, as most of our queries should be negative.

Here is the benchmark result details:
https://gist.github.com/jay-zhuang/e0016cf42e27e776aeb661e469d5b4ae

Do these match your expectations?

@jay-zhuang jay-zhuang left a comment

👍

@facebook-github-bot facebook-github-bot left a comment

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot facebook-github-bot left a comment

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@pdillinger merged this pull request in a8b3b9a.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
…ok#7879)

Pull Request resolved: facebook#7879

Test Plan: unit test updates included

Reviewed By: jay-zhuang

Differential Revision: D26371245

Pulled By: pdillinger

fbshipit-source-id: da6600d90a3785b99ad17a88b2a3027710b4ea3a