Refine Ribbon configuration, improve testing, add Homogeneous #7879

pdillinger · 2021-01-18T07:33:45Z

Summary: This change only affects non-schema-critical aspects of the production candidate Ribbon filter. Specifically, it refines choice of internal configuration parameters based on inputs. The changes are minor enough that the schema tests in bloom_test, some of which depend on this, are unaffected. There are also some minor optimizations and refactorings.

This would be a schema change for "smash" Ribbon, to fix some known issues with small filters, but "smash" Ribbon is not accessible in public APIs. Unit test CompactnessAndBacktrackAndFpRate updated to test small and medium-large filters. Run with --thoroughness=100 or so for much better detection power (not appropriate for continuous regression testing).

Homogeneous Ribbon:
This change adds internally a Ribbon filter variant we call Homogeneous Ribbon, in collaboration with Stefan Walzer. The expected "result" value for every key is zero, instead of computed from a hash. Entropy for queries not to be false positives comes from free variables ("overhead") in the solution structure, which are populated pseudorandomly. Construction is slightly faster for not tracking result values, and never fails. Instead, FP rate can jump up whenever and wherever entries are packed too tightly. For small structures, we can choose overhead to make this FP rate jump unlikely, as seen in updated unit test CompactnessAndBacktrackAndFpRate.

Unlike standard Ribbon, Homogeneous Ribbon seems to scale to arbitrary number of keys when accepting an FP rate penalty for small pockets of high FP rate in the structure. For example, 64-bit ribbon with 8 solution columns and 10% allocated space overhead for slots seems to achieve about 10.5% space overhead vs. information-theoretic minimum based on its observed FP rate with expected pockets of degradation. (FP rate is close to 1/256.) If targeting a higher FP rate with fewer solution columns, Homogeneous Ribbon can be even more space efficient, because the penalty from degradation is relatively smaller. If targeting a lower FP rate, Homogeneous Ribbon is less space efficient, as more allocated overhead is needed to keep the FP rate impact of degradation relatively under control. The new OptimizeHomogAtScale tool in ribbon_test helps to find these optimal allocation overheads for different numbers of solution columns and Ribbon widths, with 128-bit Ribbon apparently cutting space overheads in half vs. 64-bit.
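The space-overhead arithmetic in the example above can be checked with a small sketch. This is not code from the PR; `BitsPerKey` and `OverheadVsMinimum` are hypothetical helper names, and the sketch assumes that each solution column costs one bit per slot and that the information-theoretic minimum for FP rate f is log2(1/f) bits per key.

```cpp
#include <cassert>
#include <cmath>

// Bits actually stored per key, assuming one bit per slot per solution
// column and `slot_overhead` extra slots per key (e.g. 0.10 for 10%).
inline double BitsPerKey(int solution_columns, double slot_overhead) {
  assert(solution_columns > 0 && slot_overhead >= 0.0);
  return solution_columns * (1.0 + slot_overhead);
}

// Space overhead relative to the information-theoretic minimum of
// log2(1/fp_rate) bits per key for a filter with that FP rate.
inline double OverheadVsMinimum(double bits_per_key, double fp_rate) {
  assert(fp_rate > 0.0 && fp_rate < 1.0);
  return bits_per_key / std::log2(1.0 / fp_rate) - 1.0;
}
```

With 8 columns and 10% slot overhead, `BitsPerKey(8, 0.10)` is 8.8. At an FP rate of exactly 1/256 the overhead would be 10%; an observed FP rate of roughly 1/250 gives about 10.5%, which is consistent with "close to 1/256" in the description.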

Other misc item specifics:

* Ribbon APIs in util/ribbon_config.h now provide configuration data for not just 5% construction failure rate (95% success), but also 50% and 0.1%.
  * Note that the Ribbon structure does not exhibit "threshold" behavior as standard Xor filter does, so there is a roughly fixed space penalty to cut construction failure rate in half. Thus, there isn't really an "almost sure" setting.
  * Although we can extrapolate settings for large filters, we don't have a good formula for configuring smaller filters (< 2^17 slots or so), and efforts to summarize with a formula have failed. Thus, small data is hard-coded from updated FindOccupancy tool.
* Enhances ApproximateNumEntries for public API Ribbon using more precise data (new API GetNumToAdd), thus a more accurate but not perfect reversal of CalculateSpace. (bloom_test updated to expect the greater precision)
* Move EndianSwapValue from coding.h to coding_lean.h to keep Ribbon code easily transferable from RocksDB
* Add some missing 'const' to member functions
* Small optimization to 128-bit BitParity
* Small refactoring of BandingStorage in ribbon_alg.h to support Homogeneous Ribbon
* CompactnessAndBacktrackAndFpRate now has an "expand" test: on construction failure, a possible alternative to re-seeding hash functions is simply to increase the number of slots (allocated space overhead) and try again with essentially the same hash values. (Start locations will be different roundings of the same scaled hash values--because fastrange not mod.) This seems to be as effective or more effective than re-seeding, as long as we increase the number of slots (m) by roughly m += m/w where w is the Ribbon width. This way, there is effectively an expansion by one slot for each ribbon-width window in the banding. (This approach assumes that getting "bad data" from your hash function is as unlikely as it naturally should be, e.g. no adversary.)
* 32-bit and 16-bit Ribbon configurations are added to ribbon_test for understanding their behavior, e.g. with FindOccupancy. They are not considered useful at this time and not tested with CompactnessAndBacktrackAndFpRate.
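The "expand" idea can be illustrated with a minimal sketch. It is not taken from the PR: `ToyFastRange32` is a standalone version of the well-known multiply-and-shift range reduction, `ExpandSlots` is a hypothetical helper for m += m/w, and the sketch maps hashes onto the full slot count rather than onto the number of valid start positions, which is a simplification.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-and-shift range reduction ("fastrange"): maps a 32-bit hash onto
// [0, range) by scaling instead of taking a remainder.
inline uint32_t ToyFastRange32(uint32_t hash, uint32_t range) {
  return static_cast<uint32_t>((uint64_t{hash} * range) >> 32);
}

// Hypothetical helper: grow the slot count by about one slot per
// ribbon-width window, i.e. m += m / w.
inline uint32_t ExpandSlots(uint32_t num_slots, uint32_t ribbon_width) {
  assert(ribbon_width > 0);
  return num_slots + num_slots / ribbon_width;
}
```

For example, with m = 6400 and w = 64 the expanded size is 6500. A hash of 0x80000000 (halfway through the 32-bit range) lands at slot 3200 before and 3250 after, so keys keep their relative position while gaining a little room between neighbors. With a mod-based mapping the new positions would instead be unrelated to the old ones.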

Test Plan: unit test updates included


jay-zhuang · 2021-01-21T19:13:52Z

util/ribbon_impl.h

+//   // When true, enables a special "homogeneous" filter implementation that
+//   // is slightly faster to construct, and never fails to construct though
+//   // FP rate can quickly explode in cases where corresponding
+//   // non-homogeneous filter would fail (or nearly fail?) to construct.
+//   // For smaller filters, you can configure with ConstructionFailureChance
+//   // smaller than desired FP rate to largely counteract this effect.
+//   // TODO: configuring Homogeneous Ribbon for arbitrarily large filters
+//   // based on data from OptimizeHomogAtScale
+//   static constexpr bool kHomogeneous;


Why not just have a special hash function that always returns 0?

I don't know exactly what you mean, but for sharing the same general algorithms, GetResultRowFromHash always returns 0 when kHomogeneous. Regardless of kHomogeneous, we work from a 64-bit hash of the input key for start position and coefficient row block, being smart about how we re-use/re-mix that hash info.

My question was why we need a special flag kHomogeneous, as it's just a special case of Ribbon where the hash function always returns 0.

This flag also controls whether we populate free variable rows in the solution pseudorandomly or simply as zeros.

I would agree with this: in RocksDB, we generally prefer dynamic if-based configuration to static template-based configuration. This is good for simplicity, easier testing, avoiding explosion of compiled code size, etc. We do not optimize performance at all software engineering costs.

If this were a case of implementing a known algorithm for use in RocksDB, where we would just implement the subset of features we want, I would generally prefer that approach, like I did with the FastLocalBloom. But this is a new algorithm and I wanted to create a generic reference implementation. We aren't yet sure which subset of features we will want to keep for long-term use in RocksDB. And even if we aren't sensitive to performance down to the ns, some people are, and we want to show the best numbers we reasonably can for a paper.
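The two roles of the flag discussed in this thread (zero result rows, and pseudorandom rather than zero free-variable rows) can be sketched as follows. This is a toy illustration, not the code in util/ribbon_impl.h: `ToySettings`, `ToyHasher`, `FreeRowValue`, the 8-bit result row, and the mixing constant are all assumptions made for the example; only the names kHomogeneous and GetResultRowFromHash come from the discussion.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a TypesAndSettings-style policy class (hypothetical).
template <bool kIsHomogeneous>
struct ToySettings {
  static constexpr bool kHomogeneous = kIsHomogeneous;
};

template <typename Settings>
struct ToyHasher {
  // Expected "result" bits for a key: always zero when homogeneous,
  // otherwise derived from the hash (here simply its top 8 bits).
  static uint8_t GetResultRowFromHash(uint64_t hash) {
    if (Settings::kHomogeneous) {
      return 0;
    }
    return static_cast<uint8_t>(hash >> 56);
  }

  // Value for a free (unconstrained) solution row: pseudorandom when
  // homogeneous, zero otherwise. The multiplier is an arbitrary odd constant.
  static uint8_t FreeRowValue(uint64_t slot_index) {
    if (!Settings::kHomogeneous) {
      return 0;
    }
    uint64_t mixed = (slot_index + 1) * 0x9E3779B97F4A7C15ULL;
    return static_cast<uint8_t>(mixed >> 56);
  }
};
```

Because the flag is a compile-time constant, the branches not taken can be dropped by the compiler. This is the static, template-based style discussed above, as opposed to a runtime `if` on a configuration value.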

jay-zhuang · 2021-01-23T21:51:44Z

util/ribbon_impl.h

@@ -7,7 +7,9 @@

 #include <cmath>

+#include "port/lang.h"  // for FALLTHROUGH_INTENDED


Is this needed? Seems FALLTHROUGH_INTENDED is not used here.

jay-zhuang · 2021-01-23T22:04:18Z

util/ribbon_config.h

+  static uint32_t GetNumToAdd(
+      uint32_t num_slots,
+      ConstructionFailureChance max_failure = kDefaultFailureChance) {
+    switch (max_failure) {
+      default:
+        assert(false);
+        FALLTHROUGH_INTENDED;
+      case kOneIn20: {
+        using H1 = BandingConfigHelper1TS<kOneIn20, TypesAndSettings>;
+        return H1::GetNumToAdd(num_slots);
+      }
+      case kOneIn2: {
+        using H1 = BandingConfigHelper1TS<kOneIn2, TypesAndSettings>;
+        return H1::GetNumToAdd(num_slots);
+      }
+      case kOneIn1000: {
+        using H1 = BandingConfigHelper1TS<kOneIn1000, TypesAndSettings>;
+        return H1::GetNumToAdd(num_slots);
+      }
+    }
+  }


Is this function needed?

Not at the moment, but I could see it being useful for when space-time trade-off is selected dynamically. (Part of trying to build a general API.)

jay-zhuang · 2021-01-23T22:19:08Z

util/ribbon_config.cc

+template <ConstructionFailureChance kCfc, uint64_t kCoeffBits, bool kUseSmash,
+          bool kHomogeneous>
+uint32_t BandingConfigHelper1MaybeSupported<
+    kCfc, kCoeffBits, kUseSmash, kHomogeneous,
+    true /* kIsSupported */>::GetNumSlots(uint32_t num_to_add) {
+  using Data = detail::BandingConfigHelperData<kCfc, kCoeffBits, kUseSmash>;
+
+  if (num_to_add == 0) {
+    return 0;
+  }
+  if (kHomogeneous) {
+    // Reverse of above in GetNumToAdd
+    num_to_add += 8;
+  }
+  double log2_num_to_add = std::log(num_to_add) * 1.4426950409;
+  uint32_t approx_log2_slots = static_cast<uint32_t>(log2_num_to_add + 0.5);
+  assert(approx_log2_slots <= 32);  // help clang-analyze
+
+  double lower_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots);
+  double upper_num_to_add;
+  if (approx_log2_slots == 0 || lower_num_to_add == /* unsupported */ 0) {
+    // Return minimum non-zero slots in standard implementation
+    return kUseSmash ? kCoeffBits : 2 * kCoeffBits;
+  } else if (num_to_add < lower_num_to_add) {
+    upper_num_to_add = lower_num_to_add;
+    --approx_log2_slots;
+    lower_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots);
+  } else {
+    upper_num_to_add = Data::GetNumToAddForPow2(approx_log2_slots + 1);
+  }
+
+  assert(num_to_add >= lower_num_to_add);
+  assert(num_to_add < upper_num_to_add);
+
+  double upper_portion =
+      (num_to_add - lower_num_to_add) / (upper_num_to_add - lower_num_to_add);
+
+  double lower_num_slots = 1.0 * (uint64_t{1} << approx_log2_slots);
+
+  // Interpolation, round up
+  return static_cast<uint32_t>(upper_portion * lower_num_slots +
+                               lower_num_slots + 0.999999999);
+}
+
+template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ false,
+                                                   /*hm*/ false, /*sup*/ true>;
+template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ true,
+                                                   /*hm*/ false, /*sup*/ true>;
+template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ false,
+                                                   /*hm*/ true, /*sup*/ true>;
+template struct BandingConfigHelper1MaybeSupported<kOneIn2, 128U, /*sm*/ true,
+                                                   /*hm*/ true, /*sup*/ true>;
+template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ false,
+                                                   /*hm*/ false, /*sup*/ true>;
+template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ true,
+                                                   /*hm*/ false, /*sup*/ true>;
+template struct BandingConfigHelper1MaybeSupported<kOneIn2, 64U, /*sm*/ false,
+                                                   /*hm*/ true, /*sup*/ true>;


Templates can be hard to read for a n00b like me. I'm wondering why they are preferred for ConfigHelper?

I may have gone too far in trying to minimize space (compiled code size) & time (looking up the data). Currently all the data is about 2KB, so I probably shouldn't worry about that. I'll revise.

Actually, I've tried cutting down on the templates while hiding as much implementation detail in the .cc file as I can. In order to interpret the TypesAndSettings, to avoid people mixing up settings during configuration, you still need a lot of boilerplate in the .h file to translate from template parameters to dynamic values passed to a function in the .cc file. And then more boilerplate in the .cc file to share those settings between the various helper functions. I'm finding it not attractive enough to throw out this version, where we at least get time & space optimization for all the boilerplate.

I'll add some more comments about template instantiation.
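For readers following the GetNumSlots excerpt above, here is a simplified, template-free sketch of the same interpolation idea. It is an illustration under stated assumptions, not the PR's code: `ToyNumToAddForPow2` is a made-up stand-in for the hard-coded table (it pretends every power-of-two size holds 95% as many entries as slots), and the homogeneous adjustment and the minimum-size special case are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the hard-coded table: pretend a filter with
// 2^log2_slots slots reliably holds 95% as many entries.
inline double ToyNumToAddForPow2(uint32_t log2_slots) {
  return 0.95 * static_cast<double>(uint64_t{1} << log2_slots);
}

// Map a desired number of entries to a slot count by linear interpolation
// between the two neighboring power-of-two table points, rounding up.
inline uint32_t ToyGetNumSlots(uint32_t num_to_add) {
  assert(num_to_add > 0 && num_to_add <= (uint32_t{1} << 30));
  uint32_t log2_slots =
      static_cast<uint32_t>(std::log2(static_cast<double>(num_to_add)) + 0.5);
  double lower = ToyNumToAddForPow2(log2_slots);
  double upper;
  if (num_to_add < lower) {
    // The rounded log2 overshot: step down to the interval below.
    upper = lower;
    --log2_slots;
    lower = ToyNumToAddForPow2(log2_slots);
  } else {
    upper = ToyNumToAddForPow2(log2_slots + 1);
  }
  // Fraction of the way from the lower table point to the upper one.
  double upper_portion = (num_to_add - lower) / (upper - lower);
  double lower_num_slots = static_cast<double>(uint64_t{1} << log2_slots);
  return static_cast<uint32_t>(upper_portion * lower_num_slots +
                               lower_num_slots + 0.999999999);
}
```

With this toy table the interpolation is exact, so `ToyGetNumSlots(950)` yields 1000 slots (950 / 0.95). The real table is not linear in this way, which is presumably why interpolation between hard-coded points is used instead of a formula.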

jay-zhuang · 2021-01-24T01:18:46Z

util/ribbon_test.cc

+                             : 6U + static_cast<uint32_t>(kCoeffBits / 16) +
+                                   std::max(log2_thoroughness, uint32_t{5});


Question: why is this a reasonable number of entries to add?

I'll add some comments

jay-zhuang · 2021-01-24T01:25:59Z

util/ribbon_test.cc

+
+  const uint32_t log2_min_add =
+      static_cast<uint32_t>(ROCKSDB_NAMESPACE::FloorLog2(
+          static_cast<uint32_t>(0.85 * SimpleSoln::RoundUpNumSlots(1))));


why 0.85 here?

I'll add some comments

jay-zhuang · 2021-01-24T01:50:09Z

util/ribbon_test.cc

+      // pick a power of two scale uniformly, with a minimum so
+      // that minimum size is not over-tested due to rounding up
+      uint32_t log2_add =
+          static_cast<uint32_t>(3.14159 * i) % (log2_max_add - log2_min_add) +
+          log2_min_add;


Why is this described as picking a power-of-two scale uniformly? Isn't it evenly distributed between [log2_min_add, log2_max_add]?

I'll extend the comment, perhaps simplify.
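One way to see what the expression in question does is to tabulate it. The sketch below is not from the PR; `CountLog2Scales` is a hypothetical helper that applies the same expression for i = 0 .. num_iterations - 1 and counts how often each exponent is chosen. Under this reading, the exponent log2_add is spread roughly evenly over [log2_min_add, log2_max_add), so the size 2^log2_add is uniform on a logarithmic scale rather than a linear one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Count how often each log2 scale in [log2_min_add, log2_max_add) is chosen
// by the expression from the test, for i = 0 .. num_iterations - 1.
inline std::vector<uint32_t> CountLog2Scales(uint32_t log2_min_add,
                                             uint32_t log2_max_add,
                                             uint32_t num_iterations) {
  assert(log2_max_add > log2_min_add);
  std::vector<uint32_t> counts(log2_max_add - log2_min_add, 0);
  for (uint32_t i = 0; i < num_iterations; ++i) {
    uint32_t log2_add =
        static_cast<uint32_t>(3.14159 * i) % (log2_max_add - log2_min_add) +
        log2_min_add;
    ++counts[log2_add - log2_min_add];
  }
  return counts;
}
```

Note that the modulus makes the upper bound exclusive: log2_max_add itself is never selected.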

jay-zhuang · 2021-01-24T02:17:45Z

util/ribbon_test.cc

-// TODO: unit tests for small filter FP rates
+// Not a real test, but a tool to understand Homogeneous Ribbon
+// behavior (TODO: configuration APIs & tests)
+TYPED_TEST(RibbonTypeParamTest, OptimizeHomogAtScale) {


As it's not a unit test, should it be disabled?

To me DISABLED means "TODO: fix and re-enable this test." There's nothing to fix and re-enable here.

As it's not a unit test, the benefit of marking it DISABLED would be removing it from the test run output. But anyway, it's okay either way.

jay-zhuang · 2021-01-25T18:29:52Z

I did benchmark for the new filter: https://github.com/jay-zhuang/rocksdb/pull/5/files#diff-05e2b61c9411c11e9832b1eb0defc90d480e3d45ff4a2d08417abe1dbabeccccR56
Here are the findings:

➕ Confirms the ~30% space savings with the default 10 bits_per_key setting.
➕ More space savings with higher bits_per_key (~34% with 20 bits_per_key).
The build is slower (~5x for 1M keys).
➕ The FP rate is almost the same as expected, or even better.
Positive query time increases by about 80%.
Negative query time increases by about 50%.
➕ Ribbon has better negative query performance, which is good for us, as most of our queries should be negative.

Here is the benchmark result details:
https://gist.github.com/jay-zhuang/e0016cf42e27e776aeb661e469d5b4ae

Do these match your expectations?

jay-zhuang

👍

facebook-github-bot

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-02-23T21:45:55Z

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-02-26T16:50:55Z

@pdillinger merged this pull request in a8b3b9a.

…ok#7879)

Summary: This change only affects non-schema-critical aspects of the production candidate Ribbon filter. Specifically, it refines choice of internal configuration parameters based on inputs. The changes are minor enough that the schema tests in bloom_test, some of which depend on this, are unaffected. There are also some minor optimizations and refactorings.

This would be a schema change for "smash" Ribbon, to fix some known issues with small filters, but "smash" Ribbon is not accessible in public APIs. Unit test CompactnessAndBacktrackAndFpRate updated to test small and medium-large filters. Run with --thoroughness=100 or so for much better detection power (not appropriate for continuous regression testing).

Homogeneous Ribbon: This change adds internally a Ribbon filter variant we call Homogeneous Ribbon, in collaboration with Stefan Walzer. The expected "result" value for every key is zero, instead of computed from a hash. Entropy for queries not to be false positives comes from free variables ("overhead") in the solution structure, which are populated pseudorandomly. Construction is slightly faster for not tracking result values, and never fails. Instead, FP rate can jump up whenever and wherever entries are packed too tightly. For small structures, we can choose overhead to make this FP rate jump unlikely, as seen in updated unit test CompactnessAndBacktrackAndFpRate.

Unlike standard Ribbon, Homogeneous Ribbon seems to scale to arbitrary number of keys when accepting an FP rate penalty for small pockets of high FP rate in the structure. For example, 64-bit ribbon with 8 solution columns and 10% allocated space overhead for slots seems to achieve about 10.5% space overhead vs. information-theoretic minimum based on its observed FP rate with expected pockets of degradation. (FP rate is close to 1/256.) If targeting a higher FP rate with fewer solution columns, Homogeneous Ribbon can be even more space efficient, because the penalty from degradation is relatively smaller. If targeting a lower FP rate, Homogeneous Ribbon is less space efficient, as more allocated overhead is needed to keep the FP rate impact of degradation relatively under control. The new OptimizeHomogAtScale tool in ribbon_test helps to find these optimal allocation overheads for different numbers of solution columns and Ribbon widths, with 128-bit Ribbon apparently cutting space overheads in half vs. 64-bit.

Other misc item specifics:
* Ribbon APIs in util/ribbon_config.h now provide configuration data for not just 5% construction failure rate (95% success), but also 50% and 0.1%.
  * Note that the Ribbon structure does not exhibit "threshold" behavior as standard Xor filter does, so there is a roughly fixed space penalty to cut construction failure rate in half. Thus, there isn't really an "almost sure" setting.
  * Although we can extrapolate settings for large filters, we don't have a good formula for configuring smaller filters (< 2^17 slots or so), and efforts to summarize with a formula have failed. Thus, small data is hard-coded from updated FindOccupancy tool.
* Enhances ApproximateNumEntries for public API Ribbon using more precise data (new API GetNumToAdd), thus a more accurate but not perfect reversal of CalculateSpace. (bloom_test updated to expect the greater precision)
* Move EndianSwapValue from coding.h to coding_lean.h to keep Ribbon code easily transferable from RocksDB
* Add some missing 'const' to member functions
* Small optimization to 128-bit BitParity
* Small refactoring of BandingStorage in ribbon_alg.h to support Homogeneous Ribbon
* CompactnessAndBacktrackAndFpRate now has an "expand" test: on construction failure, a possible alternative to re-seeding hash functions is simply to increase the number of slots (allocated space overhead) and try again with essentially the same hash values. (Start locations will be different roundings of the same scaled hash values--because fastrange not mod.) This seems to be as effective or more effective than re-seeding, as long as we increase the number of slots (m) by roughly m += m/w where w is the Ribbon width. This way, there is effectively an expansion by one slot for each ribbon-width window in the banding. (This approach assumes that getting "bad data" from your hash function is as unlikely as it naturally should be, e.g. no adversary.)
* 32-bit and 16-bit Ribbon configurations are added to ribbon_test for understanding their behavior, e.g. with FindOccupancy. They are not considered useful at this time and not tested with CompactnessAndBacktrackAndFpRate.

Pull Request resolved: facebook#7879

Test Plan: unit test updates included

Reviewed By: jay-zhuang

Differential Revision: D26371245

Pulled By: pdillinger

fbshipit-source-id: da6600d90a3785b99ad17a88b2a3027710b4ea3a

Refine Ribbon configuration, improve testing, add Homogeneous

63c7468

Summary: TODO Test Plan: unit test updates included

facebook-github-bot added the CLA Signed label Jan 18, 2021

pdillinger added 7 commits January 18, 2021 08:35

Try to make compilers happy

faf09c5

Appease clang-analyze and release build

36e2977

OptimizeHomogAtScale, and other fixes

02fc134

Fix no GFLAGS

e686032

Fix division by zero for MSVC and optimize 128 BitParity

43a3dec

Merge branch 'master' of github.com:facebook/rocksdb into ribbon7

2453770

Update some comments

621265e

pdillinger marked this pull request as ready for review January 20, 2021 19:25

pdillinger requested a review from jay-zhuang January 20, 2021 19:25

jay-zhuang reviewed Jan 21, 2021


jay-zhuang reviewed Jan 24, 2021


jay-zhuang approved these changes Jan 25, 2021


pdillinger added 3 commits January 25, 2021 21:20

Improve comments, simplify num_to_add in test

46581c7

One more comment

d01bbbb

Merge remote-tracking branch 'origin/master' into ribbon7

ab85774

facebook-github-bot reviewed Feb 10, 2021


pdillinger added 2 commits February 23, 2021 10:53

Merge remote-tracking branch 'origin/master' into ribbon7

bf1770e

More comments

56d2692

facebook-github-bot reviewed Feb 25, 2021


facebook-github-bot closed this in a8b3b9a Feb 26, 2021

facebook-github-bot added the Merged label Feb 26, 2021
