-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ribbon: initial (general) algorithms and basic unit test #7491
Conversation
Oops, forgot to make sure it compiles with TEST_UINT128_COMPAT |
util/ribbon_alg.h
Outdated
for (Index j = 0; j < kResultBits; ++j) { | ||
// Compute next solution bit at row i, column j (see derivation below) | ||
CoeffRow tmp = state[j] << 1; | ||
int bit = BitParity(tmp & cr) ^ ((rr >> j) & 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could it be bool bit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe some people (not sure about compilers) consider it bad style to cast between integral types and bool. And Google style calls for 'int' for small integer values.
But it might generate the same code, with one less static_cast, to do
bool bit = (parity) != 0;
tmp |= bit ? CoeffRow{1} : CoeffRow{0};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That new version is generally about the same, except generating one less instruction on gcc 4.9.x :) https://godbolt.org/z/9MP8zT
num_starts_ = num_starts; | ||
} | ||
|
||
Index GetNumStarts() const { return num_starts_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why we store num_starts
vs. num_slots
? I feel num_slots
is more straight forward. Or num_keys
if it also store a kFactor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add a comment for that. It's because num_starts is on the extreme critical path for queries: its value is needed (for FastRange) to know where to start probing in memory. We don't want any extra instructions for getting that value.
using Key = Hash; | ||
using Seed = typename RehasherTypesAndSettings::Seed; | ||
|
||
static Hash HashFn(const Hash& input, Seed seed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question, how it's different from XXH3p_64bits_withSeed
:
Lines 1093 to 1099 in 3e74505
XXH3p_64bits_withSeed(const void* input, size_t len, XXH64_hash_t seed) | |
{ | |
if (len <= 16) return XXH3p_len_0to16_64b((const xxh_u8*)input, len, kSecret, seed); | |
if (len <= 128) return XXH3p_len_17to128_64b((const xxh_u8*)input, len, kSecret, sizeof(kSecret), seed); | |
if (len <= XXH3p_MIDSIZE_MAX) return XXH3p_len_129to240_64b((const xxh_u8*)input, len, kSecret, sizeof(kSecret), seed); | |
return XXH3p_hashLong_64b_withSeed((const xxh_u8*)input, len, seed); | |
} |
Could them be combined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XXH3p_len_4to8_64b (used in XXH3p_64bits_withSeed) has some extra mixing that seems to be unnecessary in this application. (Trying to minimize unnecessary hash computation.)
xxh_u64 const mix64 = len + ((keyed ^ (keyed >> 51)) * PRIME32_1);
XXH3p_avalanche((mix64 ^ (mix64 >> 47)) * PRIME64_2)
I should probably say that StandardRehasher is not intended for use (at least untested) with hash-sized values that are not already uniformly distributed.
// more rows. Thus, for valid solution, the dot product of the | ||
// solution column with the coefficient row has to equal the result | ||
// at that column, | ||
// BitParity(tmp & cr) == ((rr >> j) & 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: how to make sure BitParity(tmp & cr) == ((rr >> j) & 1)
as tmp
could be changed for next row?
For example, to simplify, assume kResultBits = 1
, r = 2
. then tmp only has 2 bits, the first bit tmp[0]
(tmp[x]
means the x
bit in tmp
) it makes sure: (tmp[0]+cr[0])^(tmp[1]+cr[1]) == rr[0]
. But tmp[1]
is not assigned yet and will be generated later. How do we make sure we don't need to go back and modify tmp[0]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our initial back-substitution state is all zeros, though the initial state is unused because that would mean referencing an "out of bounds" slot, and unoccupied banding rows have cr as all zeros.
Each step through here we set new tmp[x+1] (bit x + 1) to old tmp[x] (bit x). Since we're moving backward through the rows, the second bit of tmp (aka tmp[1] or (tmp >> 1) & 1) will hold the value assigned to the "next" (but most recently assigned) row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, as the last r
rows is an upper triangular matrix which will be used to set the initial state. I was confused that it is a band matrix.
StandardHasher<TypesAndSettings>::ResetSeed(); | ||
do { | ||
Reset(num_slots); | ||
bool success = AddRange(begin, end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the success rate? Say r=128
, 1 million keys
, kFactor=1.025
. rr = 8 bits
. Is there a formula for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no code here for that yet. Other code will come that choses an appropriate factor for appropriate success rate (generally staying above 50%).
} | ||
} while (StandardHasher<TypesAndSettings>::NextSeed(max_seed)); | ||
// no seed through max_seed worked | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: what's the expectation from the user if it's failed? Increase the kNumSlots
and retry? If that's the case, a helper function to do that might be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's going to be application specific. Increasing kNumSlots could be an approach, but I expect us to fall back on Bloom filter if we find ourselves with extraordinarily bad luck, or something is broken. We should be able to make that reliably extremely rare without approaches like increasing kNumSlots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
// #################### Ribbon on-the-fly banding ####################### | ||
// | ||
// "Banding" is what we call the process of reducing the inputs to an | ||
// upper-triangluar r-band matrix ready for finishing a solution with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: triangular
// The enhanced algorithm is based on these observations: | ||
// - When processing a coefficient row with first 1 in column j, | ||
// - If it's the first at column j to be processed, it can be part of | ||
// the banding at row j. (And that descision never overwritten, with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: decision
// // Given a hash value, return the r-bit sequence of coefficients to | ||
// // associate with it. It's generally OK if | ||
// // sizeof(CoeffRow) > sizeof(Hash) | ||
// // as long as the hash itself is not too prone to collsions for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: collisions
|
||
// InterleavedSolutionStorage is row-major at a high level, for good | ||
// locality, and column-major at a low level, for CPU efficiency | ||
// especially in filter querys or relatively small number of result bits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queries?
// to apply a different seed. This hasher seeds a 1-to-1 mixing | ||
// transformation to apply a seed to an existing hash (or hash-sized key). | ||
// | ||
// Testing suggests essentially no degredation of solution success rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
degradation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice algorithm 👍👍👍
Thanks, I'll fix those typos (and more) in next PR |
@pdillinger merged this pull request in 25d54c7. |
Summary: This is intended as the first commit toward a near-optimal alternative to static Bloom filters for SSTs. Stephan Walzer and I have agreed upon the name "Ribbon" for a PHSF based on his linear system construction in "Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications" ("SGauss") and my much faster "on the fly" algorithm for gaussian elimination (or for this linear system, "banding"), which can be faster than peeling while also more compact and flexible. See util/ribbon_alg.h for more detailed introduction and background. RIBBON = Rapid Incremental Boolean Banding ON-the-fly This commit just adds generic (templatized) core algorithms and a basic unit test showing some features, including the ability to construct structures within 2.5% space overhead vs. information theoretic lower bound. (Compare to cache-local Bloom filter's ~50% space overhead -> ~30% reduction anticipated.) This commit does not include the storage scheme necessary to make queries fast, especially for filter queries, nor fractional "result bits", but there is some description already and those implementations will come soon. Nor does this commit add FilterPolicy support, for use in SST files, but that will also come soon. Pull Request resolved: facebook#7491 Reviewed By: jay-zhuang Differential Revision: D24517954 Pulled By: pdillinger fbshipit-source-id: 0119ee597e250d7e0edd38ada2ba50d755606fa7
This is intended as the first commit toward a near-optimal alternative to static Bloom filters for SSTs. Stephan Walzer and I have agreed upon the name "Ribbon" for a PHSF based on his linear system construction in "Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications" ("SGauss") and my much faster "on the fly" algorithm for gaussian elimination (or for this linear system, "banding"), which can be faster than peeling while also more compact and flexible. See util/ribbon_alg.h for more detailed introduction and background. RIBBON = Rapid Incremental Boolean Banding ON-the-fly
This commit just adds generic (templatized) core algorithms and a basic unit test showing some features, including the ability to construct structures within 2.5% space overhead vs. information theoretic lower bound. (Compare to cache-local Bloom filter's ~50% space overhead -> ~30% reduction anticipated.) This commit does not include the storage scheme necessary to make queries fast, especially for filter queries, nor fractional "result bits", but there is some description already and those implementations will come soon. Nor does this commit add FilterPolicy support, for use in SST files, but that will also come soon.