Cool trick to store 4 5-bit values in one 16-bit value.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.gitignore Initial commit Jan 27, 2018
README.md apply changes suggested in #7 Jan 31, 2018
main.cpp fixed issue with packed overload implementation Feb 1, 2018
makefile remove binary Jan 28, 2018

README.md

Bucket Compression Trick

What is it?

It's a neat little trick to efficiently squeeze four 5-bit values into a single 16-bit value when we don't care about the order of those 5-bit values and we want to get real close to that Shannon limit.

This is good for storing things like a series of order independent options in an entry or entry identifiers in a very large hash table. Note that even for values > 5 bits, we can still apply this trick if we're willing to move the rest of the to match the sorted order of those 5 bits but obviously you wouldn't want to shuffle around hundreds of bytes just to save 4 bits.

How does it work?

This trick works by trading control over the ordering of the numbers for extra precision in the four numbers.

There are multichoose(24, 4) = 3876 possible unique values for a multiset of 4 4-bit values, whereas there are 65536 possible unique values for a 16 bit integer. log2(multichoose(24, 4)) = 11.92 so if we can map all possible values to a 12 bit representation, it leaves us with 4 bits to play around with and that's enough to store the extra bit in our four 5-bit numbers. Theoretically we have a bit more than 4 bits to play around with as log2(216 / 3876) = 4.08 or from 16 - 11.92 but in practice that fraction is a bit harder to extract without a larger or more indirect lookup table.

More generally, a multiset of m n-bit numbers cannot hold more than m * n bits of information and it is possible to leverage that difference to store m' n'-bit numbers in the same m * n bits as long as log2(multichoose(m', n')) <= m * n. log2(multichoose(25, 4)) = 15.67 bits < 16 bits.

Further reading: