small hash #102
Optimize hashes with <= 3-5 keys to a simple array of keys and values with linear lookup.
The best would be an inlined len/char*/flags/val array, similar to a he-array, to be cache conscious (as in #24).
But there are many more simple hash optimizations, which we do first.
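To make the idea concrete, here is a minimal sketch of a small hash stored as parallel key and value arrays with linear lookup. It is not the cperl implementation: the names `small_hash`, `small_hash_find`, `small_hash_store` and the threshold `SMALL_HASH_MAX` are hypothetical, keys are assumed to be NUL-terminated strings owned by the caller, and values are plain `int` for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SMALL_HASH_MAX 4  /* hypothetical threshold; the right value depends on benchmarks */

typedef struct {
    size_t      len;                   /* number of used slots */
    const char *keys[SMALL_HASH_MAX];  /* borrowed key pointers, not copied */
    int         vals[SMALL_HASH_MAX];  /* value for keys[i] lives at vals[i] */
} small_hash;

/* Linear lookup: returns a pointer to the stored value, or NULL if absent. */
static int *small_hash_find(small_hash *h, const char *key)
{
    assert(h != NULL && key != NULL);
    for (size_t i = 0; i < h->len; i++) {
        if (strcmp(h->keys[i], key) == 0)
            return &h->vals[i];
    }
    return NULL;
}

/* Insert or update. Returns 0 on success, -1 when the array is full,
 * at which point a caller would switch to a regular hash table. */
static int small_hash_store(small_hash *h, const char *key, int val)
{
    int *slot = small_hash_find(h, key);
    if (slot != NULL) {
        *slot = val;
        return 0;
    }
    if (h->len == SMALL_HASH_MAX)
        return -1;
    h->keys[h->len] = key;
    h->vals[h->len] = val;
    h->len++;
    return 0;
}
```

With four pointer-sized keys and four ints the whole structure is small enough to sit in one or two cache lines on common 64-bit targets, which is the point of keeping the key count low.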
I estimate you can use linear or serial (unsorted) lookup up to 100 keys or even more, depending on benchmarks.
In my port of LCS::BV from Perl to C I began with Bob Jenkins hash and ended the tuning using VLAs (variable length arrays) on the stack, the array serially filled (\0 terminated). See llcs_seq_a() and the used
Of course, in my example I can benefit from the known restrictions: maximum size, immutable key strings, typed values (uint_64).
So many? I thought I'd only want to fill one cache line, so just very few keys.
On Wed, Apr 27, 2016, 20:59 Helmut Wollmersdorfer wrote:
You should trust only numbers you benchmarked yourself;-)
A hash is said to have O(1) lookup complexity. But as always it is really O(1*k), where k is the implementation's constant factor.
Serial lookup has O((n/2)*k). A break-even point of n=4 between hash and serial would need k_hash = 2 * k_serial, i.e. the hash algorithm may execute only twice as many instructions as one iteration of the serial loop. My serial loop has 3 instructions (C operators), including the conditions. So for a break-even at n=4, the hash function (locating the entry in the array) could use only 6 instructions.
I didn't optimize for cache friendliness directly. Serial just maps a nearly unbounded (sparse) alphabet to a minimal (non-sparse) one and nearly keeps the order of filling, which is memory and cache friendly. Hash algorithms (if not perfect hashes) map sparse to not so sparse, but still sparse.
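Written out, the break-even argument looks as follows. The symbols C_hash, C_serial and n* are introduced here only for illustration; the model counts instructions per lookup and ignores cache effects and key comparison cost.

```latex
\begin{align*}
C_{\text{hash}}      &= k_{\text{hash}}
  && \text{(one hash computation per lookup)} \\
C_{\text{serial}}(n) &= \tfrac{n}{2}\, k_{\text{serial}}
  && \text{(on average half of the $n$ entries are scanned)} \\
C_{\text{hash}} = C_{\text{serial}}(n^{*})
  &\;\Longrightarrow\; n^{*} = \frac{2\, k_{\text{hash}}}{k_{\text{serial}}} \\
n^{*} = 4,\; k_{\text{serial}} = 3
  &\;\Longrightarrow\; k_{\text{hash}} = \tfrac{4}{2} \cdot 3 = 6
\end{align*}
```

Read the other way round, a larger k_hash moves the break-even point n* up proportionally, so the actual crossover has to come from measurement.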
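The sparse-to-dense mapping described above can be sketched like this. It is an illustration under assumptions, not the code of `llcs_seq_a()`: it uses a fixed-size array instead of a stack VLA, `uint32_t` code points as keys, and the hypothetical names `serial_map`, `serial_index` and `SERIAL_MAX`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SERIAL_MAX 64  /* hypothetical known maximum number of distinct keys */

typedef struct {
    size_t   len;               /* number of distinct keys seen so far */
    uint32_t keys[SERIAL_MAX];  /* sparse keys, stored in order of first appearance */
} serial_map;

/* Returns the dense index (0, 1, 2, ...) of key, appending the key
 * if it has not been seen before. Returns -1 if the map is full. */
static int serial_index(serial_map *m, uint32_t key)
{
    assert(m != NULL);
    size_t i = 0;
    while (i < m->len && m->keys[i] != key)  /* serial, unsorted scan */
        i++;
    if (i == m->len) {
        if (m->len == SERIAL_MAX)
            return -1;
        m->keys[m->len++] = key;
    }
    return (int)i;
}
```

Keys that differ widely in value, such as 0x263A and 'a', end up at adjacent indices 0 and 1, so the array stays contiguous and is touched in fill order.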