Description
I believe commit f823dc4 defeats the purpose of the cm sketch having four rows. The bitwise-xor and mask do not produce the mix of indexes the design expects: the same Increment/Estimate results, with the same values, can be obtained from a single row, with or without the bitwise-xor. The earlier implementation seems to have been a good and very fast approximation to a distinct hash for each row, except when the high 32 bits were all zero. One way to fix the earlier version could be to bitwise-or a single bit into the top 32-bit half. I can provide my testing and benchmarking on this offline if interested.
For small row lengths, as used in the current unit tests, this doesn't matter. As the row length grows, the gap between the earlier algorithm and the current one widens, and I believe it becomes significant.
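To illustrate the point about the xor and mask, here is a minimal sketch. It is not the library's code: `rowIndexXor`, `sameCollisionsInEveryRow`, the seed constants, and the row length of 1024 are all hypothetical, and the model assumes each row index is computed as `(hash ^ seed) & mask`. Under that assumption, two keys share a counter in one row exactly when they share a counter in every row, because the seed cancels out of the comparison. The extra rows therefore add no independent information.

```go
package main

import "fmt"

// rowIndexXor is a hypothetical model of the indexing pattern discussed
// above: xor the key hash with a per-row seed, then mask to the row length.
func rowIndexXor(hash, seed, mask uint64) uint64 {
	return (hash ^ seed) & mask
}

// sameCollisionsInEveryRow reports whether every pair of hashes that
// collides (or does not collide) in the first row behaves the same way
// in all remaining rows.
func sameCollisionsInEveryRow(hashes, seeds []uint64, mask uint64) bool {
	for _, a := range hashes {
		for _, b := range hashes {
			first := rowIndexXor(a, seeds[0], mask) == rowIndexXor(b, seeds[0], mask)
			for _, s := range seeds[1:] {
				if (rowIndexXor(a, s, mask) == rowIndexXor(b, s, mask)) != first {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	// Arbitrary illustrative seeds, one per row (assumed, not from the library).
	seeds := []uint64{0x9E3779B97F4A7C15, 0xC2B2AE3D27D4EB4F, 0x165667B19E3779F9, 0x27D4EB2F165667C5}
	mask := uint64(1<<10 - 1) // assumed row length of 1024 counters

	// Generate pseudo-random 64-bit hashes with a simple xorshift64 step.
	hashes := make([]uint64, 0, 2000)
	x := uint64(88172645463325252)
	for i := 0; i < 2000; i++ {
		x ^= x << 13
		x ^= x >> 7
		x ^= x << 17
		hashes = append(hashes, x)
	}

	// (a^s)&mask == (b^s)&mask holds exactly when (a^b)&mask == 0,
	// which does not depend on the seed s.
	fmt.Println(sameCollisionsInEveryRow(hashes, seeds, mask)) // prints true
}
```

Since the minimum over rows is taken across counters that all receive the same set of colliding keys, an estimate under this model matches what a single row would give.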