cmSketch not benefitting from four rows #108

@FrankReh

Description

I believe commit f823dc4 defeated the purpose of the cm sketch having four rows. The bitwise XOR and mask do not produce the mix of indexes the design expects: the same Increment/Estimate results, with the same values, are obtained from a single row, with or without the bitwise XOR. The earlier implementation seems to have been a good, very fast approximation of a distinct hash for each row, except when the high 32 bits were all zero. One way to fix the earlier version would be to bitwise-OR a single bit into the high 32 bits. I can provide my testing and benchmarking offline if you are interested.

For small row lengths, such as those used in the current unit tests, this doesn't matter. As the row length grows, the gap between the earlier algorithm and the current one widens, and I believe it becomes significant.
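
A minimal Go sketch of the effect described above. It is not a copy of the repository code: it assumes the current row index is computed as `(hash ^ seed) & mask` and that the earlier version was a double-hashing scheme of the form `(lo + row*hi) & mask` over the low and high 32 bits of the hash. All function names, seeds, and test hashes here are hypothetical, and OR-ing the lowest bit of the high half is just one possible choice for the suggested fix.

```go
package main

import "fmt"

const rows = 4

// xorMaskIndex models the assumed current scheme: XOR with a per-row seed, then mask.
// Since (h ^ s) & m == (h & m) ^ (s & m), the seed only permutes slots within a row.
func xorMaskIndex(hash, seed, mask uint64) uint64 {
	return (hash ^ seed) & mask
}

// doubleHashIndex models the assumed earlier scheme: lo + row*hi, then mask.
// orTopBit applies one possible form of the suggested fix (force hi to be non-zero).
func doubleHashIndex(hash uint64, row, mask uint32, orTopBit bool) uint32 {
	lo, hi := uint32(hash), uint32(hash>>32)
	if orTopBit {
		hi |= 1
	}
	return (lo + row*hi) & mask
}

// sharedRowsXor counts the rows in which hashes a and b land on the same index.
func sharedRowsXor(a, b uint64, seeds [rows]uint64, mask uint64) int {
	n := 0
	for _, s := range seeds {
		if xorMaskIndex(a, s, mask) == xorMaskIndex(b, s, mask) {
			n++
		}
	}
	return n
}

// sharedRowsDouble does the same count for the double-hashing scheme (without the fix).
func sharedRowsDouble(a, b uint64, mask uint32) int {
	n := 0
	for i := uint32(0); i < rows; i++ {
		if doubleHashIndex(a, i, mask, false) == doubleHashIndex(b, i, mask, false) {
			n++
		}
	}
	return n
}

// distinctRows counts how many different indexes a single hash receives across the rows.
func distinctRows(hash uint64, mask uint32, orTopBit bool) int {
	seen := map[uint32]bool{}
	for i := uint32(0); i < rows; i++ {
		seen[doubleHashIndex(hash, i, mask, orTopBit)] = true
	}
	return len(seen)
}

func main() {
	const mask = 1<<16 - 1
	// Arbitrary seeds; any values give the same result for the XOR scheme.
	seeds := [rows]uint64{0x9e3779b97f4a7c15, 0xbf58476d1ce4e5b9, 0x94d049bb133111eb, 0x2545f4914f6cdd1d}

	// Two hashes with identical low 16 bits but different high halves.
	a, b := uint64(0x0000000100001234), uint64(0x0000000700001234)

	fmt.Println(sharedRowsXor(a, b, seeds, mask)) // 4: a collision in one row is a collision in every row
	fmt.Println(sharedRowsDouble(a, b, mask))     // 1: only row 0 collides

	// A hash whose high 32 bits are all zero.
	z := uint64(0x1234)
	fmt.Println(distinctRows(z, mask, false)) // 1: every row uses the same index
	fmt.Println(distinctRows(z, mask, true))  // 4: with a bit OR-ed into the high half
}
```

Under these assumptions, two keys that collide in one XOR-and-mask row collide in all four, so the minimum over the rows carries no more information than a single row, while the double-hashing variant separates them in the remaining rows.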

Metadata

Assignees

No one assigned

Labels

optimization
priority/P1: Serious issue that requires eventual attention (can wait a bit)
status/accepted: We accept to work on it.
