Collection hashCode performance improvements #7212

Open
wants to merge 1 commit into
base: 2.13.x
from

6 participants
@szeiger
Contributor

szeiger commented Sep 12, 2018

  • Use constant-time hashes for Range. Ranges of fewer than 2 elements
    still use the standard Seq hash. All other Ranges compute the hash
    directly from start, step and end without iterating through all
    elements.

  • In order to keep Range hashes consistent with other Seq hashes we
    recognize Seqs where the elements’ hashes change sequentially with a
    fixed step and produce a Range hash instead of a regular Seq hash
    for these. The performance impact is mixed and relatively small.
    On Java 8 HotSpot x64 there are some slowdowns for smallish Seqs
    (but not < 2 elements because these are treated specially) and
    speedups for larger Seqs (which is counter-intuitive because the
    new algorithm is strictly more complex than the old one; it is still
    faster because HotSpot performed loop unrolling for the old algorithm
    which had a negative performance impact). The Range optimization is
    applied to arrayHash, orderedHash and listHash.

  • Add (range-optimized) IndexedSeq hashing. Contrary to what earlier
    comments in our MurmurHash3 implementation claim, there is now a
    significant performance boost from a specialized implementation.
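
The idea in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the object name, the `seed` value and the helper `plainHash` are hypothetical, and the sketch takes two passes over non-range data where the PR uses one. It relies only on the public `mix` and `finalizeHash` of `scala.util.hashing.MurmurHash3`; `finalizeHash(h, 0)` is used as a plain avalanche step.

```scala
import scala.util.hashing.MurmurHash3.{mix, finalizeHash}

object RangeHashSketch {
  // Hypothetical seed chosen for this sketch only.
  val seed = 0x5eed

  // Constant-time hash computed from (start, step, last) alone.
  def rangeHash(start: Int, step: Int, last: Int): Int =
    finalizeHash(mix(mix(mix(seed, start), step), last), 0)

  // Plain ordered hash: mix every element hash, then finalize with the length.
  def plainHash(xs: Array[Int]): Int = {
    var h = seed
    var i = 0
    while (i < xs.length) { h = mix(h, xs(i).##); i += 1 }
    finalizeHash(h, xs.length)
  }

  // Range-aware hash: if the element hashes advance by a fixed step,
  // return the same value a Range over those elements would produce.
  def seqHash(xs: Array[Int]): Int =
    if (xs.length < 2) plainHash(xs)
    else {
      val initial = xs(0).##
      var prev = xs(1).##
      val diff = prev - initial
      var i = 2
      var isRange = true
      while (isRange && i < xs.length) {
        val h = xs(i).##
        if (h - prev != diff) isRange = false
        else { prev = h; i += 1 }
      }
      if (isRange) rangeHash(initial, diff, prev) else plainHash(xs)
    }
}
```

With this shape, `seqHash(Array(1, 2, 3, 4))` and `rangeHash(1, 1, 4)` agree, which is the consistency property the second bullet asks for.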

/cc @pnf

Here's a full benchmark run without the IndexedSeq optimizations:

[info] Benchmark                                              (size)  Mode  Cnt      Score      Error  Units
[info] MurmurHash3Benchmark.oldArrayHashOrdered                    0  avgt   20      4.766 ±    0.114  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                    1  avgt   20      7.683 ±    0.171  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                    2  avgt   20      9.591 ±    0.201  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                    3  avgt   20     10.862 ±    0.214  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                   10  avgt   20     22.588 ±    0.351  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                  100  avgt   20    197.064 ±    2.166  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                 1000  avgt   20   2103.802 ±   28.463  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                10000  avgt   20  21378.506 ±  335.843  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                     0  avgt   20      4.583 ±    0.365  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                     1  avgt   20      8.794 ±    0.104  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                     2  avgt   20     12.617 ±    0.135  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                     3  avgt   20     14.956 ±    0.199  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                    10  avgt   20     36.975 ±    0.402  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                   100  avgt   20    305.132 ±    4.119  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                  1000  avgt   20   2934.705 ±   41.807  ns/op
[info] MurmurHash3Benchmark.oldListHashOrdered                 10000  avgt   20  29357.418 ±  243.042  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered                  0  avgt   20      8.871 ±    0.294  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered                  1  avgt   20     15.960 ±    0.235  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered                  2  avgt   20     27.215 ±    0.373  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered                  3  avgt   20     35.660 ±    0.858  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered                 10  avgt   20     94.874 ±    1.504  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered                100  avgt   20    921.109 ±   21.339  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered               1000  avgt   20   9254.958 ±  120.631  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashOrdered              10000  avgt   20  92146.909 ± 1397.921  ns/op
[info] MurmurHash3Benchmark.rangeHash                              0  avgt   20      8.070 ±    0.329  ns/op
[info] MurmurHash3Benchmark.rangeHash                              1  avgt   20      7.770 ±    0.072  ns/op
[info] MurmurHash3Benchmark.rangeHash                              2  avgt   20      7.821 ±    0.108  ns/op
[info] MurmurHash3Benchmark.rangeHash                              3  avgt   20      7.780 ±    0.099  ns/op
[info] MurmurHash3Benchmark.rangeHash                             10  avgt   20      7.853 ±    0.127  ns/op
[info] MurmurHash3Benchmark.rangeHash                            100  avgt   20      7.806 ±    0.087  ns/op
[info] MurmurHash3Benchmark.rangeHash                           1000  avgt   20      7.675 ±    0.243  ns/op
[info] MurmurHash3Benchmark.rangeHash                          10000  avgt   20      7.503 ±    0.103  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1          0  avgt   20      4.808 ±    0.077  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1          1  avgt   20      7.330 ±    0.129  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1          2  avgt   20     10.928 ±    0.171  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1          3  avgt   20     11.406 ±    0.268  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1         10  avgt   20     22.662 ±    0.251  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1        100  avgt   20    199.573 ±    2.649  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1       1000  avgt   20   2095.051 ±   27.376  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1      10000  avgt   20  21226.062 ±  282.746  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2          0  avgt   20      4.806 ±    0.070  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2          1  avgt   20      7.335 ±    0.098  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2          2  avgt   20     10.956 ±    0.129  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2          3  avgt   20     11.257 ±    0.135  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2         10  avgt   20     24.134 ±    0.351  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2        100  avgt   20    192.812 ±    4.138  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2       1000  avgt   20   1854.044 ±   30.974  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2      10000  avgt   20  18535.794 ±  227.588  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered         0  avgt   20      4.924 ±    0.083  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered         1  avgt   20      7.300 ±    0.059  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered         2  avgt   20     11.315 ±    0.185  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered         3  avgt   20     12.675 ±    0.472  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered        10  avgt   20     24.923 ±    0.321  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered       100  avgt   20    189.875 ±    2.612  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered      1000  avgt   20   1844.843 ±   18.641  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered     10000  avgt   20  18762.374 ±  236.343  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1           0  avgt   20      4.364 ±    0.056  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1           1  avgt   20      9.274 ±    0.136  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1           2  avgt   20     16.256 ±    0.236  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1           3  avgt   20     18.545 ±    0.216  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1          10  avgt   20     47.798 ±    0.542  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1         100  avgt   20    341.948 ±    5.100  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1        1000  avgt   20   3216.831 ±   42.486  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed1       10000  avgt   20  32054.327 ±  395.534  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2           0  avgt   20      4.362 ±    0.049  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2           1  avgt   20      9.310 ±    0.150  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2           2  avgt   20     16.178 ±    0.301  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2           3  avgt   20     18.669 ±    0.326  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2          10  avgt   20     47.033 ±    0.618  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2         100  avgt   20    338.261 ±    5.645  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2        1000  avgt   20   3416.425 ±   56.577  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashMixed2       10000  avgt   20  32040.805 ±  564.557  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered          0  avgt   20      4.306 ±    0.077  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered          1  avgt   20      9.205 ±    0.093  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered          2  avgt   20     16.359 ±    0.230  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered          3  avgt   20     21.200 ±    0.451  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered         10  avgt   20     48.405 ±    0.808  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered        100  avgt   20    345.697 ±    5.619  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered       1000  avgt   20   3312.353 ±   46.730  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedListHashOrdered      10000  avgt   20  31130.434 ±  429.699  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1        0  avgt   20     19.946 ±    0.201  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1        1  avgt   20     29.845 ±    0.400  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1        2  avgt   20     44.786 ±    0.693  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1        3  avgt   20     49.877 ±    0.663  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1       10  avgt   20    112.911 ±    2.222  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1      100  avgt   20    961.552 ±   17.077  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1     1000  avgt   20   9377.366 ±  126.620  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed1    10000  avgt   20  92066.418 ± 2607.347  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2        0  avgt   20     19.466 ±    0.440  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2        1  avgt   20     28.894 ±    0.348  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2        2  avgt   20     43.234 ±    0.524  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2        3  avgt   20     47.874 ±    0.616  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2       10  avgt   20    109.906 ±    2.372  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2      100  avgt   20    920.379 ±   11.051  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2     1000  avgt   20   9133.441 ±  154.972  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashMixed2    10000  avgt   20  91156.502 ± 1256.069  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered       0  avgt   20     19.365 ±    0.272  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered       1  avgt   20     28.740 ±    0.337  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered       2  avgt   20     43.736 ±    2.251  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered       3  avgt   20     56.564 ±    5.880  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered      10  avgt   20    119.910 ±    2.254  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered     100  avgt   20    956.385 ±   27.940  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered    1000  avgt   20   9141.034 ±  139.801  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashOrdered   10000  avgt   20  91790.572 ± 1229.346  ns/op

Here are only the Array hashing results from an earlier benchmark run. oldArrayHashOrdered is the method we started with:

[info] MurmurHash3Benchmark.oldArrayHashOrdered                  0  avgt   20      4.770 ±   0.264  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                  1  avgt   20      7.669 ±   0.125  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                  2  avgt   20      9.673 ±   0.126  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                  3  avgt   20     10.777 ±   0.185  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                 10  avgt   20     22.471 ±   0.310  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered                100  avgt   20    195.470 ±   2.468  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered               1000  avgt   20   2092.634 ±  32.907  ns/op
[info] MurmurHash3Benchmark.oldArrayHashOrdered              10000  avgt   20  20974.062 ± 227.221  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1        0  avgt   20      4.871 ±   0.089  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1        1  avgt   20      7.204 ±   0.162  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1        2  avgt   20     11.214 ±   0.151  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1        3  avgt   20     11.307 ±   0.195  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1       10  avgt   20     22.204 ±   0.345  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1      100  avgt   20    196.264 ±   2.131  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1     1000  avgt   20   2081.643 ±  27.508  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed1    10000  avgt   20  21133.974 ± 218.761  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2        0  avgt   20      4.816 ±   0.073  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2        1  avgt   20      7.216 ±   0.104  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2        2  avgt   20     10.842 ±   0.117  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2        3  avgt   20     11.189 ±   0.114  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2       10  avgt   20     23.946 ±   1.247  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2      100  avgt   20    187.109 ±   2.299  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2     1000  avgt   20   1816.470 ±  22.179  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashMixed2    10000  avgt   20  18222.006 ± 206.935  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered       0  avgt   20      4.785 ±   0.067  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered       1  avgt   20      7.219 ±   0.096  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered       2  avgt   20     10.767 ±   0.333  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered       3  avgt   20     12.057 ±   0.276  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered      10  avgt   20     24.467 ±   0.696  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered     100  avgt   20    189.314 ±   3.318  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered    1000  avgt   20   1808.171 ±  28.840  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedArrayHashOrdered   10000  avgt   20  18243.981 ± 316.244  ns/op

And here's the new range-optimized indexedSeqHash vs the old orderedHash:

[info] Benchmark                                                     (size)  Mode  Cnt       Score       Error  Units
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered                  0  avgt   20      24.588 ±     0.251  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered                  1  avgt   20      40.172 ±     0.596  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered                  2  avgt   20      56.306 ±     0.834  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered                  3  avgt   20      72.232 ±     0.668  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered                 10  avgt   20     181.092 ±     2.014  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered                100  avgt   20     393.088 ±     5.167  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered               1000  avgt   20    9933.940 ±    86.177  ns/op
[info] MurmurHash3Benchmark.oldOrderedHashIndexedOrdered              10000  avgt   20  167985.951 ± 35574.693  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1               0  avgt   20       5.644 ±     0.092  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1               1  avgt   20       8.673 ±     0.117  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1               2  avgt   20      14.739 ±     2.036  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1               3  avgt   20      15.704 ±     1.692  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1              10  avgt   20      27.555 ±     0.290  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1             100  avgt   20     203.227 ±     2.759  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1            1000  avgt   20    4513.190 ±    60.270  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed1           10000  avgt   20   37923.481 ±   474.004  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2               0  avgt   20       5.632 ±     0.086  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2               1  avgt   20       8.994 ±     0.384  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2               2  avgt   20      13.153 ±     0.202  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2               3  avgt   20      14.593 ±     0.177  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2              10  avgt   20      28.561 ±     0.253  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2             100  avgt   20     227.804 ±     2.814  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2            1000  avgt   20    5490.555 ±   109.540  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashMixed2           10000  avgt   20   44285.149 ±   904.101  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered              0  avgt   20       5.696 ±     0.180  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered              1  avgt   20       8.790 ±     0.094  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered              2  avgt   20      13.402 ±     0.361  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered              3  avgt   20      15.755 ±     0.404  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered             10  avgt   20      30.878 ±     1.783  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered            100  avgt   20     230.209 ±     4.743  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered           1000  avgt   20    5347.704 ±    76.944  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedIndexedHashOrdered          10000  avgt   20   50813.890 ±   605.234  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1        0  avgt   20      35.492 ±     0.514  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1        1  avgt   20      30.531 ±     0.257  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1        2  avgt   20      36.939 ±     0.508  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1        3  avgt   20      36.057 ±     0.257  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1       10  avgt   20      56.351 ±     0.647  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1      100  avgt   20     336.290 ±     4.927  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1     1000  avgt   20    9936.769 ±   131.861  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed1    10000  avgt   20  100285.056 ±  1552.665  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2        0  avgt   20      35.753 ±     0.553  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2        1  avgt   20      31.010 ±     0.748  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2        2  avgt   20      36.827 ±     0.824  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2        3  avgt   20      36.452 ±     0.443  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2       10  avgt   20      57.640 ±     0.810  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2      100  avgt   20     352.868 ±     8.634  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2     1000  avgt   20   10204.176 ±   178.471  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedMixed2    10000  avgt   20  103142.856 ±  1223.103  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered       0  avgt   20      35.917 ±     0.485  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered       1  avgt   20      30.488 ±     0.376  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered       2  avgt   20      36.842 ±     0.482  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered       3  avgt   20      39.613 ±     0.442  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered      10  avgt   20      62.243 ±     0.814  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered     100  avgt   20     371.507 ±     4.134  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered    1000  avgt   20   10432.483 ±   246.626  ns/op
[info] MurmurHash3Benchmark.rangeOptimizedOrderedHashIndexedOrdered   10000  avgt   20  105692.600 ±  1361.435  ns/op

@scala-jenkins scala-jenkins added this to the 2.13.0-RC1 milestone Sep 12, 2018

@pnf

pnf Sep 17, 2018

Contributor

An exhaustive and snarky analysis of the performance anomaly: http://blog.podsnap.com/rollo.html

@szeiger

szeiger Sep 21, 2018

Contributor

Nobody excited enough about O(n) to O(1) to review this?

@plokhotnyuk

plokhotnyuk Sep 21, 2018

@szeiger, great work!

Also, I'm dreaming about adding a seed argument to all hashCode functions, which could protect us from DoS attacks that exploit hash map collision handling: https://www.slideshare.net/stamparm/hash-dos-attack

@pnf

pnf Sep 21, 2018

Contributor

I do not feel I can be an impartial reviewer, but I was wondering why arrayHash has more hand-optimization than orderedHash.

@szeiger

szeiger Sep 21, 2018

Contributor

@pnf I didn't expect significant improvements from optimizing orderedHash. Now that I have resurrected indexedSeqHash, you would only use orderedHash for LinearSeqs, where the micro-optimizations of arrayHash would be dwarfed by the overhead of iterating through the collection with an Iterator. In other words: I was too lazy to implement and benchmark it :-)

var prev = a(1).##
val rangeDiff = prev - initial
var i = 2
while (i < l) {

@Ichoran

Ichoran Sep 21, 2018

Contributor

This might be faster if the loop condition included a test for rangeDiff != hash - prev and the still-have-i < l case is detected afterwards. I'm not sure whether you tested it both ways, but usually small loops optimize better.
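
A minimal sketch of the loop shape suggested here, assuming the element hashes are already available in an array. The object and method names are hypothetical; the sketch only locates where the fixed step breaks and leaves the mixing out.

```scala
object LoopShapeSketch {
  // Returns the index of the first element hash that breaks the fixed
  // step, or hashes.length if the whole array behaves like a range.
  def firstBreak(hashes: Array[Int]): Int =
    if (hashes.length < 2) hashes.length
    else {
      val rangeDiff = hashes(1) - hashes(0)
      var prev = hashes(1)
      var i = 2
      // Both tests sit in the loop condition, so the body stays tiny.
      while (i < hashes.length && hashes(i) - prev == rangeDiff) {
        prev = hashes(i)
        i += 1
      }
      i // i < hashes.length afterwards means the pattern broke at i
    }
}
```

The "still have i < l" case is then handled once after the loop rather than on every iteration.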

var rangeState = 0 // 0 = no data, 1 = first elem read, 2 = has valid diff, 3 = invalid
var rangeDiff = 0
var prev = 0
var initial = 0
xs.iterator foreach { x =>

@Ichoran

Ichoran Sep 21, 2018

Contributor

Why not use the iterator bare here (i.e. with hasNext and next)? The logic should be identical to the while loop at that point.
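
For illustration, a plain ordered hash driven by a bare iterator could look like the sketch below. It leaves out the range detection, the object name is hypothetical, and it uses only the public `mix` and `finalizeHash` from `scala.util.hashing.MurmurHash3`.

```scala
import scala.util.hashing.MurmurHash3.{mix, finalizeHash}

object BareIteratorSketch {
  // Same logic as a foreach-based version, but as an explicit while loop
  // over hasNext/next, with no closure involved.
  def orderedHash(xs: Iterable[Any], seed: Int): Int = {
    val it = xs.iterator
    var h = seed
    var n = 0
    while (it.hasNext) {
      h = mix(h, it.next().##)
      n += 1
    }
    finalizeHash(h, n)
  }
}
```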

h = mix(h, x.##)
val hash = x.##
h = mix(h, hash)
rangeState match {

@Ichoran

Ichoran Sep 21, 2018

Contributor

Is this really faster than manually unrolling the special cases?

@Ichoran

Ichoran Sep 21, 2018

Contributor

The first two branches of the case are really obvious targets for unrolling, so I'd think if the JVM can't do it itself, doing it manually would help reduce the workload (the branch should be pretty predictable, but still).
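
One way to read this suggestion is to peel the first two elements out of the loop, so that no `rangeState` variable is needed for the "no data" and "first elem read" cases. The sketch below is an assumption about what that could look like, not the code under review; the object and method names are hypothetical, and `finalizeHash(h, 0)` stands in for a plain avalanche step.

```scala
import scala.util.hashing.MurmurHash3.{mix, finalizeHash}

object PeeledHashSketch {
  def hash(xs: Iterable[Any], seed: Int): Int = {
    val it = xs.iterator
    if (!it.hasNext) finalizeHash(seed, 0) // peeled case: empty
    else {
      val initial = it.next().##
      var h = mix(seed, initial)
      if (!it.hasNext) finalizeHash(h, 1) // peeled case: one element
      else {
        val h0 = h // state after the first element, reused for the range hash
        var prev = it.next().##
        val rangeDiff = prev - initial
        h = mix(h, prev)
        var n = 2
        var isRange = true
        while (it.hasNext) {
          val x = it.next().##
          if (isRange && x - prev != rangeDiff) isRange = false
          h = mix(h, x)
          prev = x
          n += 1
        }
        // Range-like data hashes from (initial, rangeDiff, last) only.
        if (isRange) finalizeHash(mix(mix(h0, rangeDiff), prev), 0)
        else finalizeHash(h, n)
      }
    }
  }
}
```

The loop body then carries a single boolean instead of a four-way match. Whether that is actually faster would have to be benchmarked.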

@Ichoran

Ichoran Sep 21, 2018

Contributor

This is a really neat idea. I'm surprised the overhead is so minimal! I like it!

@lrytz

Nice trick :)

n += 1
}
finalizeHash(h, n)
if(rangeState == 2) rangeHash(initial, rangeDiff, prev, seed)

@lrytz

lrytz Oct 3, 2018

Member

We could restrict this with if (seed == seqSeed), but I guess either way is fine?
