unionArrayBy: Find next 1-bits with countTrailingZeros #395

sjakobi · 2022-03-24T01:33:45Z

Closes #374.

sjakobi · 2022-03-24T01:47:20Z

The benchmark results are pretty surprising: The mean runtime of the HashMap.union benchmark has literally doubled from ~60μs to ~120μs!

I think this is due to the weird benchmark data of contiguous Ints, which presumably fills the nodes completely. Of course it doesn't hurt to increment the index by 1 if the node is full! :/

So, I'll need to improve the benchmark data, maybe simply try with ByteStrings.

Also, check the generated code.
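For reference, the idea in the PR title can be sketched in isolation. This is a minimal illustration, not the code from the PR: the name `setBitPositions` is made up here, and it works on a bare `Word` rather than on the real node arrays. It jumps from one 1-bit to the next with `countTrailingZeros` instead of testing every position.

```haskell
import Data.Bits (countTrailingZeros, (.&.))

-- | Positions of the 1-bits in a bitmap, lowest first.
-- Hypothetical helper for illustration only.
-- Each step jumps straight to the next set bit instead of
-- testing every bit position in turn.
setBitPositions :: Word -> [Int]
setBitPositions 0 = []
setBitPositions b =
  -- b .&. (b - 1) clears the lowest set bit of b
  countTrailingZeros b : setBitPositions (b .&. (b - 1))
```

On a sparse bitmap this visits only the set bits. On a full bitmap it still takes one step per bit, each with extra bit-twiddling, which would be consistent with the slowdown seen on full nodes.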

sjakobi · 2022-03-24T08:28:30Z

> The benchmark results are pretty surprising: The mean runtime of the HashMap.union benchmark has literally doubled from ~60μs to ~120μs!

These numbers are from my desktop with an i7-4790K CPU.

I'm now using my old laptop with an i3-2350M CPU. For the union.Int benchmark I'm seeing a slowdown from 128 μs to 277 μs (+116%). For the new union.ByteString benchmark I'm also seeing a slowdown, from 167 μs to 240 μs (+43%).

That's pretty terrible. I'll have to look at the generated code. At least I know that it matters!

sjakobi · 2022-03-25T22:26:23Z

The latest version (eedc326) finally brings a speedup for the union.ByteString benchmark: 167 μs -> 143 μs (-14%).

The union.Int benchmark is still slower at 128 μs -> 140 μs (+9%), but IMHO that benchmark is less realistic.

(Both measurements are from the ancient laptop.)

I should try another version that computes the indices instead of having them as arguments in the loop.

It might also be interesting to combine the various bitmap checks into a single case expression. Maybe GHC can generate something like a lookup table from this?!

sjakobi · 2022-03-26T09:04:57Z

> I should try another version that computes the indices instead of having them as arguments in the loop.

Much worse, even if I keep i as a loop variable.
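For context, "computing the indices" presumably means deriving each array index from the bitmap on every iteration, rather than carrying `i1` and `i2` along as loop counters. A minimal sketch of that computation, under that assumption; the name `indexFor` is hypothetical and this is not the code that was benchmarked:

```haskell
import Data.Bits (popCount, (.&.))

-- | Index into a compressed array for the single-bit mask @m@:
-- the number of 1-bits in the bitmap @b@ below @m@.
-- Hypothetical helper for illustration only.
indexFor :: Word -> Word -> Int
indexFor b m = popCount (b .&. (m - 1))
```

Compared with incrementing a counter, this adds a mask and a `popCount` per lookup, which may explain part of the slowdown.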

sjakobi · 2022-03-27T22:16:18Z

> The latest version (eedc326) finally brings a speed up for the union.ByteString benchmark: 167μs -> 143 μs (-14%).
>
> The union.Int benchmark is still slower at 128μs -> 140μs (+9%), but IMHO less realistic.
>
> (Both measurements are from the ancient laptop.)

On my relatively beefier machine with the i7-4790K I'm getting the following numbers:

union.Int: 60μs -> 66μs (+10%)
union.ByteString: 78μs -> 66μs (-15%)

sjakobi · 2022-03-29T16:27:15Z

> It might also be interesting to combine the various bitmap checks into a single case expression. Maybe GHC can generate something like a lookup table from this?!

I tried this:

    let go !i !i1 !i2 !b
            | b == 0 = return ()
            | otherwise = do
                -- m isolates the lowest remaining 1-bit of b
                let m = 1 `unsafeShiftL` countTrailingZeros b
                -- 1 if bit m is set in x, else 0 (local; shadows Data.Bits.testBit)
                let testBit :: Word -> Word
                    testBit x = fromIntegral (fromEnum (x .&. m /= 0))
                let b'' = b .&. complement m
                -- Encode membership in 2 bits: bit 0 for b1, bit 1 for b2
                let t1 = testBit b1
                let t2 = testBit b2 `unsafeShiftL` 1
                case t1 .|. t2 of
                    1 -> do  -- only in ary1
                        A.write mary i =<< A.indexM ary1 i1
                        go (i+1) (i1+1)  i2    b''
                    2 -> do  -- only in ary2
                        A.write mary i =<< A.indexM ary2 i2
                        go (i+1)  i1    (i2+1) b''
                    _ -> do  -- in both (3); 0 can't occur, assuming b' = b1 .|. b2
                        x1 <- A.indexM ary1 i1
                        x2 <- A.indexM ary2 i2
                        A.write mary i $! f x1 x2
                        go (i+1) (i1+1) (i2+1) b''
    go 0 0 0 b'

Unfortunately it's slower (measured on the old laptop):

    union
      Int:        OK (1.78s)
        162  μs ± 1.6 μs
      ByteString: OK (10.76s)
        156  μs ± 1.5 μs
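To make the control flow of the loop above easier to follow, here is a pure model of the same merge on plain lists. It is a sketch under stated assumptions, not the library code: lists stand in for the arrays, the name `unionBy` is made up for this example, and it assumes each list holds exactly one element per 1-bit of its bitmap, in ascending bit order.

```haskell
import Data.Bits (complement, countTrailingZeros, unsafeShiftL, (.&.), (.|.))

-- | Pure model of the merge (hypothetical, for illustration only).
-- Returns the combined bitmap and the merged elements.
unionBy :: (a -> a -> a) -> Word -> [a] -> Word -> [a] -> (Word, [a])
unionBy f b1 xs1 b2 xs2 = (b', go b' xs1 xs2)
  where
    b' = b1 .|. b2
    go 0 _ _ = []
    go b as bs =
      let m    = 1 `unsafeShiftL` countTrailingZeros b  -- lowest remaining 1-bit
          rest = b .&. complement m                     -- clear that bit
      in case (b1 .&. m /= 0, b2 .&. m /= 0, as, bs) of
           (True, True, x : as', y : bs') -> f x y : go rest as' bs'
           (True, False, x : as', _)      -> x : go rest as' bs
           (False, True, _, y : bs')      -> y : go rest as bs'
           _ -> error "unionBy: bitmap and element list disagree"
```

For example, `unionBy (+) 5 [10, 20] 6 [1, 2]` yields `(7, [10, 1, 22])`: bit 0 comes only from the first input, bit 1 only from the second, and bit 2 is present in both and gets combined.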

sjakobi · 2022-03-30T00:19:43Z

I think I'll leave it at this for now. Ultimately it would be nice to recover the perf losses for cases like the union.Int benchmark, but I'll leave that for future work.

unionArrayBy: Find next 1-bits with countTrailingZeros

a780a8d

Closes #374.

Add union.ByteString benchmark

b00fbd7

Avoid range checks due to bit, clearBit, testBit

eedc326

…and bring back array indices.

sjakobi force-pushed the sjakobi/374-unionArrayBy-clz branch from 0f4f623 to eedc326 Compare March 25, 2022 22:13

sjakobi added 2 commits March 28, 2022 13:49

Remove outdated comment

605800e

Cleanup

1d386c0

More cleanup

f55e79a

treeowl reviewed Mar 29, 2022

sjakobi marked this pull request as ready for review March 29, 2022 17:44

sjakobi merged commit b6bde46 into master Mar 31, 2022

sjakobi deleted the sjakobi/374-unionArrayBy-clz branch March 31, 2022 19:50

sjakobi mentioned this pull request Apr 2, 2022

unionArrayBy could be improved for some corner cases #398

Open

sjakobi mentioned this pull request Apr 14, 2022

Optimization idea for submapBitmapIndexed #414

Open
