
Can we improve fromDistinctAscList for IntSet and IntMap #654

Open
treeowl opened this issue Jul 4, 2019 · 12 comments
Comments

treeowl (Contributor) commented Jul 4, 2019

The control flow of fromDistinctAscList for IntSet and IntMap is rather ... opaque. It operates using an explicit stack that seems to represent a sort of zipper for the tree being constructed. That stack is, of course, allocated on the heap, which doesn't seem ideal. The fact that the code is just a bit more complicated than I can really understand is unfortunate. My guess is that it's probably possible to shift that explicit stack onto GHC's native reduction stack, which seems likely to make the code less unreadable and more efficient.
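To make the idea concrete, here is a minimal sketch of what "using the native stack" looks like, on a toy binary search tree rather than IntMap (Tree, buildBalanced, and inorder are illustrative names, not containers code): the pending right subtrees live in the ordinary call stack instead of an explicit heap-allocated structure.

```haskell
-- Toy binary search tree, standing in for the real structure.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- Build a balanced tree from the first n elements of a sorted list,
-- returning the leftover list. The "stack" of pending subtrees is
-- just GHC's native call stack.
buildBalanced :: Int -> [a] -> (Tree a, [a])
buildBalanced 0 xs = (Leaf, xs)
buildBalanced n xs =
  let m          = n `div` 2
      (l, x:xs') = buildBalanced m xs            -- left subtree
      (r, xs'')  = buildBalanced (n - 1 - m) xs' -- right subtree
  in (Node l x r, xs'')

inorder :: Tree a -> [a]
inorder Leaf         = []
inorder (Node l x r) = inorder l ++ [x] ++ inorder r
```

For IntMap the recursion pattern would be guided by branch masks rather than element counts, but the same trick of threading the remaining list through the recursion applies.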

treeowl (Contributor, Author) commented Jul 4, 2019

@int-e I know you were involved in writing this code many years ago; I wonder if you might be interested in revisiting it.

treeowl (Contributor, Author) commented Jul 4, 2019

The reason this came up: #653 seems to make fromList for ascending and descending lists almost as fast as fromDistinctAscList for these types, which hints that the latter should be able to do better.

int-e (Contributor) commented Jul 6, 2019

I wouldn't expect a huge speedup... most of the speedup of fromDistinctAscList comes from avoiding the repeated traversal and recreation of paths all the way from the root, and with the improved fromList you get that same effect. That said, I'm seeing a small speedup over the existing fromDistinctAscList with the following code:

fromDistinctAscList1 :: [(Key,a)] -> IntMap a
fromDistinctAscList1 []              = Nil
fromDistinctAscList1 ((kx,vx) : zs0) = addAll kx (Tip kx vx) zs0
  where
    -- invariant for `addAll` and `addMany`: `kx` is a key in the map `tx`.
    addAll !kx !tx []
        = tx
    addAll !kx !tx ((ky,vy) : zs)
        | m <- branchMask kx ky
        , Inserted ty zs' <- addMany m ky (Tip ky vy) zs
        = addAll kx (link ky ty kx tx) zs'

    -- addMany adds all elements that have the same prefix as `kx` w.r.t.
    -- the branch mask `m` to `tx`.
    addMany !m !kx tx []
        = Inserted tx []
    addMany !m !kx tx zs0@((ky,vy) : zs)
        | mask kx m /= mask ky m
        = Inserted tx zs0
        | Inserted ty zs' <- addMany (branchMask kx ky) ky (Tip ky vy) zs
        = addMany m kx (link ky ty kx tx) zs'

data Inserted a = Inserted !(IntMap a) ![(Key,a)]

There is some wasted effort here, since most of the time, one can predict which way the link will go. But annoyingly, the Bin has to be flipped if both negative and positive keys occur.
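For readers unfamiliar with the two bit-twiddling helpers used above, branchMask and mask do roughly the following. This is a self-contained sketch using Data.Bits; the real definitions live in the internal modules and view keys at the word level (which is what makes negative keys sort after positive ones), so treat this as illustrative. branchMask assumes its two arguments differ.

```haskell
import Data.Bits

-- branchMask kx ky: the highest bit in which the two keys differ,
-- as a power of two. Precondition: kx /= ky.
branchMask :: Int -> Int -> Int
branchMask kx ky =
  let w = fromIntegral (kx `xor` ky) :: Word
  in fromIntegral (bit (finiteBitSize w - 1 - countLeadingZeros w) :: Word)

-- mask k m: keep only the bits of k strictly above the branch bit m,
-- i.e. the prefix shared by all keys that branch at m.
mask :: Int -> Int -> Int
mask k m = k .&. (complement (m - 1) `xor` m)
```

So `mask kx m /= mask ky m` in addMany asks whether ky still shares the prefix that tx was built under.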

For IntSet, I imagine there's a bit more potential for speedup from having a specialized inner loop that combines the bits for keys with the same prefix.
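One possible shape for that inner loop (tipBitmap is a hypothetical helper, not containers code; I'm assuming the 64-bit layout where an IntSet Tip stores a prefix plus a word-sized bitmap indexed by the low 6 key bits):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits
import Data.Word

-- Collect all leading keys that fall into the Tip identified by
-- prefix p (i.e. k `shiftR` 6 == p) into a single bitmap, returning
-- the bitmap and the remaining keys.
tipBitmap :: Int -> [Int] -> (Word64, [Int])
tipBitmap p = go 0
  where
    go !bm (k : ks) | k `shiftR` 6 == p = go (bm .|. bit (k .&. 63)) ks
    go !bm ks = (bm, ks)
```

The point is that a run of keys sharing a Tip prefix is folded into one word with cheap bit operations before any tree nodes are allocated.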

Finally, regarding fromAscList, it's currently a two-pass algorithm; for raw speed it may be worthwhile to bite the bullet and fuse them manually...
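For context, the two passes are essentially "collapse runs of equal keys, then build". A sketch against the public API (combineEq is an illustrative name; I'm assuming the later value wins for duplicate keys, matching fromList):

```haskell
import qualified Data.IntMap.Strict as IM

-- Pass 1: collapse runs of equal keys, keeping the later value.
-- Pass 2: hand the now-distinct ascending list to fromDistinctAscList.
fromAscList' :: [(Int, a)] -> IM.IntMap a
fromAscList' = IM.fromDistinctAscList . combineEq
  where
    combineEq []             = []
    combineEq ((k, v) : xs)  = go k v xs
    go k v []                = [(k, v)]
    go k v ((k', v') : xs)
      | k == k'              = go k v' xs              -- duplicate key
      | otherwise            = (k, v) : go k' v' xs
```

Fusing would mean doing the duplicate check inside the builder's loop over the list, so the intermediate list is never materialized.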

treeowl (Contributor, Author) commented Jul 6, 2019

I would imagine that CPU branch prediction could handle the link thing. It would, of course, be nicer if we had a branchless way to order the subtrees, but I doubt we'll get one soon. I personally find your proposed code much easier to read, so as long as it's not slower, I want it. Since you say it's a little faster, so much the better!

treeowl (Contributor, Author) commented Jul 6, 2019

You write

Invariant for addAll and addMany: kx is a key in the map tx.

That doesn't sound right, since you call addAll with mask kx m. What did you mean exactly?

treeowl (Contributor, Author) commented Jul 6, 2019

Would it be worth writing a separate version for the special case where the first element is non-negative? That is a pretty common scenario.

int-e (Contributor) commented Jul 6, 2019

Invariant for addAll and addMany: kx is a key in the map tx.

That doesn't sound right, since you call addAll with mask kx m. What did you mean exactly?

That was unintended. In fact, any value that agrees with the prefix of the tree tx is fine... initially I passed in the actual prefix, and the mask kx m was a leftover of that. But keys are more readily available.

int-e (Contributor) commented Jul 6, 2019

Would it be worth writing a separate version for the special case where the first element is non-negative? That is a pretty common scenario.

Hmm, actually the link calls inside addMany are predictable, so this would work:

fromDistinctAscList1a :: [(Key,a)] -> IntMap a
fromDistinctAscList1a []              = Nil
fromDistinctAscList1a ((kx,vx) : zs0) = addAll kx (Tip kx vx) zs0
  where
    -- invariant for `addAll` and `addMany`: `kx` is a key in the map `tx`.
    addAll !kx tx []
        = tx
    addAll !kx tx ((ky,vy) : zs)
        | m <- branchMask kx ky
        , Inserted ty zs' <- addMany m ky (Tip ky vy) zs
        = addAll kx (link ky ty kx tx) zs'

    -- addMany adds all elements that have the same prefix as `kx` w.r.t.
    -- the branch mask `m` to `tx`.
    addMany !m !kx tx []
        = Inserted tx []
    addMany !m !kx tx zs0@((ky,vy) : zs)
        | mask kx m /= mask ky m
        = Inserted tx zs0
        | m' <- branchMask kx ky
        , Inserted ty zs' <- addMany m' ky (Tip ky vy) zs
        = addMany m kx (Bin (mask kx m') m' tx ty) zs'

However, I don't see a speedup compared to fromDistinctAscList1. But my benchmarking may be too naive:

import Data.List (foldl')

main :: IO ()
main = do
    print $ foldl' (+) 0 $ Prelude.map size $ do
        sz <- [2^i | i <- [0..10]]
        shift <- [v * 2^i | i <- [0..20], v <- [-1,1]]
        mult <- [1..100]
        [fromDistinctAscList1 [(shift + mult * i, ()) | i <- [0..sz]]]

Btw, it's interesting to ponder when fromDistinctAscList1 works: it works for inputs such that, for each prefix, the keys corresponding to that prefix occur consecutively in the list, and the list has no duplicate keys. So in particular it works for both increasing and decreasing lists.

int-e (Contributor) commented Jul 6, 2019

I'm working on this. Here are some detailed benchmarks obtained using gauge (section [n,1] uses [0..n] as the input keys)... fromList1 is the code from #653, while fromList is the current implementation:

fromList :: [(Key,a)] -> IntMap a
fromList xs
  = Foldable.foldl' ins empty xs
  where
    ins t (k,x) = insert k x t

[1,1]/fromList                           mean 19.90 ns  ( +- 103.2 ps  )
[1,1]/fromList1                          mean 24.60 ns  ( +- 27.25 ps  )
[1,1]/fromAscList                        mean 34.58 ns  ( +- 44.19 ps  )
[1,1]/fromDistinctAscList                mean 25.65 ns  ( +- 37.96 ps  )
[1,1]/fromDistinctAscList1               mean 20.15 ns  ( +- 32.51 ps  )
[1,1]/fromDistinctAscList1a              mean 19.90 ns  ( +- 24.34 ps  )

[10,1]/fromList                          mean 137.5 ns  ( +- 2.841 ns  )
[10,1]/fromList1                         mean 162.2 ns  ( +- 2.894 ns  )
[10,1]/fromAscList                       mean 294.3 ns  ( +- 5.420 ns  )
[10,1]/fromDistinctAscList               mean 186.4 ns  ( +- 3.141 ns  )
[10,1]/fromDistinctAscList1              mean 144.5 ns  ( +- 1.696 ns  )
[10,1]/fromDistinctAscList1a             mean 149.5 ns  ( +- 3.939 ns  )

[100,1]/fromList                         mean 2.064 μs  ( +- 68.65 ns  )
[100,1]/fromList1                        mean 1.740 μs  ( +- 43.99 ns  )
[100,1]/fromAscList                      mean 3.035 μs  ( +- 92.46 ns  )
[100,1]/fromDistinctAscList              mean 1.837 μs  ( +- 79.14 ns  )
[100,1]/fromDistinctAscList1             mean 1.773 μs  ( +- 16.59 ns  )
[100,1]/fromDistinctAscList1a            mean 1.605 μs  ( +- 55.38 ns  )

[1000,1]/fromList                        mean 36.36 μs  ( +- 744.9 ns  )
[1000,1]/fromList1                       mean 18.21 μs  ( +- 325.1 ns  )
[1000,1]/fromAscList                     mean 33.16 μs  ( +- 918.7 ns  )
[1000,1]/fromDistinctAscList             mean 21.37 μs  ( +- 346.2 ns  )
[1000,1]/fromDistinctAscList1            mean 18.13 μs  ( +- 482.7 ns  )
[1000,1]/fromDistinctAscList1a           mean 17.60 μs  ( +- 372.6 ns  )

[10000,1]/fromList                       mean 1.162 ms  ( +- 6.459 μs  )
[10000,1]/fromList1                      mean 256.8 μs  ( +- 4.090 μs  )
[10000,1]/fromAscList                    mean 1.013 ms  ( +- 14.06 μs  )
[10000,1]/fromDistinctAscList            mean 326.7 μs  ( +- 3.084 μs  )
[10000,1]/fromDistinctAscList1           mean 256.2 μs  ( +- 5.021 μs  )
[10000,1]/fromDistinctAscList1a          mean 248.9 μs  ( +- 3.901 μs  )

[100000,1]/fromList                      mean 13.93 ms  ( +- 465.4 μs  )
[100000,1]/fromList1                     mean 9.134 ms  ( +- 231.1 μs  )
[100000,1]/fromAscList                   mean 11.49 ms  ( +- 350.1 μs  )
[100000,1]/fromDistinctAscList           mean 10.02 ms  ( +- 338.8 μs  )
[100000,1]/fromDistinctAscList1          mean 9.199 ms  ( +- 278.7 μs  )
[100000,1]/fromDistinctAscList1a         mean 9.111 ms  ( +- 358.4 μs  )

treeowl (Contributor, Author) commented Jul 6, 2019

Have you tried with non-contiguous keys at all?

treeowl (Contributor, Author) commented Jul 6, 2019

It would also be good to get larger sizes in the mix.

int-e (Contributor) commented Jul 6, 2019

As already mentioned in #658, I have more benchmark results here: https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6. These include sparse lists (the second test parameter is the distance between consecutive keys). Sparse lists make a huge difference for IntSet, but for IntMap, the effect is pretty small, which is why I didn't include them above.

I've tested maps with 1M entries, but gauge started spewing warnings about the tests taking long, and there was nothing interesting to see, so I dropped them. But in any case, you have the source code now, so feel free to play around yourself.
