Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fusible Set.fromDistinctAscList definition #949

Closed
meooow25 opened this issue May 28, 2023 · 10 comments · Fixed by #950
Closed

Fusible Set.fromDistinctAscList definition #949

meooow25 opened this issue May 28, 2023 · 10 comments · Fixed by #950

Comments

@meooow25
Copy link
Contributor

I was curious if Set.fromDistinctAscList could be written to fuse with the input list, so I gave it a shot. Currently it looks like:

fromDistinctAscList :: [a] -> Set a
fromDistinctAscList [] = Tip
fromDistinctAscList (x0 : xs0) = go (1::Int) (Bin 1 x0 Tip Tip) xs0
where
go !_ t [] = t
go s l (x : xs) = case create s xs of
(r :*: ys) -> let !t' = link x l r
in go (s `shiftL` 1) t' ys
create !_ [] = (Tip :*: [])
create s xs@(x : xs')
| s == 1 = (Bin 1 x Tip Tip :*: xs')
| otherwise = case create (s `shiftR` 1) xs of
res@(_ :*: []) -> res
(l :*: (y:ys)) -> case create (s `shiftR` 1) ys of
(r :*: zs) -> (link y l r :*: zs)

And here's what I got:

data SetPart a
    = PartL !Int !(Set a)
    | PartLM !Int !(Set a) !a

fromDistinctAscList :: [a] -> Set a
fromDistinctAscList = mergeParts . List.foldl' f []
  where
    f (PartL h l : parts) !x = PartLM h l x : parts
    f parts0              x0 = mergeInto 0 (Bin 1 x0 Tip Tip) parts0
      where
        mergeInto h !r (PartLM h' l x : parts)
            | h+1 == h' = mergeInto h' (link x l r) parts
        mergeInto h l parts = PartL (h+1) l : parts
    mergeParts = List.foldl' f' Tip where
        f' r (PartL _ l)    = merge l r
        f' r (PartLM _ l x) = link x l r
{-# INLINE fromDistinctAscList #-}

The idea is that we keep a stack of partially constructed sets as we go along the list, and merge them whenever we get the chance.
We can do a similar thing for Map too.


Now how does it compare to the original definition? Let's benchmark.

-- in benchmarks/Set.hs
, bench "fromDistinctAscList" $ whnf S.fromDistinctAscList elems  -- elems = [1..2^12]
, bench "fromDistinctAscList2" $ whnf (\n -> S.fromDistinctAscList [1..n]) (2^12 :: Int)  -- To test with fusion

With GHC 9.2.5:

Current:

  fromDistinctAscList:  OK (0.15s)
    37.9 μs ± 3.1 μs, 159 KB allocated, 3.1 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.12s)
    58.7 μs ± 5.8 μs, 448 KB allocated,  12 KB copied, 7.0 MB peak memory

New:

  fromDistinctAscList:  OK (0.16s)
    39.8 μs ± 3.1 μs, 263 KB allocated, 5.2 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.15s)
    34.6 μs ± 2.9 μs, 327 KB allocated, 8.4 KB copied, 7.0 MB peak memory

It's a lot better in the second case because it doesn't construct the list. In the first case, the time doesn't change but it does allocate more, so it's not a clear win.


So, what do you think about this definition? Is it worth changing?
I would guess fromDistinctAscList [a..b] is a common usage and would benefit from this change.

As an aside, I want to try the same thing with fromList, but this seemed simpler to try first.

@meooow25
Copy link
Contributor Author

meooow25 commented May 28, 2023

I just realized it is safe to use bin instead of link in mergeInto, which decreases the time further.

  fromDistinctAscList:  OK (0.22s)
    26.5 μs ± 2.3 μs, 264 KB allocated, 5.3 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.22s)
    26.4 μs ± 1.6 μs, 328 KB allocated, 8.7 KB copied, 7.0 MB peak memory

But I suspect this can be adapted to the current definition too, so I would not consider this in favor of the new definition.

@treeowl
Copy link
Contributor

treeowl commented May 28, 2023

Your description reminds me of the way we used to do it for IntMap. IIRC, we switched from that because the simpler way performed at least as well (in that context) and it was easier for my poor brain to understand. That said, you're making a good case here. However, I can't understand your implementation. Could you use scoped type variables to give the local functions types, and add comments to explain what they do? Also, have you considered phased rewrite rules to turn the direct way into the stack-based way and then turn it back if things don't fuse?

@meooow25
Copy link
Contributor Author

The IntSet/IntMap change does look similar (#658)! I'll check that out.


How my implementation above works is very similar to counting up in binary:

[]      + 1 -> [1]
[1]     + 1 -> [1,1] -> [2]
[2]     + 1 -> [1,2]
[1,2]   + 1 -> [1,1,2] -> [2,2] -> [4]
[4]     + 1 -> [1,4]
[1,4]   + 1 -> [1,1,4] -> [2,4]
[2,4]   + 1 -> [1,2,4]
[1,2,4] + 1 -> [1,1,2,4] -> [2,2,4] -> [4,4] -> [8]
...

But a little different because perfect binary trees have size $2^n-1$ instead of $2^n$.

Also note that my implementation constructs exactly the same tree as the current algorithm, linking all the same trees, it's just done in a different, and I would say slightly simpler, way. Here's a cleaned up version:

data SetPart a
    = PartL  !(Set a)     -- (PartL l) invariant: l is perfect
    | PartLM !(Set a) !a  -- (PartLM l x) invariant: l is a perfect and maximum l < x

fromDistinctAscList :: forall a. [a] -> Set a
fromDistinctAscList = List.foldl' mergePart Tip . List.foldl' next []
  where
    next :: [SetPart a] -> a -> [SetPart a]
    next (PartL l : parts) !x = PartLM l x : parts
    next parts0 x0 = mergeInto (Bin 1 x0 Tip Tip) parts0
      where
        mergeInto !r (PartLM l x : parts)
            | sz r == sz l = mergeInto (bin x l r) parts
        mergeInto l parts = PartL l : parts

    mergePart :: Set a -> SetPart a -> Set a
    mergePart r (PartL l)    = merge l r
    mergePart r (PartLM l x) = link x l r

    sz :: Set a -> Int
    sz (Bin s _ _ _) = s
    sz Tip = error "impossible"
  fromDistinctAscList:  OK (0.21s)
    25.7 μs ± 1.6 μs, 240 KB allocated, 4.8 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.21s)
    25.7 μs ± 2.1 μs, 304 KB allocated, 8.1 KB copied, 7.0 MB peak memory

There is a property that PartL will only occur at the head of the stack for every other element. So the code could be alternately written as

data SetPart a = PartLM !(Set a) !a
data SetBuildState a
    = StateEven [SetPart a]
    | StateOdd [SetPart a] !(Set a)

fromDistinctAscList :: [a] -> Set a
fromDistinctAscList = mergeParts . List.foldl' next (StateEven [])
  where
    next (StateOdd parts l) !x = StateEven (PartLM l x : parts)
    next (StateEven parts0) x0 = mergeInto (Bin 1 x0 Tip Tip) parts0
      where
        mergeInto !r (PartLM l x : parts)
            | sz r == sz l = mergeInto (bin x l r) parts
        mergeInto l parts = StateOdd parts l
    
    mergeParts (StateOdd parts r) = List.foldl' mergePart r parts
    mergeParts (StateEven parts)  = List.foldl' mergePart Tip parts

    mergePart r (PartLM l x) = link x l r

    sz (Bin s _ _ _) = s
    sz Tip = error "impossible"

But this performs slightly worse.

  fromDistinctAscList:  OK (0.11s)
    28.6 μs ± 2.7 μs, 256 KB allocated, 4.9 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.12s)
    28.4 μs ± 2.8 μs, 320 KB allocated, 8.3 KB copied, 7.0 MB peak memory

About rewrite rules, I'll test it out but I'm not sure it's worth the complexity. It would also be atypical because I've only seen rewrite back rules for good consumer + producers, but here we're only working with a good consumer.

@treeowl
Copy link
Contributor

treeowl commented May 29, 2023

Don't worry about "atypical". Such rules are justified whenever another implementation is faster, thriftier, or smaller when fusion doesn't occur.

@meooow25
Copy link
Contributor Author

I was able to get the rewrite rules working as

"Set.fromDistinctAscList" [~1]
    forall xs. fromDistinctAscList xs = List.foldr foo bar xs baz
"Set.fromDistinctAscList back" [1]
    forall xs. List.foldr foo bar xs baz = fromDistinctAscList xs

I suspect this can be adapted to the current definition too, so I would not consider this in favor of the new definition.

But I was wrong about this and I can't find a way to make the current definition as fast as the new one.

So I say let's go with only the new definition.

@treeowl
Copy link
Contributor

treeowl commented May 30, 2023

Other things being roughly equal, a substantial increase in allocation is bad, particularly for concurrent programs. So I would prefer to get the version with rewrite rules.

@meooow25
Copy link
Contributor Author

I wouldn't say it's roughly equal.

Current:

  fromDistinctAscList: OK (0.15s)
    37.9 μs ± 3.3 μs, 159 KB allocated, 3.1 KB copied, 7.0 MB peak memory

Current, that I tried to optimize to use bin but it barely had any effect:

  fromDistinctAscList: OK (0.15s)
    35.1 μs ± 3.5 μs, 159 KB allocated, 3.1 KB copied, 7.0 MB peak memory

The new implementation (in my comment above):

  fromDistinctAscList:  OK (0.21s)
    25.7 μs ± 1.6 μs, 240 KB allocated, 4.8 KB copied, 7.0 MB peak memory

Another version of the new implementation where I tried to adopt the Stack from the old IntSet to reduce allocations:

  fromDistinctAscList: OK (0.21s)
    26.6 μs ± 2.0 μs, 224 KB allocated, 4.4 KB copied, 7.0 MB peak memory

The new version takes at least 25% less time compared to current.
We might still deliberately choose to trade off time for space, but the time difference is not negligible.

@treeowl
Copy link
Contributor

treeowl commented May 30, 2023

Oh, it looks like I misread the timings. Impressive! Could you open a PR?

@meooow25
Copy link
Contributor Author

meooow25 commented Jun 3, 2023

I see that fromAscList, fromAscListWith, fromAscListWithKey all delegate to fromDistinctAscList after some combining function, which is quite convenient.
Once we have this change, it seems like a good idea to make them participate in fusion too, which can be done by just making the combining function a good consumer and producer.

@meooow25
Copy link
Contributor Author

Thanks for the merge 🎉
As mentioned fromAscList and friends can also be improved, but it might take a couple of weeks before I can start working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants