Fusible Set.fromDistinctAscList definition #949

meooow25 · 2023-05-28T18:05:06Z

I was curious if Set.fromDistinctAscList could be written to fuse with the input list, so I gave it a shot. Currently it looks like:

containers/containers/src/Data/Set/Internal.hs

Lines 1175 to 1190 in 67752b2

    
           fromDistinctAscList :: [a] -> Set a 
        
           fromDistinctAscList [] = Tip 
        
           fromDistinctAscList (x0 : xs0) = go (1::Int) (Bin 1 x0 Tip Tip) xs0 
        
             where 
        
               go !_ t [] = t 
        
               go s l (x : xs) = case create s xs of 
        
                                   (r :*: ys) -> let !t' = link x l r 
        
                                                 in go (s `shiftL` 1) t' ys 
        
               create !_ [] = (Tip :*: []) 
        
               create s xs@(x : xs') 
        
                 | s == 1 = (Bin 1 x Tip Tip :*: xs') 
        
                 | otherwise = case create (s `shiftR` 1) xs of 
        
                                 res@(_ :*: []) -> res 
        
                                 (l :*: (y:ys)) -> case create (s `shiftR` 1) ys of 
        
                                   (r :*: zs) -> (link y l r :*: zs)

And here's what I got:

data SetPart a
    = PartL !Int !(Set a)
    | PartLM !Int !(Set a) !a

fromDistinctAscList :: [a] -> Set a
fromDistinctAscList = mergeParts . List.foldl' f []
  where
    f (PartL h l : parts) !x = PartLM h l x : parts
    f parts0              x0 = mergeInto 0 (Bin 1 x0 Tip Tip) parts0
      where
        mergeInto h !r (PartLM h' l x : parts)
            | h+1 == h' = mergeInto h' (link x l r) parts
        mergeInto h l parts = PartL (h+1) l : parts
    mergeParts = List.foldl' f' Tip where
        f' r (PartL _ l)    = merge l r
        f' r (PartLM _ l x) = link x l r
{-# INLINE fromDistinctAscList #-}

The idea is that we keep a stack of partially constructed sets as we go along the list, and merge them whenever we get the chance.
We can do a similar thing for Map too.

Now how does it compare to the original definition? Let's benchmark.

-- in benchmarks/Set.hs
, bench "fromDistinctAscList" $ whnf S.fromDistinctAscList elems  -- elems = [1..2^12]
, bench "fromDistinctAscList2" $ whnf (\n -> S.fromDistinctAscList [1..n]) (2^12 :: Int)  -- To test with fusion

With GHC 9.2.5:

Current:

  fromDistinctAscList:  OK (0.15s)
    37.9 μs ± 3.1 μs, 159 KB allocated, 3.1 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.12s)
    58.7 μs ± 5.8 μs, 448 KB allocated,  12 KB copied, 7.0 MB peak memory

New:

  fromDistinctAscList:  OK (0.16s)
    39.8 μs ± 3.1 μs, 263 KB allocated, 5.2 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.15s)
    34.6 μs ± 2.9 μs, 327 KB allocated, 8.4 KB copied, 7.0 MB peak memory

It's a lot better in the second case because it doesn't construct the list. In the first case, the time doesn't change but it does allocate more, so it's not a clear win.

So, what do you think about this definition? Is it worth changing?
I would guess fromDistinctAscList [a..b] is a common usage and would benefit from this change.

As an aside, I want to try the same thing with fromList, but this seemed simpler to try first.

The text was updated successfully, but these errors were encountered:

meooow25 · 2023-05-28T18:18:19Z

I just realized it is safe to use bin instead of link in mergeInto, which decreases the time further.

  fromDistinctAscList:  OK (0.22s)
    26.5 μs ± 2.3 μs, 264 KB allocated, 5.3 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.22s)
    26.4 μs ± 1.6 μs, 328 KB allocated, 8.7 KB copied, 7.0 MB peak memory

But I suspect this can be adapted to the current definition too, so I would not consider this in favor of the new definition.

treeowl · 2023-05-28T21:23:38Z

Your description reminds me of the way we used to do it for IntMap. IIRC, we switched from that because the simpler way performed at least as well (in that context) and it was easier for my poor brain to understand. That said, you're making a good case here. However, I can't understand your implementation. Could you use scoped type variables to give the local functions types, and add comments to explain what they do? Also, have you considered phased rewrite rules to turn the direct way into the stack-based way and then turn it back if things don't fuse?

meooow25 · 2023-05-29T05:21:36Z

The IntSet/IntMap change does look similar (#658)! I'll check that out.

How my implementation above works is very similar to counting up in binary:

[]      + 1 -> [1]
[1]     + 1 -> [1,1] -> [2]
[2]     + 1 -> [1,2]
[1,2]   + 1 -> [1,1,2] -> [2,2] -> [4]
[4]     + 1 -> [1,4]
[1,4]   + 1 -> [1,1,4] -> [2,4]
[2,4]   + 1 -> [1,2,4]
[1,2,4] + 1 -> [1,1,2,4] -> [2,2,4] -> [4,4] -> [8]
...

But a little different because perfect binary trees have size $2^n-1$ instead of $2^n$.

Also note that my implementation constructs exactly the same tree as the current algorithm, linking all the same trees, it's just done in a different, and I would say slightly simpler, way. Here's a cleaned up version:

data SetPart a
    = PartL  !(Set a)     -- (PartL l) invariant: l is perfect
    | PartLM !(Set a) !a  -- (PartLM l x) invariant: l is a perfect and maximum l < x

fromDistinctAscList :: forall a. [a] -> Set a
fromDistinctAscList = List.foldl' mergePart Tip . List.foldl' next []
  where
    next :: [SetPart a] -> a -> [SetPart a]
    next (PartL l : parts) !x = PartLM l x : parts
    next parts0 x0 = mergeInto (Bin 1 x0 Tip Tip) parts0
      where
        mergeInto !r (PartLM l x : parts)
            | sz r == sz l = mergeInto (bin x l r) parts
        mergeInto l parts = PartL l : parts

    mergePart :: Set a -> SetPart a -> Set a
    mergePart r (PartL l)    = merge l r
    mergePart r (PartLM l x) = link x l r

    sz :: Set a -> Int
    sz (Bin s _ _ _) = s
    sz Tip = error "impossible"

  fromDistinctAscList:  OK (0.21s)
    25.7 μs ± 1.6 μs, 240 KB allocated, 4.8 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.21s)
    25.7 μs ± 2.1 μs, 304 KB allocated, 8.1 KB copied, 7.0 MB peak memory

There is a property that PartL will only occur at the head of the stack for every other element. So the code could be alternately written as

data SetPart a = PartLM !(Set a) !a
data SetBuildState a
    = StateEven [SetPart a]
    | StateOdd [SetPart a] !(Set a)

fromDistinctAscList :: [a] -> Set a
fromDistinctAscList = mergeParts . List.foldl' next (StateEven [])
  where
    next (StateOdd parts l) !x = StateEven (PartLM l x : parts)
    next (StateEven parts0) x0 = mergeInto (Bin 1 x0 Tip Tip) parts0
      where
        mergeInto !r (PartLM l x : parts)
            | sz r == sz l = mergeInto (bin x l r) parts
        mergeInto l parts = StateOdd parts l
    
    mergeParts (StateOdd parts r) = List.foldl' mergePart r parts
    mergeParts (StateEven parts)  = List.foldl' mergePart Tip parts

    mergePart r (PartLM l x) = link x l r

    sz (Bin s _ _ _) = s
    sz Tip = error "impossible"

But this performs slightly worse.

  fromDistinctAscList:  OK (0.11s)
    28.6 μs ± 2.7 μs, 256 KB allocated, 4.9 KB copied, 7.0 MB peak memory
  fromDistinctAscList2: OK (0.12s)
    28.4 μs ± 2.8 μs, 320 KB allocated, 8.3 KB copied, 7.0 MB peak memory

About rewrite rules, I'll test it out but I'm not sure it's worth the complexity. It would also be atypical because I've only seen rewrite back rules for good consumer + producers, but here we're only working with a good consumer.

treeowl · 2023-05-29T06:43:27Z

Don't worry about "atypical". Such rules are justified whenever another implementation is faster, thriftier, or smaller when fusion doesn't occur.

meooow25 · 2023-05-29T22:18:33Z

I was able to get the rewrite rules working as

"Set.fromDistinctAscList" [~1]
    forall xs. fromDistinctAscList xs = List.foldr foo bar xs baz
"Set.fromDistinctAscList back" [1]
    forall xs. List.foldr foo bar xs baz = fromDistinctAscList xs

I suspect this can be adapted to the current definition too, so I would not consider this in favor of the new definition.

But I was wrong about this and I can't find a way to make the current definition as fast as the new one.

So I say let's go with only the new definition.

treeowl · 2023-05-30T04:05:47Z

Other things being roughly equal, a substantial increase in allocation is bad, particularly for concurrent programs. So I would prefer to get the version with rewrite rules.

meooow25 · 2023-05-30T19:43:17Z

I wouldn't say it's roughly equal.

Current:

  fromDistinctAscList: OK (0.15s)
    37.9 μs ± 3.3 μs, 159 KB allocated, 3.1 KB copied, 7.0 MB peak memory

Current, that I tried to optimize to use bin but it barely had any effect:

  fromDistinctAscList: OK (0.15s)
    35.1 μs ± 3.5 μs, 159 KB allocated, 3.1 KB copied, 7.0 MB peak memory

The new implementation (in my comment above):

  fromDistinctAscList:  OK (0.21s)
    25.7 μs ± 1.6 μs, 240 KB allocated, 4.8 KB copied, 7.0 MB peak memory

Another version of the new implementation where I tried to adopt the Stack from the old IntSet to reduce allocations:

  fromDistinctAscList: OK (0.21s)
    26.6 μs ± 2.0 μs, 224 KB allocated, 4.4 KB copied, 7.0 MB peak memory

The new version takes at least 25% less time compared to current.
We might still deliberately choose to trade off time for space, but the time difference is not negligible.

treeowl · 2023-05-30T19:56:34Z

Oh, it looks like I misread the timings. Impressive! Could you open a PR?

meooow25 · 2023-06-03T01:27:13Z

I see that fromAscList, fromAscListWith, fromAscListWithKey all delegate to fromDistinctAscList after some combining function, which is quite convenient.
Once we have this change, it seems like a good idea to make them participate in fusion too, which can be done by just making the combining function a good consumer and producer.

meooow25 · 2023-06-24T18:27:06Z

Thanks for the merge 🎉
As mentioned fromAscList and friends can also be improved, but it might take a couple of weeks before I can start working on it.

meooow25 mentioned this issue May 31, 2023

Improve {Set,Map}.fromDistinct{Asc,Desc}List #950

Merged

This was referenced Jun 16, 2023

Data.Set.fromDistinctAscListN #894

Open

Fusible IntSet.fromDistinctAscList definition #951

Open

treeowl closed this as completed in #950 Jun 24, 2023

meooow25 mentioned this issue Jul 16, 2023

Improve {Set,Map}.fromAscList and friends #963

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fusible Set.fromDistinctAscList definition #949

Fusible Set.fromDistinctAscList definition #949

meooow25 commented May 28, 2023

meooow25 commented May 28, 2023 •

edited

treeowl commented May 28, 2023

meooow25 commented May 29, 2023

treeowl commented May 29, 2023

meooow25 commented May 29, 2023

treeowl commented May 30, 2023

meooow25 commented May 30, 2023

treeowl commented May 30, 2023

meooow25 commented Jun 3, 2023

meooow25 commented Jun 24, 2023

Fusible Set.fromDistinctAscList definition #949

Fusible Set.fromDistinctAscList definition #949

Comments

meooow25 commented May 28, 2023

meooow25 commented May 28, 2023 • edited

treeowl commented May 28, 2023

meooow25 commented May 29, 2023

treeowl commented May 29, 2023

meooow25 commented May 29, 2023

treeowl commented May 30, 2023

meooow25 commented May 30, 2023

treeowl commented May 30, 2023

meooow25 commented Jun 3, 2023

meooow25 commented Jun 24, 2023

meooow25 commented May 28, 2023 •

edited