Define partitionKeys: fused version of restrictKeys and withoutKeys #975

Open
wants to merge 5 commits into
base: master

@sergv sergv commented Oct 15, 2023

As mentioned in #158, sometimes we'd like to get results from both restrictKeys and withoutKeys for the same map and set. It can be done more efficiently by fusing traversals.
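For reference, the behaviour being fused can be written as two separate traversals. This is only a minimal sketch of the intended semantics, not the implementation from this PR; the names `partitionKeysRef`, `exampleMap` and `exampleSet` are made up for illustration.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Unfused reference semantics (hypothetical name): the first component
-- keeps the keys found in the set, the second keeps the rest.
partitionKeysRef :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
partitionKeysRef m s = (m `Map.restrictKeys` s, m `Map.withoutKeys` s)

-- Illustrative inputs; key 6 is in the set but not in the map.
exampleMap :: Map Int String
exampleMap = Map.fromList [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

exampleSet :: Set Int
exampleSet = Set.fromList [2, 4, 6]
```

With these inputs, `partitionKeysRef exampleMap exampleSet` yields the map with keys 2 and 4 as the first component and the map with keys 1 and 3 as the second.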

I named the new function partitionKeys instead of partitionSet because the functions it fuses end in *Keys, so I believe this is more consistent.

Benchmarks show that the new version is around 20-40% faster, depending on inputs. Here's a run with a locally modified containers benchmark suite that measures with even and odd keys (for some reason odd keys show more speedup, hence that's what I committed):

$ cabal run map-benchmarks -- -p '/restrictKeys+withoutKeys/ || /partitionKeys/'
All
  even
    restrictKeys+withoutKeys: OK
      75.5 μs ± 5.3 μs
    partitionKeys:            OK
      62.4 μs ± 5.6 μs, 0.83x
  odd
    restrictKeys+withoutKeys: OK
      194  μs ± 6.1 μs
    partitionKeys:            OK
      118  μs ±  12 μs, 0.61x

While checking the generated Core I noticed that splitMember gets called with an explicit Ord dictionary, so I changed it a bit so that it would specialize. I've only checked the Core on GHC 9.6.2 though.

@sergv sergv changed the title Define partitionKEys: fused version of restrictKeys and withoutKeys Define partitionKeys: fused version of restrictKeys and withoutKeys Oct 15, 2023
@Bodigrim
Contributor

Bodigrim commented Nov 1, 2023

@treeowl any chance to look at this please?

@treeowl
Contributor

treeowl commented Nov 2, 2023

Yeah, I'll take a look. Sorry for the delay.

Contributor

@konsumlamm konsumlamm left a comment

If this is added, it should also be added for IntMaps.

That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys). There are a lot of operations that could be fused together to be more efficient, but I don't think that alone warrants adding special functions for them. Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k `member` s) m, which is equally clear IMO, albeit maybe a bit slower.
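The two formulations mentioned in this comment can be put side by side as follows. This is a sketch using only the public `Data.Map` and `Data.Set` API; the names `viaPartitionWithKey`, `viaRestrictWithout` and `agree` are hypothetical.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- One pass over every map entry, with a set lookup per key.
viaPartitionWithKey :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
viaPartitionWithKey m s = Map.partitionWithKey (\k _ -> k `Set.member` s) m

-- Two separate divide-and-conquer traversals.
viaRestrictWithout :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
viaRestrictWithout m s = (Map.restrictKeys m s, Map.withoutKeys m s)

-- Both definitions should return the same pair of maps.
agree :: (Ord k, Eq a) => Map k a -> Set k -> Bool
agree m s = viaPartitionWithKey m s == viaRestrictWithout m s
```

The results are the same, but the first version inspects all n map entries while the second can skip whole subtrees, so their running times may differ noticeably depending on the relative sizes of the map and the set.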

Comment on lines 1949 to 1950
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Restrict a 'Map' to only those keys
-- found in a 'Set' Remove all keys in a 'Set' from a 'Map'.
Contributor

Suggested change
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Restrict a 'Map' to only those keys
-- found in a 'Set' Remove all keys in a 'Set' from a 'Map'.
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Partition the map according to a set.
-- The first map contains the input 'Map' restricted to those keys found in the 'Set',
-- the second map contains the input 'Map' without all keys in the 'Set'.
-- This is more efficient than using 'restrictKeys' and 'withoutKeys' together.
--
-- @
-- m \`partitionKeys\` s = (m ``restrictKeys`` s, m ``withoutKeys`` s)
-- @
--
-- @since 0.7
Contributor

0.7 has already been released, so this needs to be updated.

@sergv
Author

sergv commented Jan 1, 2024

I've updated the docs.

If this is added, it should also be added for IntMaps.

I agree that it would be nice to add partitionKeys for IntMap too. However, because of the different map structure, the implementation is probably going to be more convoluted than for regular maps. Given that I didn't find a use case for this function on IntMaps, I didn't implement it. I'd prefer to get partitionKeys for regular maps first - maybe someone else will be motivated to add one for IntMaps, who knows.

If it's a hard requirement to have the same functions in Data.Map and Data.IntMap, I guess I can add a simple partitionKeys f xs = (restrictKeys f xs, withoutKeys f xs). A more efficient version could come later, from me in case I need this function on IntMaps myself. Please let me know your preferences on whether to do it.

That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys). There are a lot of operations that could be fused together to be more efficient, but I don't think that alone warrants adding special functions for them. Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k `member` s) m, which is equally clear IMO, albeit maybe a bit slower.

I see the attached benchmark as a clue that it's worthwhile to extend the API with partitionKeys. Fusion does seem to be of benefit, so the library seems like a natural place to have it.

One of the synthetic benchmarks shows a 40% speedup, which looks pretty good. Another (arguably equivalent) benchmark shows a 20% speedup. These numbers motivated me to do the PR, since I want to get those speedups in my programs.

API growth is unfortunate, but what's the cost of using a slower version? It affects all the users, and the runtime cost is paid every time their programs run.

Regarding the many other operations that could be fused together, I don't think it's realistic to foresee them all and add them beforehand. There are many of those and it's not clear whether anyone actually needs them. I'd advocate for a reactive approach like this PR: when someone finds a use case for fusing some operations and is motivated enough to implement it, then it could be considered for inclusion.

@Bodigrim
Contributor

Bodigrim commented Jan 1, 2024

That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys).

It depends on your application, I use restrictKeys and withoutKeys a lot, they are crisp and idiomatic. In my line of work partitionKeys would be helpful as well.

There is a general API pattern take / drop / splitAt, takeWhile / dropWhile / span, filter / partition, and I think partitionKeys fits it nicely.
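The pattern referred to here can be stated as three laws on lists, each relating a pair of "keep" and "discard" functions to a combined function that returns both halves at once. The snippet below is a small illustration using only `Prelude` and `Data.List`; the `*Law` names are made up.

```haskell
import Data.List (partition)

-- splitAt returns what take and drop would return separately.
splitAtLaw :: Int -> [Int] -> Bool
splitAtLaw n xs = splitAt n xs == (take n xs, drop n xs)

-- span does the same for takeWhile and dropWhile.
spanLaw :: (Int -> Bool) -> [Int] -> Bool
spanLaw p xs = span p xs == (takeWhile p xs, dropWhile p xs)

-- partition does the same for filter and its complement.
partitionLaw :: (Int -> Bool) -> [Int] -> Bool
partitionLaw p xs = partition p xs == (filter p xs, filter (not . p) xs)
```

By analogy, partitionKeys would play the role of the combined function for restrictKeys and withoutKeys.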

@sergv
Author

sergv commented Jan 2, 2024

Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k `member` s) m, which is equally clear IMO, albeit maybe a bit slower.

I did benchmark partitionWithKey. My benchmark is at sergv@416888c. The results are:

$ TERM=dumb cabal run map-benchmarks --builddir /tmp/dist -- -p '/All.even/ || /All.odd/'
Created semaphore called cabal_semaphore_6 with 32 slots.
All
  even
    restrictKeys+withoutKeys: OK
      102  μs ± 6.0 μs
    partitionKeys:            OK
      83.3 μs ± 5.7 μs, 0.82x
    partitionWithKey:         OK
      284  μs ±  25 μs, 2.80x
  odd
    restrictKeys+withoutKeys: OK
      224  μs ±  21 μs
    partitionKeys:            OK
      140  μs ±  11 μs, 0.63x
    partitionWithKey:         OK
      298  μs ±  25 μs, 1.33x

All 6 tests passed (1.11s)

So far it doesn't look like partitionKeys is competitive with restrictKeys + withoutKeys pair - it is slower. In my particular use case I want a faster alternative if it can be achieved with reasonable effort. I could have made a mistake in the benchmark though - please correct me if that's the case.

@Bodigrim
Contributor

@treeowl just a gentle reminder to review.

@Bodigrim
Contributor

So far it doesn't look like partitionKeys is competitive with restrictKeys + withoutKeys pair - it is slower.

I suppose you meant partitionWithKeys instead of partitionKeys?..


@treeowl I know that you are exceedingly busy, so I feel bad for being annoying, but I could benefit from a faster partitionKeys indeed. Could we possibly ask someone else to review (@meooow25?) to speed things up?

@sergv
Author

sergv commented Mar 31, 2024

So far it doesn't look like partitionKeys is competitive with restrictKeys + withoutKeys pair - it is slower.

Yes, I meant partitionWithKeys (defined as \ks -> M.partitionWithKey (\k _ -> S.member k ks) m).

containers/src/Data/Set/Internal.hs Outdated Show resolved Hide resolved
containers/src/Utils/Containers/Internal/StrictTriple.hs Outdated Show resolved Hide resolved
!(lmWith :*: lmWithout) = go lm ls'
!(rmWith :*: rmWithout) = go rm rs'

!(!ls', b, !rs') = Set.splitMember k s
Contributor

Curiously, restrictKeys splits the Set but withoutKeys splits the Map. Could you check which is faster in practice?
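To make the "split the Set" strategy concrete, here is a minimal sketch of a fused traversal that walks the map's tree and splits the set at each map key. It is not the code from this PR: it uses lazy tuples instead of the strict pair `:*:` seen in the diff, it relies on the constructors and the `link` / `link2` helpers exposed by `Data.Map.Internal`, and the names `fusedSketch` and `agreesWithUnfused` are hypothetical.

```haskell
import Data.Map.Internal (Map (..), link, link2)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Walk the map once; at each node split the set around the node's key
-- and recurse into both subtrees with the matching halves of the set.
fusedSketch :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
fusedSketch Tip _ = (Tip, Tip)
fusedSketch m s
  | Set.null s = (Tip, m) -- nothing left to select: everything is "without"
fusedSketch (Bin _ k x lm rm) s = (with, without)
  where
    (ls, found, rs) = Set.splitMember k s
    (lWith, lWithout) = fusedSketch lm ls
    (rWith, rWithout) = fusedSketch rm rs
    -- The node's own entry goes to exactly one of the two results.
    with
      | found = link k x lWith rWith
      | otherwise = link2 lWith rWith
    without
      | found = link2 lWithout rWithout
      | otherwise = link k x lWithout rWithout

-- Sanity check against the two separate traversals.
agreesWithUnfused :: (Ord k, Eq a) => Map k a -> Set k -> Bool
agreesWithUnfused m s =
  fusedSketch m s == (Map.restrictKeys m s, Map.withoutKeys m s)
```

The point of the sketch is only that a single `Set.splitMember` per map node serves both results, where the unfused pair performs its splitting work twice.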

Author

I tried your suggestion of splitting the map, and was not able to measure a significant difference from the current version that splits sets. They're more or less on par with each other; I can share benchmarks if you're curious.

I like the version that splits sets a little bit more, because if we want to split maps then I need to adjust the splitMember :: Ord k => k -> Map k a -> (Map k a, Bool, Map k a) function. I'll need to lift the worker out of it, and having a Bool in there is not enough - a Maybe a is needed instead to define partitionKeys. But then the regular splitMember will need to convert Maybe to Bool, which it currently doesn't do. Since everything is INLINABLE, the Maybe may very well get allocated each time, which is strictly worse than what we currently have.
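The need for a `Maybe a` rather than a `Bool` shows up as soon as the roles are swapped. Below is a minimal sketch of the "split the Map" strategy, assuming the public `Map.splitLookup` and the constructors exposed by `Data.Set.Internal`; it is not the variant benchmarked in this thread, and the names `splitMapSketch` and `splitMapAgrees` are hypothetical.

```haskell
import Data.Map.Internal (Map, link, link2)
import qualified Data.Map as Map
import qualified Data.Set as Set
import qualified Data.Set.Internal as SI

-- Walk the set's tree; at each set element split the map around it.
-- splitLookup returns the value stored at that key, which is needed to
-- rebuild the "with" map, so a plain Bool would not be enough here.
splitMapSketch :: Ord k => Map k a -> SI.Set k -> (Map k a, Map k a)
splitMapSketch m SI.Tip = (Map.empty, m)
splitMapSketch m (SI.Bin _ k ls rs)
  | Map.null m = (Map.empty, Map.empty)
  | otherwise = (with, without)
  where
    (lm, mx, rm) = Map.splitLookup k m
    (lWith, lWithout) = splitMapSketch lm ls
    (rWith, rWithout) = splitMapSketch rm rs
    with = case mx of
      Just x -> link k x lWith rWith
      Nothing -> link2 lWith rWith
    -- k is in the set, so it never appears in the "without" map.
    without = link2 lWithout rWithout

-- Sanity check against the two separate traversals.
splitMapAgrees :: (Ord k, Eq a) => Map k a -> SI.Set k -> Bool
splitMapAgrees m s =
  splitMapSketch m s == (Map.restrictKeys m s, Map.withoutKeys m s)
```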

Contributor

I can share benchmarks if you're curious.

Sure, please do!

You could also use the SetOperations setup to compare different pairs of inputs, though it will require some tweaks to work with two different types (map and set).

...then Maybe may very well get allocated each time which is strictly worse than what we currently have.

I don't think this necessarily indicates it would be worse. We're making O(log n) allocations with the left and right maps anyway.

Btw, splitMember already exists, separately implemented from splitLookup

splitMember :: Ord k => k -> Map k a -> (Map k a,Bool,Map k a)

splitLookup :: Ord k => k -> Map k a -> (Map k a,Maybe a,Map k a)

Anyway, if benchmarks have shown no difference then the current implementation is fine.

Author

@sergv sergv Apr 6, 2024

Okay, so here are my original benchmarks: https://github.com/sergv/benchmark-containers. They generate N random integers and turn them into strings; a prefix of length K < N of that list serves as the set to split on. N and K are in the benchmark names. My original use case has string keys, which is why I'm using them here.

Arguably this benchmark is not fully representative, because it doesn't check what happens when the set we're splitting on contains entries that are not present among the map keys.

Here's the output, without trivial benchmarks taken from original containers.

All
  Map 1
    Set 0
      restrictKeys+withoutKeys:  OK
        6.86 ns ± 460 ps
      partitionKeys - split set: OK
        5.04 ns ± 376 ps, 0.73x
      partitionKeys - split map: OK
        5.05 ns ± 340 ps, 0.74x
  Map 10
    Set 1
      restrictKeys+withoutKeys:  OK
        78.8 ns ± 5.2 ns
      partitionKeys - split set: OK
        61.9 ns ± 2.7 ns
      partitionKeys - split map: OK
        68.1 ns ± 6.3 ns
    Set 9
      restrictKeys+withoutKeys:  OK
        244  ns ±  21 ns
      partitionKeys - split set: OK
        157  ns ±  13 ns, 0.64x
      partitionKeys - split map: OK
        156  ns ±  10 ns, 0.64x
  Map 100
    Set 1
      restrictKeys+withoutKeys:  OK
        98.4 ns ± 9.1 ns
      partitionKeys - split set: OK
        79.8 ns ± 5.3 ns
      partitionKeys - split map: OK
        75.8 ns ± 5.7 ns
    Set 10
      restrictKeys+withoutKeys:  OK
        1.02 μs ±  73 ns
      partitionKeys - split set: OK
        683  ns ±  60 ns
      partitionKeys - split map: OK
        677  ns ±  53 ns
    Set 99
      restrictKeys+withoutKeys:  OK
        2.21 μs ± 136 ns
      partitionKeys - split set: OK
        1.44 μs ± 124 ns, 0.65x
      partitionKeys - split map: OK
        1.14 μs ±  87 ns, 0.52x
  Map 999
    Set 1
      restrictKeys+withoutKeys:  OK
        152  ns ±  12 ns
      partitionKeys - split set: OK
        116  ns ± 6.0 ns
      partitionKeys - split map: OK
        109  ns ± 6.4 ns
    Set 10
      restrictKeys+withoutKeys:  OK
        1.66 μs ±  85 ns
      partitionKeys - split set: OK
        1.23 μs ±  86 ns
      partitionKeys - split map: OK
        1.06 μs ±  95 ns
    Set 100
      restrictKeys+withoutKeys:  OK
        9.31 μs ± 701 ns
      partitionKeys - split set: OK
        6.96 μs ± 677 ns
      partitionKeys - split map: OK
        6.36 μs ± 356 ns
    Set 998
      restrictKeys+withoutKeys:  OK
        23.8 μs ± 1.3 μs
      partitionKeys - split set: OK
        14.9 μs ± 1.3 μs, 0.63x
      partitionKeys - split map: OK
        13.5 μs ± 879 ns, 0.57x
  Map 9,988
    Set 1
      restrictKeys+withoutKeys:  OK
        167  ns ±  11 ns
      partitionKeys - split set: OK
        119  ns ±  12 ns
      partitionKeys - split map: OK
        135  ns ±  11 ns
    Set 10
      restrictKeys+withoutKeys:  OK
        1.21 μs ±  49 ns
      partitionKeys - split set: OK
        1.04 μs ±  89 ns
      partitionKeys - split map: OK
        936  ns ±  61 ns
    Set 100
      restrictKeys+withoutKeys:  OK
        14.4 μs ± 865 ns
      partitionKeys - split set: OK
        10.7 μs ± 807 ns
      partitionKeys - split map: OK
        10.2 μs ± 698 ns
    Set 1,000
      restrictKeys+withoutKeys:  OK
        131  μs ±  13 μs
      partitionKeys - split set: OK
        98.4 μs ± 6.0 μs
      partitionKeys - split map: OK
        94.3 μs ± 7.6 μs
    Set 9,987
      restrictKeys+withoutKeys:  OK
        352  μs ±  29 μs
      partitionKeys - split set: OK
        227  μs ±  18 μs, 0.64x
      partitionKeys - split map: OK
        166  μs ± 6.1 μs, 0.47x
  Map 99,237
    Set 1
      restrictKeys+withoutKeys:  OK
        252  ns ±  22 ns
      partitionKeys - split set: OK
        189  ns ±  16 ns
      partitionKeys - split map: OK
        173  ns ±  12 ns
    Set 10
      restrictKeys+withoutKeys:  OK
        2.07 μs ± 131 ns
      partitionKeys - split set: OK
        1.52 μs ± 100 ns
      partitionKeys - split map: OK
        1.62 μs ±  87 ns
    Set 100
      restrictKeys+withoutKeys:  OK
        19.8 μs ± 1.4 μs
      partitionKeys - split set: OK
        15.1 μs ± 1.4 μs
      partitionKeys - split map: OK
        14.6 μs ± 871 ns
    Set 999
      restrictKeys+withoutKeys:  OK
        251  μs ±  24 μs
      partitionKeys - split set: OK
        186  μs ±  12 μs, 0.74x
      partitionKeys - split map: OK
        179  μs ± 5.5 μs, 0.71x
    Set 9,980
      restrictKeys+withoutKeys:  OK
        2.79 ms ± 133 μs
      partitionKeys - split set: OK
        1.85 ms ±  73 μs, 0.66x
      partitionKeys - split map: OK
        1.99 ms ±  70 μs, 0.71x
    Set 99,236
      restrictKeys+withoutKeys:  OK
        4.34 ms ± 421 μs
      partitionKeys - split set: OK
        2.80 ms ± 152 μs, 0.64x
      partitionKeys - split map: OK
        2.76 ms ±  69 μs, 0.64x
  Map 989,525
    Set 1
      restrictKeys+withoutKeys:  OK
        216  ns ±  10 ns
      partitionKeys - split set: OK
        164  ns ± 6.9 ns
      partitionKeys - split map: OK
        163  ns ± 9.3 ns
    Set 10
      restrictKeys+withoutKeys:  OK
        2.31 μs ± 209 ns
      partitionKeys - split set: OK
        1.92 μs ± 173 ns
      partitionKeys - split map: OK
        1.79 μs ±  89 ns
    Set 100
      restrictKeys+withoutKeys:  OK
        22.8 μs ± 1.5 μs
      partitionKeys - split set: OK
        18.6 μs ± 1.8 μs
      partitionKeys - split map: OK
        17.7 μs ± 1.4 μs
    Set 1,000
      restrictKeys+withoutKeys:  OK
        357  μs ±  28 μs
      partitionKeys - split set: OK
        274  μs ±  14 μs
      partitionKeys - split map: OK
        273  μs ±  20 μs
    Set 9,985
      restrictKeys+withoutKeys:  OK
        9.28 ms ± 417 μs
      partitionKeys - split set: OK
        6.00 ms ± 208 μs, 0.65x
      partitionKeys - split map: OK
        7.02 ms ± 212 μs, 0.76x
    Set 99,278
      restrictKeys+withoutKeys:  OK
        84.3 ms ± 6.2 ms
      partitionKeys - split set: OK
        62.1 ms ± 4.6 ms, 0.74x
      partitionKeys - split map: OK
        60.3 ms ± 4.5 ms, 0.71x
    Set 989,524
      restrictKeys+withoutKeys:  OK
        136  ms ±  12 ms
      partitionKeys - split set: OK
        89.1 ms ± 3.6 ms, 0.65x
      partitionKeys - split map: OK
        84.8 ms ± 8.3 ms, 0.62x
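The input construction described for the string-keyed benchmark above (N integers rendered as strings, with a prefix of length K used as the set) could be sketched roughly as follows. This is an assumption-laden illustration, not the code from the linked repository: a simple linear congruential generator stands in for whatever random source the real benchmark uses, and `lcg` and `mkInputs` are hypothetical names.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Deterministic stand-in for a random number generator (assumption).
lcg :: Int -> Int
lcg x = (x * 1103515245 + 12345) `mod` 2147483648

-- Build a map with up to n string keys and a set from the first k keys.
-- Duplicate keys collapse, so the map may end up smaller than n.
mkInputs :: Int -> Int -> (Map String Int, Set String)
mkInputs n k = (m, s)
  where
    keys = map show (take n (iterate lcg 42))
    m = Map.fromList (zip keys [0 ..])
    s = Set.fromList (take k keys)
```

With this construction every element of the set is also a key of the map, which is exactly the limitation noted above.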

I have also made a SetOperations-like benchmark within containers; it's at https://github.com/sergv/containers/tree/benchmark-partitionKeys (the previous benchmark should be run against this branch as well, because this is where partitionKeysSplitMap is defined).

The results are mixed but more distinct. Overall it looks like the map-splitting version may be better, but when I look closely at the error bounds they overlap, and the speedup is not so obvious any more.

All
  partitionKeys-block_nn:               OK
    317  μs ±  29 μs
  partitionKeys-block_nn_swap:          OK
    339  μs ±  16 μs
  partitionKeys-block_ns:               OK
    37.1 μs ± 3.3 μs
  partitionKeys-block_sn_swap:          OK
    39.5 μs ± 3.0 μs
  partitionKeys-common_nn:              OK
    4.25 ms ± 196 μs
  partitionKeys-common_nn_swap:         OK
    550  μs ±  50 μs
  partitionKeys-common_ns:              OK
    1.73 ms ± 167 μs
  partitionKeys-common_nt:              OK
    83.3 μs ± 6.1 μs
  partitionKeys-common_sn_swap:         OK
    721  μs ±  70 μs
  partitionKeys-common_tn_swap:         OK
    54.8 μs ± 2.8 μs
  partitionKeys-disj_nn:                OK
    2.47 μs ± 100 ns
  partitionKeys-disj_nn_swap:           OK
    2.89 μs ± 235 ns
  partitionKeys-disj_ns:                OK
    1.99 μs ± 171 ns
  partitionKeys-disj_nt:                OK
    1.14 μs ±  64 ns
  partitionKeys-disj_sn_swap:           OK
    2.06 μs ±  92 ns
  partitionKeys-disj_tn_swap:           OK
    1.34 μs ±  93 ns
  partitionKeys-mix_nn:                 OK
    4.00 ms ± 400 μs
  partitionKeys-mix_nn_swap:            OK
    4.32 ms ± 360 μs
  partitionKeys-mix_ns:                 OK
    967  μs ±  50 μs
  partitionKeys-mix_nt:                 OK
    62.5 μs ± 4.3 μs
  partitionKeys-mix_sn_swap:            OK
    994  μs ±  56 μs
  partitionKeys-mix_tn_swap:            OK
    67.9 μs ± 6.2 μs
  partitionKeysSplitMap-block_nn:       OK
    283  μs ±  16 μs, 0.89x
  partitionKeysSplitMap-block_nn_swap:  OK
    298  μs ±  22 μs, 0.88x
  partitionKeysSplitMap-block_ns:       OK
    30.0 μs ± 1.3 μs, 0.81x
  partitionKeysSplitMap-block_sn_swap:  OK
    28.3 μs ± 1.3 μs, 0.72x
  partitionKeysSplitMap-common_nn:      OK
    4.56 ms ± 352 μs, 1.07x
  partitionKeysSplitMap-common_nn_swap: OK
    428  μs ±  16 μs, 0.78x
  partitionKeysSplitMap-common_ns:      OK
    2.75 ms ±  54 μs, 1.59x
  partitionKeysSplitMap-common_nt:      OK
    85.4 μs ± 5.9 μs, 1.03x
  partitionKeysSplitMap-common_sn_swap: OK
    941  μs ±  58 μs, 1.31x
  partitionKeysSplitMap-common_tn_swap: OK
    55.1 μs ± 3.0 μs, 1.00x
  partitionKeysSplitMap-disj_nn:        OK
    2.03 μs ± 196 ns, 0.82x
  partitionKeysSplitMap-disj_nn_swap:   OK
    1.83 μs ± 166 ns, 0.63x
  partitionKeysSplitMap-disj_ns:        OK
    1.51 μs ± 103 ns, 0.76x
  partitionKeysSplitMap-disj_nt:        OK
    946  ns ±  52 ns, 0.83x
  partitionKeysSplitMap-disj_sn_swap:   OK
    1.51 μs ±  94 ns, 0.73x
  partitionKeysSplitMap-disj_tn_swap:   OK
    896  ns ±  43 ns, 0.67x
  partitionKeysSplitMap-mix_nn:         OK
    4.40 ms ± 199 μs, 1.10x
  partitionKeysSplitMap-mix_nn_swap:    OK
    3.50 ms ± 155 μs, 0.81x
  partitionKeysSplitMap-mix_ns:         OK
    832  μs ±  57 μs, 0.86x
  partitionKeysSplitMap-mix_nt:         OK
    67.4 μs ± 5.5 μs, 1.08x
  partitionKeysSplitMap-mix_sn_swap:    OK
    873  μs ±  82 μs, 0.88x
  partitionKeysSplitMap-mix_tn_swap:    OK
    57.1 μs ± 4.0 μs, 0.84x

All 44 tests passed (48.06s)

Contributor

Thanks! So I think we can conclude that one is not definitively better than the other.

containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers/src/Data/Map/Internal.hs Outdated Show resolved Hide resolved
containers-tests/benchmarks/Map.hs Outdated Show resolved Hide resolved
containers-tests/benchmarks/Map.hs Outdated Show resolved Hide resolved
containers-tests/benchmarks/Map.hs Outdated Show resolved Hide resolved
Contributor

@meooow25 meooow25 left a comment

LGTM, now it's up to @treeowl


5 participants