Define partitionKeys: fused version of restrictKeys and withoutKeys #975
@treeowl any chance to look at this please?
Yeah, I'll take a look. Sorry for the delay.
If this is added, it should also be added for IntMaps.
That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys). There are a lot of operations that could be fused together to be more efficient, but I don't think that alone warrants adding special functions for them. Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k `member` s) m, which is equally clear IMO, albeit maybe a bit slower.
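For reference, the two unfused formulations mentioned so far can be written against the public containers API. This is only a sketch: the names partitionKeysNaive and partitionKeysByMember are made up for illustration and are not part of containers or of this PR.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical name: the unfused baseline, which walks the inputs twice.
partitionKeysNaive :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
partitionKeysNaive m s = (m `Map.restrictKeys` s, m `Map.withoutKeys` s)

-- Hypothetical name: the alternative suggested above, which does one
-- Set.member lookup per key of the map.
partitionKeysByMember :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
partitionKeysByMember m s = Map.partitionWithKey (\k _ -> k `Set.member` s) m
```

Both should return the same pair of maps for any inputs; they differ only in how the work is done, which is what the benchmarks further down try to measure.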
containers/src/Data/Map/Internal.hs
Outdated
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Restrict a 'Map' to only those keys
-- found in a 'Set' Remove all keys in a 'Set' from a 'Map'.
Suggested change:
-- | \(O\bigl(m \log\bigl(\frac{n}{m}+1\bigr)\bigr), \; 0 < m \leq n\). Partition the map according to a set.
-- The first map contains the input 'Map' restricted to those keys found in the 'Set',
-- the second map contains the input 'Map' without all keys in the 'Set'.
-- This is more efficient than using 'restrictKeys' and 'withoutKeys' together.
containers/src/Data/Map/Internal.hs
Outdated
-- m \`partitionKeys\` s = (m ``restrictKeys`` s, m ``withoutKeys`` s)
-- @
--
-- @since 0.7
0.7 has already been released, so this needs to be updated.
I've updated the docs.
I agree that it would be nice to add If it's a hard requirement to have the same functions in
I see the attached benchmark as a sign that extending the API is worthwhile. One of the synthetic benchmarks shows a 40% speedup, which looks pretty good. Another (arguably equivalent) benchmark shows a 20% speedup. These numbers motivated me to open the PR, since I want those speedups in my programs. API growth is unfortunate, but what is the cost of using the slower version? It affects all users, and the runtime cost is paid every time their programs run. As for the many other operations that could be fused together, I don't think it's realistic to foresee them all and add them beforehand. There are many of them, and it's not clear whether anyone actually needs them. I'd advocate for a reactive approach like this PR: when someone finds a use case for fusing some operations and is motivated enough to implement it, it can be considered for inclusion.
It depends on your application, I use There is a general API pattern
I did benchmark
So far it doesn't look like
@treeowl just a gentle reminder to review.
I suppose you meant @treeowl I know that you are exceedingly busy, so I feel bad for being annoying, but I could benefit from a faster
Yes, I meant
containers/src/Data/Map/Internal.hs
Outdated
    !(lmWith :*: lmWithout) = go lm ls'
    !(rmWith :*: rmWithout) = go rm rs'

    !(!ls', b, !rs') = Set.splitMember k s
Curiously, restrictKeys splits the Set but withoutKeys splits the Map. Could you check which is faster in practice?
I tried your suggestion of splitting the map and was not able to measure a significant difference from the current version that splits sets. They're more or less on par with each other; I can share benchmarks if you're curious.
I like the version that splits sets a little more, because if we want to split maps then I need to adjust the splitMember :: Ord k => k -> Map k a -> (Map k a, Bool, Map k a) function. I'd need to lift the worker out of it, and having a Bool in there is not enough: a Maybe a is needed instead to define partitionKeys. But then the regular splitMember would need to convert Maybe to Bool, which it currently doesn't do. Since everything is INLINABLE, the Maybe may very well get allocated each time, which is strictly worse than what we currently have.
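To make the set-splitting strategy concrete, here is a rough sketch of the idea. It is not the code from this PR: the PR uses strict pairs (:*:) and a local worker, whereas this version uses ordinary lazy tuples for brevity, and the name partitionKeysSplitSet is hypothetical. It relies on Bin, Tip, link and link2 from Data.Map.Internal.

```haskell
import Data.Map (Map)
import qualified Data.Map.Internal as MI
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical name. Recurse over the map's tree; at each node, split the
-- set around the node's key and route the key to one of the two results.
partitionKeysSplitSet :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
partitionKeysSplitSet MI.Tip _ = (MI.Tip, MI.Tip)
partitionKeysSplitSet m s
  | Set.null s = (MI.Tip, m)  -- nothing to restrict to: everything is "without"
partitionKeysSplitSet (MI.Bin _ k x lm rm) s =
  let (ls, b, rs) = Set.splitMember k s
      (lWith, lWithout) = partitionKeysSplitSet lm ls
      (rWith, rWithout) = partitionKeysSplitSet rm rs
  in if b
       then (MI.link k x lWith rWith, MI.link2 lWithout rWithout)
       else (MI.link2 lWith rWith, MI.link k x lWithout rWithout)
```

Because the Set only reports membership, a Bool from Set.splitMember is enough here; the value x comes from the map node itself.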
> I can share benchmarks if you're curious.
Sure, please do!
You could also use the SetOperations setup to compare different pairs of inputs, though it will require some tweaks to work with two different types (map and set).
> ...then Maybe may very well get allocated each time which is strictly worse than what we currently have.
I don't think this necessarily indicates it would be worse. We're making O(log n) allocations with the left and right maps anyway.
Btw, splitMember already exists, separately implemented from splitLookup:
containers/containers/src/Data/Map/Internal.hs
Line 3922 in c651094
splitMember :: Ord k => k -> Map k a -> (Map k a,Bool,Map k a)
containers/containers/src/Data/Map/Internal.hs
Line 3898 in c651094
splitLookup :: Ord k => k -> Map k a -> (Map k a,Maybe a,Map k a)
Anyway, if benchmarks have shown no difference then the current implementation is fine.
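For comparison, the map-splitting alternative can be sketched on top of the existing splitLookup. This is an illustration only and may differ from the partitionKeysSplitMap that was actually benchmarked; the name partitionKeysViaSplitLookup is hypothetical, and lazy tuples are used instead of strict pairs.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import qualified Data.Map.Internal as MI
import Data.Set (Set)
import qualified Data.Set.Internal as SI

-- Hypothetical name. Recurse over the set's tree; at each node, split the
-- map around the set element and keep the looked-up value if it was present.
partitionKeysViaSplitLookup :: Ord k => Map k a -> Set k -> (Map k a, Map k a)
partitionKeysViaSplitLookup m SI.Tip = (Map.empty, m)
partitionKeysViaSplitLookup m (SI.Bin _ k ls rs)
  | Map.null m = (Map.empty, Map.empty)
  | otherwise = (with, without)
  where
    (lm, mx, rm) = Map.splitLookup k m
    (lWith, lWithout) = partitionKeysViaSplitLookup lm ls
    (rWith, rWithout) = partitionKeysViaSplitLookup rm rs
    -- k is in the set, so if the map had it, it goes to the "with" side.
    with = case mx of
      Just x -> MI.link k x lWith rWith
      Nothing -> MI.link2 lWith rWith
    without = MI.link2 lWithout rWithout
```

This shows why a Maybe a is needed when splitting the map: the value for k has to be recovered from the split, whereas the set-splitting version already has it in the map node.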
Okay, so here are my original benchmarks: https://github.com/sergv/benchmark-containers. They generate N random integers and turn them into strings; a prefix (K < N) of the list of random integers serves as the set to split on. N and K are in the benchmark names. My original use case has string keys, which is why I'm using them here.
Arguably this benchmark is not fully representative, because it doesn't check what happens when the set we're splitting on contains entries not present in the map keys.
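The input construction described above might look roughly like the following. This is a guess at the shape of the setup, not the code from the linked repository: mkInputs and lcg are hypothetical names, and a fixed-seed linear congruential generator stands in for whatever random source the real benchmark uses.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical stand-in for a random source: a simple LCG step.
lcg :: Int -> Int
lcg x = (x * 1103515245 + 12345) `mod` 2147483648

-- Hypothetical helper: n pseudo-random integers rendered as strings become
-- the map keys; the first k of them become the set to split on.
mkInputs :: Int -> Int -> (Map String Int, Set String)
mkInputs n k = (Map.fromList (zip keys [0 ..]), Set.fromList (take k keys))
  where
    keys = map show (take n (iterate lcg 42))
```

With this construction every element of the set is also a key of the map, which is exactly the limitation noted above.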
Here's the output, without trivial benchmarks taken from original containers.
All
Map 1
Set 0
restrictKeys+withoutKeys: OK
6.86 ns ± 460 ps
partitionKeys - split set: OK
5.04 ns ± 376 ps, 0.73x
partitionKeys - split map: OK
5.05 ns ± 340 ps, 0.74x
Map 10
Set 1
restrictKeys+withoutKeys: OK
78.8 ns ± 5.2 ns
partitionKeys - split set: OK
61.9 ns ± 2.7 ns
partitionKeys - split map: OK
68.1 ns ± 6.3 ns
Set 9
restrictKeys+withoutKeys: OK
244 ns ± 21 ns
partitionKeys - split set: OK
157 ns ± 13 ns, 0.64x
partitionKeys - split map: OK
156 ns ± 10 ns, 0.64x
Map 100
Set 1
restrictKeys+withoutKeys: OK
98.4 ns ± 9.1 ns
partitionKeys - split set: OK
79.8 ns ± 5.3 ns
partitionKeys - split map: OK
75.8 ns ± 5.7 ns
Set 10
restrictKeys+withoutKeys: OK
1.02 μs ± 73 ns
partitionKeys - split set: OK
683 ns ± 60 ns
partitionKeys - split map: OK
677 ns ± 53 ns
Set 99
restrictKeys+withoutKeys: OK
2.21 μs ± 136 ns
partitionKeys - split set: OK
1.44 μs ± 124 ns, 0.65x
partitionKeys - split map: OK
1.14 μs ± 87 ns, 0.52x
Map 999
Set 1
restrictKeys+withoutKeys: OK
152 ns ± 12 ns
partitionKeys - split set: OK
116 ns ± 6.0 ns
partitionKeys - split map: OK
109 ns ± 6.4 ns
Set 10
restrictKeys+withoutKeys: OK
1.66 μs ± 85 ns
partitionKeys - split set: OK
1.23 μs ± 86 ns
partitionKeys - split map: OK
1.06 μs ± 95 ns
Set 100
restrictKeys+withoutKeys: OK
9.31 μs ± 701 ns
partitionKeys - split set: OK
6.96 μs ± 677 ns
partitionKeys - split map: OK
6.36 μs ± 356 ns
Set 998
restrictKeys+withoutKeys: OK
23.8 μs ± 1.3 μs
partitionKeys - split set: OK
14.9 μs ± 1.3 μs, 0.63x
partitionKeys - split map: OK
13.5 μs ± 879 ns, 0.57x
Map 9,988
Set 1
restrictKeys+withoutKeys: OK
167 ns ± 11 ns
partitionKeys - split set: OK
119 ns ± 12 ns
partitionKeys - split map: OK
135 ns ± 11 ns
Set 10
restrictKeys+withoutKeys: OK
1.21 μs ± 49 ns
partitionKeys - split set: OK
1.04 μs ± 89 ns
partitionKeys - split map: OK
936 ns ± 61 ns
Set 100
restrictKeys+withoutKeys: OK
14.4 μs ± 865 ns
partitionKeys - split set: OK
10.7 μs ± 807 ns
partitionKeys - split map: OK
10.2 μs ± 698 ns
Set 1,000
restrictKeys+withoutKeys: OK
131 μs ± 13 μs
partitionKeys - split set: OK
98.4 μs ± 6.0 μs
partitionKeys - split map: OK
94.3 μs ± 7.6 μs
Set 9,987
restrictKeys+withoutKeys: OK
352 μs ± 29 μs
partitionKeys - split set: OK
227 μs ± 18 μs, 0.64x
partitionKeys - split map: OK
166 μs ± 6.1 μs, 0.47x
Map 99,237
Set 1
restrictKeys+withoutKeys: OK
252 ns ± 22 ns
partitionKeys - split set: OK
189 ns ± 16 ns
partitionKeys - split map: OK
173 ns ± 12 ns
Set 10
restrictKeys+withoutKeys: OK
2.07 μs ± 131 ns
partitionKeys - split set: OK
1.52 μs ± 100 ns
partitionKeys - split map: OK
1.62 μs ± 87 ns
Set 100
restrictKeys+withoutKeys: OK
19.8 μs ± 1.4 μs
partitionKeys - split set: OK
15.1 μs ± 1.4 μs
partitionKeys - split map: OK
14.6 μs ± 871 ns
Set 999
restrictKeys+withoutKeys: OK
251 μs ± 24 μs
partitionKeys - split set: OK
186 μs ± 12 μs, 0.74x
partitionKeys - split map: OK
179 μs ± 5.5 μs, 0.71x
Set 9,980
restrictKeys+withoutKeys: OK
2.79 ms ± 133 μs
partitionKeys - split set: OK
1.85 ms ± 73 μs, 0.66x
partitionKeys - split map: OK
1.99 ms ± 70 μs, 0.71x
Set 99,236
restrictKeys+withoutKeys: OK
4.34 ms ± 421 μs
partitionKeys - split set: OK
2.80 ms ± 152 μs, 0.64x
partitionKeys - split map: OK
2.76 ms ± 69 μs, 0.64x
Map 989,525
Set 1
restrictKeys+withoutKeys: OK
216 ns ± 10 ns
partitionKeys - split set: OK
164 ns ± 6.9 ns
partitionKeys - split map: OK
163 ns ± 9.3 ns
Set 10
restrictKeys+withoutKeys: OK
2.31 μs ± 209 ns
partitionKeys - split set: OK
1.92 μs ± 173 ns
partitionKeys - split map: OK
1.79 μs ± 89 ns
Set 100
restrictKeys+withoutKeys: OK
22.8 μs ± 1.5 μs
partitionKeys - split set: OK
18.6 μs ± 1.8 μs
partitionKeys - split map: OK
17.7 μs ± 1.4 μs
Set 1,000
restrictKeys+withoutKeys: OK
357 μs ± 28 μs
partitionKeys - split set: OK
274 μs ± 14 μs
partitionKeys - split map: OK
273 μs ± 20 μs
Set 9,985
restrictKeys+withoutKeys: OK
9.28 ms ± 417 μs
partitionKeys - split set: OK
6.00 ms ± 208 μs, 0.65x
partitionKeys - split map: OK
7.02 ms ± 212 μs, 0.76x
Set 99,278
restrictKeys+withoutKeys: OK
84.3 ms ± 6.2 ms
partitionKeys - split set: OK
62.1 ms ± 4.6 ms, 0.74x
partitionKeys - split map: OK
60.3 ms ± 4.5 ms, 0.71x
Set 989,524
restrictKeys+withoutKeys: OK
136 ms ± 12 ms
partitionKeys - split set: OK
89.1 ms ± 3.6 ms, 0.65x
partitionKeys - split map: OK
84.8 ms ± 8.3 ms, 0.62x
I have also made a SetOperations-like benchmark within containers; it's at https://github.com/sergv/containers/tree/benchmark-partitionKeys (the previous benchmark should be run against this branch as well, because this is where partitionKeysSplitMap is defined).
The results are mixed but more distinct. Overall it looks like the map-splitting version may be better, but when I look closely at the error bounds they overlap, and the speedup is not so obvious any more.
All
partitionKeys-block_nn: OK
317 μs ± 29 μs
partitionKeys-block_nn_swap: OK
339 μs ± 16 μs
partitionKeys-block_ns: OK
37.1 μs ± 3.3 μs
partitionKeys-block_sn_swap: OK
39.5 μs ± 3.0 μs
partitionKeys-common_nn: OK
4.25 ms ± 196 μs
partitionKeys-common_nn_swap: OK
550 μs ± 50 μs
partitionKeys-common_ns: OK
1.73 ms ± 167 μs
partitionKeys-common_nt: OK
83.3 μs ± 6.1 μs
partitionKeys-common_sn_swap: OK
721 μs ± 70 μs
partitionKeys-common_tn_swap: OK
54.8 μs ± 2.8 μs
partitionKeys-disj_nn: OK
2.47 μs ± 100 ns
partitionKeys-disj_nn_swap: OK
2.89 μs ± 235 ns
partitionKeys-disj_ns: OK
1.99 μs ± 171 ns
partitionKeys-disj_nt: OK
1.14 μs ± 64 ns
partitionKeys-disj_sn_swap: OK
2.06 μs ± 92 ns
partitionKeys-disj_tn_swap: OK
1.34 μs ± 93 ns
partitionKeys-mix_nn: OK
4.00 ms ± 400 μs
partitionKeys-mix_nn_swap: OK
4.32 ms ± 360 μs
partitionKeys-mix_ns: OK
967 μs ± 50 μs
partitionKeys-mix_nt: OK
62.5 μs ± 4.3 μs
partitionKeys-mix_sn_swap: OK
994 μs ± 56 μs
partitionKeys-mix_tn_swap: OK
67.9 μs ± 6.2 μs
partitionKeysSplitMap-block_nn: OK
283 μs ± 16 μs, 0.89x
partitionKeysSplitMap-block_nn_swap: OK
298 μs ± 22 μs, 0.88x
partitionKeysSplitMap-block_ns: OK
30.0 μs ± 1.3 μs, 0.81x
partitionKeysSplitMap-block_sn_swap: OK
28.3 μs ± 1.3 μs, 0.72x
partitionKeysSplitMap-common_nn: OK
4.56 ms ± 352 μs, 1.07x
partitionKeysSplitMap-common_nn_swap: OK
428 μs ± 16 μs, 0.78x
partitionKeysSplitMap-common_ns: OK
2.75 ms ± 54 μs, 1.59x
partitionKeysSplitMap-common_nt: OK
85.4 μs ± 5.9 μs, 1.03x
partitionKeysSplitMap-common_sn_swap: OK
941 μs ± 58 μs, 1.31x
partitionKeysSplitMap-common_tn_swap: OK
55.1 μs ± 3.0 μs, 1.00x
partitionKeysSplitMap-disj_nn: OK
2.03 μs ± 196 ns, 0.82x
partitionKeysSplitMap-disj_nn_swap: OK
1.83 μs ± 166 ns, 0.63x
partitionKeysSplitMap-disj_ns: OK
1.51 μs ± 103 ns, 0.76x
partitionKeysSplitMap-disj_nt: OK
946 ns ± 52 ns, 0.83x
partitionKeysSplitMap-disj_sn_swap: OK
1.51 μs ± 94 ns, 0.73x
partitionKeysSplitMap-disj_tn_swap: OK
896 ns ± 43 ns, 0.67x
partitionKeysSplitMap-mix_nn: OK
4.40 ms ± 199 μs, 1.10x
partitionKeysSplitMap-mix_nn_swap: OK
3.50 ms ± 155 μs, 0.81x
partitionKeysSplitMap-mix_ns: OK
832 μs ± 57 μs, 0.86x
partitionKeysSplitMap-mix_nt: OK
67.4 μs ± 5.5 μs, 1.08x
partitionKeysSplitMap-mix_sn_swap: OK
873 μs ± 82 μs, 0.88x
partitionKeysSplitMap-mix_tn_swap: OK
57.1 μs ± 4.0 μs, 0.84x
All 44 tests passed (48.06s)
Thanks! So I think we can conclude that one is not definitively better than the other.
LGTM, now it's up to @treeowl
As mentioned in #158, sometimes we'd like to get results from both restrictKeys and withoutKeys for the same map and set. It can be done more efficiently by fusing traversals.
I named the new function partitionKeys instead of partitionSet because the originals it's fusing end in *Keys, so I believe this is more consistent.
Benchmarks show that the new version is around 20-40% faster, depending on inputs. The numbers come from a run with a locally modified containers benchmarking suite that measures with even and odd keys floating around (for some reason odd keys show more speedup, hence that's what I committed).
In the process of checking the generated Core I noticed that splitMember gets called with an explicit Ord dictionary, so I changed it a bit so that it would specialize. I've only checked Core on 9.6.2 though.