Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add general merge API #226

Open
treeowl opened this issue Feb 12, 2019 · 21 comments
Open

Add general merge API #226

treeowl opened this issue Feb 12, 2019 · 21 comments

Comments

@treeowl
Copy link
Collaborator

treeowl commented Feb 12, 2019

Data.Map and Data.IntMap offer merge and mergeA operations that can take unions, intersections, differences, etc. We should be able to do something quite similar here. I suspect optimal merge will have to be a bit different from mergeA rather than just a specialization, thanks to the way arrays work. Also, as I've explored in a slightly different context, mergeA can probably be implemented particularly efficiently for certain transformer stacks on top of IO or ST s, again thanks to the mutable array business. Dunno if that sort of thing is worth bothering with.

@sjakobi
Copy link
Member

sjakobi commented Jun 20, 2020

This addition would help us address the feature requests #240 and #242.

I guess implementing merge and testing it against the Data.Map version would be a good first step!?

@sjakobi
Copy link
Member

sjakobi commented Jun 20, 2020

This should be the right type, no?

merge :: (Hashable k, Eq k)
             => SimpleWhenMissing k a c -- ^ What to do with keys in @m1@ but not @m2@
             -> SimpleWhenMissing k b c -- ^ What to do with keys in @m2@ but not @m1@
             -> SimpleWhenMatched k a b c -- ^ What to do with keys in both @m1@ and @m2@
             -> HashMap k a
             -> HashMap k b
             -> HashMap k c

Could the merge strategies ([Simple]WhenMissing etc) simply be copied from http://hackage.haskell.org/package/containers-0.6.2.1/docs/Data-Map-Merge-Lazy.html?

@treeowl
Copy link
Collaborator Author

treeowl commented Jun 20, 2020

That's the general idea. You'll probably want to look at the Data.IntMap version too. The hard part is figuring out how mergeA should work. The rest is much easier.

@sjakobi
Copy link
Member

sjakobi commented Jun 20, 2020

@BebeSparkelSparkel would you possibly be interested in working on this?

@BebeSparkelSparkel
Copy link

BebeSparkelSparkel commented Jun 20, 2020

I am concerned about the the performance of foldr (merge f0 f1 f2) mempty (see below) in my use case #240 . It looks like it would create a lot of maps during the fold, but maybe I'm wrong about that.

@sjakobi
Copy link
Member

sjakobi commented Jun 20, 2020

@BebeSparkelSparkel Note that unions is also simply defined as foldl' union empty.

Do you have a more efficient implementation in mind?

@treeowl
Copy link
Collaborator Author

treeowl commented Jun 20, 2020

I doubt a much more generally efficient implementation of unions is possible. It'll either be foldl' union empty or foldr' union empty, depending on the Foldable.

@BebeSparkelSparkel
Copy link

BebeSparkelSparkel commented Jun 21, 2020

I was mistaken. foldr (merge f0 f1 f2) mempty is incorrect. What I actually need is more like fmap f0 . foldr (merge f1 f2 f3) mempty if using merge.

#240 details Foldable f => (NonEmpty a -> b) -> f (HashMap k a) -> HashMap k b has the merge and map while only creating the resulting HashMap.

It seems to me like #226 and #240 are not addressing the same problem.

@sjakobi
Copy link
Member

sjakobi commented Jun 21, 2020

I think I'll give merge a spin myself. It looks like a nice exercise for me to better understand the internals.

@svenkeidel
Copy link
Contributor

@sjakobi, any progress on this? I could help. In Sturdy we need a mergeA function to implement efficient widening operators.

@sjakobi
Copy link
Member

sjakobi commented Jul 17, 2020

@svenkeidel I have started working on merge in https://github.com/haskell-unordered-containers/unordered-containers/compare/sjakobi/226-merge, but realized that it's a bit much for a first project for someone who isn't really familiar with the internals.

I'd be very happy if you would take this over and implement mergeA and/or merge!

I'm curious about your usecase though! Does the order in which we sequence effects matter? I wasn't sure whether we could do anything better than keeping it completely arbitrary and unspecified.

@sjakobi sjakobi removed their assignment Jul 17, 2020
@svenkeidel
Copy link
Contributor

svenkeidel commented Jul 17, 2020

I'm curious about your usecase though! Does the order in which we sequence effects matter? I wasn't sure whether we could do anything better than keeping it completely arbitrary and unspecified.

I need to combine two maps, while simultaneously finding out if a key in one of the maps was missing or one of the values in the resulting map has changed (https://github.com/svenkeidel/sturdy/blob/master/lib/src/Data/Abstract/Map.hs#L57-L67). mergeA is the perfect fit for that. The order of effects does not matter.

More generally, I think the order of map entries should not matter since hashmaps are inherently unordered, hence the name of the library. If users rely on a predictable order of entries within the map, they have to use a different datastructure.

@treeowl
Copy link
Collaborator Author

treeowl commented Apr 12, 2022

@oberblastmeister, how would you like to tackle this challenge while merging mechanics are still fresh in your mind?

@oberblastmeister
Copy link
Contributor

@treeowl yes I would like to do this, but maybe after I do difference?

@treeowl
Copy link
Collaborator Author

treeowl commented Apr 12, 2022

If you think that'll help you prepare! Another option would be to use mergeA to implement difference.

@oberblastmeister
Copy link
Contributor

Yeah I think I'll use mergeA to implement difference. Though probably merge, because applicatives are bad on arrays right?

@sjakobi
Copy link
Member

sjakobi commented Apr 13, 2022

I'd feel more comfortable about adding functions like mergeA once we have tests for the internal invariants (#366).

@oberblastmeister
Copy link
Contributor

Are we planning to make merge public, or only use it to implement functions like intersection, differrence, etc. Looking at containers merge seems to be internal.

@sjakobi
Copy link
Member

sjakobi commented Apr 13, 2022

merge should be public, as it is in containers. See e.g. https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Map-Merge-Strict.html#v:merge.

@oberblastmeister
Copy link
Contributor

Woops, I must have clicked the wrong one

@sjakobi
Copy link
Member

sjakobi commented May 6, 2022

I'd feel more comfortable about adding functions like mergeA once we have tests for the internal invariants (#366).

This is resolved BTW: basic tooling for invariant checking was added in #444. More tests are being added in #455.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants