Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support symmetric diff on maps and sets #768

Open
dsyme opened this issue Jul 8, 2019 · 15 comments

Comments

@dsyme
Copy link
Collaborator

commented Jul 8, 2019

From fsprojects/Fabulous#258 (comment)

Watching Yaron Minsky's Strangeloop talk, I notice that F# immutable maps should likely support a "symmetric diff" operation with signature such as

val symmetricDiff: Map<'K,'V> -> Map<'K,'V> -> ('V -> 'V -> bool) -> seq<'K * Choice<'V, 'V, ('V * 'V)>>

The idea here is that the output sequence contains only elements where the two maps differ, returning the key and a choice between "left map only", "right map only" and "both but different values". Identical elements are not returned.

Sets should also support this with signature like:

val symmetricDiff: Set<'T> -> Set<'T> -> seq<'T * bool>

where the bool indicates "left set only" or "right set only" and again identical elements are not returned. In both cases pointer equality n the internal trees for the maps/sets is used to avoid re-traversing entire data structures.

The talk also mentions an incrementalizing diff operation that relies on a particular incrementalization library for self-adjusting computations.

Notes from the OCaml docs:

val symmetric_diff : ('k, 'v, 'cmp) t ->
       ('k, 'v, 'cmp) t ->
       data_equal:('v -> 'v -> bool) ->
       ('k * [ `Left of 'v | `Right of 'v | `Unequal of 'v * 'v ]) list

symmetric_diff t1 t2 ~data_equal returns a list of changes between t1 and t2. It is intended to be efficient in the case where t1 and t2 share a large amount of structure.

As an aside, I also notice Map.merge:

val merge : ('k, 'v1, 'cmp) t ->
       ('k, 'v2, 'cmp) t ->
       f:(key:'k ->
          [ `Both of 'v1 * 'v2 | `Left of 'v1 | `Right of 'v2 ] -> 'v3 option) ->
       ('k, 'v3, 'cmp) t

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • [ x This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this
@dsyme

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 8, 2019

There's a rough outline of where the code would have to go here: https://github.com/dotnet/fsharp/compare/master...dsyme:sd?expand=1

I started looking at the implementation but I notice that the implementation for set comparison looks like it allocates a lot. I'm wondering if someone would like to take a look at implementing a much more efficient set/map difference and comparison, which allocates a lot less.

@dsyme

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 8, 2019

I updated the prototype for set diff'ing to use an imperative 'while' loop - I haven't tested it yet, nor performance-tested it

@dsyme

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 8, 2019

I suspect for maximum performance (and lowest allocation) the diffing should probably use this signature:

        val symmetricDiffWith: set1:Set<'T> -> set2:Set<'T> -> consumer: ('T -> bool -> unit) -> unit

rather than returning a sequence which is then enumerated.

@thinkbeforecoding

This comment has been minimized.

Copy link

commented Jul 9, 2019

I was thinking of using

seq<'K * Choice<'V, ('V * 'V), 'V,>>

for symmetricDiff.symmetricdiff for a better symmetry.. This way,
we have Left, Unequal or Right ...

For sets, seq<'T, bool> and seq<Choice2<'T,'T>> are isomorphic, but using a boolean doesn't seem the best option... Which side represent true and false ? We will be googling for doc each time .😬

@thinkbeforecoding

This comment has been minimized.

Copy link

commented Jul 9, 2019

What about a symmetricDiffFold ?

val symmetricDiffFold: set1:Set<'T> -> set2:Set<'T> -> consumer: ('acc -> Choice2<'T,'T> -> 'acc ) -> seed: 'acc -> 'acc 
@Tarmil

This comment has been minimized.

Copy link

commented Jul 9, 2019

I was going to propose the same as @thinkbeforecoding regarding sets and seq<Choice2<'T, 'T>>. But if we do this, then maybe it's more consistent to keep the Map version as Choice<'V, 'V, ('V * 'V)> rather than make the type parameters symmetric? This way it would be Choice1OfX for the first source and Choice2OfX for the second source for both Map and Set.

@yatli

This comment has been minimized.

Copy link

commented Jul 9, 2019

consumer: ('T -> bool -> unit)

I suggest consumer: ('bool-> 'T -> unit), where two T->unit can be multiplexed.

edit:

and thus, if we're also doing fold, make consumer: ( 'a1 -> 'a2 -> bool -> 'T -> 'a1*'a2)

Which side represent true and false

Perhaps stick to partition?

@dsyme

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 9, 2019

val symmetricDiffFold: set1:Set<'T> -> set2:Set<'T> -> consumer: ('acc -> Choice2<'T,'T> -> 'acc ) -> seed: 'acc -> 'acc

I guess this is why returning seq<...> is sensible. I'd like to check performance difference between these cases.

@dsyme

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 9, 2019

BTW one annoying problem with Choice is that it is an allocation per element of the map....

@thinkbeforecoding

This comment has been minimized.

Copy link

commented Jul 9, 2019

Isn't it the case with tuples too ?

@thinkbeforecoding

This comment has been minimized.

Copy link

commented Jul 9, 2019

What about

[<Struct>]
type Side<'t> = Left of 't | Right of 't

?

@dsyme

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 9, 2019

Isn't it the case with tuples too ?

Yes, though we could make it a struct tuple. FSharp.Core doesn't (yet) contain StructChoice unfortunately

@Tarmil

This comment has been minimized.

Copy link

commented Jul 16, 2019

Yes, though we could make it a struct tuple. FSharp.Core doesn't (yet) contain StructChoice unfortunately

It would be odd to have one stdlib function use struct tuples in its signature while the rest of the stdlib uses ref tuples. Maybe it's time to add struct-based variants of all the relevant stdlib functions (eg zip, partition, etc) in dedicated modules?

@krauthaufen

This comment has been minimized.

Copy link

commented Jul 18, 2019

Hi everyone, quite some time ago we copied the map implementation from F# and extended it with some rather useful functions like choose2 : ('k -> Option<'a> -> Option<'b> -> Option<'c>) -> Map<'k,'a> -> Map<'k, 'b> -> Map<'k, 'c> which can easily be used to get the symmetric diff as above. These combinators are heavily tested in practice, but as far as i remember there a very limited unit-tests.

https://github.com/aardvark-platform/aardvark.base/blob/master/src/Aardvark.Base.FSharp/Datastructures/Immutable/MapExt.fs#L724

Feel free to take whatever you want 😀

Another thing is that the count of the map can easily be maintained (and is in our fork)

Cheers

@krauthaufen

This comment has been minimized.

Copy link

commented Jul 18, 2019

Forgot to mention: I'd be happy to migrate some of our changes back, and especially alter/choose2/unionWith would be nice to have.

Just let me know it you're willing to give it a try and I'll put together a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.