Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved behavior for symmetric difference #911

Merged
merged 3 commits into from Nov 7, 2016
Merged

Improved behavior for symmetric difference #911

merged 3 commits into from Nov 7, 2016

Conversation

ab5tract
Copy link
Collaborator

@ab5tract ab5tract commented Nov 2, 2016

Previously the symmetric difference operator would only return a Set. This is in contrast to some fudged tests and (likely) against expectations of DWIMiness by the user.

This patch changes the behavior such that chained symmetric difference invocations will evaluate one pair at a time. This changes the expectations found in the S03-operators/set.t tests, which appeared to be doing some form of conjoined evaluation.

Instead, the results look like:

$s (^) $s (^) $s == $s; # returns True

Step by step:

$s (^) $s; # returns set(), there were zero items appearing in only one of the two sets
set() (^) $s; # returns $s, because all items in $s are missing from the empty set

This also means that Mix and Bag symmetric difference operations can be sensibly chained:

my $m = ( :x(4.4), :y(5.5), :z(6.6) ).Mix;
my $b = ( :x(4), :y(5), :z(6) ).Bag;
$m (^) $b; # returns mix( :x(0.4), :y(0.5), :z(0.6) )
$m (^) $b (^) $b;  # returns mix( :x(3.6), :y(4.5), :z(5.4) )

my $s = ( :x, :y, :z ).Set;
$m (^) $b (^) $b (^) $s;  # returns mix( :x(2.6), :y(3.5), :z(4.4) )

Symmetric difference is not actually symmetric when chained (order matters). This patch takes a "higher orders of complexity have priority" approach. You could also argue that it is Set which should have precedence over Mix or Bag, such that:

$m (^) $b (^) $b (^) $s;  # returns set()

All the keys in $s are present in the symmetric difference of $m (^) $b (^) $b, so the result is an empty set.

In my opinion we lose a lot of the utility of (^) if we take this set-centric approach, but I felt it worth mentioning in case someone has strong opinions in the other direction.

@ab5tract
Copy link
Collaborator Author

ab5tract commented Nov 2, 2016

Tests are available in this PR Raku/roast#179

@ab5tract
Copy link
Collaborator Author

ab5tract commented Nov 2, 2016

Oh, also, I had taken the approach to carefully only coerce those elements which needed coercing, but somehow this broke the behavior of Mix and Bag in unexpected ways.

That is probably a separate bug in one of the other set operators referenced in (^), but it is a separate issue in my opinion. Once it is resolved, we can improve the performance slightly.

@zoffixznet
Copy link
Contributor

zoffixznet commented Nov 2, 2016

I'm not sure I like this behaviour, considering the symmetric set difference is a proper mathematical operation with some defined properties[^1], so the current behaviour of coercing the invocants to Sets makes perfect sense to me.

From what I see, this PR maintains those properties for Sets, but not for Bags and Mixes. For the latter, the symmetric set difference operator now is a weights subtraction operator, so we're implementing a secondary behaviour for a well-defined mathematical operator.

I'm also spotting some quirky behaviour, which I'm unsure whether it's intentional or just missed edge-case handling:

$ perl6 -e 'my $a = (:20a).Bag; my $b = (:s(-30)).Mix; say $b (^) $a'
Current behaviour: set(a, s)
This PR's behaviour: mix(a(20))  # where did the `s` go?
$ perl6 -e 'my $a = (:a(-20)).Mix; my $b = (:s(-30)).Mix; say $b (^) $a'
Current behaviour: set(a, s)
This PR's behaviour: mix()  # where are the keys? Just a failure to consider negatives?
$ perl6 -e 'my $a = (:a(-0.1)).Mix; my $b = (:a(0.1)).Mix; say $a (^) $b'
Current behaviour: set()
This PR's behaviour: mix(a(0.2)) # this kinda just feels weird that two weights of equal 
                            # magnitude result in positive weight of twice the magnitude

Lastly, with the two different behaviours this PR overloads the sym. set dif. operator, we get whacky results when one of the elements is a set. For example, in the case below, instead of operating on the elements of the mix and the elements of the set, the elements of the set get (basically invented) weights assigned to it, and the difference of those weights is returned:

perl6 -e 'my $a = (:a).Set; my $b = (:a(0.1)).Mix; say $a (^) $b'
Current behaviour: set()
This PR's behaviour: mix(a(0.9))

So my initial opinion of this PR is a small 👎, based on principle of uniformity and least surprise. Rather than doing a lot of different things and doing this weight manipulation business, the original behaviour of coercing to Sets is straightforward and matches known principles of mathematics.

[1] I did spot a quirk in our, code, but I'd argue the bag result in that case is a bug.

@zoffixznet
Copy link
Contributor

zoffixznet commented Nov 2, 2016

Comments from TimToady:

I very much like that the set behavior is a degenerate case of the bag behavior; what I do not like is any approach that autocoerces bags or mixes to sets, which is pretty non-sensical, so I think such coercions should explicitly pull out the keys and make a set, if that's really what is wanted, and fail otherwise

Which I take to mean this PR is fine. @ab5tract would you be able to fix the negatives, though? From what I understand, under the new behaviour, my $a = (:a(-20)).Mix; my $b = (:s(-30)).Mix; say $b (^) $a would give a mix with :a(20), :s(30) in it. Now that results in an empty mix.

@zoffixznet zoffixznet merged commit 40704b8 into rakudo:nom Nov 7, 2016
@zoffixznet
Copy link
Contributor

The negatives seem to be an issue with ∪ and not to do with this PR.

Merged. Thanks.

zoffixznet added a commit to Raku/roast that referenced this pull request Nov 7, 2016
zoffixznet added a commit that referenced this pull request Nov 7, 2016
@zoffixznet
Copy link
Contributor

Oops... That didn't go too well and I ended up reverting both this PR and roast changes.

I totally missed that this changed test was not a fudged test.

The particular issue is that test exists in 6.c-errata. As we make a promise to our users that a 6.c compiler will pass all of those tests, we cannot introduce changes that break those tests.

You can consider proposing these changes in the v6.d document. But note the tests that broke refer to RT#1311476 where @pmichaud does say that the operator's current behaviour is the desired result.

@ab5tract
Copy link
Collaborator Author

ab5tract commented Nov 7, 2016

When I was hacking the branch, I was thinking against @pmichaud 's interpretation so that there would be a homogeneity of behavior between Set and Mix/Bag variants. Looking back at it I can see that I myself would prefer the behavior he suggests for Set.

Phrased differently, I think the sensible behavior for n args to (^) is indeed defined as "occurring in exactly n-1 Sets in the list of Sets".

That then raises a question:

We could potentially apply a similar logic to the Mix and Bag variants: only keys with matching weights in n-1 Sets will exist in the returning object. Mixes and Bags are then mainly incompatible except where a Mix contains integers as weights.

Part of me prefers that, to be honest. Maybe the behavior in this PR could then be reproduced in a different "set" (really more of a mix/bag) operator?

@zoffixznet
Copy link
Contributor

zoffixznet commented Nov 7, 2016

TimToady actually convinced me of the viewpoint that a Set is a degenerate Bag/Mix and I think what this PR did, to subtract lower weight from higher weight, is correct (and anyone who wants a Setty behaviour can just coerce their Bag/Mix to sets first).

So the new behaviour would be as this PR implemented: that perl6 -e 'my $a = (:a).Set; my $b = (:a(0.1)).Mix; say $a (^) $b' gives mix(a(0.9)) instead of the current set(), because Set, being a degenerate case of Bag/Mix has weight of 1 and the subtraction gives 0.9.

The only thing different from this PR that is needed is to maintain the behaviour for n-args, as far as what/how the args get excluded/subtracted, etc (no idea if that's even compatible or not).

(P.S: I already fixed the issue with negative weights being ignored by ∪ operator).

@ab5tract
Copy link
Collaborator Author

ab5tract commented Nov 7, 2016

Okay, that should be fairly straightforward. Thanks for fixing ∪ , and the shout out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants