Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Distinct with custom comparer (hash bug) #1078

Merged
merged 4 commits into from Nov 16, 2022

Conversation

StefanBertels
Copy link
Contributor

There is a bug in .Distinct() when a custom comparer is used. PR includes test and fix.

Bug is that default GetHashCode is used -- this is not valid with a custom (key) comparer.

First both commits only fix one occurence of the bug (there are more). Some things come to my mind (before adding more tests/fixes):

  1. IMHO there should be a Distinct<EQ, T, K>(...) variant which I would prefer for my use case (avoiding runtime comparer).
  2. Maybe avoid 'func parameter' style Func<string,string,bool> for helper functions like this (comparer). This would avoid hiding a gethashcode "problem" (either needs to be supplied or otherwise risks performance penalty). Should be enough to allow a variant with IEqualityComparer parameter and make something like EqCompare<T> public to allow on-the-fly creation of Comparers using equals+gethashcode.
  3. Do you mind merging Eq.ToEqualityComparer (Ord.ToComparer) #841 (which is related here / giving wrapper EQ => IEqualityComparer)? This would be a (semi) replacement for (1.)

PS: In case you like to stay with 'comparer func parameter': Maybe code (and bugfix) should be improved by moving IfNone and Match outside to avoid calling this within each call of of the comparer. Maybe comparer the should be non-optional (adding another signature without comparer) to clean up code.

@@ -476,7 +476,7 @@ public static class SeqExtensions
/// <returns>A new sequence with all duplicate values removed</returns>
[Pure]
public static Seq<T> Distinct<T, K>(this Seq<T> list, Func<T, K> keySelector, Option<Func<K, K, bool>> compare = default(Option<Func<K, K, bool>>)) =>
toSeq(Enumerable.Distinct(list, new EqCompare<T>((a, b) => compare.IfNone(default(EqDefault<K>).Equals)(keySelector(a), keySelector(b)), a => keySelector(a)?.GetHashCode() ?? 0)));
toSeq(Enumerable.Distinct(list, new EqCompare<T>((a, b) => compare.IfNone(default(EqDefault<K>).Equals)(keySelector(a), keySelector(b)), a => compare.Match(_ => 0, () => keySelector(a)?.GetHashCode() ?? 0))));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would probably be clearer an the intent:

    public static Seq<T> Distinct<T, K>(this Seq<T> list, Func<T, K> keySelector, Option<Func<K, K, bool>> compare = default) =>
        toSeq(Enumerable.Distinct(list, 
            new EqCompare<T>(
                (a, b) => compare.IfNone(default(EqDefault<K>).Equals)(keySelector(a), keySelector(b)), 
                a => compare.Match(Some: _  => 0, None: () => default(EqDefault<K>).GetHashCode(keySelector(a))))));

@louthy louthy merged commit dc20d42 into louthy:main Nov 16, 2022
@louthy
Copy link
Owner

louthy commented Nov 16, 2022

Thanks @StefanBertels 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants