-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partition for groupings #210
Conversation
In Sample 1 we have
Is it right? |
I like this. I have needed methods like this one before, and ended up using A few overall questions though:
PS: It is unfortunate that c# does not have syntax that would allow making this truly generic over the number of possible buckets... |
@leandromoh Obviously not. :) Looks like a copy & paste error (printing @fsateler Glad to hear you like it. I've needed this countless times too & just got tired & decided it was time to give it a go (and in a generalised way over groupings). Looking forward to your review.
To be honest, I wanted to start with simpler signatures and not having to deal with
Unless you're in a concurrent world? And how can you assume how the grouping was created? An |
I agree that using a type already given to us is good. OTOH,
Ack.
That is the problem of relying on |
Sorry, not sure if I'm following you here. Care to rephrase? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are missing. Looking forward to having this in MoreLINQ!
MoreLinq/GroupPartition.cs
Outdated
/// </summary> | ||
|
||
public static TResult Partition<T, TResult>(this IEnumerable<IGrouping<bool, T>> source, | ||
Func<IEnumerable<T>, IEnumerable<T>, TResult> resultSelector) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first use of expression-bodied members in MoreLINQ. Is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, parameters indented at same level as body looks consfusing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done in the interest of speed to get this published for early feedback.
I use expression-bodied members whenever possible though I'm willing to revert in the interest of consistency with the rest of the code base. Once 2.0 is golden, it'll be time to make a sweeping change & upgrade the code base to C# 6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use expression-bodied members whenever possible though I'm willing to revert in the interest of consistency with the rest of the code base.
Yes, consistency is good. Also, I thought there was interest in being able to compile with C# < 6. I'm not saying it is necessary to preserve (I don't use C#<6), but I wasn't sure if this was a requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once 2.0 is golden
BTW, maybe it is a good idea to establish a milestone, to define what needs to be done before releasing 2.0. Then people can contribute towards that end, and maybe even decide which PRs to look at before or after release.
MoreLinq/GroupPartition.cs
Outdated
/// <summary> | ||
/// Partitions a grouping by Boolean keys into a projection of true | ||
/// elements and false elements, respectively. | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameter and return documentation is missing, on all overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for the particular boolean overloads, it should be documented which of the resultSelector
parameters corresponds to true
and false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I documented just the summaries for now, which are the most important bits. The rest will follow as time permits. Meanwhile, this could be released earlier, and dare I say even without tests, if it's put into the edge namespace as being discussed in #209.
for the particular boolean overloads, it should be documented which of the
resultSelector
parameters corresponds totrue
andfalse
.
This is already done: a projection of true elements and false elements, respectively. Perhaps it's not obvious but “respectively” means in the aforementioned order.
On that note, this is different from F# where, for example, List.partition
returns a couple of false (first) & true (second) parts. Should we align? It does mean though that in the case of bool?
keys, the parts become: null, false, true. Or should we go opinionated, that true then false reads better in practice because it's easier to express & think in terms of passing than failing conditions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the particular boolean overloads, it should be documented which of the resultSelector parameters corresponds to true and false.
This is already done: a projection of true elements and false elements, respectively. Perhaps it's not obvious but “respectively” means in the aforementioned order.
Right, sorry. I meant that this needs to be in the <param />
entry.
On that note, this is different from F# where, for example, List.partition returns a couple of false (first) & true (second) parts. Should we align? It does mean though that in the case of bool? keys, the parts become: null, false, true. Or should we go opinionated, that true then false reads better in practice because it's easier to express & think in terms of passing than failing conditions?
Hmm. OTOH, those from a C-like background may find it more natural in the F#
order, given that false
is 0
and true
is (usually) 1
, and then parameters are in order. I value consistency, but I'm not sure how many people use both F# and C#. I'm undecided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I documented just the summaries for now, which are the most important bits. The rest will follow as time permits. Meanwhile, this could be released earlier, and dare I say even without tests, if it's put into the edge namespace as being discussed in #209.
Hmmm. Maybe the guidelines for acceptance should also be discussed in #209. But I figure at least the trivial tests (null arguments, etc) should be there before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that
false
is0
andtrue
is (usually)1
I think true is usually some non-zero value. For example, in VBA, True
is -1 when converting to a signed integer. Does that mean true comes before false? 😉 Joking aside, where would you put the null grouping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joking aside, where would you put the null grouping?
I like putting it last. It feels a lot more natural, because "real" values are first.
MoreLinq/GroupPartition.cs
Outdated
if (i < 0) | ||
(etc ?? (etc = new List<IGrouping<TKey, TElement>>())).Add(e); | ||
else | ||
groups[i] = e; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the source enumerable is not unique by keys, then the last group seen will be the one used. This should be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point but isn't it better to turn it into an error? Although doing so for non-matching groups would be unnecessarily expensive. I'm inclined to document this as undefined behaviour even if someone observes what you say. I guess the best test is what you'd expect Partition
to do if the came from GroupAdjacent
instead of GroupBy
. Have to think about this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point but isn't it better to turn it into an error? Although doing so for non-matching groups would be unnecessarily expensive.
Maybe just defining the error for the known keys is sufficient.
I'm inclined to document this as undefined behaviour even if someone observes what you say.
Uhm. I'm not particularly fond of undefined behavior. However, undefined behavior might be too strong. Maybe just say "if the keys are not unique, a single group will be passed to resultSelector
. It is not specified which one."
I have to say I prefer the error.
I guess the best test is what you'd expect Partition to do if the came from GroupAdjacent instead of GroupBy. Have to think about this one.
GroupAdjacent
has 2 main uses in my mind: partial grouping on infinite sequences, and grouping optimization when the source is known to be ordered. The first use case is not relevant for Partition
as it can't deal with an infinite sequence anyway. For the second use case, I think I'd prefer if Partition
would tell me the keys weren't unique.
MoreLinq/GroupPartition.cs
Outdated
: -1; | ||
|
||
if (i < 0) | ||
(etc ?? (etc = new List<IGrouping<TKey, TElement>>())).Add(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An assignment within an lvalue looks weird, I think it is more readable to expand this:
etc = etc ?? new List<IGrouping<TKey, TElement>>();
etc.Add(e);
MoreLinq/GroupPartition.cs
Outdated
var i = count > 0 && comparer.Equals(e.Key, k1) ? 0 | ||
: count > 1 && comparer.Equals(e.Key, k2) ? 1 | ||
: count > 2 && comparer.Equals(e.Key, k3) ? 2 | ||
: -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find nested ?:
operators very difficult to follow. What is the precedence of ?
and :
? Again, I would expand this:
int i;
if (count > 0 && comparer.Equals(e.Key, k1))
i = 0;
else if (count > 1 && comparer.Equals(e.Key, k2))
i = 1;
else if (count > 2 && comparer.Equals(e.Key, k3))
i = 2;
else
i = -1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is meant to be read here like a switch
/case
where case
can be an expression and the whole switch
is an expression evaluating to a single value. I'd like to keep this as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is meant to be read here like a switch/case where case can be an expression and the whole switch is an expression evaluating to a single value. I'd like to keep this as-is.
OK. I still disagree, but there is nothing else to claim other than style preferences 😉
MoreLinq/GroupPartition.cs
Outdated
|
||
/// <summary> | ||
/// Partitions a grouping into a projection of elements matching two | ||
/// sets of keys and those groups that do not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/sets of // ? Or instead "elements matching a set of two keys". Same for the 3 key overloads
MoreLinq/GroupPartition.cs
Outdated
{ | ||
Debug.Assert(count > 0 && count <= 3); | ||
|
||
if (source == null) throw new ArgumentNullException(nameof(source)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First instance of nameof
in MoreLINQ, same question as for expression-bodied members.
MoreLinq/GroupPartition.cs
Outdated
|
||
public static TResult Partition<T, TResult>(this IEnumerable<IGrouping<bool?, T>> source, | ||
Func<IEnumerable<T>, IEnumerable<T>, IEnumerable<T>, TResult> resultSelector) => | ||
source.Partition(true, false, null, (t, f, n, _) => resultSelector(t, f, n)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you use _
to mark an unused parameter, I think you should do this in all overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do that where it's limited to one unused parameter as that's all C# can afford us. Do you suggest to use 2 underscores for the second unused? As in:
PartitionImpl(source, 1, key, default(TKey), default(TKey), comparer, (a, _, __, rest) => resultSelector(a, rest));
MoreLinq/GroupPartition.cs
Outdated
int count, TKey k1, TKey k2, TKey k3, IEqualityComparer<TKey> comparer, | ||
Func<IEnumerable<TElement>, IEnumerable<TElement>, IEnumerable<TElement>, IEnumerable<IGrouping<TKey, TElement>>, TResult> resultSelector) | ||
{ | ||
Debug.Assert(count > 0 && count <= 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be changed to a regular check+throw, so that unit tests have a chance of catching this.
MoreLinq/GroupPartition.cs
Outdated
@@ -0,0 +1,144 @@ | |||
#region License and Terms | |||
// MoreLINQ - Extensions to LINQ to Objects | |||
// Copyright (c) 2009 Atif Aziz. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2009 => 2016
I mean that as long as the implementation is based on top of |
Actually, now that I think of this, why not pass the public static TResult Partition<T, TResult>(this IEnumerable<IGrouping<bool, T>> source,
Func<IGrouping<bool, T>, IGrouping<bool, T>, TResult> resultSelector)
// ... |
With Do you have a concrete example to demonstrate what the current design/signature would prevent? |
No. That was mostly thinking out loud, and your explanation makes a lot of sense. There is little value in preserving the |
Alternative names to avoid collision with the already existing
|
Thanks for those names suggestions, @fsateler. I'll give them some thought.
If you look at the changes between the last beta and up to the release of 2.0, |
We don't have an issue for this PR so let's pair it with #264. |
38bdb1c
to
0015b72
Compare
@fsateler Let me know if I missed anything from your review/feedback. |
Failed : MoreLinq.Test.NullArgumentTest.Partition: 'predicate' (1); TResult Partition[T,TResult](System.Collections.Generic.IEnumerable`1[T], System.Func`2[T,System.Boolean], System.Func`3[System.Collections.Generic.IEnumerable`1[T],System.Collections.Generic.IEnumerable`1[T],TResult]) Expected string length 9 but was 11. Strings differ at index 0. Expected: "predicate" But was: "keySelector" -----------^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice except for a few details.
BTW, the GroupAdjacentTest.cs
file renaming seems spurious?
/// <typeparam name="TElement">Type of elements in source groupings.</typeparam> | ||
/// <typeparam name="TResult">Type of the result.</typeparam> | ||
/// <param name="source">The source sequence.</param> | ||
/// <param name="key">The key to partition.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be my non-native English, but I fail to parse this description. I think it would be clearer to say: The key of the matching partition
. Mutatis mutandis, the same applies to the overloads with more key
arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my thinking…the method is called Partition
so it's partitioning or extracting the group with that key. Would the “the key to partition on“ read clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the “the key to partition on“ read clearer?
That would be a bit better. At least I can parse that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTY now with 97ed654?
MoreLinq.Test/PartitionTest.cs
Outdated
{ | ||
var (evens, odds) = | ||
Enumerable.Range(0, 10) | ||
.Partition(x => x % 2 == 0, ValueTuple.Create); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice example of this overload. Maybe it should be included in the xmldoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 7ef8442 but just for the simplest overload.
|
||
var r2 = r.Read(); | ||
Assert.That(r2.Key, Is.EqualTo(2)); | ||
Assert.That(r2, Is.EquivalentTo(new[] { 2, 5, 8 })); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test (and all the tests that have more than one grouping in etc
) assert that the result is ordered by the order in which keys are seen, but that is not documented. So either the documentation should be updated to include this guarantee, or change these tests to relax that requirement.
MoreLinq.Test/PartitionTest.cs
Outdated
Assert.That(r1, Is.EquivalentTo(new[] { 1, 4, 7 })); | ||
Assert.That(r2, Is.EquivalentTo(new[] { 2, 5, 8 })); | ||
using (var r = etc.Read()) | ||
r.ReadEnd(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find it clearer to Assert.That(etc, Is.Empty)
, any particular reason to use the SequenceReader
in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular reason to use the
SequenceReader
in this context?
Besides me being sloppy, no. 😕
Wonder if this is also not a good time to add an overload without any projection function that simply returns a tuple: public static (IEnumerable<T> True, IEnumerable<T> False)
Partition<T>(this IEnumerable<T> source, Func<T, bool> predicate) =>
source.Partition(predicate, ValueTuple.Create); The usage would then be: var (evens, odds) = Enumerable.Range(0, 10).Partition(x => x % 2 == 0); Here it is in action in a C# Interactive session:
Note tuple element names of |
Oh, that sounds very useful. |
Added in 2a9fa55. |
@fsateler We good to merge this? Did I miss anything from the changes you requested in your review? |
You did not comment on my comment on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo the merge conflicts)
Thanks! 👍
Enables partitioning of groupings of Boolean keys and other any 3 cases.
Tests are pending.
Sample 1
Partition a grouping where the key is a Boolean:
Output:
Sample 2
Partition a grouping where the key is a nullable Boolean:
Output:
Sample 3
Partition grouping into multiples of 3 and the rest:
Output:
Sample 4
Parition groupings by keys (remainders) matching 0, 1 & 2 (the non-matching groupings
_
are ignored because they can never occur):Output:
Sample 5
Partition of groupings using a key comparer:
Output:
Sample 6
Cascaded partition of groupings:
Output: