RFC: Add the `group_by` and `group_by_mut` methods to slice #2477
CodesInChaos commented Jun 15, 2018
I don't like the naming. Group by in SQL and C# works quite differently from what you propose: they map each item to a key and then group all items having the same key. Something like
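To make the distinction concrete, here is a minimal sketch contrasting the two meanings. The helper names `group_by_key` and `runs_by` are hypothetical, chosen only for illustration; neither is part of the RFC or of any library.

```rust
use std::collections::BTreeMap;

/// Key-based grouping in the SQL / C# sense (hypothetical helper):
/// every item is mapped to a key, and all items sharing a key land in
/// one group, wherever they appear in the input.
fn group_by_key<T, K, F>(items: &[T], mut key: F) -> BTreeMap<K, Vec<&T>>
where
    K: Ord,
    F: FnMut(&T) -> K,
{
    let mut groups: BTreeMap<K, Vec<&T>> = BTreeMap::new();
    for item in items {
        groups.entry(key(item)).or_default().push(item);
    }
    groups
}

/// Run-based grouping as the RFC describes it (hypothetical helper):
/// only neighbouring elements are compared, so equal values separated
/// by other values end up in different sub-slices.
fn runs_by<T, F>(items: &[T], mut same_group: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..items.len() {
        if !same_group(&items[i - 1], &items[i]) {
            runs.push(&items[start..i]);
            start = i;
        }
    }
    if start < items.len() {
        runs.push(&items[start..]);
    }
    runs
}

fn main() {
    let data = [1, 1, 2, 2, 1, 1];
    // Two groups: key 1 collects all four ones, key 2 collects both twos.
    println!("{:?}", group_by_key(&data, |x| *x));
    // Three runs: the trailing ones form their own sub-slice.
    println!("{:?}", runs_by(&data, |a, b| a == b));
}
```

The key-based version has to allocate a map, while the run-based version only hands out sub-slices of the input.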
Lokathor commented Jun 15, 2018
I guess, but this is exactly what groupBy does in Haskell, so the name should stay.
@CodesInChaos the function you describe has the same behavior as GroupWith in Haskell.
ssokolow commented Jun 15, 2018
I don't like the "My language is more important than yours" tone of this exchange. That said, from my perspective, a name like Assuming that
rvolgers commented Jun 15, 2018
This seems pretty similar to the existing I can imagine some cases where you iterate over a list and you can skip some work if some predicate hasn't changed. But in that case, it seems clearer to me to write it as a single loop with that part of the behavior written out with a
I don't think that's it. There's a good reason to favor a Haskell-based name in the case of iterator methods, functions, and related things because Rust already uses Haskell-based naming in such cases. For example: There's also precedent from I also think it's a telling and descriptive name. You are grouping elements by where the predicate applies. (Yes, a split happens every time the predicate doesn't match, so the order matters, but if you sort, then it does not. The minor difference can be cleared up in documentation.)
Grouping is a pretty common operation; think of
Generally speaking, at least to me, iterator style composition makes data flow clearer :)
Centril added the T-libs label Jun 15, 2018
leonardo-m commented Jun 15, 2018
A related but different function in the D language: And Python: So what semantics do we prefer?
@leonardo-m It seems to me that the semantics of D and Python can be recovered from the proposed semantics in this RFC (which is the same as the semantics in the itertools crate) so I would say that the proposed solution is more general.
To me, it's not even sufficiently clear what the RFC is proposing. The name The reference implementation of Another problem with providing an implementation and no specification is that I have no idea what the guarantees/requirements are for the predicate. Maybe this is a silly question, but am I guaranteed that it gets invoked with (a[0], a[1]) then (a[1], a[2]) and so on in that order exactly once each, or is the implementation allowed to do things like (a[0], a[1]) then (a[0], a[2]) and so on? I assume everyone's assuming the former, but since there isn't a single example or test of a predicate other than equality ( Which is all a really long way of saying I don't think we can properly bikeshed the name quite yet. In particular, I think we need to see some examples of more interesting predicates (I can't come up with any) to figure out whether "groups", "runs" or "splitting" is the least misleading term for what's going on.
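For what it's worth, the "former" reading (each adjacent pair, in order, exactly once) can be sketched as below. This is only an illustration of that assumed contract, not the reference implementation; `runs_by` is a hypothetical name, and the "consecutive integers" predicate is just one example of a non-equality predicate.

```rust
/// Hypothetical helper: splits `items` into runs, calling `pred` on each
/// adjacent pair `(items[i], items[i + 1])` in order, exactly once.
fn runs_by<T, F>(items: &[T], mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut start = 0;
    // `windows(2)` yields the adjacent pairs in slice order.
    for (i, pair) in items.windows(2).enumerate() {
        if !pred(&pair[0], &pair[1]) {
            // The pair straddles a boundary: the run ends at index `i`.
            runs.push(&items[start..=i]);
            start = i + 1;
        }
    }
    if start < items.len() {
        runs.push(&items[start..]);
    }
    runs
}

fn main() {
    let data = [1, 2, 3, 10, 11];
    let mut calls = Vec::new();
    // Predicate: "the right element directly follows the left one".
    let runs = runs_by(&data, |a, b| {
        calls.push((*a, *b));
        *b == *a + 1
    });
    println!("predicate calls: {:?}", calls);
    println!("runs: {:?}", runs);
}
```

Under this reading, a slice of length n triggers exactly n - 1 predicate calls, regardless of where the boundaries fall.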
Have a look at the few tests I wrote in my allocation version, here.
@Ixrec I would expect the behavior to be equivalent to https://docs.rs/itertools/0.7.8/itertools/trait.Itertools.html#method.group_by except that you get a slice as the element of the returned iterator instead of an
@phaazon What's the prior art for
@Kerollmops Small library additions such as the one proposed in this RFC have historically been accepted with a PR against
@Centril About
@Centril Interesting, the itertools method expects an What other use cases does this proposal hope to support by generalizing to a
(Also, not related to the actual discussion, this is more meta about the commit: @Kerollmops, you seem to have pushed this commit with a company email address. I think you should double-check and possibly rebase with your OSS identity; your commit is also not verified.)
@Ixrec I would provide the following hierarchy:
The two latter ones can be implemented in terms of the first one. I would not use the name
EDIT: The use case for the most general version is when you have some different notion of equality than the natural one (
I think it would be confusing to have
Lokathor commented Jun 15, 2018
Yeah, split is absolutely not the verb, even if group is also not the verb (though I think group is the best verb here).
uberjay commented Jun 15, 2018
Ruby calls this chunk: https://ruby-doc.org/core-2.5.1/Enumerable.html#method-i-chunk Of course, that's already taken. Would it be too confusing if it were
Now... that's probably not a huge issue, because (in Rust :) nobody will accidentally treat the result of group_by as a HashMap?
It seems like I'm the only one that finds it unintuitive for
uberjay commented Jun 15, 2018
@Ixrec - I also find For fun, I asked someone nearby who isn't familiar with any of the languages in question, and they felt pretty strongly that Now, there's already
ssokolow commented Jun 15, 2018
I'd go with the (That is, assuming the "contiguous runs" interpretation which would preserve lazy evaluation of the iterator. Otherwise, perhaps something in the vein of
Lokathor commented Jun 15, 2018
Actually Remember that The only question left is one of the ones you already said: assuming the first slice starts with (a[0], a[1]), do we keep checking the next element against the start of the slice or against the latest element of the slice?
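The two options only differ for predicates that are not transitive. A small sketch of both, assuming a hypothetical `runs_with` helper and `Anchor` enum invented for this illustration:

```rust
/// Which element the next candidate is compared against (hypothetical).
#[derive(Clone, Copy)]
enum Anchor {
    /// The element immediately before the candidate.
    Previous,
    /// The first element of the run currently being built.
    RunStart,
}

/// Hypothetical helper: builds runs under either comparison strategy.
fn runs_with<T, F>(items: &[T], anchor: Anchor, mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..items.len() {
        let left = match anchor {
            Anchor::Previous => &items[i - 1],
            Anchor::RunStart => &items[start],
        };
        if !pred(left, &items[i]) {
            runs.push(&items[start..i]);
            start = i;
        }
    }
    if start < items.len() {
        runs.push(&items[start..]);
    }
    runs
}

fn main() {
    let data = [1, 2, 3, 4];
    // "At most one apart" is not transitive, so the strategies disagree.
    let near = |a: &i32, b: &i32| *b - *a <= 1;
    println!("{:?}", runs_with(&data, Anchor::Previous, near));
    println!("{:?}", runs_with(&data, Anchor::RunStart, near));
}
```

With plain equality both strategies give the same runs, which may be why the question has not come up in the existing tests.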
uberjay commented Jun 16, 2018
Ok, a few interesting alternatives from the thesaurus:
ssokolow commented Jun 16, 2018
I'd go with Aside from having no precedent I'm aware of, it's too focused on process rather than effect. "Snipping" is what you do to accomplish a goal like "trimming" or "splitting" rather than a goal in itself and says nothing about which snipped pieces, if any, will be retained or discarded. (i.e., are you "snipping [something] off" or "snipping [things] apart"?) Therefore, it's neither as intuitive as is ideal when looking through a list of methods with a goal in mind, nor obvious about what it does when looking at a use of it in code. It also generally has a "too informal to fit in with the other terminology" feel to it.
The motivation here reads to me like "this makes it easier to do what this does", which I don't find persuasive. Why is it common to have this problem, to have the input in exactly the form needed for this proposed implementation, and to not just want the eager @Ixrec You're definitely not alone; I would absolutely expect "group by" to have the relational algebra meaning as well.
Lokathor commented Jun 16, 2018
Well, this works in a no_alloc situation, for one.
burdges commented Jun 16, 2018
I suppose
True, but the itertools
It does:

    extern crate itertools;
    use itertools::Itertools;
    fn main() {
        let arr = [1, 1, 2, 2, 1, 1, 3, 3];
        for (key, group) in arr.iter().group_by(|&&elt| elt).into_iter() {
            println!("{:?}", group.collect::<Vec<_>>());
        }
    }

outputs:

    [1, 1]
    [2, 2]
    [1, 1] // Notice this!
    [3, 3]
@Centril, as you proposed above, I have done the work to add to the The problem is that it will not compile if I do not specify an issue on the unstable stuff, but there is no tracking issue for the moment, so I put "42" as a placeholder. Right, the itertools
@Kerollmops Leaving some preliminary notes here on your work (I'll leave more as inline comments later when you file the PR):
Why everyone immediately starts bikeshedding the name I agreed with #2477 (comment) that the RFC doesn't demonstrate why Furthermore, checking sourcegraph, after removing the repositories You almost always need to sort the slice before using

    let mut packages = metadata.packages;
    packages.sort_by(|a, b| a.name.cmp(&b.name)); // <------
    for group in packages.group_by(|a, b| a.name == b.name) {
        let name = &group[0].name;
        if group.len() > 1 { ... }
    }

The Map-based refactoring isn't that bad compared with this.

    let mut grouped_packages: HashMap<_, Vec<_>> = HashMap::new();
    for package in &metadata.packages {
        grouped_packages.entry(&package.name).or_default().push(package);
    }
    for (name, group) in grouped_packages {
        if group.len() > 1 { ... }
    }
Kerollmops referenced this pull request Jun 17, 2018: Add GroupBy and GroupByMut iterators to the slice #51606 (Closed)
The real advantage of this The itertools
It depends on how you insert your data in the Before I proposed this iterator, I wrote a custom implementation for a project of mine that looked like this, so it will probably not be visible in the sourcegraph you linked.

    let mut previous = None;
    let mut iter = slice.iter();
    while let Some(elem) = iter.next() {
        if previous.is_none() || previous != Some(elem) {
            previous = Some(elem);
            // do something here with `elem`: the first element of each group
        }
    }

@Centril I have opened a PR as you asked for.
Now I start to get why @kennytm and @scottmcm feel this way. The RFC should be modified, especially its motivation, to explain why this should be merged to
Kerollmops force-pushed the Kerollmops:group-by branch from 2b744bb to e14e0a6 Jun 17, 2018
I updated the motivation section as requested.
I have multiple use cases in one of my projects and I wrote custom loops just as temporary fixes. I do not think that this method fits in the
Kerollmops force-pushed the Kerollmops:group-by branch from e14e0a6 to 5ee7a72 Jun 17, 2018
burdges commented Jun 17, 2018
Arguably, these should be called You can almost implement
It doesn't quite work, however, since
I think something like this can correctly emulate the current

    pub fn split<F>(&self, pred: F) -> impl Iterator<Item=&[T]>
    where F: FnMut(&T) -> bool
    {
        self.group_by(|a, _| pred(a))
            .enumerate()
            .map(|(i, slice)| {
                match i {
                    0 => slice,
                    _ => slice[1..],
                }
            })
    }

EDIT: this code ^ is wrong!
But this is not really the subject here. The real question is: do we really want to add this method to the standard library? And I did not understand what you mean with the
burdges commented Jun 18, 2018
I think our discussion of deriving As I said, it's unfortunate Afaik, you'd need the lifetime so the return can borrow both
In principle, the new
uberjay commented Jun 18, 2018
Apologies about the immediate bike-shedding. In terms of the actual functionality, I've found the corresponding functionality pretty useful in Ruby-land for segmenting (or chunking, or grouping, if you will :) line-based output into logical chunks more easily. It's not a great day when you're stuck parsing the output of some random command line tool, but So, it's been useful for me in situations where sorting beforehand doesn't make sense.
I think the immediate bikeshedding is a sign that people don't have any larger issues with the proposal...
Ok, so the only blocking thing about this RFC seems to be the name chosen ( @uberjay proposes It seems that the only way I can make this RFC move forward is by changing the name from I am not convinced about this renaming, as @mark-i-m says:
So I propose to tag this RFC as
Ping!
uberjay commented Aug 26, 2018
Totally agree -- it'd be great to have this, regardless of what it's named! I've run across a couple times where it would have come in handy over the past couple months, even!
As the RFC seems to be stuck, I will ping someone from the Library team to make a decision.
I propose to rename the method Note that the The same pattern can be found on other types like
joshtriplett added the I-nominated label Sep 26, 2018
Nominating for discussion in the next libs team meeting, to get the process un-stuck after the various bikeshed-painting. :)
@joshtriplett When does the next libs team meeting occur?
Centril added A-slice A-types-libstd labels Nov 22, 2018
For those interested, I have made a temporary library that I will not publish to crates.io, providing a temporary workaround to this RFC/PR that has not been merged. It compiles on stable Rust; the version that will be merged will use unstable functions to improve performance and clarity (i.e. offset_from).

    let slice = &[1, 1, 1, 3, 3, 2, 2, 2];
    let mut iter = GroupBy::new(slice, |a, b| a == b);
    assert_eq!(iter.next(), Some(&[1, 1, 1][..]));
    assert_eq!(iter.next(), Some(&[3, 3][..]));
    assert_eq!(iter.next(), Some(&[2, 2, 2][..]));
    assert_eq!(iter.next(), None);
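For readers who want to see how such an iterator could look on stable Rust, here is a minimal sketch. It is not the code of the temporary library: the struct layout and the `rest` and `pred` field names are assumptions, and only the `GroupBy::new(slice, predicate)` call shape is taken from the usage above.

```rust
/// Illustrative stand-in for a `GroupBy` slice iterator (assumed design).
struct GroupBy<'a, T, P> {
    /// The part of the slice that has not been yielded yet.
    rest: &'a [T],
    /// Returns `true` while two adjacent elements belong together.
    pred: P,
}

impl<'a, T, P> GroupBy<'a, T, P>
where
    P: FnMut(&T, &T) -> bool,
{
    fn new(slice: &'a [T], pred: P) -> Self {
        GroupBy { rest: slice, pred }
    }
}

impl<'a, T, P> Iterator for GroupBy<'a, T, P>
where
    P: FnMut(&T, &T) -> bool,
{
    type Item = &'a [T];

    fn next(&mut self) -> Option<&'a [T]> {
        if self.rest.is_empty() {
            return None;
        }
        // Extend the run while each adjacent pair satisfies the predicate.
        let mut len = 1;
        while len < self.rest.len() && (self.pred)(&self.rest[len - 1], &self.rest[len]) {
            len += 1;
        }
        // Hand out the run and keep the remainder for the next call.
        let (head, tail) = self.rest.split_at(len);
        self.rest = tail;
        Some(head)
    }
}

fn main() {
    let slice = [1, 1, 1, 3, 3, 2, 2, 2];
    for group in GroupBy::new(&slice, |a, b| a == b) {
        println!("{:?}", group);
    }
}
```

This version uses index arithmetic and `split_at` rather than pointer offsets, so it needs no unstable features and no allocation.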
Kerollmops commented Jun 15, 2018 (edited)
This RFC proposes to add two new methods to the slice, `group_by` and `group_by_mut`. These two will provide a way to iterate over non-overlapping sub-slices of a base slice that are separated by the predicate given by the user (e.g. `PartialEq::eq`, `|a, b| a < b`).
The predicate is called on two consecutive elements: it is called on `slice[0]` and `slice[1]`, then on `slice[1]` and `slice[2]`, and so on.
Pending Pull Request
Work around temporary library
Rendered
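The mutable variant is the less obvious of the two, because each sub-slice must be handed out without overlapping borrows. A rough sketch of the idea follows, written as a callback-taking function rather than an iterator to keep it short; `for_each_run_mut` is a hypothetical name and not the proposed API.

```rust
use std::mem;

/// Hypothetical helper: visits each run of `items` mutably, one at a time.
/// Runs are delimited by calling `pred` on adjacent elements.
fn for_each_run_mut<T, P, F>(mut items: &mut [T], mut pred: P, mut visit: F)
where
    P: FnMut(&T, &T) -> bool,
    F: FnMut(&mut [T]),
{
    while !items.is_empty() {
        let mut len = 1;
        while len < items.len() && pred(&items[len - 1], &items[len]) {
            len += 1;
        }
        // `mem::take` moves the whole slice out (leaving an empty one behind)
        // so it can be split into two disjoint mutable halves.
        let (head, tail) = mem::take(&mut items).split_at_mut(len);
        visit(head);
        items = tail;
    }
}

fn main() {
    let mut data = [1, 1, 2, 3, 3, 3];
    // Overwrite every element with the length of the run it belongs to.
    for_each_run_mut(
        &mut data,
        |a, b| a == b,
        |run| {
            let n = run.len() as i32;
            for x in run.iter_mut() {
                *x = n;
            }
        },
    );
    println!("{:?}", data);
}
```

Because run boundaries are computed before the callback runs, mutating a run cannot change where that run ends, only how later pairs compare.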