Tracking Issue for the GroupBy and GroupByMut iterators #80552
Comments
It's probably too late, but would you consider renaming these functions to a more proper name like group_by kind of should give a transitive closure of a relationship.
Just an alternative name suggestion:
I think that Here a "Group" is a subslice of contiguous elements; each of the subslices is delimited by the element(s) where the predicate fails. Is there another important definition of a group that should be prioritized over this? Minor thing, but I would prefer if there was an indication that the elements are an adjacent/contiguous subslice. Is that a common definition of a group? Some names I thought about:
I usually think of a group as a set of elements, not necessarily contiguous. This is the meaning it has for example in Java and SQL. In the original RFC someone proposed calling them
Yes - group_by in most languages (and my own macros in C/C++, for example) does not require a contiguous range but rather operates over the entire width of the iterator.
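To make the two meanings of "group" in this discussion concrete, here is a minimal sketch contrasting them. Both function names (`group_all_by_key` and `contiguous_runs`) are hypothetical helpers written for this illustration; neither is part of the standard library or of the feature being tracked.

```rust
use std::collections::BTreeMap;

/// SQL-style grouping (hypothetical helper): every element with the same key
/// ends up in the same group, no matter where it sits in the input.
fn group_all_by_key<T: Clone, K: Ord>(
    items: &[T],
    key: impl Fn(&T) -> K,
) -> BTreeMap<K, Vec<T>> {
    let mut groups = BTreeMap::new();
    for item in items {
        groups
            .entry(key(item))
            .or_insert_with(Vec::new)
            .push(item.clone());
    }
    groups
}

/// Contiguous grouping (hypothetical helper): only adjacent equal elements
/// are placed in the same subslice, so a value can appear in several runs.
fn contiguous_runs<T: PartialEq>(mut items: &[T]) -> Vec<&[T]> {
    let mut runs = Vec::new();
    while !items.is_empty() {
        // Count how many adjacent pairs at the front are equal.
        let len = 1 + items.windows(2).take_while(|w| w[0] == w[1]).count();
        let (head, tail) = items.split_at(len);
        runs.push(head);
        items = tail;
    }
    runs
}

fn grouping_example() {
    let items = [1, 1, 2, 1];

    // SQL-style: the trailing 1 joins the other 1s.
    let grouped = group_all_by_key(&items, |x| *x);
    assert_eq!(grouped[&1], vec![1, 1, 1]);
    assert_eq!(grouped[&2], vec![2]);

    // Contiguous: the trailing 1 forms its own run.
    let runs = contiguous_runs(&items);
    assert_eq!(runs, vec![&[1, 1][..], &[2][..], &[1][..]]);
}
```

The two approaches only agree when the input is already sorted (or otherwise clustered) by the grouping key, which is why the name matters to people arriving from SQL or Java.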
In response to the comment on Java, the common approach in that language is to apply a collect(___) to the end of a stream (iterator) and inside the parens choose the type of grouping, such as Collectors.toList() or Collectors.toMap(key_fn, value_fn)
I'm looking at this issue after a confusion with So I'm adding my voice here to a change of name,
Couple more candidates
For me the name is perfect, it has the same meaning as Python's groupby: https://docs.python.org/3/library/itertools.html#itertools.groupby
Not to argue with you as you’re just expressing your subjective opinion, but it got me thinking about it more deeply. I’m not a linguist, but I think the function with I would like to use objective criteria here and not historical reasons. Are there any linguists or mathematicians here? By the way, I have almost never seen Python's
Mathematician here - don't think you're going to get much from math:
The clearest parallel, in my 2c, is to the
D calls this operation Re @purpleP's comment about language and meaning: PS: Looking very much forward to this feature as I'm porting functional-style D to Rust; what's blocking besides naming? [0] https://dlang.org/library/std/algorithm/iteration/chunk_by.html
In my opinion
I would like to voice against
That's not entirely true,
I'm finding this function quite useful in my code. It's similar to a split, but it takes a function (predicate) with two arguments (dyadic) instead of a function with one argument. So if you don't like the "group_by" Python-inspired name (that takes a single argument predicate) then do you like split_on_adjacent? :-) Regarding the implementation, currently it performs a linear scan:
When (after the sort) the runs are long enough, a more aggressive strategy could be faster. Something like the galloping strategy of the D language (exponential growth followed by binary search), or arithmetic growth followed by binary search. This can't work if the input slice isn't sorted.
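The galloping idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the standard library's code or D's: it groups by equality, it assumes the slice is sorted so that equal elements are adjacent, and the names `leading_run_len` and `runs_galloping` are hypothetical.

```rust
/// Length of the leading run of elements equal to `slice[0]`.
///
/// ASSUMPTION: `slice` is sorted, so "equal to the first element" holds for a
/// prefix and then never again. Without that, the result is meaningless.
fn leading_run_len<T: PartialEq>(slice: &[T]) -> usize {
    if slice.is_empty() {
        return 0;
    }
    let first = &slice[0];
    let mut lo = 0; // invariant: slice[lo] == *first
    let mut step = 1;
    loop {
        // Exponential phase: probe at indices 1, 3, 7, 15, ...
        let probe = lo + step;
        if probe >= slice.len() || slice[probe] != *first {
            // The run ends somewhere in (lo, hi]; binary search for the spot.
            let hi = probe.min(slice.len());
            let extra = slice[lo + 1..hi].partition_point(|x| x == first);
            return lo + 1 + extra;
        }
        lo = probe;
        step *= 2;
    }
}

/// Splits a sorted slice into runs of equal elements using galloping search.
fn runs_galloping<T: PartialEq>(mut slice: &[T]) -> Vec<&[T]> {
    let mut runs = Vec::new();
    while !slice.is_empty() {
        let len = leading_run_len(slice);
        let (head, tail) = slice.split_at(len);
        runs.push(head);
        slice = tail;
    }
    runs
}

fn galloping_example() {
    let sorted = [1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3];
    let runs = runs_galloping(&sorted);
    let lens: Vec<usize> = runs.iter().map(|r| r.len()).collect();
    assert_eq!(lens, vec![5, 1, 8]);
}
```

For a run of length n this performs O(log n) comparisons instead of n, at the cost of extra work on very short runs, which matches the observation that it only pays off when runs are long enough.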
Hey @leonardo-m, If you need something more polished than this nightly method in the standard library, I have also worked on a crate with something similar to the galloping algorithm you describe here. The library is called
I've just tried that crate and it works well (in my case it gives no speedup, probably because my runs are generally very short).
Another note, inside next() of GroupBy there's this line:
Inside slice::split_at there's this assert:
I've seen that this assert isn't removed in the asm of my group_by usages. I think it's because of this missed LLVM optimization: So are benchmarks telling us that it is performance-wise worth giving a hint (inside next() of GroupBy) to help LLVM remove that assert on mid?
Note that with a carefully placed
SkiFire13, although I have some experience with optimizing Rust code and finding ways to remove bounds checks, your code feels a bit magical. Very nice. I think that change is worth pushing in the Rust stdlib.
One thing that I needed in my use-case in addition to what
The chunks are slices and do not overlap. So the name
i wanted to sort items from an iterator into groups so i searched "rust group iterator", found these methods, and was extremely confused for about 90 seconds until i had processed that these do something entirely different from what i am trying to do. Thinking about this for a bit,
Taking a cue from more-itertools, call it
Feature gate:
#![feature(slice_group_by)]
This is a tracking issue for the GroupBy and GroupByMut iterators.
This feature exposes the group_by and group_by_mut methods on the slice and mutable slice types; these methods return the GroupBy and GroupByMut iterator structs respectively. Those two iterators yield subslices of the original slice within which a user-defined function returns true for every pair of consecutive elements.
Public API
These methods can return subslices that contain equal elements, and they can also be used to extract the sorted subslices.
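As a hedged illustration of both uses (runs of equal elements and sorted runs), the sketch below uses a hypothetical stand-in function, `runs_by`, that mimics the behaviour described in this issue with a plain linear scan. It is written so that it compiles without the `slice_group_by` feature gate; it is not the standard library's implementation, and unlike the tracked iterators it collects the subslices into a `Vec` rather than yielding them lazily.

```rust
/// Hypothetical stand-in for the nightly `group_by` method: splits `slice`
/// into non-overlapping subslices, starting a new one whenever
/// `same_group(previous, next)` returns false for two adjacent elements.
fn runs_by<T, F>(mut slice: &[T], mut same_group: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut out = Vec::new();
    while !slice.is_empty() {
        // Linear scan: extend the run while adjacent elements belong together.
        let mut len = 1;
        while len < slice.len() && same_group(&slice[len - 1], &slice[len]) {
            len += 1;
        }
        let (head, tail) = slice.split_at(len);
        out.push(head);
        slice = tail;
    }
    out
}

fn runs_by_example() {
    // Subslices of equal elements.
    let slice = [1, 1, 1, 3, 3, 2, 2, 2];
    let equal = runs_by(&slice, |a, b| a == b);
    assert_eq!(equal, vec![&[1, 1, 1][..], &[3, 3][..], &[2, 2, 2][..]]);

    // Sorted (non-decreasing) subslices.
    let slice = [1, 1, 2, 3, 2, 3, 2, 3, 4];
    let sorted = runs_by(&slice, |a, b| a <= b);
    assert_eq!(sorted, vec![&[1, 1, 2, 3][..], &[2, 3][..], &[2, 3, 4][..]]);
}
```

Note that the predicate compares each element with its immediate neighbour, not with the first element of the run, which is what makes the sorted-run example work.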
Steps / History
group_by and group_by_mut methods to slice rfcs#2477 (it was determined that an RFC wasn't needed)
Unresolved Questions
group_by? Or should we reserve that name for another higher-level combinator?
RFC: Add the group_by and group_by_mut methods to slice rfcs#2477 (comment)