Tracking Issue for the GroupBy and GroupByMut iterators #80552
Comments
It's probably too late, but would you consider renaming these functions to a more proper name like …? `group_by` kind of should give the transitive closure of a relationship.
Just an alternative name suggestion: …
I think that … Here a "group" is a subslice of contiguous elements; each subslice is delimited by the element(s) where the predicate fails. Is there another important definition of a group that should be prioritized over this? Minor thing, but I would prefer if there were an indication that the elements form an adjacent/contiguous subslice. Is that a common definition of a group? Some names I thought about: …
I usually think of a group as a set of elements, not necessarily contiguous. This is the meaning it has, for example, in Java and SQL. In the original RFC someone proposed calling them …
Yes, `group_by` in most languages (and in my own macros in C/C++, for example) does not require a contiguous range but rather covers the entire width of the iterator.
In response to the comment on Java: the common approach in that language is to apply a `collect(...)` to the end of a stream (iterator) and, inside the parentheses, choose the type of grouping, such as `Collectors.toList()` or `Collectors.toMap(key_fn, value_fn)`.
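To make the contiguity distinction in this thread concrete, here is a minimal sketch contrasting the slice method (under its stabilized name `chunk_by`, available since Rust 1.77) with SQL/Java-style grouping that collects every equal element regardless of position. The helper names `contiguous_runs` and `whole_input_groups` are hypothetical, and using a `BTreeMap` is just one way to express the non-contiguous meaning.

```rust
use std::collections::BTreeMap;

/// Contiguous "grouping": only adjacent equal elements end up together.
pub fn contiguous_runs(xs: &[i32]) -> Vec<Vec<i32>> {
    xs.chunk_by(|a, b| a == b).map(|run| run.to_vec()).collect()
}

/// SQL/Java-style grouping: every equal element lands in one group,
/// regardless of its position (keyed here by the value itself).
pub fn whole_input_groups(xs: &[i32]) -> BTreeMap<i32, Vec<i32>> {
    let mut groups: BTreeMap<i32, Vec<i32>> = BTreeMap::new();
    for &x in xs {
        groups.entry(x).or_default().push(x);
    }
    groups
}
```

For input `[1, 1, 2, 1]`, the contiguous version yields three runs (`[1, 1]`, `[2]`, `[1]`), while the whole-input version yields two groups, because the trailing `1` joins the first group.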
I'm looking at this issue after some confusion with … So I'm adding my voice here in favor of a name change.
Couple more candidates: …
For me the name is perfect; it has the same meaning as Python's `groupby`: https://docs.python.org/3/library/itertools.html#itertools.groupby
Not to argue with you, as you're just expressing your subjective opinion, but it got me thinking about it more deeply. I'm not a linguist, but I think the function with … I would like to use objective criteria here and not historical reasons. Are there any linguists or mathematicians here? By the way, I have almost never seen Python's …
Mathematician here - don't think you're going to get much from math:
The clearest parallel, in my 2c, is to the …
D calls this operation … Re @purpleP's comment about language and meaning: …
PS: Looking very much forward to this feature, as I'm porting functional-style D to Rust; what's blocking besides naming?
[0] https://dlang.org/library/std/algorithm/iteration/chunk_by.html
In my opinion …
I would like to voice against …
That's not entirely true, …
I'm finding this function quite useful in my code. It's similar to a split, but it takes a function (predicate) with two arguments (dyadic) instead of a function with one argument. So if you don't like the Python-inspired name "group_by" (Python's version takes a one-argument key function), then do you like `split_on_adjacent`? :-) Regarding the implementation, it currently performs a linear scan: …
When (after the sort) the runs are long enough, a more aggressive strategy could be faster: something like D's galloping search (exponential growth followed by binary search), or an arithmetic growth followed by binary search. This can't work if the input slice isn't sorted.
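A minimal sketch of the galloping idea described above, assuming the slice is sorted in ascending order so that equal elements form contiguous runs. `gallop_run_end` and `gallop_runs` are hypothetical helpers, not the standard-library implementation; the binary phase uses `slice::partition_point`.

```rust
/// Hypothetical helper: end index (exclusive) of the run of elements equal
/// to `slice[start]`. Assumes `slice` is sorted ascending and `start < len`.
pub fn gallop_run_end<T: Ord>(slice: &[T], start: usize) -> usize {
    let value = &slice[start];
    // Invariant: every element of slice[start..lo] equals `value`.
    let mut lo = start + 1;
    // Exponential phase: probe start+1, start+2, start+4, ... until a probe
    // falls outside the run or past the end of the slice.
    let mut step = 1;
    let hi = loop {
        let probe = start + step;
        if probe >= slice.len() {
            break slice.len();
        }
        if slice[probe] != *value {
            break probe;
        }
        lo = probe + 1;
        step *= 2;
    };
    // Binary phase: since the slice is sorted, the elements of slice[lo..hi]
    // that equal `value` form a prefix; partition_point finds where it ends.
    lo + slice[lo..hi].partition_point(|x| x == value)
}

/// Hypothetical: splits a sorted slice into runs using `gallop_run_end`.
pub fn gallop_runs<T: Ord>(slice: &[T]) -> Vec<&[T]> {
    let mut runs = Vec::new();
    let mut start = 0;
    while start < slice.len() {
        let end = gallop_run_end(slice, start);
        runs.push(&slice[start..end]);
        start = end;
    }
    runs
}
```

With short runs the doubling phase ends after one or two probes, so any gain over a plain linear scan is expected only when runs are long.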
Hey @leonardo-m, if you need something more polished than this nightly method in the standard library, I have also worked on a crate with something similar to the galloping algorithm you describe here. The library is called …
I've just tried that crate and it works well (in my case it gives no speedup, probably because my runs are generally very short).
Another note: inside `next()` of `GroupBy` there's this line: …
Inside `slice::split_at` there's this assert: …
I've seen that this assert isn't removed in the asm of my `group_by` usages. I think it's because of this missed LLVM optimization: … So, do benchmarks show that it is worth giving a hint (inside `next()` of `GroupBy`) to help LLVM remove that assert on `mid`?
Note that with a carefully placed …
SkiFire13, although I have some experience with optimizing Rust code and finding ways to remove bounds checks, your code feels a bit magical. Very nice. I think that change is worth pushing into the Rust stdlib.
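The assert in question guards the `mid <= len` precondition of `split_at`. The sketch below is a hypothetical `chunk_by`-style iterator (`ChunkByHinted` is an illustrative name, not the standard library's code) that states that invariant to the optimizer with `std::hint::unreachable_unchecked`. Whether LLVM then actually drops the check has to be confirmed by inspecting the generated assembly.

```rust
/// Hypothetical chunk_by-style iterator used to illustrate an optimizer hint.
pub struct ChunkByHinted<'a, T, P> {
    slice: &'a [T],
    predicate: P,
}

impl<'a, T, P> ChunkByHinted<'a, T, P>
where
    P: FnMut(&T, &T) -> bool,
{
    pub fn new(slice: &'a [T], predicate: P) -> Self {
        ChunkByHinted { slice, predicate }
    }
}

impl<'a, T, P> Iterator for ChunkByHinted<'a, T, P>
where
    P: FnMut(&T, &T) -> bool,
{
    type Item = &'a [T];

    fn next(&mut self) -> Option<&'a [T]> {
        let slice = self.slice;
        if slice.is_empty() {
            return None;
        }
        // The first element always belongs to the current chunk.
        let mut len = 1;
        // Each window is one adjacent pair; stop at the first failing pair.
        let mut pairs = slice.windows(2);
        while let Some([left, right]) = pairs.next() {
            if (self.predicate)(left, right) {
                len += 1;
            } else {
                break;
            }
        }
        if len > slice.len() {
            // SAFETY: `len` starts at 1 for a non-empty slice and grows by at
            // most one per window; there are `slice.len() - 1` windows, so
            // `len <= slice.len()` always holds and this branch is unreachable.
            unsafe { std::hint::unreachable_unchecked() }
        }
        // The hint makes split_at's own `mid <= len` assert redundant in
        // principle; only the generated asm shows whether it is removed.
        let (head, tail) = slice.split_at(len);
        self.slice = tail;
        Some(head)
    }
}
```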
One thing that I needed in my use case in addition to what …
The chunks are slices and do not overlap. So the name …
I also like …
also semantically hinting that it would …
The proposed functions do not make any assumptions about ordering.
I apologize for the imprecision. I've updated my comment to reflect my intention (namely, that "group by" could carry an implicit meaning that it might not operate sequentially, as you suggest).
Would it be possible for … For example, a type might need to cache a result to use in its … In terms of implementation, since … I can see how there might be some "weirdness" about passing a mutable reference to a predicate, but if we think of it as a unique reference instead, then it makes sense, since the algorithm is designed in such a way that each item passed to the predicate is unique (i.e. both arguments are never the same and won't overlap with yielded slices).
🔔 This is now entering its final comment period, as per the review above. 🔔
@Zenithsiz oi Filipe 👋. I don't think … First as the right element, then as the left element, with the exception of the first and last elements. If you mutate an element in the previous predicate call, this might lead to a confusing pitfall where the yielded groups do not respect the predicate as you would expect. For the caching use case you mentioned, I'd say that you can do this instead:

slice.iter_mut().for_each(perform_cache_operation);
// then, call `group_by`
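A minimal sketch of that two-pass approach, written against the stabilized name `chunk_by_mut` (Rust 1.77+), whose predicate receives shared references `&T`. The `Item` type, its `fill_cache`/`bucket` methods, and the `value / 10` "expensive" computation are hypothetical stand-ins for whatever cache a real type keeps; the point is that all mutation happens before grouping starts.

```rust
/// Hypothetical element type that lazily caches a derived value.
#[derive(Debug)]
pub struct Item {
    pub value: u32,
    pub cached_bucket: Option<u32>,
}

impl Item {
    pub fn new(value: u32) -> Self {
        Item { value, cached_bucket: None }
    }

    /// Stand-in for an expensive computation whose result is cached.
    pub fn fill_cache(&mut self) {
        if self.cached_bucket.is_none() {
            self.cached_bucket = Some(self.value / 10);
        }
    }

    /// Reads the cache; only valid after `fill_cache` has run.
    pub fn bucket(&self) -> u32 {
        self.cached_bucket.expect("cache must be filled before grouping")
    }
}

/// Returns the length of each chunk of adjacent items sharing a bucket.
pub fn group_after_caching(items: &mut [Item]) -> Vec<usize> {
    // Pass 1: mutate each element exactly once, before any grouping.
    items.iter_mut().for_each(Item::fill_cache);
    // Pass 2: the predicate gets `&Item`, so it can only read the cache.
    items
        .chunk_by_mut(|a, b| a.bucket() == b.bucket())
        .map(|chunk| chunk.len())
        .collect()
}
```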
Looks like most people do agree. Do we need to wait for FCP to finish before creating a PR to rename …?
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon.
As suggested here by the original author, I'm porting my comment here:

fn linear_group_by_key<F, K>(&self, func: F) -> LinearGroupByKey<T, F>
where
    F: FnMut(&T) -> K,
    K: PartialEq

does not allow:

let strings = vec!["A".to_owned()];
strings.linear_group_by_key(|s| s.as_str());

In order for that to work, we need:

fn linear_group_by_key<'a, F, K>(&'a self, func: F) -> LinearGroupByKey<T, F>
where
    F: FnMut(&'a T) -> K,
    K: PartialEq

Since nothing moves in memory, it should be possible. While there is (unfortunately) no … That would also better match what is done on …
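The lifetime point can be checked with a small stand-alone iterator. `ChunkByKey` and `chunk_by_key` below are hypothetical (not the crate's or the standard library's API); the relevant detail is the `F: FnMut(&'a T) -> K` bound, which ties the key's borrow to the slice, so a key such as `s.as_str()` is accepted. With a higher-ranked `FnMut(&T) -> K` bound instead, `K` could not borrow from the argument and that closure would be rejected.

```rust
/// Hypothetical iterator over runs of adjacent elements with equal keys.
pub struct ChunkByKey<'a, T, F> {
    slice: &'a [T],
    key: F,
}

/// Hypothetical constructor; the key closure sees `&'a T`, not `&T`.
pub fn chunk_by_key<'a, T, K, F>(slice: &'a [T], key: F) -> ChunkByKey<'a, T, F>
where
    F: FnMut(&'a T) -> K,
    K: PartialEq,
{
    ChunkByKey { slice, key }
}

impl<'a, T, K, F> Iterator for ChunkByKey<'a, T, F>
where
    F: FnMut(&'a T) -> K,
    K: PartialEq,
{
    type Item = &'a [T];

    fn next(&mut self) -> Option<&'a [T]> {
        let slice: &'a [T] = self.slice;
        let first = slice.first()?;
        // Keys may borrow from the slice for `'a`, so the previous key can
        // be kept across calls to `self.key` without cloning.
        let mut prev_key = (self.key)(first);
        let mut len = 1;
        while let Some(item) = slice.get(len) {
            let key = (self.key)(item);
            if key != prev_key {
                break;
            }
            prev_key = key;
            len += 1;
        }
        let (head, tail) = slice.split_at(len);
        self.slice = tail;
        Some(head)
    }
}
```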
I agree it might be useful for …
Aha you're right, whoops 😅 Yeah so only …
Renamed "group by" to "chunk by" as per rust-lang#80552. Newly stable items:
* `core::slice::ChunkBy`
* `core::slice::ChunkByMut`
* `[T]::chunk_by`
* `[T]::chunk_by_mut`
Closes rust-lang#80552.
Rollup merge of rust-lang#117678 - niklasf:stabilize-slice_group_by, r=dtolnay: Stabilize `slice_group_by`
Feature gate: `#![feature(slice_group_by)]`
This is a tracking issue for the `GroupBy` and `GroupByMut` iterators.

This feature exposes the `group_by` and `group_by_mut` methods on the slice and mutable slice types; these methods return the `GroupBy` and `GroupByMut` iterator structs, respectively. Those two iterators yield subslices of the original slice in which a user-defined predicate returns `true` for every pair of consecutive elements; a new subslice starts wherever the predicate returns `false`.

Public API
These methods can return subslices that contain equal elements; they can also be used to extract the sorted subslices.
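Both use cases can be sketched with the stabilized names (`group_by` was renamed to `chunk_by`, and `group_by_mut` to `chunk_by_mut`, available since Rust 1.77). The predicate is called on adjacent pairs: `a == b` yields runs of equal elements, while `a <= b` yields maximal non-decreasing (sorted) runs. The function names below are illustrative.

```rust
/// Splits a slice into runs of equal adjacent elements.
pub fn equal_runs(xs: &[i32]) -> Vec<&[i32]> {
    xs.chunk_by(|a, b| a == b).collect()
}

/// Splits a slice into maximal non-decreasing (sorted) runs.
pub fn ascending_runs(xs: &[i32]) -> Vec<&[i32]> {
    xs.chunk_by(|a, b| a <= b).collect()
}

/// Mutable variant: reverses each maximal non-decreasing run in place.
pub fn reverse_ascending_runs(xs: &mut [i32]) {
    for run in xs.chunk_by_mut(|a, b| a <= b) {
        run.reverse();
    }
}
```

For example, `[1, 2, 2, 3, 1, 2, 0]` splits into the sorted runs `[1, 2, 2, 3]`, `[1, 2]` and `[0]`.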
Steps / History
RFC: Add the `group_by` and `group_by_mut` methods to slice rfcs#2477 (it was determined that an RFC wasn't needed)

Unresolved Questions
… `group_by`? Or should we reserve that name for another higher-level combinator?
RFC: Add the `group_by` and `group_by_mut` methods to slice rfcs#2477 (comment)