-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: slices: ChunkFunc to determine chunks of a slice using a function #69123
Comments
Related Issues and Documentation (Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.) |
If it's called "ChunkFunc" I would expect to be able to implement Chunk as a trivial composition with it, like how slices.Sort is just SortFunc + cmp.Compare. However, implementing Chunk with this would involve what to me feels like a non-trivial use of closures. So I don't love the name. It does seem useful. It's similar to Python's itertools.GroupBy, although I believe Python returns the equivalent of iter.Seq2[Key, iter.Seq[Value]], which in my experience wasn't very convenient because I always ended up wanting the inner iterator to be a list. |
Slightly more thought: I think it should be called xiter.GroupFunc and xiter.Group is the identity sequence: func GroupFunc[K comparable, V any](seq iter.Seq[V], key func(V) K) iter.Seq2[K, []V]
func Group[T comparable](seq iter.Seq[T]) iter.Seq2[T, []T] {
return GroupFunc(seq, func(v T) T { return v })
} |
I agree with you on the name not being quite consistent with the other A Back on topic with name choice, though maybe just copying Elixir and calling it |
I would expect In the original Chunk thread I posted a generalization that's similar to this but instead of having a key func the predicate takes two values and returns true if it is should start a new chunk. It's a bit fussier to implement but I think this proposal could be implemented in terms of that more easily than the other way around. #53987 (comment) I'm not sure that's quite right either, though. When I implemented it I ended up needing to replace the func with an interface that had a reset method because otherwise it became difficult to write combinators. |
Related: @earthboundkid's |
FWIW, Python's groupby works like this. >>> from itertools import groupby
>>> list(groupby("aabbaa"))
[('a', <itertools._grouper object at 0x100d24b20>),
('b', <itertools._grouper object at 0x100d25ae0>),
('a', <itertools._grouper object at 0x100d25b10>)] |
FWIW, we have an xiter package internally and we have this: // ChunkFunc returns an iterator over chunks of n, splitting the input sequence wherever split returns true.
func ChunkFunc[Slice ~[]E, E any](s Slice, split func(a, b E) bool) iter.Seq[Slice] Looking at our uses of it, I think it'd be hard to rewrite it into a "convert each element into a key" style function. Not impossible. But I think it'd also harm readability, at least in our cases, where the splitting mechanism is pretty intricate. |
Proposal Details
I propose adding a new function to the
slices
package that would return an iterator yielding variable-length subslices of a given slice based on a provided function. There are two possible implementations of this.Version 1
The first is something I have personally used several times before, as well as being similar to a function in the Elixir standard library, looks like
This function returns an iterator that yields subslices of
s
composed of consecutive runs of elements for whichchunker
returns the same value. In other words,would yield
[]int{-1, -2, -3}
,[]int{1, 2}
,[]int{-1}
, and then[]int{3, 2, 1}
. This is useful for a number of different things. For example, let's say that you have a slice of lines of output from something, some of which are to stdout and some to stderr. If you want to output those with a header to indicate which is which, being able to group the consecutive runs of lines that were to each is very useful, and a function like this can do it quite efficiently.Version 2
The other possible implementation is to simply make the function always return a
bool
, and then define that each time it returnstrue
is the start of a new chunk.While this is not a function that I've personally had a use for, I'm not really stuck on either specifically simply because I'm pretty sure that both can be implemented using the other. I think it's probably easier to implement the
comparable
one using thebool
version, though, as it would simply be something like the following untested function:Version 3
Another alternative is to simply provide both as, say,
ChunkFunc()
for thebool
version andChunkBy()
for thecomparable
one.The text was updated successfully, but these errors were encountered: