Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch should return IEnumerable<T[]> #98

Closed
atifaziz opened this issue Aug 21, 2015 · 5 comments · Fixed by #1014
Closed

Batch should return IEnumerable<T[]> #98

atifaziz opened this issue Aug 21, 2015 · 5 comments · Fixed by #1014
Assignees
Milestone

Comments

@atifaziz
Copy link
Member

atifaziz commented Aug 21, 2015

Since batches are about splitting an IEnumerable into fixed sizes, every batch should know how many items it has, in addition to the original order of the source enumerable. Size + order preservation = IList.

This should break little (if any) existing code, because IList inherits IEnumerable. Asides from the signature, the only code change that needs to be made is for the underlying bucket store to be a List instead of an Array (which is the more LINQy approach anyways).


Originally reported on Google Code with ID 98

Reported by @Arithmomaniac on 2015-03-09 14:19:50

@atifaziz
Copy link
Member Author

atifaziz commented Aug 21, 2015

Or at least IReadOnlyCollection.


Reported by @Arithmomaniac on 2015-05-27 13:54:17

@fsateler
Copy link
Member

This looks reasonable to me (after all the length is known beforehand). However, if done it should be done for 2.0, or else for the next major version as that signature change would break the ABI.

@atifaziz
Copy link
Member Author

I would argue that even returning a sequence of arrays might be best here, which is required, could then be cast to an IList<T> or wrapped into a read-only collection. And while we're on the topic of buffered views, the same could apply to Windowed and Partition.

@atifaziz
Copy link
Member Author

atifaziz commented Sep 30, 2016

every batch should know how many items it has

Every batch will be the same size except the last one if the source sequence doesn't have a total count of elements that's a multiple of the requested size.

Size + order preservation = IList.

Batch does not mandate order preservation & the signature currently reflects that. If batching is tomorrow implemented in a parallel scatter-gather fashion then it just needs to stream batches but requiring order preservation could pin implementations to being less efficient than they could be.

in addition to the original order of the source enumerable.

This is almost the case with most LINQ operators yet they don't return IList<T>. You'd think, for example, that something as simple as Select would preserve order but it doesn't when using Parallel LINQ (see “Order Preservation in PLINQ”):

In PLINQ, the goal is to maximize performance while maintaining correctness. A query should run as fast as possible but still produce the correct results. In some cases, correctness requires the order of the source sequence to be preserved; however, ordering can be computationally expensive. Therefore, by default, PLINQ does not preserve the order of the source sequence. In this regard, PLINQ resembles LINQ to SQL, but is unlike LINQ to Objects, which does preserve ordering.

The signature of Batch is designed around minimum promises that need to be made but the MoreLINQ implementation is indeed buffered and order-preserving (delivers more than promised). And while I realize that you can't capture everything in a name, does Batch always evoke order? Something like Buffer does.

@atifaziz
Copy link
Member Author

Considering that all but potentially the last batch will have the same size, I think it's fair to return a sequence of arrays. The signature will also be aligned with Chunk and allow just redirecting to it internally for a build targeting .NET 6+.

@atifaziz atifaziz changed the title Batch should return an IEnumerable<IList<T>> Batch should IEnumerable<T[]> Oct 14, 2023
@atifaziz atifaziz self-assigned this Oct 14, 2023
@atifaziz atifaziz linked a pull request Oct 15, 2023 that will close this issue
atifaziz added a commit that referenced this issue Oct 15, 2023
This is a squashed merge or PR #1014 that closes #98.
@atifaziz atifaziz changed the title Batch should IEnumerable<T[]> Batch should return IEnumerable<T[]> Oct 29, 2023
julienasp pushed a commit to julienasp/MoreLINQ that referenced this issue Nov 14, 2023
This is a squashed merge or PR morelinq#1014 that closes morelinq#98.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants