
proposal: x/exp/slices: add Chunk function to divide []T into [][]T chunks #53987

Open
mdlayher opened this issue Jul 21, 2022 · 8 comments

@mdlayher (Member) commented Jul 21, 2022

A problem I've run into fairly often is dealing with APIs that accept only a limited number of inputs at once, even though I may ultimately have more inputs than that to process.

For example, Amazon S3 can only delete up to 1000 objects in a single DeleteObjects API request (https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html). I've run into similar issues in the past when injecting a large number of routes into the Linux kernel via route netlink, or when modifying a large number of WireGuard peers at once.

To deal with this situation in a generic way, I've come up with the following:

package slices

// Chunk batches []T into [][]T in groups of size. The final chunk of []T will be
// smaller than size if the input slice cannot be chunked evenly. It does not
// make any copies of slice elements.
//
// As an example, take a slice of 5 integers and create chunks of 2 integers
// each (the final value creates a short chunk):
//   slices.Chunk([]int{1, 2, 3, 4, 5}, 2) = [][]int{{1, 2}, {3, 4}, {5}}
func Chunk[T any](slice []T, size int) [][]T {
	var chunks [][]T
	for i := 0; i < len(slice); {
		// Clamp the last chunk to the slice bound as necessary.
		end := size
		if l := len(slice[i:]); l < size {
			end = l
		}

		// Set the capacity of each chunk so that appending to a chunk does not
		// modify the original slice.
		chunks = append(chunks, slice[i:i+end:i+end])
		i += end
	}

	return chunks
}

Which is then used as follows:

	objs, err := c.allObjects(ctx)
	if err != nil {
		return nil, err
	}

	// Begin deleting the objects concurrently in batches of 1000 (the
	// maximum limit size supported by S3).
	const limit = 1000

	eg, ctx := errgroup.WithContext(ctx)
	eg.SetLimit(10)

	for _, chunk := range slices.Chunk(objs, limit) {
		chunk := chunk // Shadow the loop variable so each goroutine captures its own chunk.
		eg.Go(func() error { /* deletion logic */ })
	}

If this proposal is accepted, I'd be happy to send a CL for the above. Thanks for your time.

Edit 1: updated second parameter name to size int as inspired by Rust's chunks method: https://doc.rust-lang.org/std/primitive.slice.html#method.chunks.

Edit 2: set capacity for each chunk per next comment.

@gopherbot gopherbot added this to the Proposal milestone Jul 21, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jul 21, 2022

@JeremyLoy commented Jul 22, 2022

A small correction to your given implementation:

https://github.com/golang/go/wiki/SliceTricks#batching-with-minimal-allocation

It's important to use the 3-arg slice form when chunking; otherwise, appending to a chunk could unintentionally overwrite elements of the original slice.

https://go.dev/play/p/Zp0Dh5cB1rB
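
For illustration, here is a minimal demonstration of the aliasing hazard (a sketch of the same idea as the playground link above, not its exact contents):

package main

import "fmt"

func main() {
	s := []int{1, 2, 3, 4, 5}

	// Two-index slice: the chunk's capacity extends to the end of s,
	// so append writes into s's backing array and clobbers s[2].
	a := s[0:2]
	a = append(a, 99)
	fmt.Println(s) // [1 2 99 4 5]

	s = []int{1, 2, 3, 4, 5}

	// Three-index slice: capacity is clamped to the chunk's length,
	// so append must allocate a new backing array and s is untouched.
	b := s[0:2:2]
	b = append(b, 99)
	fmt.Println(s) // [1 2 3 4 5]
}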

@sten4eg commented Jul 22, 2022

@carlmjohnson (Contributor) commented Jul 23, 2022

I just needed this functionality last week and almost opened an issue. I think it would be useful.

@icholy commented Jul 25, 2022

I think Batch is a better name.

@DeedleFake commented Jul 25, 2022

I wonder if it might be better to wait for a general iterator API for this. If one of the main intentions is to use it with range, a slice of slices is a fairly inefficient way to do that, but its existence will encourage people to use it anyway. Map() and Filter() were dropped from slices for the same reason. This could be done manually with a for loop fairly easily and efficiently, though, if there were some easier way to clamp that last index:

for start := 0; start < len(slice); start += chunkSize {
  end := min(start+chunkSize, len(slice)) // min is a builtin as of Go 1.21
  chunk := slice[start:end:end]

  // ...
}

Is there a common use for this that doesn't involve immediately proceeding to loop over it?

@icholy commented Jul 31, 2022

I've needed this function for 2 out of the last 3 projects I've worked on. An iterator-based approach would not have had any benefit in either of those use cases.

@DeedleFake commented Aug 1, 2022

But would an iterator-based approach have been worse, @icholy? The inner slices don't need to be reallocated in either case, and the outer slice needs to be newly allocated in either case, so I don't think an iterator-based variant would ever really be worse. If there's some iters.ToSlice() function, then it would only take one extra function call to get a slice when that's easier to work with for a specific use case.
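
For concreteness, here's a minimal sketch of what an iterator-based variant might look like. It assumes an iter.Seq type along the lines of the iterator API being discussed (iter.Seq exists as of Go 1.23, and min is a builtin as of Go 1.21); it is not part of the proposal above:

package slices

import "iter"

// Chunk returns an iterator over successive sub-slices of up to size
// elements of s. All chunks except possibly the last have exactly size
// elements. It panics if size is less than 1.
func Chunk[T any](s []T, size int) iter.Seq[[]T] {
	if size < 1 {
		panic("chunk size cannot be less than 1")
	}
	return func(yield func([]T) bool) {
		for i := 0; i < len(s); i += size {
			end := min(i+size, len(s))
			// Three-index slice so appending to a chunk cannot
			// overwrite elements of s beyond the chunk.
			if !yield(s[i:end:end]) {
				return
			}
		}
	}
}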

@kf6nux commented Aug 23, 2022

You probably want to be explicit about your panic condition, and to panic in one additional case: size == 0. (As written, a negative size already panics via the slice bounds check, but size == 0 makes the loop spin forever without advancing.)

if size < 1 {
	panic("chunk size cannot be less than 1")
}

Here's an alternate implementation that some may like more:

func Chunk[T any](slice []T, size int) (chunks [][]T) {
	if size < 1 {
		panic("chunk size cannot be less than 1")
	}
	if len(slice) == 0 {
		// Return no chunks for an empty input, matching the original
		// proposal, rather than a single empty chunk.
		return nil
	}
	for i := 0; ; i++ {
		next := i * size
		if len(slice[next:]) > size {
			end := next + size
			chunks = append(chunks, slice[next:end:end])
		} else {
			// Final (possibly short) chunk runs to the end of slice.
			chunks = append(chunks, slice[next:])
			return
		}
	}
}
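
As a quick sanity check, here's a minimal test sketch covering the cases discussed in this thread. It assumes a Chunk implementation like the ones above is in the same package; the test itself is illustrative, not part of the proposal:

package slices

import (
	"reflect"
	"testing"
)

func TestChunk(t *testing.T) {
	tests := []struct {
		in   []int
		size int
		want [][]int
	}{
		{in: []int{1, 2, 3, 4, 5}, size: 2, want: [][]int{{1, 2}, {3, 4}, {5}}}, // short final chunk
		{in: []int{1, 2, 3, 4}, size: 2, want: [][]int{{1, 2}, {3, 4}}},         // even split
		{in: []int{1, 2}, size: 5, want: [][]int{{1, 2}}},                       // size > len
		{in: nil, size: 3, want: nil},                                           // no chunks for empty input
	}
	for _, tt := range tests {
		if got := Chunk(tt.in, tt.size); !reflect.DeepEqual(got, tt.want) {
			t.Errorf("Chunk(%v, %d) = %v, want %v", tt.in, tt.size, got, tt.want)
		}
	}
}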
