Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: add CollectN with preallocated size. #68261

Open
earthboundkid opened this issue Jul 1, 2024 · 16 comments
Open

proposal: slices: add CollectN with preallocated size. #68261

earthboundkid opened this issue Jul 1, 2024 · 16 comments
Labels
Milestone

Comments

@earthboundkid
Copy link
Contributor

Proposal Details

As discussed in #61900, it's a shame that slices.Collect(maps.Keys(m)) and slices.Sorted(maps.Keys(m)) don't preallocate the size of m. You can work around it by writing slices.Sorted(slices.Append(make([]T, 0, len(m)), maps.Keys(m))), but that's kind of ugly.

I propose adding func CollectN[S ~[]T, T any](seq iter.Seq[T], minsize int) S to slices that just returns Append(make([]T, 0, minsize), seq). It would help with map keys but also other container types where you know the length in advance or can at least make a good guess.

Sorting map keys more efficiently would look like keys := slices.CollectN(maps.Keys(m), len(m); slices.Sort(keys).

@fzipp
Copy link
Contributor

fzipp commented Jul 1, 2024

The name gives me more maxsize than minsize vibes.

@andig
Copy link
Contributor

andig commented Jul 1, 2024

Sorting map keys more efficiently would look like keys := slices.CollectN(maps.Keys(m), len(m); slices.Sort(keys).

I guess we'd rather change the implementation of slices.Sorted to use CollectN internally? But maybe not.

That said, the function signature is not particularly nice. I guess (sorry, no evidence) that the majority of use cases is around the functions of maps.KeysSlice() and maps.ValuesSlice(). If that's the case, then remaining cases could be covered as above by:

slices.Append(make([]K,0, len(m)), seq)

Are there other wide-spread cases that need to be covered?

@earthboundkid
Copy link
Contributor Author

The name gives me more maxsize than minsize vibes.

ChatGPT suggests the name slices.CollectWithCapacity. Seems little long. CollectWithCap?

I guess we'd rather change the implementation of slices.Sorted to use CollectN internally? But maybe not.

I don't understand. Are you suggesting adding slices.SortedWithCap? It wouldn't be possible to use CollectWithCap internally since Sorted couldn't know the minimum length.

Sorted is just Collect+Sort, so I don't see the point of making a bunch of variations on it. If you need better Collecting, then use CollectWithCap, if not, use plain Sorted.

Are there other wide-spread cases that need to be covered?

I think this would be a good general purpose tool. You could use it with a hypothetical container/set type, for example, or if we ever genericize container/list and container/heap. I have a generic deque type I would have used this with if it had existed.

@andig
Copy link
Contributor

andig commented Jul 1, 2024

I have a generic deque type I would have used this with if it had existed.

I still assume (sorry), that we'll have 90% maps keys/values.

@Jorropo
Copy link
Member

Jorropo commented Jul 1, 2024

I am rehashing arguments that have already been made, but this whole API which is more mental load on the programmer and more places to make mistake wouldn't need to exists if the iter form was an interface instead of a function value then we could extend it conditionally:

package iter

type Seq[V any] interface {
	Iter(yield func(V) bool)
}
type Seq2[K, V any] interface {
	Iter(yield func(K, V) bool)
}

// Len can be implemented by [Seq] and [Seq2] as optional features.
type Len interface {
	// Len returns how many calls to yield Iter will do if ok == true.
	// To be a correct implementation it MUST do n many calls, no more no less.
	// When ok == false the length is not known.
	Len() (n int, ok bool)
}

// Cap can be implemented by [Seq] and [Seq2] as optional features.
type Cap interface {
	// Cap returns how many calls to yield Iter could do if ok == true.
	// To be a correct implementation it MUST do at most n many calls, no more.
	// When ok == false the capacity is not known.
	Cap() (n int, ok bool)
}
package slices

import "iter"

func Collect[V any](i iter.Seq[V]) []V {
	var sizeHint int
	var hasSizeHint bool
	if l, ok := i.(iter.Len); ok {
		sizeHint, hasSizeHint = l.Len()
	}
	if !hasSizeHint {
		if l, ok := i.(iter.Cap); ok {
			sizeHint, hasSizeHint = l.Cap()
		}
	}
	if !hasSizeHint {
		sizeHint = 0
	}
	r := make([]V, 0, sizeHint)
	for v := range i {
		r = append(r, v) // if you actually submitted this code to std you probably would want hardening against buggy iterators that return wrong .Len and .Cap.
	}
	return r
}

Having it be optionally implemented would make this so much better, and more generally anything where you sometime can do some extra optimizations if you know the size ahead of time.

@fzipp
Copy link
Contributor

fzipp commented Jul 1, 2024

if the iter form was an interface instead of a function value then

But it isn't. Let's focus on building on top of what is now.

@Merovius
Copy link
Contributor

Merovius commented Jul 1, 2024

@fzipp @earthboundkid I would suggest the name "hint" or "sizehint" for the parameter, as it takes the same role as the size hint parameter for make with maps: It has no impact on the semantics or correctness of the program, but makes it more efficient if given correctly.

The name CollectN seems pragmatic to me.

@ianlancetaylor
Copy link
Contributor

Like @fzipp above, to me the name CollectN suggests "collect up to N elements into a slice."

And as others have observed we can already do this using slices.AppendSeq.

I think this function might be premature at this point.

@zigo101
Copy link

zigo101 commented Jul 2, 2024

If an iterator must be used, consider adding a slices.CollectInto function for preallocation need. Or don't use iterator at all.

@ianlancetaylor
Copy link
Contributor

@zigo101 That sounds like the existing slices.AppendSeq.

@earthboundkid
Copy link
Contributor Author

I think it would be okay to put this proposal on hold for six months while we get used to using iterators in production.

@zigo101
Copy link

zigo101 commented Jul 2, 2024

@earthboundkid You can use nstd.CollectMapKeys now. :D

@earthboundkid
Copy link
Contributor Author

That's sort of what I mean. We can just use our personal iteration libraries for six months or so then regroup and see if we mostly want CollectWithCap (because we need to collect a lot of other containers into slices) or mostly want maps.KeySlice or maps.KeysSorted etc. based on real world usage. It's a little theoretical for now, although I have started doing some stuff with x/net/html iterators in production.

@seankhliao seankhliao changed the title proposal: slices: Add CollectN with preallocated size. proposal: slices: add CollectN with preallocated size. Jul 11, 2024
@zigo101
Copy link

zigo101 commented Jul 24, 2024

It might be a bad idea to put "slices.Collect" in std.

@andig
Copy link
Contributor

andig commented Aug 3, 2024

Imho it is very unfortunate that something as simple and often used as exp/maps/Keys() should become something as unobvious as

keys := slices.CollectN(maps.Keys(m), len(m))

Majority of use cases is probably better addressed adding dedicated functions for maps.Keys and slices.Values. Suggesting to revisit #61626

@jub0bs
Copy link

jub0bs commented Sep 18, 2024

If the main concern is about efficiently iterating over sorted keys of a map (and possibly its associated values), wouldn't it be addressed by the addition of the following functions (perhaps with better names) to the maps package, functions that would use the knowledge of the number of keys in the map?

func Sorted[M ~map[K]V, K cmp.Ordered, V any](m M) iter.Seq2[K, V]

func SortedFunc[M ~map[K]V, K comparable, V any](m M, cmp func(K, K) int) iter.Seq2[K, V]

The need for slices.CollectN would become moot, IMO.

Edit: I see that @earthboundkid had already suggested something similar (the addition of a SortedKeys function) in #61900 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

9 participants