Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: builtin add func resize(v Type, size) #41551

Open
elichai opened this issue Sep 22, 2020 · 23 comments
Open

proposal: Go 2: builtin add func resize(v Type, size) #41551

elichai opened this issue Sep 22, 2020 · 23 comments

Comments

@elichai
Copy link

@elichai elichai commented Sep 22, 2020

The make builtin allow you to specify a capacity for maps,slices and channels, but once you specify you cannot currently change that.
With slices you could make a new slice, copy over and assign to the current slice, but with maps/channels it is harder.
Maps will auto-scale with inserts but sometimes you know you're going to add X elements to the map so you can tell it to reallocate/resize beforehand.
Channels are even worse, you currently are stuck with the buffer you choose when you created the channel, sometimes a single channel is used to communicate with multiple go routines, and you want to resize the buffer if you launch more routines s.t. no routine will block on the channel.

I think a way to resize/realloc a type can be very useful, both for performance and usability,
an alternative for a new builtin can be to add this support to the make builtin

@gopherbot gopherbot added this to the Proposal milestone Sep 22, 2020
@gopherbot gopherbot added the Proposal label Sep 22, 2020
@davecheney
Copy link
Contributor

@davecheney davecheney commented Sep 22, 2020

Thank you for raising this proposal.

I don’t think there is much value in resizing slices as they grow geometrical when appended to. If this is not good enough making a new slice and copying the contents is two lines.

Maps grow by doubling (I think) and provide a size hint. Perhaps this is good enough, if not, some benchmarks showing that map rehashing is dominating insert performance would be valuable for this proposal.

I cannot see how a slice could be resized safely without additional locking. There are probably some unanswered questions when a channel is resized from unbuffered to buffered.

@martisch
Copy link
Contributor

@martisch martisch commented Sep 22, 2020

With slices you could make a new slice, copy over and assign to the current slice, but with maps/channels it is harder.
Is it really that much harder for maps to warrant a new language addition?

new := make(map[X]Y, capacity)
for key, value := range old {
    new[key] = value
}
old = new

If performance is a concern that can solved without a language change:
#26951

With generics a generic map copy function can be defined not requiring more typing than a builtin and no additional language change.

Can you pleas spell out in more detail what resize does and where it is useful and how often that comes up?
e.g. Does it an inplace replace of a slice header like a = append(a, ...)? Are existing pointers/references to the old backing array of the slice effected? Is a size lower then the current size allowed? What happens if the size is smaller then the number of elements the map currently holds?...

@martisch martisch changed the title proposal: builtin add func resize(v Type, size) proposal: Go 2: builtin add func resize(v Type, size) Sep 22, 2020
@martisch
Copy link
Contributor

@martisch martisch commented Sep 22, 2020

For language change proposals, please fill out the template at https://go.googlesource.com/proposal/+/refs/heads/master/go2-language-changes.md .

When you are done, please reply to the issue with @gopherbot please remove label WaitingForInfo.

Thanks!

@mvdan

This comment has been hidden.

@martisch

This comment has been hidden.

@elichai
Copy link
Author

@elichai elichai commented Sep 22, 2020

I don’t think there is much value in resizing slices as they grow geometrical when appended to. If this is not good enough making a new slice and copying the contents is two lines.

I agree that slices have the least value in this proposal

Maps grow by doubling (I think) and provide a size hint. Perhaps this is good enough, if not, some benchmarks showing that map rehashing is dominating insert performance would be valuable for this proposal.

I can redo the benchmarks but I did see runtime.mapassign very high in my app's benchmarks.

I cannot see how a slice could be resized safely without additional locking. There are probably some unanswered questions when a channel is resized from unbuffered to buffered.

well no slice operation is thread safe.
about channels, I really don't know how their internals are implemented, so I can't quite comment on how to resize them safely

new := make(map[X]Y, capacity)
for key, value := range old {
    new[key] = value
}
old = new

If performance is a concern that can solved without a language change:
#26951

A. I did not know this is possible :).
B. the performance of this is definitely a concern.

Can you pleas spell out in more detail what resize does and where it is useful and how often that comes up?

the map scenerio comes a lot in functions like
it comes a lot in code like:

func (c myCollection) AddMultiple(objs []Objects) {
    for _, obj := range objs {
        c[obj.key()] = obj.value()
    }
}

or even:

func (c myCollection) Merge(other myCollection) {
    for key, value := range other {
        c[key] = value
    }
}

e.g. Does it an inplace replace of a slice header like a = append(a, ...)? Are existing pointers/references to the old backing array of the slice effected? Is a size lower then the current size allowed? What happens if the size is smaller then the number of elements the map currently holds?...

obviously just like with make the size is just a hint that the compiler can ignore, about the sizes we can either ignore, or decide that the caller provides the additional capacity not the resulting capacity. (that's how rust's Vec/HashMap works, on the contrary C++'s std::vector/map is given the resulting capacity, and it is ignore if equal or less than the current capacity)

about pointers/references it should work just like a natural resizing caused by appending/inserting (so yeah it's kinda weird in slices)

EDIT: @martisch Should I write a comment here/edit the OP with answers to all the questions in the template?

@martisch
Copy link
Contributor

@martisch martisch commented Sep 22, 2020

The map examples given just show adding elements. There would need to be an additional resize call that is trying to more then double the capacity. How would that look like in the code above and what capacity would it target?

If its not more than doubling the internal resize mechanism of maps would do the same as the resize on the fly already. A resize will need to also copy all elements to a new map allocation in the current design of maps to grow the map.
Spending alot of time in mapassign is not necessarily a sign of map growth but can entirely be generated by just adding elements to a big enough map. runtime.evacuate_* functions are work done for grow work. As discussed note that this work will not entirely go away. each resize would need to do similar work to the amount of exisitng elements in the map.

If performance is lost on alot of loops adding elements to a map an option would also be for the compiler to automatically initiate a growth to more than 2 times the map size if the two be added elements in that loop are more than double the size of the map. This wouldnt need an explicit hint by the programmer.

More details can be added in an extra comment or in the top comment.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 22, 2020

This seems difficult to implement for channels. And I find the rationale unconvincing. Does any code ever really adjust the amount of workers on the fly while in the middle of processing? If not in the middle of processing, it's easy to simply create a new channel.

As noted, it's not that useful for slices.

Even more maps, changing the map size means rehashing all the elements, so although it may look simpler than a loop assigning to elements to a new map, it won't actually run any faster.

So I'm not seeing a significant benefit here.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 23, 2020

The need to resize channel buffers is real, IMO, and previously discussed in #20868 and #20352.

An elastic channel can be approximated with a goroutine, a buffer, and two channels for inbound and outbound msgs.

@elichai
Copy link
Author

@elichai elichai commented Sep 23, 2020

The map examples given just show adding elements. There would need to be an additional resize call that is trying to more then double the capacity. How would that look like in the code above and what capacity would it target?

1: (hint to increase by X)

func (c myCollection) Merge(other myCollection) {
    resize(c, len(other))
    for key, value := range other {
        c[key] = value
    }
}

2: (hint the max cap we want)

func (c myCollection) Merge(other myCollection) {
    resize(c, len(c)+len(other))
    for key, value := range other {
        c[key] = value
    }
}

it could also act similar to append (i.e c = resize(c, len(other))) but I personally don't enjoy the ergonomics of that (and I'm not even sure they'll have the same affect, ie by-ref/value etc)

@elichai
Copy link
Author

@elichai elichai commented Sep 23, 2020

This seems difficult to implement for channels. And I find the rationale unconvincing. Does any code ever really adjust the amount of workers on the fly while in the middle of processing? If not in the middle of processing, it's easy to simply create a new channel.

The use case is for example a server that can accept arbitrary connections, and a few routines at the side processing data from those connections via a channel, and every X new connections i'd like to increase the channel capacity so the connections won't block too much on the channel, but I don't want to start with a 2GB channel even if I only have 1 connection.

Even more maps, changing the map size means rehashing all the elements, so although it may look simpler than a loop assigning to elements to a new map, it won't actually run any faster.

If a map has 4 capacity and I'm going to push 5K elements it will reallocate + copy 11 times instead if I resize before it will check if the cap is enough for that and if not will reallocate+copy beforehand, thus avoiding doing it over and over

@martisch
Copy link
Contributor

@martisch martisch commented Sep 23, 2020

A few questions as the semantics of resize havent been spelled out completly yet:

Assuming: "2: (hint the max cap we want)"

What happens when the hinted at capacity is smaller than the existing capacity: Does resize downsize the map?

Does the resize happen for the whole map immediatly or is it done by having each write and delete do a bit of the resize like current map growth?

What happens if the hint provided is larger than a map is allowed to be (exceeding memory limits)?

Does resize require an exact capacity or is it only a hint e.g. for map it is allowed to have next power of 2 capacity like the current make on a map allocates?

@elichai
Copy link
Author

@elichai elichai commented Sep 23, 2020

What happens when the hinted at capacity is smaller than the existing capacity: Does resize downsize the map?

This question is why I prefer passing the additional capacity, not the max, but if not I see multiple possible semantics:

  1. Being exact, ie if new_cap<len it will truncate the buffer, this works for slice+channels but makes no sense for maps.
  2. Max capacity hint, if new_cap<cap resize the buffer to cap=max(new_cap, len).
    IMO It doesn't make sense to pass a full capacity but not to resize when smaller.

Does the resize happen for the whole map immediatly or is it done by having each write and delete do a bit of the resize like current map growth?

whole map immediately, I don't see any benefits to resizing "slowly", this is counter-productive to the proposal.

What happens if the hint provided is larger than a map is allowed to be (exceeding memory limits)?

the same behavior that will happen with the make builtin.

Does resize require an exact capacity or is it only a hint e.g. for map it is allowed to have next power of 2 capacity like the current make on a map allocates?

it can just pass the next power of 2 but as you said the user will gain nothing from doing that, but of course the compiler doesn't have to increase, it might use heuristics to increase even more / a little less to allow better performance (ie allocate a whole page if the specified capacity is close enough to a full page)

AFAIU this is the exact same semantics as make(map[Key]Value, cap)

@martisch
Copy link
Contributor

@martisch martisch commented Sep 23, 2020

I prefer passing the additional capacity

So when discussing further should it be assumed the proposal means the hint is to be used for additional capacity?

Can you spell out how the language specification would explain the semantics of the resize on maps?

Can you elaborate what that then means for the current hash map implementation. e.g. is the hint making sure there is at least that much open slots somewhere in the map buckets including overflow buckets? Note that depending on hash collisions that does not mean adding that much elements doesnt cause a map growth to happen.

Is a resize on a nil map allowed?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 24, 2020

If a map has 4 capacity and I'm going to push 5K elements it will reallocate + copy 11 times instead if I resize before it will check if the cap is enough for that and if not will reallocate+copy beforehand, thus avoiding doing it over and over

I'm suggesting using make to create a map at the desired size, then copying the elements.

My point is that that is exactly what the suggested resize operation is going to do anyhow.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 24, 2020

The use case is for example a server that can accept arbitrary connections, and a few routines at the side processing data from those connections via a channel, and every X new connections i'd like to increase the channel capacity so the connections won't block too much on the channel, but I don't want to start with a 2GB channel even if I only have 1 connection.

As @networkimprov said earlier, this can be done by interposing another goroutine that manages the buffer space.

I would be less concerned about this if there were an easy way to safely change the buffer size of a channel. But since I don't see an easy way, I think we need a very compelling benefit to pay the cost of writing and maintaining an implementation.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 24, 2020

Ian, what's complicated about changing a channel's buffer size?

@elichai
Copy link
Author

@elichai elichai commented Sep 24, 2020

My point is that that is exactly what the suggested resize operation is going to do anyhow.

No, it is currently not possible to check the capacity of map meaning that it won't be possible to implement this exactly the same in the user side (even putting aside the perf of #26951).

I know that capacity for maps doesn't always make perfect sense, but it still make some, and that depends on the map implementation (ie capacity makes a lot of sense in a SwissTable implementation),
and because of that we might not want to expose the "capacity" of the map to the user, but the compiler can definitely check some heuristics to estimate if it will need to resize for a new X items, even for linked list type hashmaps (unlike SwissTable)

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 24, 2020

@networkimprov The issue with changing channel buffer size is what I mentioned above: changing between a buffered and an unbuffered channel requires waking every goroutine currently sleeping on the channel and letting those goroutines requeue themselves with changes appropriate for the difference between buffered and unbuffered channels.

@elichai If the only issue is supporting cap on a map, then lets talk about that rather than talking about resize.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 24, 2020

Why not only allow channel buffer resize for channels with non-zero buffer? If you have a channel that may need to resize, I don't know why you'd ever start with no buffer.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 25, 2020

It's confusing to have capabilities with mysterious restrictions. We would need to have a really big benefit to be worth paying that cost.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Sep 25, 2020

Why not only allow channel buffer resize for channels with non-zero buffer? If you have a channel that may need to resize, I don't know why you'd ever start with no buffer.

How would this be reported at runtime?

Would it be permitted to resize a channel capacity downwards?

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 25, 2020

It's expensive to require a 4K goroutine for each channel that normally needs 64 bytes of buffer and sees occasional traffic spikes requiring larger buffers. It's a 50% increase in goroutines beyond the producer and consumer using the channel.

Maybe the overhead of GC swamps this in practice?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants