
proposal: Go 2: allow cap(make([]T, m, n)) > n #24204

Open

mdempsky opened this issue Mar 1, 2018 · 35 comments

Comments

@mdempsky (Member) commented Mar 1, 2018

Currently, the Go spec requires that cap(make([]T, n)) == n and cap(make([]T, m, n)) == n. I propose relaxing this constraint from == to >=.

Rationale: the Go runtime already pads allocations up to the next malloc bucket size. By treating the user-supplied capacity argument as a lower bound rather than an exact bound, we could make use of this padding space.

Also, if users really need an exact capacity (which seems like it would be very rare), they can write make([]T, n)[:n:n] or make([]T, m, n)[:m:n].

(See also discussion in #24163.)
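
For illustration, a minimal sketch of what the relaxed guarantee would mean at a call site (the concrete capacities shown are assumptions, not promised values):

s := make([]byte, 5, 10)
_ = len(s) // 5, as today
_ = cap(s) // today: exactly 10; under this proposal: some value >= 10

// Callers that really need an exact capacity could pin it with a full slice expression:
exact := make([]byte, 5, 10)[:5:10]
_ = cap(exact) // 10, guaranteed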

@bradfitz (Contributor) commented Mar 1, 2018

It would be hard to "gofix" this in some hypothetical "go1to2" tool. We'd need to conservatively rewrite every make call into make([]T, m, n)[:m:n], which would make the rewritten code pretty ugly.
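
A sketch of the conservative rewrite such a hypothetical fixer would have to apply (illustrative only):

// Before (Go 1):
s := make([]T, m, n)
// After the conservative go1to2 rewrite:
s := make([]T, m, n)[:m:n] // len and cap pinned to the Go 1 guarantees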

@josharian (Contributor) commented Mar 1, 2018

It might be an interesting experiment to hack this into the compiler/runtime and do a regression test and see what breaks (@dsnet).

Yes, it'd be ugly, but the gofix rule would at least be very simple.

As I noted in #24163, this optimization could be made opt-in (instead of opt-out) with an appropriate runtime function.

There's another option in this arena, which is to say "roughly" rather than == or >= in the spec, as we do with maps. The compiler and runtime could use this freedom along with PGO or introspection to make better decisions, treating the info from the user as just a hint to be used when no other info was available.

@cznic (Contributor) commented Mar 1, 2018

The three-index slice expression was introduced to give precise control over a slice's cap. This proposal, even though it applies in a different place, goes in exactly the opposite direction. I realize it can bring some performance improvements, but that's IMO not a good reason for breaking backward compatibility with Go 1 programs. Please consider a new syntax for enabling this feature. There are plenty of options.

@bradfitz (Contributor) commented Mar 1, 2018

Please consider a new syntax enabling this feature. There are plenty of options.

make([]T, 2, ~10)

Fortunately the bitwise complement operator is ^, freeing up tilde!

@cznic (Contributor) commented Mar 1, 2018

make([]T, 2, 10+)

@josharian (Contributor) commented Mar 1, 2018

Also time.NewTicker(~3*time.Second). (I’m mostly joking, but macOS actually does support vague timers; they can be coalesced for power savings or spread to avoid cpu spikes.)

@mdempsky (Member Author) commented Mar 1, 2018

As I noted in #24163, this optimization could be made opt-in (instead of opt-out) with an appropriate runtime function.

That becomes rather clumsy though, because to properly round you'll need to incorporate the element size too, not just the element count. So users would have to write something like:

make([]T, runtime.RoundUp(n * unsafe.Sizeof(T{})) / unsafe.Sizeof(T{}))

or

make([]T, runtime.RoundUp(n, unsafe.Sizeof(T{})))

(And of course T{} needs to be changed appropriately if T is something other than an array/struct/slice/map type.)

The three-index syntax of a slice was introduced to extend the precise control of slice cap.

That control is important, but it seems to be used exceedingly rarely.

Looking through Google's internal Go code base, I've found only a couple uses of full slice expressions, and they're all of the form:

x = append(x[:len(x):len(x)], y)

where the original x slice came from somewhere else (e.g., function argument or global), and they want to make sure to create a new slice.

I strongly suspect the majority of users don't care about how much capacity is allocated for them by make. I don't see any evidence that users are dissatisfied with the lack of control over exactly how much
capacity is allocated by append when it needs to grow the slice, for example. If they were, I'd expect to see full slice operations used in other contexts.
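
For context, a minimal sketch of why that pattern always yields a fresh slice (the function name is illustrative):

func appendFresh(x []int, y int) []int {
	x = x[:len(x):len(x)] // full slice expression: cap(x) == len(x)
	return append(x, y)   // no spare capacity, so append must allocate a new backing array
}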

@faiface commented Mar 1, 2018

(Don't get offended.) This proposal would essentially cripple the language and make it less predictable (really, imagine debugging a bug caused by this) in order to accommodate a very minor, CPU-architecture-specific performance optimization. This mistake has been made many times in many languages. Don't make Go one of them.

@slrz commented Mar 1, 2018

I have recently written some code that would be broken by this change. It's easy to fix, of course, but it would be interesting to know how common this kind of code is.

The gist of it was:

var commonPrefix = make([]byte, N)

func f(arg Type) []byte {
        // Relies on cap(commonPrefix) being exactly N: append must copy into a
        // fresh backing array, so the result never aliases commonPrefix.
        return append(commonPrefix, computeSuffix(arg)...)
}
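
A sketch of the failure mode if make were allowed to over-allocate here (arg1 and arg2 are placeholders): commonPrefix would carry spare capacity, so both appends could write their suffixes into the same backing array.

b1 := f(arg1) // suffix written into commonPrefix's spare capacity
b2 := f(arg2) // may overwrite that same spare capacity...
_ = b1        // ...so b1 can now silently contain arg2's suffix
_ = b2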

@mdempsky (Member Author) commented Mar 1, 2018

This proposal basically proposes to cripple the language

I think that's strongly overstating the potential downsides.

Here's a simple experiment that just uniformly over-allocates every make call with fewer than 1000 elements:

--- a/src/runtime/slice.go
+++ b/src/runtime/slice.go
@@ -58,6 +58,10 @@ func makeslice(et *_type, len, cap int) slice {
                panic(errorString("makeslice: cap out of range"))
        }
 
+       if uintptr(cap) < 1000 && et.size < 10000 {
+               cap++
+       }
+
        p := mallocgc(et.size*uintptr(cap), et, true)
        return slice{p, len, cap}
 }

This breaks a couple standard library and regress tests, but they all appear to be for verifying that make itself works according to spec.

There was one hairy crash in runtime/trace, which I've now diagnosed. The process for tracking it down was:

  1. I first searched for appearances of "cap(" in runtime/trace/*.go; finding none, I looked in runtime/*.go.
  2. I noticed "allp[:cap(allp)]" in runtime/trace.go, so then I looked at where allp was allocated.
  3. In proc.go, we allocate make([]*p, nprocs).
  4. I experimentally changed this to make([]*p, nprocs)[:nprocs:nprocs], and the crash went away.

I would argue the real issue here is that the runtime is using an inconsistent mix of len(allp) and cap(allp). Another fix that works perfectly fine is just changing every cap(allp) to len(allp), which seems desirable if only to keep the code simpler and easier to understand.

(like really, imagine debugging a bug caused by this)

It didn't seem too bad. :)

@josharian (Contributor) commented Mar 2, 2018

Here's a simple experiment

Now that you have it working, if you change it to round up instead, are there any notable stdlib benchmark changes?

Interestingly, allp cap issues were also what I hit with CLs 21813 and 22197, which changed append behavior; that's further evidence that just using len is probably better in this case.

@faiface commented Mar 2, 2018

Btw, if someone really wants to make this happen, I would suggest a slight modification, giving the following requirements:

cap(make([]T, n)) >= n
cap(make([]T, m, n)) == n

So, when you don't specify the capacity explicitly, the compiler can optimize that for you. This would be sort of acceptable, I guess.

@bcmills (Member) commented Mar 2, 2018

We'd need to conservatively rewrite every make call into make([]T, m, n)[:m:n], which would make the rewritten code pretty ugly.

Fortunately, make is a builtin and not a keyword, right?

So you add a new package, go1, with implementations of any builtins that changed, and rewrite make([]T, m, n) to go1.Make([]T, m, n).

Or, you add a three-arg “bounded make” which specifies a lower and upper bound, and rewrite make([]T, m, n) to make([]T, m, n, n).
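
A sketch of what the first option might look like once type parameters exist (the go1 package is hypothetical; call sites would then read go1.Make[T](m, n), and the full slice expression pins the Go 1 capacity guarantee):

package go1

// Make returns a slice with exactly the Go 1 semantics: len == length, cap == capacity.
func Make[T any](length, capacity int) []T {
	return make([]T, length, capacity)[:length:capacity]
}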

@faiface commented Mar 3, 2018

Or, you add a three-arg “bounded make” which specifies a lower and upper bound, and rewrite make([]T, m, n) to make([]T, m, n, n).

I don't think you care about the lower/upper-bound at all. You either care about the exact capacity, or you don't, in which case the compiler can insert anything and it's fine.

@bcmills (Member) commented Mar 3, 2018

@slrz You could also write that code as

var commonPrefix = make([]byte, N)

func f(arg Type) []byte {
        return append(append([]byte{}, commonPrefix...), computeSuffix(arg)...)
}

If #18605 and #12854 are accepted, that would simplify further:

func f(arg Type) []byte {
        return append({}, commonPrefix..., computeSuffix(arg)...)
}

That seems much more difficult to go fix, but makes the behavior of the append call no longer depend on an allocation site that may become arbitrarily far away as the code evolves.

@josharian (Contributor) commented Mar 15, 2018

A related question for this proposal: Is it ok for slice literals to have cap > len? E.g. can cap([]byte{0}) be > 1?

@ianlancetaylor (Contributor) commented Apr 24, 2018

I tend to feel that if you ask for an explicit capacity, as in make([]T, n, m), you ought to get that capacity.

I think there is an argument for not precisely specifying the capacity for make([]T, n).

@mdempsky (Member Author) commented Apr 24, 2018

It occurs to me that even if make([]T, n, m) continues to reserve exactly m capacity, a user could still write make([]T, m)[:n] to get the flex-capacity behavior. Also, make([]T, m)[:n] (getting flex capacity when the two-argument make stays exact) is slightly less clunky than make([]T, n, m)[:n:m] (getting exact capacity when make becomes flexible).
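
A sketch comparing the two workarounds, assuming n <= m:

flex := make([]byte, m)[:n]       // len n; cap >= m if one-argument make may over-allocate
exact := make([]byte, n, m)[:n:m] // len n; cap pinned to exactly m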

@gopherbot commented May 6, 2018

Change https://golang.org/cl/111646 mentions this issue: text/tabwriter: allow initial slice size to be rounded up

@josharian (Contributor) commented May 6, 2018

Now that CL 109816 is in, you can sorta kinda approximate this in user code:

s := append([]T(nil), make([]T, m)...)[:0]

It's more expensive than make merely rounding up, but not dramatically so.

CL 111646 shows it in action, along with some benchmark results. You can see that there is a pretty significant improvement in some cases. And the regressions in other cases would go away if this were a language change rather than a compiler optimization of a pretty awkward bit of code.

Might be interesting for Go 1.12 to look at other make([]T, 0, m) instances in the standard library and see whether analogous changes are worthwhile.
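
A sketch of the idiom wrapped in a helper (the name is illustrative); after CL 109816 the append goes through growslice, which may round the capacity up to a malloc size class:

func makeRoundedCap(m int) []byte {
	// len 0; cap is at least m, possibly rounded up by the runtime.
	return append([]byte(nil), make([]byte, m)...)[:0]
}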

@martisch (Contributor) commented May 6, 2018

Interesting! I had not thought about this issue in relation to https://golang.org/cl/109517 "cmd/compile: optimize append(x, make([]T, y)...) slice extension".

I will have a look at recognizing append([]T(nil), make([]T, m)...)[:0] and rewriting it into a special makeslice variant (one that rounds the cap up to the allocation bucket size), and see how much better it performs than the current extend optimization from CL 109816, which goes through growslice.

Others could then use that CL for their own experiments to see whether it helps, without us having to change the spec. If the benefit justifies the complexity it adds to the compiler, we could submit the CL that recognizes append([]T(nil), make([]T, m)...)[:0] (or similar), making a spec change unnecessary for this optimization use case until the proposal is accepted.

@gopherbot commented May 6, 2018

Change https://golang.org/cl/111736 mentions this issue: cmd/compile: optimize append([]T(nil), make([]T, n)...)

josharian added a commit to josharian/go that referenced this issue May 8, 2018
DO NOT SUBMIT
[given that the normal case for this package is Table,
this probably isn't a net win. But it is interesting.]

Updates golang#24204

name                      old time/op    new time/op    delta
Table/1x10/new-8            2.51µs ± 0%    2.69µs ± 0%   +7.33%  (p=0.000 n=14+12)
Table/1x10/reuse-8          1.31µs ± 1%    1.32µs ± 0%     ~     (p=0.433 n=14+13)
Table/1x1000/new-8           158µs ± 0%     176µs ± 0%  +11.33%  (p=0.000 n=13+13)
Table/1x1000/reuse-8         117µs ± 2%     117µs ± 1%     ~     (p=0.170 n=15+13)
Table/1x100000/new-8        28.2ms ± 1%    30.2ms ± 1%   +7.04%  (p=0.000 n=15+15)
Table/1x100000/reuse-8      12.2ms ± 2%    12.3ms ± 1%   +0.65%  (p=0.021 n=15+15)
Table/10x10/new-8           7.07µs ± 1%    7.31µs ± 1%   +3.40%  (p=0.000 n=14+14)
Table/10x10/reuse-8         4.94µs ± 2%    4.96µs ± 2%     ~     (p=0.310 n=14+15)
Table/10x1000/new-8          575µs ± 1%     596µs ± 1%   +3.75%  (p=0.000 n=13+14)
Table/10x1000/reuse-8        487µs ± 1%     485µs ± 1%     ~     (p=0.186 n=14+15)
Table/10x100000/new-8       78.2ms ± 3%    80.5ms ± 4%   +2.94%  (p=0.000 n=15+15)
Table/10x100000/reuse-8     64.9ms ± 3%    64.1ms ± 4%     ~     (p=0.713 n=15+15)
Table/100x10/new-8          46.9µs ± 0%    48.0µs ± 0%   +2.21%  (p=0.000 n=14+14)
Table/100x10/reuse-8        38.2µs ± 1%    38.8µs ± 1%   +1.34%  (p=0.000 n=14+14)
Table/100x1000/new-8        4.53ms ± 1%    4.66ms ± 1%   +2.89%  (p=0.000 n=14+14)
Table/100x1000/reuse-8      3.98ms ± 0%    4.01ms ± 0%   +0.94%  (p=0.000 n=14+12)
Table/100x100000/new-8       620ms ± 1%     629ms ± 1%   +1.45%  (p=0.000 n=12+13)
Table/100x100000/reuse-8     532ms ± 1%     598ms ± 1%  +12.48%  (p=0.000 n=15+14)
Pyramid/10-8                5.90µs ± 0%    6.07µs ± 0%   +2.83%  (p=0.000 n=13+15)
Pyramid/100-8                274µs ± 1%     247µs ± 1%   -9.96%  (p=0.000 n=14+15)
Pyramid/1000-8              27.0ms ± 0%    23.9ms ± 1%  -11.62%  (p=0.000 n=14+15)
Ragged/10-8                 7.07µs ± 1%    7.05µs ± 0%     ~     (p=0.060 n=14+14)
Ragged/100-8                60.2µs ± 1%    59.4µs ± 1%   -1.32%  (p=0.000 n=14+15)
Ragged/1000-8                590µs ± 1%     584µs ± 1%   -1.04%  (p=0.000 n=14+15)
[Geo mean]                   515µs          522µs        +1.31%

name                      old alloc/op   new alloc/op   delta
Table/1x10/new-8            1.52kB ± 0%    1.52kB ± 0%     ~     (all equal)
Table/1x10/reuse-8           0.00B          0.00B          ~     (all equal)
Table/1x1000/new-8          99.4kB ± 0%    99.4kB ± 0%     ~     (all equal)
Table/1x1000/reuse-8         9.00B ± 0%     9.00B ± 0%     ~     (all equal)
Table/1x100000/new-8        19.9MB ± 0%    19.9MB ± 0%     ~     (p=0.180 n=14+14)
Table/1x100000/reuse-8       199kB ± 0%     199kB ± 0%     ~     (all equal)
Table/10x10/new-8           5.06kB ± 0%    5.06kB ± 0%     ~     (all equal)
Table/10x10/reuse-8          0.00B          0.00B          ~     (all equal)
Table/10x1000/new-8          387kB ± 0%     387kB ± 0%     ~     (all equal)
Table/10x1000/reuse-8         128B ± 0%      128B ± 0%     ~     (all equal)
Table/10x100000/new-8       49.3MB ± 0%    49.3MB ± 0%     ~     (p=0.107 n=15+15)
Table/10x100000/reuse-8     2.46MB ± 0%    2.46MB ± 0%     ~     (p=0.685 n=15+15)
Table/100x10/new-8          38.0kB ± 0%    38.0kB ± 0%     ~     (all equal)
Table/100x10/reuse-8         0.00B          0.00B          ~     (all equal)
Table/100x1000/new-8        3.25MB ± 0%    3.25MB ± 0%     ~     (p=0.620 n=15+14)
Table/100x1000/reuse-8      10.8kB ± 0%    10.8kB ± 0%     ~     (all equal)
Table/100x100000/new-8       340MB ± 0%     340MB ± 0%   +0.00%  (p=0.011 n=13+15)
Table/100x100000/reuse-8     170MB ± 0%     170MB ± 0%   +0.00%  (p=0.011 n=13+15)
Pyramid/10-8                4.88kB ± 0%    4.88kB ± 0%     ~     (all equal)
Pyramid/100-8                411kB ± 0%     200kB ± 0%  -51.41%  (p=0.000 n=15+15)
Pyramid/1000-8              41.6MB ± 0%    16.2MB ± 0%  -61.12%  (p=0.000 n=15+15)
Ragged/10-8                 1.82kB ± 0%    1.82kB ± 0%     ~     (all equal)
Ragged/100-8                1.82kB ± 0%    1.82kB ± 0%     ~     (all equal)
Ragged/1000-8               1.82kB ± 0%    1.82kB ± 0%     ~     (all equal)
[Geo mean]                   109kB          101kB        -7.63%

name                      old allocs/op  new allocs/op  delta
Table/1x10/new-8              21.0 ± 0%      21.0 ± 0%     ~     (all equal)
Table/1x10/reuse-8            0.00           0.00          ~     (all equal)
Table/1x1000/new-8           1.02k ± 0%     1.02k ± 0%     ~     (all equal)
Table/1x1000/reuse-8          0.00           0.00          ~     (all equal)
Table/1x100000/new-8          100k ± 0%      100k ± 0%     ~     (all equal)
Table/1x100000/reuse-8       1.00k ± 0%     1.00k ± 0%     ~     (all equal)
Table/10x10/new-8             31.0 ± 0%      31.0 ± 0%     ~     (all equal)
Table/10x10/reuse-8           0.00           0.00          ~     (all equal)
Table/10x1000/new-8          1.04k ± 0%     1.04k ± 0%     ~     (all equal)
Table/10x1000/reuse-8         0.00           0.00          ~     (all equal)
Table/10x100000/new-8         100k ± 0%      100k ± 0%     ~     (all equal)
Table/10x100000/reuse-8      5.00k ± 0%     5.00k ± 0%     ~     (all equal)
Table/100x10/new-8            40.0 ± 0%      40.0 ± 0%     ~     (all equal)
Table/100x10/reuse-8          0.00           0.00          ~     (all equal)
Table/100x1000/new-8         1.05k ± 0%     1.05k ± 0%     ~     (all equal)
Table/100x1000/reuse-8        3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Table/100x100000/new-8        100k ± 0%      100k ± 0%     ~     (all equal)
Table/100x100000/reuse-8     50.0k ± 0%     50.0k ± 0%   +0.00%  (p=0.011 n=13+15)
Pyramid/10-8                  35.0 ± 0%      35.0 ± 0%     ~     (all equal)
Pyramid/100-8                  231 ± 0%       163 ± 0%  -29.44%  (p=0.000 n=15+15)
Pyramid/1000-8               2.06k ± 0%     1.11k ± 0%  -45.94%  (p=0.000 n=15+15)
Ragged/10-8                   19.0 ± 0%      19.0 ± 0%     ~     (all equal)
Ragged/100-8                  19.0 ± 0%      19.0 ± 0%     ~     (all equal)
Ragged/1000-8                 19.0 ± 0%      19.0 ± 0%     ~     (all equal)
[Geo mean]                     507            482        -4.95%


Change-Id: I5cce64791cedd03270d875a1189c79bace5fa437

@josharian (Contributor) commented Nov 22, 2018

Another concrete use case for being able to have sizes rounded up for you: math/big.nat.make. The code there doesn't append, presumably because doubling in size is rare for math/big. Instead it manually adds extra capacity of 4, noting that this choice has "significant performance impact". Being able to do make([]T, n+4)[:n], as in
#24204 (comment), to ask the runtime to round up the allocation to the nearest size class, would almost certainly be a net performance win.
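
A simplified sketch of the two shapes (natMake* are illustrative names; this assumes an import of math/big for big.Word, and math/big's real nat.make also reuses existing backing arrays, which is omitted here):

// Today: reserve a fixed slack of 4 words beyond n.
func natMakeToday(n int) []big.Word {
	return make([]big.Word, n, n+4)
}

// Under this proposal: let the runtime round the allocation up to a size class.
func natMakeProposed(n int) []big.Word {
	return make([]big.Word, n+4)[:n]
}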

@josharian (Contributor) commented Nov 6, 2019

Picking up the thread from #35386 (comment), in which “...” means “fill this in for me”, one possible syntax for this is:

make([]T, n, ...)

@tandr commented Nov 8, 2019

What about using the type system to indicate the intent of "at least"?

Possibilities:

  1. Use a negative number to say "allocate a slice, reserving at least 100 elements":
    slice := make([]T, 0, -100) // cap(slice) >= 100
  2. Use a special (undefined?) type atleast to call an override of make:
    slice := make([]T, 0, atleast(100)) // cap(slice) >= 100

@mdempsky (Member Author) commented Nov 8, 2019

If we want to require users to opt in to flexible capacity allocation, I think CL 111736 is the right solution: just recognize an existing valid code pattern like append([]T(nil), make([]T, n)...) (as horrible as that is).

My hypothesis with this proposal, though, was that most code would benefit from allowing the runtime to automatically increase the capacity of initial slice allocations, just as it automatically increases it for appends.

@alanfo commented Nov 8, 2019

Provided it can be done on a backwards-compatible, opt-in basis, this proposal looks worthwhile to me.

Although I haven't always been comfortable with some of the uses proposed for ... lately, @josharian's suggested syntax:

make([]T, n, ...)

looks ideal here.

@tandr commented Nov 8, 2019

make([]T, n, ...)

means a capacity of at least n.

What about "at least n+100", for example?

make([]T, n, n+100...)
make([]T, n, 100+n...)

does look a bit weird. (And I am not sure about the compiler complications.)

@magical (Contributor) commented Nov 8, 2019

What about "at least n+100" for example?

make([]T, n+100, ...)

@josharian (Contributor) commented Nov 8, 2019

@tandr or you could write make([]T, n+100, ...)[:n]. Not particularly graceful, but I also don't know how common this case is.

@mdempsky (Member Author) commented Nov 8, 2019

I am not sure about compiler complications

make([]T, n...) and make([]T, n, _) are already valid syntax (e.g., they're already handled by gofmt today); they're just rejected by type-checking. This is a small change.

make([]T, n, ...) is not valid syntax. It's going to require every Go parser and every Go AST to change. This is a large change.

(That said, I'm not convinced new syntax is merited here.)

@tandr commented Nov 8, 2019

Thanks @josharian, it is indeed a possibility.

As I see it, for make the first parameter is a type, the second is what len() will return, and the third is what (or at least what :) ) cap() will return. Changing that to an "ungraceful" syntax breaks a mental model that has been in place for 10 years already (congrats, btw :) ).
I expect changes of that magnitude to make code simpler to understand and, if possible, more elegant. I don't look forward to a change that makes things easier in one scenario and more painful in other, already established ones, sorry.

@tandr commented Nov 8, 2019

but I also don't know how common this case is.

We have places in our code where we know an upper bound, but not a lower one, for a slice or map, and start filling it up by filtering values from another (streaming) source. It is not common, but it does happen. Cases where the map or slice gets expanded past the initially known capacity tend to collapse into using the default capacity instead, i.e., make([]T) or make([]T, n).

@bcmills (Member) commented Dec 4, 2020

This change as originally proposed does not comply with http://golang.org/design/28221-go2-transitions#language-redefinitions, so I don't think it could be adopted as-is: the spec clearly states that make(T, n) allocates a slice of type T with length n and capacity n, so this would be a “redefinition”.

@bcmills (Member) commented Dec 4, 2020

That said, if we get generics you could envision a generic library function that encapsulates some idiom that the compiler can recognize:

package slices

// Make allocates and returns a slice of n elements of type T, with an arbitrary capacity ≥ n.
func [type T]Make(n int) []T {
	return append([]T(nil), make([]T, n)...)
}

That gives call sites like

var s = slices.Make[T](n)

which to me doesn't seem appreciably worse than any of the other syntax alternatives mentioned above, and does not require any additional language or tooling changes beyond generics per se.
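
For reference, in the type-parameter syntax that eventually shipped in Go 1.18, the same hypothetical helper (not an API of the standard slices package) would read:

package slices

// Make allocates and returns a slice of n elements of type T, with an arbitrary capacity >= n.
func Make[T any](n int) []T {
	return append([]T(nil), make([]T, n)...)
}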
