proposal: Go 2: allow cap(make([]T, m, n)) > n #24204

Open
mdempsky opened this Issue Mar 1, 2018 · 22 comments

@mdempsky
Member

mdempsky commented Mar 1, 2018

Currently, the Go spec requires that cap(make([]T, n)) == n and cap(make([]T, m, n)) == n. I propose relaxing this constraint from == to >=.

Rationale: the Go runtime already pads allocations up to the next malloc bucket size. By treating the user-supplied capacity argument as a lower bound rather than an exact bound, we could make use of this padding space.

Also, if users really need an exact capacity (which seems like it would be very rare), they can write make([]T, n)[:n:n] or make([]T, m, n)[:m:n].

(See also discussion in #24163.)
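For illustration, a minimal sketch of the two spellings under the relaxed semantics (the over-allocation shown is hypothetical; the actual capacity would depend on the runtime's malloc bucket sizes):

package main

import "fmt"

func main() {
        // Under the proposal, cap(s) is only guaranteed to be at least 10;
        // the runtime may expose whatever padding the allocation already has.
        s := make([]byte, 5, 10)
        fmt.Println(len(s), cap(s)) // 5 and some value >= 10

        // A full slice expression pins the capacity exactly, for the rare
        // code that truly needs it.
        exact := make([]byte, 5, 10)[:5:10]
        fmt.Println(len(exact), cap(exact)) // always 5 10
}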

@bradfitz

Member

bradfitz commented Mar 1, 2018

It would be hard to "gofix" this in some hypothetical "go1to2" tool. We'd need to conservatively rewrite all make calls into make([]T, m, n)[:m:n], which would make the rewritten code pretty ugly.

@josharian

Contributor

josharian commented Mar 1, 2018

It might be an interesting experiment to hack this into the compiler/runtime and do a regression test and see what breaks (@dsnet).

Yes, it'd be ugly, but the gofix rule would at least be very simple.

As I noted in #24163, this optimization could be made opt-in (instead of opt-out) with an appropriate runtime function.

There's another option in this arena, which is to say "roughly" rather than == or >= in the spec, as we do with maps. The compiler and runtime could use this freedom along with PGO or introspection to make better decisions, treating the info from the user as just a hint to be used when no other info was available.
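For comparison, the existing "roughly" precedent for maps: the size argument is only a sizing hint, and cap does not even apply to maps, so the runtime is already free to size the underlying storage as it sees fit. For example:

m := make(map[string]int, 100) // initial space for roughly 100 elements, no capacity guarantee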

@cznic

Contributor

cznic commented Mar 1, 2018

The three-index slice expression was introduced to give precise control over slice capacity. This proposal, even though it applies in a different place, goes in exactly the opposite direction. I realize it can bring some performance improvements, but that's IMO not a good reason for breaking backward compatibility with Go 1 programs. Please consider a new syntax enabling this feature. There are plenty of options.

@bradfitz

Member

bradfitz commented Mar 1, 2018

Please consider a new syntax enabling this feature. There are plenty of options.

make([]T, 2, ~10)

Fortunately the bitwise complement operator is ^, freeing up tilde!

@cznic

Contributor

cznic commented Mar 1, 2018

make([]T, 2, 10+)

@josharian

Contributor

josharian commented Mar 1, 2018

Also time.NewTicker(~3*time.Second). (I’m mostly joking, but macOS actually does support vague timers; they can be coalesced for power savings or spread out to avoid CPU spikes.)

@mdempsky

Member

mdempsky commented Mar 1, 2018

As I noted in #24163, this optimization could be made opt-in (instead of opt-out) with an appropriate runtime function.

That becomes rather clumsy though, because to properly round you'll need to incorporate the element size too, not just the element count. So users would have to write something like:

make([]T, runtime.RoundUp(n * unsafe.Sizeof(T{})) / unsafe.Sizeof(T{}))

or

make([]T, runtime.RoundUp(n, unsafe.Sizeof(T{})))

(And of course T{} needs to be changed appropriately if T is something other than an array/struct/slice/map type.)

The three-index syntax of a slice was introduced to extend the precise control of slice cap.

That control is important, but it seems to be used exceedingly rarely.

Looking through Google's internal Go code base, I've found only a couple of uses of full slice expressions, and they're all of the form:

x = append(x[:len(x):len(x)], y)

where the original x slice came from somewhere else (e.g., a function argument or a global), and the code wants to make sure it creates a new slice.
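For concreteness, a minimal sketch of why that pattern guarantees a fresh allocation: the full slice expression sets cap equal to len, so the append can never write into the original backing array.

package main

import "fmt"

// appendCopy appends y to x without ever writing into x's backing array:
// x[:len(x):len(x)] has no spare capacity, so append must reallocate.
func appendCopy(x []int, y int) []int {
        return append(x[:len(x):len(x)], y)
}

func main() {
        x := make([]int, 2, 8) // spare capacity that a plain append would reuse
        a := appendCopy(x, 10)
        b := appendCopy(x, 20)
        fmt.Println(a, b) // [0 0 10] [0 0 20]; neither aliases x
}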

I strongly suspect the majority of users don't care how much capacity make allocates for them. I don't see any evidence that users are dissatisfied with their lack of control over exactly how much capacity append allocates when it needs to grow a slice, for example. If they were, I'd expect to see full slice expressions used in other contexts.

@dsnet dsnet added the LanguageChange label Mar 1, 2018

@faiface

faiface commented Mar 1, 2018

(Don't get offended.) This basically proposes to cripple the language and make it less predictable (like really, imagine debugging a bug caused by this) in order to accommodate a very minor, CPU-architecture-specific performance optimization. This mistake has been made many times in many languages. Don't make Go one of them.

@slrz

slrz commented Mar 1, 2018

I have recently written some code that would be broken by this change. Easy to fix, of course, but it'd be interesting to know whether this is more common.

The gist of it was:

var commonPrefix = make([]byte, N)

func f(arg Type) []byte {
        return append(commonPrefix, computeSuffix(arg)...)
}
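For concreteness, a sketch of how this could go wrong if make over-allocated (hypothetical: under the current spec cap(commonPrefix) equals its length, so each append below allocates a fresh array):

package main

import "fmt"

var commonPrefix = make([]byte, 4)

func f(suffix byte) []byte {
        return append(commonPrefix, suffix)
}

func main() {
        a := f('A')
        b := f('B')
        // Today this prints "A B". If make returned spare capacity, both
        // appends could write into the same backing array, and it could
        // print "B B" instead.
        fmt.Printf("%c %c\n", a[4], b[4])
}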
@mdempsky

Member

mdempsky commented Mar 1, 2018

This proposal basically proposes to cripple the language

I think that's strongly overstating the potential downsides.

Here's a simple experiment that just uniformly over-allocates every make call of fewer than 1000 elements:

--- a/src/runtime/slice.go
+++ b/src/runtime/slice.go
@@ -58,6 +58,10 @@ func makeslice(et *_type, len, cap int) slice {
                panic(errorString("makeslice: cap out of range"))
        }
 
+       if uintptr(cap) < 1000 && et.size < 10000 {
+               cap++
+       }
+
        p := mallocgc(et.size*uintptr(cap), et, true)
        return slice{p, len, cap}
 }

This breaks a couple of standard library and regress tests, but they all appear to be verifying that make itself works according to the spec.

There was one hairy crash in runtime/trace, which I've now diagnosed. The process for tracking it down was:

  1. I first searched for appearances of "cap(" in runtime/trace/*.go; finding none, I looked in runtime/*.go.
  2. I noticed "allp[:cap(allp)]" in runtime/trace.go, so then I looked at where allp was allocated.
  3. In proc.go, we allocate make([]*p, nprocs).
  4. I experimentally changed this to make([]*p, nprocs)[:nprocs:nprocs], and the crash went away.

I would argue the real issue here is that the runtime is using an inconsistent mix of len(allp) and cap(allp). Another fix that works perfectly fine is just changing all cap(allp) to len(allp), which seems desirable if only to keep the code simpler and easier to understand.
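A generic sketch of that failure mode (not the actual runtime code): reading through cap instead of len only works as long as the two are equal.

package main

import "fmt"

func main() {
        n := 4
        s := make([]int, n) // the writer fills exactly s[0:n]
        for i := range s {
                s[i] = i + 1
        }

        // Reading via cap instead of len is fine today because cap == n, but
        // if make over-allocated, view would include slots the writer never set.
        view := s[:cap(s)]
        fmt.Println(view)
}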

(like really, imagine debugging a bug caused by this)

It didn't seem too bad. :)

@josharian

Contributor

josharian commented Mar 2, 2018

Here's a simple experiment

Now that you have it working, if you change it to round up instead, are there any notable stdlib benchmark changes?

Interestingly, allp cap issues were also what I hit with CLs 21813 and 22197, which changed append behavior; that's further evidence that just using len is probably better in this case.

@faiface

faiface commented Mar 2, 2018

Btw, if someone really wants to make this happen, I would suggest a slight modification, with the requirements as follows:

cap(make([]T, n)) >= n
cap(make([]T, m, n)) == n

So, when you don't specify the capacity explicitly, the compiler can optimize that for you. This would be sort of acceptable, I guess.

@bcmills

Member

bcmills commented Mar 2, 2018

We'd need to conservatively rewrite all make calls into make([]T, m, n)[:m:n], which would make the rewritten code pretty ugly.

Fortunately, make is a builtin and not a keyword, right?

So you add a new package, go1, with implementations of any builtins that changed, and rewrite make([]T, m, n) to go1.Make([]T, m, n).

Or, you add a three-arg “bounded make” which specifies a lower and upper bound, and rewrite make([]T, m, n) to make([]T, m, n, n).

@faiface

faiface commented Mar 3, 2018

Or, you add a three-arg “bounded make” which specifies a lower and upper bound, and rewrite make([]T, m, n) to make([]T, m, n, n).

I don't think you care about the lower/upper-bound at all. You either care about the exact capacity, or you don't, in which case the compiler can insert anything and it's fine.

@bcmills

Member

bcmills commented Mar 3, 2018

@slrz You could also write that code as

var commonPrefix = make([]byte, N)

func f(arg Type) []byte {
        return append(append([]byte{}, commonPrefix...), computeSuffix(arg)...)
}

If #18605 and #12854 are accepted, that would simplify further:

func f(arg Type) []byte {
        return append({}, commonPrefix..., computeSuffix(arg)...)
}

That seems much more difficult to go fix, but makes the behavior of the append call no longer depend on an allocation site that may become arbitrarily far away as the code evolves.

@josharian

Contributor

josharian commented Mar 15, 2018

A related question for this proposal: Is it ok for slice literals to have cap > len? E.g. can cap([]byte{0}) be > 1?
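(For reference, the current spec fixes the answer: a slice literal describes its entire underlying array, so its length and capacity are equal.)

c := cap([]byte{0}) // always 1 under the current spec; the question is whether this could become > 1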

@ianlancetaylor

Contributor

ianlancetaylor commented Apr 24, 2018

I tend to feel that if you ask for an explicit capacity, as in make([]T, n, m), you ought to get that capacity.

I think there is an argument for not precisely specifying the capacity for make([]T, n).

@mdempsky

Member

mdempsky commented Apr 24, 2018

It occurs to me that even if make([]T, n, m) continues to reserve exactly m capacity, a user could write make([]T, m)[:n] to get the flex-capacity behavior. Also, make([]T, m)[:n] (to get flex-capacity using an exact-capacity make) is slightly less clunky than make([]T, n, m)[:n:m] (to get exact-capacity using a flex-capacity make).

@gopherbot

gopherbot commented May 6, 2018

Change https://golang.org/cl/111646 mentions this issue: text/tabwriter: allow initial slice size to be rounded up

@josharian

Contributor

josharian commented May 6, 2018

Now that CL 109816 is in, you can sorta kinda approximate this in user code:

s := append([]T(nil), make([]T, m)...)[:0]

It's more expensive than make merely rounding up, but not dramatically so.
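A quick sketch of what the trick buys you (the 112 below is illustrative; the exact value depends on the runtime's malloc size classes):

package main

import "fmt"

func main() {
        a := make([]byte, 0, 100)
        fmt.Println(cap(a)) // exactly 100, as the spec currently requires

        // The append forces growslice to size the allocation, which rounds it
        // up to a malloc size class, and the spare space shows up as capacity.
        b := append([]byte(nil), make([]byte, 100)...)[:0]
        fmt.Println(cap(b)) // e.g. 112 on current runtimes
}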

CL 111646 shows it in action, along with some benchmark results. You can see that there is a pretty significant improvement in some cases. And the regressions in other cases would go away if this were a language change rather than a compiler optimization of a pretty awkward bit of code.

Might be interesting for Go 1.12 to look at other make([]T, 0, m) instances in the standard library and see whether analogous changes are worthwhile.

@martisch

Member

martisch commented May 6, 2018

Interesting! I had not thought about this issue in relation to https://golang.org/cl/109517 ("cmd/compile: optimize append(x, make([]T, y)...) slice extension").

I will have a look at recognizing append([]T(nil), make([]T, m)...)[:0] and rewriting it into a special makeslice variant (one that rounds the cap up to the allocation bucket size), and see how much better it performs than the current extension optimization from CL 109816, which goes through growslice.

Others could then use that CL to run their own experiments and see whether it helps, without our having to change the spec. If the benefit turns out to be worth the complexity it adds to the compiler, we could submit the CL that recognizes append([]T(nil), make([]T, m)...)[:0] (or something similar), making a spec change unnecessary for this optimization use case unless and until this proposal is accepted.

@gopherbot

gopherbot commented May 6, 2018

Change https://golang.org/cl/111736 mentions this issue: cmd/compile: optimize append([]T(nil), make([]T, n)...)

josharian added a commit to josharian/go that referenced this issue May 8, 2018

text/tabwriter: allow initial slice size to be rounded up
DO NOT SUBMIT
[given that the normal case for this package is Table,
this probably isn't a net win. But it is interesting.]

Updates golang#24204

name                      old time/op    new time/op    delta
Table/1x10/new-8            2.51µs ± 0%    2.69µs ± 0%   +7.33%  (p=0.000 n=14+12)
Table/1x10/reuse-8          1.31µs ± 1%    1.32µs ± 0%     ~     (p=0.433 n=14+13)
Table/1x1000/new-8           158µs ± 0%     176µs ± 0%  +11.33%  (p=0.000 n=13+13)
Table/1x1000/reuse-8         117µs ± 2%     117µs ± 1%     ~     (p=0.170 n=15+13)
Table/1x100000/new-8        28.2ms ± 1%    30.2ms ± 1%   +7.04%  (p=0.000 n=15+15)
Table/1x100000/reuse-8      12.2ms ± 2%    12.3ms ± 1%   +0.65%  (p=0.021 n=15+15)
Table/10x10/new-8           7.07µs ± 1%    7.31µs ± 1%   +3.40%  (p=0.000 n=14+14)
Table/10x10/reuse-8         4.94µs ± 2%    4.96µs ± 2%     ~     (p=0.310 n=14+15)
Table/10x1000/new-8          575µs ± 1%     596µs ± 1%   +3.75%  (p=0.000 n=13+14)
Table/10x1000/reuse-8        487µs ± 1%     485µs ± 1%     ~     (p=0.186 n=14+15)
Table/10x100000/new-8       78.2ms ± 3%    80.5ms ± 4%   +2.94%  (p=0.000 n=15+15)
Table/10x100000/reuse-8     64.9ms ± 3%    64.1ms ± 4%     ~     (p=0.713 n=15+15)
Table/100x10/new-8          46.9µs ± 0%    48.0µs ± 0%   +2.21%  (p=0.000 n=14+14)
Table/100x10/reuse-8        38.2µs ± 1%    38.8µs ± 1%   +1.34%  (p=0.000 n=14+14)
Table/100x1000/new-8        4.53ms ± 1%    4.66ms ± 1%   +2.89%  (p=0.000 n=14+14)
Table/100x1000/reuse-8      3.98ms ± 0%    4.01ms ± 0%   +0.94%  (p=0.000 n=14+12)
Table/100x100000/new-8       620ms ± 1%     629ms ± 1%   +1.45%  (p=0.000 n=12+13)
Table/100x100000/reuse-8     532ms ± 1%     598ms ± 1%  +12.48%  (p=0.000 n=15+14)
Pyramid/10-8                5.90µs ± 0%    6.07µs ± 0%   +2.83%  (p=0.000 n=13+15)
Pyramid/100-8                274µs ± 1%     247µs ± 1%   -9.96%  (p=0.000 n=14+15)
Pyramid/1000-8              27.0ms ± 0%    23.9ms ± 1%  -11.62%  (p=0.000 n=14+15)
Ragged/10-8                 7.07µs ± 1%    7.05µs ± 0%     ~     (p=0.060 n=14+14)
Ragged/100-8                60.2µs ± 1%    59.4µs ± 1%   -1.32%  (p=0.000 n=14+15)
Ragged/1000-8                590µs ± 1%     584µs ± 1%   -1.04%  (p=0.000 n=14+15)
[Geo mean]                   515µs          522µs        +1.31%

name                      old alloc/op   new alloc/op   delta
Table/1x10/new-8            1.52kB ± 0%    1.52kB ± 0%     ~     (all equal)
Table/1x10/reuse-8           0.00B          0.00B          ~     (all equal)
Table/1x1000/new-8          99.4kB ± 0%    99.4kB ± 0%     ~     (all equal)
Table/1x1000/reuse-8         9.00B ± 0%     9.00B ± 0%     ~     (all equal)
Table/1x100000/new-8        19.9MB ± 0%    19.9MB ± 0%     ~     (p=0.180 n=14+14)
Table/1x100000/reuse-8       199kB ± 0%     199kB ± 0%     ~     (all equal)
Table/10x10/new-8           5.06kB ± 0%    5.06kB ± 0%     ~     (all equal)
Table/10x10/reuse-8          0.00B          0.00B          ~     (all equal)
Table/10x1000/new-8          387kB ± 0%     387kB ± 0%     ~     (all equal)
Table/10x1000/reuse-8         128B ± 0%      128B ± 0%     ~     (all equal)
Table/10x100000/new-8       49.3MB ± 0%    49.3MB ± 0%     ~     (p=0.107 n=15+15)
Table/10x100000/reuse-8     2.46MB ± 0%    2.46MB ± 0%     ~     (p=0.685 n=15+15)
Table/100x10/new-8          38.0kB ± 0%    38.0kB ± 0%     ~     (all equal)
Table/100x10/reuse-8         0.00B          0.00B          ~     (all equal)
Table/100x1000/new-8        3.25MB ± 0%    3.25MB ± 0%     ~     (p=0.620 n=15+14)
Table/100x1000/reuse-8      10.8kB ± 0%    10.8kB ± 0%     ~     (all equal)
Table/100x100000/new-8       340MB ± 0%     340MB ± 0%   +0.00%  (p=0.011 n=13+15)
Table/100x100000/reuse-8     170MB ± 0%     170MB ± 0%   +0.00%  (p=0.011 n=13+15)
Pyramid/10-8                4.88kB ± 0%    4.88kB ± 0%     ~     (all equal)
Pyramid/100-8                411kB ± 0%     200kB ± 0%  -51.41%  (p=0.000 n=15+15)
Pyramid/1000-8              41.6MB ± 0%    16.2MB ± 0%  -61.12%  (p=0.000 n=15+15)
Ragged/10-8                 1.82kB ± 0%    1.82kB ± 0%     ~     (all equal)
Ragged/100-8                1.82kB ± 0%    1.82kB ± 0%     ~     (all equal)
Ragged/1000-8               1.82kB ± 0%    1.82kB ± 0%     ~     (all equal)
[Geo mean]                   109kB          101kB        -7.63%

name                      old allocs/op  new allocs/op  delta
Table/1x10/new-8              21.0 ± 0%      21.0 ± 0%     ~     (all equal)
Table/1x10/reuse-8            0.00           0.00          ~     (all equal)
Table/1x1000/new-8           1.02k ± 0%     1.02k ± 0%     ~     (all equal)
Table/1x1000/reuse-8          0.00           0.00          ~     (all equal)
Table/1x100000/new-8          100k ± 0%      100k ± 0%     ~     (all equal)
Table/1x100000/reuse-8       1.00k ± 0%     1.00k ± 0%     ~     (all equal)
Table/10x10/new-8             31.0 ± 0%      31.0 ± 0%     ~     (all equal)
Table/10x10/reuse-8           0.00           0.00          ~     (all equal)
Table/10x1000/new-8          1.04k ± 0%     1.04k ± 0%     ~     (all equal)
Table/10x1000/reuse-8         0.00           0.00          ~     (all equal)
Table/10x100000/new-8         100k ± 0%      100k ± 0%     ~     (all equal)
Table/10x100000/reuse-8      5.00k ± 0%     5.00k ± 0%     ~     (all equal)
Table/100x10/new-8            40.0 ± 0%      40.0 ± 0%     ~     (all equal)
Table/100x10/reuse-8          0.00           0.00          ~     (all equal)
Table/100x1000/new-8         1.05k ± 0%     1.05k ± 0%     ~     (all equal)
Table/100x1000/reuse-8        3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Table/100x100000/new-8        100k ± 0%      100k ± 0%     ~     (all equal)
Table/100x100000/reuse-8     50.0k ± 0%     50.0k ± 0%   +0.00%  (p=0.011 n=13+15)
Pyramid/10-8                  35.0 ± 0%      35.0 ± 0%     ~     (all equal)
Pyramid/100-8                  231 ± 0%       163 ± 0%  -29.44%  (p=0.000 n=15+15)
Pyramid/1000-8               2.06k ± 0%     1.11k ± 0%  -45.94%  (p=0.000 n=15+15)
Ragged/10-8                   19.0 ± 0%      19.0 ± 0%     ~     (all equal)
Ragged/100-8                  19.0 ± 0%      19.0 ± 0%     ~     (all equal)
Ragged/1000-8                 19.0 ± 0%      19.0 ± 0%     ~     (all equal)
[Geo mean]                     507            482        -4.95%


Change-Id: I5cce64791cedd03270d875a1189c79bace5fa437