Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

runtime: don't allocate when putting a bool into an interface #17725

Closed
bradfitz opened this Issue Nov 1, 2016 · 32 comments

Comments

Projects
None yet
Owner

bradfitz commented Nov 1, 2016

Shoving a bool into an interface probably doesn't need to allocate. I imagine most binaries already have a static 1 byte and a static 0 byte somewhere whose address we could use in the interface's data pointer.

bradfitz@gdev:~$ cat alloc.go
package main

import "testing"

func main() {
        var x interface{}
        x = true
        println(x)
        x = false
        println(x)
        x = true
        println(x)
        
        println(testing.AllocsPerRun(5000, func() {
                x = true
        }))
}

bradfitz@gdev:~$ go run ~/alloc.go
(0x49b960,0xc42003bf4f)
(0x49b960,0xc42003bf4e)
(0x49b960,0xc42003bf4d)
+1.000000e+000

/cc @josharian @randall77 @mdempsky

@bradfitz bradfitz added the Performance label Nov 1, 2016

@bradfitz bradfitz added this to the Unplanned milestone Nov 1, 2016

@bradfitz bradfitz changed the title from cmd/compile: to cmd/compile: don't allocate when putting a bool into an interface Nov 1, 2016

Member

mdempsky commented Nov 1, 2016

Relatedly, I've been thinking that converting len-1 []byte to string shouldn't need to allocate either.

@mdempsky mdempsky changed the title from cmd/compile: don't allocate when putting a bool into an interface to runtime: don't allocate when putting a bool into an interface Nov 1, 2016

Member

mdempsky commented Nov 1, 2016

This can already be done within the runtime in convT2I. I don't believe it needs compiler assistance.

Contributor

dr2chase commented Nov 1, 2016

How do we feel about smallish integers? Seems like we could profile in convT2I to see how this would pay off. One possible value for smallish is {2,1,0,-1}

Owner

bradfitz commented Nov 1, 2016

I feel like last time smallish integers was brought up, the concern was that it would be inconsistent and thus surprising. But with bools at least it would be consistent.

Then again, the better the compiler and runtime get, the more that performance things get surprising, so maybe it's okay nowadays.

I agree some convT2I stats for all this would be interesting.

Member

mdempsky commented Nov 1, 2016 edited

I was thinking we just have a 256-byte slab of "\x00\x01...\xff", and we can index into that for any read-only 1-byte value that needs to be converted to a pointer (e.g., converting a len-1 []byte to string, or storing a bool/byte/int8/uint8 into an interface).

If we instead used [...]int64{..., -3, -2, -1, 0, 1, 2, 3, ...} for a certain range of integers, we could avoid heap allocs for a wider variety of values.

Agreed collecting stats on recognizable patterns would be good.

Contributor

josharian commented Nov 1, 2016

I was thinking we just have a 256-byte slab of "\x00\x01...\xff"

That was my exact reaction as well.

Contributor

randall77 commented Nov 1, 2016

I had a CL a long time ago which did [...]int64{0..1024}. We ended up not using it because the performance cliff from 1024->1025 would be weird.

The 256-byte one would not have that problem, although it would weirdly encourage people to cram their data into 1 byte chunks.

Contributor

josharian commented Nov 1, 2016

To spare others the search, that was CL 2500, which in turn also points to #8618. I'd still like to see this get done.

Contributor

griesemer commented Nov 1, 2016

I'm also still in favor of this. I think the benefits outweigh the oddity in performance behavior.

Owner

bradfitz commented Nov 1, 2016

We ended up not using it because the performance cliff from 1024->1025 would be weird.

Clearly we just need to smooth out that cliff with some fastrand(), slowly increasing the chance of allocating from 0 to 1024, where 1025 is 100%! /s

mrjrieke commented Nov 1, 2016

@mdempsky would slow compiler tiny bit.... of course, @bradfitz presents a difficult challenge for prediction with fastrand(). cqtm

@bradfitz bradfitz modified the milestone: Go1.9Maybe, Unplanned Nov 2, 2016

Contributor

josharian commented Jan 22, 2017

Byte-sized values handled in CL 35555. I went the compiler route because it was simple and meant that byte-size values required no runtime call at all, shrinking ns/op from ~5 to ~1.

After handling byte-sized values and constant values (#18704), I instrumented the runtime to dump what integer-ish values made it into convT2E. During make.bash, I got this (unsurprising) distribution, top 30 values only:

 percent cnt  bits  uint value
 28.83%  3995 64bit 0
  2.11%   292 64bit 1
  1.45%   201 64bit 4
  1.35%   187 64bit 3
  1.28%   177 64bit 2
  0.87%   120 64bit 17
  0.84%   117 64bit 6
  0.81%   112 64bit 5
  0.71%    99 64bit 7
  0.71%    98 64bit 47
  0.66%    91 64bit 8
  0.48%    66 64bit 13
  0.43%    60 32bit 118
  0.38%    53 64bit 12
  0.36%    50 64bit 9
  0.32%    45 64bit 42
  0.23%    32 64bit 39
  0.22%    30 64bit 44
  0.21%    29 64bit 10
  0.20%    28 64bit 30
  0.17%    24 64bit 16
  0.17%    23 64bit 14
  0.16%    22 32bit 106
  0.16%    22 64bit 11
  0.15%    21 32bit 101
  0.15%    21 64bit 85
  0.13%    18 32bit 281
  0.13%    18 32bit 395
  0.12%    17 32bit 398
  0.12%    16 32bit 123

That says to me we might get the best bang for our buck by inserting a check for integer-ish values of size <= uintptr in the generated code like:

  if val == 0 {
    e = {itab, &zerobase}
  } else {
    e = convT2E(...)
  }
Member

minux commented Jan 22, 2017

Contributor

randall77 commented Jan 22, 2017

It could be done if we increased the size of interfaces to 3 words. Otherwise, I'm afraid that there are a host of complications (mostly in the garbage collector) which make sometimes-a-pointer-sometimes-a-scalar fields hard. I can elaborate more if you wish.

Contributor

josharian commented Jan 22, 2017

Lots of prior discussion in #8405.

Contributor

josharian commented Jan 22, 2017

CL 35563 implements the strategy "use zerobase for small pointer-free zero values". It could probably be optimized a bit, but it's a first look at the cost/benefit of doing it entirely in the runtime. Checking if val == 0 in the calling code should be more efficient, because the == 0 check will be more efficient, but it's also considerably more of a pain to implement. :)

While running the stdlib tests, CL 35563 eliminated roughly 170,000 allocations. Increasing the size of zerobase did not show any improvement; larger pointer-free zero values sent to convT2E/convT2I are rare.

Contributor

josharian commented Jan 22, 2017

And with that, I'm going to pause my hacking on interface conversions. I hope that making the code and numbers concrete will help the conversation about the correct direction to go from here.

CL https://golang.org/cl/35555 mentions this issue.

CL https://golang.org/cl/35563 mentions this issue.

Member

minux commented Jan 25, 2017

@gopherbot gopherbot pushed a commit that referenced this issue Feb 2, 2017

@josharian josharian cmd/compile, runtime: convert byte-sized values to interfaces without…
… allocation

Based in part on khr's CL 2500.

Updates #17725
Updates #18121

Change-Id: I744e1f92fc2104e6c5bd883a898c30b2eea8cc31
Reviewed-on: https://go-review.googlesource.com/35555
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
0358367

How can an an static enum from 0x00 to 0xFF can help ?
in 0358367
Can anyone explain ?

Contributor

josharian commented Feb 3, 2017

Bools are represented as 0 or 1, so they can use the static values for 0 and 1.

I know that, and thanks for your reply, but I still do not understand the theory behind this optimisation.
Why do we need to list all the values of one byte ?

Contributor

josharian commented Feb 3, 2017

I plan to write a blog post (commaok.xyz) sometime in the next few days that explains a lot of this.

Contributor

josharian commented Feb 3, 2017

I just want to finish CL 35563 first, and that will take some experimentation.

Contributor

rasky commented Feb 4, 2017

@ancaemanuel an interface is two words: a pointer to a type and a pointer to a value. To store "12" in an interface, you need to have a position in memory that holds the value "12", to be pointed by the interface's second word. Up until now, that position in memory would be allocated on the heap, every time anew. With this patch, for special values between 0 and 255, the compiler knows that there is a location in memory that already holds those values and use it, rather than allocating on the heap.

Contributor

josharian commented Feb 7, 2017

@ancaemanuel perhaps http://commaok.xyz/post/interface-allocs/ will also provide some useful background and context, though you're probably better off reading https://research.swtch.com/interfaces :)

Thank you.
I have an ideea: the compiler knows the constants used for interfaces in a program. How about generating an static array (with the constants already in the program) at compile time ?

Contributor

josharian commented Feb 10, 2017

@ancaemanuel if I understand correctly, that is https://golang.org/issue/18704, which is done. (This is also discussed a bit more readably at http://commaok.xyz/post/interface-allocs/.)

I read the article.
This is an optimisation for small numbers.
If an program uses 500 constants, they are already in the compiled program. How about using that array (or map) ?

Contributor

josharian commented Feb 10, 2017

I'm not sure I understand. Would you file a new issue with some more details and cc me and we can discuss there? Thanks!

@josharian josharian added a commit to josharian/go that referenced this issue Feb 14, 2017

@josharian josharian runtime: use zeroVal for small zero values in interfaces
Fixes #17725

name                old time/op    new time/op    delta
ConvT2EInt/const-8    0.90ns ± 4%    0.90ns ± 2%      ~     (p=0.623 n=20+15)
ConvT2EInt/zero-8     21.5ns ± 2%     7.4ns ± 6%   -65.33%  (p=0.000 n=19+18)
ConvT2EInt/one-8      21.6ns ± 2%    22.8ns ± 2%    +5.21%  (p=0.000 n=20+19)

name                old alloc/op   new alloc/op   delta
ConvT2EInt/const-8     0.00B          0.00B           ~     (all equal)
ConvT2EInt/zero-8      8.00B ± 0%     0.00B       -100.00%  (p=0.000 n=20+20)
ConvT2EInt/one-8       8.00B ± 0%     8.00B ± 0%      ~     (all equal)

name                old allocs/op  new allocs/op  delta
ConvT2EInt/const-8      0.00           0.00           ~     (all equal)
ConvT2EInt/zero-8       1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
ConvT2EInt/one-8        1.00 ± 0%      1.00 ± 0%      ~     (all equal)

Change-Id: I5b71f9e44e3de8b8f2284a3821c5176c13ab2c61
a9d47ff
Contributor

randall77 commented Mar 26, 2017

This is fixed with 0358367

@randall77 randall77 closed this Mar 26, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment