Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

cmd/compile: don't allocate when putting constant strings in an interface #18704

Closed
josharian opened this Issue Jan 18, 2017 · 6 comments

Comments

Projects
None yet
2 participants
Contributor

josharian commented Jan 18, 2017 edited

fmt.Println("abc") allocates when it converts "abc" into an interface. It should not.

When the compiler sees a constant string being converted into an interface (usually at a call site?), it could create a static interface value to use, rather than allocating it at runtime.

That is, do the following conversion automatically:

var abc interface{} = "ABC"

func BenchmarkPrintln(b *testing.B) {
	b.Run("regular", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmt.Fprintln(ioutil.Discard, "ABC")
		}
	})
	b.Run("prealloc", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmt.Fprintln(ioutil.Discard, abc)
		}
	})
}

This might not be as bad for binary size as it initially appears; it might even be an improvement. The cost of the runtime conversion call is probably about as large as the two additional words of static data plus symbol overhead. (And that symbol overhead could be removed if necessary, as we did for strings.)

See also #17725 and the golang-dev thread about logging that insired this.

cc @randall77 @mdempsky @bradfitz

@josharian josharian added this to the Go1.9Maybe milestone Jan 18, 2017

Contributor

josharian commented Jan 18, 2017

Oh, and for those that are curious:

BenchmarkPrintln/regular-8         	20000000	        86.6 ns/op	      16 B/op	       1 allocs/op
BenchmarkPrintln/prealloc-8        	30000000	        51.1 ns/op	       0 B/op	       0 allocs/op
Contributor

josharian commented Jan 18, 2017

The regular function is 235 bytes of code. Prealloc is 188 bytes of code.
So prealloc is probably a net binary size improvement.

Contributor

josharian commented Jan 18, 2017

Might even be worth doing this for all static data, including ints, floats, etc.

Contributor

josharian commented Jan 21, 2017

Threw together a prototype for strings only. Not ready to be mailed; lots of sinit.go needs to be updated for better handling of interface values for a complete CL, and it's a fair amount of careful work. But it was enough to get some numbers to convince myself that this will help. For the go-kit logging benchmarks:

name                      old time/op    new time/op    delta
JSONLoggerSimple-8          1.94µs ± 2%    1.91µs ± 2%   -1.64%    (p=0.026 n=9+9)
JSONLoggerContextual-8      2.60µs ± 1%    2.49µs ± 3%   -4.07%  (p=0.000 n=10+10)
Discard-8                   96.8ns ± 1%    32.9ns ± 1%  -65.99%    (p=0.000 n=9+9)
OneWith-8                    157ns ± 2%      89ns ± 2%  -43.26%    (p=0.000 n=9+9)
TwoWith-8                    172ns ± 1%     102ns ± 2%  -40.39%    (p=0.000 n=9+9)
TenWith-8                    288ns ± 1%     229ns ± 8%  -20.36%    (p=0.000 n=9+9)
LogfmtLoggerSimple-8         699ns ± 1%     630ns ± 2%   -9.86%    (p=0.000 n=9+9)
LogfmtLoggerContextual-8     958ns ± 1%     818ns ± 2%  -14.57%   (p=0.000 n=10+8)
NopLoggerSimple-8            188ns ± 1%     119ns ± 3%  -36.87%  (p=0.000 n=10+10)
NopLoggerContextual-8        379ns ± 1%     238ns ± 1%  -37.23%    (p=0.000 n=9+9)
ValueBindingTimestamp-8      575ns ± 1%     505ns ± 9%  -12.13%  (p=0.000 n=10+10)
ValueBindingCaller-8         897ns ± 2%     754ns ± 3%  -15.86%   (p=0.000 n=10+9)

name                      old alloc/op   new alloc/op   delta
JSONLoggerSimple-8            904B ± 0%      872B ± 0%   -3.54%  (p=0.000 n=10+10)
JSONLoggerContextual-8      1.20kB ± 0%    1.14kB ± 0%   -5.33%  (p=0.000 n=10+10)
Discard-8                    64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)
OneWith-8                    96.0B ± 0%     64.0B ± 0%  -33.33%  (p=0.000 n=10+10)
TwoWith-8                     160B ± 0%      128B ± 0%  -20.00%  (p=0.000 n=10+10)
TenWith-8                     672B ± 0%      640B ± 0%   -4.76%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
NopLoggerSimple-8             128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       159B ± 0%      127B ± 0%  -20.13%  (p=0.000 n=10+10)
ValueBindingCaller-8          112B ± 0%       80B ± 0%  -28.57%  (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
JSONLoggerSimple-8            19.0 ± 0%      17.0 ± 0%  -10.53%  (p=0.000 n=10+10)
JSONLoggerContextual-8        25.0 ± 0%      21.0 ± 0%  -16.00%  (p=0.000 n=10+10)
Discard-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
OneWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TwoWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TenWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
NopLoggerSimple-8             4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       5.00 ± 0%      3.00 ± 0%  -40.00%  (p=0.000 n=10+10)
ValueBindingCaller-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)

fmt benchmarks also show some improvements. CPU benchmarks omitted because they are too noisy--the fmt CPU benchmarks are super sensitive to alignment, etc.

name                             old alloc/op   new alloc/op   delta
SprintfPadding-8                    24.0B ± 0%     24.0B ± 0%     ~     (all samples are equal)
SprintfEmpty-8                     0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
SprintfString-8                     21.0B ± 0%      5.0B ± 0%  -76.19%        (p=0.000 n=10+10)
SprintfTruncateString-8             32.0B ± 0%     16.0B ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfQuoteString-8                48.0B ± 0%     32.0B ± 0%  -33.33%        (p=0.000 n=10+10)
SprintfInt-8                        16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SprintfIntInt-8                     24.0B ± 0%     24.0B ± 0%     ~     (all samples are equal)
SprintfPrefixedInt-8                72.0B ± 0%     72.0B ± 0%     ~     (all samples are equal)
SprintfFloat-8                      16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SprintfComplex-8                    48.0B ± 0%     48.0B ± 0%     ~     (all samples are equal)
SprintfBoolean-8                    8.00B ± 0%     8.00B ± 0%     ~     (all samples are equal)
SprintfHexString-8                  96.0B ± 0%     80.0B ± 0%  -16.67%        (p=0.000 n=10+10)
SprintfHexBytes-8                    112B ± 0%      112B ± 0%     ~     (all samples are equal)
SprintfBytes-8                      96.0B ± 0%     96.0B ± 0%     ~     (all samples are equal)
SprintfStringer-8                   32.0B ± 0%     32.0B ± 0%     ~     (all samples are equal)
SprintfStructure-8                   256B ± 0%      256B ± 0%     ~     (all samples are equal)
ManyArgs-8                          80.0B ± 0%     48.0B ± 0%  -40.00%        (p=0.000 n=10+10)
FprintInt-8                         8.00B ± 0%     8.00B ± 0%     ~     (all samples are equal)
FprintfBytes-8                      32.0B ± 0%     32.0B ± 0%     ~     (all samples are equal)
FprintIntNoAlloc-8                 0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
ScanInts-8                         15.2kB ± 0%    15.2kB ± 0%     ~           (p=0.179 n=10+10)
ScanRecursiveInt-8                 21.6kB ± 0%    21.6kB ± 0%     ~     (all samples are equal)
ScanRecursiveIntReaderWrapper-8    21.7kB ± 0%    21.7kB ± 0%     ~     (all samples are equal)

name                             old allocs/op  new allocs/op  delta
SprintfPadding-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfEmpty-8                      0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
SprintfString-8                      2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfTruncateString-8              2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfQuoteString-8                 2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfInt-8                         2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfIntInt-8                      3.00 ± 0%      3.00 ± 0%     ~     (all samples are equal)
SprintfPrefixedInt-8                 2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfFloat-8                       2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfComplex-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfBoolean-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfHexString-8                   2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfHexBytes-8                    2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfBytes-8                       2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfStringer-8                    4.00 ± 0%      4.00 ± 0%     ~     (all samples are equal)
SprintfStructure-8                   7.00 ± 0%      7.00 ± 0%     ~     (all samples are equal)
ManyArgs-8                           8.00 ± 0%      6.00 ± 0%  -25.00%        (p=0.000 n=10+10)
FprintInt-8                          1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
FprintfBytes-8                       1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
FprintIntNoAlloc-8                  0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
ScanInts-8                          1.60k ± 0%     1.60k ± 0%     ~     (all samples are equal)
ScanRecursiveInt-8                  1.71k ± 0%     1.71k ± 0%     ~     (all samples are equal)
ScanRecursiveIntReaderWrapper-8     1.71k ± 0%     1.71k ± 0%     ~     (all samples are equal)
Contributor

josharian commented Jan 22, 2017

Found a simpler and broader (and thus higher risk) implementation. CL 35554

CL https://golang.org/cl/35554 mentions this issue.

@gopherbot gopherbot closed this in c682d32 Feb 2, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment