Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: don't allocate when putting constant strings in an interface #18704

Closed
josharian opened this issue Jan 18, 2017 · 6 comments
Closed

Comments

@josharian
Copy link
Contributor

josharian commented Jan 18, 2017

fmt.Println("abc") allocates when it converts "abc" into an interface. It should not.

When the compiler sees a constant string being converted into an interface (usually at a call site?), it could create a static interface value to use, rather than allocating it at runtime.

That is, do the following conversion automatically:

var abc interface{} = "ABC"

func BenchmarkPrintln(b *testing.B) {
	b.Run("regular", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmt.Fprintln(ioutil.Discard, "ABC")
		}
	})
	b.Run("prealloc", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fmt.Fprintln(ioutil.Discard, abc)
		}
	})
}

This might not be as bad for binary size as it initially appears; it might even be an improvement. The cost of the runtime conversion call is probably about as large as the two additional words of static data plus symbol overhead. (And that symbol overhead could be removed if necessary, as we did for strings.)

See also #17725 and the golang-dev thread about logging that insired this.

cc @randall77 @mdempsky @bradfitz

@josharian josharian added this to the Go1.9Maybe milestone Jan 18, 2017
@josharian
Copy link
Contributor Author

Oh, and for those that are curious:

BenchmarkPrintln/regular-8         	20000000	        86.6 ns/op	      16 B/op	       1 allocs/op
BenchmarkPrintln/prealloc-8        	30000000	        51.1 ns/op	       0 B/op	       0 allocs/op

@josharian
Copy link
Contributor Author

The regular function is 235 bytes of code. Prealloc is 188 bytes of code.
So prealloc is probably a net binary size improvement.

@josharian
Copy link
Contributor Author

Might even be worth doing this for all static data, including ints, floats, etc.

@josharian
Copy link
Contributor Author

Threw together a prototype for strings only. Not ready to be mailed; lots of sinit.go needs to be updated for better handling of interface values for a complete CL, and it's a fair amount of careful work. But it was enough to get some numbers to convince myself that this will help. For the go-kit logging benchmarks:

name                      old time/op    new time/op    delta
JSONLoggerSimple-8          1.94µs ± 2%    1.91µs ± 2%   -1.64%    (p=0.026 n=9+9)
JSONLoggerContextual-8      2.60µs ± 1%    2.49µs ± 3%   -4.07%  (p=0.000 n=10+10)
Discard-8                   96.8ns ± 1%    32.9ns ± 1%  -65.99%    (p=0.000 n=9+9)
OneWith-8                    157ns ± 2%      89ns ± 2%  -43.26%    (p=0.000 n=9+9)
TwoWith-8                    172ns ± 1%     102ns ± 2%  -40.39%    (p=0.000 n=9+9)
TenWith-8                    288ns ± 1%     229ns ± 8%  -20.36%    (p=0.000 n=9+9)
LogfmtLoggerSimple-8         699ns ± 1%     630ns ± 2%   -9.86%    (p=0.000 n=9+9)
LogfmtLoggerContextual-8     958ns ± 1%     818ns ± 2%  -14.57%   (p=0.000 n=10+8)
NopLoggerSimple-8            188ns ± 1%     119ns ± 3%  -36.87%  (p=0.000 n=10+10)
NopLoggerContextual-8        379ns ± 1%     238ns ± 1%  -37.23%    (p=0.000 n=9+9)
ValueBindingTimestamp-8      575ns ± 1%     505ns ± 9%  -12.13%  (p=0.000 n=10+10)
ValueBindingCaller-8         897ns ± 2%     754ns ± 3%  -15.86%   (p=0.000 n=10+9)

name                      old alloc/op   new alloc/op   delta
JSONLoggerSimple-8            904B ± 0%      872B ± 0%   -3.54%  (p=0.000 n=10+10)
JSONLoggerContextual-8      1.20kB ± 0%    1.14kB ± 0%   -5.33%  (p=0.000 n=10+10)
Discard-8                    64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)
OneWith-8                    96.0B ± 0%     64.0B ± 0%  -33.33%  (p=0.000 n=10+10)
TwoWith-8                     160B ± 0%      128B ± 0%  -20.00%  (p=0.000 n=10+10)
TenWith-8                     672B ± 0%      640B ± 0%   -4.76%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
NopLoggerSimple-8             128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       159B ± 0%      127B ± 0%  -20.13%  (p=0.000 n=10+10)
ValueBindingCaller-8          112B ± 0%       80B ± 0%  -28.57%  (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
JSONLoggerSimple-8            19.0 ± 0%      17.0 ± 0%  -10.53%  (p=0.000 n=10+10)
JSONLoggerContextual-8        25.0 ± 0%      21.0 ± 0%  -16.00%  (p=0.000 n=10+10)
Discard-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
OneWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TwoWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TenWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
NopLoggerSimple-8             4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       5.00 ± 0%      3.00 ± 0%  -40.00%  (p=0.000 n=10+10)
ValueBindingCaller-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)

fmt benchmarks also show some improvements. CPU benchmarks omitted because they are too noisy--the fmt CPU benchmarks are super sensitive to alignment, etc.

name                             old alloc/op   new alloc/op   delta
SprintfPadding-8                    24.0B ± 0%     24.0B ± 0%     ~     (all samples are equal)
SprintfEmpty-8                     0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
SprintfString-8                     21.0B ± 0%      5.0B ± 0%  -76.19%        (p=0.000 n=10+10)
SprintfTruncateString-8             32.0B ± 0%     16.0B ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfQuoteString-8                48.0B ± 0%     32.0B ± 0%  -33.33%        (p=0.000 n=10+10)
SprintfInt-8                        16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SprintfIntInt-8                     24.0B ± 0%     24.0B ± 0%     ~     (all samples are equal)
SprintfPrefixedInt-8                72.0B ± 0%     72.0B ± 0%     ~     (all samples are equal)
SprintfFloat-8                      16.0B ± 0%     16.0B ± 0%     ~     (all samples are equal)
SprintfComplex-8                    48.0B ± 0%     48.0B ± 0%     ~     (all samples are equal)
SprintfBoolean-8                    8.00B ± 0%     8.00B ± 0%     ~     (all samples are equal)
SprintfHexString-8                  96.0B ± 0%     80.0B ± 0%  -16.67%        (p=0.000 n=10+10)
SprintfHexBytes-8                    112B ± 0%      112B ± 0%     ~     (all samples are equal)
SprintfBytes-8                      96.0B ± 0%     96.0B ± 0%     ~     (all samples are equal)
SprintfStringer-8                   32.0B ± 0%     32.0B ± 0%     ~     (all samples are equal)
SprintfStructure-8                   256B ± 0%      256B ± 0%     ~     (all samples are equal)
ManyArgs-8                          80.0B ± 0%     48.0B ± 0%  -40.00%        (p=0.000 n=10+10)
FprintInt-8                         8.00B ± 0%     8.00B ± 0%     ~     (all samples are equal)
FprintfBytes-8                      32.0B ± 0%     32.0B ± 0%     ~     (all samples are equal)
FprintIntNoAlloc-8                 0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
ScanInts-8                         15.2kB ± 0%    15.2kB ± 0%     ~           (p=0.179 n=10+10)
ScanRecursiveInt-8                 21.6kB ± 0%    21.6kB ± 0%     ~     (all samples are equal)
ScanRecursiveIntReaderWrapper-8    21.7kB ± 0%    21.7kB ± 0%     ~     (all samples are equal)

name                             old allocs/op  new allocs/op  delta
SprintfPadding-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfEmpty-8                      0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
SprintfString-8                      2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfTruncateString-8              2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfQuoteString-8                 2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfInt-8                         2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfIntInt-8                      3.00 ± 0%      3.00 ± 0%     ~     (all samples are equal)
SprintfPrefixedInt-8                 2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfFloat-8                       2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfComplex-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfBoolean-8                     2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfHexString-8                   2.00 ± 0%      1.00 ± 0%  -50.00%        (p=0.000 n=10+10)
SprintfHexBytes-8                    2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfBytes-8                       2.00 ± 0%      2.00 ± 0%     ~     (all samples are equal)
SprintfStringer-8                    4.00 ± 0%      4.00 ± 0%     ~     (all samples are equal)
SprintfStructure-8                   7.00 ± 0%      7.00 ± 0%     ~     (all samples are equal)
ManyArgs-8                           8.00 ± 0%      6.00 ± 0%  -25.00%        (p=0.000 n=10+10)
FprintInt-8                          1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
FprintfBytes-8                       1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
FprintIntNoAlloc-8                  0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
ScanInts-8                          1.60k ± 0%     1.60k ± 0%     ~     (all samples are equal)
ScanRecursiveInt-8                  1.71k ± 0%     1.71k ± 0%     ~     (all samples are equal)
ScanRecursiveIntReaderWrapper-8     1.71k ± 0%     1.71k ± 0%     ~     (all samples are equal)

@josharian
Copy link
Contributor Author

Found a simpler and broader (and thus higher risk) implementation. CL 35554

@gopherbot
Copy link

CL https://golang.org/cl/35554 mentions this issue.

@golang golang locked and limited conversation to collaborators Feb 2, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants