strconv: consider speeding up formatting of small integers with cache #19445

Closed
ericlagergren opened this Issue Mar 7, 2017 · 22 comments

ericlagergren (Contributor) commented Mar 7, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.8

What operating system and processor architecture are you using (go env)?

GOOS=darwin
GOARCH=amd64

What, why, etc.

While profiling an ORM-like program, I discovered most of the 100+ allocs/op were from calls to strconv.Itoa. By implementing a simple cache in front of strconv.Itoa, the number of allocs dropped by more than half, and so did the function's runtime (from 7 μs to 3 μs). I looked at some of our other projects and noticed that most of the Itoa-like calls were used to format small numbers (i.e., < 100). I'd imagine other projects that generate SQL or graph-query code and need to format parameters ($1, $2, ...) would see decent performance improvements as well.

A small test I ran shows no appreciable difference when the majority of values are > 99 or when the value is a pseudo-random number in [0, b.N), but the cached version runs in about 1/10 the time when the value is < 100 and doesn't incur any allocations.

My issues with this idea:

  • Small range of numbers (assumes that most calls are < 100)
  • Only works with base 10
  • Negative numbers need their own cache, or an allocation is needed to concatenate the number with "-"

FWIW the cache only needs to be a 190-byte string (10 bytes for 0–9 plus 180 bytes for 10–99).

name             old time/op    new time/op    delta
SmallItoaAll-4     45.8ns ± 0%    47.4ns ± 0%   ~     (p=1.000 n=1+1)
SmallItoaRand-4    90.9ns ± 0%    90.5ns ± 0%   ~     (p=1.000 n=1+1)
SmallItoa-4        31.4ns ± 0%     3.4ns ± 0%   ~     (p=1.000 n=1+1)

name             old alloc/op   new alloc/op   delta
SmallItoaAll-4      8.00B ± 0%     8.00B ± 0%   ~     (all equal)
SmallItoaRand-4     7.00B ± 0%     7.00B ± 0%   ~     (all equal)
SmallItoa-4         1.00B ± 0%     0.00B        ~     (p=1.000 n=1+1)

name             old allocs/op  new allocs/op  delta
SmallItoaAll-4       1.00 ± 0%      1.00 ± 0%   ~     (all equal)
SmallItoaRand-4      1.00 ± 0%      0.00        ~     (p=1.000 n=1+1)
SmallItoa-4          1.00 ± 0%      0.00        ~     (p=1.000 n=1+1)
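
The benchmarks were roughly this shape (a sketch; the "new" columns come from swapping strconv.Itoa for the cached itoa shown in the sample implementation below):

package itoa_test

import (
	"math/rand"
	"strconv"
	"testing"
)

var sink string

// SmallItoaAll formats every i in [0, b.N); for a large b.N most values
// have three or more digits, so the small-number cache rarely applies.
func BenchmarkSmallItoaAll(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = strconv.Itoa(i)
	}
}

// SmallItoaRand formats pseudo-random values in [0, b.N).
func BenchmarkSmallItoaRand(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = strconv.Itoa(rand.Intn(b.N))
	}
}

// SmallItoa formats only values below 100, the range the cache covers.
func BenchmarkSmallItoa(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = strconv.Itoa(i % 100)
	}
}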
josharian (Contributor) commented Mar 7, 2017

I recently looked into doing something similar but lower level. When calling string(b) where b is a byte slice of length exactly one, construct a string using the recently introduced runtime.staticbytes, thus avoiding an allocation. This would eliminate the allocation for single-digit numbers.

I gave it up because I wasn't convinced it was worth the impact on other []byte to string conversions, but in the meantime, I saw some other optimizations, leading to CL 37791. It is possible that the other optimizations would pay for a "single digit" code path.

What do you think, @randall77 @mdempsky?

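A user-level analogue of that idea, for anyone following along (a sketch only; runtime.staticbytes itself is compiler/runtime machinery, not something you call directly):

// oneByteStrings holds a string for every possible byte value, so turning a
// single byte into a string is a table lookup instead of an allocation.
var oneByteStrings [256]string

func init() {
	for i := range oneByteStrings {
		oneByteStrings[i] = string([]byte{byte(i)})
	}
}

// byteToString returns a string for b without allocating per call.
func byteToString(b byte) string {
	return oneByteStrings[b]
}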

ericlagergren (Contributor) commented Mar 7, 2017

Here's a sample implementation (from the ORM-like project):

// smallInts is the decimal text of all integers in [0, 99], concatenated; it is only 190 bytes.
const smallInts = "0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899"

func itoa(i int64) string {
	if i < 0 {
		return strconv.FormatInt(i, 10) // negative values fall back to strconv
	}
	if i < 10 {
		return smallInts[i : i+1]
	}
	if i < 100 {
		return smallInts[2*i-10 : 2*i-8]
	}
	return strconv.FormatInt(i, 10)
}

@josharian I like the idea of single digits! I was a bit bummed when string(b) still allocated.

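One way to check that the small-number path really is allocation-free (a sketch for a _test.go file, using testing.AllocsPerRun against the itoa above):

func TestItoaSmallNoAlloc(t *testing.T) {
	allocs := testing.AllocsPerRun(1000, func() {
		_ = itoa(42) // hits the smallInts fast path
	})
	if allocs != 0 {
		t.Fatalf("itoa(42) allocated %v times per call, want 0", allocs)
	}
}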

@gopherbot

CL https://golang.org/cl/37963 mentions this issue.

bradfitz (Member) commented Mar 9, 2017

Your SQL database and network are fast enough that you notice a 7 μs to 3 μs drop when formatting SQL queries?

I'm not totally against this, and I usually like to help fight the allocation battles, but I'm a little surprised this comes up often enough to matter. Does it help on any of our existing benchmarks?


ericlagergren (Contributor) commented Mar 9, 2017

gopherbot closed this in bc8b9b2 Mar 16, 2017

dlsniper (Contributor) commented Mar 16, 2017

Sorry if this will sound bad, but will we now start seeing and accepting patches that further add numbers to this? Why stop at 99 and not at 2147483648? That seems to cover a lot more of the frequent use-cases where timestamps are converted from strings to numbers.


mvdan (Member) commented Mar 16, 2017

@dlsniper do you think that's worth the large amount of binary data and compiler work? 99 numbers, in comparison, is a tiny amount.


bradfitz (Member) commented Mar 16, 2017

@mvdan, what's wrong with a 20 GB binary? Disk space is cheap. We could avoid allocations! :)


bradfitz (Member) commented Mar 16, 2017

Actually it'd only be about 18 GB. Even better.


ericlagergren (Contributor) commented Mar 16, 2017

Since I'm not sure if I can comment on the CL: why a ~1.7 kB array (total size, including string headers) instead of a 190-byte const string? Just to save two subtractions?


bradfitz (Member) commented Mar 16, 2017

@ericlagergren, you can comment on the CL.

I don't see a 2.4k array.

Let me turn your question around: why a 190 byte const string over the existing code?


ericlagergren (Contributor) commented Mar 16, 2017

@bradfitz cool, didn't know that–thanks!

I typoed that number. I meant ~1.7k (16 bytes for each string header * 100 elements + the strings' data).

The const string is smaller.


mvdan (Member) commented Mar 16, 2017

@ericlagergren if you come up with a CL that doesn't complicate the code and keeps performance, I don't see why it wouldn't get merged.


bradfitz (Member) commented Mar 16, 2017

@ericlagergren, it was originally a const string. Presumably the code review discussion explains why it was changed. I stopped following at some point.


mvdan (Member) commented Mar 16, 2017

This seems to be the reasoning from gri:

For one, have you tried actually creating the slice of strings:
var smallInts = []string{"0", "1", "2", ...} ? Then you don't need an extra if and you can just index into it. You don't even need the formatSmallInt function and all the conversion stuff around it.

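Sketched out, that table-of-strings variant looks roughly like this (illustrative only; built at init here to keep the snippet short, whereas the literal form gri describes places ~100 string headers plus the digit data in the binary):

// smallIntsTable is the []string flavor of the cache. A literal spelling out
// "0" through "99" costs roughly 100 string headers (16 bytes each on amd64)
// plus the digit data, versus 190 bytes for the packed const string.
var smallIntsTable [100]string

func init() {
	for i := range smallIntsTable {
		smallIntsTable[i] = strconv.Itoa(i)
	}
}

// Formatting a small value is then a direct index, e.g. smallIntsTable[42] == "42",
// with no slicing arithmetic and no per-call allocation.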

ericlagergren (Contributor) commented Mar 16, 2017

@bradfitz yeah, I read through it and it seems like the idea was trading 1.5kb for two subtractions (-10 and -8). Thus my question.

@mvdan right, I was just asking why the CL went that direction since I wasn't aware I could comment on the CL itself. :-)


bradfitz (Member) commented Mar 16, 2017

It's possible @griesemer didn't consider the binary size bloat when making his decision if there was no discussion of it previously.


ericlagergren (Contributor) commented Mar 16, 2017

@bradfitz That was the concern that sparked my question; I know in the past y'all have made decisions with binary bloat in mind, so I was surprised the CL went with a larger array vs a smaller string, which led to me wondering if there were other reasons.


griesemer (Contributor) commented Mar 16, 2017

ericlagergren (Contributor) commented Mar 16, 2017

@griesemer thanks for clarifying it for me.

I'll submit a CL if you want, but if CL 38255 is gonna go through then it doesn't seem like there's any point to change it to a const string since it'll just get changed back.


griesemer (Contributor) commented Mar 16, 2017

@gopherbot

CL https://golang.org/cl/38255 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 22, 2017

strconv: optimize decimal ints formatting with smallsString
Benchmark results for GOARCH=amd64:

name                                     old time/op  new time/op  delta
FormatInt-4                              2.51µs ± 2%  2.40µs ± 2%   -4.51%  (p=0.000 n=9+10)
AppendInt-4                              1.67µs ± 2%  1.61µs ± 3%   -3.74%  (p=0.000 n=9+9)
FormatUint-4                              698ns ± 2%   643ns ± 3%   -7.95%  (p=0.000 n=10+8)
AppendUint-4                              478ns ± 1%   418ns ± 2%  -12.61%  (p=0.000 n=8+10)
AppendUintVarlen/1-4                     9.30ns ± 6%  9.15ns ± 1%     ~     (p=0.199 n=9+10)
AppendUintVarlen/12-4                    9.12ns ± 0%  9.16ns ± 2%     ~     (p=0.307 n=9+9)
AppendUintVarlen/123-4                   18.6ns ± 2%  18.7ns ± 0%     ~     (p=0.091 n=10+6)
AppendUintVarlen/1234-4                  19.1ns ± 4%  17.7ns ± 1%   -7.35%  (p=0.000 n=10+9)
AppendUintVarlen/12345-4                 21.5ns ± 3%  20.7ns ± 3%   -3.78%  (p=0.002 n=9+10)
AppendUintVarlen/123456-4                23.5ns ± 3%  20.9ns ± 1%  -11.14%  (p=0.000 n=10+9)
AppendUintVarlen/1234567-4               25.0ns ± 2%  23.6ns ± 7%   -5.48%  (p=0.004 n=9+10)
AppendUintVarlen/12345678-4              26.8ns ± 2%  23.4ns ± 2%  -12.79%  (p=0.000 n=9+10)
AppendUintVarlen/123456789-4             29.8ns ± 3%  26.5ns ± 5%  -11.03%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890-4            31.6ns ± 3%  26.9ns ± 3%  -14.95%  (p=0.000 n=10+9)
AppendUintVarlen/12345678901-4           33.8ns ± 3%  29.3ns ± 5%  -13.21%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012-4          35.5ns ± 4%  29.2ns ± 4%  -17.82%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890123-4         37.6ns ± 4%  31.4ns ± 3%  -16.48%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234-4        39.8ns ± 6%  32.0ns ± 7%  -19.60%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012345-4       40.7ns ± 0%  34.4ns ± 4%  -15.55%  (p=0.000 n=6+10)
AppendUintVarlen/1234567890123456-4      45.4ns ± 6%  35.1ns ± 4%  -22.66%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567-4     45.1ns ± 1%  36.7ns ± 4%  -18.77%  (p=0.000 n=9+10)
AppendUintVarlen/123456789012345678-4    46.9ns ± 0%  36.4ns ± 3%  -22.49%  (p=0.000 n=9+10)
AppendUintVarlen/1234567890123456789-4   50.6ns ± 6%  38.8ns ± 3%  -23.28%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567890-4  51.3ns ± 2%  38.4ns ± 0%  -25.00%  (p=0.000 n=9+8)

Benchmark results for GOARCH=386:

name                                     old time/op  new time/op  delta
FormatInt-4                              6.21µs ± 0%  6.14µs ± 0%  -1.11%  (p=0.008 n=5+5)
AppendInt-4                              4.95µs ± 0%  4.85µs ± 0%  -1.99%  (p=0.016 n=5+4)
FormatUint-4                             1.89µs ± 1%  1.83µs ± 1%  -2.94%  (p=0.008 n=5+5)
AppendUint-4                             1.59µs ± 0%  1.57µs ± 2%  -1.72%  (p=0.040 n=5+5)
FormatIntSmall-4                         8.48ns ± 0%  8.48ns ± 0%    ~     (p=0.905 n=5+5)
AppendIntSmall-4                         12.2ns ± 0%  12.2ns ± 0%    ~     (all equal)
AppendUintVarlen/1-4                     10.6ns ± 1%  10.7ns ± 0%    ~     (p=0.238 n=5+4)
AppendUintVarlen/12-4                    10.7ns ± 0%  10.7ns ± 1%    ~     (p=0.333 n=4+5)
AppendUintVarlen/123-4                   29.9ns ± 1%  30.2ns ± 0%  +1.07%  (p=0.016 n=5+4)
AppendUintVarlen/1234-4                  32.4ns ± 1%  30.4ns ± 0%  -6.30%  (p=0.008 n=5+5)
AppendUintVarlen/12345-4                 35.1ns ± 2%  34.9ns ± 0%    ~     (p=0.238 n=5+5)
AppendUintVarlen/123456-4                36.6ns ± 0%  35.3ns ± 0%  -3.55%  (p=0.029 n=4+4)
AppendUintVarlen/1234567-4               38.9ns ± 0%  39.6ns ± 0%  +1.80%  (p=0.029 n=4+4)
AppendUintVarlen/12345678-4              41.3ns ± 0%  40.1ns ± 0%  -2.91%  (p=0.000 n=5+4)
AppendUintVarlen/123456789-4             44.9ns ± 1%  44.8ns ± 0%    ~     (p=0.667 n=5+5)
AppendUintVarlen/1234567890-4            65.6ns ± 0%  66.2ns ± 1%  +0.88%  (p=0.016 n=4+5)
AppendUintVarlen/12345678901-4           77.9ns ± 0%  76.3ns ± 0%  -2.00%  (p=0.000 n=4+5)
AppendUintVarlen/123456789012-4          80.7ns ± 0%  79.1ns ± 1%  -2.01%  (p=0.008 n=5+5)
AppendUintVarlen/1234567890123-4         83.6ns ± 0%  80.2ns ± 1%  -4.07%  (p=0.008 n=5+5)
AppendUintVarlen/12345678901234-4        86.2ns ± 1%  83.3ns ± 0%  -3.39%  (p=0.008 n=5+5)
AppendUintVarlen/123456789012345-4       88.5ns ± 0%  83.7ns ± 0%  -5.42%  (p=0.008 n=5+5)
AppendUintVarlen/1234567890123456-4      90.6ns ± 0%  88.3ns ± 0%  -2.54%  (p=0.008 n=5+5)
AppendUintVarlen/12345678901234567-4     92.7ns ± 0%  89.0ns ± 1%  -4.01%  (p=0.008 n=5+5)
AppendUintVarlen/123456789012345678-4    95.6ns ± 1%  92.6ns ± 0%  -3.18%  (p=0.016 n=5+4)
AppendUintVarlen/1234567890123456789-4    118ns ± 0%   114ns ± 0%    ~     (p=0.079 n=4+5)
AppendUintVarlen/12345678901234567890-4   138ns ± 0%   136ns ± 0%  -1.45%  (p=0.008 n=5+5)

Updates #19445

Change-Id: Iafbe5c074898187c150dc3854e5b9fc19c10be05
Reviewed-on: https://go-review.googlesource.com/38255
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>

golang locked and limited conversation to collaborators Mar 22, 2018
