Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/base64: encoding is slow #20206

markdryan opened this issue May 2, 2017 · 9 comments

encoding/base64: encoding is slow #20206

markdryan opened this issue May 2, 2017 · 9 comments
NeedsFix The path to resolution is known, but the work has not been done. Performance


Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go 1.8.1

What operating system and processor architecture are you using (go env)?

amd64, darwin

What did you do?

go test encoding/base64 --bench=BenchmarkEncodeToString

What did you expect to see?

It should be possible to increase the speed of the encoder by 3 or 4 times by using SIMD instructions. So ideally, I'd like to see something like

go test --bench=BenchmarkEncodeToString
BenchmarkEncodeToString-8 500000 3777 ns/op 2168.51 MB/s

What did you see instead?

BenchmarkEncodeToString-8 200000 11393 ns/op 719.01 MB/s
ok encoding/base64 2.406s

Copy link
Contributor Author

The following patch

optimises the base64 encoder to use SSSE3 instructions on amd64. This new SSSE3 encoder is 3 times faster than the existing Go encoder on a Mac Book Pro with an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz running Sierra.

    name              old time/op   new time/op    delta
    EncodeToString-8   11.2µs ± 0%     3.7µs ± 1%   -66.70%  (p=0.008 n=5+5)
    name              old speed     new speed      delta
    EncodeToString-8  730MB/s ± 0%  2191MB/s ± 1%  +200.26%  (p=0.008 n=5+5)

It is however, slightly slower, ~5%, on amd64 when used to encode data less than 12 bytes in size. This is mainly due to the extra function call, encodeAccelerated, which is not inlined on amd64. If this is not acceptable, I could completely replace Encoding.Encode on amd64, but doing so would involve some code duplication.

I'm also working on an AVX2 version of the encoder which I will submit once the SSSE3 patch has been reviewed.

@ALTree ALTree changed the title base64 encoding is slow encoding/base64: encoding is slow May 2, 2017
Copy link

mvdan commented May 2, 2017

It is however, slightly slower, ~5%, on amd64 when used to encode data less than 12 bytes in size. This is mainly due to the extra function call, encodeAccelerated, which is not inlined on amd64.

If you're talking about the assembly function not being inlineable, I think that's expected now. But not sure if can be fixed in the future.

You could always try a hybrid that uses SIMD for sizes >=N, and the existing code for the rest. An if should be cheaper than a function call. But I'd say the 5% is acceptable if it means better performance for larger lengths and less complexity.

Since we're in the freeze though, this would be for 1.10.

Copy link

CL mentions this issue.

@bradfitz bradfitz added this to the Go1.10 milestone May 2, 2017
@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label May 2, 2017
Copy link
Contributor Author

I guess the main argument for optimizing the current base64 package would be that it appears to be quite widely used1 and yet is 3-4 times slower on modern hardware than it needs to be. The problem is though, to get this 3-4x speedup on a single core, we would need to use SIMD optimized algorithms. I don't currently see a way to take advantage of the tricks used by these algorithms in pure Go, hence the assembly language patch.

I appreciate assembly language comes with a maintenance burden, as noted in the comments on the patch in gerrit, and that this probably makes the patch un-mergeable. It may still make sense to keep the bug open, however, given the relative poor performance of the current encoder. Note, there is a similar issue (#19636) that references the same SIMD algorithms for the base64 decoder.

1 At the time of writing, Godoc reports 11773 imports of encoding/base64. To put this in context there are 1131 importers of aes, 3065 of encoding/gob and 56012 of encoding/json.

Copy link

bradfitz commented May 5, 2017

Based on Google-Wide Profiling at least, base64 decode seems much hotter than encoding. For base64 standard library code:

62.23% encoding/base64.(*Encoding).decode
17.92% encoding/base64.(*Encoding).Encode
10.16% encoding/base64.(*Encoding).DecodeString
 8.71% encoding/base64.(*Encoding).EncodeToString 
 0.87% encoding/base64.(*Encoding).Decode

Assuming this is representative of usage as a whole (which seems somewhat fair, considering this is sampling all Go applications inside Google), effort would be best spent optimizing decode.

We can definitely keep this tracking bug open, though. It just might not deserve assembly.

Copy link

I have some SSE code for the decoding that give a nice perf boost (~3GB/s) without using big lookup tables but I stopped working on that code after my patch for SSE bytes counting was accepted as I didn't want to add yet more assembly code to go.
Now I think I'll finish my work on the decode and submit it anyway so we can discuss actual code and decide if we want it or not.

Copy link

lemire commented Oct 30, 2017

While in CPU cache, you should be able to encode and decode base64 at something like 0.25 CPU cycles per byte or around 10 GB/s on a recent Intel/AMD processor. It should be so fast that you should literally have a hard time measuring the running time. It is only a few times slower than a memory copy.

The following paper might be relevant... Faster Base64 Encoding and Decoding Using AVX2 Instructions:

Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the efficiency of base64 encoding and decoding. Compared to state-of-the-art implementations, we multiply the speeds of both the encoding (~10x) and the decoding (~7x). We achieve these good results by using the single-instruction-multiple-data (SIMD) instructions available on recent Intel processors (AVX2). Our accelerated software abides by the specification and reports errors when encountering characters outside of the base64 set. It is available online as free software under a liberal license.

c.c. @WojciechMula

@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 15, 2017
@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018
Copy link

mvdan commented Nov 25, 2018

I think there's still quite a bit to be optimized in pure Go. For example, I just found a 6% performance win with a single line to get rid of a nil check in the encoding hot loop :) I'll send that CL shortly.

Copy link

Change mentions this issue: encoding/base64: lift nil check out of encode loop

gopherbot pushed a commit that referenced this issue Mar 3, 2019
Most of the encoding time is spent in the first Encode loop, since the
rest of the function only deals with the few remaining bytes. Any
unnecessary work done in that loop body matters tremendously.

One such unnecessary bottleneck was the use of the enc.encode table.
Since enc is a pointer receiver, and the field is first used within the
loop, the encoder must perform a nil check at every iteration.

Add a dummy use of the field before the start of the loop, to move the
nil check there. After that line, the compiler now knows that enc can't
be nil, and thus the hot loop is free of nil checks.

name              old time/op    new time/op    delta
EncodeToString-4    14.7µs ± 0%    13.7µs ± 1%  -6.53%  (p=0.000 n=10+10)

name              old speed      new speed      delta
EncodeToString-4   559MB/s ± 0%   598MB/s ± 1%  +6.99%  (p=0.000 n=10+10)

Updates #20206.

Change-Id: Icbb523a7bd9e470a8be0a448d1d78ade97ed4ff6
Run-TryBot: Daniel Martí <>
TryBot-Result: Gobot Gobot <>
Reviewed-by: Brad Fitzpatrick <>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
NeedsFix The path to resolution is known, but the work has not been done. Performance
None yet

No branches or pull requests

6 participants