
encoding/base64: decoding is slow #19636

Open
josselin-c opened this issue Mar 21, 2017 · 9 comments

@josselin-c (Contributor) commented Mar 21, 2017

What version of Go are you using (go version)?

Go 1.8

What operating system and processor architecture are you using (go env)?

amd64

What did you do?

On my slow computer, using encoding/base64, I can decode data at ~100 MB/s.
It should be much faster, as shown by https://github.com/aklomp/base64

I'm planning to work on this in my spare time. This issue tracks this effort.
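
For reference, here is one quick way to reproduce a rough MB/s figure for the standard decoder. This is an illustrative, self-contained measurement program; the buffer size and approach are assumptions, not necessarily how the number above was obtained:

```go
// Decode one large base64 buffer and print the throughput of encoding/base64.
package main

import (
	"encoding/base64"
	"fmt"
	"time"
)

func main() {
	raw := make([]byte, 64<<20) // 64 MiB of zeros: large enough for a stable rate
	src := []byte(base64.StdEncoding.EncodeToString(raw))
	dst := make([]byte, base64.StdEncoding.DecodedLen(len(src)))

	start := time.Now()
	if _, err := base64.StdEncoding.Decode(dst, src); err != nil {
		panic(err)
	}
	elapsed := time.Since(start)
	fmt.Printf("decoded %d bytes of base64 at %.1f MB/s\n",
		len(src), float64(len(src))/elapsed.Seconds()/1e6)
}
```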

@gopherbot commented Mar 21, 2017

CL https://golang.org/cl/34950 mentions this issue.

@dgryski (Contributor) commented Mar 21, 2017

See also https://github.com/powturbo/TurboBase64 for a fast encoder/decoder that doesn't need assembly.

@ALTree changed the title from "encoding/base64 decoding is slow" to "encoding/base64: decoding is slow" Mar 21, 2017

@ALTree added the Performance label Mar 21, 2017

@bradfitz added this to the Go1.9Maybe milestone Mar 21, 2017

@gopherbot commented Mar 24, 2017

CL https://golang.org/cl/38632 mentions this issue.

@josselin-c (Contributor, Author) commented Mar 24, 2017

@dgryski I've seen such LUT-based implementations and I'm wondering whether they're acceptable. What's Go's philosophy about embedding "large" lookup tables? Won't such tables thrash the L1 cache when used and penalize the rest of the application?
I have an SSE implementation in the pipeline that will be a further improvement over CL 38632, but it's limited to amd64.
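
For context on the cache concern: a purely hypothetical pair-at-a-time LUT decoder (names and layout invented for illustration; this is not TurboBase64's or aklomp's actual design) needs a 64K-entry table, i.e. 128 KiB of uint16, which is several times the size of a typical L1 data cache:

```go
// Hypothetical illustration of why aggressive LUT decoders get big: decoding
// two characters per table lookup requires an entry for every character pair.
var pairLUT [1 << 16]uint16 // pairLUT[a<<8|b] = decode(a)<<6 | decode(b), or 0xFFFF if invalid

func init() {
	const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
	for i := range pairLUT {
		pairLUT[i] = 0xFFFF
	}
	for i := 0; i < 64; i++ {
		for j := 0; j < 64; j++ {
			pairLUT[int(alphabet[i])<<8|int(alphabet[j])] = uint16(i)<<6 | uint16(j)
		}
	}
}

// decode4 turns four base64 characters into three bytes using two lookups.
func decode4(dst, src []byte) bool {
	hi := pairLUT[int(src[0])<<8|int(src[1])]
	lo := pairLUT[int(src[2])<<8|int(src[3])]
	if hi == 0xFFFF || lo == 0xFFFF {
		return false // padding, newline or invalid byte: not handled here
	}
	v := uint32(hi)<<12 | uint32(lo) // 24 bits of decoded payload
	dst[0] = byte(v >> 16)
	dst[1] = byte(v >> 8)
	dst[2] = byte(v)
	return true
}
```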

gopherbot pushed a commit that referenced this issue Apr 24, 2017
encoding/base64: Optimize DecodeString
Optimize DecodeString for the common case where most of the input isn't
a newline or a padding character.
Also add some testcases found when fuzzing this implementation against
upstream.
Change Decode benchmark to run with different input sizes.

name                 old time/op    new time/op    delta
DecodeString/2-4       71.5ns ± 4%    70.0ns ± 6%     ~     (p=0.246 n=5+5)
DecodeString/4-4        112ns ±25%      91ns ± 2%     ~     (p=0.056 n=5+5)
DecodeString/8-4        136ns ± 5%     126ns ± 5%   -7.33%  (p=0.016 n=5+5)
DecodeString/64-4       872ns ±29%     652ns ±21%  -25.23%  (p=0.032 n=5+5)
DecodeString/8192-4    90.9µs ±21%    61.0µs ±13%  -32.87%  (p=0.008 n=5+5)

name                 old speed      new speed      delta
DecodeString/2-4     56.0MB/s ± 4%  57.2MB/s ± 6%     ~     (p=0.310 n=5+5)
DecodeString/4-4     73.4MB/s ±23%  87.7MB/s ± 2%     ~     (p=0.056 n=5+5)
DecodeString/8-4     87.8MB/s ± 5%  94.8MB/s ± 5%   +7.98%  (p=0.016 n=5+5)
DecodeString/64-4     103MB/s ±24%   136MB/s ±19%  +32.63%  (p=0.032 n=5+5)
DecodeString/8192-4   122MB/s ±19%   180MB/s ±11%  +47.75%  (p=0.008 n=5+5)

Improves #19636

Change-Id: I39667f4fb682a12b3137946d017ad999553c5780
Reviewed-on: https://go-review.googlesource.com/34950
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
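
For readers new to these tables: the DecodeString/2, DecodeString/8192, ... rows come from size-keyed sub-benchmarks, the -4 suffix is GOMAXPROCS, the MB/s column comes from b.SetBytes, and the old/new/delta layout is benchstat comparing two runs. A minimal sketch of such a benchmark (not the stdlib's exact code):

```go
package base64bench

import (
	"encoding/base64"
	"strconv"
	"testing"
)

// One sub-benchmark per decoded input size; SetBytes makes `go test -bench`
// report MB/s of encoded input consumed.
func BenchmarkDecodeString(b *testing.B) {
	for _, size := range []int{2, 4, 8, 64, 8192} {
		src := base64.StdEncoding.EncodeToString(make([]byte, size))
		b.Run(strconv.Itoa(size), func(b *testing.B) {
			b.SetBytes(int64(len(src)))
			for i := 0; i < b.N; i++ {
				if _, err := base64.StdEncoding.DecodeString(src); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```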

@bradfitz modified the milestones: Go1.10, Go1.9Maybe May 24, 2017

gopherbot pushed a commit that referenced this issue Oct 9, 2017
encoding/base64: optimize DecodeString
Optimize base64 decoding speed by adding 32-bit and 64-bit specialized
methods that don't perform any error checking and that fall back to the more
complex decodeQuantum method when a non-base64 character is present.

On a 64-bit CPU:

name                 old time/op    new time/op     delta
DecodeString/2-4       70.0ns ± 6%     69.2ns ± 0%     ~     (p=0.169 n=5+8)
DecodeString/4-4       91.3ns ± 2%     80.4ns ± 0%  -11.89%  (p=0.001 n=5+10)
DecodeString/8-4        126ns ± 5%      106ns ± 0%  -16.14%  (p=0.000 n=5+7)
DecodeString/64-4       652ns ±21%      361ns ± 0%  -44.57%  (p=0.000 n=5+7)
DecodeString/8192-4    61.0µs ±13%     31.5µs ± 1%  -48.38%  (p=0.001 n=5+9)

name                 old speed      new speed       delta
DecodeString/2-4     57.2MB/s ± 6%   57.7MB/s ± 2%     ~     (p=0.419 n=5+9)
DecodeString/4-4     87.7MB/s ± 2%   99.5MB/s ± 0%  +13.45%  (p=0.001 n=5+10)
DecodeString/8-4     94.8MB/s ± 5%  112.6MB/s ± 1%  +18.82%  (p=0.001 n=5+9)
DecodeString/64-4     136MB/s ±19%    243MB/s ± 0%  +78.17%  (p=0.003 n=5+7)
DecodeString/8192-4   180MB/s ±11%    347MB/s ± 1%  +92.94%  (p=0.001 n=5+9)

Improves #19636

Change-Id: Ic10a454851093a7e1d46ca0c140deed73535d990
Reviewed-on: https://go-review.googlesource.com/38632
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
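
A rough sketch of the specialized fast path this commit describes (simplified and hypothetical: the parameter shape and exact packing are assumptions, not the stdlib source). It decodes eight characters at a time with no error handling; any byte that the decode table maps to 0xff, including '\n', '\r', '=' and genuinely invalid bytes, makes it report failure so the caller can fall back to decodeQuantum:

```go
// Sketch only: decode 8 base64 characters into the top 48 bits of a uint64.
// decodeMap maps each alphabet character to its 6-bit value and everything
// else (newlines, padding, invalid bytes) to 0xff.
func decode64(decodeMap *[256]byte, src []byte) (dn uint64, ok bool) {
	if len(src) < 8 {
		return 0, false
	}
	for i := 0; i < 8; i++ {
		d := decodeMap[src[i]]
		if d == 0xff {
			return 0, false // not a plain base64 character: take the slow path
		}
		dn = dn<<6 | uint64(d)
	}
	return dn << 16, true // 48 decoded bits, left-aligned in the word
}
```

The caller can then write the result with binary.BigEndian.PutUint64 and advance the output by six bytes, since eight input characters decode to six output bytes.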

@bradfitz modified the milestones: Go1.10, Go1.11 Nov 15, 2017

@powturbo commented Feb 6, 2018

@josselin-c: I've seen such LUT-based implementations and I'm wondering whether they're acceptable.

Strange rumours: in fact, the short scalar "turbob64encs" encoding function uses only 64 bytes of LUT, and the "turbob64decs" decoding function uses just 80 bytes (of a 256-byte LUT) while delivering 2 GB/s in practical scenarios. That is less memory than other SIMD/AVX2 base64 functions use. The fast "turbob64dec" function in fact needs less than 1 KB and decodes faster than other scalar/SSE functions. See the TurboBase64 benchmarks.

quasilyte added a commit to quasilyte/go-contributing-ru that referenced this issue Apr 1, 2018
tasks: add several new tracked tasks
New tasks include:
golang/go#19675 cmd/vet: report uses of -0 in float32/64 context
golang/go#19683 cmd/compile: eliminate usages of global lineno
golang/go#19670 x/tools/go/ssa: make opaqueType less annoying to use
golang/go#19636 encoding/base64: decoding is slow
golang/go#23471 x/perf/cmd/benchstat: tips or quickstart for newcomers
golang/go#19577 test: errorcheck support for intraline errors
golang/go#19490 cmd/vet: reduce the amount of false positives for -shadow mode
golang/go#19042 cmd/internal/obj: optimize wrapper method prologue for branch prediction
golang/go#19013 cmd/compile: add tool for understanding/debugging SSA rules

@bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018

@gopherbot commented May 18, 2018

Change https://golang.org/cl/113776 mentions this issue: encoding/base64: slight decoding speed-up

gopherbot pushed a commit that referenced this issue Aug 22, 2018
encoding/base64: slight decoding speed-up
First, use a dummy slice access on decode64 and decode32 to ensure that
there is a single bounds check for src.

Second, move the PutUint64/PutUint32 calls out of these functions,
meaning that they are simpler and smaller. This may also open the door
to inlineability in the future, but for now, they both go past the
budget.

While at it, get rid of the ilen and olen variables, which have no
impact whatsoever on performance. At least, not measurable by any of the
benchmarks.

name                 old time/op    new time/op    delta
DecodeString/2-4       54.3ns ± 1%    55.2ns ± 2%   +1.60%  (p=0.017 n=5+6)
DecodeString/4-4       66.6ns ± 1%    66.8ns ± 2%     ~     (p=0.903 n=6+6)
DecodeString/8-4       79.3ns ± 2%    79.6ns ± 1%     ~     (p=0.448 n=6+6)
DecodeString/64-4       300ns ± 1%     281ns ± 3%   -6.54%  (p=0.002 n=6+6)
DecodeString/8192-4    27.4µs ± 1%    23.7µs ± 2%  -13.47%  (p=0.002 n=6+6)

name                 old speed      new speed      delta
DecodeString/2-4     73.7MB/s ± 1%  72.5MB/s ± 2%   -1.55%  (p=0.026 n=5+6)
DecodeString/4-4      120MB/s ± 1%   120MB/s ± 2%     ~     (p=0.851 n=6+6)
DecodeString/8-4      151MB/s ± 2%   151MB/s ± 1%     ~     (p=0.485 n=6+6)
DecodeString/64-4     292MB/s ± 1%   313MB/s ± 3%   +7.03%  (p=0.002 n=6+6)
DecodeString/8192-4   399MB/s ± 1%   461MB/s ± 2%  +15.58%  (p=0.002 n=6+6)

For #19636.

Change-Id: I0dfbdafa2a41dc4c582f63aef94b90b8e473731c
Reviewed-on: https://go-review.googlesource.com/113776
Reviewed-by: Ian Lance Taylor <iant@golang.org>
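
The "dummy slice access" mentioned in this commit is a common Go bounds-check-elimination idiom; a generic illustration (not the CL's actual code):

```go
// The early read of src[7] both enforces the length requirement up front and
// lets the compiler prove that the eight indexed loads below are in bounds,
// so they compile without individual bounds checks.
func or8(src []byte) byte {
	_ = src[7] // bounds check hint: panics here if len(src) < 8
	return src[0] | src[1] | src[2] | src[3] |
		src[4] | src[5] | src[6] | src[7]
}
```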

@gopherbot commented Nov 25, 2018

Change https://golang.org/cl/151177 mentions this issue: encoding/base64: lift nil check out of decode loop

@gopherbot commented Nov 25, 2018

Change https://golang.org/cl/151197 mentions this issue: encoding/base64: make decode32/decode64 inlineable

@mvdan (Member) commented Nov 25, 2018

With the two CLs above, the decoder goes from ~500MB/s to ~630MB/s on an 8KiB input. Note that this is on a 2014 ultrabook locked at 70% cpu frequency, to prevent overheating and throttling.

The only remaining bottleneck in the pure Go code that I can see is how it has more bounds checks than needed per decoded chunk of bytes. A couple for every 8 bytes should be enough, but it currently does at least eight. I think if we can fix that via #28942, it should give another nice 5-10% speed-up.

gopherbot pushed a commit that referenced this issue Mar 13, 2019
encoding/base64: speed up the decoder
Most of the decoding time is spent in the first Decode loop, since the
rest of the function only deals with the few remaining bytes. Any
unnecessary work done in that loop body matters tremendously.

One such unnecessary bottleneck was the use of the enc.decodeMap table.
Since enc is a pointer receiver, and the field is used within the
non-inlineable function decode64, the decoder must perform a nil check
at every iteration.

To fix that, move the enc.decodeMap uses to the parent function, where
we can lift the nil check outside the loop. That gives roughly a 15%
speed-up. The function no longer performs decoding per se, so rename it.
While at it, remove the now unnecessary receivers.

An unfortunate side effect of this change is that the loop now contains
eight bounds checks on src instead of just one. However, not having to
slice src plus the nil check removal well outweigh the added cost.

The other piece that made decode64 slow was that it wasn't inlined, and
had multiple branches. Use a simple bitwise-or trick suggested by Roger
Peppe, and collapse the rest of the bitwise logic into a single
expression. Inlinability and the reduced branching give a further 10%
speed-up.

Finally, add these two functions to TestIntendedInlining, since we want
them to stay inlinable.

Apply the same refactor to decode32 for consistency, and to let 32-bit
architectures see a similar performance gain for large inputs.

name                 old time/op    new time/op    delta
DecodeString/2-8       47.3ns ± 1%    45.8ns ± 0%   -3.28%  (p=0.002 n=6+6)
DecodeString/4-8       55.8ns ± 2%    51.5ns ± 0%   -7.71%  (p=0.004 n=5+6)
DecodeString/8-8       64.9ns ± 0%    61.7ns ± 0%   -4.99%  (p=0.004 n=5+6)
DecodeString/64-8       238ns ± 0%     198ns ± 0%  -16.54%  (p=0.002 n=6+6)
DecodeString/8192-8    19.5µs ± 0%    14.6µs ± 0%  -24.96%  (p=0.004 n=6+5)

name                 old speed      new speed      delta
DecodeString/2-8     84.6MB/s ± 1%  87.4MB/s ± 0%   +3.38%  (p=0.002 n=6+6)
DecodeString/4-8      143MB/s ± 2%   155MB/s ± 0%   +8.41%  (p=0.004 n=5+6)
DecodeString/8-8      185MB/s ± 0%   195MB/s ± 0%   +5.29%  (p=0.004 n=5+6)
DecodeString/64-8     369MB/s ± 0%   442MB/s ± 0%  +19.78%  (p=0.002 n=6+6)
DecodeString/8192-8   560MB/s ± 0%   746MB/s ± 0%  +33.27%  (p=0.004 n=6+5)

Updates #19636.

Change-Id: Ib839577b0e3f5a2bb201f5cae580c61365d92894
Reviewed-on: https://go-review.googlesource.com/c/go/+/151177
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: roger peppe <rogpeppe@gmail.com>
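
A sketch of the inlineable helper described above (the exact shift layout here is an assumption, not a quote of the stdlib source). The caller has already looked up the eight 6-bit values through decodeMap, so the nil check on the Encoding happens once, outside the loop. Valid values are at most 0x3f while invalid input maps to 0xff, so the bitwise OR equals 0xff exactly when some input byte was invalid:

```go
// Pack eight 6-bit values into the top 48 bits of a uint64, or report failure
// if any of them is the 0xff "invalid" sentinel.
func assemble64(n1, n2, n3, n4, n5, n6, n7, n8 byte) (dn uint64, ok bool) {
	if n1|n2|n3|n4|n5|n6|n7|n8 == 0xff {
		return 0, false
	}
	return uint64(n1)<<58 | uint64(n2)<<52 | uint64(n3)<<46 | uint64(n4)<<40 |
		uint64(n5)<<34 | uint64(n6)<<28 | uint64(n7)<<22 | uint64(n8)<<16, true
}
```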