New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/lzw: reduce garbage in encoder #26535

Open
ericpauley opened this Issue Jul 22, 2018 · 3 comments

Comments

Projects
None yet
3 participants
@ericpauley
Contributor

ericpauley commented Jul 22, 2018

compress/lzw encoders maintain a rather large table (64k) mapping uncompressed bytes to their compressed bits. Because the encoder requires a Close per stream they cannot be reused for multiple streams. This means that lzw creates 64k of garbage per use, which is especially apparent when encoding animation with image/gif.

While we can't easily change the API in go1 to support reuse, we can greatly reduce garbage by pooling the table parameter of encoders. I prototyped this behavior and got the following benchmarks:

name           old time/op    new time/op    delta
Encoder/1e4-8     108µs ± 1%     109µs ± 2%   +1.25%  (p=0.008 n=9+10)
Encoder/1e5-8    1.06ms ± 2%    1.11ms ± 2%   +4.15%  (p=0.000 n=10+9)
Encoder/1e6-8    10.5ms ± 2%    11.0ms ± 2%   +4.37%  (p=0.000 n=10+9)

name           old speed      new speed      delta
Encoder/1e4-8  93.0MB/s ± 1%  91.8MB/s ± 2%   -1.23%  (p=0.008 n=9+10)
Encoder/1e5-8  94.1MB/s ± 2%  90.4MB/s ± 2%   -3.98%  (p=0.000 n=10+9)
Encoder/1e6-8  95.2MB/s ± 2%  91.3MB/s ± 2%   -4.18%  (p=0.000 n=10+9)

name           old alloc/op   new alloc/op   delta
Encoder/1e4-8    77.9kB ± 0%     4.4kB ± 0%  -94.35%  (p=0.000 n=10+9)
Encoder/1e5-8    77.9kB ± 0%     4.4kB ± 0%  -94.33%  (p=0.000 n=10+10)
Encoder/1e6-8    77.9kB ± 0%     5.0kB ± 0%  -93.60%  (p=0.000 n=10+10)

name           old allocs/op  new allocs/op  delta
Encoder/1e4-8      3.00 ± 0%      4.00 ± 0%  +33.33%  (p=0.000 n=10+10)
Encoder/1e5-8      3.00 ± 0%      4.00 ± 0%  +33.33%  (p=0.000 n=10+10)
Encoder/1e6-8      3.00 ± 0%      4.00 ± 0%  +33.33%  (p=0.000 n=10+10)

(Note that these are on top of my existing CL https://go-review.googlesource.com/c/go/+/123478 that improves time/op significantly)

I don't see much use of pools in the standard library, is this not generally a good approach? Even if it is, is the memory overhead worth it for the 4% performance increase?

@dsnet

This comment has been minimized.

Member

dsnet commented Jul 23, 2018

Pools have subtle properties that affect their performance (see #23199 and #22950). While #23199 isn't an issue here since the object is fixed size, #22950 can have adverse affect in all usages of pools in certain workloads.

That being said, why do you say:

While we can't easily change the API in go1 to support reuse

Is that because the API returns io.ReadCloser and io.WriteCloser? A similar problem existed for compress/flate and it was resolved by defining the flate.Resetter interface.

@bcmills bcmills added this to the Go1.12 milestone Jul 23, 2018

@bcmills

This comment has been minimized.

Member

bcmills commented Jul 23, 2018

A pool seems like a poor fit for this use-case, since the lzw's ReadCloser and WriteCloser implementations are not documented to be safe for concurrent use anyway: a pool would add a contended cache line for an API that otherwise needs no cross-core synchronization.

A Resetter interface seems preferable.

@ericpauley

This comment has been minimized.

Contributor

ericpauley commented Jul 25, 2018

Thanks for the insight, a Resetter interface definitely seems like the way to go.

There's also the question of compress/lzw's main consumer, image/gif. As intended, this would reduce the allocation of multi-frame gif encoding significantly. (With a few internal modifications to persist the writer between calls to writeImageBlock. This wouldn't have any impact on the single-frame case, and wouldn't completely remove allocation from gif.EncodeAll.

One option would be for image/gif to have an lzw encoder Pool. Since all encoder use is internal there wouldn't be the same interface concerns as within lzw. Since there's no persistent encoder object there doesn't seem to be a good way to implement a Resetter interface like with lzw.

Maybe 64k garbage is okay for an image encoder, in which case the lzw change may be enough. (Personally I'd prefer that image/gif be near 0 quiescent allocation but that's for a niche use case)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment