Commit 0a5dc67

[klauspost/deflate-improve-comp] compress/flate: improve compression speed
Fixes #75532

This improves the compression speed of the flate package. It is a cleaned-up version of github.com/klauspost/compress/flate.

Overall changes:

* Compression levels 2-6 are custom implementations.
* Compression levels 7-9 are tweaked to match levels 2-6, with minor improvements.
* Tokens are encoded and indexed as they are added.
* Huffman encoding attempts to continue blocks instead of always starting a new one.
* Loads/stores are in separate functions and can be made to use unsafe.

Overall, this attempts to better balance the compression levels, which tended to have little spread at the top. The intention is to place "default" at the point where performance drops off considerably without a proportional improvement in compression ratio. In my package I have set 5 as the default, but this keeps it at level 6.

There are benchmarks using the standard library's benchmark data below. I do not think it is a particularly good representation of different data types, so I have also benchmarked various data types and compiled the results at https://stdeflate.klauspost.com/

The main focus has been on level 1 (fastest), levels 5+6 (default) and level 9 (smallest). Levels outside of these are rarely used, but they should still fill their roles reasonably. Level 9 attempts more aggressive compression, but will also typically be slightly slower than before. I hope the graphs above show that focusing on a few data types doesn't always give the full picture.

My own observations: Levels 1 and 2 often "trade places" depending on the data type. Since level 1 usually compresses the least of the two - and is mostly slightly faster, with lower memory usage - it is placed as the lowest. The switchover between levels 6 and 7 is not always smooth, since the search method changes significantly. Random data is now ~100x faster on levels 2-6, and ~3x faster on levels 7-9. Pre-compressed data can be fed in with no significant speed penalty.
"Unsafe" operations have been removed for now. They can trivially be added back; this is an approximately 10% speed penalty.

benchmark                                    old ns/op    new ns/op    delta
BenchmarkEncode/Digits/Huffman/1e4-32        11431        8001         -30.01%
BenchmarkEncode/Digits/Huffman/1e5-32        123175       74780        -39.29%
BenchmarkEncode/Digits/Huffman/1e6-32        1260402      750022       -40.49%
BenchmarkEncode/Digits/Speed/1e4-32          35100        23758        -32.31%
BenchmarkEncode/Digits/Speed/1e5-32          675355       385954       -42.85%
BenchmarkEncode/Digits/Speed/1e6-32          6878375      4873784      -29.14%
BenchmarkEncode/Digits/Default/1e4-32        63411        40974        -35.38%
BenchmarkEncode/Digits/Default/1e5-32        1815762      801563       -55.86%
BenchmarkEncode/Digits/Default/1e6-32        18875894     8101836      -57.08%
BenchmarkEncode/Digits/Compression/1e4-32    63859        85275        +33.54%
BenchmarkEncode/Digits/Compression/1e5-32    1803745      2752174      +52.58%
BenchmarkEncode/Digits/Compression/1e6-32    18931995     30727403     +62.30%
BenchmarkEncode/Newton/Huffman/1e4-32        15770        11108        -29.56%
BenchmarkEncode/Newton/Huffman/1e5-32        134567       85103        -36.76%
BenchmarkEncode/Newton/Huffman/1e6-32        1663889      1030186      -38.09%
BenchmarkEncode/Newton/Speed/1e4-32          32749        22934        -29.97%
BenchmarkEncode/Newton/Speed/1e5-32          565609       336750       -40.46%
BenchmarkEncode/Newton/Speed/1e6-32          5996011      3815437      -36.37%
BenchmarkEncode/Newton/Default/1e4-32        70505        34148        -51.57%
BenchmarkEncode/Newton/Default/1e5-32        2374066      570673       -75.96%
BenchmarkEncode/Newton/Default/1e6-32        24562355     5975917      -75.67%
BenchmarkEncode/Newton/Compression/1e4-32    71505        77670        +8.62%
BenchmarkEncode/Newton/Compression/1e5-32    3345768      3730804      +11.51%
BenchmarkEncode/Newton/Compression/1e6-32    35770364     39768939     +11.18%

benchmark                                    old MB/s     new MB/s     speedup
BenchmarkEncode/Digits/Huffman/1e4-32        874.80       1249.91      1.43x
BenchmarkEncode/Digits/Huffman/1e5-32        811.86       1337.25      1.65x
BenchmarkEncode/Digits/Huffman/1e6-32        793.40       1333.29      1.68x
BenchmarkEncode/Digits/Speed/1e4-32          284.90       420.91       1.48x
BenchmarkEncode/Digits/Speed/1e5-32          148.07       259.10       1.75x
BenchmarkEncode/Digits/Speed/1e6-32          145.38       205.18       1.41x
BenchmarkEncode/Digits/Default/1e4-32        157.70       244.06       1.55x
BenchmarkEncode/Digits/Default/1e5-32        55.07        124.76       2.27x
BenchmarkEncode/Digits/Default/1e6-32        52.98        123.43       2.33x
BenchmarkEncode/Digits/Compression/1e4-32    156.59       117.27       0.75x
BenchmarkEncode/Digits/Compression/1e5-32    55.44        36.33        0.66x
BenchmarkEncode/Digits/Compression/1e6-32    52.82        32.54        0.62x
BenchmarkEncode/Newton/Huffman/1e4-32        634.13       900.25       1.42x
BenchmarkEncode/Newton/Huffman/1e5-32        743.12       1175.04      1.58x
BenchmarkEncode/Newton/Huffman/1e6-32        601.00       970.70       1.62x
BenchmarkEncode/Newton/Speed/1e4-32          305.35       436.03       1.43x
BenchmarkEncode/Newton/Speed/1e5-32          176.80       296.96       1.68x
BenchmarkEncode/Newton/Speed/1e6-32          166.78       262.09       1.57x
BenchmarkEncode/Newton/Default/1e4-32        141.83       292.84       2.06x
BenchmarkEncode/Newton/Default/1e5-32        42.12        175.23       4.16x
BenchmarkEncode/Newton/Default/1e6-32        40.71        167.34       4.11x
BenchmarkEncode/Newton/Compression/1e4-32    139.85       128.75       0.92x
BenchmarkEncode/Newton/Compression/1e5-32    29.89        26.80        0.90x
BenchmarkEncode/Newton/Compression/1e6-32    27.96        25.15        0.90x

Static Memory Usage:

Before:
Level -2: Memory Used: 704KB, 8 allocs
Level -1: Memory Used: 776KB, 7 allocs
Level 0: Memory Used: 704KB, 7 allocs
Level 1: Memory Used: 1160KB, 13 allocs
Level 2: Memory Used: 776KB, 8 allocs
Level 3: Memory Used: 776KB, 8 allocs
Level 4: Memory Used: 776KB, 8 allocs
Level 5: Memory Used: 776KB, 8 allocs
Level 6: Memory Used: 776KB, 8 allocs
Level 7: Memory Used: 776KB, 8 allocs
Level 8: Memory Used: 776KB, 9 allocs
Level 9: Memory Used: 776KB, 8 allocs

After:
Level -2: Memory Used: 272KB, 12 allocs
Level -1: Memory Used: 1016KB, 7 allocs
Level 0: Memory Used: 304KB, 6 allocs
Level 1: Memory Used: 760KB, 13 allocs
Level 2: Memory Used: 1144KB, 8 allocs
Level 3: Memory Used: 1144KB, 8 allocs
Level 4: Memory Used: 888KB, 14 allocs
Level 5: Memory Used: 1016KB, 8 allocs
Level 6: Memory Used: 1016KB, 8 allocs
Level 7: Memory Used: 952KB, 7 allocs
Level 8: Memory Used: 952KB, 7 allocs
Level 9: Memory Used: 1080KB, 9 allocs

This package has been fuzz tested for about 24 hours. Currently, there is about 1h between new "interesting" finds.

Change-Id: Icb4c9839dc8f1bb96fd6d548038679a7554a559b
1 parent 3cf1aaf commit 0a5dc67


45 files changed (+3894, -1537 lines)

src/compress/flate/deflate.go

Lines changed: 507 additions & 351 deletions

src/compress/flate/deflate_test.go

Lines changed: 148 additions & 555 deletions

src/compress/flate/deflatefast.go

Lines changed: 129 additions & 263 deletions

src/compress/flate/dict_decoder.go

Lines changed: 5 additions & 6 deletions
@@ -104,10 +104,7 @@ func (dd *dictDecoder) writeCopy(dist, length int) int {
 	dstBase := dd.wrPos
 	dstPos := dstBase
 	srcPos := dstPos - dist
-	endPos := dstPos + length
-	if endPos > len(dd.hist) {
-		endPos = len(dd.hist)
-	}
+	endPos := min(dstPos+length, len(dd.hist))

 	// Copy non-overlapping section after destination position.
 	//
@@ -160,8 +157,10 @@ func (dd *dictDecoder) tryWriteCopy(dist, length int) int {
 	srcPos := dstPos - dist

 	// Copy possibly overlapping section before destination position.
-	for dstPos < endPos {
-		dstPos += copy(dd.hist[dstPos:endPos], dd.hist[srcPos:dstPos])
+loop:
+	dstPos += copy(dd.hist[dstPos:endPos], dd.hist[srcPos:dstPos])
+	if dstPos < endPos {
+		goto loop // Avoid for-loop so that this function can be inlined
 	}

 	dd.wrPos = dstPos
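The repeated-copy pattern that the goto rewrite preserves can be shown in isolation. This standalone sketch (the function name and signature are mine, not from the diff) expands a match that overlaps its own output, which is how LZ77 run-length copies work: each pass of copy doubles the amount of valid source data.

```go
package main

import "fmt"

// expandCopy appends length bytes to hist, copying from dist bytes back,
// in the same style as dictDecoder's copy loops: when source and
// destination overlap, repeated copy calls grow the valid region.
func expandCopy(hist []byte, dist, length int) []byte {
	srcPos := len(hist) - dist
	hist = append(hist, make([]byte, length)...)
	endPos := len(hist)
	dstPos := endPos - length
	for dstPos < endPos {
		dstPos += copy(hist[dstPos:endPos], hist[srcPos:dstPos])
	}
	return hist
}

func main() {
	// A distance of 2 over a length of 6 keeps repeating the last two bytes.
	fmt.Printf("%s\n", expandCopy([]byte("ab"), 2, 6)) // abababab
}
```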

src/compress/flate/example_test.go

Lines changed: 2 additions & 1 deletion
@@ -93,7 +93,7 @@ func Example_dictionary() {
 	var b bytes.Buffer

 	// Compress the data using the specially crafted dictionary.
-	zw, err := flate.NewWriterDict(&b, flate.DefaultCompression, []byte(dict))
+	zw, err := flate.NewWriterDict(&b, flate.BestCompression, []byte(dict))
 	if err != nil {
 		log.Fatal(err)
 	}
@@ -168,6 +168,7 @@ func Example_synchronization() {
 	wg.Add(1)
 	go func() {
 		defer wg.Done()
+		defer wp.Close()

 		zw, err := flate.NewWriter(wp, flate.BestSpeed)
 		if err != nil {

src/compress/flate/fuzz_test.go

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+package flate
+
+import (
+	"bytes"
+	"flag"
+	"io"
+	"os"
+	"strconv"
+	"testing"
+)
+
+// Fuzzing tweaks:
+var fuzzStartF = flag.Int("start", HuffmanOnly, "Start fuzzing at this level")
+var fuzzEndF = flag.Int("end", BestCompression, "End fuzzing at this level (inclusive)")
+var fuzzMaxF = flag.Int("max", 1<<20, "Maximum input size")
+
+func TestMain(m *testing.M) {
+	flag.Parse()
+	os.Exit(m.Run())
+}
+
+// FuzzEncoding tests the fuzzer by doing roundtrips.
+// Every input is run through the fuzzer at every level.
+// Note: When running the fuzzer, it may hit the 10-second timeout on slower CPUs.
+func FuzzEncoding(f *testing.F) {
+	startFuzz := *fuzzStartF
+	endFuzz := *fuzzEndF
+	maxSize := *fuzzMaxF
+
+	decoder := NewReader(nil)
+	buf, buf2 := new(bytes.Buffer), new(bytes.Buffer)
+	encs := make([]*Writer, endFuzz-startFuzz+1)
+	for i := range encs {
+		var err error
+		encs[i], err = NewWriter(nil, i+startFuzz)
+		if err != nil {
+			f.Fatal(err.Error())
+		}
+	}
+
+	f.Fuzz(func(t *testing.T, data []byte) {
+		if len(data) > maxSize {
+			return
+		}
+		for level := startFuzz; level <= endFuzz; level++ {
+			if level == DefaultCompression {
+				continue // Already covered.
+			}
+			msg := "level " + strconv.Itoa(level) + ":"
+			buf.Reset()
+			fw := encs[level-startFuzz]
+			fw.Reset(buf)
+			n, err := fw.Write(data)
+			if n != len(data) {
+				t.Fatal(msg + "short write")
+			}
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			err = fw.Close()
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			compressed := buf.Bytes()
+			err = decoder.(Resetter).Reset(buf, nil)
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			data2, err := io.ReadAll(decoder)
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			if !bytes.Equal(data, data2) {
+				t.Fatal(msg + "decompressed not equal")
+			}
+
+			// Do it again...
+			msg = "level " + strconv.Itoa(level) + " (reset):"
+			buf2.Reset()
+			fw.Reset(buf2)
+			n, err = fw.Write(data)
+			if n != len(data) {
+				t.Fatal(msg + "short write")
+			}
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			err = fw.Close()
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			compressed2 := buf2.Bytes()
+			err = decoder.(Resetter).Reset(buf2, nil)
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			data2, err = io.ReadAll(decoder)
+			if err != nil {
+				t.Fatal(msg + err.Error())
+			}
+			if !bytes.Equal(data, data2) {
+				t.Fatal(msg + "decompressed not equal")
+			}
+			// Determinism checks will usually not be reproducible,
+			// since it often relies on the internal state of the compressor.
+			if !bytes.Equal(compressed, compressed2) {
+				t.Fatal(msg + "non-deterministic output")
+			}
+		}
+	})
+}
