
compress/flate: deflatefast produces corrupted output #41420

Closed
k-ye opened this issue Sep 16, 2020 · 38 comments
@k-ye k-ye commented Sep 16, 2020

What version of Go are you using (go version)?

$ go version

go version go1.15 linux/amd64

Does this issue reproduce with the latest release?

I'm able to repro this since go1.15, up to and including the latest go1.15.2.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE="off"
GOARCH="amd64"
GOBIN="/usr/local/google/home/yekuang/infra/go/bin"
GOCACHE="/usr/local/google/home/yekuang/infra/go/.cache"
GOENV="/usr/local/google/home/yekuang/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/usr/local/google/home/yekuang/infra/go/.vendor/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/google/home/yekuang/infra/go/.vendor:/usr/local/google/home/yekuang/infra/go"
GOPRIVATE=""
GOPROXY="off"
GOROOT="/usr/local/google/home/yekuang/golang/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/yekuang/golang/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build699870689=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We compressed the same data using zlib.NewWriterLevel(writer, zlib.BestSpeed) and found that the output differed between go1.15 and go1.14.2. This difference eventually led to data corruption when we uploaded the compressed file to GCS.

Here's the minimal reproducible example I have:

package main

import (
	"bufio"
	"compress/zlib"
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func main() {
	var filename string
	flag.StringVar(&filename, "f", "", "filename")
	flag.Parse()

	fi, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer fi.Close()

	outname := filepath.Base(filename) + "-gzip"
	fo, err := os.Create(outname)
	if err != nil {
		panic(err)
	}
	defer fo.Close()

	fmt.Printf("%s -> %s\n", filename, outname)
	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	compressor, err := zlib.NewWriterLevel(foWr, zlib.BestSpeed)
	if err != nil {
		panic(err)
	}

	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(compressor, fi, buf); err != nil {
		compressor.Close()
		panic(err)
	}
	// compressor needs to be closed first to flush the rest of the data
	// into the bufio.Writer
	if err := compressor.Close(); err != nil {
		panic(err)
	}
	if err := foWr.Flush(); err != nil {
		panic(err)
	}
}

The input data was too big to be shared (2.5G). But I can share the data internally (FYI, my LDAP is yekuang@).

What did you expect to see?

No difference in the compressed data between go1.14.2 and go1.15.

What did you see instead?

The compressed data were different.

Size of the compressed data:

  • go1.14.2: 728571269
  • go1.15: 728570333

I also did a cmp, and they started to differ at byte 363266597.

Let me know if you need more information, thanks!

@k-ye k-ye changed the title Compression behavior change in `compress/zlib` since go1.15 caused GCS data corruption? Compression behavior change in `compress/zlib` since go1.15? Sep 16, 2020
@ALTree ALTree changed the title Compression behavior change in `compress/zlib` since go1.15? compress/zlib: compression behavior change since go1.15? Sep 16, 2020

@ALTree ALTree commented Sep 16, 2020

cc @dsnet


@k-ye k-ye commented Sep 16, 2020

FWIW, this could be related to the input file size. I tested the script with a smaller file (102MB). The outcomes were the same between different Go versions.


@atetubou atetubou commented Sep 16, 2020

This difference has eventually led to a data corruption when we uploaded the compressed file to GCS.

Isn't running the output through a decompressor sufficient to verify the data corruption?


@atetubou atetubou commented Sep 16, 2020

And maybe the CLs for #34121 are related.


@dsnet dsnet commented Sep 16, 2020

It is unclear to me whether this issue is about the compressed output simply being different or that the compressed output is actually invalid (i.e., cannot be decompressed). Can you please clarify?


@dsnet dsnet commented Sep 16, 2020

If the output is valid (e.g., can be decompressed), then this is working as expected. The compression libraries make no guarantees that the output remains stable for all time. While that guarantee can be useful in some contexts, it unfortunately means that we could never make changes to the compression algorithm to improve either the speed or the compression ratio, both of which are properties generally considered more important than stability.


@k-ye k-ye commented Sep 16, 2020

Thanks for the advice. Unfortunately, it looks like the data was indeed corrupted, as I got an invalid checksum error when trying to decompress the outcome of the compression.

I added the decompression logic:

package main

import (
	"bufio"
	"compress/zlib"
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"runtime"
	"strings"
)

var fileSplit = "---"

func getBaseName(p string) string {
	s := filepath.Base(p)
	return strings.Split(s, fileSplit)[0]
}

func doCompression(filename string) error {
	fi, err := os.Open(filename)
	if err != nil {
		return err
	}
	defer fi.Close()

	outname := fmt.Sprintf("%s%scompressed-%s", getBaseName(filename), fileSplit, runtime.Version())
	fo, err := os.Create(outname)
	if err != nil {
		return err
	}
	defer fo.Close()
	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	compressor, err := zlib.NewWriterLevel(foWr, zlib.BestSpeed)
	if err != nil {
		return err
	}

	fmt.Printf("Compress: %s -> %s\n", filename, outname)
	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(compressor, fi, buf); err != nil {
		compressor.Close()
		return err
	}
	// compressor needs to be closed first to flush the rest of the data
	// into the bufio.Writer
	if err := compressor.Close(); err != nil {
		return err
	}
	if err := foWr.Flush(); err != nil {
		return err
	}
	return nil
}

func doDecompression(filename string) error {
	fi, err := os.Open(filename)
	if err != nil {
		return err
	}
	defer fi.Close()
	fiRd, err := zlib.NewReader(fi)
	if err != nil {
		return err
	}

	outname := fmt.Sprintf("%s%sdecompressed-%s", getBaseName(filename), fileSplit, runtime.Version())
	fo, err := os.Create(outname)
	if err != nil {
		return err
	}
	defer fo.Close()
	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)

	fmt.Printf("Decompress: %s -> %s\n", filename, outname)
	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(foWr, fiRd, buf); err != nil {
		fiRd.Close()
		return err
	}
	if err := fiRd.Close(); err != nil {
		return err
	}
	if err := foWr.Flush(); err != nil {
		return err
	}
	return nil
}

func main() {
	var filename string
	var decompression bool
	flag.StringVar(&filename, "f", "", "filename")
	flag.BoolVar(&decompression, "d", false, "should decompress")
	flag.Parse()

	var err error
	if decompression {
		err = doDecompression(filename)
	} else {
		err = doCompression(filename)
	}
	if err != nil {
		panic(err)
	}

}

I first tested this with go1.14.2, by compressing the input, and then decompressing the result of the previous step. The decompressed output is the same as the original input.

$ ./cli -f ~/chromium/src/out/Release/content_browsertests
Compress: ~/chromium/src/out/Release/content_browsertests -> content_browsertests---compressed-go1.14.2

$ ./cli -f content_browsertests---compressed-go1.14.2 -d
Decompress: content_browsertests---compressed-go1.14.2 -> content_browsertests---decompressed-go1.14.2 

$ diff ~/chromium/src/out/Release/content_browsertests content_browsertests---decompressed-go1.14.2 
# No diff

Then I switched to go1.15 and did the same:

$ ./cli -f ~/chromium/src/out/Release/content_browsertests
Compress: ~/chromium/src/out/Release/content_browsertests -> content_browsertests---compressed-go1.15

$ ./cli -f content_browsertests---compressed-go1.15 -d
Decompress: content_browsertests---compressed-go1.15 -> content_browsertests---decompressed-go1.15
panic: zlib: invalid checksum

goroutine 1 [running]:
main.main()
	.../main.go:105 +0x1f9

And there was a difference between the original input and the decompressed file.


A bit more update:

I used go1.15 to decompress the data produced by go1.14.2. This time the decompression went OK, and the decompressed output matched the original data. So it seems that at least the decompression part in go1.15 is good.

On the other hand, I also tried the reverse -- using go1.14.2 to decompress the data produced by go1.15. I still got the invalid checksum error, and the decompressed data didn't even match content_browsertests---decompressed-go1.15.


@dsnet dsnet commented Sep 16, 2020

I need more information to investigate this. The code snippets provided only do basic compression/decompression. I need real test data that exhibits the problem.


@k-ye k-ye commented Sep 17, 2020

Sent you the data link in an email.


@atetubou atetubou commented Sep 17, 2020

Does this issue not happen if you use randomly generated data of a similar size?


@k-ye k-ye commented Sep 17, 2020

No, it didn't happen with random data. I tested with /bin/dd if=/dev/random of=rand_bytes.dat bs=1M count=3000 to produce a 3GB file, and the script worked fine. Although to be fair, little compression can be achieved on random data (if the data are truly random).


@dgryski dgryski commented Sep 17, 2020

You might be able to shrink the test case down from 3GB with https://github.com/dgryski/go-ddmin.


@kokes kokes commented Sep 17, 2020

I was at first able to reproduce only a part of this. I downloaded this small dataset and appended it to a file 28 times until it was 1139520480 bytes.

I could then get a zlib file of 291858894 bytes in 1.14 and 291858627 bytes in 1.15. The first time the output changed was in 8d43307.

cmp: foo14 foo15 differ: char 275153560, line 1014680

BUT, the zlib readers in both 1.14 and 1.15 can read each other's output, so there doesn't appear to be any corruption there, just a change in layout.


Then I tried it on a different file (~4.8 GB), sadly a proprietary dataset, and got a bit further. When decompressing the just-compressed file to calculate its checksum, I got a panic:

$ go run cmp.go -f filename.csv
panic: zlib: invalid checksum

goroutine 1 [running]:
main.sumZFile(0xc00001a094, 0xc, 0xc0000141c0, 0x40)
	/Users/okokes/tmp/zlib/cmp.go:37 +0x29b
main.main()
	/Users/okokes/tmp/zlib/cmp.go:53 +0x1a5
exit status 2

there's a difference here: all14.gz all15.gz differ: char 199653755, line 768787

git bisect points to the same commit as before though.

It's a shame I can't share this file, not even privately, but I can try and run it again using a different Go build to see if it's fixed.

edited code to (de)compress
package main

import (
	"bufio"
	"compress/zlib"
	"crypto/sha256"
	"flag"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
)

func sumFile(filename string) string {
	f, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	hasher := sha256.New()
	if _, err := io.Copy(hasher, f); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", hasher.Sum(nil))
}
func sumZFile(filename string) string {
	f, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	zr, err := zlib.NewReader(f)
	if err != nil {
		panic(err)
	}
	hasher := sha256.New()
	if _, err := io.Copy(hasher, zr); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", hasher.Sum(nil))
}

func main() {
	var filename string
	flag.StringVar(&filename, "f", "", "filename")
	flag.Parse()

	outname, err := compress(filename)
	if err != nil {
		log.Fatal(err)
	}

	out := sumZFile(outname)
	inc := sumFile(filename)
	if inc != out {
		log.Fatal("mismatch")
	}
}

func compress(filename string) (string, error) {
	fi, err := os.Open(filename)
	if err != nil {
		return "", err
	}
	defer fi.Close()

	outname := filepath.Base(filename) + "-gzip"
	fo, err := os.Create(outname)
	if err != nil {
		return "", err
	}
	defer fo.Close()

	// fmt.Printf("%s -> %s\n", filename, outname)
	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	compressor, err := zlib.NewWriterLevel(foWr, zlib.BestSpeed)
	if err != nil {
		return "", err
	}

	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(compressor, fi, buf); err != nil {
		compressor.Close()
		return "", err
	}
	// compressor needs to be closed first to flush the rest of the data
	// into the bufio.Writer
	if err := compressor.Close(); err != nil {
		return "", err
	}
	if err := foWr.Flush(); err != nil {
		return "", err
	}
	return outname, nil
}
bisection code
package main

import (
	"log"
	"os/exec"
)

func main() {
	build := exec.Command("/Users/okokes/git/go/src/make.bash")
	build.Dir = "/Users/okokes/git/go/src"

	err := build.Run()

	if err != nil {
		log.Fatal("toolchain build failed", err)
	}

	cmd := exec.Command("/Users/okokes/git/go/bin/go", "run", "cmp.go", "-f", "all.csv")
	cmd.Dir = "/Users/okokes/git/go/src/"
	err = cmd.Run()
	if err != nil {
		log.Fatal("program build failed", err)
	}
}

@dsnet dsnet commented Sep 17, 2020

@kokes, difference in output in itself is not particularly interesting since the compress/flate library provides no guarantees about output stability. However, producing corrupted output is something we take seriously.

Thank you @k-ye. I am able to reproduce this, which does indeed indicate that it's producing corrupted output. I'm bisecting the root cause now (it may very well be the CLs related to #34121 that you pointed to earlier).

@dsnet dsnet self-assigned this Sep 17, 2020
@dsnet dsnet added this to the Go1.15.3 milestone Sep 17, 2020

@dsnet dsnet commented Sep 17, 2020

Adding to Go1.15.3 milestone since this is a regression relative to Go1.14 and a serious one.


@networkimprov networkimprov commented Sep 17, 2020

@dsnet, requesting a backport adds it to 1.15.x milestone. This issue should target 1.16.


@dsnet dsnet commented Sep 17, 2020

Thanks @networkimprov. I've been a bit distant from Go toolchain development for a while, so I've gotten behind on the latest in processes.


@ianlancetaylor ianlancetaylor commented Sep 17, 2020

@gopherbot please open a backport issue for Go 1.15.

The compress/flate package can produce corrupt output. There is no workaround.


@gopherbot gopherbot commented Sep 17, 2020

Backport issue(s) opened: #41463 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.


@dsnet dsnet commented Sep 17, 2020

@dsnet dsnet changed the title compress/zlib: compression behavior change since go1.15? compress/flate: deflatefast produces corrupted output Sep 17, 2020

@klauspost klauspost commented Sep 17, 2020

@dsnet Any way you can share the reproducer? My github name @gmail.com


@egonelbre egonelbre commented Sep 17, 2020

Not sure whether this is helpful, but it seems setting bufferReset = maxMatchOffset<<4 - maxStoreBlockSize*2 makes tests fail. I would think that changing that constant shouldn't make tests fail.


@egonelbre egonelbre commented Sep 17, 2020

@klauspost should the lines at https://github.com/golang/go/blob/master/src/compress/flate/deflatefast.go#L296 read:

	for i := range e.table[:] {
		v := e.table[i].offset - e.cur + maxMatchOffset
		if v < 0 {
			e.table[i] = tableEntry{}
			continue
		}
		e.table[i].offset = v
	}

At least this seems to fix the tests for the smaller bufferReset value.


@dgryski dgryski commented Sep 17, 2020

@egonelbre This also fixes my reproducer. (A 3.6GB file of concatenated Go source files from my $GOPATH).


@kokes kokes commented Sep 17, 2020

@egonelbre fixes my reproducer as well


@freeformz freeformz commented Sep 17, 2020

If I am reading the code right, it looks like any input over 2,147,352,577 bytes (bufferReset) could be corrupted, and corruption is more likely to occur at every multiple of 32,768 bytes (maxMatchOffset) beyond that. Is that accurate?
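
For context, the 2,147,352,577 figure follows directly from the constants in deflatefast.go. A quick sketch (constants copied here for illustration, not imported from the real package):

```go
package main

import (
	"fmt"
	"math"
)

// Constants as they appear in compress/flate/deflatefast.go (Go 1.15).
const (
	maxStoreBlockSize = 65535   // largest stored (uncompressed) block
	maxMatchOffset    = 1 << 15 // 32768, largest allowed back-reference distance
	// bufferReset is the input offset at which the encoder rebases its
	// table offsets to avoid int32 overflow.
	bufferReset = math.MaxInt32 - maxStoreBlockSize*2
)

func main() {
	fmt.Println(bufferReset) // 2147352577, matching the threshold mentioned above
}
```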


@k-ye k-ye commented Sep 18, 2020

@egonelbre's change fixes my reproducer as well 🎉


@klauspost klauspost commented Sep 18, 2020

@egonelbre No, that cannot be correct. The value may not matter: that will only fix 16383 out of 16384 cases, but will still result in a false hit when hash(0)&16383 matches the index.

The offset must be enough to invalidate it completely. I will see if I can figure out what is causing the false match.


@klauspost klauspost commented Sep 18, 2020

My own branch has a defensive `if offset < maxMatchOffset` to reject long offset matches, whereas this has `if offset > maxMatchOffset`, which is technically correct; but since we only bump by maxMatchOffset, a match at offset 0 would still be considered valid.

So we will need to bump offset by an extra one in this case. Let me cook up a PR.
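
The shape of the fix can be sketched in isolation. This is a self-contained toy that mirrors the names in deflatefast.go, not the actual CL: table offsets are rebased so the current position restarts at maxMatchOffset+1, and the extra +1 is what ensures a stale entry left at offset 0 always fails the `offset > maxMatchOffset` check afterwards.

```go
package main

import "fmt"

const maxMatchOffset = 1 << 15 // 32768

type tableEntry struct{ offset int32 }

// shiftOffsets rebases all table offsets relative to a new current
// position of maxMatchOffset+1. Entries that end up out of reach are
// capped at 0, which can never satisfy a match within maxMatchOffset.
func shiftOffsets(table []tableEntry, cur int32) (newCur int32) {
	for i := range table {
		v := table[i].offset - cur + maxMatchOffset + 1
		if v < 0 {
			v = 0 // out of reach; will always fail the offset check
		}
		table[i].offset = v
	}
	return maxMatchOffset + 1
}

func main() {
	table := []tableEntry{{offset: 100}, {offset: 2000000000}}
	cur := shiftOffsets(table, 2000000000)
	fmt.Println(table[0].offset, table[1].offset, cur) // 0 32769 32769
}
```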

klauspost added a commit to klauspost/go that referenced this issue Sep 18, 2020
Since matches are allowed to be up to and including at maxMatchOffset
we must offset the buffer by an additional element to prevent the first
4 bytes to match after an out-of-reach value after shiftOffsets has
been called.

We offset by `maxMatchOffset + 1` so offset 0 in the table will now
fail the `if offset > maxMatchOffset` in all cases.

Fixes golang#41420

Change-Id: I2f45faee4ef75a8b072af5e28d7e18b973f4bdac
klauspost added a commit to klauspost/go that referenced this issue Sep 18, 2020
Since matches are allowed to be up to and including at maxMatchOffset
we must offset the buffer by an additional element to prevent the first
4 bytes to match after an out-of-reach value after shiftOffsets has
been called.

We offset by `maxMatchOffset + 1` so offset 0 in the table will now
fail the `if offset > maxMatchOffset` in all cases.

Fixes golang#41420

Change-Id: Id6b054d9fd8adf2440d10490cb94962edac0bb02

@klauspost klauspost commented Sep 18, 2020

If someone has reproducers, please throw everything you have at #41477


@gopherbot gopherbot commented Sep 18, 2020

Change https://golang.org/cl/255879 mentions this issue: compress/flate: fix corrupted output


@kokes kokes commented Sep 18, 2020

@klauspost fixes mine from #41420 (comment)


@mdempsky mdempsky commented Sep 18, 2020

Does anyone have a test case that can be included within the standard repo?

It sounds like testing this requires a large (multi GB) input, which presumably also takes considerable CPU time to deflate/reflate. So if the test needs to be disabled by default that's fine. The test input needs to either be synthesized locally or downloaded as-needed from the network though.


@egonelbre egonelbre commented Sep 21, 2020

If the test could somehow change bufferReset, a smaller reproducer could be used. Alternatively, maybe it's possible to invoke reset or shiftOffsets manually from tests?

Also, maybe there's some input that deflates/inflates relatively fast?


@klauspost klauspost commented Sep 21, 2020

Was AFK and preoccupied.

Alternatively, maybe it's possible invoke reset or shiftOffset manually from tests?

Yeah. Probably a reasonable thing (shiftOffsets, reset clears the history). I will see if I can add a reasonable test with this.

Also, maybe there's some input that deflates/inflates relatively fast?

Zeroes. Should also be doable: encode a MB of semi-compressible content, then a bit more than 2GB of zeroes, then re-add some semi-compressible content that would find matches in the first part if the encoder is broken.

However, decompressing 2GB will probably be by far the slowest. I could see that part taking several seconds, even with this.
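
The input described above could be generated along these lines. This is a toy sketch with tiny sizes for illustration; a real reproducer would need the zero run to push past bufferReset, i.e. a bit over 2 GiB:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"math/rand"
)

// buildInput assembles the shape of input described above: a
// semi-compressible prefix, a long run of zeroes, then the same prefix
// again so that a broken encoder could emit back-references reaching
// past the zero run.
func buildInput(prefix []byte, zeroes int) []byte {
	var buf bytes.Buffer
	buf.Write(prefix)
	buf.Write(make([]byte, zeroes))
	buf.Write(prefix)
	return buf.Bytes()
}

func main() {
	// Semi-compressible content: random bytes drawn from a small alphabet.
	rng := rand.New(rand.NewSource(1))
	prefix := make([]byte, 1<<20)
	for i := range prefix {
		prefix[i] = byte(rng.Intn(16))
	}
	input := buildInput(prefix, 1<<22) // tiny stand-in for the >2GB zero run

	// Round-trip at BestSpeed and verify the data survives.
	var comp bytes.Buffer
	w, err := flate.NewWriter(&comp, flate.BestSpeed)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write(input); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	out, err := io.ReadAll(flate.NewReader(&comp))
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(input, out)) // true on a correct encoder
}
```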


@klauspost klauspost commented Sep 21, 2020

PR/CL updated with a test that catches the regression and sanity tests a few more things.


@dmitshur dmitshur commented Oct 8, 2020

Friendly ping as this issue has the Soon label.

@mdempsky Are you able to take another look at the updated CL?


@mdempsky mdempsky commented Oct 16, 2020

I commented on the CL. I think it's good, but I had a few questions to double-check that I understand the code correctly.
