compress/gzip: multistream reading fails on os.File #30230

@pbnjay

What version of Go are you using (go version)?

$ go version
go version go1.11.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jeremy/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jeremy"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/sn/3cktm0557lbc6wvwltjl9cwm0000gn/T/go-build634703979=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

While implementing a multi-stream gzip reader for a specific bioinformatics file format (BAM), I found that (*gzip.Reader).Reset fails to reset the stream correctly when passed an *os.File. This appears to be due to read-ahead somewhere, since substituting a bytes.Buffer with the same contents for the os.File works correctly.
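The difference between the two readers can be reproduced without the attached data. The sketch below is not the code from bug_repro.zip; it is a minimal illustration that assumes the read-ahead comes from compress/gzip wrapping any reader that lacks a ReadByte method in its own buffered reader, as the package documentation for NewReader warns. The helpers gzipMember and consumedAfterFirst are hypothetical names, and the anonymous struct{ io.Reader } wrapper stands in for an *os.File by hiding ReadByte.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipMember returns payload compressed as a single gzip member.
func gzipMember(payload string) []byte {
	var b bytes.Buffer
	zw := gzip.NewWriter(&b)
	zw.Write([]byte(payload)) // writes to a bytes.Buffer do not fail
	zw.Close()
	return b.Bytes()
}

// consumedAfterFirst decompresses exactly one member from data and reports
// how many bytes of data were consumed from the underlying reader.
// When hideByteReader is true, the source exposes only Read (like *os.File).
func consumedAfterFirst(data []byte, hideByteReader bool) (int, error) {
	br := bytes.NewReader(data)
	var src io.Reader = br
	if hideByteReader {
		// Embedding the interface exposes Read only, not ReadByte.
		src = struct{ io.Reader }{br}
	}
	zr, err := gzip.NewReader(src)
	if err != nil {
		return 0, err
	}
	zr.Multistream(false) // stop at the end of the first member
	if _, err := io.Copy(io.Discard, zr); err != nil {
		return 0, err
	}
	return len(data) - br.Len(), nil
}

func main() {
	first := gzipMember("first member")
	data := append(append([]byte{}, first...), gzipMember("second member")...)

	exact, err := consumedAfterFirst(data, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	over, err := consumedAfterFirst(data, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first member length:   ", len(first))
	fmt.Println("consumed with ReadByte:", exact)
	fmt.Println("consumed without:      ", over)
}
```

With ReadByte available, the source is left positioned just after the first member; without it, more of the input is consumed than the member contains. The offset of 20480 in the log below equals 5 × 4096, which would be consistent with a 4096-byte internal buffer, though that is an inference rather than something the log confirms.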

I've attached somewhat minimal example code and a truncated data file as a test case showing the above. The data files are typically in the multi-gigabyte range, so streaming from disk is preferred to loading everything into memory.

bug_repro.zip

2019/02/14 10:31:39 ------- LOADING FROM BUFFERED FILE -------------
2019/02/14 10:31:39 Should end in +20059
2019/02/14 10:31:39 Ended at  20059
2019/02/14 10:31:39 Should end in +15247
2019/02/14 10:31:39 Ended at  35306
2019/02/14 10:31:39 Should end in +14446
2019/02/14 10:31:39 Ended at  49752
2019/02/14 10:31:39 Should end in +15601
2019/02/14 10:31:39 Ended at  65353
2019/02/14 10:31:39 EOF
2019/02/14 10:31:39 ------- LOADING FROM FILE DIRECTLY -------------
2019/02/14 10:31:39 Should end in +20059
2019/02/14 10:31:39 Ended at 20480
2019/02/14 10:31:39 gzip: invalid header

What did you expect to see?

Loading from a file should work the same as loading from a buffer.

What did you see instead?

The file is advanced too far, so the subsequent read fails to find the header of the next gzip chunk.

Labels: FrozenDueToAge, NeedsDecision (feedback is required from experts, contributors, and/or the community before a change can be made)