According to RFC 1952, section 2.2:

A gzip file consists of a series of "members" (compressed data sets).

Thus, when a gzip member terminates, the parser should correctly check whether there is a header for another gzip member. An "invalid header" error seems to be the right error given this grammar.

@dsnet While the error is technically correct, I agree with @nikaiw that having a distinct error would provide tremendous clarity. Something like "invalid header in multistream gzip" would make the cause much clearer and the solution much easier to find.

To quote my rationale from #67986¹:

When using the compress/gzip package to decompress gzip files, receiving a gzip: invalid header error can indicate two distinct possibilities.

1. The file is not a valid gzip file.

Example test case:

package gzip_test // Package name is arbitrary; the original snippet omitted it.

import (
  "compress/gzip"
  "io"
  "os"
  "testing"
)

func TestNotAValidGzipFile(t *testing.T) {
  f, err := os.Open("/tmp/python-3.12.3-amd64.exe") // Arbitrary example, any non-gzip file will suffice.
  if err != nil {
    t.Fatal(err)
  }
  defer f.Close()

  // For a non-gzip file, the error is expected here, while parsing the first header.
  rc, err := gzip.NewReader(f)
  if err != nil {
    t.Fatal(err)
  }
  defer rc.Close()

  _, err = io.ReadAll(rc)
  if err != nil {
    t.Fatal(err)
  }
}

// === RUN   TestNotAValidGzipFile
//  gzip_test.go:19: gzip: invalid header

2. The file is a valid gzip file, but contains invalid trailing data.

Example test case:

func TestValidGzipFileWithTrailingData(t *testing.T) {
  // Reproducer file. There are many examples of this.
  // https://github.com/udacity/self-driving-car-sim/blob/4b1f739ebda9ed4920fe895ee3677bd4ccb79218/Assets/Standard%20Assets/Environment/SpeedTree/Conifer/Conifer_Desktop.spm
  f, err := os.Open("/tmp/Conifer_Desktop.spm")
  if err != nil {
    t.Fatal(err)
  }
  defer f.Close()
  
  rc, err := gzip.NewReader(f)
  if err != nil {
    t.Fatal(err)
  }
  defer rc.Close()
  
  _, err = io.ReadAll(rc)
  if err != nil {
    t.Fatal(err)
  }
}

// === RUN   TestValidGzipFileWithTrailingData
//  gzip_test.go:19: gzip: invalid header

Scenario 2 can be especially confusing because Go's implementation of compress/gzip rejects invalid trailing data, while many popular applications and languages do not. Hence, the ambiguity of this error can lead people to believe that Go is rejecting a valid gzip file.
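For anyone who lands here with a file like the one in scenario 2, one way to get the more lenient behaviour is to disable multistream mode so the reader stops after the first member. This is a minimal sketch, not a recommendation for every case: readFirstMember and gzipMember are made-up helpers, and the approach also ignores any legitimate additional members.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// readFirstMember (hypothetical helper) decompresses only the first gzip
// member in data and ignores whatever follows it.
func readFirstMember(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	// Stop at the end of the first member instead of trying to parse
	// another header. This must be set before the member is read.
	zr.Multistream(false)
	return io.ReadAll(zr)
}

// gzipMember (hypothetical helper) returns s compressed as one gzip member.
func gzipMember(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	data := append(gzipMember("hello"), []byte("trailing garbage bytes")...)
	out, err := readFirstMember(data)
	fmt.Printf("%q %v\n", out, err)
}
```

The fact that this workaround exists but is hard to discover from "gzip: invalid header" alone is part of why a clearer error would help.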

Footnotes

¹ GitHub's search failed me when looking for an existing issue. I'm guessing I should close that in favour of this?

proposal: compress/gzip: return different error for trailing garbage #61797
