Skip to content

archive/tar: Does not handle extended pax values with nulls #32264

@keegancsmith

Description

@keegancsmith

What version of Go are you using (go version)?

go version go1.12.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

Read a tar file generated by git archive:

$ git clone https://github.com/SSW-SCIENTIFIC/NNDD.git
$ cd NNDD
$ git archive --format tar c21b98da2ca7f007230e696b2eda5da6589fe137 > nndd.tar

Alternatively you can fetch this archive directly from github: https://github.com/SSW-SCIENTIFIC/NNDD/tarball/c21b98da2ca7f007230e696b2eda5da6589fe137

If you try to read this with golang's archive/tar you will get a tar.ErrHeader on one of the entries. Here is a reproduction program:

package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Read tar from file if specified, otherwise fetch directly from github
	var r io.Reader
	if len(os.Args) > 1 {
		f, err := os.Open(os.Args[1])
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()
		r = f
	} else {
		resp, err := http.Get("https://github.com/SSW-SCIENTIFIC/NNDD/tarball/c21b98da2ca7f007230e696b2eda5da6589fe137")
		if err != nil {
			log.Fatal(err)
		}
		defer resp.Body.Close()
		gr, err := gzip.NewReader(resp.Body)
		if err != nil {
			log.Fatal(err)
		}
		defer gr.Close()
		r = gr
	}

	tr := tar.NewReader(r)
	for {
		_, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
	}
	fmt.Println("Successfully read tar")
}

If you try read the same file with bsdtar or gnutar they both read the whole tarball. bsdtar (depending on version) will log a warning to stderr but still extract the rest of the tarball.

The archive contains a symbolic link which instead of a path for value, contains a large value which includes null. The validatePAXRecord explicitly disallows null from appearing in the value for PAX values for links. This leads to tar.Reader.Next() returning tar.ErrHeader.

https://github.com/golang/go/blob/go1.12.5/src/archive/tar/strconv.go#L321-L322

I reported this to the git mailing list: https://public-inbox.org/git/CAMVcy0Q0TL6uEGR2NeudJrOiXdQ87XcducL0EwMidWucjk5XYw@mail.gmail.com/T/#u Read that issue for more details on the "bad" entry in the archive.

The opinion there seems to be that generating values that aren't C-strings for links is valid. Not all tarballs are destined for filesystems. And the latest bsdtar and gnutar do not complain about the value. gnutar will write the symbolic link value upto the first null.

What did you expect to see?

I think any of the follow behaviours would be better:

  • Be able to read the rest of the tar if encountering a recoverable error.
  • Allow nulls in all PAX values.
  • Truncate up to first null in PAX value. (consistent with gnutar)

What did you see instead?

tar.ErrHeader on a tarball both bsdtar and gnutar can read.

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsDecisionFeedback is required from experts, contributors, and/or the community before a change can be made.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions