-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Description
What version of Go are you using (go version
)?
go version go1.12.5 darwin/amd64
Does this issue reproduce with the latest release?
Yes
What did you do?
Read a tar file generated by git archive
:
$ git clone https://github.com/SSW-SCIENTIFIC/NNDD.git
$ cd NNDD
$ git archive --format tar c21b98da2ca7f007230e696b2eda5da6589fe137 > nndd.tar
Alternatively you can fetch this archive directly from github: https://github.com/SSW-SCIENTIFIC/NNDD/tarball/c21b98da2ca7f007230e696b2eda5da6589fe137
If you try to read this with golang's archive/tar
you will get a tar.ErrHeader
on one of the entries. Here is a reproduction program:
package main
import (
"archive/tar"
"compress/gzip"
"fmt"
"io"
"log"
"net/http"
"os"
)
func main() {
// Read tar from file if specified, otherwise fetch directly from github
var r io.Reader
if len(os.Args) > 1 {
f, err := os.Open(os.Args[1])
if err != nil {
log.Fatal(err)
}
defer f.Close()
r = f
} else {
resp, err := http.Get("https://github.com/SSW-SCIENTIFIC/NNDD/tarball/c21b98da2ca7f007230e696b2eda5da6589fe137")
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
gr, err := gzip.NewReader(resp.Body)
if err != nil {
log.Fatal(err)
}
defer gr.Close()
r = gr
}
tr := tar.NewReader(r)
for {
_, err := tr.Next()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
}
fmt.Println("Successfully read tar")
}
If you try read the same file with bsdtar or gnutar they both read the whole tarball. bsdtar (depending on version) will log a warning to stderr but still extract the rest of the tarball.
The archive contains a symbolic link which instead of a path for value, contains a large value which includes null. The validatePAXRecord
explicitly disallows null from appearing in the value for PAX values for links. This leads to tar.Reader.Next()
returning tar.ErrHeader
.
https://github.com/golang/go/blob/go1.12.5/src/archive/tar/strconv.go#L321-L322
I reported this to the git mailing list: https://public-inbox.org/git/CAMVcy0Q0TL6uEGR2NeudJrOiXdQ87XcducL0EwMidWucjk5XYw@mail.gmail.com/T/#u Read that issue for more details on the "bad" entry in the archive.
The opinion there seems to be that generating values that aren't C-strings for links is valid. Not all tarballs are destined for filesystems. And the latest bsdtar and gnutar do not complain about the value. gnutar will write the symbolic link value upto the first null.
What did you expect to see?
I think any of the follow behaviours would be better:
- Be able to read the rest of the tar if encountering a recoverable error.
- Allow nulls in all PAX values.
- Truncate up to first null in PAX value. (consistent with gnutar)
What did you see instead?
tar.ErrHeader
on a tarball both bsdtar and gnutar can read.