Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: fs.WalkDir fails if archive contains a directory without a trailing slash #50390

Open
ericchiang opened this issue Dec 29, 2021 · 9 comments
Milestone

Comments

@ericchiang
Copy link
Contributor

@ericchiang ericchiang commented Dec 29, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/home/ericchiang/bin"
GOCACHE="/home/ericchiang/.cache/go-build"
GOENV="/home/ericchiang/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ericchiang/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ericchiang"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2858401830=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We had a report of a JAR file where fs.WalkDir hit an error while walking the archive google/log4jscanner#12

JAR can be found here: https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-api/3.141.59

Reproducer (https://go.dev/play/p/-yKYcEqgotT):

package main

import (
	"archive/zip"
	"bytes"
	"io/fs"
	"log"
	"time"
)

func main() {
	h := &zip.FileHeader{Name: "./foo", Modified: time.Now()}
	h.SetMode(fs.ModeDir | 0755)

	b := &bytes.Buffer{}
	zw := zip.NewWriter(b)
	if _, err := zw.CreateHeader(h); err != nil {
		log.Fatalf("create header: %v", err)
	}
	if err := zw.Close(); err != nil {
		log.Fatalf("closing zip writier: %v", err)
	}

	r := bytes.NewReader(b.Bytes())
	zr, err := zip.NewReader(r, r.Size())
	if err != nil {
		log.Fatalf("create new reader: %v", err)
	}

	// WalkDir loops forever.
	err = fs.WalkDir(zr, ".", func(path string, d fs.DirEntry, err error) error {
		return err
	})
	if err != nil {
		log.Fatalf("walk dir: %v", err)
	}
}

What did you expect to see?

No error

What did you see instead?

2009/11/10 23:00:00 walk dir: readdir foo: not implemented
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 30, 2021

In the Selenium jar you link to, the directory META-INF/versions/9 appears twice, first without a trailing slash and second with a trailing slash. I'm not sure why. That is what is causing the problem there. I think that your test case, which explicitly adds fs.ModeDir, may be slightly misleading; it may be a real problem but I don't think it's the problem with the Selenium jar.

@ericchiang
Copy link
Contributor Author

@ericchiang ericchiang commented Dec 30, 2021

Thanks for the analysis, Ian!

With this and other issues we've hit when using archive/zip with io/fs.WalkDir (#50179), I'd be happy to dig in, but wanted to make sure this is something that archive/zip wants to support. If these JARs are just invalid ZIPs, that works for me.

Based on your comment, it sounds like this may be valid, if peculiar.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 30, 2021

I don't know whether it is valid or not, but since it seems to happen it seems that we should handle it.

I've written a CL that ignores duplicate entries. But I'm not sure whether that is correct. Perhaps we should give an error instead.

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 30, 2021

Change https://golang.org/cl/374934 mentions this issue: archive/zip: handle duplicate entries in zip files for io/fs

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 30, 2021

Change https://golang.org/cl/374954 mentions this issue: archive/zip: error if using io/fs on zip with duplicate entries

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 30, 2021

Upon further thought I think the only safe approach is to reject an attempt to use the io/fs interfaces with an archive that has duplicate entries. That means that the Selenium jar will fail. Reporting an error seems clearly better to quietly hiding entries.

@ericchiang
Copy link
Contributor Author

@ericchiang ericchiang commented Dec 30, 2021

Works for me!

In hindsight, it's not obvious that we gain a lot by using fs.FS since we only support ZIP archives. Maybe a premature abstraction on our part?

I think I may try iterating over zip.Reader.File instead of using fs.WalkDir.

@nightlyone
Copy link
Contributor

@nightlyone nightlyone commented Dec 31, 2021

Please note that the filenames can be a dominating factor in some workloads involving lots of files (e.g. a compressed version of a node_modules directory).

Would it be ok to reject one or the other syntax of filenames in JARs instead of running duplicate file detection?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 31, 2021

If someone chooses to use the io.FS interface with a zip archive, then they are required to resolve all file names in any case (see initFileList at https://go.dev/src/archive/zip/reader.go#L737 ). I would be surprised if the duplicate detection adds significant overhead to that step. People concerned about this overhead should not be using the io.FS interface at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants