Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: fs.WalkDir fails if archive contains a directory without a trailing slash #50390

Closed
ericchiang opened this issue Dec 29, 2021 · 9 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@ericchiang
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.17.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/home/ericchiang/bin"
GOCACHE="/home/ericchiang/.cache/go-build"
GOENV="/home/ericchiang/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/ericchiang/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ericchiang"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2858401830=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We had a report of a JAR file where fs.WalkDir hit an error while walking the archive google/log4jscanner#12

JAR can be found here: https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-api/3.141.59

Reproducer (https://go.dev/play/p/-yKYcEqgotT):

package main

import (
	"archive/zip"
	"bytes"
	"io/fs"
	"log"
	"time"
)

func main() {
	h := &zip.FileHeader{Name: "./foo", Modified: time.Now()}
	h.SetMode(fs.ModeDir | 0755)

	b := &bytes.Buffer{}
	zw := zip.NewWriter(b)
	if _, err := zw.CreateHeader(h); err != nil {
		log.Fatalf("create header: %v", err)
	}
	if err := zw.Close(); err != nil {
		log.Fatalf("closing zip writier: %v", err)
	}

	r := bytes.NewReader(b.Bytes())
	zr, err := zip.NewReader(r, r.Size())
	if err != nil {
		log.Fatalf("create new reader: %v", err)
	}

	// WalkDir loops forever.
	err = fs.WalkDir(zr, ".", func(path string, d fs.DirEntry, err error) error {
		return err
	})
	if err != nil {
		log.Fatalf("walk dir: %v", err)
	}
}

What did you expect to see?

No error

What did you see instead?

2009/11/10 23:00:00 walk dir: readdir foo: not implemented
@ianlancetaylor
Copy link
Contributor

In the Selenium jar you link to, the directory META-INF/versions/9 appears twice, first without a trailing slash and second with a trailing slash. I'm not sure why. That is what is causing the problem there. I think that your test case, which explicitly adds fs.ModeDir, may be slightly misleading; it may be a real problem but I don't think it's the problem with the Selenium jar.

@ericchiang
Copy link
Contributor Author

ericchiang commented Dec 30, 2021

Thanks for the analysis, Ian!

With this and other issues we've hit when using archive/zip with io/fs.WalkDir (#50179), I'd be happy to dig in, but wanted to make sure this is something that archive/zip wants to support. If these JARs are just invalid ZIPs, that works for me.

Based on your comment, it sounds like this may be valid, if peculiar.

@ianlancetaylor
Copy link
Contributor

I don't know whether it is valid or not, but since it seems to happen it seems that we should handle it.

I've written a CL that ignores duplicate entries. But I'm not sure whether that is correct. Perhaps we should give an error instead.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/374934 mentions this issue: archive/zip: handle duplicate entries in zip files for io/fs

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/374954 mentions this issue: archive/zip: error if using io/fs on zip with duplicate entries

@ianlancetaylor
Copy link
Contributor

Upon further thought I think the only safe approach is to reject an attempt to use the io/fs interfaces with an archive that has duplicate entries. That means that the Selenium jar will fail. Reporting an error seems clearly better to quietly hiding entries.

@ericchiang
Copy link
Contributor Author

Works for me!

In hindsight, it's not obvious that we gain a lot by using fs.FS since we only support ZIP archives. Maybe a premature abstraction on our part?

I think I may try iterating over zip.Reader.File instead of using fs.WalkDir.

@nightlyone
Copy link
Contributor

Please note that the filenames can be a dominating factor in some workloads involving lots of files (e.g. a compressed version of a node_modules directory).

Would it be ok to reject one or the other syntax of filenames in JARs instead of running duplicate file detection?

@ianlancetaylor
Copy link
Contributor

If someone chooses to use the io.FS interface with a zip archive, then they are required to resolve all file names in any case (see initFileList at https://go.dev/src/archive/zip/reader.go#L737 ). I would be surprised if the duplicate detection adds significant overhead to that step. People concerned about this overhead should not be using the io.FS interface at all.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 4, 2022
@cagedmantis cagedmantis added this to the Go1.19 milestone Jan 4, 2022
@cagedmantis cagedmantis added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 4, 2022
@golang golang locked and limited conversation to collaborators May 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

5 participants