Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: timestamp of files in module archive unset #34953

Closed
klingtnet opened this issue Oct 17, 2019 · 14 comments
Closed

cmd/go: timestamp of files in module archive unset #34953

klingtnet opened this issue Oct 17, 2019 · 14 comments

Comments

@klingtnet
Copy link

@klingtnet klingtnet commented Oct 17, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.1 linux/amd64

Does this issue reproduce with the latest release?

Probably.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/alinz/.cache/go-build"
GOENV="/home/alinz/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/alinz/.go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/alinz/code/gogitlabproxy/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build916870041=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am writing a custom module proxy at the moment and while testing equality of the zipped module with the output of proxy.golang.org I noticed that the official proxy does not set the timestamp for the files in the archive, i.e. they are all dated Nov 30 1979.
Is this the expected behavior?

What did you expect to see?

The actual file timestamps, i.e. the files in the archive should have the same timestamp as their source in the git repository.

What did you see instead?

$ curl -s https://proxy.golang.org/github.com/pkg/errors/@v/v0.8.1.zip | bsdtar -ztvf-
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/.gitignore
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/.travis.yml
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/LICENSE
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/README.md
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/appveyor.yml
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/bench_test.go
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/errors.go
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/errors_test.go
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/example_test.go
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/format_test.go
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/stack.go
-rw-rw-r--  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/stack_test.go
@dmitshur

This comment has been minimized.

Copy link
Member

@dmitshur dmitshur commented Oct 17, 2019

Is this the expected behavior?

This is the behavior of go mod download:

$ cd $(mktemp -d)
$ export GOPATH=$(pwd)
$ export GO111MODULE=on
$ export GOPROXY=direct
$ go mod download github.com/pkg/errors@v0.8.1
go: finding github.com/pkg/errors v0.8.1
$ cat $GOPATH/pkg/mod/cache/download/github.com/pkg/errors/\@v/v0.8.1.zip | bsdtar -ztvf-
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/.gitignore
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/.travis.yml
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/LICENSE
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/README.md
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/appveyor.yml
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/bench_test.go
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/errors.go
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/errors_test.go
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/example_test.go
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/format_test.go
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/stack.go
-rwxrwxrwx  0 0      0           0 Nov 30  1979 github.com/pkg/errors@v0.8.1/stack_test.go
@dmitshur dmitshur changed the title proxy.golang.org: timestamp of files in module archive unset cmd/go: timestamp of files in module archive unset Oct 17, 2019
@klingtnet

This comment has been minimized.

Copy link
Author

@klingtnet klingtnet commented Oct 18, 2019

I took a quick look in /archive/zip and it looks like 1979-11-30 is the Zero MSDOS date:

https://play.golang.org/p/HOhsuCVXEuP

https://github.com/golang/go/blob/master/src/archive/zip/struct.go#L128

@klingtnet

This comment has been minimized.

Copy link
Author

@klingtnet klingtnet commented Oct 18, 2019

Another small thing I found was that all files in the module from proxy.golang.org have 0664 permissions.

@heschik

This comment has been minimized.

Copy link
Contributor

@heschik heschik commented Oct 18, 2019

The go mod download command is the only supported mechanism for producing module zips. There are a lot of subtleties involved, and I'm not sure anyone could give you a full list without some research. All of the things you've listed so far are deliberate as far as I know.

I believe @jayconrod has an open bug somewhere to consider making that code available as a library. Other than that, I'm not sure what there is to do here.

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Oct 18, 2019

CL 202042 is a draft CL that moves module zip handling code to x/mod. It documents the restrictions on zip files that the Go command accepts and provides a way to build and extract zip files without go mod download.

We don't currently save any metadata other than file names, and we don't set file modes or dates when we extract files into the module cache. I'm not sure there's any concrete reason for this. Storing and setting timestamps and executability would not change module hashes, which only cover file names and content.

It would change the SHA-256 hash of generated zip files. We haven't promised those are stable, but changing them would probably surprise some people. I would like golang.org/x/mod/zip to produce files with the same hash as cmd/go/internal/modfetch as long as there are two implementations.

@dmitshur

This comment has been minimized.

Copy link
Member

@dmitshur dmitshur commented Oct 18, 2019

We haven't promised those are stable, but changing them would probably surprise some people.

I think we should actively document that they're not guaranteed to be stable, so that people don't begin to expect them to be stable. Otherwise, we'd be locking in the exact zip file format encoding, compression level, etc.

I can't find it written down right now unfortunately, but my understanding of the decision to include only the file names and file contents in the hashes have been to prevent this problem. The only reference I'm able to find now is the documentation of dirhash.HashZip, which says:

HashZip returns the hash of the file content in the named zip file. Only the file names and their contents are included in the hash: the exact zip file format encoding, compression method, per-file modification times, and other metadata are ignored.

@klingtnet

This comment has been minimized.

Copy link
Author

@klingtnet klingtnet commented Oct 19, 2019

Storing and setting timestamps and executability would not change module hashes, which only cover file names and content.

@jayconrod This is exactly what it wanted to ask, thanks!

@klingtnet

This comment has been minimized.

Copy link
Author

@klingtnet klingtnet commented Oct 19, 2019

In my module proxy I create the zip file on the fly from a tar.gz stream that I receive from the Gitlab API in order to prevent downloading the full repository snapshot first. I think this use case would not be applicable with the API of https://go-review.googlesource.com/c/mod/+/202042/2/zip/zip.go . It would be really nice if the zip could be built by incrementally adding files, one after another.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Oct 21, 2019

we don't set file modes or dates when we extract files into the module cache.

#34965 is closely related. (We should probably preserve the execute bit, at least.)

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Oct 21, 2019

It would change the SHA-256 hash of generated zip files. We haven't promised those are stable, but changing them would probably surprise some people.

There is a reason the hash in the go.sum file is on the file contents and not the zipfile itself. 😉

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Oct 21, 2019

@klingtnet

In my module proxy I create the zip file on the fly from a tar.gz stream that I receive from the Gitlab API in order to prevent downloading the full repository snapshot first. I think this use case would not be applicable with the API of https://go-review.googlesource.com/c/mod/+/202042/2/zip/zip.go . It would be really nice if the zip could be built by incrementally adding files, one after another.

In this CL, zip.Create excludes files that are part of a submodule (files that are in a directory that contains a go.mod file, other than the root). If we provide an API that accepts a stream of files, it would be difficult to exclude those files since all the go.mod files might be at the end of the stream. We could either 1) read the whole stream internally before adding files to the zip or 2) remove this logic and require the caller to exclude those files (adding complexity and implicitly requiring the caller to read the whole stream).

Either way, the whole stream needs to be read before we can start adding files to the zip. Given that, I think it's better to make the cost clear to the caller and accept a list of files, rather than a stream.

@klingtnet

This comment has been minimized.

Copy link
Author

@klingtnet klingtnet commented Oct 22, 2019

In this CL, zip.Create excludes files that are part of a submodule (files that are in a directory that contains a go.mod file, other than the root)

Ok, that's a valid point and I haven't thought about filtering submodules, yet.
Is there documentation about what files/folders are excluded from a module, e.g. /vendor should be ignored as well?

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Oct 24, 2019

Repeating my comment from #34965 (comment):

We defined the Go module representation to be absolutely least common denominator across systems for portability. The Go module zip file is for dependencies that need to be downloaded and built by the go command. It is not meant to support anything more. By design, there are no permission bits, nor special files like symlinks, nor modification times. I still believe this is the right decision. We can go arbitrarily far in "preserving" file metadata, and the result would still miss some kinds of metadata and fail to recreate other metadata on some subset of systems (symlinks, odd file modification times, etc).

Tests that need special files (anything beyond files containing bytes) can write them to a temporary directory and configure them as they wish.

Install scripts can be invoked as bash ./script (or cat script | bash) instead of ./script.

I don't believe this decision was wrong. But even if it were, it is a fundamental design decision made years ago that would be very hard to change now - it would invalidate all the checksums (if this extra information is important enough to preserve it would have to be checksummed too). It would have to be very wrong to be worth the pain of correcting.

@jayconrod

This comment has been minimized.

Copy link
Contributor

@jayconrod jayconrod commented Oct 24, 2019

Closing as working as intended. I'll include this in the module reference documentation and the documentation for golang.org/x/mod/zip when it's ready.

@jayconrod jayconrod closed this Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.