proposal: compress/gzip, archive/tar: randomize output some of the time #26378

Closed
neild opened this Issue Jul 13, 2018 · 4 comments

neild commented Jul 13, 2018

Every release cycle, we run all of our tests with the upcoming release. And every release cycle, it seems, we discover several new places that expect the output of compress/gzip and archive/tar to be stable for all time.

Perhaps there's some way to introduce deliberate randomness into this output, along the lines of map iteration order randomization and https://go-review.googlesource.com/c/go/+/64451.

This is tricky, however, because it's probably reasonable to depend on the output of these packages to be consistent within any given binary. Perhaps a build stamp could be an input to the randomizer, or randomization could be only in tests, or both.
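One way a test can avoid depending on the exact compressed bytes is to check a round trip instead of comparing against stored output. The following is a minimal sketch of that idea, not something proposed in this issue; the helper names `gzipBytes`, `gunzipBytes`, and `roundTripOK` are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses data with the default compression level.
// (Hypothetical helper, for illustration only.)
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes decompresses a complete gzip stream.
func gunzipBytes(compressed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// roundTripOK reports whether data survives compress-then-decompress.
// It never inspects the compressed bytes themselves, so it keeps
// passing if the compressor's output changes between Go releases.
func roundTripOK(data []byte) bool {
	compressed, err := gzipBytes(data)
	if err != nil {
		return false
	}
	restored, err := gunzipBytes(compressed)
	if err != nil {
		return false
	}
	return bytes.Equal(data, restored)
}

func main() {
	fmt.Println(roundTripOK([]byte("hello, gopher")))
}
```

A test written this way asserts only what the package promises (the data can be recovered), rather than a particular byte sequence.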

@gopherbot gopherbot added this to the Proposal milestone Jul 13, 2018

@gopherbot gopherbot added the Proposal label Jul 13, 2018

dsnet commented Jul 13, 2018

I would expand this to all compress, archive, and most encoding packages.

bradfitz commented Jul 16, 2018

@dsnet, we definitely don't want archive/zip and archive/tar to start generating random output. We're trying in another bug to have reproducible builds for Go. And what would it mean for encoding? Randomized orders? Another bug is proposing we start sorting more in fmt, for instance.

I think this would cause more pain than it'd solve.

rsc commented Jul 23, 2018

We used to do this, by putting time stamps in the output, and we took them out precisely to get repeatable output. I don't see why we would make it non-repeatable again. Like you say, it's entirely reasonable to depend on the output of these packages to be consistent within any given binary.

We do agree it can change from release to release. I don't see a nice way to rub that in everyone's faces though. Note also that we decided against #13884 for pretty much the same reasons.
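To illustrate how a time stamp in the output defeats repeatability, here is a small sketch using the `ModTime` field of the `compress/gzip` header. It only demonstrates the effect; it is not a claim about which time stamps were removed historically, and the helper names `compressAt`, `repeatable`, and `timestampChangesOutput` are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"time"
)

// compressAt gzips data with the given header modification time.
// A zero time.Time leaves the header's MTIME field as zero.
func compressAt(data []byte, modTime time.Time) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.ModTime = modTime // header fields must be set before the first Write
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// repeatable reports whether two runs without a time stamp
// produce byte-identical output within this binary.
func repeatable(data []byte) (bool, error) {
	a, err := compressAt(data, time.Time{})
	if err != nil {
		return false, err
	}
	b, err := compressAt(data, time.Time{})
	if err != nil {
		return false, err
	}
	return bytes.Equal(a, b), nil
}

// timestampChangesOutput reports whether two different header
// time stamps make the outputs differ for the same input.
func timestampChangesOutput(data []byte) (bool, error) {
	a, err := compressAt(data, time.Unix(1, 0))
	if err != nil {
		return false, err
	}
	b, err := compressAt(data, time.Unix(2, 0))
	if err != nil {
		return false, err
	}
	return !bytes.Equal(a, b), nil
}

func main() {
	data := []byte("same input every time")
	same, _ := repeatable(data)
	differs, _ := timestampChangesOutput(data)
	fmt.Println(same, differs)
}
```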

@rsc rsc closed this Jul 23, 2018

cyphar commented Jan 20, 2019

And it should be noted that most container image formats are based on Go's archive/tar. I'm trying to move everyone away from it, because it's an awful format for that use case, but to randomise it would be to intentionally break layer caching and reproducibility for most container image formats.
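As a rough sketch of why this matters for caching: if a layer is identified by a hash of its tar bytes, the same input must always produce the same archive. The code below assumes a SHA-256 digest over a single-file archive with a fixed modification time; `layerDigest` is a hypothetical helper, and real image formats involve more than this.

```go
package main

import (
	"archive/tar"
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// layerDigest builds a one-file tar archive in memory and returns
// the hex-encoded SHA-256 of the archive bytes.
// (Hypothetical helper; the digest scheme is an assumption.)
func layerDigest(name string, content []byte) (string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{
		Name:    name,
		Mode:    0644,
		Size:    int64(len(content)),
		ModTime: time.Unix(0, 0), // fixed so the header bytes do not vary
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return "", err
	}
	if _, err := tw.Write(content); err != nil {
		return "", err
	}
	if err := tw.Close(); err != nil {
		return "", err
	}
	sum := sha256.Sum256(buf.Bytes())
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	first, err := layerDigest("app/config.txt", []byte("key=value\n"))
	if err != nil {
		panic(err)
	}
	second, err := layerDigest("app/config.txt", []byte("key=value\n"))
	if err != nil {
		panic(err)
	}
	// A cache keyed by digest only hits when the two digests match.
	fmt.Println(first == second)
}
```

If the writer randomized its output, the two digests would differ for identical input and a digest-keyed cache would never hit.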
