Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(archive.go): repeatable shasum on archive #1537

Merged
merged 3 commits into from
Apr 27, 2021

Conversation

vdice
Copy link
Member

@vdice vdice commented Apr 8, 2021

What does this change

  • Adds logic to ensure identical archive shasums are produced, provided the same bundle is used

Uses the approach of zero-ing out the timestamps for all archive files to achieve this. As seen in https://unix.stackexchange.com/questions/346789/compressing-two-identical-folders-give-different-result and suggested by @sebumd in the original issue #1526

What issue does it fix

Closes #1526

Notes for the reviewer

  • The original timestamps for all files in an expanded archive will no longer be preserved
  • For testing, I updated the archive integration test to not use a randomized bundle name and reference, so that we get a deterministic archive hash for matching on

Checklist

  • Unit Integration Tests
  • Documentation
  • Schema (porter.yaml)

If this is your first pull request, please add your name to the bottom of our Contributors list. Thank you for making Porter better! 🙇‍♀️

@vdice
Copy link
Member Author

vdice commented Apr 8, 2021

Still trying to figure out why the archive shasum is still non-deterministic in the integration test on CI... thoughts @carolynvs ? Repeated runs locally pass... 🤔

@carolynvs
Copy link
Member

Sorry been focused 100% on the infra migration from this week and haven't had time yet to consider this problem. I won't get a chance to look at this until Monday.

Signed-off-by: Vaughn Dice <vadice@microsoft.com>
Signed-off-by: Vaughn Dice <vadice@microsoft.com>
@carolynvs
Copy link
Member

The error reproduces on my machine! TO THE DEBUGGER! 💨

@carolynvs
Copy link
Member

carolynvs commented Apr 15, 2021

I am still looking into this. After your patch the diff between the files is MUCH closer but drifts towards the end of the file.

@carolynvs
Copy link
Member

I just pushed an extra commit that uses github.com/mholt/archiver to handle creating the file. Locally that fixed the problem for me. Let's see what happens on CI.

A couple thoughts after I have worked on this:

  1. I am reading that tgz will still have differences depending on the os it was run on. We may want to move to a different compression format, such as zip or maybe? bzip2 which doesn't have as much volatile stuff in the headers.
  2. I switched to archiver because I wanted to eliminate "we just screwed up writing the archive code". We can either keep it (because it's easy to use) or figure out the difference and implement it ourselves again. Note that the current logic in my fix can write using any compression format it goes off of whatever file extension they specified with porter archive mybuns.EXT. So if we want to keep this fix, we should decide if we care about the format, which format is reproducible and possibly add validation up-front on the extension.

@carolynvs
Copy link
Member

Also I want to update the archive test to run archive twice and compare the hash. So that we don't get false positives on a dev machine.

@scottbuckel
Copy link

  1. I am reading that tgz will still have differences depending on the os it was run on. We may want to move to a different compression format, such as zip or maybe? bzip2 which doesn't have as much volatile stuff in the headers.

That probably explains why docker save defaults to .tar files I'm guessing? I bet .tar doesn't have the same behavior.

@carolynvs
Copy link
Member

I'll give just plain .tar files a go too!

Only check that the hash of the archive is repeatable. It will not be
the same across operating systems.

Signed-off-by: Carolyn Van Slyck <me@carolynvanslyck.com>
@carolynvs
Copy link
Member

@vdice I checked docker save and also get different hashes on different computers. But on the same computer the hash is stable. I've updated the test to only check if the hash is the same between calls to porter archive, but doesn't assert that it matches some canonical hash (because there doesn't seem to be one).

Copy link
Member

@carolynvs carolynvs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If we get more clever in the future, we can always revisit! 😀

@carolynvs carolynvs merged commit 23099eb into getporter:main Apr 27, 2021
@carolynvs carolynvs added this to In Progress in Porter and Mixins via automation Apr 27, 2021
@carolynvs carolynvs moved this from In Progress to Done in Porter and Mixins Apr 27, 2021
@vdice vdice deleted the fix/archive-shasum branch April 27, 2021 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Porter archive --reference returns different filesizes
3 participants