Github generated tarballs change randomly #10800

Closed
qbit opened this issue Mar 7, 2018 · 28 comments

@qbit
Contributor

qbit commented Mar 7, 2018

Hi,

As the title states, GitHub-generated tarballs can change at any point. This creates issues for packaging on systems like OpenBSD ports. Would it be possible to get official tarballs created and attached to releases? Maybe even signed :D?

@thinrope

thinrope commented Mar 21, 2018

You mean they can be changed by the developers re-publishing them? Or something else?
In my understanding, a zip/tar.gz is tied to a tag, which normally does not change (committers can move it, but that is bad practice).
Keeping an external hash of the file is the usual approach (at least on Gentoo); it is verified on each use of the file, cached or not.
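
The external-hash check described above can be sketched in shell. This is a minimal sketch, assuming GNU coreutils `sha256sum` is available; `verify_sha256` is a hypothetical helper name and not part of Gentoo's or OpenBSD's tooling.

```sh
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 only if the SHA-256 digest of FILE equals EXPECTED_HEX.
# (Hypothetical helper; assumes GNU coreutils `sha256sum`.)
verify_sha256() {
    actual=$(sha256sum "$1" 2>/dev/null | cut -d' ' -f1)
    # An unreadable file yields an empty digest, which must never match.
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}
```

A packaging script might call it as `verify_sha256 "$distfile" "$recorded_hash" || exit 1`, where both variables are placeholders for whatever the port records. This check is exactly what breaks when a regenerated tarball carries the same contents but different bytes.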

@qbit
Contributor Author

qbit commented Mar 21, 2018

They can change at any point, regardless of whether developers change a tag. We have seen it multiple times during OpenBSD's 6-month release cycle, and GitHub staff confirmed it. The tarballs are generated and cached; once the cache is invalidated, a new tarball is generated, and it may have a different checksum.

@thinrope

OK, that was something new to me, thank you!
I did a bit of digging, and although I couldn't get to the bottom of it, it seems that the tar.gz files were (are?) generated by `tar something | gzip - > file.tar.gz`, which saves the file name and timestamp. This can be (is?) fixed by using `tar something | gzip -n > file.tar.gz` ...
I contacted GitHub support, asking for clarification.

So attaching tarballs to a release, as described in point 7 of https://help.github.com/articles/creating-releases/, would solve that problem, assuming it is still a problem.
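
The effect of `gzip -n` mentioned above can be illustrated with a small experiment. This is a sketch under assumptions: it assumes GNU gzip and `sha256sum`, `gz_digest` is a hypothetical helper, and it says nothing about how GitHub actually builds its archives. It compresses the same bytes twice, changing only the file's modification time in between.

```sh
# gz_digest FILE [GZIP_OPTIONS...]
# Prints the SHA-256 digest of FILE after compressing it with gzip.
gz_digest() {
    file=$1
    shift
    gzip -c "$@" "$file" | sha256sum | cut -d' ' -f1
}

workdir=$(mktemp -d)
printf 'same content\n' > "$workdir/a.txt"

touch -t 201803070000 "$workdir/a.txt"
plain_old=$(gz_digest "$workdir/a.txt")        # header records name and mtime
stable_old=$(gz_digest "$workdir/a.txt" -n)    # header omits name and mtime

touch -t 201803210000 "$workdir/a.txt"         # same bytes, newer mtime
plain_new=$(gz_digest "$workdir/a.txt")
stable_new=$(gz_digest "$workdir/a.txt" -n)

[ "$plain_old" = "$plain_new" ] || echo "default gzip: digest changed with mtime"
[ "$stable_old" = "$stable_new" ] && echo "gzip -n: digest unchanged"
```

The compressed payload is identical in both cases; only the header metadata differs, which is enough to change the checksum.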

@qbit
Contributor Author

qbit commented Mar 22, 2018

@thinrope

OK, so indeed this is the case :-(
Here is what I got from support@github.com:

Currently, we don't make any guarantees about the byte-for-byte equivalence of any tarball which is generated on the fly.

If a team wants to produce a stable tarball, they will have to create it themselves and put it as a download in the releases.

We realize that this approach can be confusing since we put links to the on-the-fly tarballs right where the user-provided ones would exist. Our team is aware of this and will keep it in mind for future iterations of the feature, though we can't make any promises of specific changes.

So, I guess the only way is to attach explicit tarballs, or in other words have some more structured/scripted release procedure (e.g. publish hashes, etc.)

@qbit Thanks for raising that, I'll need to shout in other projects as well :-|

@eli-schwartz
Contributor

I was under the impression that in practice GitHub archives are generated by git-archive(1), which is deterministic.

$ git config --get alias.github-release
!f() { local repo=$(basename "$(pwd)") tag=$1; git archive --prefix=${repo}-${tag#v}/ -o ${repo}-${tag#v}.tar.gz ${tag}; }; f

This is what I use to generate release tarballs for signing locally with PGP, then uploading just the signature to GitHub releases. It's similar to Debian's recommendation here: https://wiki.debian.org/Creating%20signed%20GitHub%20releases#Creating_GnuPG-signed_releases_on_GitHub_-_alternative_local_workflow
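
For readers who prefer a script to a git alias, the same idea can be written as a shell function. This is a sketch only: `make_release_tarball` is a hypothetical name, it assumes a git new enough to support `git -C`, and it keeps the alias's naming scheme (repository basename plus the tag without its leading `v`).

```sh
# make_release_tarball REPO_DIR TAG
# Writes NAME-VERSION.tar.gz into the current directory using git-archive(1)
# and prints the full path of the file. NAME is the basename of REPO_DIR and
# VERSION is TAG without a leading "v", mirroring the alias above.
make_release_tarball() {
    repo_dir=$1
    tag=$2
    name=$(basename "$(cd "$repo_dir" && pwd)")
    version=${tag#v}
    out="$PWD/$name-$version.tar.gz"
    # The archive format is inferred from the .tar.gz suffix of the -o argument.
    git -C "$repo_dir" archive --prefix="$name-$version/" -o "$out" "$tag" || return 1
    printf '%s\n' "$out"
}
```

A detached signature could then be produced with something like `gpg --detach-sign --armor "$(make_release_tarball . v1.2.3)"`, assuming a signing key is configured; the tag name here is a placeholder. Only the resulting `.asc` file would then need to be uploaded.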

@qbit
Contributor Author

qbit commented Mar 30, 2018

Here is a blurb from a GitHub'er:

These are not planned changes but rather they come about from updating
the software involved in creating them. The main purpose of the auto-
generated archives are for someone to download the source from the
website if they don't want to bother with downloading the repository.

It is not meant to be reliable or a way to distribute software releases
and nothing in the software stack is made to try to produce consistent
archives. This is no different from creating a tarball locally and
trying verify it with the hash of the tarball someone created on their
own machine.

The only way to get a known-good checksum for a tarball is to have
upstream (or the packagers) prepare the release and upload the tarball
alongside its checksum. This is true regardless of GitHub. There is a
feature on the site where maintainers can upload their own assets for a
release though clearly not too many people actually use it.
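
The "upload the tarball alongside its checksum" step described above might look like the following on the release side. This is a sketch assuming GNU coreutils `sha256sum`; the function names and the `SHA256SUMS` file name are illustrative conventions, not something GitHub requires.

```sh
# write_checksums DIR FILE...
# Records SHA-256 digests of the named release files in DIR/SHA256SUMS.
write_checksums() {
    dir=$1
    shift
    ( cd "$dir" && sha256sum "$@" > SHA256SUMS )
}

# check_checksums DIR
# Returns 0 only if every file listed in DIR/SHA256SUMS still matches.
check_checksums() {
    ( cd "$1" && sha256sum -c SHA256SUMS >/dev/null 2>&1 )
}
```

Both the tarball and `SHA256SUMS` would be uploaded as release assets. Unlike the on-the-fly archives, uploaded assets are stored as-is, so the recorded digest stays valid.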

@heronhaye
Contributor

To summarize, what you need on our end is a .tar.gz and a .tar.gz.sig under "Assets" on https://github.com/keybase/client/releases, containing the client source code, for each release? It'd be signed with our code signing key.

@qbit
Contributor Author

qbit commented Aug 28, 2019

Yep!

@eli-schwartz
Contributor

Yes, that would be much appreciated.

(Note that I'm not explicitly attached to gzip compression, xz is pretty common too in some source distribution channels. e.g. kernel.org prefers xz in order to reduce download sizes, saving bandwidth for them and for the users. By now I think xz support is basically everywhere that wants it.)

@qbit
Contributor Author

qbit commented Aug 29, 2019

In other projects (snap for example) I just download the auto-gen'd tar.gz, sign and re-upload.

@eli-schwartz
Contributor

So what you're saying is the signatures on your software are meaningless? No thanks!

@qbit
Contributor Author

qbit commented Aug 29, 2019

> So what you're saying is the signatures on your software are meaningless?

It's exactly the same as if I had generated the tarball locally.. the re-attached tarball (signed) never changes once it's reattached - as it's an asset at that point.

@eli-schwartz
Contributor

The point is that you are YOLO trusting GitHub to produce a tarball, and blindly attaching your cryptographic signing certification to it. It is small comfort that -- probably -- at least only GitHub (or someone who hacked GitHub) can attack you, because https will prevent anyone who isn't a malicious SSL certificate company or abusing an incompetent one, from intercepting the tarball in transit while you download it.

Given how utterly trivially easy it is to generate the tarball yourself on your local machine, I'm mind-boggled that you feel the need to offload that work to GitHub.com -- is your dev machine simply not powerful enough to run the git-archive(1) command?

@qbit
Contributor Author

qbit commented Aug 29, 2019

I don't appreciate your attitude.

To begin with, I sign stuff locally with signify. That signature is in the repo and can be verified from the tarball, a git clone, or whatever. At that point, it doesn't matter who generates the tarball; it can be verified.

Also, this is completely off topic. If you have issues with the way I do things you can open an issue on the tracker for any of my code.

@eli-schwartz
Contributor

eli-schwartz commented Aug 29, 2019

I'm perfectly content to not use your code in the first place. Please don't tell the developers of the code I do use, to make use of your lax security protocols. Especially given that keybase is supposed to be security software.

To be clear, it doesn't matter if you securely add real signatures checked into git. The signatures you attach to github assets are still worthless and might as well not exist (because all they verify is that github said "here's a tarball of something, we pinky swear that it's your code").

Since keybase does not add signify signatures checked into git for every file in the repository, blindly signing a tarball that github has generated (without verifying its contents) does NOT add ANY security to the github assets that are being downloaded.

It is trivially, trivially easy to run git-archive yourself in order to generate meaningful tarball+signatures. This is independent of whether you have an additional layer of files+signatures that can be separately verified after extracting the (unverified) tarball.

@qbit
Contributor Author

qbit commented Aug 29, 2019

> Please don't tell the developers of the code I do use, to make use of your lax security protocols.

Sure, I concede going this route isn't a good idea. My intent with the example was to show that it doesn't have to be a complicated workflow (given how trivial this is.. and how long this issue has been open.. I assume there is something complicating the process.)

@heronhaye
Contributor

I just added a .tar.xz and .tar.xz.sig for the last release, 4.3.2. https://github.com/keybase/client/releases/tag/v4.3.2.

Is this what is expected? Any changes needed?

@qbit
Contributor Author

qbit commented Sep 3, 2019

Indeed it is! No changes needed from my end! Thanks!

@eli-schwartz
Contributor

eli-schwartz commented Sep 3, 2019

One minor nitpick: the name of the tarball is "client-v4.3.2", which is a bit ambiguous, so I have to rename the tarball to "keybase-v4.3.2" in order to store source archives for many software packages together. (My distro tooling has a global source download cache.)

But mostly, thanks!

@heronhaye
Contributor

Okay, just added "keybase-v4.4.0.tar.xz" to today's release, and it'll be named like that in the future.

sgn added a commit to sgn/void-packages that referenced this issue Jul 8, 2020
GitHub tarballs can be changed at anytime, which renders GitHub
auto-generated tarball invalid at sometime [1]

Use keybase manual generated tarball that is verified with their code
signing key [2]

[1] keybase/client#10800
[2] https://keybase.io/docs/server_security/code_signing_key.asc
@bmwiedemann

For the record: `| gzip` has been reproducible since 2018.

https://reproducible-builds.org/docs/archives/ has a "Full example" tar command line that should produce bit-identical results anywhere. git archive seems to do a pretty good job as well.
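
A sketch in the spirit of that page's advice (not a copy of its command) is shown below. It assumes GNU tar 1.28 or later for `--sort=name`, plus gzip; `repro_tarball` is a hypothetical helper name. It pins file order, timestamps, and ownership, which are the usual sources of byte-level differences between otherwise identical tarballs.

```sh
# repro_tarball PARENT_DIR NAME EPOCH
# Writes a gzip-compressed tar of PARENT_DIR/NAME to stdout with file order,
# timestamps, and ownership normalized. EPOCH is seconds since 1970-01-01 UTC.
repro_tarball() {
    tar --sort=name \
        --mtime="@$3" \
        --owner=0 --group=0 --numeric-owner \
        -C "$1" -cf - "$2" | gzip -n
}
```

For example, `repro_tarball /path/to keybase-4.4.0 "$SOURCE_DATE_EPOCH" > keybase-4.4.0.tar.gz`, where the path and the variable are placeholders. File permission bits are not normalized here, and different tar or gzip versions may still emit different bytes, which is the same caveat raised earlier in this thread.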

@alexeagle

Hello, we had the same issue with Bazel in bazel-contrib/SIG-rules-authors#11 - @fmeum reached out to GitHub and they confirmed that the SHA for certain URLs is guaranteed to remain stable:

bazel-contrib/SIG-rules-authors#11 (comment)

I checked with our team and they confirmed that we can expect the checksums for repository release archives, found at /archive/refs/tags/$tag, to be stable going forward. That cannot be said, however, for repository code download archives found at archive/v6.0.4.
It's totally understandable that users have come to expect a stable and consistent checksum value for these archives, which would be the case most of the time. However, it is not meant to be reliable or a way to distribute software releases and nothing in the software stack is made to try to produce consistent archives. This is no different from creating a tarball locally and trying verify it with the hash of the tarball someone created on their own machine.
If you had only a tag with no associated release, you should still expect to have a consistent checksum for the archives at /archive/refs/tags/$tag.

@eli-schwartz
Contributor

eli-schwartz commented Jan 13, 2023

Both:

  • https://github.com/{repo-slug}/archive/{tag}.tar.gz
  • https://github.com/{repo-slug}/archive/refs/tags/{tag}.tar.gz

are usually a redirect to the same place: https://codeload.github.com/{repo-slug}/tar.gz/refs/tags/{tag}

GitHub introduced the refs/tags/ version "recently" (well, it's been around for a while at this point) and uses it as the default URL for the autogenerated "source code" downloads on the tags / releases page. I suspect that they added it to distinguish between branches and tags, which are otherwise ambiguous.

Anyways it is good to have a promised confirmation of that guarantee, I guess. :)
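
To make the URL patterns above concrete, here is a small sketch that expands them for a given repository and tag. `github_archive_urls` is a hypothetical helper; it only formats strings and makes no network requests.

```sh
# github_archive_urls REPO_SLUG TAG
# Prints the two github.com archive URLs, followed by the codeload URL that
# both usually redirect to.
github_archive_urls() {
    printf 'https://github.com/%s/archive/%s.tar.gz\n' "$1" "$2"
    printf 'https://github.com/%s/archive/refs/tags/%s.tar.gz\n' "$1" "$2"
    printf 'https://codeload.github.com/%s/tar.gz/refs/tags/%s\n' "$1" "$2"
}
```

Running `github_archive_urls keybase/client v4.3.2` prints the three URLs for that release. Where the first two actually redirect can be checked with `curl -sI` by inspecting the `Location` header.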

copybara-service bot pushed a commit to protocolbuffers/protobuf that referenced this issue Jan 30, 2023
… instead.

These triggered non-hermetic build failures due to a change in how github generates tarballs.  See keybase/client#10800 (comment) for more info.

PiperOrigin-RevId: 505765523
@eli-schwartz
Contributor

> Hello, we had the same issue with Bazel in bazel-contrib/SIG-rules-authors#11 - @fmeum reached out to GitHub and they confirmed that the SHA for certain URLs is guaranteed to remain stable:

So much for that: https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/

"Hey, btw we're using a different gzip now. Change every checksum in the world".

@eli-schwartz
Contributor

Update: for the moment, it's being reverted. :)

@alexeagle

Yeah this time I hope we will come away from this mess with a documented decision about stability, one way or the other...

@eli-schwartz
Contributor

I guess it's really down to git, not GitHub, though. ;)

But on that note, I have at last put some thoughts together for the git community to consider:

https://public-inbox.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com/T/

My hope is, indeed, that git-archive can finally document "yes, this is guaranteed to produce byte-reproducible output and here is why". Then we can end this discussion for all time.
