16 comments on commit 4f4be00
archive-tar: use internal gzip by default
Drop the dependency on gzip(1) and use our internal implementation to create tar.gz and tgz files.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Showing 3 changed files with 20 additions and 20 deletions.
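The breakage discussed in the comments below comes down to one fact: gzip output is not canonical. The same input can compress to different, equally valid gzip streams depending on the implementation, compression level, and header fields. A minimal Python sketch (illustrative only, not git's code) of two such streams:

```python
import gzip
import hashlib
import zlib

data = b"identical tarball contents"

# Route 1: the gzip module, compression level 9, mtime pinned to 0.
a = gzip.compress(data, compresslevel=9, mtime=0)

# Route 2: raw zlib with gzip framing (wbits=31), default level 6.
co = zlib.compressobj(level=6, wbits=31)
b = co.compress(data) + co.flush()

# Both are valid gzip streams for the same payload...
assert gzip.decompress(a) == gzip.decompress(b) == data

# ...but the compressed bytes, and thus any pinned checksum, differ.
print(hashlib.sha256(a).hexdigest())
print(hashlib.sha256(b).hexdigest())
```

Switching git from gzip(1) to its internal zlib-based compressor was exactly this kind of route change: same trees, different .tar.gz bytes.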
---
This broke the checksums of the tarballs, and lots of things relied on the checksum being consistent, including bazelbuild:
https://twitter.com/shs96c/status/1620201523211894784
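Roughly what such tools do under the hood (a sketch with a hypothetical URL and pinned hash, not Bazel's actual code):

```python
import hashlib
import urllib.request

# Hypothetical values, for illustration only.
URL = "https://github.com/example/project/archive/refs/tags/v1.0.tar.gz"
PINNED_SHA256 = "..."  # recorded when the dependency was first added

def fetch_and_verify(url: str, expected: str) -> bytes:
    with urllib.request.urlopen(url) as resp:
        blob = resp.read()
    actual = hashlib.sha256(blob).hexdigest()
    if actual != expected:
        # This is what fired across ecosystems: same source tree,
        # different compressed bytes, mismatching pinned hash.
        raise RuntimeError(f"checksum mismatch: got {actual}, want {expected}")
    return blob
```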
---
This also broke a bunch of builds on the Homebrew side: https://github.com/Homebrew/homebrew-core/labels/git-archive-checksum-incident
---
This will break Arch Linux packages, and most probably deb, rpm, npm, PyPI, etc.
---
The Nix ecosystem will be heavily affected too.
---
This broke RTEMS tools builds. We adjusted to the new hashes, and then the change got reverted, so we have to revert our hash updates 🤦
---
CI on open-source projects like Buildroot reported possible man-in-the-middle attacks because of this change 🥶 Remember the old-school engineering rule: if it works well, do not touch it.
---
This broke all the FreeBSD ports that downloaded tar.gz files from GitHub: approximately 5,800 packages out of a total of 31,000 are affected. Thankfully, it was reverted... regenerating all those hashes would be quite the chore.
The right way to cope with this would be to add newer, better compression methods, give people a chance to migrate, and then, if you really want to get rid of this dependency, just delete the .gz files. They are so early-90s anyway :)
---
I'd be all for adding archives in more modern formats [1]. However, that's not really the point - the point is that there are a lot of tools which download an archive and then verify its checksum. It doesn't really matter what compression scheme those archives use, but stability is important. Even if the documentation says the checksums shouldn't be relied upon to remain stable, they have remained stable for long enough that people have begun relying on them anyway. There are also a lot of use cases where release artifacts aren't an acceptable substitute, either [2].
Footnotes
[1] brotli, zstd, and xz are all great formats with different speed/compression-ratio trade-offs. The first two are both faster and achieve higher compression ratios than gzip. xz gets higher compression ratios than any of the others, but is slightly slower. I don't think there are any supported operating systems which still ship without xz in their default set of installed packages, while brotli and zstd support is less common. But none of those formats are available now.
[2] Sometimes you need to get a commit that isn't a tagged release, for a repository you don't have control over. Also, for private repos, you can fetch the default archives by supplying an Authorization header, but fetching other kinds of release artifacts is more complicated and, depending on the tool that's doing the fetching, it might be impossible, if for example there isn't an option for setting the Accept header.
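A sketch of the private-repo fetch that footnote 2 describes, with illustrative OWNER/REPO/REF placeholders and assuming a token in GITHUB_TOKEN:

```python
import hashlib
import os
import urllib.request

# Illustrative endpoint; OWNER/REPO/REF are placeholders.
url = "https://api.github.com/repos/OWNER/REPO/tarball/REF"
req = urllib.request.Request(
    url,
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
)
with urllib.request.urlopen(req) as resp:
    blob = resp.read()

# The thread's whole point in one line: this value is only useful
# if the server keeps producing byte-identical archives.
print(hashlib.sha256(blob).hexdigest())
```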
---
I guess the whole affair just proved that the only artifacts that are useful are those which are immutable, because only then can their integrity be verified in the easiest (and only) way: with a cryptographic hash. If GitHub wants to insist on not guaranteeing that, then the response should be: why even bother?
Using a brand-new compression algorithm/format for any future artifacts should not be a problem as long as they stay immutable (and assuming the brand-new and shiny stuff is reasonably available). Defining the "future" might be the only tricky part.
---
Nixpkgs doesn't calculate the checksum of the archive, but rather the checksum of the files within it, specifically to avoid this issue (see https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/fetchzip/default.nix). If you use fetchFromGitHub, the checksums will be stable, and they have been since 2014.
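The idea, in much-simplified form (a rough stand-in for fetchzip/NAR hashing, not Nix's actual algorithm): hash what is inside the archive rather than the compressed bytes, so recompressing the container leaves the digest unchanged.

```python
import hashlib
import io
import tarfile

def content_hash(tar_gz_bytes: bytes) -> str:
    """Hash the archive's contents, not its compressed bytes.

    Walk members in sorted order and feed names, member kinds, and
    file contents into one digest. Timestamps, owners, and the gzip
    container never enter the hash, so recompression can't change it.
    """
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tar_gz_bytes), mode="r:gz") as tf:
        for member in sorted(tf.getmembers(), key=lambda m: m.name):
            h.update(member.name.encode("utf-8") + b"\0")
            h.update(member.type)  # one-byte file/dir/symlink marker
            if member.isfile():
                h.update(tf.extractfile(member).read())
            elif member.issym():
                h.update(member.linkname.encode("utf-8"))
    return h.hexdigest()
```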
---
This is true, but I've still seen fetchurl used where it shouldn't be much too frequently (e.g. lopsided98/nix-ros-overlay#241).
---
An attempt to fix this matter: #1454
---
How do you prevent tarbombs? The point of checksumming the tarball itself is that you can verify it before any extraction.
---
… For packages, not source downloads. That's not today's subject.
---
Quite simply, Nix doesn't. Although I wonder how many package managers deal with Content-Encoding: gzip bombs.
However, there are some measures against it. First of all, for packages inside Nixpkgs, the source files aren't usually downloaded directly from upstream, but from the cache. The cache is designed to ensure that packages will still work even if something happens to upstream, but it also helps here. Also, the default build directory is inside /tmp, which is usually a tmpfs with its own size limit (by default, half of physical RAM, not counting swap), which avoids most problems.
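For comparison, a userspace guard is also possible (a sketch, independent of anything Nix actually does): stream the decompression and abort once an output budget is exceeded, rather than trusting the compressed size.

```python
import zlib

def decompress_capped(gz_bytes: bytes, limit: int = 100 * 1024 * 1024) -> bytes:
    """Inflate a gzip stream, refusing to expand past `limit` bytes."""
    d = zlib.decompressobj(wbits=31)  # 31 = expect gzip framing
    out = bytearray()
    buf = gz_bytes
    while buf:
        # max_length bounds each step, so a tiny compressed input
        # cannot balloon memory in a single call; leftover input is
        # kept in d.unconsumed_tail.
        out += d.decompress(buf, 64 * 1024)
        if len(out) > limit:
            raise ValueError("refusing to inflate: output exceeds budget")
        buf = d.unconsumed_tail
        if d.eof:
            break
    return bytes(out)
```

The tmpfs limit described above achieves something similar at the filesystem level, without having to trust the extractor at all.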
---
Thanks, that's actually a nice answer.