Gerrit tarballs for the base packages aren't deterministic #84
Comments
FWIW, this makes it somewhat annoying to follow Bazel best practices when using these tarballs.
|
FYI, this is pretty unlikely to be fixed, as it would require breaking a public API inside JGit, which requires a major version bump, and that doesn't happen often.

The best practice to get files from Git is to use the Git wire protocol to fetch them. If Bazel doesn't want to use Git to fetch source files from Git, then best practice should be to export the files as a tarball and store that tarball in another, non-Git persistent location where the exact bytes of that stream are unlikely to change. Attempting to checksum a dynamically created .tar.bz2 or .tar.gz stream is not a good idea, as the compressor can change over time and produce different compressed stream results that still inflate to the same original files.
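The compressor caveat is easy to demonstrate locally: gzip embeds a modification time in its stream header, so compressing byte-identical content at two different moments yields different archive bytes (and therefore different checksums), even though both streams inflate to the same payload. A minimal Python sketch (an illustration only, not Gitiles code):

```python
import gzip
import hashlib
import io

payload = b"identical source file contents\n"

def gz_bytes(data: bytes, mtime: int) -> bytes:
    """Compress `data`, stamping the given mtime into the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()

a = gz_bytes(payload, mtime=0)               # generated at one moment
b = gz_bytes(payload, mtime=1_600_000_000)   # same content, later timestamp

# The compressed streams (and their checksums) differ...
assert hashlib.sha256(a).hexdigest() != hashlib.sha256(b).hexdigest()
# ...but both inflate back to the identical original bytes.
assert gzip.decompress(a) == gzip.decompress(b) == payload
```

The same effect follows from any change in compressor version or settings, which is exactly why a checksum of the dynamic stream is fragile while a checksum of the inflated contents is not.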
|
Bazel can use git directly, but it doesn't support shallow clones and therefore unnecessarily fetches all of the history for a repo. Their suggestion is to download tarballs instead.
|
IMHO there should be a feature request against Bazel to support shallow clones; it should be trivial to add that option there.

As Shawn says, using a dynamically generated compressed file is still a bad idea for this use case. Even if we fix JGit/Gitiles to generate a deterministic sequence of bytes at a given server version, we have no way to ensure that the given sequence of bytes remains deterministic across server versions. We depend on the JDK's zlib implementation for compressing objects, and there is no guarantee that that implementation will always produce the same byte sequence across JDK versions. Similarly, we use Apache Commons Compress for generating the archives, and we have no guarantee that a given list of archive entries will always contain the same bytes of metadata even if the compressed content is the same.

The upshot is that callers really should not depend on the sequence of bytes in an archive being stable in the long term, which is what the Bazel use case is asking for.
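The archive-metadata point holds for any tar implementation, not just Commons Compress: two archives of byte-identical file contents diverge as soon as an entry's metadata (mtime, uid/gid, and so on) differs. A small Python illustration of that layer, separate from the compression layer:

```python
import io
import tarfile

content = b"same file contents in both archives\n"

def tar_bytes(entry_mtime: int) -> bytes:
    """Build an uncompressed tar with one file, using the given entry mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="src/file.txt")
        info.size = len(content)
        info.mtime = entry_mtime  # metadata only -- contents are unchanged
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()

a = tar_bytes(entry_mtime=0)
b = tar_bytes(entry_mtime=1_600_000_000)

assert a != b  # the archive bytes differ on metadata alone...
# ...even though the stored file contents are identical:
for blob in (a, b):
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        assert tar.extractfile("src/file.txt").read() == content
```

So even before compression enters the picture, the raw tar stream is only as stable as every piece of per-entry metadata the server happens to emit.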
|
you could write a custom repository rule that runs a git clone/fetch of a specific revision to implement shawn's suggestion. Beyond fixing the direct issue, I think that would also be a good direction for Bazel to take, so Bazel can stop depending on JGit. |
These tarballs are not deterministic due to changing metadata, cf. google/gitiles#84
|
This is now tracked as: [1]. The change under review is: [2]. |
|
Thanks @msohn, it is fixed now as of JGit 5.1.9. @dborowitz, @jrn, @hanwen: can this be closed?
|
Has this been deployed to googlesource.com? |
|
unfortunately, it has not, and it doesn't seem like it will be :/ |
|
Whom do we need to contact to get that fixed? |
|
googlesource.com runs JGit from master, so if this is still non-deterministic, something else is going on. |
It is: note different Content-Length on different runs of trying to fetch the same commit: |
|
This is still happening. |
This has been true from the start. Unless we:

a. store the tarball when a user downloads it (this is what GitHub does), or
b. keep around historical versions of commons-compress and record which one was used to produce the tarball,

we cannot make a long-term deterministic tarball download. All the requests I have seen are for use cases that require long-term determinism. In that spirit, it would be misleading to pretend we intend to provide that; it is expensive to do and not part of what Gitiles is meant for.

If you don't need determinism, you can use the Gitiles tarball. If you do need determinism, I recommend storing the tarball somewhere (e.g. a cloud storage provider or an FTP host).
|
(a) Can we make this a hosting config option? I get that storing archives for every project and every commit is a ton of space and would be pretty wasteful (especially if crawlers fire). I wonder if a middle ground of doing it only for tags would work.

(b) How big of a problem is this approach? Gitiles doesn't seem to change that much (for better or worse). What if we did this?

Not entirely unrelated, but the gzip project has an --rsyncable option so compressed files are stable and easy to transfer.
We had to use fetchgit so far, as the tarballs are generated on demand and have embedded timestamps, which makes their hashes unstable [0][1]. This is a problem for fetchurl, but fetchzip extracts the tarballs into the Nix store, so the contents get normalized and the hashes remain stable. [0]: google/gitiles#84 [1]: https://bugs.eclipse.org/bugs/show_bug.cgi?id=548312
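The fetchzip approach generalizes: hash the extracted contents rather than the archive bytes, and the checksum survives metadata churn in the archive itself. A hypothetical Python version of that idea (the function name is invented for illustration):

```python
import hashlib
import io
import tarfile

def content_hash(archive: bytes) -> str:
    """Hash a tarball by its (path, bytes) contents, ignoring archive-level
    metadata such as mtimes, owners, and entry order."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        for member in tar.getmembers():
            if member.isfile():
                entries.append((member.name, tar.extractfile(member).read()))
    h = hashlib.sha256()
    for name, data in sorted(entries):  # sort so entry order can't matter
        h.update(name.encode())
        h.update(b"\0")
        h.update(data)
    return h.hexdigest()
```

Two Gitiles downloads of the same commit that differ only in embedded timestamps would produce different archive checksums but the same content hash, which is why fetchzip stays stable where fetchurl breaks.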
|
I'm still seeing the timestamp in the tar metadata when downloading from googlesource.com. So this is not yet resolved. It looks like it was already fixed in JGit. I added more info in #217 |
|
Why is this issue closed? The problem was never fixed. Please reopen. |
…kip] Also make directory strip level a parameter in live_xt. googlesource.com tarballs need level 0. Ref: google/gitiles#84
Don't use *.googlesource.com as tarball source, it generates non-reproducible tarballs (google/gitiles#84). Closes: https://bugs.gentoo.org/860297 Signed-off-by: Azamat H. Hackimov <azamat.hackimov@gmail.com>
Don't use *.googlesource.com as tarball source, it generates non-reproducible tarballs (google/gitiles#84). Closes: https://bugs.gentoo.org/860297 Signed-off-by: Azamat H. Hackimov <azamat.hackimov@gmail.com> Closes: #26550 Signed-off-by: Joonas Niilola <juippis@gentoo.org>
Originally reported on Google Code with ID 92
Reported by None on 2015-12-13 16:27:54