Create checksum from more reproducible artifact #7212
There's been an event I call the "zlib explosion", after which cargo ended up using tons of variants of the zlib algorithm. With so many variants in play, attempts to re-create hash-matching .crate files from the deduplicated storage become pointless. See also rust-lang/cargo#7212
Another option is to hash the files individually, write the hashes into a manifest, and sign the manifest itself. This is how JAR file signatures work. That way, even the .tar format could be replaced with a different format later.
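A minimal sketch of that manifest idea, assuming the `sha2` and `walkdir` crates; `build_manifest` and the two-column `<hash>  <path>` layout are illustrative, not an actual JAR or Cargo format:

```rust
use sha2::{Digest, Sha256};
use std::{fs, io};
use walkdir::WalkDir;

/// Hash every file under `root` and collect the digests into a manifest.
/// Signing the returned string (e.g. with an ed25519 key) would then
/// authenticate the contents independently of the archive container.
fn build_manifest(root: &str) -> io::Result<String> {
    let mut manifest = String::new();
    // Sort entries so the manifest (and its signature) is deterministic.
    for entry in WalkDir::new(root).sort_by_file_name() {
        let entry = entry?;
        if !entry.file_type().is_file() {
            continue;
        }
        let digest = Sha256::digest(fs::read(entry.path())?);
        manifest.push_str(&format!("{:x}  {}\n", digest, entry.path().display()));
    }
    Ok(manifest)
}
```

Because only file contents are hashed, the same manifest would verify a .tar, a .zip, or a plain checkout.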
There are a large number of known security vulnerabilities around that, and the JAR model is now generally seen as bad in the security community.
For my own edification, where can I learn more about these security problems?
Having done a small search, I found:
Right now cargo/crates.io does it really well by hashing the entire tar.gz file's contents. My suggestion is to use only the tar file's contents for the hashing, allowing better compression and deduplication to be done on the backend side. If we ever want to do signatures embedded in the tar.gz, there are some alternatives.
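A sketch of that suggestion, assuming the `flate2` and `sha2` crates (`tar_checksum` is a hypothetical helper): the checksum is computed over the decompressed tar stream, so the gzip layer is free to change.

```rust
use flate2::read::GzDecoder;
use sha2::{Digest, Sha256};
use std::{fs::File, io};

/// Checksum the inner .tar of a .crate file instead of the .tar.gz bytes.
fn tar_checksum(crate_file: &str) -> io::Result<String> {
    let mut tar_stream = GzDecoder::new(File::open(crate_file)?);
    let mut hasher = Sha256::new();
    // Stream the raw tar bytes into the hasher; any zlib variant that
    // inflates to the same archive yields the same checksum.
    io::copy(&mut tar_stream, &mut hasher)?;
    Ok(format!("{:x}", hasher.finalize()))
}
```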
This might benefit #2526, as it would let us migrate to additional compression algorithms while the checksum stays stable across all of them.
I was reminded of this at PackagingCon and wanted to write it down before it slipped my mind. There is an existing, battle-hardened, (file system|operating system|compression)-agnostic file structure hash: the hashing algorithm used for git trees. It should be possible to compute this hash over any compressed version of the file, or over a checkout of the tree on the file system, and get the same result. It does treat several filesystem properties as irrelevant and hashes other properties, either of which could be the wrong decision in our use case. But it has the advantage of being battle hardened. Just a thought.
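For reference, git's object ids are easy to recompute: a blob id is SHA-1 over the header `blob <len>\0` followed by the file bytes, and a tree id hashes sorted `<mode> <name>\0<child-id>` entries under a `tree <len>\0` header in the same way. A minimal sketch of the blob case, assuming the `sha1` crate:

```rust
use sha1::{Digest, Sha1};

fn git_blob_id(content: &[u8]) -> String {
    let mut hasher = Sha1::new();
    // The header makes the hash depend only on the object's type and
    // size, never on filesystem or compression details.
    hasher.update(format!("blob {}\0", content.len()).as_bytes());
    hasher.update(content);
    format!("{:x}", hasher.finalize())
}

fn main() {
    // `git hash-object` on a file containing "hello\n" prints the same id.
    assert_eq!(
        git_blob_id(b"hello\n"),
        "ce013625030ba8dba906f756967f9e9ca394464a"
    );
}
```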
PR #6317 has been quite disruptive (you could call it detrimental) to my attempts at building a reversible storage of cargo crates.

The goal of the project is to deduplicate files present across multiple versions of a crate and do the compression on the fly. This gives sweet improvements. The reason why #6317 was so disruptive is that I want to replicate the compression 100%, so that there are no changes to Cargo.lock hashes. But for this I need to be able to run the exact compression algorithm that was used. There are countless versions of the zlib library in use across various OSes, and I'm not sure whether all of them are even open source. That's why I said detrimental: if proprietary versions of zlib are used, the project becomes impossible.

A short-term fix for the issue would be to revert the PR. A longer-term fix would be to use a different format for the hashes in the registry and in Cargo.lock: instead of hashing the entire tar.gz file, you could hash only the .tar file. There is a new upcoming version of Cargo.lock in which this could be adopted. Do you think that this is a good idea?
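To illustrate why hashing the .tar would make the checksum reproducible, here is a small demonstration, assuming the `flate2` and `sha2` crates: two gzip encodings of the same tar bytes differ, but a checksum over the decompressed stream stays stable.

```rust
use flate2::{read::GzDecoder, write::GzEncoder, Compression};
use sha2::{Digest, Sha256};
use std::io::{Read, Write};

fn gzip(data: &[u8], level: Compression) -> Vec<u8> {
    let mut enc = GzEncoder::new(Vec::new(), level);
    enc.write_all(data).unwrap();
    enc.finish().unwrap()
}

fn inner_tar_hash(gz_bytes: &[u8]) -> String {
    let mut tar = Vec::new();
    GzDecoder::new(gz_bytes).read_to_end(&mut tar).unwrap();
    format!("{:x}", Sha256::digest(&tar))
}

fn main() {
    // Stand-in for a real .tar archive produced by `cargo package`.
    let tar_bytes = b"pretend this is a .tar archive ".repeat(100);

    let fast = gzip(&tar_bytes, Compression::fast());
    let best = gzip(&tar_bytes, Compression::best());

    // Different deflate settings (or zlib variants) produce different
    // .tar.gz bytes, so a checksum over them is not reproducible...
    assert_ne!(Sha256::digest(&fast), Sha256::digest(&best));
    // ...while a checksum over the inner .tar stays stable.
    assert_eq!(inner_tar_hash(&fast), inner_tar_hash(&best));
}
```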