Make crate hash stable and externally computable. #10593

Merged
merged 2 commits into from Dec 11, 2013

Projects

None yet

7 participants

@metajack
Contributor

This replaces the link meta attributes with a pkgid attribute and uses a hash
of this as the crate hash. This makes the crate hash computable by things
other than the Rust compiler. It also switches the hash function ot SHA1 since
that is much more likely to be available in shell, Python, etc than SipHash.

Fixes #10188, #8523.

@metajack
Contributor

r? @brson

This probably isn't quite ready to merge as a few tests are still failing. Some help figuring out why would be appreciated.

@brson
Contributor
brson commented Nov 22, 2013

Ok, I'll build locally and see if I can figure out what's wrong. Perhaps we should infer the pkgid from the filename if one isn't provided, maybe print a warning - otherwise this is going to break a lot of code.

How does this interact with rustpkg's notion of package id's?

@metajack
Contributor

It currently infers pkgid the same way it used to infer link.name. Or at least that was my intent.

rustpkg currently injects link attributes in, so I suppose I'll need to modify rustpkg to inject pkgid attributes instead if they aren't provided.

I also noticed that some tests use a link_name attribute (ie, they have both link.name and link_name), whose purpose I am not sure of. Perhaps that will need to be dealt with too.

@brson
Contributor
brson commented Nov 22, 2013

Oh, I see where it's inferred now. The link_name attribute is for specifying the actual symbol of an external function. It would possibly be better called 'symbol_name'. It shouldn't be affected by this.

@brson
Contributor
brson commented Nov 22, 2013

crateresolve6 and 7 don't work because now the two aux crates end up with the same output name, so trying to differentiate their loading by their link metadata no longer works. We can probably just drop this use case, but the crate resolution code in rustc::metadata::filesearch is going to need to change and so will the syntax for extern mod foo(...) that allows matching against arbitrary metadata.

crateresolve8 is fixed by this patch:

-extern mod crateresolve8(vers = "0.1", package_id="crateresolve8");
+extern mod crateresolve8(vers = "0.1", package_id="crateresolve8#0.1");

I guess package_id here is special cased to match the pkgid crate metadata value. We may want to change this to pkgid to match. Still looking at others failures.

@DaGenix
DaGenix commented Nov 22, 2013

Why Sha1 instead of Sha2? The existing Sha1 code was being used by rustpkg as, I believe, an implementation detail that could easily be updated or modified without being visible to other applications. I would imagine that if the create hashes are using Sha1, its going to be quite a bit harder to change that to something else after Rust 1.0. I could be wrong about that, but, if I'm not, I think it would be more future proof to use something like Sha2 which is also widely available and much more secure.

@brson
Contributor
brson commented Nov 22, 2013

issue-6919 is fixed by changing the pkgid to #[pkgid="issue6919_3#0.1"]; in iss.rs

Fixing up these test cases to pass we can probably merge this pretty much as-is, but I think we need to straighten out some of the semantics of crate resolution and the extern mod statement pretty soon.

@brson
Contributor
brson commented Nov 22, 2013

@DaGenix the hashing here isn't providing any security that I can think of, just a unique identifier.

@metajack
Contributor

@DaGenix I don't really care what hash function it uses, but it should be something that ships in most default libraries. The entire point of it is to be computable easily by other build tooling not from the Rust world.

We were using SipHash, but the Python standard library doesn't include this. Nor is there a shell command that will calculate the SipHash of arbitrary input. SHA1 is widely enough deployed that shasum and standard libraries generally have it.

Even with this, determining the file name the rust compiler will output is nontrivial. Here's an example Makefile function that does it:
https://gist.github.com/metajack/7594447

@DaGenix
DaGenix commented Nov 22, 2013

@brson Whats the impact if the unique identifiers don't turn out to be unique? Accidental collisions are probably highly unlikely, but, intentional malicious collisions are already theoretically possible. According to Wikipedia ( http://en.wikipedia.org/wiki/SHA-1) there is a theoretical attack that could find a collision at a cost of roughly $2.7 million today. That's unrealistic today for anyone but a nation state, but, its only a matter of years before that becomes more feasible. If Rust becomes as popular as it has the potential to be, how much of a pain would it be to change the hash in 5-10 years? It could be done today with an hour or less of work. Why not use Sha2?

@metajack
Contributor

There's no need for an attack on SHA1, as you can simply set your pkgid to the same as someone else's package and it will generate the same hash.

This is equivalent to you naming your library "libc" or "libpng" or something and trying to get people to install it.

I considered just bash64ing the pkgid but hashing seemed the best way to get the result down to a reasonable number of characters with low probability of accidental collision. The goal I am aiming for is to be better than the completely flat namespace of /usr/lib that exists now.

@metajack
Contributor

re-r? @brson

I addressed your comments (I think) and all tests pass now.

@metajack
Contributor

Also, I'm happy to squash after you review. I thought this would make it easier for you to see the changes.

@DaGenix
DaGenix commented Nov 27, 2013

Ok, so, I was thinking that the crate hash was actually hashing up its contents and so a collision might fool someone into using the wrong library. Anyway, as you point out, that's not the case. However, If the goal is absolute compatibility, its probably hard to do better than MD5, although MD5 is exceptionally insecure. Sha1 is almost as compatible, and almost as fast, but only somewhat more secure. I think its hard to find someone that recommends using Sha1 for a new application, regardless of how is being used. Sha256 or Sha512 is nearly as ubiquitous as MD5 or Sha1 and runs at a competitive speed (The Sha256 rust implementation that I wrote that used to be in libextra was faster than the current Sha1 implementation, IIRC) while being much more secure. You don't need to follow the sentence "I'm using Sha256" with an explanation of why its not a security problem like you do with MD5 or Sha1. So, Sha1 seems like it exists in an odd middle area of not really being better than MD5 or Sha256. I'm also a little concerned about moving Sha1 into the util module of rustc - even though this current use doesn't really have security implications, it feels like this is just encouraging a future use that might.

I don't know how persuasive these argument have been. However, if you are not opposed, I'm happy to whip up a patch to use Sha256 instead of Sha1 since I don't know what the benefit of Sha1 is other than that it is currently in the Rust tree.

@metajack
Contributor

I just checked and all the places I care about (CMake, Lua, Python, shell, Ruby) all have sha256. I'm fine with using that.

@DaGenix
DaGenix commented Nov 27, 2013

Ok, cool. I'll get a patch together tomorrow for that.

@DaGenix
DaGenix commented Nov 28, 2013

I put at patch in the sha256-pkg-id branch of https://github.com/DaGenix/rust.git. I also sent a pull request for it to your repository, since I wasn't sure what the best way to post it was.

@DaGenix
DaGenix commented Nov 28, 2013

I am getting failures when running make check in the rustpkg tests and I have no idea why. I'm quite certain they aren't due to the changes I made since some of the tests that are failing are failing trying to find certain files that don't have any hashes appended to them. I think its some issue with how those tests are trying to find files that isn't playing well with my system.

@metajack
Contributor

I'll investigate and see if I can track down the failures.

@DaGenix
DaGenix commented Nov 28, 2013

I was doing a build in a subdirectory of the main source folder. I dunno if that was causing the problems, but, I'm trying a full rebuild from the root of the source directory to see if that works better.

@DaGenix
DaGenix commented Nov 28, 2013

I rebuilt and re-ran make check with 003c35e (top commit before my Sha2 changes) and it encountered the same errors. I then tried again with 8464004 (last commit before this pull request) and it passed. No idea why, though.

@metajack
Contributor

What system are you on and which tests failed? I'm mostly through check-stage1 with your patch applied and not seeing any failures yet.

@DaGenix
DaGenix commented Nov 28, 2013

I'm using 64-bit Ubuntu 13.10. Here are the errors i'm getting after running make check-stage1-rustpkg:

failures:
    tests::compile_flag_build
    tests::dash_S
    tests::find_sources_in_cwd
    tests::install_after_build
    tests::install_remove
    tests::pkgid_pointing_to_subdir
    tests::reinstall
    tests::rust_path_install_target
    tests::rust_path_test
    tests::rustpkg_build_no_arg
    tests::rustpkg_clean_no_arg
    tests::rustpkg_library_target
    tests::rustpkg_local_pkg
    tests::sysroot_flag
    tests::test_7402
    tests::test_c_dependency_no_rebuilding
    tests::test_c_dependency_ok
    tests::test_c_dependency_yes_rebuilding
    tests::test_cfg_build
    tests::test_emit_llvm_S_build
    tests::test_emit_llvm_build
    tests::test_install_git
    tests::test_install_to_rust_path
    tests::test_install_valid_external
    tests::test_installed_read_only
    tests::test_linker_build
    tests::test_list
    tests::test_optimized_build
    tests::test_package_request_version
    tests::test_package_version
    tests::test_rebuild_when_needed
    tests::test_rust_path_can_contain_package_dirs_with_flag
    tests::test_rustpkg_test_cfg
    tests::test_rustpkg_test_creates_exec
    tests::test_rustpkg_test_failure_exit_status
    tests::test_rustpkg_test_output
    tests::test_target_specific_build_dir
    tests::test_target_specific_install_dir

test result: FAILED. 41 passed; 38 failed; 9 ignored; 0 measured
@DaGenix
DaGenix commented Nov 28, 2013

I put the full output here: https://gist.github.com/DaGenix/7695515

@brson
Contributor
brson commented Dec 4, 2013

lgtm but needs a rebase. You might consider adding a flag to generate the sha so you don't have to depend on a local sha utility.

@metajack
Contributor

@brson Rebased. I fixed up more test cases as well.

The final commit also fixes a makefile dependency error I encountered.

metajack added some commits Dec 9, 2013
@metajack metajack Make crate hash stable and externally computable.
This replaces the link meta attributes with a pkgid attribute and uses a hash
of this as the crate hash. This makes the crate hash computable by things
other than the Rust compiler. It also switches the hash function ot SHA1 since
that is much more likely to be available in shell, Python, etc than SipHash.

Fixes #10188, #8523.
b349036
@metajack metajack Add missing sundown dependency to rustdoc tests.
a16753c
@metajack
Owner

r=brson

Contributor
bors replied Dec 11, 2013

saw approval from brson
at metajack@a16753c

Contributor
bors replied Dec 11, 2013

merging metajack/rust/pkgid-hash = a16753c into auto

Contributor
bors replied Dec 11, 2013

metajack/rust/pkgid-hash = a16753c merged ok, testing candidate = ac4dd9e

Contributor
bors replied Dec 11, 2013

fast-forwarding master to auto = ac4dd9e

@metajack
Contributor

I apparently forgot to run make tidy and had two lines that were too long. Fixed.

@bors bors added a commit that referenced this pull request Dec 11, 2013
@bors bors auto merge of #10593 : metajack/rust/pkgid-hash, r=brson
This replaces the link meta attributes with a pkgid attribute and uses a hash
of this as the crate hash. This makes the crate hash computable by things
other than the Rust compiler. It also switches the hash function ot SHA1 since
that is much more likely to be available in shell, Python, etc than SipHash.

Fixes #10188, #8523.
ac4dd9e
@bors bors merged commit a16753c into rust-lang:master Dec 11, 2013

1 check passed

default all tests passed
@metajack metajack deleted the metajack:pkgid-hash branch Dec 11, 2013
@adrientetar

Thought rustdoc_ng had been merged as rustdoc? (@cmr)

Contributor
cmr replied Dec 11, 2013

Yup, this section can be removed.

@kballard
Contributor

This change is causing a serious problem. Library names are now completely different. And I don't mean the hash. I mean the actual name on-disk.

The library name on disk is lib<name>-<hash>.... Previously, <name> came from the link args, so #[link(name="foo")]; produced libfoo-<hash>.... Now, it's apparently based off of the last path component of the #[pkgid]. So if my library doesn't have a #[pkgid], then I'll end up with liblib-<hash>.... And since the hash is also based off of the #[pkgid], then every single non-#[pkgid] library will result in the same full filename (as both the library name and the hash are identical). That's really bad.

If I go ahead and add a #[pkgid], and use the recommended approach of using my github path as the pkgid, then I end up with a library that's potentially named incorrectly. For example, my github.com/kballard/rust-lua library wants to produce liblua-<hash>... but instead with the #[pkgid] it's now librust-lua-<hash>....

@kballard
Contributor

I've filed this same commentary as issue #10922.

@bors bors added a commit that referenced this pull request Dec 13, 2013
@bors bors auto merge of #10698 : metajack/rust/dep-info, r=alexcrichton
This isn't super useful for libraries yet without #10593.

Fixes #7633.
9bbef13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment