Collisions in type_id #10389

Open · DaGenix opened this issue Nov 9, 2013 · 130 comments
Labels: A-typesystem (Area: The type system), C-bug (Category: This is a bug), I-unsound (Issue: A soundness hole, the worst kind of bug; see https://en.wikipedia.org/wiki/Soundness), P-low (Low priority), T-lang (Relevant to the language team, which will review and decide on the PR/issue)

Comments

@DaGenix commented Nov 9, 2013

The implementation of type_id from #10182 uses SipHash on various parameters depending on the type. However, the output size of SipHash is only 64 bits, which makes it feasible to find collisions via a birthday attack. I believe the code below demonstrates a collision in the type_id value of two different ty_structs:

use std::hash;
use std::hash::Streaming;

// I believe that this is pretty much the same thing as hash_crate_independent() in ty.rs
// for a ty_struct on a 64-bit platform
fn hash_struct(local_hash: &str, node: u64) -> u64 {
    let mut state = hash::default_state();
    state.input([18]);
    state.input(local_hash.as_bytes());
    do node.iter_bytes(true) |bytes| { state.input(bytes); true };
    state.result_u64()
}

fn main() {
    // This represents two structures with different node values from a crate with a 
    // local_hash value of "local" that end up getting the same hash and thus, 
    // I think, the same type_id
    assert!(hash_struct("local", 0x9e02c8943c336302) == hash_struct("local", 0x366a806b1d5f1b2));
}
@alexcrichton (Member)

I'm not entirely sure how feasible it is for a program to have 0x366a806b1d5f1b2 node ids (roughly 2.5 × 10^17), but this is still concerning.

We could in theory have very cheap inequality among types, and then have an expensive equality check. Something which may walk the respective TyDesc structures in parallel to make sure that they're the same. We could also bump up the hash size to using something like sha1/sha2 and have the type_id() intrinsic return [u8, ..N] to reduce the possibility of a collision.

Either way, I don't think that this is a super-pressing issue for now, but I'm nominating to discuss whether we want to get this done for 1.0. This could in theory have serious implications depending on how frequently Any is used.

@alexcrichton (Member)

Ah, it was already nominated!

@bill-myers (Contributor)

Why not compare an interned version of the type data string? (i.e. what is currently passed as data to be hashed, possibly SHA-256 hashed first)

The linker can be used for interning by emitting a common symbol with the type data string as name and taking its address, and otherwise the same thing can be done manually in a global constructor.

This way it's always a pointer comparison, and there are no collisions.
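
A rough Rust-level illustration of the interning idea (a sketch under assumptions, with made-up names; not what rustc or the linker actually does): give each type one static holding its full type string and use that static's address as its identity. With real linker support (a common symbol named after the type string), duplicates across crates would be merged, so the pointer comparison alone would suffice.

// Hypothetical sketch: the address of a per-type string serves as its identity.
static TYPE_STRING_FOO: &str = "mycrate::Foo";
static TYPE_STRING_BAR: &str = "mycrate::Bar";

fn foo_type_id() -> *const u8 {
    TYPE_STRING_FOO.as_ptr()
}

fn bar_type_id() -> *const u8 {
    TYPE_STRING_BAR.as_ptr()
}

fn main() {
    // Same type: same interned string, same address.
    assert!(std::ptr::eq(foo_type_id(), foo_type_id()));
    // Different types: different string data, different address.
    assert!(!std::ptr::eq(foo_type_id(), bar_type_id()));
}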

@DaGenix (Author) commented Nov 11, 2013

I don't know how node id values are generated, but assuming that they are generated sequentially, this particular collision is not realistic. However, it's not hard to find collisions for more realistic node id values by picking particular values for the crate hashes:

assert!(hash_struct("a2c55ca1a1f68", 4080) == hash_struct("138b8278caab5", 2804));

The key thing to consider isn't the number of node id values, though: it's the total number of type id values. Some quick (hopefully correct) math shows that there is a 0.01% chance of a collision once there are around 60 million type id values. That's still a pretty large number of type id values for a somewhat low probability of a collision, though. So it's unclear to me how big a deal this is for the Rust 1.0 timeframe. It all depends on what the acceptable probability of a collision is.
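
As a sanity check on that math (a back-of-the-envelope sketch, not code from the thread), the standard birthday approximation for an ideal b-bit hash gives about 1e-4 for 60 million values at 64 bits, and a vanishingly small figure at 128 bits:

// Birthday bound for an ideal b-bit hash:
// P(collision among n values) ≈ n(n-1) / 2^(b+1), valid while the result is << 1.
fn approx_collision_probability(n: f64, bits: i32) -> f64 {
    n * (n - 1.0) / (2.0 * 2f64.powi(bits))
}

fn main() {
    // ~60 million type ids vs. a 64-bit hash: about 9.8e-5, i.e. roughly 0.01%.
    println!("{:e}", approx_collision_probability(60_000_000.0, 64));
    // The same population vs. a 128-bit hash: about 5e-24.
    println!("{:e}", approx_collision_probability(60_000_000.0, 128));
}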

@nikomatsakis (Contributor)

When I saw that @alexcrichton proposed using a hash, my first reaction was "collision!" but then I thought "...but exceedingly unlikely to occur in practice". I think this is not a matter of imminent destruction but if we can leverage the linker or some other scheme to avoid this danger, we should -- and perhaps we should just go ahead and mark the current scheme as deprecated and just plan on finding a replacement scheme.

@thestinger (Contributor)

A cryptographic hash designed for this purpose (larger output) would be enough. Although, a larger output would be more expensive to compare (four u64 comparisons for SHA2).
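
To make the comparison-cost point concrete (an illustration, not a proposed API): a 256-bit identifier could be represented as four 64-bit limbs, so an equality check is four word comparisons instead of one.

// Hypothetical wide type id as four 64-bit limbs.
type WideTypeId = [u64; 4];

fn same_type_id(a: &WideTypeId, b: &WideTypeId) -> bool {
    a == b // element-wise comparison of the four limbs
}

fn main() {
    let a: WideTypeId = [0x1234, 0x5678, 0x9abc, 0xdef0];
    let b: WideTypeId = [0x1234, 0x5678, 0x9abc, 0xdef1];
    assert!(same_type_id(&a, &a));
    assert!(!same_type_id(&a, &b));
}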

@pnkfelix (Member)

We don't need to deal with this right now. P-low.

@steveklabnik (Member)

How relevant is this issue today? I think that it's all the same, but am not sure.

@thestinger (Contributor)

It's 64-bit so collisions are likely with enough types (consider recursive type metaprogramming) and it doesn't have any check to bail out if one occurs. Bailing out is not a very good solution anyway, because it pretty much means that there's no way to compile the program, beyond using a different random seed and hoping for the best. It's a crappy situation.

@vks (Contributor) commented Jan 21, 2015

Note that "hoping for the best" by iteratively changing the seed might work with overwhelmingly large probability after very few iterations.

@sorear (Contributor) commented Oct 14, 2015

use std::any::Any;

fn main() {
    let weird : [([u8; 188250],[u8; 1381155],[u8; 558782]); 0] = [];
    let whoops = Any::downcast_ref::<[([u8; 1990233],[u8; 798602],[u8; 2074279]); 1]>(&weird);
    println!("{}",whoops.unwrap()[0].0[333333]);
}

Actually a soundness issue. playground: http://is.gd/TwBayX

@pnkfelix (Member)

I'd like the lang team to devote a little time to this now that we are post 1.0. Nominating

@pnkfelix added the I-nominated and T-lang labels on Oct 18, 2015
@nikomatsakis (Contributor)

OK, lang team discussed it, and our conclusion was that:

  1. This issue ought to be fixed, it's silly not to.
  2. This is an implementation detail that we could change whenever we want (right?)
  3. Nonetheless, we probably ought to open an RFC or at least a discuss thread, with a proposal to do better, since probably people will have some clever ideas.
  4. Probably the runtime overhead of the virtual call in the Any trait is way more than a strcmp anyhow for all realistic types.

@nikomatsakis (Contributor)

I was wondering about a design where we do something like:

  • generate a static string representing the full type; in static builds, at least, this will be interned by the linker;
  • generate a hash

Then compare the string pointers for equality (to give a fast equality check). If that fails, compare the hashes for inequality (to give a fast inequality check). If THAT fails, compare the strings for content (to handle dynamic linking).

Although re-reading the thread I see @bill-myers may have had an even more clever solution.
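
A minimal sketch of that comparison scheme (assumed names and layout, not the actual TypeId/TyDesc representation), with the hash stored up front so unequal types are usually rejected after a single word comparison:

// Hypothetical per-type descriptor: precomputed hash plus interned type string.
struct TypeDescriptor {
    hash: u64,          // hash of `name`; a fast inequality check
    name: &'static str, // full type string; interned, so usually pointer-comparable
}

fn same_type(a: &TypeDescriptor, b: &TypeDescriptor) -> bool {
    // Fast equality: the interned strings are the very same allocation.
    if std::ptr::eq(a.name, b.name) {
        return true;
    }
    // Fast inequality: different hashes can never be the same type.
    if a.hash != b.hash {
        return false;
    }
    // Slow path (e.g. across dynamically linked copies): compare the contents.
    a.name == b.name
}

fn main() {
    let int32 = TypeDescriptor { hash: 0x1111, name: "i32" };
    let int64 = TypeDescriptor { hash: 0x2222, name: "i64" };
    assert!(same_type(&int32, &int32));
    assert!(!same_type(&int32, &int64));
}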

@pnkfelix (Member)

@nikomatsakis putting the hash of the data at the start is a good idea, to increase the probability that we catch unequal things quickly. It seems to me like @bill-myers' approach composes fine with that strategy.

@sorear (Contributor) commented Oct 30, 2015

I doubt the "problem" is limited to Any. You can probably confuse the compiler just as effectively by colliding hashes for symbol mangling, or many other things. What is the objective here? Since Rust is not a sandbox language, I don't think "protect memory from malicious programmers" should be one of our goals (we should document the types of undefined behavior that can be hit in safe code, and fix the ones that are possible to hit by accident; if someone is determined to break the type system, they can already write an unsafe block, or use std::process to launch a subprocess that ptraces its parent and corrupts memory).

@hmvp commented Jan 24, 2017

@Mark-Simulacrum (Member)

@nikomatsakis Should this be marked as I-unsound? I've done so for now, since that seems to be the general conclusion a couple of times by different people, but please unmark if I'm wrong.

@Mark-Simulacrum added the I-unsound and C-bug labels and removed the C-enhancement label on Jul 19, 2017
@the8472 (Member) commented Jun 20, 2024

Yes, speed is the concern. See #107925 for an example about the impact of the chosen hash function.
You can still try if you want, but the overhead is likely prohibitive and an alternative solution would be required if a decision is made to require collision-freedom or at least cryptographic strength.

@bjorn3 (Member) commented Jun 20, 2024

For TypeId specifically speed shouldn't matter a lot due to the relative rarity of TypeId compared to other uses of hashing in the compiler. There are other places where we also depend on collision freedom for soundness, where speed is very much important. For the cases where collision freedom is not depended on we already use rustc-hash (this is a polynomial hash currently) rather than SipHash.

@RalfJung (Member) commented Jun 20, 2024 via email

@RalfJung (Member) commented Jun 20, 2024 via email

@briansmith (Contributor) commented Jun 20, 2024

Ralf, thanks for clarifying that. I had misread what you wrote.

And then if yes, how close to perfect does the actual hash function have to be?

I cannot find it right now, but if it is useful I can find a discussion that made a pretty convincing argument that no practical hash function with 128-bit output can ever be collision-free enough to approximate such a "perfect" hash that we could make a "good enough" argument for. And my understanding from the discussion above (or elsewhere?) is that increasing the hash size to 256 bits (which would probably be the minimum to be considered "good enough" for theoretical soundness arguments) is considered prohibitive, such that [edit] relying solely on hash comparison would be disqualified as a solution.

@cr-marcstevens

It is one thing knowing there exist collisions in theory.
It is another having them show up in practice and screw things up.
For a 128-bit hash there would need to be about 2^64 different types across all Rust code before you would expect to see a collision, and they would then also need to coexist in one total compilation.
That seems unlikely enough to cause problems in practice.

But that is discounting conscious effort to cause bad stuff.
At first glance SipHash-1-3 seems to have little enough mixing that I would certainly not be surprised if there could be attacks.
It seemingly really relies on a secret random key for its security level.
But further cryptanalysis would be necessary to tell how secure it really is with a zero key.

@the8472 (Member) commented Jun 20, 2024

But that is discounting conscious effort to cause bad stuff.

Yes, threat models have been discussed upthread.

@cr-marcstevens

I read some parts; it looks like such threats were not really taken into account in the design.
(Otherwise a cryptographic hash would already be used.)
Yet when a collision was demonstrated for the 64-bit hash, that was enough motivation to immediately switch to a 128-bit version?

@RalfJung (Member)

Not "immediately", it took quite a while.^^

Yes, threat models have been discussed upthread.

Indeed, a lot of stuff was discussed there.^^ I think we need a decent summary collecting all major positions before we can ask t-lang to take a look.

@the8472 (Member) commented Jun 20, 2024

10 years from the report to the fix. And 32-bit collision resistance is not enough even for non-malicious uses. Even if it's very rare, it'd still happen eventually to someone over the course of normal Rust use.

@briansmith (Contributor)

I think we need a decent summary collecting all major positions before we can ask t-lang to take a look.

The bigger problem here is that other parts of the Rust project, and other projects, are now treating rustc's use of SipHash 1-3 with an all-zero key and 128-bit output as an indication that it is good enough for their uses. See:

@saethlin (Member)

I should think that simply directing people to read the discussion here would be sufficient to dissuade them. Or, even better, directing them to a summary.

@tarcieri (Contributor)

One important point in this thread that I hope doesn't get lost: per SipHash's author (and as others have noted), it's designed to be a keyed PRF, not an unkeyed hash function, and StableHasher is attempting to use it as the latter, which some consider a misuse (as @cr-marcstevens noted, "It seemingly really relies on a secret random key for its security level").

From what I can tell, the selection of SipHash for StableHasher is motivated primarily by the amount of work that has gone into performance-optimizing SipHash within rustc, and by the fact that no one has yet done the work to get a proper unkeyed hash function (SHA2/SHA3/BLAKE2/BLAKE3/Ascon-Hash) to a similar level of performance.

@michaelwoerister (Member)

There's no intrinsic reason why StableHasher could not support different hash functions for different use cases. For incremental compilation fingerprinting we need something very fast; for anything else, performance is not so much of a concern.
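
For illustration only (a hypothetical wrapper, not the real rustc-stable-hasher API), a hasher that is generic over the underlying algorithm would let each use case pick its own speed/collision-resistance trade-off:

// Hypothetical sketch: a stable hasher parameterized by the underlying Hasher.
use std::hash::Hasher;

struct StableHasher<H> {
    inner: H,
}

impl<H: Hasher> StableHasher<H> {
    fn new(inner: H) -> Self {
        Self { inner }
    }

    fn write_bytes(&mut self, bytes: &[u8]) {
        self.inner.write(bytes);
    }

    fn finish(&self) -> u64 {
        self.inner.finish()
    }
}

fn main() {
    use std::collections::hash_map::DefaultHasher;

    // Plug in whichever Hasher a given use case needs: something fast for
    // fingerprints, something collision-resistant for identity-critical hashes.
    let mut h = StableHasher::new(DefaultHasher::new());
    h.write_bytes(b"some stable input");
    println!("{:016x}", h.finish());
}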

@RalfJung (Member) commented Jun 21, 2024 via email

@michaelwoerister (Member)

Yes, incremental compilation is a best-effort developer-quality-of-life feature. No incrementally built code should ever be shipped.

@michaelwoerister (Member)

I've opened an issue in the rustc-stable-hasher repo about supporting different hash algorithms. It's clear that SipHash13 is not a good choice for most use cases.

@cr-marcstevens

Just to confirm, the SipHash128 that rustc is using is identical to this code:
https://github.com/veorq/SipHash/blob/master/siphash.c
Using 128-bit output (outlen=16), but with #define cROUNDS 1 and #define dROUNDS 3 instead?

@michaelwoerister (Member)

I don't think there is any deliberate deviation from the reference impl, so probably yes. Here are the compiler's test vectors for the algorithm:

const TEST_VECTOR: [[u8; 16]; 64] = [
[0xe7, 0x7e, 0xbc, 0xb2, 0x27, 0x88, 0xa5, 0xbe, 0xfd, 0x62, 0xdb, 0x6a, 0xdd, 0x30, 0x30, 0x01],
[0xfc, 0x6f, 0x37, 0x04, 0x60, 0xd3, 0xed, 0xa8, 0x5e, 0x05, 0x73, 0xcc, 0x2b, 0x2f, 0xf0, 0x63],
[0x75, 0x78, 0x7f, 0x09, 0x05, 0x69, 0x83, 0x9b, 0x85, 0x5b, 0xc9, 0x54, 0x8c, 0x6a, 0xea, 0x95],
[0x6b, 0xc5, 0xcc, 0xfa, 0x1e, 0xdc, 0xf7, 0x9f, 0x48, 0x23, 0x18, 0x77, 0x12, 0xeb, 0xd7, 0x43],
[0x0c, 0x78, 0x4e, 0x71, 0xac, 0x2b, 0x28, 0x5a, 0x9f, 0x8e, 0x92, 0xe7, 0x8f, 0xbf, 0x2c, 0x25],
[0xf3, 0x28, 0xdb, 0x89, 0x34, 0x5b, 0x62, 0x0c, 0x79, 0x52, 0x29, 0xa4, 0x26, 0x95, 0x84, 0x3e],
[0xdc, 0xd0, 0x3d, 0x29, 0xf7, 0x43, 0xe7, 0x10, 0x09, 0x51, 0xb0, 0xe8, 0x39, 0x85, 0xa6, 0xf8],
[0x10, 0x84, 0xb9, 0x23, 0xf2, 0xaa, 0xe0, 0xc3, 0xa6, 0x2f, 0x2e, 0xc8, 0x08, 0x48, 0xab, 0x77],
[0xaa, 0x12, 0xfe, 0xe1, 0xd5, 0xe3, 0xda, 0xb4, 0x72, 0x4f, 0x16, 0xab, 0x35, 0xf9, 0xc7, 0x99],
[0x81, 0xdd, 0xb8, 0x04, 0x2c, 0xf3, 0x39, 0x94, 0xf4, 0x72, 0x0e, 0x00, 0x94, 0x13, 0x7c, 0x42],
[0x4f, 0xaa, 0x54, 0x1d, 0x5d, 0x49, 0x8e, 0x89, 0xba, 0x0e, 0xa4, 0xc3, 0x87, 0xb2, 0x2f, 0xb4],
[0x72, 0x3b, 0x9a, 0xf3, 0x55, 0x44, 0x91, 0xdb, 0xb1, 0xd6, 0x63, 0x3d, 0xfc, 0x6e, 0x0c, 0x4e],
[0xe5, 0x3f, 0x92, 0x85, 0x9e, 0x48, 0x19, 0xa8, 0xdc, 0x06, 0x95, 0x73, 0x9f, 0xea, 0x8c, 0x65],
[0xb2, 0xf8, 0x58, 0xc7, 0xc9, 0xea, 0x80, 0x1d, 0x53, 0xd6, 0x03, 0x59, 0x6d, 0x65, 0x78, 0x44],
[0x87, 0xe7, 0x62, 0x68, 0xdb, 0xc9, 0x22, 0x72, 0x26, 0xb0, 0xca, 0x66, 0x5f, 0x64, 0xe3, 0x78],
[0xc1, 0x7e, 0x55, 0x05, 0xb2, 0xbd, 0x52, 0x6c, 0x29, 0x21, 0xcd, 0xec, 0x1e, 0x7e, 0x01, 0x09],
[0xd0, 0xa8, 0xd9, 0x57, 0x15, 0x51, 0x8e, 0xeb, 0xb5, 0x13, 0xb0, 0xf8, 0x3d, 0x9e, 0x17, 0x93],
[0x23, 0x41, 0x26, 0xf9, 0x3f, 0xbb, 0x66, 0x8d, 0x97, 0x51, 0x12, 0xe8, 0xfe, 0xbd, 0xf7, 0xec],
[0xef, 0x42, 0xf0, 0x3d, 0xb7, 0x8f, 0x70, 0x4d, 0x02, 0x3c, 0x44, 0x9f, 0x16, 0xb7, 0x09, 0x2b],
[0xab, 0xf7, 0x62, 0x38, 0xc2, 0x0a, 0xf1, 0x61, 0xb2, 0x31, 0x4b, 0x4d, 0x55, 0x26, 0xbc, 0xe9],
[0x3c, 0x2c, 0x2f, 0x11, 0xbb, 0x90, 0xcf, 0x0b, 0xe3, 0x35, 0xca, 0x9b, 0x2e, 0x91, 0xe9, 0xb7],
[0x2a, 0x7a, 0x68, 0x0f, 0x22, 0xa0, 0x2a, 0x92, 0xf4, 0x51, 0x49, 0xd2, 0x0f, 0xec, 0xe0, 0xef],
[0xc9, 0xa8, 0xd1, 0x30, 0x23, 0x1d, 0xd4, 0x3e, 0x42, 0xe6, 0x45, 0x69, 0x57, 0xf8, 0x37, 0x79],
[0x1d, 0x12, 0x7b, 0x84, 0x40, 0x5c, 0xea, 0xb9, 0x9f, 0xd8, 0x77, 0x5a, 0x9b, 0xe6, 0xc5, 0x59],
[0x9e, 0x4b, 0xf8, 0x37, 0xbc, 0xfd, 0x92, 0xca, 0xce, 0x09, 0xd2, 0x06, 0x1a, 0x84, 0xd0, 0x4a],
[0x39, 0x03, 0x1a, 0x96, 0x5d, 0x73, 0xb4, 0xaf, 0x5a, 0x27, 0x4d, 0x18, 0xf9, 0x73, 0xb1, 0xd2],
[0x7f, 0x4d, 0x0a, 0x12, 0x09, 0xd6, 0x7e, 0x4e, 0xd0, 0x6f, 0x75, 0x38, 0xe1, 0xcf, 0xad, 0x64],
[0xe6, 0x1e, 0xe2, 0x40, 0xfb, 0xdc, 0xce, 0x38, 0x96, 0x9f, 0x4c, 0xd2, 0x49, 0x27, 0xdd, 0x93],
[0x4c, 0x3b, 0xa2, 0xb3, 0x7b, 0x0f, 0xdd, 0x8c, 0xfa, 0x5e, 0x95, 0xc1, 0x89, 0xb2, 0x94, 0x14],
[0xe0, 0x6f, 0xd4, 0xca, 0x06, 0x6f, 0xec, 0xdd, 0x54, 0x06, 0x8a, 0x5a, 0xd8, 0x89, 0x6f, 0x86],
[0x5c, 0xa8, 0x4c, 0x34, 0x13, 0x9c, 0x65, 0x80, 0xa8, 0x8a, 0xf2, 0x49, 0x90, 0x72, 0x07, 0x06],
[0x42, 0xea, 0x96, 0x1c, 0x5b, 0x3c, 0x85, 0x8b, 0x17, 0xc3, 0xe5, 0x50, 0xdf, 0xa7, 0x90, 0x10],
[0x40, 0x6c, 0x44, 0xde, 0xe6, 0x78, 0x57, 0xb2, 0x94, 0x31, 0x60, 0xf3, 0x0c, 0x74, 0x17, 0xd3],
[0xc5, 0xf5, 0x7b, 0xae, 0x13, 0x20, 0xfc, 0xf4, 0xb4, 0xe8, 0x68, 0xe7, 0x1d, 0x56, 0xc6, 0x6b],
[0x04, 0xbf, 0x73, 0x7a, 0x5b, 0x67, 0x6b, 0xe7, 0xc3, 0xde, 0x05, 0x01, 0x7d, 0xf4, 0xbf, 0xf9],
[0x51, 0x63, 0xc9, 0xc0, 0x3f, 0x19, 0x07, 0xea, 0x10, 0x44, 0xed, 0x5c, 0x30, 0x72, 0x7b, 0x4f],
[0x37, 0xa1, 0x10, 0xf0, 0x02, 0x71, 0x8e, 0xda, 0xd2, 0x4b, 0x3f, 0x9e, 0xe4, 0x53, 0xf1, 0x40],
[0xb9, 0x87, 0x7e, 0x38, 0x1a, 0xed, 0xd3, 0xda, 0x08, 0xc3, 0x3e, 0x75, 0xff, 0x23, 0xac, 0x10],
[0x7c, 0x50, 0x04, 0x00, 0x5e, 0xc5, 0xda, 0x4c, 0x5a, 0xc9, 0x44, 0x0e, 0x5c, 0x72, 0x31, 0x93],
[0x81, 0xb8, 0x24, 0x37, 0x83, 0xdb, 0xc6, 0x46, 0xca, 0x9d, 0x0c, 0xd8, 0x2a, 0xbd, 0xb4, 0x6c],
[0x50, 0x57, 0x20, 0x54, 0x3e, 0xb9, 0xb4, 0x13, 0xd5, 0x0b, 0x3c, 0xfa, 0xd9, 0xee, 0xf9, 0x38],
[0x94, 0x5f, 0x59, 0x4d, 0xe7, 0x24, 0x11, 0xe4, 0xd3, 0x35, 0xbe, 0x87, 0x44, 0x56, 0xd8, 0xf3],
[0x37, 0x92, 0x3b, 0x3e, 0x37, 0x17, 0x77, 0xb2, 0x11, 0x70, 0xbf, 0x9d, 0x7e, 0x62, 0xf6, 0x02],
[0x3a, 0xd4, 0xe7, 0xc8, 0x57, 0x64, 0x96, 0x46, 0x11, 0xeb, 0x0a, 0x6c, 0x4d, 0x62, 0xde, 0x56],
[0xcd, 0x91, 0x39, 0x6c, 0x44, 0xaf, 0x4f, 0x51, 0x85, 0x57, 0x8d, 0x9d, 0xd9, 0x80, 0x3f, 0x0a],
[0xfe, 0x28, 0x15, 0x8e, 0x72, 0x7b, 0x86, 0x8f, 0x39, 0x03, 0xc9, 0xac, 0xda, 0x64, 0xa2, 0x58],
[0x40, 0xcc, 0x10, 0xb8, 0x28, 0x8c, 0xe5, 0xf0, 0xbc, 0x3a, 0xc0, 0xb6, 0x8a, 0x0e, 0xeb, 0xc8],
[0x6f, 0x14, 0x90, 0xf5, 0x40, 0x69, 0x9a, 0x3c, 0xd4, 0x97, 0x44, 0x20, 0xec, 0xc9, 0x27, 0x37],
[0xd5, 0x05, 0xf1, 0xb7, 0x5e, 0x1a, 0x84, 0xa6, 0x03, 0xc4, 0x35, 0x83, 0xb2, 0xed, 0x03, 0x08],
[0x49, 0x15, 0x73, 0xcf, 0xd7, 0x2b, 0xb4, 0x68, 0x2b, 0x7c, 0xa5, 0x88, 0x0e, 0x1c, 0x8d, 0x6f],
[0x3e, 0xd6, 0x9c, 0xfe, 0x45, 0xab, 0x40, 0x3f, 0x2f, 0xd2, 0xad, 0x95, 0x9b, 0xa2, 0x76, 0x66],
[0x8b, 0xe8, 0x39, 0xef, 0x1b, 0x20, 0xb5, 0x7c, 0x83, 0xba, 0x7e, 0xb6, 0xa8, 0xc2, 0x2b, 0x6a],
[0x14, 0x09, 0x18, 0x6a, 0xb4, 0x22, 0x31, 0xfe, 0xde, 0xe1, 0x81, 0x62, 0xcf, 0x1c, 0xb4, 0xca],
[0x2b, 0xf3, 0xcc, 0xc2, 0x4a, 0xb6, 0x72, 0xcf, 0x15, 0x1f, 0xb8, 0xd2, 0xf3, 0xf3, 0x06, 0x9b],
[0xb9, 0xb9, 0x3a, 0x28, 0x82, 0xd6, 0x02, 0x5c, 0xdb, 0x8c, 0x56, 0xfa, 0x13, 0xf7, 0x53, 0x7b],
[0xd9, 0x7c, 0xca, 0x36, 0x94, 0xfb, 0x20, 0x6d, 0xb8, 0xbd, 0x1f, 0x36, 0x50, 0xc3, 0x33, 0x22],
[0x94, 0xec, 0x2e, 0x19, 0xa4, 0x0b, 0xe4, 0x1a, 0xf3, 0x94, 0x0d, 0x6b, 0x30, 0xc4, 0x93, 0x84],
[0x4b, 0x41, 0x60, 0x3f, 0x20, 0x9a, 0x04, 0x5b, 0xe1, 0x40, 0xa3, 0x41, 0xa3, 0xdf, 0xfe, 0x10],
[0x23, 0xfb, 0xcb, 0x30, 0x9f, 0x1c, 0xf0, 0x94, 0x89, 0x07, 0x55, 0xab, 0x1b, 0x42, 0x65, 0x69],
[0xe7, 0xd9, 0xb6, 0x56, 0x90, 0x91, 0x8a, 0x2b, 0x23, 0x2f, 0x2f, 0x5c, 0x12, 0xc8, 0x30, 0x0e],
[0xad, 0xe8, 0x3c, 0xf7, 0xe7, 0xf3, 0x84, 0x7b, 0x36, 0xfa, 0x4b, 0x54, 0xb0, 0x0d, 0xce, 0x61],
[0x06, 0x10, 0xc5, 0xf2, 0xee, 0x57, 0x1c, 0x8a, 0xc8, 0x0c, 0xbf, 0xe5, 0x38, 0xbd, 0xf1, 0xc7],
[0x27, 0x1d, 0x5d, 0x00, 0xfb, 0xdb, 0x5d, 0x15, 0x5d, 0x9d, 0xce, 0xa9, 0x7c, 0xb4, 0x02, 0x18],
[0x4c, 0x58, 0x00, 0xe3, 0x4e, 0xfe, 0x42, 0x6f, 0x07, 0x9f, 0x6b, 0x0a, 0xa7, 0x52, 0x60, 0xad],
];
#[test]
fn test_siphash_1_3_test_vector() {
    let k0 = 0x_07_06_05_04_03_02_01_00;
    let k1 = 0x_0f_0e_0d_0c_0b_0a_09_08;
    let mut input: Vec<u8> = Vec::new();
    for i in 0..64 {
        let out = hash_with(SipHasher128::new_with_keys(k0, k1), &Bytes(&input[..]));
        let expected = (
            ((TEST_VECTOR[i][0] as u64) << 0)
                | ((TEST_VECTOR[i][1] as u64) << 8)
                | ((TEST_VECTOR[i][2] as u64) << 16)
                | ((TEST_VECTOR[i][3] as u64) << 24)
                | ((TEST_VECTOR[i][4] as u64) << 32)
                | ((TEST_VECTOR[i][5] as u64) << 40)
                | ((TEST_VECTOR[i][6] as u64) << 48)
                | ((TEST_VECTOR[i][7] as u64) << 56),
            ((TEST_VECTOR[i][8] as u64) << 0)
                | ((TEST_VECTOR[i][9] as u64) << 8)
                | ((TEST_VECTOR[i][10] as u64) << 16)
                | ((TEST_VECTOR[i][11] as u64) << 24)
                | ((TEST_VECTOR[i][12] as u64) << 32)
                | ((TEST_VECTOR[i][13] as u64) << 40)
                | ((TEST_VECTOR[i][14] as u64) << 48)
                | ((TEST_VECTOR[i][15] as u64) << 56),
        );
        assert_eq!(out, expected);
        input.push(i as u8);
    }
}

@briansmith (Contributor)

Yes, incremental compilation is a best-effort developer-quality-of-life feature. No incrementally built code should ever be shipped.

IMO, developers shouldn't be put more at risk using incremental compilation than with a full rebuild. We should be aiming for trustworthy incremental builds.

@RalfJung (Member) commented Jul 1, 2024

rust-lang/compiler-team#765 proposes another use of hashing in the build process.

EDIT: Ah, this was already brought up.

@RalfJung (Member) commented Jul 1, 2024

Yes, incremental compilation is a best-effort developer-quality-of-life feature. No incrementally built code should ever be shipped.

If that is the position of the team, it seems like that should be communicated more clearly? I wasn't aware of this, and I think it is safe to assume that the vast majority of our users are not aware of this, either.

@michaelwoerister (Member)

If that is the position of the team, it seems like that should be communicated more clearly?

Yes, that's definitely something that should be done. It's not like incrementally compiled code is likely to be wrong (especially with an empty cache there should be no difference other than CGU partitioning). But even without hash collisions taken into account, doing things incrementally is intrinsically more difficult and much harder to test. The likelihood of additional compiler bugs is just greater.

@SimonSapin (Contributor)

Anecdotally, the small number of ICEs I hit in recent years all went away with cargo clean which suggests they were incremental compilation bugs. So I’m aware it’s somewhat buggy but “No incrementally built code should ever be shipped” sounds much more dire.

@CAD97 (Contributor) commented Jul 2, 2024

For further context, I recall seeing someone knowledgeable (sorry, don't recall who) say much the same — that incremental is likely to have an unknown number of issues, solely due to the massively expanded surface area. (IIRC, this was fairly close to when incremental was made default for the dev profile.) However, this is mitigated by the fact that they also expected that these issues would manifest as ICE rather than incorrect compilation.

Anecdotally, my experience has also been that every issue I've hit since then (without unstable features) has been incremental ICEs, never a successful compilation generating incoherent behavior. Even when I've done UB that would justify two compilation modes having divergent behavior. Additionally, AIUI, the compiler has only gotten better at spotting any issues with incremental compilation over time.

I'd actually concur that incrementally built code shouldn't be shipped, but not due to any risk of miscompilation, just because that's needlessly leaving performance on the table compared to a non-incremental optimized build. Not every piece of shipped software is distributed enough to justify full fat LTO and PGO, but a clean build is generally worth it. It was also my impression that this was the compiler team position, and the best-effort falls out of that, in the same way cargo check skipping potentially fallible mono work does.

An available-by-default "dist" profile that tunes optimization knobs more aggressively than just release and isn't compatible with incremental might be interesting, and a release with such a profile would be a decent place to attach an announcement that incremental shouldn't be used for distribution builds, but I don't think it's in any way needed. However, deciding on an official policy for probabilistic correctness in the compiler and stdlib (potentially with an accompanying insiders blog post) does seem like a good idea.

@saethlin (Member) commented Jul 2, 2024

I think it would be best to have this discussion elsewhere, this seems like a tangent. Maybe important, but still a tangent.

@michaelwoerister (Member)

Yes, sorry for derailing the discussion here. "No incrementally built code should ever be shipped" does make it sound too extreme. Let's put it this way: there is no upside to building code incrementally unless your rebuilds need to be quick. The initial build will be slower, code quality might be lower due to more object files being generated, the resulting binary will be larger, and there is a chance of running into incr. comp. only compiler bugs which otherwise are just not an issue. But: any incr. comp. miscompilation bug will certainly be treated as critical and we have only had one such bug (in 2021), as far as I know.

I'll take an action item of adding information about incremental builds wrt release builds to the relevant docs for rustc and cargo.

@RalfJung (Member) commented Jul 10, 2024

To get back to the request for some lang team input on type_id (ignoring other hashes for a moment) -- as I said above, the lang team is not going to read a 130-comment thread, so someone will have to write a summary of what was discussed here, why some people want a cryptographic hash, why others think it is not necessary, what possibilities and reasons exist to avoid relying on a hash altogether, what attacker models have been discussed, the estimated cost of constructing a collision with the current scheme, the estimated likelihood of that happening accidentally -- all that. Most of the points have been made, so I think we can say that the evidence gathering phase of this has concluded -- we can go in circles a few more times but that's not going to help anyone.
