
Collisions in type_id #10389

Open
DaGenix opened this issue Nov 9, 2013 · 92 comments
Labels
A-typesystem (Area: The type system)
C-bug (Category: This is a bug.)
I-unsound (Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness)
P-low (Low priority)
T-lang (Relevant to the language team, which will review and decide on the PR/issue.)

Comments

@DaGenix

DaGenix commented Nov 9, 2013

The implementation of type_id from #10182 uses SipHash on various parameters depending on the type. The output size of SipHash is only 64 bits, however, which makes it feasible to find collisions via a birthday attack. I believe the code below demonstrates a collision in the type_id value of two different ty_structs:

use std::hash;
use std::hash::Streaming;

// I believe that this is pretty much the same thing as hash_crate_independent()
// in ty.rs for a ty_struct on a 64-bit platform (2013-era Rust; the `do` closure
// syntax and std::hash::Streaming no longer exist)
fn hash_struct(local_hash: &str, node: u64) -> u64 {
    let mut state = hash::default_state();
    state.input([18]);
    state.input(local_hash.as_bytes());
    do node.iter_bytes(true) |bytes| { state.input(bytes); true };
    state.result_u64()
}

fn main() {
    // This represents two structures with different node values from a crate with a 
    // local_hash value of "local" that end up getting the same hash and thus, 
    // I think, the same type_id
    assert!(hash_struct("local", 0x9e02c8943c336302) == hash_struct("local", 0x366a806b1d5f1b2));
}
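To make the birthday-attack feasibility concrete, here is a self-contained sketch (function names are mine) that finds a collision in a 32-bit truncation of the standard library's DefaultHasher. It needs only about 2^16 samples; against a full 64-bit hash the same search takes roughly 2^32 work, which is well within reach of a determined attacker:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Truncate DefaultHasher's 64-bit output to 32 bits so the demo runs
// in well under a second; the attack scales as 2^(bits/2).
fn hash32(n: u64) -> u32 {
    let mut h = DefaultHasher::new();
    n.hash(&mut h);
    h.finish() as u32
}

// Classic birthday search: remember every hash seen and stop at the
// first repeat. Expected ~2^16 iterations for a 32-bit output.
fn find_collision() -> (u64, u64) {
    let mut seen: HashMap<u32, u64> = HashMap::new();
    for n in 0u64.. {
        let h = hash32(n);
        if let Some(&m) = seen.get(&h) {
            return (m, n);
        }
        seen.insert(h, n);
    }
    unreachable!()
}

fn main() {
    let (a, b) = find_collision();
    assert!(a != b && hash32(a) == hash32(b));
    println!("distinct inputs {} and {} share a 32-bit hash", a, b);
}
```

The same structure, applied to the 64-bit hash of the type description, is what makes collisions like the one above findable.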
@alexcrichton
Member

I'm not entirely sure how feasible it is for a program to have 0x366a806b1d5f1b2 node ids (roughly 2.4 × 10^17), but this is still concerning.

We could in theory have very cheap inequality among types, and then have an expensive equality check. Something which may walk the respective TyDesc structures in parallel to make sure that they're the same. We could also bump up the hash size to using something like sha1/sha2 and have the type_id() intrinsic return [u8, ..N] to reduce the possibility of a collision.

Either way, I don't think that this is a super-pressing issue for now, but I'm nominating to discuss whether we want to get this done for 1.0. This could in theory have serious implications depending on how frequently Any is used.

@alexcrichton
Member

Ah, it was already nominated!

@bill-myers
Contributor

Why not compare an interned version of the type data string? (i.e. what is currently passed as data to be hashed, possibly SHA-256 hashed first)

The linker can be used for interning by emitting a common symbol with the type data string as name and taking its address, and otherwise the same thing can be done manually in a global constructor.

This way it's always a pointer comparison, and there are no collisions.

@DaGenix
Author

DaGenix commented Nov 11, 2013

I don't know how node id values are generated, but assuming that they are generated sequentially, this particular collision is not realistic. However, it's not hard to find collisions for more realistic node id values by picking particular values for the crate hashes:

assert!(hash_struct("a2c55ca1a1f68", 4080) == hash_struct("138b8278caab5", 2804));

The key thing to consider isn't the number of node id values, though: it's the total number of type id values. Some quick (hopefully correct) math shows that there is a 0.01% chance of a collision once there are around 60 million type id values. That's still a pretty large number of type id values for a somewhat low probability of a collision, though. So it's unclear to me how big a deal this is for the Rust 1.0 timeframe. It all depends on what the acceptable probability of a collision is.
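The 60-million figure can be checked with the standard birthday approximation P ≈ n²/2^(b+1) for an ideal b-bit hash (function name mine):

```rust
/// Approximate probability of at least one collision among `n` values
/// drawn uniformly from a `bits`-bit space (birthday approximation
/// P ≈ n^2 / 2^(bits + 1), accurate while P is small).
fn collision_probability(n: f64, bits: f64) -> f64 {
    n * n / 2f64.powf(bits + 1.0)
}

fn main() {
    // ~60 million 64-bit type ids: roughly a 1-in-10,000 chance of a
    // collision, matching the 0.01% estimate above.
    let p = collision_probability(60e6, 64.0);
    println!("P(collision) ≈ {:.6}", p); // ≈ 0.000098
}
```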

@nikomatsakis
Contributor

When I saw that @alexcrichton proposed using a hash, my first reaction was "collision!" but then I thought "...but exceedingly unlikely to occur in practice". I think this is not a matter of imminent destruction but if we can leverage the linker or some other scheme to avoid this danger, we should -- and perhaps we should just go ahead and mark the current scheme as deprecated and just plan on finding a replacement scheme.

@thestinger
Contributor

A cryptographic hash designed for this purpose (larger output) would be enough, although a larger output would be more expensive to compare (four u64 comparisons for SHA2).

@pnkfelix
Member

We don't need to deal with this right now. P-low.

@steveklabnik
Member

How relevant is this issue today? I think that it's all the same, but am not sure.

@thestinger
Contributor

It's 64-bit, so collisions are likely with enough types (consider recursive type metaprogramming), and it doesn't have any check to bail out if one occurs. Bailing out is not a very good solution anyway, because it pretty much means that there's no way to compile the program, beyond using a different random seed and hoping for the best. It's a crappy situation.

@vks
Contributor

vks commented Jan 21, 2015

Note that "hoping for the best" by iteratively changing the seed might work with overwhelmingly large probability after very few iterations.

@sorear
Contributor

sorear commented Oct 14, 2015

use std::any::Any;

fn main() {
    let weird : [([u8; 188250],[u8; 1381155],[u8; 558782]); 0] = [];
    let whoops = Any::downcast_ref::<[([u8; 1990233],[u8; 798602],[u8; 2074279]); 1]>(&weird);
    println!("{}",whoops.unwrap()[0].0[333333]);
}

Actually a soundness issue. Playground: http://is.gd/TwBayX

@pnkfelix
Member

I'd like the lang team to devote a little time to this now that we are post-1.0. Nominating.

@pnkfelix pnkfelix added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Oct 18, 2015
@nikomatsakis
Contributor

OK, lang team discussed it, and our conclusion was that:

  1. This issue ought to be fixed, it's silly not to.
  2. This is an implementation detail that we could change whenever we want (right?)
  3. Nonetheless, we probably ought to open an RFC or at least a discuss thread, with a proposal to do better, since probably people will have some clever ideas.
  4. Probably the runtime overhead of the virtual call in the Any trait is way more than a strcmp anyhow for all realistic types.

@nikomatsakis
Contributor

I was wondering about a design where we do something like:

  • generate a static string representing the full type; in static builds, at least, this will be interned by the linker;
  • generate a hash

Compare the string pointers for equality (to give a fast equality check). If that fails, compare the hashes for inequality (to give a fast inequality check). If THAT fails, compare the strings for content (to handle dynamic linking).

Although re-reading the thread I see @bill-myers may have had an even more clever solution.
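A minimal sketch of that three-tier scheme (all type and function names are mine; the interning itself is assumed rather than implemented):

```rust
/// Hypothetical TypeId representation: a precomputed hash plus an
/// interned string describing the full type.
struct TypeIdRepr {
    hash: u64,          // hash of the type description
    name: &'static str, // interned full type description
}

fn type_ids_equal(a: &TypeIdRepr, b: &TypeIdRepr) -> bool {
    // Fast equality: interned strings share an address in static builds.
    if std::ptr::eq(a.name, b.name) {
        return true;
    }
    // Fast inequality: different hashes mean different types.
    if a.hash != b.hash {
        return false;
    }
    // Slow path: full content comparison handles dynamic linking.
    a.name == b.name
}

fn main() {
    let x = TypeIdRepr { hash: 1, name: "Vec<u8>" };
    let y = TypeIdRepr { hash: 2, name: "Vec<u16>" };
    assert!(type_ids_equal(&x, &x));
    assert!(!type_ids_equal(&x, &y));
    println!("ok");
}
```

The interesting property is that the slow string comparison only runs when the pointers differ but the hashes agree, i.e. either a genuine cross-library duplicate or a hash collision.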

@pnkfelix
Member

@nikomatsakis putting the hash of the data at the start is a good idea, to increase the probability that we catch unequal things quickly. It seems to me like @bill-myers' approach composes fine with that strategy.

@sorear
Contributor

sorear commented Oct 30, 2015

I doubt the "problem" is limited to Any. You can probably confuse the compiler just as effectively by colliding hashes for symbol mangling, or many other things. What is the objective here? Since Rust is not a sandbox language, I don't think "protect memory from malicious programmers" should be one of our goals (we should document the types of undefined behavior that can be hit in safe code, and fix the ones that are possible to hit by accident; if someone is determined to break the type system, they can already write an unsafe block, or use std::process to launch a subprocess that ptraces its parent and corrupts memory).

@hmvp

hmvp commented Jan 24, 2017

@Mark-Simulacrum
Member

@nikomatsakis Should this be marked as I-unsound? I've done so for now, since that seems to be the general conclusion a couple of times by different people, but please unmark if I'm wrong.

@Mark-Simulacrum Mark-Simulacrum added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness C-bug Category: This is a bug. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 19, 2017
@briansmith

briansmith commented Dec 22, 2023

My understanding is that all unkeyed hash functions, including state-of-the-art cryptographic ones, lack mathematical proof of their collision resistance; only heuristics and approximations suggest, without proving, that these functions have the desired statistical and cryptographic properties.

Usually we don't use hash equality as a substitute for value equality unless the hash function is believed to have a collision birthday bound around 1/2^128. SHA-256/SHA512-256/SHA3-256/etc. would be the minimum. Again, the SipHash authors in their paper specifically disclaim collision resistance for such purposes; the original paper says "We comment that SipHash is not meant to be, and (obviously) is not, collision resistant." So besides the output being shorter than we'd use for this purpose, the whole usage is specifically disclaimed by the inventors. The paper is at https://cr.yp.to/siphash/siphash-20120918.pdf.

Yes, this is all extremely problematic.
Could you elaborate? What problems is this causing?

Any part of the compiler that assumes x == y when hash(x) == hash(y) for some types x and y is going to allow an attacker to publish a crate on crates.io that looks totally benign but causes the compiler to miscompile crates involved in a compilation, with probability P, where P is determined by the collision resistance of the hash function. Somebody could put up a totally valid PR for my own crates that implements useful and correct functionality which I would be happy to accept, which triggers this, and I would have no way of knowing.

So then, we have to ask ourselves:

  1. Why are we even using cryptography for this in the first place, when it is completely unneeded? Notice how clang implements std::type_index which is pointer sized, i.e. smaller than Rust's TypeId now.
  2. If we must insist on using cryptography, what is the acceptable probability P that we're willing to live with? Hopefully higher than 1/2^64.
  3. If we must insist on continuing to use cryptography for this, what is the collision resistance of the hashing function we're currently doing, the way we're currently doing it? People have posted claims regarding this, but I don't see anybody accepting those claims as valid, and I don't see us finding a consensus about it. How are we going to come to a consensus on the answer?

Now we have to ask ourselves, is it prudent to wait for somebody to publish an attack and put it on crates.io, e.g. one that causes a TypeId collision with std::io::Error or something else commonly used? It looks like it would be hugely disruptive, and potentially within script kiddie territory. Even worse, if/when it would happen, we would probably waste a lot of time debating how bad it is.

@the8472
Member

the8472 commented Dec 22, 2023

is going to allow an attacker to publish a crate on crates.io that looks totally benign but causes the compiler to miscompile crates involved in a compilation, with probability P,

e.g. one that causes a TypeId collision with std::io::Error or something else commonly used?

Those would require preimage attacks, not collisions. To create a collision an attacker must control both types, which means they're screwing up just their own crate. Which, yes, could still be problematic, just not the scenario you're describing here.

Additionally this might be easy to mitigate, if it isn't already. siphash is designed as a keyed hash function. I don't know what the constraints wrt. reproducible builds are but perhaps a random seed could be included in the crate metadata that would change the hash output whenever a clean build is made of a crate.
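The keyed-hash point can be illustrated with the standard library's RandomState, which seeds each hasher with a random key (the exact algorithm is unspecified, currently a SipHash variant; the input string below is a made-up stand-in for type metadata):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hash `data` under the random key held by `state`.
fn hash_with(state: &RandomState, data: &str) -> u64 {
    let mut h = state.build_hasher();
    data.hash(&mut h);
    h.finish()
}

fn main() {
    let seed_a = RandomState::new(); // fresh random key
    let seed_b = RandomState::new(); // another fresh random key
    let input = "ty_struct[local]#4080"; // hypothetical type metadata
    // Same key: deterministic. Different keys: unrelated outputs, so a
    // collision crafted for one seed almost surely breaks under another.
    assert_eq!(hash_with(&seed_a, input), hash_with(&seed_a, input));
    println!("{:016x} vs {:016x}",
             hash_with(&seed_a, input),
             hash_with(&seed_b, input));
}
```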

Somebody could put up a totally valid PR for my own crates that implements useful and correct functionality which I would be happy to accept, which triggers this, and I would have no way of knowing.

Theoretically yes, practically an attack does not necessarily give an attacker sufficiently fine control to disguise this in human-readable text, especially if they only control part of the input.

Anyway, this is putting the cart before the horse. As I wrote earlier, afaik such things are outside rustc's threat model because this would be a mouse hole compared to the open barn doors that are build.rs, proc macros, LLVM miscompilations and other randomly-occurring or exploitable sources of soundness issues. If this were the last thing before rustc could be considered a provably-secure compiler then it could be reevaluated. Until then it's likely Good Enough™.
And to reiterate, just because something is not a security issue doesn't mean it won't be fixed. If siphash is found to be weaker than expected it can be swapped out for something better. It's not set in stone.

Why are we even using cryptography for this in the first place, when it is completely unneeded? Notice how clang implements std::type_index which is pointer sized, i.e. smaller than Rust's TypeId now.

Rust has no RTTI, so this would incur additional footprint costs.

If we must insist on using cryptography, what is the acceptable probability P that we're willing to live with? Hopefully higher than 1/2^64.

That number may not be astronomically small, but it's still GPU-years small. I.e. it's unlikely to arise by chance. If we were trying to defend against well-funded adversaries intentionally searching for collisions then yes, it should be increased.

@bjorn3
Member

bjorn3 commented Dec 22, 2023

I don't know what the constraints wrt. reproducible builds are but perhaps a random seed could be included in the crate metadata that would change the hash output whenever a clean build is made of a crate.

Many people need reproducibility. Note, though, that we do already include a hash of the full rustc version string through StableCrateId. This includes the commit hash for the official builds and thus isn't predictable for future versions. And when distros rebuild rustc the version string is almost certainly different and thus so is the TypeId.

@RalfJung
Member

RalfJung commented Dec 22, 2023

I agree with @the8472 . It is also my understanding that rustc's threat model does not, in general, include maliciously crafted input (as in, source code). Where practical the toolchain will be made resistant to such inputs, but for TypeId collisions so far this has not been considered practical.

This was discussed before, I think it was here.

Assuming a non-malicious, i.e. randomly picked, input program, the probability analysis seems correct to me.

@briansmith

is going to allow an attacker to publish a crate on crates.io that looks totally benign but causes the compiler to miscompile crates involved in a compilation, with probability P,

e.g. one that causes a TypeId collision with std::io::Error or something else commonly used?

Those would require preimage attacks, not collisions.

We don't have an argument in favor of preimage resistance of SipHash. Again, SipHash is a PRF; all the security assumptions we have for SipHash are premised on having a random secret key. So we don't have any argument in favor of significant collision resistance or (second) preimage resistance for SipHash with a fixed key being used as a hash function.

Somebody could put up a totally valid PR for my own crates that implements useful and correct functionality which I would be happy to accept, which triggers this, and I would have no way of knowing.

Theoretically yes, practically an attack does not necessarily give an attacker sufficiently fine control to disguise this in human-readable text, especially if they only control part of the input.

We don't know how easy or hard it would be. Finding out would likely be a huge amount of work.

Anyway, this is putting the cart before the horse. As I wrote earlier, afaik such things are outside rustc's threat model because this would be a mouse hole compared to the open barn doors that are build.rs, proc macros,

It's true that users are responsible for ensuring security in the face of build.rs and proc macros. However, users cannot be responsible for preventing TypeId collisions because it's not practical for them to detect them.

And to reiterate, just because something is not a security issue doesn't mean it won't be fixed.

OTOH, if it wasn't a security issue, I wouldn't bother putting any effort into it. It is a security issue. It is the very annoying kind where probably nobody can justify spending the effort to analyze how practical an attack would be, or to create a PoC.

If siphash is found to be weaker than expected it can be swapped out for something better. It's not set in stone.

What is the "expected" strength of SipHash for collision resistance and preimage resistance when used with a fixed key as a hash function? AFAICT there has been no attempt to establish an expected strength. The paper that introduced SipHash specifically warned against having any such expectation.

Why are we even using cryptography for this in the first place, when it is completely unneeded? Notice how clang implements std::type_index which is pointer sized, i.e. smaller than Rust's TypeId now.

Rust has no RTTI, so this would incur additional footprint costs.

To do it like clang, in the most naive way, it would cost at most 1 byte of "RTTI info" per type where the type ID is actually used. Let's say we moved to word-size type IDs like clang. Then on 32-bit targets a type ID would cost 4 + 1 = 5 bytes, and on 64-bit targets a type ID would cost 8 + 1 = 9 bytes. Currently, with 128-bit TypeIds in Rust, each type ID costs 16 bytes. So it seems like the hash-based approach actually costs more space than doing it like clang's std::type_index does. (More code is required to move 128-bit values around than word-size values, so this probably understates the cost of the hash-based approach.)

(IDK if it is possible to instead use the address of the Any vtable for each type to avoid that extra byte. Otherwise I know of a theoretical way of avoiding the extra byte with the clang-like approach but IDK how practical it is.)
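The size comparison above can be checked on a current toolchain; this sketch just prints the numbers (the exact values depend on rustc version and target):

```rust
use std::any::TypeId;
use std::mem::size_of;

fn main() {
    // On recent rustc, TypeId wraps a 128-bit hash: 16 bytes per id.
    println!("size_of::<TypeId>() = {} bytes", size_of::<TypeId>());
    // The clang-like scheme discussed above: one word-size id plus the
    // single byte of per-type data it points at.
    println!(
        "clang-like cost on this target: {} + 1 bytes",
        size_of::<usize>()
    );
}
```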

If we must insist on using cryptography, what is the acceptable probability P that we're willing to live with? Hopefully higher than 1/2^64.

That number may not be astronomically small, but it's still GPU-years small. I.e. it's unlikely to arise by chance. If we were trying to defend against well-funded adversaries intentionally searching for collisions then yes, it should be increased.

It is an unfortunate situation that it is extremely expensive to evaluate the security and correctness of the hash-based approach, such that no "good people" can afford to take the time to do it, as we have little to gain. Only adversaries could justify the effort. Already, I think we're at the breaking point of the amount of effort we want to spend on this.

@bjorn3
Member

bjorn3 commented Dec 22, 2023

To do it like clang, in the most naive way, it would cost at most 1 byte of "RTTI info" per type where the type ID is actually used.

How does clang ensure that this value is globally unique for a given type even across dylibs? AFAIU on Windows there is no way to ensure that two DLLs defining the same global variable will get a single instance. This is also why statics can't be generic. And how would a Hash implementation for TypeId that is consistent across multiple executions of the same program be possible with this scheme? The only thing you can hash would be the pointer itself, which due to ASLR is non-deterministic between executions.
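The aside that "statics can't be generic" is exactly what blocks the naive per-type-static approach in Rust today; this sketch (names mine) shows that a static inside a generic function is shared across all instantiations rather than duplicated per type:

```rust
// NOTE: this is the approach that does NOT work. A `static` inside a
// generic function is instantiated once, not once per monomorphization,
// so its address cannot distinguish types.
fn naive_type_addr<T>() -> usize {
    static MARKER: u8 = 0; // one shared instance for every T
    &MARKER as *const u8 as usize
}

fn main() {
    // Different types, same address: no per-type identity here.
    assert_eq!(naive_type_addr::<u8>(), naive_type_addr::<String>());
    println!("shared marker address: {:#x}", naive_type_addr::<u8>());
}
```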

@the8472
Member

the8472 commented Dec 23, 2023

I think the issue hinges on the current threat model. Changing it would have significant consequences, e.g. LLVM miscompilations could become CVEs, might have to be fixed under secrecy and require point releases; LLVM would have to play along with that. Various I-unsound issues would become P-critical. Perhaps more resources would have to be spent on fuzzing rustc at a higher level (ICEs wouldn't be the goal here). Perhaps a hardened rustc mode/build would be necessary. Etc. etc.

Or you could plead that this is a special case, different from all other ways to produce unsoundness. But imo that's not the case, because finding novel weaknesses in LLVM or Rust's type system may be easier than finding a weakness in siphash.

There are other options. Even if we don't consider this a priority that doesn't prevent you or others from working on it (read: patches welcome). Perhaps a more detailed security analysis could be done, digging into what exactly gets hashed, what defenses exist (asserts in the compiler?) and what that implies about the bounds on reliability of any hypothetical exploit. Perhaps there are some cheap mitigations that wouldn't affect performance or reproducibility. Or perhaps someone could run rustc with debug asserts enabled over all crates and that'd do the job. Idk.
And as I have said previously, the TypeId implementation is not set in stone. If someone writes an alternative that achieves the same goals but is more robust that could get accepted too. But that's probably insufficient to allay your more general concerns since siphash is relied on in the compiler for other things.

@briansmith

briansmith commented Jan 2, 2024

I think the issue hinges on the current threat model. Changing it would have significant consequences, e.g. LLVM miscompilations could become CVEs, might have to be fixed under secrecy and require point releases; [...] Various I-unsound issues would become P-critical.

It seems like there's a broader discussion to be had about differing expectations about how security-relevant problems in the compiler/toolchain/libraries are handled. I don't think this issue is the place for that discussion. I think mixing that meta-issue into this specific issue has created most of the conflict here.

  • I think we should be able to agree that the way TypeId is implemented and used (both in the standard library, and the analogous use in the compiler) is based on a line of reasoning that starts with a false (or, at least, unproven and very likely to be proven false) premise.
  • I think we should be able to agree that the API that promises that TypeId is "globally unique" is both vague almost to the point of being meaningless, and also promising something that isn't being (and likely can't be) implemented.
  • I think we should be able to agree that we don't want our claims (and future proofs) about memory safety of Rust's standard library to be qualified on the collision-resistance of (mis)using SipHash with a fixed key. I don't think any of us want to make Rust's standard library to be "memory-safe with some probability that we can't specify."
  • I think we should be able to get to agreement that we don't want to qualify the maximum possible correctness of the Rust compiler similarly.

I think it is harder to get to agreement on priority, responsibility, security advisory stuff, etc. I suggest we all just defer all thought of those aspects.

Regarding intentionally-created collisions: A recent blog post was published at https://www.da.vidbuchanan.co.uk/blog/colliding-secure-hashes.html where the author shows that it is not difficult to (intentionally) force collisions even using an actually-believed-secure hash function (SHA-256) truncated to 128 bits. SHA-256 is a much stronger hash function than SipHash-128 misused with a fixed key. I am far from the first one to point this out, you can see others make it at https://news.ycombinator.com/item?id=19405756 and many other places. Similarly, if you Google for preimage resistance of SipHash then you'll find most of the discussions quickly end by noting that SipHash doesn't even try to be a secure hash function, because it is a PRF that requires a secret key.

@briansmith

To do it like clang, in the most naive way, it would cost at most 1 byte of "RTTI info" per type where the type ID is actually used.

How does clang ensure that this value is globally unique for a given type even across dylibs? AFAIU on Windows there is no way to ensure two DLL's defining the same global variable will get a single instance. This is also why statics can't be generic.

It is hard to answer that question when posed so abstractly. A lot of times such issues arise when people are violating the One Definition Rule (ODR) in C++, and I'm not sure what such a situation would look like when they aren't violating the ODR. It would be good to see a description of a Rust program composed of multiple shared libraries in a way that is supported by the Rust toolchain and where the "two DLLs defining the same global variable" issue would arise w.r.t. a Clang-like TypeId implementation strategy.

And how would a Hash implementation for TypeId that is consistent across multiple executions of the same program be possible with this scheme? The only thing you can hash would be the pointer itself, which due to ASLR is non-deterministic between executions.

  1. Luckily the Rust standard library doesn't guarantee that a TypeId is consistent for multiple executions of a program.
  2. The TypeId doesn't necessarily have to be the absolute address; it could be the offset from the beginning of the rodata segment, or similar. This would avoid it being affected by ASLR and, in theory, would allow us to throw away the never-used extra byte that would be allocated for each TypeId in the Clang-like scheme.

@the8472
Member

the8472 commented Jan 2, 2024

I think we should be able to agree that the way TypeId is implemented and used (both in the standard library, and the analogous use in the compiler) is based on a line of reasoning that starts with a false (or, at least, unproven and very likely to be proven false) premise.

[...] where the author shows that it is not difficult to (intentionally) force collisions even using an actually-believed-secure hash function (SHA-256) truncated to 128 bits.

No, this loops back to the threat model. "collision resistance" always refers to the cryptographic property, not the statistical one. Which means it's only relevant under malicious inputs, which makes the threat model relevant.
If we only assume natural inputs that do not take advantage of the structure of the hash function or of rustc implementation details then only simpler properties like the avalanche effect should be relevant.

By saying "proven to be false" you're pulling targeted attacks into your argument. You can't do that and pretend that this isn't about extending the threat model.

I think we should be able to agree that the API that promises that TypeId is "globally unique" is both vague almost to the point of being meaningless, and also promising something that isn't being (and likely can't be) implemented.

Hrm yeah, perhaps that is a slight overpromise since we currently have no mechanism that can ensure that, especially when dynamic linking is involved. Though for most practical purposes it does fulfill that role.

I think we should be able to agree that we don't want our claims (and future proofs) about memory safety of Rust's standard library to be qualified on the collision-resistance of (mis)using SipHash with a fixed key. I don't think any of us want to make Rust's standard library to be "memory-safe with some probability that we can't specify."

TypeId's contents are not generated by the standard library. It's provided by the compiler and it's an implementation detail. As far as the user is concerned it's a random value assigned once at compile time to each type so they'd get a collision with a specifiable expected probability.
The real implementation doesn't do that because assigning a value exactly once across multiple rustc invocations independently instantiating a type is difficult. But the goal is to observationally behave as-if it had been generated that way.

I think we should be able to get to agreement that we don't want to qualify the maximum possible correctness of the Rust compiler similarly.

I think that should also be a separate discussion and could maybe be solved by a hardened rustc for those who need it.

@bjorn3
Member

bjorn3 commented Jan 2, 2024

It would be good to see a description of a Rust program composed of multiple shared libraries in a way that is supported by the Rust toolchain and where the "two DLLs defining the same global variable" issue would arise w.r.t. a Clang-like TypeId implementation strategy.

Crate A is a dylib and defines a type Foo<T>. Crates B and C depend on this dylib and each try to get the TypeId for Foo<u8>. Crate D is an executable and links against both crate B and crate C. With the clang-like implementation strategy as I understand it both crate B and C would define a global variable and thus produce a different TypeId, which is invalid.

@briansmith

Crate A is a dylib and defines a type Foo<T>. Crates B and C depend on this dylib and each try to get the TypeId for Foo<u8>. Crate D is an executable and links against both crate B and crate C. With the clang-like implementation strategy as I understand it both crate B and C would define a global variable and thus produce a different TypeId, which is invalid.

In the Clang-like strategy, Crate A would define the TypeId symbol for type A::Foo, and crates B and C would reference (and not themselves define) the symbol for A::Foo's TypeId, right?

@bjorn3
Member

bjorn3 commented Jan 2, 2024

Foo<T> is generic here, so Foo<u8> and Foo<MyLocalType> would result in different TypeId. It is impossible to know all generic instantiations that will be used in advance when compiling crate A (for example Foo<Foo<Foo<...1_000_000 levels of recursion...>>> is a valid type), so we can't define the TypeId symbol when compiling crate A and as such have to define it when compiling crates B and C.
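A small sketch of this point (Foo and the instantiations are illustrative): downstream code can mint arbitrarily many instantiations that the defining crate could never have enumerated, each with its own TypeId:

```rust
use std::any::TypeId;

// Stand-ins for the generic type exported by crate A and a type local
// to a downstream crate.
struct Foo<T>(T);
struct MyLocalType;

fn main() {
    // Each instantiation is a distinct type with a distinct TypeId,
    // so the defining crate cannot pre-emit one id symbol per type.
    assert_ne!(TypeId::of::<Foo<u8>>(), TypeId::of::<Foo<MyLocalType>>());
    assert_ne!(TypeId::of::<Foo<u8>>(), TypeId::of::<Foo<Foo<Foo<u8>>>>());
    println!("three instantiations, three distinct TypeIds");
}
```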

@tarcieri
Contributor

tarcieri commented Jan 2, 2024

IIRC there were plans to use 64 bits of that for a hash and the other 64 bits for a pointer to where the full type ID is stored. Then we'd have a fast test for != (different hashes), and with sufficient linker magic a fast test for == (compare the pointer; rely on linker deduplication to make the pointers the same cross-crate). That would be fully sound without relying on a hash and hopefully also not be slower than today for the vast majority of cases.

@RalfJung this seems like a really good solution, but AFAICT it was never clarified if that was implemented in #95845 or what happened with that whole approach.

As a bit of a meta comment the current approach seems to be misapplying cryptographic primitives in ways that create problems where they otherwise wouldn't exist, and I think that's the core concern here.

@RalfJung
Member

RalfJung commented Jan 3, 2024

The linker magic part was never implemented I think. I am not even sure if we ever concluded that all our platforms have sufficiently powerful linkers to provide such magic. Personally I'd be happy to see further experimentation in that area. If the linker magic part works out then as far as I can see this has the same performance characteristics as the current implementation, but I can't speak for the teams so I can't say which other concerns there might be. If the linker magic part does not work out then there is the problem that a successful equality test slows down by a lot since it has to compare the strings for equality.

As a bit of a meta comment the current approach seems to be misapplying cryptographic primitives in ways that create problems where they otherwise wouldn't exist, and I think that's the core concern here.

Using SipHash with a fixed key is certainly odd. (I haven't checked if that's what we do, but it's possible.)

But I don't see how it creates new problems compared to the baseline of using something like fx-hash. Either way, I think the claim that we have a collision probability of around 2^-64 for regular (non-malicious) programs is correct. Of course this is not a precise claim, since there is no precise probability distribution of "regular (non-malicious) programs". What's relevant is whether there is a realistic chance of soundness issues in well-intentioned code here, and I am not aware of evidence for such a soundness issue. And if you worry about malicious code, and rustc (presumably with forbid(unsafe_code)) is your only line of defense, then we are talking about a completely different world. That's why people keep bringing up the attack model.

The theoretician in me would absolutely prefer a TypeId scheme that does not depend on any probabilistic arguments. And I understand that what rustc does makes no sense at all when viewed through the lens of a cyptographer. But pragmatically speaking I don't see what an attack would look like here, and as far as I was able to see nobody has described such an attack above. My understanding is that it would require (1) an attacker finding two Rust types that hash to the same TypeId, and then (2) the attacker must convince me to add their code as a static or dynamic dependency to my program, with me blindly trusting its soundness. Most of the discussion above has been about (1), arguing that this is easier than people thought. (The fact that collisions in 128bit-truncated SHA256 are feasible is new to me, thanks for sharing that!) But that totally misses the point that the soundness argument doesn't just rely on (1) not happening, it relies on (1)+(2) not happening.

For the people arguing that this is worse than "regular" soundness issues (at least I think that is what @briansmith is arguing), I think it'd help to spell out the attack you are imagining. Or maybe we're just discussing how severe of a soundness issue this is? It seems way harder to exploit than some of our others, so I don't see a good reason to give this one particularly high priority.

@briansmith
Copy link

briansmith commented Jan 4, 2024

The linker magic part was never implemented I think. I am not even sure if we ever concluded that all our platforms have sufficiently powerful linkers to provide such magic. Personally I'd be happy to see further experimentation in that area.

Rather than relying on linker "magic," it may make sense to have an optimized case for types known statically at the time the executable is about to be linked, vs. types that aren't known at that time. (Basically, a pre-link step.) It would maybe not be cheap.

If the linker magic part does not work out then there is the problem that a successful equality test slows down by a lot since it has to compare the strings for equality.

By "the strings" I guess you mean the Any::type_name. Any::type_name is optimized for human readability, not for size/performance. Much more compact encodings of the string are possible. Or there are other implementation strategies: for example, one could imagine maintaining a global counter at runtime, with each type_id implementation lazily allocating its TypeId(u64) value on first use. Or see what GCC does to handle the trickier cases in its std::type_info implementation. This is not a new problem, since C++ has basically the same construct and has dealt with it.
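The lazily-allocated-counter idea could be sketched as follows. This is an illustration only (`small_id` is a hypothetical name), and for brevity it keys the table by `std::any::TypeId`; a real implementation would key it by the canonical type encoding instead, since the point is to avoid trusting the hash:

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::Mutex;

// Process-wide table assigning each type a small id on first use.
static TABLE: Mutex<Option<HashMap<TypeId, u64>>> = Mutex::new(None);

fn small_id<T: 'static>() -> u64 {
    let mut guard = TABLE.lock().unwrap();
    let table = guard.get_or_insert_with(HashMap::new);
    let next = table.len() as u64;
    // First use of T allocates the next counter value; later uses
    // return the value already stored for T.
    *table.entry(TypeId::of::<T>()).or_insert(next)
}

fn main() {
    let a = small_id::<u32>();
    let b = small_id::<String>();
    assert_ne!(a, b);                 // distinct types get distinct ids
    assert_eq!(a, small_id::<u32>()); // repeated queries are stable
}
```

One trade-off of this approach is that ids allocated this way are only meaningful within a single process run.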

As a bit of a meta comment the current approach seems to be misapplying cryptographic primitives in ways that create problems where they otherwise wouldn't exist, and I think that's the core concern here.

Either way, I think the claim that we have a collision probability of around 2^-64 for regular (non-malicious) programs is correct.

What is the reasoning that was used to reach that belief? I think it is good to write down the reasoning we are using so we can all agree on it, just as we would review the reasoning behind a change to the borrow checker to verify that it is sound.

And I understand that what rustc does makes no sense at all when viewed through the lens of a cyptographer.

I am not saying this "makes no sense at all."

Again, the beginning of my argument was, "Hey, it looks like you're assuming a certain collision rate; let's see how that collision rate is being calculated, because based on the comment I read, the reasoning isn't sound." And then the other part of my argument is that it looks like people have some threshold for what is an acceptable collision rate, but I don't think we have a sound argument in favor of a particular collision rate as being acceptable.

Keep in mind that these hash comparisons are critical for the correctness and memory-safety of Any::{is, downcast_*} and many other memory-safety-critical primitives that are not marked unsafe, so the collisions directly affect the memory safety of programs. (Similarly for the way similar comparisons are reportedly done within the compiler.)

People keep using this 1/2^64 number; let's see an argument that starts with "Here is how TypeId is calculated" and ends with "1/2^64."

Here is an example: x ↦ u128::MAX is a hash function with a 128-bit output. The birthday bound for a 128-bit hash function gives a collision rate of at least 1/2^64. Therefore this hash function has a collision rate of 1/2^64. (This is not a valid argument, because the birthday bound is only a lower bound; it says nothing about the upper bound.)

For the people arguing that this is worse than "regular" soundness issues (at least I think that is what @briansmith is arguing),

I think we agreed that the meta-discussion about what responsibility if any the compiler has w.r.t. security should be somewhere else, so I'm not going to debate that further here.

To clarify my position: TypeId should do what it is documented to do for all inputs that the toolchain translates into executable programs (roughly, all valid programs), so that all TypeId-based constructs, including Any::{downcast_ref, is}, are memory-safe. No exceptions. We shouldn't accept a 1/2^64 memory-safety failure rate by design. (If we're operating on the assumption that an intentional 1/2^64 memory-safety failure rate is OK, then let's put that in writing in the language reference; I believe such a change won't be accepted.)

@digama0
Copy link
Contributor

digama0 commented Jan 4, 2024

People keep using this 1/2^64 number; let's see an argument that starts with "Here is how TypeId is calculated" and ends with "1/2^64."

You do realize that any such argument is going to have "and therefore P != NP" in the middle, right? Rigorous proofs of pretty much anything in the cryptography world (other than relative hardness proofs) are a luxury we cannot afford.

@briansmith
Copy link

You do realize that any such argument is going to have "and therefore P != NP" in the middle, right? Rigorous proofs of pretty much anything in the cryptography world (other than relative hardness proofs) are a luxury we cannot afford.

I'm just asking people to show how they got the 1/2^64 number. People "believe" the collision rate to be that, but why?

@digama0
Copy link
Contributor

digama0 commented Jan 4, 2024

For no formally justifiable reason. Asking them to produce one is not reasonable under the conditions because we can all see that it is impossible to produce one without solving some really hard mathematical problems first.

Of course I think this sucks. I literally make my trade in proving systems like this correct, and it is disheartening to see cryptography anywhere near this, because of the known limitations of that style of proof (if it can even be called that). But unless you have a better proposal, I don't think it is useful to point this out at this stage; these are the known facts of the situation, and I suspect Any is making impossible promises, at least as long as TypeId has a fixed size.

@the8472
Copy link
Member

the8472 commented Jan 4, 2024

I already linked #107925 (comment) in a previous comment.

@briansmith
Copy link

I already linked #107925 (comment) in a previous comment.

I quickly skimmed the second linked paper. I do not pretend to know anything about differential cryptanalysis. The second paper analyzes SipHash with an unknown key. The authors note that attacks are much easier when the key is known than their characteristic for internal collisions indicates, because other techniques based on input modification can be used against unkeyed functions. Further, whoever did the analysis on our end seems to have extrapolated from the published characteristic for internal collisions to the difficulty of a collision in the final output; I don't see the logic used to make that leap.

For context, the complexity of intentionally creating a collision for SHA-1 is currently estimated at about 2^52 to 2^63, according to results published several years ago. Compared to SipHash-128 with an all-zero key, SHA-1 has more complex rounds, an order of magnitude more rounds, double the internal state size (512-bit vs 256-bit), and a larger output (160 bits vs 128 bits). Accordingly, it is hard to believe a conclusion in which SipHash-128 is more collision-resistant than SHA-1 and equally collision-resistant to SHA-256 (a much stronger hash function) truncated to 128 bits.

@digama0
Copy link
Contributor

digama0 commented Jan 4, 2024

the complexity of intentionally creating a collision

As Ralf said above, the goal is not to be robust against intentional collisions; it is robustness against unintentional collisions. Here it is much more reasonable to assume an even spread of values, because they are not adversarially chosen, so as long as the hash has enough mixing behavior the values are spread throughout the whole range. This is where the "attacker model" makes the difference, because intentionally causing a collision requires a nefarious piece of code in your code or in a dependency, and such code can already do much worse than attempt to break rustc.

@the8472
Copy link
Member

the8472 commented Jan 4, 2024

To phrase it differently: in our model we pretend that the inputs are generated by an actor that neither knows about nor bothers to optimize its input-generation process against the hash function or its key.

Which means we can treat the situation as if a random key had been chosen. It also excludes adaptive black-box attacks.

@RalfJung
Copy link
Member

RalfJung commented Jan 4, 2024

And from that it then follows that the properties of the hash function don't really matter, and we can use the birthday bound with good approximation. So the reasoning for the 2^64 bound is: if we feed random inputs to a ~decent hash function with 128bit output, we'll get collisions at a rate of approximately 2^-64. Real inputs aren't random but the programs people write are extremely unlikely to align with the structure of the hash function in any meaningful way, except if programs are specifically designed to exploit the hash function, which we assume does not happen. (Pragmatically speaking: there are much easier-to-exploit soundness bugs, why would an attacker bother finding a hash collision.)

If I had to model this for a formal soundness argument I'd use some sort of random oracle model.


Rather than relying on linker "magic," it may make sense to have an optimized case for types known statically at the time the executable is about to be linked, vs. types that aren't known at that time. (Basically, a pre-link step.) It would maybe not be cheap.

By "the strings" I guess you mean the Any::type_name. Any::type_name is optimized for human readability, not for size/performance. Much more compact encodings of the string are possible. Or there are other implementation strategies: for example, one could imagine maintaining a global counter at runtime, with each type_id implementation lazily allocating its TypeId(u64) value on first use. Or see what GCC does to handle the trickier cases in its std::type_info implementation. This is not a new problem, since C++ has basically the same construct and has dealt with it.

I don't think anyone is objecting to experimenting with approaches to avoid relying on a hash here.

@briansmith
Copy link

@RalfJung wrote:

Real inputs aren't random but the programs people write are extremely unlikely to align with the structure of the hash function in any meaningful way

I am not sure why we should think so. I do not think we have strong evidence for it.

@RalfJung wrote:

except if programs are specifically designed to exploit the hash function, which we assume does not happen.
(Pragmatically speaking: there are much easier-to-exploit soundness bugs, why would an attacker bother finding a hash collision.)

Where is the evidence for this claim? There are 83 open I-unsound issues total. Only 14 open I-unsound issues are in the T-lang component. I couldn't find 7 that look easier to exploit than this; almost all of them would be easily mitigated by code review. So I don't think we have evidence that this is hard to exploit compared to other I-unsound issues.

If we think like an attacker, one of the reasons this would be appealing to exploit is that it is unlikely to be fixed (this issue is a decade old), and a PoC convincing enough to change that would be extremely expensive to produce. This is exactly what happened with MD5 and FLAME: MD5 collisions were considered too expensive to be a worthwhile attack vector in real life, yet MD5 was already theoretically broken, so nobody could get funding to produce a PoC until it was too late.

If I had to model this for a formal soundness argument I'd use some sort of random oracle model.

You could prove that the current scheme is correct when instantiated with a random oracle model, predicating the proof on "assume we use a hash function that operates like a random oracle," i.e. predicating the proof on the collision-resistance and preimage resistance of whatever hash function is used. In the case of instantiating the scheme with SipHash128-with-an-all-zeros key, you wouldn't be able to use that proof since the hash function doesn't meet the proof's precondition of acting like a random oracle.

In one of the other discussions, it was suggested that somebody trying to force a collision would need to use large inputs that would not be practical as type names, since collision attacks on secure hash functions are typically easier when a certain multi-block structure is used. That is true for secure hash functions, but it is not true for non-secure hash functions. We should be able to find single-block collisions in the SipHash128-with-an-all-zeros-key hash function easily. Consider that single-block MD5 collisions were found; see https://marc-stevens.nl/research/md5-1block-collision/md5-1block-collision.pdf. Note that the two colliding 64-byte messages differed by only two bits total. It should be even easier to force collisions in SipHash128-with-an-all-zeros-key.

I just stumbled across #95845 (comment), where Ralf wrote:

A small TypeId is simply unsound, as we do have real-world examples of collisions. So that is not an option.

So I think we are in agreement overall. He wrote that when TypeId was 64 bits, but the situation isn't too different now that it is 128 bits. Before, we had evidence of a real problem. Now we have an absence of evidence, but not evidence of absence. The evidence that we'd likely need, for or against, is unreasonably expensive to acquire. We could pay a student $100,000 to acquire it if somebody were willing to fund it. But implementing a non-hash-based solution would likely cost much less than that, and we'd have a certain fix.

I think the approach in #95845 looks very good. If performance is an issue, it could be improved by using a more compact encoding of the name instead of the normal v0 mangled form, since the name-mangling scheme is still optimized too much for human readability. The performance measurements of that approach also looked OK.

@bjorn3
Copy link
Member

bjorn3 commented Jan 8, 2024

The v0 symbol mangling is not optimized for human readability at all. It is a compact encoding of the full function/type name. For example, it has backreferences and encodes hashes as base-62 (the stable crate id/disambiguator; this is checked by rustc for collisions and will result in an error if a collision occurs). It also encodes primitive types using single letters. Frankly, I find v0 symbols to be pretty much unreadable, while I can read legacy symbols with very little effort. That may just reflect the fact that I have had to read legacy symbols a lot while working on cg_clif, though.

@RalfJung
Copy link
Member

RalfJung commented Jan 8, 2024

@briansmith From what you say, we do not even have an example of this unsoundness being demonstrated on a recent version of rustc (any version since the switch to 128-bit type IDs). That makes this the only I-unsound issue without a reproducing example. I would say that is fairly clear evidence that it is harder to exploit (and by a significant margin) than all the other I-unsound issues. The others did not require $100k or more in compute cost; this one does.

Absent such an example, what you are saying about detecting this in reviews is pure speculation. I find it hard to believe that the absurd types one would have to construct to create a hash collision would not be caught in review. In contrast, some of the other I-unsound issues can be hidden behind fairly benign-looking function signatures or trait constructions.

This is exactly what happened with MD5 to enable FLAME

The analogy with MD5 misses the fact that we are already assuming the authors of all Rust code that you include in your binary to not be adversarial.

You could prove that the current scheme is correct when instantiated with a random oracle model, predicating the proof on "assume we use a hash function that operates like a random oracle," i.e. predicating the proof on the collision-resistance and preimage resistance of whatever hash function is used. In the case of instantiating the scheme with SipHash128-with-an-all-zeros key, you wouldn't be able to use that proof since the hash function doesn't meet the proof's precondition of acting like a random oracle.

No real hash function acts like a random oracle. So any proof based on the random oracle model fails when instantiated with an actual hash function. At least that is my understanding of the random oracle model. (Though of course SipHash is way further away from that than a cryptographic hash.)

But, implementing a non-hash-based solution would likely cost much less than that, and we'd have a certain fix.

We agree that this is the most desirable outcome. We just disagree on the prioritization. But this is open-source, prioritization means fairly little -- nothing we discuss here is likely to change whether some contributor will make this their next passion project, so the entire discussion is probably rather futile.

@veorq
Copy link

veorq commented Jan 11, 2024

(Thanks @tarcieri for contacting me about this.)

One of my life's regrets is naming SipHash "SipHash" when it's a PRF/MAC rather than a general-purpose, unkeyed hash function :)

When the key is fixed/known, the original 64-bit SipHash is not collision-resistant in an adversarial setting, as the cost of finding a collision is only that of about 2^32 function evaluations (birthday attack etc.). This is a few minutes on a laptop.

It looks like here you'd use SipHash128, the version producing 128-bit output, with 2+4 rounds. The cost of finding collisions would then be about 2^64. This would be days with a bunch of GPUs, so not secure either, cryptographically speaking.

Are there faster ways to find collisions than the birthday attack? I'm not aware of any such cryptanalysis results. There's a "distinguisher" for 3 rounds, but that's a long way from a collision attack faster than the generic one.

That said, quoting @digama0:

the goal is not to be robust against intentional collisions, it is robustness against unintentional collisions

In such a case it would be fine. (Theoretically speaking, what you'd then need is called a universal hash function, which in principle requires a key.) But I don't know whether that "unintentional" assumption is, and will remain, reasonable, as you've discussed.
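The generic birthday attack referred to above can be demonstrated in miniature. The sketch below truncates Rust's DefaultHasher (currently SipHash with a fixed key, though the exact algorithm is unspecified) to 32 bits, so a collision is expected after roughly 2^16 inputs and is found in milliseconds; the same search against a 64-bit output would take about 2^32 evaluations, and about 2^64 against a 128-bit output:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// A 32-bit hash: DefaultHasher truncated to its low 32 bits.
fn hash32(input: u64) -> u32 {
    let mut h = DefaultHasher::new();
    input.hash(&mut h);
    h.finish() as u32
}

// Birthday search: remember every output until one repeats.
fn find_collision() -> (u64, u64) {
    let mut seen: HashMap<u32, u64> = HashMap::new();
    for i in 0u64.. {
        let h = hash32(i);
        if let Some(&j) = seen.get(&h) {
            return (j, i); // two distinct inputs, same 32-bit hash
        }
        seen.insert(h, i);
    }
    unreachable!()
}

fn main() {
    let (a, b) = find_collision();
    assert_ne!(a, b);
    assert_eq!(hash32(a), hash32(b));
    println!("hash32({a}) == hash32({b})");
}
```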

@shelerr
Copy link

shelerr commented Feb 11, 2024

I would like to note that there seems to be a misunderstanding in this discussion about what exactly the birthday paradox implies. Having 128-bit hashes doesn't mean that there is a probability of 1/2^64 of a collision (what would that even mean?). It actually means that if you had 2^64 different types, the probability of a collision between at least two of them would be roughly 50% (~39.34%).

In general, the probability of a collision can be estimated as d^2/(2n), where d is the number of types and n the number of possible hashes (valid when d is much smaller than n). For example, with 2^32 different types the probability would be 1/2^65, and with 2^24 types it would be 1/2^81. Even that seems like more types than any piece of code would ever handle.
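The estimate above can be checked numerically. Working in log2 space keeps the numbers representable for n = 2^128 (the function name is mine, not from the thread):

```rust
// log2 of the approximate collision probability d^2 / (2n),
// given log2(d) types and log2(n) possible hash values:
// log2(d^2 / (2n)) = 2*log2(d) - 1 - log2(n).
fn log2_collision_probability(log2_d: f64, log2_n: f64) -> f64 {
    2.0 * log2_d - 1.0 - log2_n
}

fn main() {
    // 2^32 types with 128-bit hashes: probability about 2^-65.
    assert_eq!(log2_collision_probability(32.0, 128.0), -65.0);
    // 2^24 types: about 2^-81.
    assert_eq!(log2_collision_probability(24.0, 128.0), -81.0);
    // 2^64 types: log2(p) ≈ -1, i.e. around 50%, consistent with the
    // birthday-bound statement (the approximation degrades at this scale).
    assert_eq!(log2_collision_probability(64.0, 128.0), -1.0);
}
```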
