Collisions in type_id #10389
I'm not entirely sure how feasible it is for a program to have We could in theory have very cheap inequality among types, and then have an expensive equality check. Something which may walk the respective Either way, I don't think that this is a super-pressing issue for now, but I'm nominating to discuss whether we want to get this done for 1.0. This could in theory have serious implications depending on how frequently
Ah, it was already nominated!
Why not compare an interned version of the type data string? (i.e. what is currently passed as data to be hashed, possibly SHA-256 hashed first) The linker can be used for interning by emitting a common symbol with the type data string as name and taking its address, and otherwise the same thing can be done manually in a global constructor. This way it's always a pointer comparison, and there are no collisions.
I don't know how node id values are generated, but assuming that they are generated sequentially, this particular collision is not realistic. However, it's not hard to find collisions for more realistic node id values by picking particular values for the crate hashes:

    assert!(hash_struct("a2c55ca1a1f68", 4080) == hash_struct("138b8278caab5", 2804));

The key thing to consider isn't the number of node id values, though: it's the total number of type id values. Some quick (hopefully correct) math shows that there is a 0.01% chance of a collision once there are around 60 million type id values. That's still a pretty large number of type id values for a somewhat low probability of a collision, though. So it's unclear to me how big a deal this is for the Rust 1.0 timeframe. It all depends on what the acceptable probability of a collision is.
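The estimate above can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n values drawn uniformly from a b-bit space. A minimal sketch, assuming type id values behave like uniformly random 64-bit numbers; the function names are made up for illustration:

```rust
/// Approximate probability that at least two of `n` uniformly random
/// `bits`-bit values collide (birthday approximation).
fn collision_probability(n: f64, bits: i32) -> f64 {
    let space = 2f64.powi(bits);
    let exponent = -n * (n - 1.0) / (2.0 * space);
    // 1 - e^x, computed via exp_m1 to keep precision when x is tiny.
    -exponent.exp_m1()
}

/// Approximate number of values needed before the collision
/// probability reaches `p` (inverse of the approximation above).
fn values_for_probability(p: f64, bits: i32) -> f64 {
    let space = 2f64.powi(bits);
    // -ln(1 - p), computed via ln_1p to keep precision when p is tiny.
    let neg_log = -(-p).ln_1p();
    (2.0 * space * neg_log).sqrt()
}
```

`collision_probability(60e6, 64)` comes out at roughly 1e-4, i.e. about 0.01%, which is in line with the figure quoted above.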
nikomatsakis referenced this issue on Nov 11, 2013: The `Any` trait should not use virtual calls for type checks #10382 (Closed)
When I saw that @alexcrichton proposed using a hash, my first reaction was "collision!" but then I thought "...but exceedingly unlikely to occur in practice". I think this is not a matter of imminent destruction but if we can leverage the linker or some other scheme to avoid this danger, we should -- and perhaps we should just go ahead and mark the current scheme as deprecated and just plan on finding a replacement scheme.
A cryptographic hash designed for this purpose (larger output) would be enough. Although, a larger output would be more expensive to compare (four
We don't need to deal with this right now. P-low.
alexcrichton referenced this issue on Nov 21, 2013: Wrap the return value of the type_id intrinsic in an opaque box #10594 (Closed)
How relevant is this issue today? I think that it's all the same, but am not sure.
It's 64-bit so collisions are likely with enough types (consider recursive type metaprogramming) and it doesn't have any check to bail out if one occurs. Bailing out is not a very good solution anyway, because it pretty much means that there's no way to compile the program, beyond using a different random seed and hoping for the best. It's a crappy situation.
Note that "hoping for the best" by iteratively changing the seed might work with overwhelmingly large probability after very few iterations.
    use std::any::Any;

    fn main() {
        // A zero-length array: it holds no data at all.
        let weird: [([u8; 188250], [u8; 1381155], [u8; 558782]); 0] = [];
        // A different type whose type id collides with that of `weird`,
        // so the downcast wrongly succeeds.
        let whoops = Any::downcast_ref::<[([u8; 1990233], [u8; 798602], [u8; 2074279]); 1]>(&weird);
        // Reads memory that was never part of `weird`.
        println!("{}", whoops.unwrap()[0].0[333333]);
    }

Actually a soundness issue. playground: http://is.gd/TwBayX
I'd like the lang team to devote a little time to this now that we are post 1.0. Nominating
pnkfelix added the I-nominated and T-lang labels on Oct 18, 2015
sorear referenced this issue on Oct 28, 2015: ABI differs when generated by different architecture compilers #29235 (Closed)
OK, lang team discussed it, and our conclusion was that:
nikomatsakis removed the I-nominated label on Oct 29, 2015
I was wondering about a design where we do something like:
compare the string pointers for equality (to give a fast equality check).
If that fails, compare the hashes for inequality (to give a fast inequality check).
If THAT fails, compare the strings for content (to handle dynamic linking).

Although re-reading the thread I see @bill-myers may have had an even more clever solution.
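The three-step comparison described above could look roughly like this. It is only a sketch: `TypeKey` is a hypothetical stand-in, not the real `TypeId`, and it assumes `hash` is always derived from `name`, so equal names imply equal hashes.

```rust
/// Hypothetical identifier: a hash for fast rejection plus the full
/// type-describing string for an exact answer.
#[derive(Clone, Copy, Debug)]
struct TypeKey {
    hash: u64,
    name: &'static str,
}

impl PartialEq for TypeKey {
    fn eq(&self, other: &Self) -> bool {
        // 1. Same string pointer (and length): definitely the same type.
        if std::ptr::eq(self.name, other.name) {
            return true;
        }
        // 2. Different hashes: definitely different types.
        if self.hash != other.hash {
            return false;
        }
        // 3. Hashes agree but the pointers differ (for example, the string
        //    was duplicated by dynamic linking): compare the contents.
        self.name == other.name
    }
}

impl Eq for TypeKey {}
```

With this ordering a hash collision costs one extra string comparison instead of producing a wrong answer.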
@nikomatsakis putting the hash of the data at the start is a good idea, to increase the probability that we catch unequal things quickly. It seems to me like @bill-myers' approach composes fine with that strategy.
I doubt the "problem" is limited to Any. You can probably confuse the compiler just as effectively by colliding hashes for symbol mangling, or many other things. What is the objective here? Since Rust is not a sandbox language, I don't think "protect memory from malicious programmers" should be one of our goals (we should document the types of undefined behavior that can be hit in safe code, and fix the ones that are possible to hit by accident; if someone is determined to break the type system, they can already write an unsafe block, or use std::process to launch a subprocess that ptraces its parent and corrupts memory).
bluss referenced this issue on Feb 12, 2016: WIP: Implement stable symbol-name generation algorithm. #31539 (Closed)
hmvp commented Jan 24, 2017
Thanks to: https://www.reddit.com/r/rust/comments/5pfwjr/mitigating_underhandedness_clippy/dcrew0k/ This example works on Beta and Nightly.
@nikomatsakis Should this be marked as I-unsound? I've done so for now, since that seems to be the general conclusion a couple of times by different people, but please unmark if I'm wrong.
Mark-Simulacrum added the I-unsound 💥 and C-bug labels and removed the C-enhancement label on Jul 19, 2017
Mark-Simulacrum removed the I-wrong label on Jul 28, 2017
bstrie referenced this issue on Sep 17, 2017: borrowed referent of a `&T` sometimes incorrectly allowed #38899 (Open)
Would any of these result in incorrect runtime behavior, or just bizarre compiler errors? EDIT: The comment I'm replying to is over 2 years old. |
It could be unsound, if it fooled
@nikomatsakis Right, I meant sorear's other examples (name mangling and such). The fact that this can cause a soundness bug separates it from other mishashes which would just result in a compiler error.
glaebhoerl referenced this issue on Jan 31, 2018: Tracking issue for type_id stabilization #27745 (Closed)
SoniEx2 commented Nov 8, 2018
The eventbus crate uses a lot of Any, as it encourages you to use a lot of different event types and emulated inheritance. This could become an issue...
SoniEx2 commented Nov 8, 2018
What if, instead of making TypeId bigger, we add another implementation detail that works like eventbus? Calls to Any, TypeId, etc. are converted into a set of static usize values that get initialized at runtime. https://internals.rust-lang.org/t/static-generics/8734?u=soni We can then use an expensive check (string lookup/comparison) once and use a cheap check every other time.
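A rough sketch of the "expensive check once, cheap check every other time" idea, assuming a process-wide registry keyed by a type-describing string. The names `registry`, `intern` and `cached_id!` are hypothetical, this is not how eventbus is implemented, and the sketch does nothing to make the name strings themselves unique.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-wide table mapping a type-describing string to a small integer id.
fn registry() -> &'static Mutex<HashMap<&'static str, usize>> {
    static REGISTRY: OnceLock<Mutex<HashMap<&'static str, usize>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// The expensive path: a locked string lookup that hands out the next
/// free id the first time a name is seen.
fn intern(name: &'static str) -> usize {
    let mut map = registry().lock().unwrap();
    let next = map.len();
    *map.entry(name).or_insert(next)
}

/// The cheap path: each use site gets its own static slot that is filled
/// once via `intern` and afterwards just read back.
macro_rules! cached_id {
    ($name:expr) => {{
        static SLOT: ::std::sync::OnceLock<usize> = ::std::sync::OnceLock::new();
        *SLOT.get_or_init(|| intern($name))
    }};
}
```

Two use sites such as `cached_id!("crate_a::Foo")` get the same id, and comparing ids is a plain integer comparison. Whether such a registry can be shared across dynamically linked libraries is a separate question.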
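Why not use a pointer comparison of an (in C++)

    #include <cstddef>

    // Per-type record; its address serves as the type's identity.
    struct VtableForAny {
        std::size_t size;
        std::size_t align;
        void (*destroy)(void *);
    };

    // One inline variable per T, so every translation unit sees the same address.
    template <typename T>
    inline const VtableForAny VTABLE_FOR_ANY = {
        sizeof(T),
        alignof(T),
        [](void *p) { static_cast<T *>(p)->~T(); },
    };

    // Assumed definition (not given in the original comment): a type-erased
    // reference carrying the record of its dynamic type.
    struct AnyRef {
        void *ptr;
        const VtableForAny *vtable;
    };

    struct Foo {
    };

    template <typename T>
    T *downcast(AnyRef any) {
        if (any.vtable == &VTABLE_FOR_ANY<T>) {
            return static_cast<T *>(any.ptr);
        } else {
            return nullptr;
        }
    }

We could do a similar translation for Rust, and this is guaranteed to work by LLVM.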
I think that would not be easy. It's not just TUs; if you create a trait object in a
We can't, for the same reason generic statics couldn't be unique across TUs if we added them (dynamic linking on Windows, possibly other platforms). |
What strings do you want to compare? Lacking a global namespace, the only way I know of to get a "type name" that won't be the same across different crates is using the crate disambiguator, which is a hash, which is therefore susceptible to the same sort of collision. Or can we arrange for such collisions always resulting in a linker error, even when the crates are only linked together dynamically and indirectly?
@rkruppe the name of the type, I think? That should be globally unique.
maybe
Crate names are not globally unique, and neither are (crate name, version) tuples.
@rkruppe huh! then I guess whatever information we pass to the hash function :P
The disambiguator is passed in using the
SoniEx2 commented Nov 8, 2018
You can work around the generic statics issue the same way eventbus does it currently. Language support would be an improvement over using a macro, tho.
Centril added the A-typesystem label on Nov 20, 2018
SoniEx2 commented Nov 22, 2018
So, first, for fixing the collisions:
Why this specific layout? Three reasons:
So, for example, we can end up with some of the following TypeIds:
Note that these are only examples and the real thing probably wouldn't look like this, but this shows the basic idea. It's still possible to intentionally create collisions, ofc, but that's different from accidental collisions (what we're trying to prevent here). Also note that these are still opaque, so the contents need not be unambiguous to humans, only to the compiler (or the debugger, if we want that...).
@rkruppe With your comment about the metadata string, did you mean that it's "safe" to use it as a disambiguator between different versions of the same crate (be it versions or feature-sets), or that we shouldn't use it (because with non-cargo workflows it may not be passed)? Storing the type "name" and comparing that (potentially with a short hash for a fail-fast option) and trusting the
I didn't really mean to recommend any course of action, just list constraints. One of them is that the metadata string is itself a hash in the typical workflow and thus could have collisions. But thinking about the options you gave, it seems to me that: if we are OK with a negligible chance for collisions, we can achieve that (in principle, I don't know how large the collision risk is today) with a single hash. Well, assuming collisions in the crate disambiguator can't be detected -- I am not sure whether there is perhaps some clever use of the linker to detect them.
SoniEx2 commented Nov 24, 2018
I have tricks to improve performance, but they need to come after this (am I even being understood? I have no idea if ppl can understand me but they often seem to ignore me)
eduardosm commented Jan 28, 2019
What about keeping somewhere a "cache" of calculated TypeIds? That cache would store a TypeId-type pair for each type for which We could also forbid TypeId zero, so
DaGenix commented Nov 9, 2013
The implementation of type_id from #10182 uses SipHash on various parameters depending on the type. The output size of SipHash is only 64-bits, however, making it feasible to find collisions via a Birthday Attack. I believe the code below demonstrates a collision in the type_id value of two different ty_structs:
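As a separate illustration of the birthday effect (this is not the demonstration referred to above, which targeted the compiler's 64-bit type_id hash), the sketch below truncates a hash to 32 bits so that a collision shows up after only tens of thousands of inputs. It uses the standard library's `DefaultHasher`, whose exact algorithm is not guaranteed by the documentation; `truncated_hash` and `find_collision` are names invented for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash `value` and keep only the low 32 bits of the 64-bit result.
fn truncated_hash(value: u64) -> u32 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish() as u32
}

/// Search the inputs `0..limit` for two distinct values whose truncated
/// hashes are equal, remembering every hash seen so far.
fn find_collision(limit: u64) -> Option<(u64, u64)> {
    let mut seen: HashMap<u32, u64> = HashMap::new();
    for value in 0..limit {
        // `insert` returns the previous input that produced this hash, if any.
        if let Some(previous) = seen.insert(truncated_hash(value), value) {
            return Some((previous, value));
        }
    }
    None
}
```

For a 32-bit output the expected number of inputs before the first collision is on the order of 2^16, so `find_collision(1_000_000)` is overwhelmingly likely to return a pair. The same square-root scaling puts a 64-bit output at roughly 2^32 inputs.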