Clarify some interning details#158642
Conversation
`Interned::new_unchecked`'s argument must be unique. This is usually guaranteed by interning, as per the `Interned` name. The `new_unchecked` name (and the accompanying doc comment) is intended to communicate this requirement. There are two main uses of `Interned`. - `rustc_middle`, which uses interning to ensure uniqueness. - `rustc_resolve`, which does not. This is questionable. PR 137202 tried to clarify things by adding a `T: Hash` constraint to `Interned<'a, T>`. This constraint isn't actually used, because `Interned` is hashed based on pointer value, not contents. But it was intended to communicate the idea that a type stored in `Interned` is actually interned, which is likely to be done with hashing. Panicking impls of `Hash` were added for the relevant `rustc_resolve` types to work around the fact that it doesn't use hashing-based interning. In my opinion PR 137202 didn't improve things. The `T: Hash` constraint is not quite right. You could use a `BTreeMap` to intern instead of a `HashMap`. You could also have a type whose values are guaranteed to be unique via construction. (Hopefully this is the case for the `rustc_resolve` types.) There isn't a good way to enforce uniqueness in the Rust type system. Therefore, this commit removes the `T: Hash` constraint and the `Hash` impls added in PR 137202. It also adds comments to the questionable `Interned::new_unchecked` calls in `rustc_resolve`, which seems like the best place to document the weirdness.
| fn alloc_import(&'ra self, import: ImportData<'ra>) -> Import<'ra> { | ||
| // FIXME: `Interned::new_unchecked`'s argument must be unique (e.g. via interning) but it's | ||
| // not clear that this value is unique! And yet, somehow things seem to work ok. Perhaps | ||
| // the IDs and/or the spans within `import` guarantee uniqueness? |
There was a problem hiding this comment.
There isn't much to fix here (and in the other 3 cases), different Imports are unique by definition, even if they have the same data.
The address of the ImportData on the arena is the ID.
In theory we could introduce some separate ImportId(u32)s and add them to ImportData, and alloc_import would generate a fresh ImportId every time, but why doing that if the pointer itself is enough.
Previously arena allocated data in the resolver didn't even use Interned, it used a different type named PtrKey that simply provided pointer-like comparison and hashing for references.
But then PtrKey was merged into Interned (maybe by you actually), despite slightly different requirements.
Upd, relevant PR:
There was a problem hiding this comment.
different Imports are unique by definition, even if they have the same data.
Can you expand on that? I don't understand.
If you write a SAFETY: <comment> explanation to replace the FIXME comments I'm happy to change them.
There was a problem hiding this comment.
Suppose DeclData, for example, loses all the auxiliary information and is left only with
struct DeclData {
res: Res,
span: Span,
}Then you can end up with two different declarations in different modules having the same DeclData (the spans can always be dummy).
But they are still different unique declarations, and the DeclData address on the arena is used as their unique ID, and they can therefore be compared by address.
It can be represented as "desugaring"
struct DeclData {
id: *const () = /* address of self */, // this would make the data unique too
res: Res,
span: Span,
}, except the address is not kept explicitly.
(In C++ maps keyed on pointers like this are used right and left, including in LLVM.)
Interned::new_unchecked's argument must be unique. This is usually guaranteed by interning, as per theInternedname. Thenew_uncheckedname (and the accompanying doc comment) is intended to communicate this requirement.There are two main uses of
Interned.rustc_middle, which uses interning to ensure uniqueness.rustc_resolve, which does not. This is questionable.PR 137202 tried to clarify things by adding a
T: Hashconstraint toInterned<'a, T>. This constraint isn't actually used, becauseInternedis hashed based on pointer value, not contents. But it was intended to communicate the idea that a type stored inInternedis actually interned, which is likely to be done with hashing. Panicking impls ofHashwere added for the relevantrustc_resolvetypes to work around the fact that it doesn't use hashing-based interning.In my opinion PR 137202 didn't improve things. The
T: Hashconstraint is not quite right. You could use aBTreeMapto intern instead of aHashMap. You could also have a type whose values are guaranteed to be unique via construction. (Hopefully this is the case for therustc_resolvetypes.) There isn't a good way to enforce uniqueness in the Rust type system.Therefore, this commit removes the
T: Hashconstraint and theHashimpls added in PR 137202. It also adds comments to the questionableInterned::new_uncheckedcalls inrustc_resolve, which seems like the best place to document the weirdness.r? @Mark-Simulacrum