Skip to content

Clarify some interning details#158642

Open
nnethercote wants to merge 1 commit into
rust-lang:mainfrom
nnethercote:interning-clarification
Open

Clarify some interning details#158642
nnethercote wants to merge 1 commit into
rust-lang:mainfrom
nnethercote:interning-clarification

Conversation

@nnethercote

Copy link
Copy Markdown
Contributor

Interned::new_unchecked's argument must be unique. This is usually guaranteed by interning, as per the Interned name. The new_unchecked name (and the accompanying doc comment) is intended to communicate this requirement.

There are two main uses of Interned.

  • rustc_middle, which uses interning to ensure uniqueness.
  • rustc_resolve, which does not. This is questionable.

PR 137202 tried to clarify things by adding a T: Hash constraint to Interned<'a, T>. This constraint isn't actually used, because Interned is hashed based on pointer value, not contents. But it was intended to communicate the idea that a type stored in Interned is actually interned, which is likely to be done with hashing. Panicking impls of Hash were added for the relevant rustc_resolve types to work around the fact that it doesn't use hashing-based interning.

In my opinion PR 137202 didn't improve things. The T: Hash constraint is not quite right. You could use a BTreeMap to intern instead of a HashMap. You could also have a type whose values are guaranteed to be unique via construction. (Hopefully this is the case for the rustc_resolve types.) There isn't a good way to enforce uniqueness in the Rust type system.

Therefore, this commit removes the T: Hash constraint and the Hash impls added in PR 137202. It also adds comments to the questionable Interned::new_unchecked calls in rustc_resolve, which seems like the best place to document the weirdness.

r? @Mark-Simulacrum

`Interned::new_unchecked`'s argument must be unique. This is usually
guaranteed by interning, as per the `Interned` name. The `new_unchecked`
name (and the accompanying doc comment) is intended to communicate this
requirement.

There are two main uses of `Interned`.
- `rustc_middle`, which uses interning to ensure uniqueness.
- `rustc_resolve`, which does not. This is questionable.

PR 137202 tried to clarify things by adding a `T: Hash` constraint to
`Interned<'a, T>`. This constraint isn't actually used, because
`Interned` is hashed based on pointer value, not contents. But it was
intended to communicate the idea that a type stored in `Interned` is
actually interned, which is likely to be done with hashing. Panicking
impls of `Hash` were added for the relevant `rustc_resolve` types to
work around the fact that it doesn't use hashing-based interning.

In my opinion PR 137202 didn't improve things. The `T: Hash` constraint
is not quite right. You could use a `BTreeMap` to intern instead of a
`HashMap`. You could also have a type whose values are guaranteed to be
unique via construction. (Hopefully this is the case for the
`rustc_resolve` types.) There isn't a good way to enforce uniqueness in
the Rust type system.

Therefore, this commit removes the `T: Hash` constraint and the `Hash`
impls added in PR 137202. It also adds comments to the questionable
`Interned::new_unchecked` calls in `rustc_resolve`, which seems like the
best place to document the weirdness.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 1, 2026
@nnethercote

Copy link
Copy Markdown
Contributor Author

cc @petrochenkov

fn alloc_import(&'ra self, import: ImportData<'ra>) -> Import<'ra> {
// FIXME: `Interned::new_unchecked`'s argument must be unique (e.g. via interning) but it's
// not clear that this value is unique! And yet, somehow things seem to work ok. Perhaps
// the IDs and/or the spans within `import` guarantee uniqueness?

@petrochenkov petrochenkov Jul 1, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't much to fix here (and in the other 3 cases), different Imports are unique by definition, even if they have the same data.
The address of the ImportData on the arena is the ID.
In theory we could introduce some separate ImportId(u32)s and add them to ImportData, and alloc_import would generate a fresh ImportId every time, but why doing that if the pointer itself is enough.

Previously arena allocated data in the resolver didn't even use Interned, it used a different type named PtrKey that simply provided pointer-like comparison and hashing for references.
But then PtrKey was merged into Interned (maybe by you actually), despite slightly different requirements.

Upd, relevant PR:

View changes since the review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

different Imports are unique by definition, even if they have the same data.

Can you expand on that? I don't understand.

If you write a SAFETY: <comment> explanation to replace the FIXME comments I'm happy to change them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose DeclData, for example, loses all the auxiliary information and is left only with

struct DeclData {
  res: Res,
  span: Span,
}

Then you can end up with two different declarations in different modules having the same DeclData (the spans can always be dummy).
But they are still different unique declarations, and the DeclData address on the arena is used as their unique ID, and they can therefore be compared by address.

It can be represented as "desugaring"

struct DeclData {
  id: *const () = /* address of self */, // this would make the data unique too
  res: Res,
  span: Span,
}

, except the address is not kept explicitly.
(In C++ maps keyed on pointers like this are used right and left, including in LLVM.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants