Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul interning. #93148

Merged
merged 8 commits into from
Feb 15, 2022
Merged

Overhaul interning. #93148

merged 8 commits into from
Feb 15, 2022

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Jan 21, 2022

A number of types are interned and eq and hash are implemented on
the pointer rather than the contents. But this is not well enforced
within the type system like you might expect.

This PR introduces a new type Interned which encapsulates this concept
more rigorously, and uses it to convert a couple of the less common
interned types.

r? @fee1-dead

@rust-highfive
Copy link
Collaborator

Some changes occurred in clean/types.rs.

cc @camelid

@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jan 21, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 21, 2022
@nnethercote nnethercote changed the title Introduce rustc_middle::ty::Uniq. [DRAFT] Introduce rustc_middle::ty::Uniq. Jan 21, 2022
@nnethercote
Copy link
Contributor Author

@michaelwoerister Ignore the first two commits, they come from #93147, which this PR builds upon. The third commit is the one of interest.

This PR was inspired by (a) my finding the interned types very hard to understand, (b) this GitHub comment, and (c) this Zulip comment.

We could conceivably merge this as is, but it's probably worth converting some of the more heavily used interned types to make sure there aren't any unexpected obstacles. TyS is the holy grail here, but that'll probably happen last :)

@michaelwoerister I'd appreciate feedback about whether you think this is heading in the right direction. Thanks!

Copy link
Contributor

@lcnr lcnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to this, i've wrote https://lcnr.de/blog/low-effort-interner/ where I ended up with a type using the same name ^^ using the following type from the "perrrrrrf" section in my blog with edit 2. might be worth considering:

mod private {
    pub struct ProofToken;
}

#[repr(transparent)]
pub struct Uniq<T>(pub T, pub private::ProofToken);

With this, the Eq and PartialEq impls don't need to temporarily use &&T which, while probably easily getting optimized out, might slightly improve perf. More importantly for me, you can now access the T while pattern matching without being able to mutate it, as it will always be behind a immutable reference.

Comment on lines 131 to 133
/// will cause incorrect behaviour, though it shouldn't cause undefined
/// behaviour. For this reason it is marked `unsafe`.
unsafe fn new(ptr: &'tcx T) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering that getting this wrong is memory safe, i would prefer

fn new_unchecked(ptr: &'tcx T) -> Self {

here instead 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my stash I've written these methods, that should cover a lot of use cases, preventing new_unchecked callsites.

    pub unsafe fn new_unchecked(inner: &'tcx T) -> Self {
        Uniq(inner)
    }

    pub fn from_arena_interner(
        value: T,
        arena: &'tcx WorkerLocal<Arena<'tcx>>,
        interner: &ShardedHashMap<&'tcx T, ()>,
    ) -> Self {
        let inner = interner.intern(value, |v| arena.alloc(v));
        Uniq(inner)
    }

    pub fn from_make<K, F: FnOnce(K) -> T>(
        key: K,
        make: F,
        arena: &'tcx WorkerLocal<Arena<'tcx>>,
        interner: &ShardedHashMap<&'tcx T, ()>,
    ) -> Self {
        let inner = interner.intern(key, move |key| arena.alloc(make(key)));
        Uniq(inner)
    }

@michaelwoerister
Copy link
Member

I'll reserve some time early next week to take a closer look. I think it's great to try and make the interning infrastructure more robust against accidentally violating invariants!

Copy link
Member

@camelid camelid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a great idea!

@@ -380,7 +380,7 @@ crate fn rustc_span(def_id: DefId, tcx: TyCtxt<'_>) -> Span {
}

impl Item {
crate fn stability<'tcx>(&self, tcx: TyCtxt<'tcx>) -> Option<&'tcx Stability> {
crate fn stability<'tcx>(&self, tcx: TyCtxt<'tcx>) -> Option<Uniq<'tcx, Stability>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the function below dereferences the pointer. I didn't realize it was interned, so I thought it was unnecessary to return a reference and had @GuillaumeGomez remove the reference. Maybe convert it back to a reference (with Uniq)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand this comment. Can you elaborate, or point me at the earlier PR? Thanks.

compiler/rustc_middle/src/ty/mod.rs Outdated Show resolved Hide resolved

impl<'tcx> TyCtxt<'tcx> {
pub fn $method(self, v: $ty) -> Uniq<'tcx, $ty> {
let v = self.interners.$name.intern(v, |v| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should intern just return Uniq directly?

@nnethercote
Copy link
Contributor Author

Lots of good comments above, this is clearly an idea whose time has come :) I will do more on this next week.

@bors
Copy link
Contributor

bors commented Jan 24, 2022

☔ The latest upstream changes (presumably #93245) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Copy link
Member

I'd appreciate feedback about whether you think this is heading in the right direction. Thanks!

Yes, this definitely goes in the right direction! I'd even go a step further and try to hide away all the implementation details of interning in a single, generic implementation in rustc_data_structures. Something along the lines of:

  • Add a new interning module in rustc_data_structures and
    • move the InternedSet type to it,
    • move ShardedHashMap::intern(), ShardedHashMap::intern_ref(), and ShardedHashMap::contains_pointer_to() to that new InternedSet type.
    • move Interned to the module, since it's a private implementation detail of InternedSet,
    • remove the rustc_data_structures::sharded::IntoPointer trait, since Interned is the only implementer of it,
    • document InternedSet::intern() and InternedSet::intern_ref(): what is make, why don't the methods use value directly? What are the invariants? (+add debug assertions that value and result of make are equal and their hashes too?)
    • the make parameter could be renamed to something like move_to_arena or alloc_in_arena.

Uniq should then go into that module too. It's not clear to me if we need the Interned type at all once we have Uniq and a proper InternedSet type. Seemingly, all Interned does is to make sure that we use content hashing and equality instead of pointer hashing and equality. But with Uniq we already have a place where we make that distinction. So InternedSet could look like:

struct InternedSet<'arena, T: Hash + Eq> {
    inner: ShardedHashMap<&'arena T, ()>
}

impl InternedSet<'arena, T: Hash + Eq> {
    pub fn intern<Q>(&self, value: Q, move_to_arena: impl FnOnce(Q) -> &'arena T) -> Uniq<'arena, T> { 
        // ... 
    }
}

It's entirely possible that I'm overlooking some complications here -- but that's roughly what I had in mind for cleaning things up.

@michaelwoerister
Copy link
Member

We could conceivably merge this as is, but it's probably worth converting some of the more heavily used interned types to make sure there aren't any unexpected obstacles. TyS is the holy grail here, but that'll probably happen last :)

Yes, it sounds like a good idea to add the new interning infrastructure in parallel to the old one and then incrementally move things over, until the old infrastructure can be removed. But trying to move the most complicated case as soon as possible, sounds like a good idea too. After that it should be easy :)

I have a hunch that all of this looks a lot more complicated than it actually is, just because it has grown organically over the years.

@nnethercote
Copy link
Contributor Author

  • Add a new interning module in rustc_data_structures and

All that makes sense. The latest update doesn't do any of that yet, though. Instead I started again from scratch with TyS, which I got working (tests pass and performance seems fine) though there are still lots of rough edges.

I changed Uniq to be more like how @lcnr did it; Uniq wraps a value, rather than being a smart pointer. Ty<'tcx> is now a &'tcx Uniq<TyS<'tcx>>.

The commits have lots of changes like this:

-impl<'tcx> TyS<'tcx> {
+impl<'tcx> Uniq<TyS<'tcx>> {

and this:

-            hir::CaptureBy::Value if !place.deref_tys().any(ty::TyS::is_ref) => {
+            hir::CaptureBy::Value if !place.deref_tys().any(<Uniq<ty::TyS<'_>>>::is_ref) => {

I.e. all the functionality is moved off TyS as much as possible. These lines are a bit unusual and I don't love it. Ideally they would instead refer to Ty<'tcx> but it's a typedef, not a first-class type, and so can't have its own impl blocks. I've started a further change to make TyS a newtype rather than a typedef, which would improve these things, but that requires a lot more changes and I don't have it working yet.

Anyway, this is where I'm up to. As usual, comments are welcome! Thanks.

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 25, 2022
@bors
Copy link
Contributor

bors commented Jan 25, 2022

⌛ Trying commit d75c38ea9a96f53a9eae88b29c335cd28e815f51 with merge 43ea0bf0847986569308356f1623c124b2a46a28...

@rust-log-analyzer

This comment has been minimized.

@michaelwoerister
Copy link
Member

I changed Uniq to be more like how @lcnr did it; Uniq wraps a value, rather than being a smart pointer.

Interesting! What are the concrete consequences of that?

@michaelwoerister
Copy link
Member

By the way, a cross cutting change like this will probably need a major change proposal.

@nnethercote
Copy link
Contributor Author

Interesting! What are the concrete consequences of that?

Probably the main one is that Uniq is a good name; with the smart pointer approach it should arguably be called UniqPtr.

Other than that it's not a big difference, I don't think. Switching Ty from a typedef to a newtype is a much bigger change.

@bors
Copy link
Contributor

bors commented Feb 15, 2022

📌 Commit 80632de has been approved by fee1-dead

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 15, 2022
@bors
Copy link
Contributor

bors commented Feb 15, 2022

⌛ Testing commit 80632de with merge 5569757...

@bors
Copy link
Contributor

bors commented Feb 15, 2022

☀️ Test successful - checks-actions
Approved by: fee1-dead
Pushing 5569757 to master...

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5569757): comparison url.

Summary: This benchmark run shows 198 relevant improvements 🎉 but 12 relevant regressions 😿 to instruction counts.

  • Average relevant regression: 1.0%
  • Average relevant improvement: -1.1%
  • Largest improvement in instruction counts: -4.5% on full builds of projection-caching check
  • Largest regression in instruction counts: 1.8% on full builds of match-stress-enum check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

/// A reference to a value that is interned, and is known to be unique.
///
/// Note that it is possible to have a `T` and a `Interned<T>` that are (or
/// refer to) equal but different values. But if you have two different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the last word on this line be "equal"? Or I am really confused.

Also "equal but different" sounds rather odd. The key is the location in memory, but that is only first mentioned later.

Copy link
Member

@michaelwoerister michaelwoerister Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Note that it is possible to have a T and a Interned<T> to be equal but that are separate values at different addresses.

Copy link
Member

@RalfJung RalfJung Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes more sense. I would probably use &T since that corresponds better with Interned<T> (and talking about addresses only makes sense with references anyway).

@nnethercote nnethercote deleted the Uniq branch February 25, 2022 03:07
bors-servo added a commit to servo/servo that referenced this pull request Mar 6, 2022
Update rustc to 3/6 nightly.

The only breaking change comes from rust-lang/rust#93148.
bors-servo added a commit to servo/servo that referenced this pull request Mar 6, 2022
Update rustc to 3/6 nightly.

The only breaking change comes from rust-lang/rust#93148, which a warning from rust-lang/cargo#10245.
dario23 added a commit to dario23/rust-semverver that referenced this pull request May 4, 2022
dario23 added a commit to dario23/rust-semverver that referenced this pull request May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet