Bounded Trait Casting #3952

Open
DiamondLovesYou wants to merge 6 commits into rust-lang:master from DiamondLovesYou:master

@DiamondLovesYou DiamondLovesYou commented Apr 19, 2026

Safe, constant-time, minimal-space-overhead casting between trait objects that share a common root supertrait. A bounded trait graph is one rooted at a single explicitly-declared supertrait; that root names the closure of traits a cast may target, so the compiler can compute a per-type metadata table globally and resolve each cast with two loads and a branch. The user-facing surface is a cast!(in dyn Root, expr => dyn U) macro (plus try_cast! and unchecked_cast! variants) that works for references, &mut, and owned Box/Rc/Arc. Unlike ecosystem solutions, casting does not require 'static, global registries, or TypeId, and remains correct across crate boundaries and generic instantiations.

```rust
pub trait Root: TraitMetadataTable<dyn Root> {}
pub trait Sub: Root { fn greet(&self); }

let r: &dyn Root = /* … */;
match cast!(in dyn Root, r => dyn Sub) {
    Ok(s)  => s.greet(),                      // r implemented Sub
    Err(_) => { /* r did not implement Sub */ }
}
```

I have a working implementation I used to validate the details of this RFC: rust-lang/rust#155624
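
For contrast, here is a minimal sketch of the `Any`-based pattern that ecosystem solutions often build on, in stable Rust. `Shape`, `Circle`, `Square`, `as_any`, and `radius_of` are illustrative names and not part of this RFC. It shows the two limits mentioned above: `Any` forces `'static`, and `downcast_ref` can only recover a concrete type, never another trait object.

```rust
use std::any::Any;

trait Shape {
    fn area(&self) -> f64;
    // Hypothetical escape hatch: returning `&dyn Any` forces every
    // implementor to be `'static`.
    fn as_any(&self) -> &dyn Any;
}

struct Circle {
    r: f64,
}

struct Square {
    side: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.r * self.r
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the radius if `s` is a `Circle`. The caller has to name the
/// concrete type; there is no way to ask for "any type implementing
/// some other trait" through `Any` alone.
fn radius_of(s: &dyn Shape) -> Option<f64> {
    s.as_any().downcast_ref::<Circle>().map(|c| c.r)
}
```

Calling `radius_of` with a `Circle` yields `Some(radius)`, while a `Square` yields `None`.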

Rendered

[guide-level-explanation]: #guide-level-explanation

Rust lets you declare a trait as the *root* of a bounded trait hierarchy.
Every trait that transitively inherits from that root forms a *trait graph*,

@clarfonthey clarfonthey Apr 19, 2026

Honestly, the term "graph" feels like a poor fit here because it opens the door to cycles and other complex behaviour. Personally, I think "bounded trait hierarchy" is cleaner, even though it is more verbose, since it explicitly rules out that behaviour. If you want to simplify it, maybe just "trait hierarchy" would be better, without the "bounded" part.

supertrait bound:

```rust
pub trait SuperTrait: TraitMetadataTable<dyn SuperTrait> { }
```

@clarfonthey clarfonthey Apr 19, 2026

So, reading the full RFC, I understand the motivation behind this trait specifically, and you've done a great job designing this to work within the bounds of the existing trait system.

That said, I don't really like this syntax at all. There are a few obvious questions:

  • What does it mean to have a TraitMetadataTable<dyn SuperTrait> bound but not a SuperTrait bound?
  • What happens when you define a trait with this bound where the trait object type is different, e.g. trait ChildTrait: TraitMetadataTable<UnrelatedType>? Later, you mention TraitMetadataTable<u8> and, without knowing a lot of compiler specifics, I genuinely don't know what that means.

Personally, I don't see a lot of value in stabilising this trait instead of keeping it as an implementation detail, since stabilising it necessarily requires either allowing these weird exceptions, or explicitly forbidding them with dedicated machinery.

While I understand the apprehension about adding context-sensitive keywords, e.g. pub root trait SuperTrait {} or pub trait root SuperTrait {}, this would be substantially more understandable imho and would avoid some of these pitfalls.

(Note that I say context-sensitive keywords because, like union, the term root is widely used across the ecosystem and making it a forbidden term would likely break lots of things, even if done across an edition boundary.)

(`Trait2<T>: SuperTrait<T>`) joins whichever root shares its instantiation.
See *Appendix A: Generic roots*.

## Lifetimes

@clarfonthey clarfonthey Apr 19, 2026

I'll be honest, this extends far beyond a reasonable guide-level explanation. I feel like just an example of lifetimes rules being broken is probably the extent of what would be reasonable.

Even though lifetime erasure isn't a particularly complicated subject to understand, I think that the wording here could be simplified to simply state that:

  1. Runtime has no sense of lifetimes; lifetimes are compile-time only (this is lifetime erasure, but more beginner-friendly terminology)
  2. Thus, runtime casting cannot extend lifetimes; there would be no way of knowing if this is valid

Erasure*. All bound lifetimes participate, including lifetimes that only
appear through associated-type bindings such as `dyn Sub<Assoc = &'a T>`.

### `'static` is special in trait selection

@clarfonthey clarfonthey Apr 19, 2026

Similarly, this could be folded into the above section by simply stating that 'static is an example of "extending lifetimes." Unless you know that a lifetime is explicitly 'static, you cannot convert it to 'static, since that would be extending the lifetime. Whether they're a special case in the compiler doesn't matter for its usage.

Like, this is relevant in the reference-level explanation, but talking about the invariance of lifetimes and how this relates to 'static is likely irrelevant at best and confusing at worst for newcomers.

The full matrix of these cases is worked out in *Appendix A: Lifetime
selection*.

### Relationships between lifetimes

@clarfonthey clarfonthey Apr 19, 2026

Also mostly just a reiteration of the above rules, from the perspective of a guide-level explanation.


## Cross-crate boundaries and cdylibs

The *global crate* is the artifact where trait-graph layout is finalized

@clarfonthey clarfonthey Apr 19, 2026

I think it's helpful to clarify here that "global crate" is not a term someone should know from elsewhere; it's specifically used to facilitate trait casting.

Comment on lines +247 to +253
Why this restriction is load-bearing: two independently built cdylibs `A`
and `B` that depend on a shared library `C` each compute their own layouts
in isolation. The index `A` assigns to `ATrait` may collide with the
index `B` assigns to `BTrait`. A loader that passed a `B`-built object
into an `A`-built cast would, absent the identity check, silently read off
the wrong slot. The identity comparison rejects such casts regardless of
any index coincidence.

@clarfonthey clarfonthey Apr 19, 2026

This sentence should honestly be at the top of this section, although I would reword it a bit.

To simplify a bit, the crux of the problem is that the information required to facilitate trait casting is only kept during compilation, and thus lost at runtime. In fact, this is the reason why trait casting doesn't work in the current version of the compiler.

When compiling, a crate has to make a choice regarding whether to keep this information or finalize it, and the result cannot simply be loaded at runtime. So, if you want to make a self-contained crate artifact like a cdylib or even a self-contained staticlib, you have to finalize this information.

So, specifically when there are multiple compiled artifacts which have finalized this information, you could run the risk of there being a trait object created using one version of that information, trying to cast to a trait using a different version of that information, and this is an error.

This is particularly difficult to explain considering how it's tempting to describe this as there being two lookup tables, but as mentioned, lookup tables are not necessarily involved at all. It's just two different sets of layouts decided for trait objects.

Comment on lines +255 to +261
The deeper reason a shared schema cannot be precomputed in `C` is that the
trait graph is *lazily monomorphized*: `dyn Trait2<DownstreamType>` does
not exist from `C`'s point of view until a downstream crate instantiates
it. No precomputation in `C` can fix a canonical layout that covers all
future instantiations downstream crates might invent. A dynamic registry
would have to codegen new vtables at runtime — effectively shipping a
subset of the compiler — so this RFC rejects that path.

@clarfonthey clarfonthey Apr 19, 2026

This bit might be reasonable to fit in the guide-level explanation, but it still feels too technical imho. The main point that I feel this should be conveying is that you explicitly want to avoid the case where everything is done via a lookup table. Since the layouts of trait objects aren't explicitly recorded anywhere, you just have the idea of these layouts that can potentially conflict between artifacts, and these ideas go away when you put them in a "global crate."

@ehuss added the T-lang and T-types labels Apr 20, 2026
DiamondLovesYou added a commit to DiamondLovesYou/rust that referenced this pull request Apr 21, 2026
…fcs#3952)

Tracking issue: TBD

r? @ghost (draft)

## Summary

This PR implements the compiler- and library-side plumbing for the
**bounded intertrait casting** proposal in rust-lang/rfcs#3952. It adds
a mechanism for casting between `dyn Trait` objects that share an
explicitly-declared common root supertrait, resolved at runtime in
`O(1)` via a per-root metadata table — no `'static` bound, no `TypeId`,
and no global registry.

Stabilization is not proposed here; everything is gated behind
`#![feature(trait_cast)]` and the new items are `#[unstable]`. The
feature is large (~16k LoC across ~200 files) and intentionally landed
as one commit so the graph/layout/augmentation passes stay coherent;
I'd like reviewer guidance on whether to split before further review,
and where the natural seams are.

## Surface

```rust
#![feature(trait_cast)]
use core::marker::TraitMetadataTable;

trait Animal: TraitMetadataTable<dyn Animal> {}  // declares `Animal` as a cast root
trait Dog: Animal { fn bark(&self); }

fn maybe_bark(a: &dyn Animal) {
    if let Ok(d) = core::cast!(in dyn Animal, a => dyn Dog) {
        d.bark();
    }
}
```

A trait becomes a **cast root** by naming `TraitMetadataTable<dyn Self>`
as a supertrait. Every subtrait of a root inherits the
`TraitMetadataTable<dyn Root>` bound and is eligible as a cast target
within that root's graph. `core::cast!`, `core::try_cast!`, and
`core::unchecked_cast!` macros (in a new `core::trait_cast` module)
dispatch through the `TraitCast<I, U>` trait implemented for `&T`,
`&mut T`, `Box<T>`, `Rc<T>`, and `Arc<T>`.

Runtime cost per cast: two loads and a branch against the table for
the root's graph.
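
As a rough model of that lookup in stable Rust: the sketch below treats one concrete type's table as a slice of optional vtable addresses, loads the slot for the target trait, and branches on whether it is populated. `Table`, `CastError`, and `lookup` are hypothetical names for illustration; the real table is produced by the compiler and its access is not bounds-checked like `get` is here.

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for one concrete type's metadata table:
/// slot `i` holds the vtable address for target trait `i`, or `None`
/// if the type does not implement that trait.
type Table = [Option<NonNull<()>>];

#[derive(Debug, PartialEq)]
enum CastError {
    /// The concrete type has no vtable for the requested target.
    NotImplemented,
}

/// Load the slot at `index`, then branch on whether it is populated.
/// An out-of-range index is treated like an empty slot in this sketch.
fn lookup(table: &Table, index: usize) -> Result<NonNull<()>, CastError> {
    match table.get(index).copied().flatten() {
        Some(vtable) => Ok(vtable),
        None => Err(CastError::NotImplemented),
    }
}
```

A populated slot yields the vtable address that would be paired with the original data pointer; an empty slot becomes the `Err` arm of `cast!`.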

## Library additions (`core`/`alloc`)

- `core::marker::TraitMetadataTable<SuperTrait>` — the marker/lang-item
  that declares a cast root; blanket impl for all `Sized` types (the
  actual root-supertrait obligation is enforced by the supertrait
  relationship itself, not the where-clauses, to break a cycle through
  `Unsize`).
- `core::trait_cast` — `TraitCast`/`TraitCastError` and the `cast!` /
  `try_cast!` / `unchecked_cast!` macros.
- `alloc::{boxed, rc, sync}` — owned-cast impls.
- New intrinsics in `core::intrinsics`:
  - `trait_metadata_index<SuperTrait, Trait>() -> (&'static u8, usize)`
  - `trait_metadata_table<SuperTrait, ConcreteType>() -> (&'static u8, NonNull<Option<NonNull<()>>>)`
  - `trait_metadata_table_len<SuperTrait>() -> usize`
  - `trait_cast_is_lifetime_erasure_safe<SuperTrait, TargetTrait>() -> bool`

The `&'static u8` returned alongside each index/table pointer is a
per-global-crate sentinel used to detect the `ForeignTraitGraph` case
when two independently-built artifacts are linked into one binary.
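
The identity check can be pictured as a plain address comparison. In this sketch, `GRAPH_A`, `GRAPH_B`, `CastError`, and `check_same_graph` are hypothetical names; only the `(&'static u8, usize)` shape is taken from the intrinsic signatures above.

```rust
/// Hypothetical per-artifact sentinels. Only their addresses matter;
/// the distinct values just keep them visibly separate.
static GRAPH_A: u8 = 0xA;
static GRAPH_B: u8 = 0xB;

#[derive(Debug, PartialEq)]
enum CastError {
    /// The index and the table were laid out by different artifacts.
    ForeignTraitGraph,
}

/// An index is only meaningful for a table laid out by the same
/// global crate, so compare sentinel addresses before using it.
fn check_same_graph(
    index: (&'static u8, usize),
    table_sentinel: &'static u8,
) -> Result<usize, CastError> {
    if std::ptr::eq(index.0, table_sentinel) {
        Ok(index.1)
    } else {
        Err(CastError::ForeignTraitGraph)
    }
}
```

An index that happens to be numerically valid in the other artifact's table is still rejected, because the comparison is on sentinel identity rather than on the index value.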

## Compiler additions

**New passes / modules** (all under `rustc_monomorphize` unless noted):

- `trait_graph.rs` — per-root `TraitGraph` built from gathered
  `trait_metadata_index` / `trait_metadata_table` requests.
- `table_layout.rs` — assigns slots for `(sub_trait, outlives_class)`
  pairs with condensation (`BitMatrix` row-grouping) to collapse classes
  admitting identical impl sets.
- `erasure_safe.rs` — resolves `trait_cast_is_lifetime_erasure_safe` by
  DFS-walking binder vars of the target dyn type and checking each is
  expressible through the root's binder.
- `cast_sensitivity.rs` — SCC-based batch computation of per-`Instance`
  `CastRelevantLifetimes` (direct + transitive via call-graph).
- `resolved_bodies.rs`, `trait_cast_requests.rs` — request gathering
  and delayed-codegen queue.
- `partitioning.rs` — cascade-canonicalization of augmented callees so
  sensitive subgraphs are emitted once per signature group.
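
To illustrate the row-grouping idea behind the condensation step in `table_layout.rs`, here is a minimal sketch that gives rows with identical bit patterns the same slot. It assumes each row fits in a `u64` and uses a `HashMap` in place of rustc's `BitMatrix`; `condense` is a hypothetical name and this is not the pass's actual code.

```rust
use std::collections::HashMap;

/// Assigns one slot per distinct row. `rows[i]` is a bitset over impls
/// (bit `j` set means class `i` admits impl `j`). Returns the slot
/// chosen for each row and the total number of slots used.
fn condense(rows: &[u64]) -> (Vec<usize>, usize) {
    let mut slot_of_row: HashMap<u64, usize> = HashMap::new();
    let mut slots = Vec::with_capacity(rows.len());
    for &row in rows {
        // A row seen before reuses its slot; a new row takes the next one.
        let next = slot_of_row.len();
        let slot = *slot_of_row.entry(row).or_insert(next);
        slots.push(slot);
    }
    (slots, slot_of_row.len())
}
```

For the rows `0b011, 0b101, 0b011, 0b000`, the first and third share a slot, so four classes need only three slots.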

**MIR**: `TerminatorKind::{Call, TailCall}` grows a
`call_id: &'tcx List<(DefId, u32, GenericArgsRef<'tcx>)>` recording the
full inlining chain. `TerminatorKind` size assertion goes from 80 → 88.
Before inlining each list has length 1; the inliner prepends the
caller's chain to each inlined callee's.
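
A small sketch of the prepend rule, using a hypothetical `CallSite` pair of plain integers in place of the real `(DefId, u32, GenericArgsRef<'tcx>)` entries (generic args are omitted here):

```rust
/// Hypothetical stand-in for one `call_id` entry: an opaque definition
/// id and a per-body call counter.
type CallSite = (u32, u32);

/// When a callee body is inlined at the call identified by
/// `caller_chain`, each call inside it gets that chain prepended.
fn inline_chain(caller_chain: &[CallSite], callee_chain: &[CallSite]) -> Vec<CallSite> {
    let mut chain = Vec::with_capacity(caller_chain.len() + callee_chain.len());
    chain.extend_from_slice(caller_chain);
    chain.extend_from_slice(callee_chain);
    chain
}
```

Before inlining, both inputs have length 1. Repeated inlining grows the list from the front, so the outermost call site always comes first.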

**Borrowck**: new `region_summary.rs` publishes a
`BorrowckRegionSummary` per fn (walk-position → `RegionVid`, call-site
region mappings keyed on the `u32` counter) consumed by the sensitivity
pass after typeck but before mono.

**Generic args**: new `GenericArgKind::Outlives(OutlivesArg)` variant
(tag `0b11`) carrying `(longer, shorter)` region-index pairs. Appended
to an `Instance`'s args when a sensitive callee must be specialized for
a given caller's outlives environment. Wired through interning,
encode/decode, folding/visiting, symbol mangling, and all the usual
suspects.
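
The tag `0b11` suggests the variant is told apart by the low two bits of a packed pointer. A minimal sketch of that kind of packing, assuming 4-byte-aligned pointees; `TAG_MASK`, `OUTLIVES_TAG`, `pack`, and `unpack` are hypothetical and do not reproduce rustc's `GenericArg`:

```rust
/// Low two bits of an aligned address are free to carry a kind tag.
const TAG_MASK: usize = 0b11;
/// Tag value named in the text for the outlives variant.
const OUTLIVES_TAG: usize = 0b11;

/// Combine an aligned address with a two-bit tag.
fn pack(addr: usize, tag: usize) -> usize {
    assert_eq!(addr & TAG_MASK, 0, "address must be 4-byte aligned");
    assert!(tag <= TAG_MASK, "tag must fit in two bits");
    addr | tag
}

/// Split a packed word back into (address, tag).
fn unpack(packed: usize) -> (usize, usize) {
    (packed & !TAG_MASK, packed & TAG_MASK)
}
```

Because the tag lives in otherwise unused bits, a fourth kind fits without growing the packed representation.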

**New lang item**: `TraitMetadataTable` (`sym::trait_metadata_table`).

**HIR analysis** (`wfcheck.rs`, `dyn_trait.rs`): eagerly diagnoses at
trait-definition time when a root-connected trait introduces a lifetime
not expressible through the root (would be manufactured at downcast
time — unsound).

## Diagnostics

- `UNUSED_CAST_TARGET` lint — cast to a target no concrete type in the
  final binary implements (always `Err` at runtime).
- `trait graph rooted at {root} is not downcast-safe` — erased-lifetime
  manufacturability check.
- `TraitMetadataTable type argument must be a trait object` — non-`dyn T`
  arg.
- `TraitMetadataTable type argument does not match a cast root` —
  `dyn X` where `X` isn't `Self` or a transitive cast-root supertrait.
- `cast target not reachable in graph` / `non-dyn-compat target` /
  `tmt-arg-*` — various ill-formed roots and targets.

A "not part of any global crate" diagnostic was considered but is not
feasible — the detection info is categorically unavailable at compile
time.

## Debugging / inspection flags

All `-Z`, all dump to stderr:

- `-Z dump-trait-graph[=FILTER]`,
  `-Z dump-trait-cast-sensitivity[=FILTER]`,
  `-Z dump-trait-cast-augmentation[=FILTER]`,
  `-Z dump-trait-cast-canonicalization`,
  `-Z dump-trait-cast-chain-composition[=FILTER]`,
  `-Z dump-trait-cast-erasure-safety[=FILTER]`
- `-Z print-trait-cast-stats`

Each has a matching `tests/run-make/dump-*` test.

## Tests

- `tests/ui/trait-cast/` — 23 files: basic/lifetime-bounded downcasts,
  erasure-safety (chain-walk, projections, structural, outlives),
  cross-crate casts, invalid targets, non-dyn-compat targets, missing
  root bound, TMT arg mismatch, lifetime-in-generics (565 lines),
  torture-tests (306 lines), runtime cast failures.
- `tests/run-make/` — 11 rmake tests: `trait-cast-condense-*`
  (baseline, param aliasing, static-in-impl,
  same-class-different-impls), `trait-cast-table-layout`,
  `cross-global-crate-casts`, `print-trait-cast-stats`,
  `dump-trait-*`.

## Known caveats for review

- The `call_id` chain is threaded through every `TerminatorKind::Call`
  construction site in the compiler and in test mocks (which use
  `ty::List::empty()`). If there's a cleaner place to stash this —
  e.g. a side table keyed on basic-block / statement index — I'd take
  that feedback.
- `OutlivesArg` lands as a first-class `GenericArgKind` variant with
  pack/unpack. Whether this belongs in `GenericArg` or should live as
  a separate field on `Instance` is a legitimate design question; it's
  in `GenericArgKind` today so mangling/encoding come along for free.
- `library/alloc/*` and a few other paths carry pre-existing churn
  from earlier iterations; I'll rebase/squash those out before this
  is reviewable outside of a draft.
- Perf was evaluated with rustc-perf and the impact on crates that do
  not use trait casting was found to be minimal. The SCC +
  Floyd-Warshall pass only runs over directly- and
  transitively-sensitive call graphs and stops at the ground-level
  caller, so crates with no cast graph pay effectively nothing. Heavy
  trait-casting usage has not yet been benched; guidance on a
  representative workload would be welcome.

## Not in this PR

- Stabilization / `rustc_deny_explicit_impl` on `TraitMetadataTable`
  (the RFC discussion around a `pub root trait` keyword is unresolved).
- `cast!` on `Pin<P>` or user smart pointers.
- `rustdoc` surfacing of cast graphs.
DiamondLovesYou added a commit to DiamondLovesYou/rust that referenced this pull request Apr 21, 2026
## Summary

This PR implements the compiler- and library-side plumbing for the **bounded intertrait casting** proposal in rust-lang/rfcs#3952. It adds a mechanism for casting between `dyn Trait` objects that share an explicitly-declared common root supertrait, resolved at runtime in `O(1)` via a per-root metadata table — no `'static` bound, no `TypeId`, and no global registry.

## Surface

```rust
#![feature(trait_cast)]
use core::marker::TraitMetadataTable;

trait Animal: TraitMetadataTable<dyn Animal> {}  // declares `Animal` as a cast root
trait Dog: Animal { fn bark(&self); }

fn maybe_bark(a: &dyn Animal) {
    if let Ok(d) = core::cast!(in dyn Animal, a => dyn Dog) {
        d.bark();
    }
}
```

A trait becomes a **cast root** by naming `TraitMetadataTable<dyn Self>` as a supertrait. Every subtrait of a root inherits the `TraitMetadataTable<dyn Root>` bound and is eligible as a cast target within that root's graph. `core::cast!`, `core::try_cast!`, and `core::unchecked_cast!` macros (in a new `core::trait_cast` module) dispatch through the `TraitCast<I, U>` trait implemented for `&T`, `&mut T`, `Box<T>`, `Rc<T>`, and `Arc<T>`.

Runtime cost per cast: two loads and a branch against the table for the root's graph.

## Library additions (`core`/`alloc`)

- `core::marker::TraitMetadataTable<SuperTrait>` — the marker/lang-item that declares a cast root; blanket impl for all `Sized` types (the actual root-supertrait obligation is enforced by the supertrait relationship itself, not the where-clauses, to break a cycle through `Unsize`).
- `core::trait_cast` — `TraitCast`/`TraitCastError` and the `cast!` / `try_cast!` / `unchecked_cast!` macros.
- `alloc::{boxed, rc, sync}` — owned-cast impls.
- New intrinsics in `core::intrinsics`:
  - `trait_metadata_index<SuperTrait, Trait>() -> (&'static u8, usize)`
  - `trait_metadata_table<SuperTrait, ConcreteType>() -> (&'static u8, NonNull<Option<NonNull<()>>>)`
  - `trait_metadata_table_len<SuperTrait>() -> usize`
  - `trait_cast_is_lifetime_erasure_safe<SuperTrait, TargetTrait>() -> bool`

The `&'static u8` returned alongside each index/table pointer is a per-global-crate sentinel used to detect the `ForeignTraitGraph` case when two independently-built artifacts are linked into one binary.

## Compiler additions

**New passes / modules** (all under `rustc_monomorphize` unless noted):

- `trait_graph.rs` — per-root `TraitGraph` built from gathered `trait_metadata_index` / `trait_metadata_table` requests.
- `table_layout.rs` — assigns slots for `(sub_trait, outlives_class)` pairs with condensation (`BitMatrix` row-grouping) to collapse classes admitting identical impl sets.
- `erasure_safe.rs` — resolves `trait_cast_is_lifetime_erasure_safe` by DFS-walking binder vars of the target dyn type and checking each is expressible through the root's binder.
- `cast_sensitivity.rs` — SCC-based batch computation of per-`Instance` `CastRelevantLifetimes` (direct + transitive via call-graph).
- `resolved_bodies.rs`, `trait_cast_requests.rs` — request gathering and delayed-codegen queue.
- `partitioning.rs` — cascade-canonicalization of augmented callees so sensitive subgraphs are emitted once per signature group.

**MIR**: `TerminatorKind::{Call, TailCall}` grows a `call_id: &'tcx List<(DefId, u32, GenericArgsRef<'tcx>)>` recording the full inlining chain. `TerminatorKind` size assertion goes from 80 → 88. Before inlining each list has length 1; the inliner prepends the caller's chain to each inlined callee's.

**Borrowck**: new `region_summary.rs` publishes a `BorrowckRegionSummary` per fn (walk-position → `RegionVid`, call-site region mappings keyed on the `u32` counter) consumed by the sensitivity pass after typeck but before mono.

**Generic args**: new `GenericArgKind::Outlives(OutlivesArg)` variant (tag `0b11`) carrying `(longer, shorter)` region-index pairs. Appended to an `Instance`'s args when a sensitive callee must be specialized for a given caller's outlives environment. Wired through interning, encode/decode, folding/visiting, symbol mangling, and all the usual suspects.

**New lang item**: `TraitMetadataTable` (`sym::trait_metadata_table`).

**HIR analysis** (`wfcheck.rs`, `dyn_trait.rs`): eagerly diagnoses at trait-definition time when a root-connected trait introduces a lifetime not expressible through the root (would be manufactured at downcast time — unsound).

## Diagnostics

- `UNUSED_CAST_TARGET` lint — cast to a target no concrete type in the final binary implements (always `Err` at runtime).
- `trait graph rooted at {root} is not downcast-safe` — erased-lifetime manufacturability check.
- `TraitMetadataTable type argument must be a trait object` — non-`dyn T` arg.
- `TraitMetadataTable type argument does not match a cast root` — `dyn X` where `X` isn't `Self` or a transitive cast-root supertrait.
- `cast target not reachable in graph` / `non-dyn-compat target` / `tmt-arg-*` — various ill-formed roots and targets.

## Debugging / inspection flags

All `-Z`, all dump to stderr:

- `-Z dump-trait-graph[=FILTER]`, `-Z dump-trait-cast-sensitivity[=FILTER]`, `-Z dump-trait-cast-augmentation[=FILTER]`, `-Z dump-trait-cast-canonicalization`, `-Z dump-trait-cast-chain-composition[=FILTER]`, `-Z dump-trait-cast-erasure-safety[=FILTER]`
- `-Z print-trait-cast-stats`

Each has a matching `tests/run-make/dump-*` test.

## Known caveats for review

- Perf was evaluated with rustc-perf and the impact on crates that do not use trait casting was found to be minimal. The SCC + Floyd-Warshall pass only runs over directly- and transitively-sensitive call graphs and stops at the ground-level caller, so crates with no cast graph pay effectively nothing. Heavy trait-casting usage has not yet been benched as no suitable public crates exist yet.

## Not in this PR

- Stabilization / `rustc_deny_explicit_impl` on `TraitMetadataTable`.
- `cast!` on `Pin<P>` or user smart pointers.
- `rustdoc` surfacing of cast graphs.

Meanwhile, the compiler already possesses the global knowledge required to solve this problem correctly. After monomorphization, the compiler effectively knows:

* every type implementing a particular root trait
Member

@bjorn3 bjorn3 Apr 22, 2026

We don't know that after monomorphization. Monomorphization happens during codegen, which is done one crate at a time, but the set of types implementing a trait is only known when all crates are known. In the presence of dlopen, the set of crates and by extension the set of types implementing a trait cannot be known at compile time.

I seem to recall another discussion which also incorrectly assumed that the set of crates can be known at compile time... Found it. That was a discussion you started too: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Inter-trait.20casting.3F/with/565020521

Author

Sorry I just glossed over the rest of your comment.

> We don't know that after monomorphization. Monomorphization happens during codegen,

Mono does happen during codegen, yes, but that is neither the only time the mono process runs nor the first.

> ... which is done one crate at a time,

Where are cross-crate generic instantiations placed? The source-code crate does not determine code placement if the items are not mono-roots. Where are `#[inline]` functions placed? LTO/LTCG change this model even more.

> ... but the set of types implementing a trait is only known when all crates are known. In the presence of dlopen, the set of crates and by extension the set of types implementing a trait cannot be known at compile time.

To be clear: no trait downcasting is possible under lazy generic instantiation without some notion of a global crate, dlopen or not. I think you know this too, but to explicitly call out the dlopen-specific nonsense, allow me to quote the RFC text:

> The deeper reason a shared schema cannot be precomputed in C is that the trait graph is lazily monomorphized: `dyn Trait2` does not exist from C's point of view until a downstream crate instantiates it. No precomputation in C can fix a canonical layout that covers all future instantiations downstream crates might invent. A dynamic registry would have to codegen new vtables at runtime (effectively shipping a subset of the compiler), so this RFC rejects that path.

See https://github.com/DiamondLovesYou/rfcs-bounded-intertrait-casting/blob/master/text/0000-bounded-intertrait-casting.md#cross-crate-boundaries-and-cdylibs

Maybe this can be addressed in the future (this RFC explicitly carves out Err->Ok behavioral changes for this reason), but certainly not today.

In short: casts never cross global-crate boundaries, even when the trait
and struct definitions are literally identical on both sides. A cast
whose source object and call site carry different identities returns
`Err(TraitCastError::ForeignTraitGraph)`.
Member

@bjorn3 bjorn3 Apr 22, 2026

How is that implemented? When compiling an rlib it isn't known yet in which dylib or executable it will end up.

Author

@DiamondLovesYou DiamondLovesYou Apr 22, 2026

Beyond what you ask, non-global crates also lack a "maximal" type system. I've introduced a "delayed codegen" mechanism for functions that depend on such global info. In the trait-casting case, an additional global-crate-identity token is embedded into the artifact, and its address is used to verify that the table and the index come from the same global crate, no matter the final linkage structure.

In short: rlibs are not usually global crates, so trait casting codegen doesn't happen in them. In case they are, the above applies and casts across the global crate boundary will categorically fail at runtime.
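
To illustrate the identity check described above, here is a rough model in ordinary Rust, under the assumption that both the metadata table and the cast site carry a pointer to their global crate's token and that a mismatch short-circuits to an error. Every name below (`GlobalCrateId`, `MetadataTable`, `CastSite`, `CastError`, `lookup`) is hypothetical, as is the `NotImplemented` variant; the real mechanism lives in compiler-generated data, not in user code.

```rust
use std::ptr;

/// Stand-in for the identity token embedded in each global crate's artifact.
/// Only its address matters; the distinct payloads keep the two statics distinct.
#[allow(dead_code)]
struct GlobalCrateId(u8);

static GLOBAL_CRATE_A: GlobalCrateId = GlobalCrateId(1);
static GLOBAL_CRATE_B: GlobalCrateId = GlobalCrateId(2);

/// Hypothetical error type modelled on the RFC's `ForeignTraitGraph` case.
#[derive(Debug, PartialEq, Eq)]
enum CastError {
    ForeignTraitGraph,
    NotImplemented,
}

/// Per-type table: slot `i` holds a vtable stand-in if the type implements target `i`.
struct MetadataTable {
    owner: &'static GlobalCrateId,
    slots: &'static [Option<usize>],
}

/// A cast site: the index it was assigned, plus the global crate that assigned it.
struct CastSite {
    owner: &'static GlobalCrateId,
    index: usize,
}

/// Resolves a cast: first compare token addresses, then index the table.
fn lookup(table: &MetadataTable, site: &CastSite) -> Result<usize, CastError> {
    // Tables and indices from different global crates are not comparable.
    if !ptr::eq(table.owner, site.owner) {
        return Err(CastError::ForeignTraitGraph);
    }
    table
        .slots
        .get(site.index)
        .copied()
        .flatten()
        .ok_or(CastError::NotImplemented)
}
```

Because the comparison is by address rather than by content, two artifacts built from identical sources still count as different global crates in this model.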

Member

So if I understand correctly, it will not be possible to ever bypass rustc as linker when any crate uses bounded trait casting as otherwise the delayed codegen never happens? Also delayed codegen seems bad for compile time performance.

Author

@DiamondLovesYou DiamondLovesYou Apr 23, 2026

In the global crate, delayed-codegen requests are processed (and patched, actually) before codegen happens. This feature does not require any linker cooperation beyond what we already have. Codegen requires only a minimal change (`optimized_mir` -> `codegen_mir`) to pick up the patched MIR bodies.

On perf, let me quote myself from the PR message:

> Perf was evaluated with rustc-perf and the impact on crates that do not use trait casting (as in, all of them) was found to be minimal. The mono-level SCC + Floyd-Warshall pass only runs over directly- and transitively-sensitive call graphs and stops at the ground-level caller, so crates with no cast graph pay effectively nothing. Perf is from slightly better to neutral on average across rustc-perf, with modest rmeta bloat at ~2.5% for typical crates.

It was actually engineered to be zero-impact where no trait-casting/lifetime-sensitivity is present (this feature changes mono to be able to be lifetime sensitive - a soundness requirement for the index intrinsic), which is all crates today. No cost is paid if this feature is not utilized, even for incr comp cases.

Member

When rustc does not do the linking, no crate can take the role of what you call the "global crate". Build systems such as Bazel and the Chromium build system bypass rustc as the linker because they have no way of knowing, at any point where rustc is invoked, the full set of Rust crates that ends up in an executable. For Rust for Linux, it would also require build-system changes that are not localized to the parts responsible for building Rust code.

Author

How do these build systems codegen without upstream crate metadata?

Member

I'd assume:

  • rustc does the codegen
  • every time rustc is called it's passed all the dependencies of the current crate so it does get the metadata files
  • rustc is never told to do linking or that it's the root crate in the crate graph, it's always just told it's compiling a library crate

so there is no global root rustc invocation, but it does still work

Author

> every time rustc is called it's passed all the dependencies of the current crate so it does get the metadata files

That implies the build system knows the full crate graph, and thus can label a suitable root/global crate.

Member

No, it's the dependencies of the current crate, so the build system only has to know the subgraph that starts at the current crate (which may not be the root crate).

Labels

T-lang Relevant to the language team, which will review and decide on the RFC. T-types Relevant to the types team, which will review and decide on the RFC.


5 participants