> [guide-level-explanation]: #guide-level-explanation
>
> Rust lets you declare a trait as the *root* of a bounded trait hierarchy.
> Every trait that transitively inherits from that root forms a *trait graph*,

Honestly, the term *graph* feels like a poor fit here because it opens the door to cycles and other complex behaviour. Personally, I think the term *bounded trait hierarchy* is cleaner, even though it's more verbose, since it explicitly rules out all of this complex behaviour. If you want to simplify it, maybe just *trait hierarchy* would be better, without the *bounded* part.

> supertrait bound:
>
> ```rust
> pub trait SuperTrait: TraitMetadataTable<dyn SuperTrait> { }
> ```

So, reading the full RFC, I understand the motivation behind this trait specifically, and you've done a great job designing this to work within the bounds of the existing trait system.
That said, I don't really like this syntax at all. There are a few obvious questions:
- What does it mean to have a `TraitMetadataTable<dyn SuperTrait>` bound but not a `SuperTrait` bound?
- What happens when you define a trait with this bound where the trait object type is different, e.g. `trait ChildTrait: TraitMetadataTable<UnrelatedType>`? Later, you mention `TraitMetadataTable<u8>` and, without knowing a lot of compiler specifics, I genuinely don't know what that means.

Personally, I don't see a lot of value in stabilising this trait instead of keeping it as an implementation detail, since stabilising it necessarily requires either allowing these weird exceptions, or explicitly forbidding them with dedicated machinery.
While I get the apprehension about adding additional context-sensitive keywords, e.g. `pub root trait SuperTrait {}` or `pub trait root SuperTrait {}`, this would be substantially more understandable imho and avoid some of these pitfalls.
(Note that I say context-sensitive keywords because, like `union`, the term `root` is widely used across the ecosystem and making it a forbidden term would likely break lots of things, even if done across an edition boundary.)

> (`Trait2<T>: SuperTrait<T>`) joins whichever root shares its instantiation.
> See *Appendix A: Generic roots*.
>
> ## Lifetimes

I'll be honest, this extends far beyond a reasonable guide-level explanation. I feel like a single example of lifetime rules being broken is probably the extent of what would be reasonable.
Even though lifetime erasure isn't a particularly complicated subject to understand, I think the wording here could be simplified to state that:
- The runtime has no notion of lifetimes; lifetimes are compile-time only (this is lifetime erasure, but in more beginner-friendly terminology)
- Thus, runtime casting cannot extend lifetimes; there would be no way of knowing whether the extension is valid
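
As a rough illustration of the first bullet (a sketch in stable Rust, not taken from the RFC; the helper names `layout_of_ref` and `layout_of_static_ref` are made up for this example): two reference types that differ only in their lifetime have the same size, alignment, and type name, so nothing is left at runtime for a cast to inspect.

```rust
use std::any::type_name;
use std::mem::{align_of, size_of};

/// Size, alignment, and type name of `&'a str` for whatever lifetime the
/// caller's borrow happens to have.
fn layout_of_ref<'a>(_witness: &'a str) -> (usize, usize, &'static str) {
    (
        size_of::<&'a str>(),
        align_of::<&'a str>(),
        type_name::<&'a str>(),
    )
}

/// The same three facts for the `'static` version of the type.
fn layout_of_static_ref() -> (usize, usize, &'static str) {
    (
        size_of::<&'static str>(),
        align_of::<&'static str>(),
        type_name::<&'static str>(),
    )
}
```

Calling `layout_of_ref` with a borrow of a local `String` and comparing the result with `layout_of_static_ref()` yields equal tuples: by the time code runs, the short-lived borrow and the `'static` one are the same type as far as the program can tell.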

> Erasure*. All bound lifetimes participate, including lifetimes that only
> appear through associated-type bindings such as `dyn Sub<Assoc = &'a T>`.
>
> ### `'static` is special in trait selection

Similarly, this could be folded into the section above by simply stating that `'static` is an example of "extending lifetimes." Unless you know that a lifetime is explicitly `'static`, you cannot convert it to `'static`, since that would be extending the lifetime. Whether it's a special case in the compiler doesn't matter for its usage.
Like, this is relevant in the reference-level explanation, but talking about the invariance of lifetimes and how this relates to `'static` is likely irrelevant at best and confusing at worst for newcomers.

> The full matrix of these cases is worked out in *Appendix A: Lifetime
> selection*.
>
> ### Relationships between lifetimes

Also mostly just a reiteration of the above rules, from the perspective of a guide-level explanation.

> ## Cross-crate boundaries and cdylibs
>
> The *global crate* is the artifact where trait-graph layout is finalized

I think it's helpful to clarify here that "global crate" is not a term someone should know from elsewhere; it's specifically used to facilitate trait casting.

> Why this restriction is load-bearing: two independently built cdylibs `A`
> and `B` that depend on a shared library `C` each compute their own layouts
> in isolation. The index `A` assigns to `ATrait` may collide with the
> index `B` assigns to `BTrait`. A loader that passed a `B`-built object
> into an `A`-built cast would, absent the identity check, silently read off
> the wrong slot. The identity comparison rejects such casts regardless of
> any index coincidence.

This sentence should honestly be at the top of this section, although I would reword it a bit.
To simplify a bit, the crux of the problem is that the information required to facilitate trait casting is only kept during compilation, and thus lost at runtime. In fact, this is the reason why trait casting doesn't work in the current version of the compiler.
When compiling, a crate has to choose whether to keep this information or finalize it, and the result cannot simply be loaded at runtime. So, if you want to make a self-contained crate artifact like a `cdylib` or even a self-contained `staticlib`, you have to finalize this information.
So, specifically when there are multiple compiled artifacts which have each finalized this information, you run the risk of a trait object created using one version of that information being cast to a trait using a different version of that information, and this is an error.
This is particularly difficult to explain because it's tempting to describe it as there being two lookup tables but, as mentioned, lookup tables are not necessarily involved at all. It's just two different sets of layouts decided for trait objects.
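
To make the collision concrete, here is a toy model in stable Rust. It is an assumption-laden sketch, not how the compiler assigns layouts: `finalize_layout` is a hypothetical stand-in for "this artifact finalizes its layout", and slot numbers are simply handed out in the order the artifact sees the traits.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for finalizing trait-object layout inside one
/// artifact: every trait the artifact knows about gets the next free slot.
fn finalize_layout(traits: &[&'static str]) -> BTreeMap<&'static str, usize> {
    traits
        .iter()
        .copied()
        .enumerate()
        .map(|(slot, name)| (name, slot))
        .collect()
}

/// Artifact `A` sees the shared root from `C` plus its own `ATrait`.
fn layout_of_a() -> BTreeMap<&'static str, usize> {
    finalize_layout(&["CRoot", "ATrait"])
}

/// Artifact `B` sees the shared root from `C` plus its own `BTrait`.
fn layout_of_b() -> BTreeMap<&'static str, usize> {
    finalize_layout(&["CRoot", "BTrait"])
}
```

In this model `ATrait` in `A` and `BTrait` in `B` both end up in slot 1, and neither artifact knows the other's trait exists. An index computed in `A` therefore says nothing about an object laid out by `B`, which is the situation the identity check is there to reject.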

> The deeper reason a shared schema cannot be precomputed in `C` is that the
> trait graph is *lazily monomorphized*: `dyn Trait2<DownstreamType>` does
> not exist from `C`'s point of view until a downstream crate instantiates
> it. No precomputation in `C` can fix a canonical layout that covers all
> future instantiations downstream crates might invent. A dynamic registry
> would have to codegen new vtables at runtime, effectively shipping a
> subset of the compiler, so this RFC rejects that path.

This bit might be reasonable to fit in the guide-level explanation, but it still feels too technical imho. The main point that I feel this should be conveying is that you explicitly want to avoid the case where everything is done via a lookup table. Since the layouts of trait objects aren't explicitly recorded anywhere, you just have the idea of these layouts that can potentially conflict between artifacts, and these ideas go away when you put them in a "global crate."

…fcs#3952)

Tracking issue: TBD

r? @ghost (draft)

## Summary

This PR implements the compiler- and library-side plumbing for the **bounded intertrait casting** proposal in rust-lang/rfcs#3952. It adds a mechanism for casting between `dyn Trait` objects that share an explicitly-declared common root supertrait, resolved at runtime in `O(1)` via a per-root metadata table: no `'static` bound, no `TypeId`, and no global registry.

Stabilization is not proposed here; everything is gated behind `#![feature(trait_cast)]` and the new items are `#[unstable]`.

The feature is large (~16k LoC across ~200 files) and intentionally landed as one commit so the graph/layout/augmentation passes stay coherent; I'd like reviewer guidance on whether to split before further review, and where the natural seams are.

## Surface

```rust
#![feature(trait_cast)]
use core::marker::TraitMetadataTable;

trait Animal: TraitMetadataTable<dyn Animal> {} // declares `Animal` as a cast root
trait Dog: Animal {
    fn bark(&self);
}

fn maybe_bark(a: &dyn Animal) {
    if let Ok(d) = core::cast!(in dyn Animal, a => dyn Dog) {
        d.bark();
    }
}
```

A trait becomes a **cast root** by naming `TraitMetadataTable<dyn Self>` as a supertrait. Every subtrait of a root inherits the `TraitMetadataTable<dyn Root>` bound and is eligible as a cast target within that root's graph.

`core::cast!`, `core::try_cast!`, and `core::unchecked_cast!` macros (in a new `core::trait_cast` module) dispatch through the `TraitCast<I, U>` trait implemented for `&T`, `&mut T`, `Box<T>`, `Rc<T>`, and `Arc<T>`. Runtime cost per cast: two loads and a branch against the table for the root's graph.

## Library additions (`core`/`alloc`)

- `core::marker::TraitMetadataTable<SuperTrait>`: the marker/lang-item that declares a cast root; blanket impl for all `Sized` types (the actual root-supertrait obligation is enforced by the supertrait relationship itself, not the where-clauses, to break a cycle through `Unsize`).
- `core::trait_cast`: `TraitCast`/`TraitCastError` and the `cast!` / `try_cast!` / `unchecked_cast!` macros.
- `alloc::{boxed, rc, sync}`: owned-cast impls.
- New intrinsics in `core::intrinsics`:
  - `trait_metadata_index<SuperTrait, Trait>() -> (&'static u8, usize)`
  - `trait_metadata_table<SuperTrait, ConcreteType>() -> (&'static u8, NonNull<Option<NonNull<()>>>)`
  - `trait_metadata_table_len<SuperTrait>() -> usize`
  - `trait_cast_is_lifetime_erasure_safe<SuperTrait, TargetTrait>() -> bool`

The `&'static u8` returned alongside each index/table pointer is a per-global-crate sentinel used to detect the `ForeignTraitGraph` case when two independently-built artifacts are linked into one binary.

## Compiler additions

**New passes / modules** (all under `rustc_monomorphize` unless noted):

- `trait_graph.rs`: per-root `TraitGraph` built from gathered `trait_metadata_index` / `trait_metadata_table` requests.
- `table_layout.rs`: assigns slots for `(sub_trait, outlives_class)` pairs with condensation (`BitMatrix` row-grouping) to collapse classes admitting identical impl sets.
- `erasure_safe.rs`: resolves `trait_cast_is_lifetime_erasure_safe` by DFS-walking binder vars of the target dyn type and checking each is expressible through the root's binder.
- `cast_sensitivity.rs`: SCC-based batch computation of per-`Instance` `CastRelevantLifetimes` (direct + transitive via call-graph).
- `resolved_bodies.rs`, `trait_cast_requests.rs`: request gathering and delayed-codegen queue.
- `partitioning.rs`: cascade-canonicalization of augmented callees so sensitive subgraphs are emitted once per signature group.

**MIR**: `TerminatorKind::{Call, TailCall}` grows a `call_id: &'tcx List<(DefId, u32, GenericArgsRef<'tcx>)>` recording the full inlining chain. `TerminatorKind` size assertion goes from 80 → 88. Before inlining each list has length 1; the inliner prepends the caller's chain to each inlined callee's.

**Borrowck**: new `region_summary.rs` publishes a `BorrowckRegionSummary` per fn (walk-position → `RegionVid`, call-site region mappings keyed on the `u32` counter) consumed by the sensitivity pass after typeck but before mono.

**Generic args**: new `GenericArgKind::Outlives(OutlivesArg)` variant (tag `0b11`) carrying `(longer, shorter)` region-index pairs. Appended to an `Instance`'s args when a sensitive callee must be specialized for a given caller's outlives environment. Wired through interning, encode/decode, folding/visiting, symbol mangling, and all the usual suspects.

**New lang item**: `TraitMetadataTable` (`sym::trait_metadata_table`).

**HIR analysis** (`wfcheck.rs`, `dyn_trait.rs`): eagerly diagnoses at trait-definition time when a root-connected trait introduces a lifetime not expressible through the root (it would be manufactured at downcast time, which is unsound).

## Diagnostics

- `UNUSED_CAST_TARGET` lint: cast to a target no concrete type in the final binary implements (always `Err` at runtime).
- `trait graph rooted at {root} is not downcast-safe`: erased-lifetime manufacturability check.
- `TraitMetadataTable type argument must be a trait object`: non-`dyn T` arg.
- `TraitMetadataTable type argument does not match a cast root`: `dyn X` where `X` isn't `Self` or a transitive cast-root supertrait.
- `cast target not reachable in graph` / `non-dyn-compat target` / `tmt-arg-*`: various ill-formed roots and targets.

A "not part of any global crate" diagnostic was considered but is not feasible; the detection info is categorically unavailable at compile time.

## Debugging / inspection flags

All `-Z`, all dump to stderr:

- `-Z dump-trait-graph[=FILTER]`, `-Z dump-trait-cast-sensitivity[=FILTER]`, `-Z dump-trait-cast-augmentation[=FILTER]`, `-Z dump-trait-cast-canonicalization`, `-Z dump-trait-cast-chain-composition[=FILTER]`, `-Z dump-trait-cast-erasure-safety[=FILTER]`
- `-Z print-trait-cast-stats`

Each has a matching `tests/run-make/dump-*` test.

## Tests

- `tests/ui/trait-cast/`: 23 files covering basic/lifetime-bounded downcasts, erasure-safety (chain-walk, projections, structural, outlives), cross-crate casts, invalid targets, non-dyn-compat targets, missing root bound, TMT arg mismatch, lifetime-in-generics (565 lines), torture-tests (306 lines), runtime cast failures.
- `tests/run-make/`: 11 rmake tests: `trait-cast-condense-*` (baseline, param aliasing, static-in-impl, same-class-different-impls), `trait-cast-table-layout`, `cross-global-crate-casts`, `print-trait-cast-stats`, `dump-trait-*`.

## Known caveats for review

- The `call_id` chain is threaded through every `TerminatorKind::Call` construction site in the compiler and in test mocks (which use `ty::List::empty()`). If there's a cleaner place to stash this (e.g. a side table keyed on basic-block / statement index), I'd take that feedback.
- `OutlivesArg` lands as a first-class `GenericArgKind` variant with pack/unpack. Whether this belongs in `GenericArg` or should live as a separate field on `Instance` is a legitimate design question; it's in `GenericArgKind` today so mangling/encoding come along for free.
- `library/alloc/*` and a few other paths carry pre-existing churn from earlier iterations; I'll rebase/squash those out before this is reviewable outside of a draft.
- Perf was evaluated with rustc-perf and the impact on crates that do not use trait casting was found to be minimal. The SCC + Floyd-Warshall pass only runs over directly- and transitively-sensitive call graphs and stops at the ground-level caller, so crates with no cast graph pay effectively nothing. Heavy trait-casting usage has not yet been benched; guidance on a representative workload would be welcome.

## Not in this PR

- Stabilization / `rustc_deny_explicit_impl` on `TraitMetadataTable` (the RFC discussion around a `pub root trait` keyword is unresolved).
- `cast!` on `Pin<P>` or user smart pointers.
- `rustdoc` surfacing of cast graphs.


## Summary

This PR implements the compiler- and library-side plumbing for the **bounded intertrait casting** proposal in rust-lang/rfcs#3952. It adds a mechanism for casting between `dyn Trait` objects that share an explicitly-declared common root supertrait, resolved at runtime in `O(1)` via a per-root metadata table: no `'static` bound, no `TypeId`, and no global registry.

## Surface

```rust
#![feature(trait_cast)]
use core::marker::TraitMetadataTable;

trait Animal: TraitMetadataTable<dyn Animal> {} // declares `Animal` as a cast root
trait Dog: Animal {
    fn bark(&self);
}

fn maybe_bark(a: &dyn Animal) {
    if let Ok(d) = core::cast!(in dyn Animal, a => dyn Dog) {
        d.bark();
    }
}
```

A trait becomes a **cast root** by naming `TraitMetadataTable<dyn Self>` as a supertrait. Every subtrait of a root inherits the `TraitMetadataTable<dyn Root>` bound and is eligible as a cast target within that root's graph.

`core::cast!`, `core::try_cast!`, and `core::unchecked_cast!` macros (in a new `core::trait_cast` module) dispatch through the `TraitCast<I, U>` trait implemented for `&T`, `&mut T`, `Box<T>`, `Rc<T>`, and `Arc<T>`. Runtime cost per cast: two loads and a branch against the table for the root's graph.

## Library additions (`core`/`alloc`)

- `core::marker::TraitMetadataTable<SuperTrait>`: the marker/lang-item that declares a cast root; blanket impl for all `Sized` types (the actual root-supertrait obligation is enforced by the supertrait relationship itself, not the where-clauses, to break a cycle through `Unsize`).
- `core::trait_cast`: `TraitCast`/`TraitCastError` and the `cast!` / `try_cast!` / `unchecked_cast!` macros.
- `alloc::{boxed, rc, sync}`: owned-cast impls.
- New intrinsics in `core::intrinsics`:
  - `trait_metadata_index<SuperTrait, Trait>() -> (&'static u8, usize)`
  - `trait_metadata_table<SuperTrait, ConcreteType>() -> (&'static u8, NonNull<Option<NonNull<()>>>)`
  - `trait_metadata_table_len<SuperTrait>() -> usize`
  - `trait_cast_is_lifetime_erasure_safe<SuperTrait, TargetTrait>() -> bool`

The `&'static u8` returned alongside each index/table pointer is a per-global-crate sentinel used to detect the `ForeignTraitGraph` case when two independently-built artifacts are linked into one binary.

## Compiler additions

**New passes / modules** (all under `rustc_monomorphize` unless noted):

- `trait_graph.rs`: per-root `TraitGraph` built from gathered `trait_metadata_index` / `trait_metadata_table` requests.
- `table_layout.rs`: assigns slots for `(sub_trait, outlives_class)` pairs with condensation (`BitMatrix` row-grouping) to collapse classes admitting identical impl sets.
- `erasure_safe.rs`: resolves `trait_cast_is_lifetime_erasure_safe` by DFS-walking binder vars of the target dyn type and checking each is expressible through the root's binder.
- `cast_sensitivity.rs`: SCC-based batch computation of per-`Instance` `CastRelevantLifetimes` (direct + transitive via call-graph).
- `resolved_bodies.rs`, `trait_cast_requests.rs`: request gathering and delayed-codegen queue.
- `partitioning.rs`: cascade-canonicalization of augmented callees so sensitive subgraphs are emitted once per signature group.

**MIR**: `TerminatorKind::{Call, TailCall}` grows a `call_id: &'tcx List<(DefId, u32, GenericArgsRef<'tcx>)>` recording the full inlining chain. `TerminatorKind` size assertion goes from 80 → 88. Before inlining each list has length 1; the inliner prepends the caller's chain to each inlined callee's.

**Borrowck**: new `region_summary.rs` publishes a `BorrowckRegionSummary` per fn (walk-position → `RegionVid`, call-site region mappings keyed on the `u32` counter) consumed by the sensitivity pass after typeck but before mono.

**Generic args**: new `GenericArgKind::Outlives(OutlivesArg)` variant (tag `0b11`) carrying `(longer, shorter)` region-index pairs. Appended to an `Instance`'s args when a sensitive callee must be specialized for a given caller's outlives environment. Wired through interning, encode/decode, folding/visiting, symbol mangling, and all the usual suspects.

**New lang item**: `TraitMetadataTable` (`sym::trait_metadata_table`).

**HIR analysis** (`wfcheck.rs`, `dyn_trait.rs`): eagerly diagnoses at trait-definition time when a root-connected trait introduces a lifetime not expressible through the root (it would be manufactured at downcast time, which is unsound).

## Diagnostics

- `UNUSED_CAST_TARGET` lint: cast to a target no concrete type in the final binary implements (always `Err` at runtime).
- `trait graph rooted at {root} is not downcast-safe`: erased-lifetime manufacturability check.
- `TraitMetadataTable type argument must be a trait object`: non-`dyn T` arg.
- `TraitMetadataTable type argument does not match a cast root`: `dyn X` where `X` isn't `Self` or a transitive cast-root supertrait.
- `cast target not reachable in graph` / `non-dyn-compat target` / `tmt-arg-*`: various ill-formed roots and targets.

## Debugging / inspection flags

All `-Z`, all dump to stderr:

- `-Z dump-trait-graph[=FILTER]`, `-Z dump-trait-cast-sensitivity[=FILTER]`, `-Z dump-trait-cast-augmentation[=FILTER]`, `-Z dump-trait-cast-canonicalization`, `-Z dump-trait-cast-chain-composition[=FILTER]`, `-Z dump-trait-cast-erasure-safety[=FILTER]`
- `-Z print-trait-cast-stats`

Each has a matching `tests/run-make/dump-*` test.

## Known caveats for review

- Perf was evaluated with rustc-perf and the impact on crates that do not use trait casting was found to be minimal. The SCC + Floyd-Warshall pass only runs over directly- and transitively-sensitive call graphs and stops at the ground-level caller, so crates with no cast graph pay effectively nothing. Heavy trait-casting usage has not yet been benched as no suitable public crates exist yet.

## Not in this PR

- Stabilization / `rustc_deny_explicit_impl` on `TraitMetadataTable`.
- `cast!` on `Pin<P>` or user smart pointers.
- `rustdoc` surfacing of cast graphs.

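
To make the `table_layout.rs` bullet a little more concrete, here is a minimal sketch of the row-grouping idea in stable Rust. It is not the PR's code: rows are modelled as plain `u64` bitsets rather than rustc's `BitMatrix`, and the function name `condense` is made up for this illustration. Each row records which impls a `(sub_trait, outlives_class)` pair admits, and rows with identical bitsets share one slot.

```rust
use std::collections::HashMap;

/// Assigns one slot per distinct row.
///
/// `rows[i]` is a bitset in which bit `k` is set when impl `k` is admitted by
/// class `i`. Returns the slot chosen for every row plus the number of slots
/// actually used.
fn condense(rows: &[u64]) -> (Vec<usize>, usize) {
    let mut slot_of_bits: HashMap<u64, usize> = HashMap::new();
    let mut slots = Vec::with_capacity(rows.len());
    for &bits in rows {
        // A row seen for the first time takes the next free slot;
        // a repeated row reuses the slot of its first occurrence.
        let next = slot_of_bits.len();
        let slot = *slot_of_bits.entry(bits).or_insert(next);
        slots.push(slot);
    }
    (slots, slot_of_bits.len())
}
```

With five classes whose rows are `0b011, 0b101, 0b011, 0b101, 0b111`, this model needs three slots instead of five, since the first and third rows (and the second and fourth) admit exactly the same impls.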

> Meanwhile, the compiler already possesses the global knowledge required to solve this problem correctly. After monomorphization, the compiler effectively knows:
>
> * every type implementing a particular root trait

We don't know that after monomorphization. Monomorphization happens during codegen, which is done one crate at a time, but the set of types implementing a trait is only known when all crates are known. In the presence of `dlopen`, the set of crates, and by extension the set of types implementing a trait, cannot be known at compile time.
I seem to recall another discussion which also incorrectly assumed that the set of crates can be known at compile time... Found it. That was a discussion you started too: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Inter-trait.20casting.3F/with/565020521

Sorry I just glossed over the rest of your comment.

> We don't know that after monomorphization. Monomorphization happens during codegen,

Mono does happen during codegen, yes, but that's not the only time the mono process runs and it isn't the first time.

> ... which is done one crate at a time,

Where are cross-crate generic instantiations placed? The source-code crate does not determine code placement if the items are not mono-roots. Where are `#[inline]` functions placed? LTO/LTCG change this model even more.

> ... but the set of types implementing a trait is only known when all crates are known. In the presence of `dlopen`, the set of crates and by extension the set of types implementing a trait cannot be known at compile time.

To be clear: there is no trait downcasting possible, given lazy generic instantiation, without some sort of global-crate notion, `dlopen` or not. I think you know this too, but to explicitly call out the `dlopen`-specific nonsense, allow me to quote the RFC text:

> The deeper reason a shared schema cannot be precomputed in `C` is that the trait graph is *lazily monomorphized*: `dyn Trait2<DownstreamType>` does not exist from `C`'s point of view until a downstream crate instantiates it. No precomputation in `C` can fix a canonical layout that covers all future instantiations downstream crates might invent. A dynamic registry would have to codegen new vtables at runtime, effectively shipping a subset of the compiler, so this RFC rejects that path.

Maybe this can be addressed in the future (this RFC explicitly carves out `Err` -> `Ok` behavioral changes for this reason), but certainly not today.

> In short: casts never cross global-crate boundaries, even when the trait
> and struct definitions are literally identical on both sides. A cast
> whose source object and call site carry different identities returns
> `Err(TraitCastError::ForeignTraitGraph)`.

How is that implemented? When compiling an rlib it isn't known yet in which dylib or executable it will end up.
There was a problem hiding this comment.
Non-global crates also don't have a "maximal" type system, in addition to what you ask. I've introduced a "delayed codegen" mechanism for functions that depend on such global info. In the trait-casting case, an additional global-crate-identity token is embedded into the artifact, and its address is used to verify that the table and the index come from the same global crate, no matter the final linkage structure.
In short: rlibs are not usually global crates, so trait-casting codegen doesn't happen in them. If they are, the above applies and casts across the global-crate boundary will categorically fail at runtime.
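A minimal sketch of the address-based identity check described above, for readers following along. Everything except `TraitCastError::ForeignTraitGraph` is hypothetical: `GlobalCrateToken`, `ObjectMeta`, `CastSite`, and `check_identity` are illustrative names, and the two statics merely model two separately linked global crates inside one program. It is not the implementation from the PR.

```rust
use std::ptr;

/// Hypothetical stand-in for the per-global-crate identity token.
/// Only its address matters; the byte inside is never read.
#[allow(dead_code)]
pub struct GlobalCrateToken(u8);

// Two statics model two independently produced global crates.
// Non-zero-sized statics occupy distinct addresses.
static CRATE_A: GlobalCrateToken = GlobalCrateToken(0);
static CRATE_B: GlobalCrateToken = GlobalCrateToken(0);

#[derive(Debug, PartialEq)]
pub enum TraitCastError {
    ForeignTraitGraph,
}

/// Object side (assumed layout): which global crate built the metadata table.
pub struct ObjectMeta {
    origin: &'static GlobalCrateToken,
}

/// Call-site side (assumed layout): which global crate assigned the trait index.
pub struct CastSite {
    origin: &'static GlobalCrateToken,
    index: usize,
}

/// Accept the index only if table and index come from the same global crate.
pub fn check_identity(obj: &ObjectMeta, site: &CastSite) -> Result<usize, TraitCastError> {
    if ptr::eq(obj.origin, site.origin) {
        Ok(site.index)
    } else {
        Err(TraitCastError::ForeignTraitGraph)
    }
}
```

Comparing addresses rather than contents is what makes the check independent of how the artifacts are eventually linked: two global crates with byte-identical definitions still carry different tokens.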
There was a problem hiding this comment.
So if I understand correctly, it will never be possible to bypass rustc as the linker when any crate uses bounded trait casting, as otherwise the delayed codegen never happens? Also, delayed codegen seems bad for compile-time performance.
There was a problem hiding this comment.
In the global crate, delayed codegen requests are processed (and patched, actually) before codegen happens. This feature does not require any linker cooperation beyond what we already have. Codegen requires only a minimal change (optimized_mir -> codegen_mir) to pick up the patched MIR bodies.
On perf, let me quote myself from the PR message:
Perf was evaluated with rustc-perf and the impact on crates that do not use trait casting (as-in, all of them) was found to be minimal. The mono-level SCC + Floyd-Warshall pass only runs over directly- and transitively-sensitive call graphs and stops at the ground-level caller, so crates with no cast graph pay effectively nothing. Perf is from slightly better to neutral on average across rustc-perf, with modest rmeta bloat at ~2.5% for typical crates.
It was actually engineered to be zero-impact where no trait-casting/lifetime-sensitivity is present (this feature changes mono to be able to be lifetime sensitive - a soundness requirement for the index intrinsic), which is all crates today. No cost is paid if this feature is not utilized, even for incr comp cases.
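As a rough illustration of what "transitively sensitive" means in the quoted perf note, here is a sketch of Warshall's transitive closure (the boolean form of Floyd-Warshall) over a toy call graph. The function names and the adjacency-matrix representation are assumptions made for this example; the actual pass in the PR works on mono items and SCCs and is not reproduced here.

```rust
/// Warshall's transitive closure over a boolean adjacency matrix.
/// `reach[i][j] == true` means function `i` (transitively) calls function `j`.
pub fn transitive_closure(mut reach: Vec<Vec<bool>>) -> Vec<Vec<bool>> {
    let n = reach.len();
    for k in 0..n {
        for i in 0..n {
            if !reach[i][k] {
                continue; // i cannot route through k
            }
            for j in 0..n {
                if reach[k][j] {
                    reach[i][j] = true; // i -> k -> j
                }
            }
        }
    }
    reach
}

/// A function is sensitive if it is directly sensitive
/// or can reach a directly sensitive function.
pub fn sensitive(calls: Vec<Vec<bool>>, direct: &[bool]) -> Vec<bool> {
    let reach = transitive_closure(calls);
    (0..direct.len())
        .map(|i| direct[i] || (0..direct.len()).any(|j| reach[i][j] && direct[j]))
        .collect()
}
```

In a graph where nothing is directly sensitive, every entry of the result is `false`, which matches the claim that crates without a cast graph have nothing to propagate.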
There was a problem hiding this comment.
When rustc does not do the linking, there is no crate that could have the role of what you call "global crate". Bazel and the Chromium build system, for example, bypass rustc as the linker because they have no support for knowing the full set of Rust crates that ends up in an executable at any point where rustc is invoked. For Rust for Linux, it would also require build-system changes that are not localized to the parts responsible for building Rust code.
There was a problem hiding this comment.
How do these build systems codegen without upstream crate metadata?
There was a problem hiding this comment.
I'd assume:
- rustc does the codegen
- every time rustc is called it's passed all the dependencies of the current crate so it does get the metadata files
- rustc is never told to do linking or that it's the root crate in the crate graph, it's always just told it's compiling a library crate
so there is no global root rustc invocation, but it does still work
There was a problem hiding this comment.
every time rustc is called it's passed all the dependencies of the current crate so it does get the metadata files
That implies the build system knows the full crate graph, and thus can label a suitable root/global crate.
There was a problem hiding this comment.
No, it's the dependencies of the current crate, so it only has to know the subgraph that starts at the current crate (which may not be the root crate).
Safe, constant-time, minimal-space-overhead casting between trait objects that share a common root supertrait. A bounded trait graph is one rooted at a single explicitly-declared supertrait; that root names the closure of traits a cast may target, so the compiler can compute a per-type metadata table globally and resolve each cast with two loads and a branch. The user-facing surface is a `cast!(in dyn Root, expr => dyn U)` macro (plus `try_cast!` and `unchecked_cast!` variants) that works for references, `&mut`, and owned `Box`/`Rc`/`Arc`. Unlike ecosystem solutions, casting does not require `'static`, global registries, or `TypeId`, and remains correct across crate boundaries and generic instantiations.
I have a working implementation I used to validate the details of this RFC: rust-lang/rust#155624
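To make "two loads and a branch" concrete, here is a sketch in plain stable Rust that models the lookup with ordinary data. `MetadataTable`, `Object`, `try_cast`, and `CIRCLE_TABLE` are hypothetical names, string literals stand in for vtable pointers, and the real `cast!` macro and compiler-generated tables from the RFC are not used.

```rust
/// Hypothetical per-type metadata table: one slot per trait in the hierarchy.
/// `Some(..)` stands in for a vtable pointer; `None` means "not implemented".
pub struct MetadataTable {
    slots: &'static [Option<&'static str>],
}

/// Hypothetical object header that points at its type's table.
pub struct Object {
    table: &'static MetadataTable,
}

/// Model of a checked cast to the trait with the given index.
pub fn try_cast(obj: &Object, trait_index: usize) -> Option<&'static str> {
    let table = obj.table; // load 1: the table reachable from the object
    match table.slots.get(trait_index) {
        // load 2: the slot for the target trait, then branch on its contents
        Some(Some(vtable)) => Some(*vtable),
        // Empty slot, or an index outside this toy table.
        _ => None,
    }
}

// Example table for an imaginary `Circle` type: slot 0 and slot 2 are filled.
static CIRCLE_TABLE: MetadataTable = MetadataTable {
    slots: &[Some("Shape for Circle"), None, Some("Debug for Circle")],
};
```

The sketch bounds-checks the index only because it is ordinary safe code; in the design described above the index is assigned by the compiler for the whole hierarchy, so presumably only the empty-slot branch remains.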