Virtually dispatched trait methods #3440

Open
wants to merge 3 commits into
base: master
189 changes: 189 additions & 0 deletions text/3440-virt-self.md
@@ -0,0 +1,189 @@
- Feature Name: `virt_self`
- Start Date: `2023-05-30`
- RFC PR: [rust-lang/rfcs#3440](https://github.com/rust-lang/rfcs/pull/3440)
- Rust Issue: [rust-lang/rust#3440](https://github.com/rust-lang/rust/issues/3440)

# Summary
[summary]: #summary

Enable virtual dispatch of trait methods.

# Motivation
[motivation]: #motivation

Coming to Rust from an OOP language such as C++, one is told to favor composition over inheritance. In general, I think that's a great thing. However, the lack of language features such as delegation, and of what this RFC proposes, can make certain patterns more difficult to express than they should be.

Consider the following situation. Say we have a trait `System` that models a given system.

```rust
pub trait System {
    fn op1(&self) -> SystemOp1;
    fn op2(&self) -> SystemOp2;
    // etc.
}
```

Assume that we now have a particular impl, `SystemA`, which implements `op1` in terms of `op2`. For instance,

```rust
// assume the following
type SystemOp1 = i64;
type SystemOp2 = i64;

pub struct SystemA {}

impl System for SystemA {
    fn op1(&self) -> SystemOp1 {
        self.op2() * 5
    }
    fn op2(&self) -> SystemOp2 {
        3
    }
}
```

Assume we now want a general-purpose wrapper that lets us map the result of `op2`. For instance,

```rust
pub struct DoubleOp2<S: System> {
    sys: S,
}

impl<S: System> System for DoubleOp2<S> {
    fn op1(&self) -> SystemOp1 {
        self.sys.op1()
    }
    fn op2(&self) -> SystemOp2 {
        self.sys.op2() * 2
    }
}
```

Clearly, this has the intended effect of changing `op2`. However, it also has the unintended effect of leaving `DoubleOp2<SystemA>::op1()` out of sync with `DoubleOp2<SystemA>::op2()`: `op1` forwards to `SystemA::op1`, which calls `SystemA::op2` and never sees the doubled result. We got static dispatch in a context where virtual dispatch made more sense.
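A minimal sketch of the mismatch, assembled from the snippets above. The helper `observed_results` is a hypothetical name added only for illustration; everything else follows the definitions in this section.

```rust
pub type SystemOp1 = i64;
pub type SystemOp2 = i64;

pub trait System {
    fn op1(&self) -> SystemOp1;
    fn op2(&self) -> SystemOp2;
}

pub struct SystemA {}

impl System for SystemA {
    fn op1(&self) -> SystemOp1 {
        // Statically resolved: always SystemA::op2.
        self.op2() * 5
    }
    fn op2(&self) -> SystemOp2 {
        3
    }
}

pub struct DoubleOp2<S: System> {
    sys: S,
}

impl<S: System> System for DoubleOp2<S> {
    fn op1(&self) -> SystemOp1 {
        self.sys.op1()
    }
    fn op2(&self) -> SystemOp2 {
        self.sys.op2() * 2
    }
}

/// Hypothetical helper: returns (op1, op2) as seen through the wrapper.
pub fn observed_results() -> (SystemOp1, SystemOp2) {
    let wrapped = DoubleOp2 { sys: SystemA {} };
    (wrapped.op1(), wrapped.op2())
}
```

Here `op2` yields 6 through the wrapper, yet `op1` yields 15 rather than the 30 one might expect if `op1` had picked up the doubled `op2`.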

Of course, in Rust, we usually associate dynamic dispatch with the `dyn` keyword. However, `dyn System` only gives a vtable for a particular impl without virtualizing any subsequent calls. In other words, calling `op1` through `DoubleOp2<SystemA> as dyn System` will still call `SystemA::op2`.
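To make the point about `dyn` concrete, here is a small sketch (the function name `op1_through_dyn` is an assumption for illustration). Only the outermost call goes through the vtable; the calls inside the impls remain statically resolved.

```rust
pub trait System {
    fn op1(&self) -> i64;
    fn op2(&self) -> i64;
}

pub struct SystemA {}

impl System for SystemA {
    fn op1(&self) -> i64 {
        self.op2() * 5
    }
    fn op2(&self) -> i64 {
        3
    }
}

pub struct DoubleOp2<S: System> {
    sys: S,
}

impl<S: System> System for DoubleOp2<S> {
    fn op1(&self) -> i64 {
        self.sys.op1()
    }
    fn op2(&self) -> i64 {
        self.sys.op2() * 2
    }
}

/// Hypothetical helper: calls `op1` through a trait object.
pub fn op1_through_dyn() -> i64 {
    let wrapped = DoubleOp2 { sys: SystemA {} };
    let sys: &dyn System = &wrapped;
    // The vtable selects DoubleOp2::op1, but that forwards to
    // SystemA::op1, which still calls SystemA::op2 directly.
    sys.op1()
}
```

The result is still 15, the same as with the fully static call.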

Of course, this behaviour may be what is desired. But sometimes virtualizing at depth is more appropriate, particularly when adopting OOP patterns.

When I first stumbled upon this, I wondered if I could simply take `self` as `&dyn System` (or `Arc` or `Box` equivalent, etc.). Nope.

Contributor

Isn't this allowed by arbitrary self types?

Author

It would definitely help for quite a few use cases, yes!

Contributor

Hm. To what degree does #![feature(arbitrary_self_types)] obsolete or outmode this RFC? If what you want (essentially, "more control over vtables") can be built on that more ergonomically, then I think it would be most appropriate to redraft this RFC with that feature in mind.


The code I was working on required me to keep the trait `System` as one unit so I did not investigate possible ways to split it up.

What I came up with is passing a virtualized `self` as `&dyn System`, so `System` now looked like this:

```rust
pub trait System {
    fn op1(&self, vself: &dyn System) -> SystemOp1;
    fn op2(&self, vself: &dyn System) -> SystemOp2;
}
impl System for SystemA {
    fn op1(&self, vself: &dyn System) -> SystemOp1 {
        // Dispatch through the virtualized self and keep passing it along.
        vself.op2(vself) * 5
    }
    fn op2(&self, _vself: &dyn System) -> SystemOp2 {
        3
    }
}
```

And then calling `op2`, passing the same object a second time as the virtualized `self`:
```rust
let system = DoubleOp2 { sys: SystemA {} };
system.op2(&system); // works!
```

Of course this design is a bit clunky, but is very powerful. It allows complete control over what is virtualized and what isn't at any point in the call chain for every implementation.
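A self-contained sketch of the explicit-`vself` workaround, including the wrapper impl that the snippets above leave out. The body of the `DoubleOp2` impl and the helper `virtualized_results` are assumptions about how the pattern would be completed, not code taken from the original project.

```rust
pub trait System {
    fn op1(&self, vself: &dyn System) -> i64;
    fn op2(&self, vself: &dyn System) -> i64;
}

pub struct SystemA {}

impl System for SystemA {
    fn op1(&self, vself: &dyn System) -> i64 {
        // Dynamic call: whoever is the outermost system answers op2.
        vself.op2(vself) * 5
    }
    fn op2(&self, _vself: &dyn System) -> i64 {
        3
    }
}

pub struct DoubleOp2<S: System> {
    sys: S,
}

impl<S: System> System for DoubleOp2<S> {
    fn op1(&self, vself: &dyn System) -> i64 {
        // Forward the virtualized self unchanged.
        self.sys.op1(vself)
    }
    fn op2(&self, vself: &dyn System) -> i64 {
        self.sys.op2(vself) * 2
    }
}

/// Hypothetical helper: returns (op1, op2) with the wrapper as `vself`.
pub fn virtualized_results() -> (i64, i64) {
    let system = DoubleOp2 { sys: SystemA {} };
    (system.op1(&system), system.op2(&system))
}
```

With the wrapper passed as `vself`, `op1` now yields 30 and stays in sync with `op2`, which yields 6.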

As I have been thinking about this more and more, I wondered what Rust with this in the language would look like. This RFC proposes a sample syntax (bikeshedding welcome, but please focus on the concepts).

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Essentially, we now allow `&virt self` as a shorthand for the above.

```rust
pub trait System {
    fn op1(&virt self) -> SystemOp1;
    fn op2(&virt self) -> SystemOp2;
}
```

The effect is that Rust will manage two pointers: one concretely typed and one dynamic. The concretely typed one (i.e. `self`) follows the existing Rust rules. The dynamic one is either `self` as `dyn System` again or an existing dynamic `self` (this is like choosing `self` or `vself` in the previous section).

Syntax could be e.g.
```rust
impl System for SystemA {
    fn op1(&virt self) -> SystemOp1 {
        // alternative 1
        virt(self).op2() * 5
        // alternative 2, I like this one most because it reminds me of C
        self->op2() * 5
        // something else...
    }
    fn op2(&virt self) -> SystemOp2 {
        3
    }
}
```

Working with the second syntax, the difference between `self.op2()` and `self->op2()` would be static vs dynamic dispatch. In other words, it would allow us to call `DoubleOp2<SystemA>::op2()` from within `SystemA::op1()` when the call to it is made from `DoubleOp2<SystemA>::op1()`.

If the called method is also declared virt, using `self->op2()` will retain the same `vself` that was originally passed to `op1`. Otherwise, it will replace it by `self as &dyn System`.

Outside of traits, `a->b()` wouldn't compile.

Contributor

Syntax that is sensitive to the trait context but is not part of the trait's declaration syntax, but rather is part of the expression syntax, is extremely unusual. This would mean that macros that work inside a trait's functions do not work inside other functions. This would be very bad for the ecosystem.

Author

But macros that use this syntax would be meant for the trait case, otherwise normal syntax will still work as it does today. So what is really lost?

Contributor

Currently, in syntax generation through macros, an expression is an expression and you don't have to think about it beyond that. This would create a "subclass" of $trait_expr that could not be used where all $expr could be used. This is a severe problem for macros concealing their implementation details:

Now macros that pass in an expression using this syntax to the inside of a trait and use that to do some work break if they instead pass the expression through to inside free functions or closures to do that work. In order to create slightly more opacity for traits, it blows a giant hole through opacity for macros.

Contributor

( maybe that's a bit strong, but "expressions are expressions" is kiiinda important in a general sense. )


```[error-xxx] trait virtual methods can only be called virtually from within the trait```

Instead, only the traditional syntax of `a.b()` will be allowed and this would simply use a `vself` of `a as &dyn Trait`.
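One way to picture the call-site behaviour in today's Rust is the sketch below. It is only an approximation under stated assumptions: the `*_virt` method names are hypothetical stand-ins for what `&virt self` might lower to, and the provided methods play the role of the ordinary `a.b()` entry point that supplies `a as &dyn Trait` as the initial `vself`.

```rust
pub trait System {
    // Hypothetical lowered forms of `fn op1(&virt self)` and `fn op2(&virt self)`.
    fn op1_virt(&self, vself: &dyn System) -> i64;
    fn op2_virt(&self, vself: &dyn System) -> i64;

    // Ordinary entry points: the receiver itself becomes the initial vself.
    // `Self: Sized` is needed for the coercion to `&dyn System`.
    fn op1(&self) -> i64
    where
        Self: Sized,
    {
        self.op1_virt(self)
    }
    fn op2(&self) -> i64
    where
        Self: Sized,
    {
        self.op2_virt(self)
    }
}

pub struct SystemA {}

impl System for SystemA {
    fn op1_virt(&self, vself: &dyn System) -> i64 {
        // Stand-in for `self->op2() * 5`.
        vself.op2_virt(vself) * 5
    }
    fn op2_virt(&self, _vself: &dyn System) -> i64 {
        3
    }
}

pub struct DoubleOp2<S: System> {
    sys: S,
}

impl<S: System> System for DoubleOp2<S> {
    fn op1_virt(&self, vself: &dyn System) -> i64 {
        self.sys.op1_virt(vself)
    }
    fn op2_virt(&self, vself: &dyn System) -> i64 {
        self.sys.op2_virt(vself) * 2
    }
}

/// Hypothetical helper: plain method-call syntax at the call site.
pub fn call_site_result() -> i64 {
    let system = DoubleOp2 { sys: SystemA {} };
    system.op1()
}
```

From the caller's point of view nothing changes: `system.op1()` is a normal method call, and it yields 30 because the wrapper is threaded through as `vself`.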

I believe this feature will make Rust more user-friendly to people more inclined to think in OOP terms or who (like me) simply found themselves writing code in a domain that is very amenable to an OOP approach.

The syntax changes are relatively minimal and there is no extra cost for code that does not use this feature.

Contributor

It will become possible for this recursively virtualized pattern to be invoked via reference to a static, even inside a fully concrete function. My intuition suggests this could make optimization significantly harder in some cases.

Author

Care to elaborate here? Rust already knows when you're calling into a trait (forcing you to import it in scope) so any such call should simply use the ref to static as the vself? Or did I miss your point?

Contributor

@workingjubilee, Jun 1, 2023

My statement is that by just adding this feature, now functions that internally reference global state can have dependencies that are already hard to resolve at compile time become almost impossible to resolve at compile time, so they may not be statically resolved and thus devirtualized as easily as they could before. This could impact even programs that do not use this feature. It could be that this doesn't actually make anything worse for optimizers, but I suspect it does, so it's probably the case that it is not true that it imposes no costs anywhere else. Many languages, by adding some additional undecidability somewhere, defeat optimization much harder than they may have before.


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Implementing the above for immutable references is easy. However, `&mut` references are inherently problematic because `self` and the virtualized self would initially alias the same object.

The solution is either to ban the above outright for `&mut self` or to allow it only in unsafe contexts (e.g. by treating it as a raw `*mut` pointer).
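To illustrate the aliasing problem using the explicit-`vself` workaround: if the methods took `&mut self` together with `vself: &mut dyn Counter`, a call such as `c.bump(&mut c)` would need two live mutable borrows of `c` and would be rejected by the borrow checker. The sketch below shows one safe alternative that is not part of this RFC, namely keeping shared references and using interior mutability. The trait `Counter`, the types `Plain` and `DoubleStep`, and the helper `bump_twice` are all hypothetical names.

```rust
use std::cell::Cell;

pub trait Counter {
    fn bump(&self, vself: &dyn Counter);
    fn step(&self, vself: &dyn Counter) -> i64;
    fn get(&self) -> i64;
}

pub struct Plain {
    n: Cell<i64>,
}

impl Counter for Plain {
    fn bump(&self, vself: &dyn Counter) {
        // Ask the outermost counter for the step size, then mutate
        // through the Cell; no `&mut` borrow is ever taken.
        let by = vself.step(vself);
        self.n.set(self.n.get() + by);
    }
    fn step(&self, _vself: &dyn Counter) -> i64 {
        1
    }
    fn get(&self) -> i64 {
        self.n.get()
    }
}

pub struct DoubleStep<C: Counter> {
    inner: C,
}

impl<C: Counter> Counter for DoubleStep<C> {
    fn bump(&self, vself: &dyn Counter) {
        self.inner.bump(vself)
    }
    fn step(&self, vself: &dyn Counter) -> i64 {
        self.inner.step(vself) * 2
    }
    fn get(&self) -> i64 {
        self.inner.get()
    }
}

/// Hypothetical helper: bumps a wrapped counter twice and reads it back.
pub fn bump_twice() -> i64 {
    let c = DoubleStep { inner: Plain { n: Cell::new(0) } };
    c.bump(&c);
    c.bump(&c);
    c.get()
}
```

Each `bump` sees the doubled step of 2, so the counter ends at 4. This sidesteps the `&mut` question rather than answering it.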
Comment on lines +148 to +150
Contributor

It is unacceptable for &mut self to have differing semantic guarantees in this context, because this makes analysis of Rust code much harder. This must make clear this would only be allowed with *mut T. Not even unsafe code is allowed to break the aliasing rules concerning &mut T.

Author

I understand that. I guess my point was that just like with refs to statics, the mutable case is more complicated and thus may necessitate unsafe {}. But point well taken!

Contributor

static is unique and special and static mut should be regarded as a mistake, or at least, not something one should build any precedents off of. There are many subtleties regarding statics, like how referencing them is semantically equivalent to borrowing them, last I checked, which are non-obvious and don't really pattern elsewhere in the language. This is to be expected: the rest of the language doesn't really "do" global state, nevermind global mutable state.


# Drawbacks
[drawbacks]: #drawbacks

I fail to see any major drawbacks besides the override of `->`.

Contributor

  • Addition of context-specific syntax is a significant drawback.
  • Incompatibility with &mut self is a significant drawback and a sign this proposal is insufficiently general.
  • There are many other possible uses we might prefer to put -> to that would conflict.

Please think about and try to enumerate more of the drawbacks beyond what I can think of off the top of my head.

Author

Thanks for taking the time to reply. I find your critique insightful.

The reason why I don't mind the context sensitivity (or see it as an issue) is because of e.g. await in async context. But I see why someone may not like more of that. But mind you, from outside the trait what I'm suggesting simply looks like any old trait call.

Regarding mut self, I just meant it's not something that can obviously live in safe Rust and that's ok. To me it's just like yet another form of ref to self (and as such carries its own pros/cons just like &self, &mut self, self, Box, etc.)

Specific syntax is not that important. I am sure there are other proposals about ->.

Author

const context is another example

Contributor

const for the most part doesn't change what raw syntax is permissible. A function in const is still a function and is still called as a function. You are right about async, but it is distinct in that async is a distinct addition to the language, rather than pervading all traits. By being "boxed up" into well-defined spaces, rather than "all traits", they are better-contained.

Also, the "keyword generics" initiative has been launched to try to find a solution to the problem I mention.


# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- Why is this design the best in the space of possible designs?

It is intuitive and doesn't clutter the signatures unnecessarily.

- What other designs have been considered and what is the rationale for not choosing them?

Explicit argument passing considered above.

Contributor

This does not give the rationale.


- What is the impact of not doing this?

The impact for experienced Rust users is probably low as they could figure their way to a solution. However, the impact for new Rust users coming from an OOP language may be great.

- If this is a language proposal, could this be done in a library or macro instead? Does the proposed change make Rust code easier or harder to read, understand, and maintain?

Admittedly, a lot of the above can be done with a macro, but it would be awkward to use at the initial call site. The generated code would also necessarily have to expose the `vself` parameter, and the IDE experience may not be that great. Finally, a macro would find it hard to differentiate between mutable and immutable references, which would lead to subpar error messages.

# Prior art
[prior-art]: #prior-art

OOP languages naturally support this via implicit or explicit virtual markers.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

The case of `mut self`.

Contributor

This doesn't state what the unresolved question actually is.


# Future possibilities
[future-possibilities]: #future-possibilities

I still have to think about whether the above interacts negatively with async, but it doesn't seem to on a first pass.