Permit `fn(self)` methods to be invoked on object types #10672

Closed
nikomatsakis opened this Issue Nov 26, 2013 · 15 comments

Contributor

nikomatsakis commented Nov 26, 2013

Currently, methods declared as self cannot be invoked on object types. The reason for this is that, without knowing the type of the receiver, we can't know whether self is to be passed with indirection or as an immediate value.

For example:

trait Consume { fn take(self) { ... } }
impl Consume for int {
    fn take(self) { ... /* self should be passed as an immediate here */ ... }
}
impl Consume for MyBigStruct {
    fn take(self) { ... /* self not passed as an immediate here */ ... }
}

So now if I have a ~int cast to ~Consume, and I were to invoke take(), I would want to load from the ~int, pass the loaded value, and then free the ~int after the call returns (or maybe in a different order; I don't want to think too hard about weird failure cases since that's not really the point of this issue). If I have a ~MyBigStruct, I would want to pass the pointer itself to take (since take expects a MyBigStruct*, essentially). After the call returns, take() will have freed the MyBigStruct but not the ~ pointer itself, so I can shallow free the pointer. But of course all I know at codegen time is that I have a ~Consume and thus I can't distinguish these two cases.

This is however rather inconvenient, as the above example shows. It is particularly inconvenient since using a ~self method isn't really a good alternative, particularly if you try to implement the trait for the object type:

trait Message { fn send(~self) { ... } }
impl Message for ~Message { fn send(~self) { /* self: ~~Message */ } }

Now I need a ~~Message! Silly. This would work fine if send() were a self method, though.
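For readers on current Rust, the double-pointer problem can be sketched as follows. This is only an analogue, assuming `Box<Self>` stands in for `~self` and `Box<dyn Message>` for `~Message`; the `Ping` type and the `String` return value are hypothetical additions so the sketch has something to observe.

```rust
// Hypothetical modern analogue: `Box<Self>` plays the role of `~self`.
trait Message {
    fn send(self: Box<Self>) -> String;
}

// Illustrative concrete message type (not from the issue).
struct Ping;

impl Message for Ping {
    fn send(self: Box<Self>) -> String {
        String::from("ping")
    }
}

// Implementing the trait for the object type itself forces a second box:
// inside this impl, `self` has type `Box<Box<dyn Message>>`.
impl Message for Box<dyn Message> {
    fn send(self: Box<Self>) -> String {
        let inner: Box<dyn Message> = *self; // peel off the outer box
        inner.send() // dynamic dispatch to the concrete impl
    }
}
```

Calling `send` through the object impl requires building a `Box<Box<dyn Message>>` first, which is the awkwardness described above.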

The thing is, if we were a bit more clever, we could permit by-value calls on object types. We can just say that for virtual calls self is always passed indirectly, and then generate a shim function to use in the vtable that does a load for immediate receivers.
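A rough sketch of the shim idea in current Rust, using a hand-rolled vtable rather than anything the compiler generates. Everything here is an assumption made for illustration: `OwnedConsume`, `take_shim`, `shallow_free`, the `i64` return type, and the use of `i64` in place of `int`.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

trait Consume {
    fn take(self) -> i64;
}

impl Consume for i64 {
    fn take(self) -> i64 {
        self
    }
}

struct MyBigStruct {
    data: [i64; 64],
}

impl Consume for MyBigStruct {
    fn take(self) -> i64 {
        self.data.iter().sum()
    }
}

/// Shim stored in the hand-rolled vtable: `self` always arrives indirectly,
/// and the shim loads it before calling the real by-value method.
unsafe fn take_shim<T: Consume>(this: *mut u8) -> i64 {
    let value: T = unsafe { ptr::read(this as *const T) };
    value.take()
}

/// Frees the allocation without running `T`'s destructor (a shallow free).
unsafe fn shallow_free<T>(this: *mut u8) {
    drop(unsafe { Box::from_raw(this as *mut ManuallyDrop<T>) });
}

/// Hypothetical stand-in for `~Consume`: a data pointer plus two vtable entries.
/// If it is dropped without calling `take`, the value is leaked (kept simple on purpose).
struct OwnedConsume {
    data: *mut u8,
    take_fn: unsafe fn(*mut u8) -> i64,
    free_fn: unsafe fn(*mut u8),
}

impl OwnedConsume {
    fn new<T: Consume + 'static>(value: T) -> Self {
        OwnedConsume {
            data: Box::into_raw(Box::new(value)) as *mut u8,
            take_fn: take_shim::<T>,
            free_fn: shallow_free::<T>,
        }
    }

    fn take(self) -> i64 {
        unsafe {
            // The callee consumes the value; the caller frees only the memory.
            let result = (self.take_fn)(self.data);
            (self.free_fn)(self.data);
            result
        }
    }
}
```

The caller never needs to know whether the receiver is an immediate or a large struct: both go through the same indirect calling convention, and the per-type shim does the load.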

Nominating although I think this is something that can possibly wait till post 1.0, since it's not a backwards compat question. I'd still call it high priority (presuming others agree with my reasoning).

Member

pnkfelix commented Dec 2, 2013

I am of two minds on this issue.

One part of me likes simple rules of thumb like: "eschew fn foo(~self); do foo(self) instead." (as noted on niko's Thoughts on DST part I). So that part of me would like such methods to actually be supported on object types.

Another part of me likes to have a simplified compilation model in my mind, with as little magic as possible (or at least the potential for ALMAP ... I am after all still a GC guy...). And that part of me figures we should keep things the way that they are; if you want a method to be invokable on the objects of a trait, then the type for the self pointer needs to be reflected in the method's type, including which kind of pointer it is.


Having said that, I haven't thought of a concrete reason to disallow the generalization suggested here, beyond mental complexity when trying to understand the control flow from the invocation point to the entry to the method itself.

For which pointer-variants would we generate these shim functions? Will it only work for ~Trait objects? Seeing as how we are taking self by value, it seems like we could get into trouble if we allow it for &Trait and even &mut Trait (especially since self might have non-copyable state, and so implicitly loading a copy from &mut Trait seems bad).

Or is this something that we can infer on an impl-by-impl basis, choosing the most general option that is sound?

Contributor

nikomatsakis commented Dec 2, 2013

On Mon, Dec 02, 2013 at 08:43:47AM -0800, Felix S Klock II wrote:

> Another part of me likes to have a simplified compilation model in
> my mind, with as little magic as possible (or at least the
> potential for ALMAP ... I am after all still a GC guy...).

In general I agree, but, if we don't wind up with a DST-based system,
this makes it impossible to have an object method that "gives up" the
value without a double-~. That seems pretty bad. I also don't
consider simple adaptation methods in the vtable to be particularly magic.

> For which pointer-variants would we generate these shim functions?
> Will it only work for ~Trait objects? Seeing as how we are taking
> self by value, it seems like we could get into trouble if we allow
> it for &Trait and even &mut Trait (especially since self might
> have non-copyable state, and so implicitly loading a copy from &mut Trait seems bad).

As magic goes, what I proposed is pretty minimal. Basically a variant
where the self pointer is always taken indirectly -- to put another
way, it'd be roughly equivalent to:

fn shim(self: *Self, ...) {
    unsafe { actual(*self, ...) }
}

That said, this interacts with the smart pointers. I outlined some
details in a recent blog post. In particular supporting types
like Smaht<T> may require generating shims for each method that
perform the necessary derefs, in which case handling fn(self)
might get folded in.

Member

alexcrichton commented Dec 2, 2013

I have closed #9893 in favor of this bug, but if this bug is rejected then that bug should be re-opened (because the error message today isn't exactly ideal).

Contributor

nikomatsakis commented Dec 5, 2013

As part of another discussion, I realized that this is a special case of a more general sort of pointer. For want of a better name, I'll call it a "my" pointer: my 'a T -- this kind of pointer would mean that you own the referent of the pointer, but not the memory itself. In other words, you are obligated to free the T, but not the memory where the T is stored -- that memory will be otherwise freed sometime after the end of lifetime 'a.

So the "adapter" methods I described for fn(self) are really accepting a fn(my self) -- you get to use the contents of the memory, but the memory itself you do not own.
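One way to approximate such a "my" pointer in current Rust is a small wrapper over `&mut ManuallyDrop<T>`. This is a sketch under stated assumptions, not a language feature: the `My` type, its `new` and `into_inner` methods, and the unsafe constructor contract are all hypothetical.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical `my 'a T`: owns the referent, but not the memory it lives in.
struct My<'a, T> {
    slot: &'a mut ManuallyDrop<T>,
}

impl<'a, T> My<'a, T> {
    /// Safety: the slot must hold an initialized value, and after this call
    /// nobody else may read or drop that value.
    unsafe fn new(slot: &'a mut ManuallyDrop<T>) -> Self {
        My { slot }
    }

    /// Moves the value out, leaving the memory for its real owner to free.
    fn into_inner(self) -> T {
        // Suppress `My`'s own destructor so the value is not dropped twice.
        let mut this = ManuallyDrop::new(self);
        unsafe { ManuallyDrop::take(&mut *this.slot) }
    }
}

impl<'a, T> Drop for My<'a, T> {
    fn drop(&mut self) {
        // The obligation to free the `T` (but not its storage) is met here.
        unsafe { ManuallyDrop::drop(&mut *self.slot) }
    }
}
```

The storage behind `slot` stays with whoever created it, which matches the description: the holder must destroy the `T`, while the memory is released by someone else after `'a` ends.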

Contributor

glaebhoerl commented Dec 6, 2013

Is this the same mythical pointer type that's needed to make once stack closures as DSTs work? (Maybe that's what the my stands for?;)

I think it makes sense to view this as a natural progression on & and &mut: with & you can only read, &mut gives you permission to write, and this new one (&my? &once? &take? &move?) also gives you permission to move (and the associated obligation to deinitialize).

I think it also makes sense to think about the converse: a pointer to potentially-uninitialized memory which you have the obligation to initialize, corresponding to C#'s concept of an out parameter. Presumably once you initialize it, it would go out of scope, and perhaps leave you with an &mut. This also happens to be exactly the same thing as the return values on functions, as far as I can tell, just a little more flexible.

Contributor

nikomatsakis commented Dec 6, 2013

Yes, I've spent some time thinking about "obligation to initialize" -- that is much harder. In particular, that would have to be a linear value -- meaning it could not be 'dropped' (except possibly by failure -- but even there we have to be careful). Currently, though, we assume all values can be dropped, and hence permit generic functions like the following:

fn drop<T>(x: T) { }

and if I invoke it with an &out pointer, that data will never be initialized.

The natural implication is that either (1) dropping a type parameter must become an explicit operation:

fn drop<T:Drop>(x: T) { ... x.drop() ... }

(Incidentally this only works if drop() takes self by-value, I guess, but we could address this somehow) or (2) we need some kind of default bound setup like sized/unsized.

Contributor

nikomatsakis commented Dec 6, 2013

I guess that the natural pointer type for drop() would be &my, actually. It would avoid all the annoying problems we found trying to make drop have type fn(self).

Contributor

glaebhoerl commented Dec 7, 2013

You're right, I was sloppy in my thinking about the out-pointer. I was thinking it would error if you reached the end of the function without initializing through it, but that obviously didn't take generic functions into account.

I wonder how many generic functions would actually wind up needing a Drop constraint under this scheme. Sounds like it would be a lot, but it's not like I've actually checked - maybe it's less than I assume (or maybe it's more). If the only benefit were having &out pointers I'd say it's not remotely worth it, but maybe it would also be helpful/necessary for the existential/erased/higher-rank things.

Member

pnkfelix commented Jan 9, 2014

Not a backcompat risk; so accepting at P-low. (If one finds more evidence to up its priority, then renominate.)

Contributor

glaebhoerl commented Jan 15, 2014

I couldn't fall asleep last night because I was thinking about pointer types; how's that for fun? &out, &mut, and &move (yes, I think that's the right name - goes well with &mut) have some fascinating and non-obvious interactions.

As you alluded to, the problem of &out and generics could possibly be resolved, in a worse is better kind of way, by just making its destructor invoke fail!(). (As before, assigning through the pointer would count as a move, and thus "cancel" it.) When the types are known to the compiler, it can issue a warning or error statically (so similar to how @mut was). Certainly not ideal, but possibly a solution. EDIT: I think the precise requirement is that if an &out 's T is allowed to die without being assigned through, its destructor must run before the end of the borrow, i.o.w. the lifetime 's. The code it was borrowing the T from will be prevented from accessing it until after 's, so as long as this holds, I think we should be fine.

(I also don't see any reason why making &out copyable would be unsafe, it's just that... you would rather not. Its presence is an obligation, and why multiply your obligations? EDIT: Scratch this, this is doubly hogwash if &out pointers can turn into &mut, as below.)

As before, after assigning through an &out it could leave you with an &mut:

fn assign<'s, T>(to: &out 's T, from: &move T) -> &mut 's T
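The closest thing in current Rust's standard library is probably `MaybeUninit<T>`, whose `write` method initializes a slot and hands back an `&mut T`. A minimal sketch, assuming `&'s mut MaybeUninit<T>` stands in for `&out 's T` and a plain `T` stands in for `&move T`:

```rust
use std::mem::MaybeUninit;

/// Approximation of the `assign` signature above: writing through the
/// uninitialized slot yields a mutable reference with the same lifetime.
fn assign<'s, T>(to: &'s mut MaybeUninit<T>, from: T) -> &'s mut T {
    to.write(from)
}
```

Unlike the proposed `&out`, nothing here obliges the caller to initialize the slot, and the owner of the `MaybeUninit<T>` must still drop the value manually, so this captures the signature but not the static guarantee.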

I think this is fine, because if someone lends you an &out pointer to their uninitialized memory, even if it's declared immutable, they can't observe whether you've written to it only once or many times before they regain access. (Interestingly, you can safely borrow an &out to either uninitialized data of any mutability or to initialized, mutable data, just not to initialized, immutable data.) There's also something like the reverse operation: you can deinitialize an &mut if you promise to reinitialize it again before control passes back:

fn take_then_put_back<'s, T>(from: &mut 's T, take: |&move T| once) -> &out 's T

These two effectively reify the cycling of data between initialized and uninitialized states as a first class abstraction for the programmer.

Using the above we can implement swap in safe code:

fn swap<T>(mut_a: &mut T, mut_b: &mut T) {
    let mut x: T;
    let out_x = &out x;
    // `take_then_put_back` taking a once fn is important so we can move `out_x` into it
    let out_b = take_then_put_back(mut_b, |move_b| *out_x = *move_b);
    let out_a = take_then_put_back(mut_a, |move_a| *out_b = *move_a);
    *out_a = x;
}

I suspect this means that arbitrary permutations could also be expressed using only safe code.
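For comparison, here is a sketch of what the same swap looks like in Rust without `&out` or `&move`: the deinitialize-then-reinitialize cycle is still there, but it has to be expressed with raw pointers in an unsafe block. The name `swap_via_raw` is made up for this example; real code would simply call `std::mem::swap`.

```rust
use std::ptr;

/// Swap written with explicit "take" and "put back" steps on raw pointers.
fn swap_via_raw<T>(a: &mut T, b: &mut T) {
    let pa: *mut T = a;
    let pb: *mut T = b;
    unsafe {
        let x = ptr::read(pa); // `*a` is now logically uninitialized
        ptr::write(pa, ptr::read(pb)); // refill `*a`; `*b` is now logically uninitialized
        ptr::write(pb, x); // refill `*b`
    }
}
```

Nothing between the reads and writes can panic, which is what keeps this sound; the compiler does not check that for us, whereas the hypothetical pointer types would.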

&out is also useful for placement new:

fn emplace_back<'s, T>(self: &mut 's Vec<T>) -> &out 's T

where it allocates an element and returns the obligation to initialize it. Smart pointers:

fn new_rc<'s, T>(self: &out 's Rc<T>) -> &out 's T

where it initializes the pointer, allocates the box, and returns the obligation to initialize its contents.
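The `emplace_back` idea can be roughly imitated today with `Vec::spare_capacity_mut`, which exposes unused capacity as `MaybeUninit<T>` slots. In this sketch the function name `emplace_back_with` and the callback-based shape are assumptions; because nothing enforces the obligation to initialize, the function has to be marked unsafe.

```rust
use std::mem::MaybeUninit;

/// Reserves one slot at the end of `v`, lets `init` fill it in place,
/// then extends the length to cover it.
///
/// Safety: `init` must fully initialize the slot it is given.
unsafe fn emplace_back_with<T>(v: &mut Vec<T>, init: impl FnOnce(&mut MaybeUninit<T>)) {
    v.reserve(1);
    // After `reserve(1)` there is at least one spare slot.
    let slot = &mut v.spare_capacity_mut()[0];
    init(slot);
    // Only now does the new element become part of the vector.
    unsafe { v.set_len(v.len() + 1) };
}
```

A call might look like `unsafe { emplace_back_with(&mut v, |slot| { slot.write(3); }) }`. With a real `&out` type the closure could not return without initializing, and the unsafe marker would be unnecessary.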

(I also suspect it could be used to tie the knot, though I haven't figured out a way to do it without unsafe.)

Anyway, I used to think &out is just a curiosity, but now I kinda want it.

Contributor

glaebhoerl commented Jan 28, 2014

@nikomatsakis I'm starting to have second thoughts, not about &my/&move itself, whose semantics are clear (permits moving out, destructs otherwise), but its interaction with ~T. You wrote:

> this kind of pointer would mean that you own the referent of the pointer, but not the memory itself. In other words, you are obligated to free the T, but not the memory where the T is stored -- that memory will be otherwise freed sometime after the end of lifetime 'a.

I suppose, if it stays part of the language, we could hardcode this for ~, but how could this possibly be generalized to library smart pointer types without some kind of runtime drop flag? How could the destructor of a library Ownedish<T> know whether or not its contained object has been &moved out, and therefore whether to run its destructor and/or whether it is otherwise safe to access?

Contributor

nikomatsakis commented Jan 28, 2014

> I suppose, if it stays part of the language, we could hardcode this for ~, but how could this possibly be generalized to
> library smart pointer types without some kind of runtime drop flag?

I envisioned a distinct method being called for shallow drop, in this case. Clearly the pointer type would have to opt-in. It's actually already an issue of sorts, in that we would like to be able to move out of smart pointer types.

Contributor

nikomatsakis commented Jun 11, 2014

Contributor

pcwalton commented Jun 11, 2014

Nominating, P-backcompat-lang—needed for unboxed closures to replace proc().
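In present-day Rust the use case mentioned here looks roughly like the sketch below: a boxed `FnOnce` closure is a trait object whose call method takes `self` by value, and it can be invoked directly through the box. The helper names `make_greeting` and `run_once` are invented for this illustration.

```rust
/// Builds a one-shot closure that gives up its captured `String` when called.
fn make_greeting(name: String) -> Box<dyn FnOnce() -> String> {
    Box::new(move || name + "!")
}

/// Calling the boxed closure consumes it: a by-value-self call on an object type.
fn run_once(f: Box<dyn FnOnce() -> String>) -> String {
    f()
}
```

For example, `run_once(make_greeting(String::from("hi")))` evaluates to `"hi!"`, and the closure cannot be called a second time because the call moved it.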

Member

pnkfelix commented Jun 12, 2014

Already marked P-backcompat-lang; adding to the 1.0 milestone.

@pnkfelix pnkfelix added this to the 1.0 milestone Jun 12, 2014

@alexcrichton alexcrichton removed the P-low label Jun 12, 2014

@pnkfelix pnkfelix removed the I-nominated label Jun 12, 2014

pcwalton added a commit to pcwalton/rust that referenced this issue Jun 28, 2014

librustc: Permit by-value-self methods to be invoked on objects
referenced by boxes.

I can't believe this worked! I believe that the way the ABI and
immediates work mean that this Just Works.

Closes rust-lang#10672.

pcwalton added a commit to pcwalton/rust that referenced this issue Jun 28, 2014

librustc: Permit by-value-self methods to be invoked on objects
referenced by boxes.

This is done by creating a shim function that handles the cleanup of the
box properly.

Closes rust-lang#10672.

pcwalton added a commit to pcwalton/rust that referenced this issue Jul 1, 2014

librustc: Permit by-value-self methods to be invoked on objects
referenced by boxes.

This is done by creating a shim function that handles the cleanup of the
box properly.

Closes rust-lang#10672.

bors added a commit that referenced this issue Jul 1, 2014

auto merge of #15242 : pcwalton/rust/self-in-trait-methods, r=alexcrichton

I can't believe this worked! I believe that the way the ABI and
immediates work mean that this Just Works.

Closes #10672.

r? @alexcrichton

@bors bors closed this in #15242 Jul 1, 2014
