New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closures Capture Disjoint Fields #2229

Merged
merged 10 commits into from Aug 19, 2018

Conversation

@samsartor
Contributor

samsartor commented Nov 29, 2017

TL;DR: Closures will capture individual fields and not the whole struct when possible.

This RFC proposes that closure capturing should be minimal rather than maximal.
Specifically, existing rules regarding borrowing and moving disjoint fields
should be applied to capturing. If implemented, the following code examples
would become valid:

let _a = &mut foo.a;
|| &mut foo.b; // Error! cannot borrow `foo`
let _a = &mut foo.a;
move || foo.b; // Error! cannot move `foo`

Resolves rust-lang/rust#19004

Rendered

Tracking issue

@petrochenkov

This comment has been minimized.

Contributor

petrochenkov commented Nov 29, 2017

Some previous discussion https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387

nikomatsakis Jun 13
If someone is interested in turning this into an RFC, I would love to mentor it! Please let me know.

Ok, ping @nikomatsakis

move || foo.a;
```
- Borrowck passes because `foo` is not borrowed elsewhere.
- The closure moves and captures `foo.a` but not `foo.b`.

This comment has been minimized.

@oli-obk

oli-obk Nov 29, 2017

Contributor

This won't work if foo's type implements Drop

This comment has been minimized.

@samsartor

samsartor Nov 29, 2017

Contributor

@oli-obk Thanks for catching that! For those examples I am assuming only cases where the desugar will compile, since it should be functionally equivalent to the final implementation. If foo implements Drop then it can not be destructed anyway, capturing or not.

# Unresolved questions
[unresolved]: #unresolved-questions
- How exactly does codegen change?

This comment has been minimized.

@oli-obk

oli-obk Nov 29, 2017

Contributor

If the struct is not used outside the closure, this will add one reference per field used in the closure instead of just one reference for the entire struct

This comment has been minimized.

@samsartor

samsartor Nov 29, 2017

Contributor

@oli-obk That would be the trivial implementation, but (as per previous discussion) the closure would be smaller if it captured a single nonexclusive pointer to the whole struct and then derefrenced with constant offsets.

captured. This was simple to implement but is inconstant with the rest of the
language because rust normally allows simultaneous borrowing of disjoint fields.
Remembering this exception adds to the mental burden of the programmer and
worsens rust's already terrible learning curve.

This comment has been minimized.

@Centril

Centril Nov 29, 2017

Contributor

Nitpick: Terrible sounds a bit strong - I'd agree to Rust having a steep learning curve, but terrible is a bit much.

@Centril

Well-written and great RFC! I have a single nit =)

@scottmcm scottmcm added the T-lang label Nov 29, 2017

@joshtriplett

This comment has been minimized.

Member

joshtriplett commented Dec 6, 2017

Thank you for this!

When discussing the rationale, I wold suggest also mentioning that this can make it harder to factor a group of related variables into a struct and provide functionality as methods on that struct, since that can then lead to borrow-checker errors that would not exist otherwise.

@RalfJung

This comment has been minimized.

Member

RalfJung commented Dec 6, 2017

If I understand correctly, this has the potential of significantly increasing the size of environments passed to a closure -- if the closure uses 10 of 12 fields, now it will have 10 pointers instead of 1 in the environment. Could that be a problem?

@eddyb

This comment has been minimized.

Member

eddyb commented Dec 6, 2017

We could use refinement typing (or some analogue thereof) internally and keep the same environment but give it a type/extra information to indicate how fields are/can be used.

@bestouff

This comment has been minimized.

Contributor

bestouff commented Dec 6, 2017

@RalfJung I don't see why the closure should use many pointers. The generated code should be exactly the same, it's just the borrowck which should be a bit more lax.

@glaebhoerl

This comment has been minimized.

Contributor

glaebhoerl commented Dec 6, 2017

@bestouff At the two ends of the source code and machine code that's true, the concern is in between where we'd still like the generated MIR to be type-safe. (Which is what @eddyb's comment was addressing.)

@samsartor

This comment has been minimized.

Contributor

samsartor commented Jan 17, 2018

I would love to have some more input on this. Any thoughts on MIR safety? Another option might be to make each individual field an upvar with hints to combine the pointers back together during codegen. Treat it more like an optimization.

Also, pinging the lang team, what needs to happen so that we can move forward?

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Jan 25, 2018

My take: I would very much like this to happen. I've been wanting to, in fact, to @evanbergeron who a long time back pinged me on this topic, but I've just been too busy to invest a lot of thought into it. (@evanbergeron -- still interested in implementing? And, if I have the wrong github name, er, sorry, about that.)

My one hesitation here is that perhaps we need to invest a bit more effort into thinking through if there are backwards compatibility hazards? @RalfJung mentions that the size of a closure could increase, which I think is true, but I'm not too worried about it -- we could probably optimize it as well, if we really wanted to. Probably it suffices to work things through in the implementation phase and see what crater has to say and so forth.

OK, I've kind of convinced myself. I think we should merge this. =) Nominating just for a brief check-in with @rust-lang/lang team.

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Jan 26, 2018

OK, so we talked about this in the @rust-lang/lang meeting. That discussion helped me to remember some of the edge cases I was concerned about. I still think we should make this change, but it's important that we lay these things up out front.

First, I think we should define this with respect to a "simple reference desugaring" but of course leave it undefined what layout the closure actually has. This is I guess more or less what the RFC does, but I think I'd prefer something a bit more concrete. Basically, I want to be able to "think of" closures as capturing a set of paths -- even if, in the underlying code, we sometimes choose not to. (In reality, I think we actually will do the dumb desugaring until forced to do otherwise by some horrendously performing code.)

Mostly I'm referring to text like this:

Borrowck rules are altered so that capture-by-reference allows simultaneous borrowing of disjoint fields. Field references are either individually captured or all captured by a single nonexclusive pointer to the whole struct.

I don't want to alter any borrowck rules. These rules are defined in terms of MIR and I think they should stay the same. What I want to do is to alter the "MIR desugaring" of closures to capture longer paths instead of just local variables. This contains the scope of the RFC and means that we don't endanger soundness (though we can still alter runtime semantics, more on that later).

Currently, closures are defined roughly like so. We take the set of free variables that appear in the closure definition (those variables referenced by the closure) and we infer a borrowing mode for each one. We then create a closure with one field per free variable.

We would be altering the definition so that we define a set of "free places" (lvalues). We'll have to tinker a bit to find the best way to define this, per the edge cases I'm about to bring up. =)

So as a sketch:

|| foo(&a.b.c);

would desugar to

{
    let r = &a.b.c;
    NewClosure { r }
}

where the Fn impl then accesses self.r instead of &self.a.b.c.

Second, we should distinguish indexing from other paths. If you have a closure like this:

let mut vecs = (vec![1, 2, 3], vec![4, 5, 6]);
let inc0 = |index: usize| {
    vecs.0[index] += 1;
};

Clearly, this closure can't borrow vecs.0[index] at the time of creation, since index is not yet known. I guess the answer is obvious: it should borrow vecs.0 and then do the actual indexing during execution. I didn't see any discussion of this case in the RFC, though. But basically it seems like we can

Third, overloaded deref is kinda' special, but I think we don't need a special case for it. In particular, something like some_rc.foo (where some_rc: Rc<T>) actually desugars to something like:

let tmp = Deref::deref(&some_Rc);
&tmp.foo

As a result, assuming foo is some smart pointer supporting DerefMut, Rust code like this might still error:

let x = || use(&mut foo.bar);
let y = || use(&mut foo.baz);

This would desugar to:

let tmp = DerefMut::deref_mut(&mut foo);
let f0 = &mut tmp.bar;
let tmp = DerefMut::deref_mut(&mut foo);
let f1 = &mut tmpbaz;
let x = Closure0 { f0 };
let x = Closure1 { f1 };

This would create an error. You can get similar errors today without closures. Still, it seems like this is an orthogonal problem, and we may solve it eventually, which would then carry over to closures. Great.

Fourth, we can't capture paths that are moved unless we know that there are no destructors. The RFC does mention the need not to move a field that is contained in something with a destructor, which is good, but in general this RFC can have a subtle impact on when destructors run.

Consider:

{
    let tuple = (vec![], vec![]);
    {
        let c = || send(tuple.0);
    } // c goes out of scope here
} // tuple goes out of scope here

Today, in this code, tuple is moved into c and thus dropped at the end of the inner scope. Under this RFC, if I understood, tuple.0 would be moved into c and thus dropped at the end of the inner scope, but tuple.1 would be dropped later.

It seems pretty damn unlikely that this would cause problems for anyone. However, this kind of silent change to semantics definitely makes me nervous. I suspect we should not do it and instead, for places that are moved, we should just always move the underlying variable. That said, we could probably write some code that detects this scenario and reports an error, just to get an idea how much code out there might be affected.

(I don't think we need to settle it now, but one way to change this that might be safer is via an epoch. We could for example issue warnings in cases like the one above -- where the timing of when a dtor runs would change -- and then actually make the change only with opt-in. I don't feel great about that, though, and the transition might have to span 2 epochs (one to make it an error to do anything that would change behavior, and one to change the behavior).)


Thoughts?

@samsartor

This comment has been minimized.

Contributor

samsartor commented Jan 27, 2018

@nikomatsakis A lot of great points here! I'm already thinking of tons of improvements that can be made to the RFC. In the mean time, there are several items I would like to discuss a bit more.

The underlying goal of this RFC is to improve consistency in the language. One way to think about this: block expressions and closures should have the same behavior whenever possible.

I want us to keep that idea of consistency in mind.

Change borrowck or lowering?

I don't want to alter any borrowck rules.

You are right! The compiler (ideally) does borrowck at the MIR level. Closures don't exist in MIR. Consequently, the borrowck implementation should not be altered as a result of this RFC. Only the conceptual understanding of borrowck will change. The text of the RFC does not reflect this.

Desugaring & Closure Size

I think we should define this with respect to a "simple reference desugaring" but of course leave it undefined what layout the closure actually has.

That's pretty reasonable, I think I'll end up doing exactly that.

In reality, I think we actually will do the dumb desugaring until forced to do otherwise by some horrendously performing code.

We really only need to change lowering (the HIR to MIR conversion). The reason for the added complexity is closure size. Currently, capturing a variable by reference only ever adds one pointer to the closure data. However, implementing this RFC as a simple desugar may result in many extra, unnecessary pointers being captured. In keeping with the rust philosophy of "you don't pay for features you don't use", we want to find a way to merge several borrows into a single, nonexclusive pointer. In order to preserve the safety of MIR, this borrow merging has to be done as an optimization during codgen.

Note that this potential size increase does not apply to capture-by-move/copy, which should become less expensive no matter how we implement this.

Paths

Currently this RFC proposes that only true lvalues get split up. Encountering a deref*() or index*(..) should end the splitting. Whether the closure explicitly or implicitly calls deref/index, the generated MIR should be the same:

let foo = Box::new(Foo { a: 1, b: 2 });

// all generate the same MIR:
|| &foo.b;
|| &Deref::deref(&foo).b;
Closure0 { &foo };

This is consistant with the behavior of block expressions:

let mut foo = Box::new(Foo { a: 1, b: 2 });
let _a = &mut foo.a;
{ &foo.b; } // cannot borrow `foo` (via `foo.b`) as immutable because `foo` is also borrowed as mutable (via `foo.a`)

Admittedly, the RFC needs to make this rule more clear.

Drop Ordering

Today, in this code, tuple is moved into c and thus dropped at the end of the inner scope. Under this RFC, if I understood, tuple.0 would be moved into c and thus dropped at the end of the inner scope, but tuple.1 would be dropped later.

I panic!ed a bit when I saw this drop-ordering example, since it demonstrates a case where the lifetime of a reference might increase as a result of this RFC. Thankfully, I have been unable to turn that into a backwards-incompatible borrowck error. Still, that doesn't mean we are out of the woods. I am particularly interested to see if someone can cook something up involving NLL.

Alas, Rust does provide guarantees on field drop order (which this would change). We can't forbid the splitting of structs that contain Drop types (it defeats the whole point) and a warning would really suck. Unfortunately I haven't thought of a better solution yet. If anyone has any ideas, speak up!

@eddyb

This comment has been minimized.

Member

eddyb commented Jan 27, 2018

In order to preserve the safety of MIR, this borrow merging has to be done as an optimization during codgen.

I don't think this is well-defined because it requires cooperation between:

  • the closure type, from which the memory layout is computed independently of its uses
  • the instantiation(s) of the closure type (you can have many of them if you inline)
  • the definition of the closure body

So I don't think this can be an opportunistic optimization at all, which means it has to be encoded in the type and guaranteed throughout, even if behaves like something else.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Jan 27, 2018

@eddyb Yes, it would need quite a lot information about where the borrows came from. It can't be opportunistic, as you said.

Still, it has to happen during codegen if we want to avoid changing the borrowck implementation. And because the point is to decrease memory usage, it can be thought of as a sort of optimization.

Or maybe not. I could be thinking about this wrong!

@aturon aturon removed the I-nominated label Feb 8, 2018

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Mar 12, 2018

@samsartor

Argh, I apologize again for the lengthy delay in replying! This slipped off my radar, obviously.

Only the conceptual understanding of borrowck will change.

👍

we want to find a way to merge several borrows into a single, nonexclusive pointer. In order to preserve the safety of MIR, this borrow merging has to be done as an optimization during codgen.

Agreed. Doing a "merge" operation like that after borrowck seems perfectly reasonable. I am not even sure if needs to be detailed in the RFC -- just adding a note that closure representation is undefined and that we are free to do such a thing seems sufficient to me.

Note that this potential size increase does not apply to capture-by-move/copy, which should become less expensive no matter how we implement this.

Yep.

Currently this RFC proposes that only true lvalues get split up

I don't know what a "true lvalue" is -- can you define that a bit more precisely? It sounds like you mean "no dereferences", particularly given this quote:

Encountering a deref*() or index*(..) should end the splitting.

If so, I don't think this is a good idea. Consider this example:

struct Foo {
  a: u32,
  b: u32,
}

impl Foo {
  fn bar(&mut self) {
    let increment_a = || self.a += 1;
    let increment_b = || self.b += 1;
    // ...
  }
}

These closures ought to be borrowed (*self).{a,b}, but today of course they both borrow self (and hence get an error). But maybe you meant overloaded deref specifically?

This is consistant with the behavior of block expressions:

So, that behavior is specific to overloaded deref (and, in fact, in MIR borrowck, we've gone back to treating Box more fully, so that your code example is accepted).

We can't forbid the splitting of structs that contain Drop types (it defeats the whole point)

Say more? This restriction would only apply to fields of structs that impl drop and which move bits of themselves into closures. We already have similar restrictions on moving out from such structs, it seems reasonable to account for that here too. (We could probably give a decent error message too, spelling out that the entire variable had to be moved because of the Drop impl.)

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Mar 12, 2018

@eddyb

So I don't think this can be an opportunistic optimization at all, which means it has to be encoded in the type and guaranteed throughout, even if behaves like something else.

I don't agree here. I mean, what you are saying is technically true, but we know a lot about the way the "origin" of closure types, and we can certainly include hints that say that all these references come from "adjacent" things. Moreover, at the time that we are optimizing the MIR for the closure, we also have access to all of its creators, so we can change them in concert -- no?

@eddyb

This comment has been minimized.

Member

eddyb commented Mar 14, 2018

Moreover, at the time that we are optimizing the MIR for the closure, we also have access to all of its creators, so we can change them in concert -- no?

How would that work? Right now the layout is computed from the type itself, which can easily leak out (e.g -> impl Trait), and even if you add a layer of indirection, it would have to be stored in the MIR itself, like generator state. If you want to go that route, it could work, I suppose.

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Mar 22, 2018

@eddyb

How would that work? Right now the layout is computed from the type itself, which can easily leak out (e.g -> impl Trait), and even if you add a layer of indirection, it would have to be stored in the MIR itself, like generator state. If you want to go that route, it could work, I suppose.

Something like that. Anyway I don't think it's so important.

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Mar 22, 2018

@samsartor

Gentle ping =) I'd particular like to discuss the meaning of this comment of yours, which I still am not sure I understand:

Currently this RFC proposes that only true lvalues get split up

@samsartor

This comment has been minimized.

Contributor

samsartor commented Mar 22, 2018

@nikomatsakis Sorry about the silence on this, school's been keeping me busy.

Any overloaded deref calls can not be moved outside of the closure. The plan is to desugar || foo.a into Closure0 { &foo.a } instead of Closure0 { &foo }. However, we can not desugar || boxed_foo.a into Closure0 { &boxed_foo.a }. This is the same as desugaring || boxed_foo.deref().a into Closure0 { &boxed_foo.deref().a }. It changes when <Box as Deref>::deref is called. Rust functions have side-effects, so we can't randomly change when they fire.

My understanding is that overloaded derefs generate an additional statement in MIR. Unlike boxed_foo.a, simple_foo.a is totally pure with no side effects, and exists in MIR as only an lvalue. That is what I meant by a "true lvalue". What is the correct word for this?

@Boscop

This comment has been minimized.

Boscop commented Mar 22, 2018

I'm looking forward to having this automatic/implicit disjoint capturing in the language because I have to do it manually very often..

@samsartor
Shouldn't any sane impl of Deref have no side effects?
(Or if it has side effects, the author should ensure that those side effects are themselves "sane" and can be executed in any situation.)
I think it should be acceptable to auto deref in this case.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Mar 22, 2018

@Boscop Yes, but we can't guarantee that there are no side effects, and rust is a language of guarantees. I am mostly fine with partially pre-evaluating closures in theory, but it would be a breaking change.

More importantly, precalling deref would not make this RFC more useful. Paths that include an overriden deref are not disjoint (Deref::deref borrows self). So there's no point in splitting those borrows past the deref in the first place.

@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Jul 29, 2018

@samsartor move closures always capture by move today, and will continue to do so under this proposal.

In the example you referenced:

let foo = &mut a;
let other = &mut foo.other;
move || &mut foo.bar; // cannot move `foo` into closure because it is borrowed
somefunc(other);

Previously, the compiler would attempt to move the entirety of foo into the closure, and this would fail due to the conflicting borrow.

With this proposal implemented, the compiler would try to move just foo.bar into the closure, which should succeed assuming foo can be destructured. At no point is the compiler capturing by reference, or even capturing a reference by value.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Jul 30, 2018

@eddyb @Diggsey I think I see the issue here! Consider this example again:

let foo = &mut a;
let other = &mut foo.other;
move || &mut foo.bar; // cannot move `foo` into closure because it is borrowed
somefunc(other);

The value of foo is a reference (&mut a), which is moved into the closure. It isn't dereferenced, just moved as it is. This is backed up by experimentation. Now, since only foo.bar is used, we want to make the capture more minimal. Since it is a move closure, we might be tempted to move foo.bar, as suggested. Unfortunately, this would be a "move out of borrowed context" since foo is still a reference.

Instead, we have to either continue capturing all of foo or capture foo.bar by reference. Capturing it by reference may seem strange in this case (after all, doesn't a move closure, by definition, only capture by value?) but doing so won't break existing code. Since foo was always a reference, the closure can't have owned foo.bar in the first place. And the lifetime of foo.bar is the same as the lifetime of foo, which the closure is already limited to.

Remember, the closure didn't capture a, it captured foo, which is a reference to a.

Also, this is more consistent with the non-move case. if you have some closure which borrows parts of &self (the original motivating case for this RFC), making that closure move shouldn't suddenly break all the partial borrows of &self.

@eddyb

This comment has been minimized.

Member

eddyb commented Jul 30, 2018

@samsartor Ah I see the confusion. You want to do a capture of (*foo).bar that doesn't interfere with any other (*foo).not_bar. I'm not even sure move is that relevant here.
Does this RFC as-is allow disjoint capture of (*foo).bar? I will admit I have not considered it.

Based on #2229 (comment) I'm guessing @nikomatsakis has thought about it.

In that case, I think I'm okay with a move closure capturing a foo: &mut T to actually reborrow it, i.e. "capture *foo by reference" - which is already a concept we don't have, and this RFC introduces sort of implicitly. Maybe it should be split out?

We can make this example work without anything field-specific:

fn main() {
    let x = &mut 0;
    (move || *x += 1)();
    println!("{}", x); // use of moved value: `x`
}

(in fact, reborrow-before-move/one-level-of-deref should be easier to deal with than fields)

Once move closures get reborrows of mutable references (as opposed to actually moving the originals), I don't think they should be any different wrt fields than by-reference captures.

@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Jul 30, 2018

@samsartor ah, I see.

Extending this to Box seems inconsistent with the way the borrow checker currently works:

http://play.rust-lang.org/?gist=fa206dc89bc41381ef100ecf4b397553&version=nightly&mode=debug&edition=2015

Re-borrowing a sub-field in a move closure does seem slightly confusing, but probably OK because you are originally moving a reference, so it's not too different.

@eddyb

This comment has been minimized.

Member

eddyb commented Jul 30, 2018

@Diggsey Note that Box is intentionally neutered to allow as many choices as possible for when we eventually allow applying that "deref to owned contents" behavior on arbitrary types.

@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Jul 30, 2018

@eddyb sure, but I still think it would be inconsistent to start allowing it in this new feature, without "un-neutering" it in the base case.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Jul 30, 2018

@eddyb

Ah I see the confusion. You want to do a capture of (*foo).bar that doesn't interfere with any other (*foo).not_bar. I'm not even sure move is that relevant here.

Yes, exactly! move isn't really relevant at all. It's just behavior inherited from non-move closures. I always wanted that to work, but hadn't really figured it out properly until now.

The way I see it, pre-borrowing a binding is a way to force a move closure to behave like a nonmove closure in respect to that binding. At least that's how I've thought about it while coding.

which is already a concept we don't have, and this RFC introduces sort of implicitly

It does need to be written out properly, which is why this discussion is happening in the first place!

Maybe it should be split out?

I don't think so. Again, I see this as something shared between both types of closures. Having borrow splitting work in one but not the other seems frustrating. I'm really picturing the &self case here.

@Diggsey

Extending this to Box seems inconsistent with the way the borrow checker currently works

You are right! I didn't really intend that. To be honest, I just assumed that Box was more special than it really is without checking. Obviously Box will need to be "neutered" in this case too. Still, it's a useful case to consider, and is still different even once neutered. I'll update the examples at the next opportunity.

@rfcbot

This comment has been minimized.

rfcbot commented Aug 6, 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Aug 6, 2018

Hmm, the interaction of move with partial borrows does seem somewhat confusing to me.

I think that the right way to think of move is really as "detach" -- that is, a move || closure is detached from its parent stack frame, and hence may be returned, used to spawn a thread, etc. Non-detached closures (that is, the default) may retain links into the parent stack which would prevent that (though they aren't required to do so).

In this case, yes, a closure can capture (*foo).bar by reference and still be detached from the parent stack frame (in NLL terms, it is not an error to have StorageDead(foo) while *foo is borrowed). I'm not sure of the details here though: I would definitely like that, before we stabilize, we try to write out in some form of "language docs" what algorithm we will ultimately use to decide what paths get captured.

@Centril

This comment has been minimized.

Contributor

Centril commented Aug 6, 2018

@eddyb @nikomatsakis is this RFC ready to be merged as-is?

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Aug 7, 2018

@Centril personally, I think so. At this point, what remains is mostly "work out the nitty gritty details"

@samsartor

This comment has been minimized.

Contributor

samsartor commented Aug 7, 2018

@nikomatsakis I like the phrasing of move as detach. Do you want me to write that out in the RFC text before the merge button gets pressed? I have WiFi for a few more hours today, before I head back up into the mountains.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Aug 7, 2018

@nikomatsakis @eddyb I was in the process of writing up move-as-detach, just for kicks, but I don't think it is general enough. Consider this case, which is allowed by today's compiler.

fn hello() {
    let vec = vec![1, 2, 3];
    let v = &vec;
    let foo = move || v[0];
}

The closure foo is valid, but not actually detachable from the parent stack frame. Attempting to return it raises the error that v does not live long enough. Even making vec a Copy type does not change the fact that v is not dereferenced when constructing the closure. It is simply not detachable, despite being move. This is the same sort of case we have been debating all week.

I'm thinking instead about it as a sort of decision. When lowering from HIR to MIR (which we will assume is prior to borrowck), we have to decide what kind of value each capture expression will return. In a non-move closure, the value should always be a reference, unless the closure requires ownership. In a move closure, the value should always be owned, unless ownership is not available (e.g. in the above case).

If we decide to produce a reference, then the borrow is split as possible. If we decide to produce a value, then structs will be destructed as possible.

How does that sound? I'm going to go ahead and add it to the text, since I am going to be away from the internet until Thursday. I'll be able to respond to any new concerns then.

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Aug 9, 2018

@samsartor

I was in the process of writing up move-as-detach, just for kicks, but I don't think it is general enough.

I'm not sure quite what you mean. In any case, it is as @eddyb said -- the move keyword means that the compiler doesn't create references into the enclosing stack frame. But you it does not mean that no such references exist -- just that the compiler hasn't added any. It is tricky to phrase it properly.

In a move closure, the value should always be owned, unless ownership is not available (e.g. in the above case).

This isn't really how I think about it. Put another way, in your example, you do own the value v -- it just happens that what you own is a reference.

That said, the example you gave is somewhat complicated by the use of indexing. I assume we will not capture "some random index", since we don't suport moving out. Perhaps a better example is:

fn hello() {
    let tuple = (1, 2, 3);
    let p = &tuple;
    let foo = move || p.0;
}

Does this capture p.0? In this case, we could (as it is Copy). But often we can't.

I guess I still feel like we should move on from the "RFC" period -- because I think we really want to solve this problem and we should move on to implementing and exploring the solutiosn -- but I do think there is a lot of "nitty gritty" to work out here. I worry about the rules being quite hard to follow. :/

@samsartor

This comment has been minimized.

Contributor

samsartor commented Aug 9, 2018

@nikomatsakis The way I see it, that example would probably capture a reference to p.0, and then copy when the closure is evaluated. We could do it differently, capturing the actual integer value, since the tuple is of Copy types. However, if the tuple was of non Copy types, then the generated MIR would certainly borrow p.0, and then the closure body would attempt to "move out of borrowed context". We have to lower to MIR successfully even in invalid cases because of NLL.

I don't like saying that "a move closure simply doesn't create new references" because the borrow of p.0 above is, in a sense, a new reference. It isn't that we moved p, we have literally generated a new pointer offset from p (ignoring the planned memory optimization stuff).

Also, in my example, the indexing isn't relevant at all. It's just there to make the closure a little less boringly trivial.

Really, it's all just semantics at this point. We all intuitivly know how this should go, and I agree that we really just need to get some code down and see how it works in practice. Are you happy with the current text? I can change anything today if you want. I would be nice to see this merged in the next day or so.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Aug 9, 2018

@nikomatsakis I think I see what is going on here!

This isn't really how I think about it. Put another way, in your example, you do own the value v -- it just happens that what you own is a reference.

You are right of course. However, this RFC is about how captures are split up, made more minimal, so that closures work better alongside mutable references and partial moves. How we split up a capture does not depend on wether the closure is move or not, only on wether the capture is a reference or not.

It only makes sense to split a reference by borrowing fields. It only makes sense to split a value by moving fields.

If a move closure captures any kind reference, then we split by reference. If a non-move closure is forced to capture a value, then we split by value (that is, destructure).

Sure, strictly speaking, that move closure is moving the reference. But it's still a reference, so it is split up by reborrowing. New borrows are created.

Am I making any sense?

@Centril

This comment has been minimized.

Contributor

Centril commented Aug 9, 2018

Whenever you want this RFC merged, please write a comment ccing me saying so :)

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Aug 10, 2018

@samsartor interesting. I think that is going in the right direction, yes. You can think of it perhaps as a kind of iterative refinement. Let's say we start out capturing all of the local variables. Then we iteratively make those paths more precise based on which paths are used in the closure and how.

A capture of some place P will get refined to a capture of *P or P.f -- and this can happen many times, until we either reach a place that cannot be refined (because, say, it implements drop) or until we reach a path that is actually directly used in the closure. And yes whenever we refine a path that has type &T or &mut T to something more specific, we will wind up capturing a reference to that final path.

Seems to make sense.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Aug 10, 2018

@nikomatsakis YES! Looks like we are on exactly the same page.
@Centril I'm ready for this to be merged.

@samsartor

This comment has been minimized.

Contributor

samsartor commented Aug 16, 2018

@nikomatsakis @Centril Ping again. Can we merge this now? It sounds like everyone is happy.

@Centril

This comment has been minimized.

Contributor

Centril commented Aug 16, 2018

@samsartor apparently your first ping didn't reach me :/ I'll merge this sometime today :)

@Centril Centril merged commit 16ea7f6 into rust-lang:master Aug 19, 2018

@Centril

This comment has been minimized.

Contributor

Centril commented Aug 19, 2018

Huzzah! This RFC has been merged!

Tracking issue: rust-lang/rust#53488

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment