New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Single-entry / multiple-exit regions for borrows #396

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
@zwarich
Copy link
Contributor

zwarich commented Oct 14, 2014

@zwarich

This comment has been minimized.

Copy link
Contributor

zwarich commented Oct 14, 2014

from the function entry to V contains Entry.

2. If V is in R and P is a path from Entry to V that does not leave R and
reenter through Entry, then every vertex of P is in R.

This comment has been minimized.

@kballard

kballard Oct 14, 2014

Contributor

This sounds tautological. "If [...] P is a path [...] that does not leave R [...], then every vertex of P is in R". Surely the condition "P is a path [...] that does not leave R" is identical to "P is a path where every vertex of P is in R", which makes this "If P is a path where every vertex of P is in R, then every vertex of P is in R".

This comment has been minimized.

@zwarich

zwarich Oct 14, 2014

Contributor

I first came up with the more technical definition below and then tried to make a plain English counterpart.

What I was attempting to capture was the part about reentering through Entry. For example, in a program fragment like

loop {
    [Entry]
    [Exit]
}

there is allowed to be a path from Entry to Exit, back around the outer loop that reenters Entry, without all of the vertices on that path being in the region. On second thought, it seems that this would be fine:

If V is in R and P is a path from Entry to V that does not contain Entry multiple times, then every vertex of P is in R.

This comment has been minimized.

@kballard

kballard Oct 14, 2014

Contributor

The idea of counting uses of Entry seems a little odd. Could you not use something like the following:

For every pair of vertices V and U in R, every path from V to U that contains vertices not in R also contains Entry.

This comment has been minimized.

@zwarich

zwarich Oct 14, 2014

Contributor

That seems equivalent and a bit simpler, so I'll go with that.

This comment has been minimized.

@kballard

kballard Oct 14, 2014

Contributor

Or the more general definition, which subsumes the first condition:

For all vertices V in R, for all vertices U not in R, every path from U to V contains Entry.

Instead of evaluating paths only from the CFG entry to V, evaluating from all nodes not in R allows you to avoid doing any counting.

@steveklabnik

This comment has been minimized.

Copy link
Member

steveklabnik commented Oct 15, 2014

You shouldn't need indentation AND fences, just use fences.

@zwarich

This comment has been minimized.

Copy link
Contributor

zwarich commented Oct 15, 2014

@steveklabnik You didn't leave a line comment, but if you're referring to what I think you're referring to then it was necessary to use both indentation and fences in order to keep the code block indented to the same level as the rest of the list item.

@kballard

This comment has been minimized.

Copy link
Contributor

kballard commented Oct 15, 2014

On that note, line 29 is indented by a single space, which is a little confusing. It does seem to result in a paragraph within the list item, but it does make the following code block at line 33 appear to be incorrectly indented. Other paragraphs in these list items are also similarly indented by a single space. They should probably be indented by a standard amount; either 4 spaces, if that's truly necessary for the code block, or 3 spaces if that works, to line up with the first paragraph in the list item.

@zwarich

This comment has been minimized.

Copy link
Contributor

zwarich commented Oct 15, 2014

@kballard As far as I can tell, the output looks correct, so I'm not really inclined to be overly pedantic regarding Markdown formatting when Markdown is so ill-defined to begin with.

@gulbanana

This comment has been minimized.

Copy link

gulbanana commented Oct 16, 2014

Excellent idea, this extends checkability to some major cases with a sound foundation. I've got a few questions & will try to figure out how to do the line-comments thing

let mut i = 0i;
let p = &mut i;
let q: &mut int;

This comment has been minimized.

@gulbanana

gulbanana Oct 16, 2014

it's unclear to me what this fragment is demonstrating. q is an immutable binding, so it has to be initialised once-and-only-once subsequently, right? is the idea that MEME would allow the q= lines in each branch to borrowcheck because they're each an entry to the same region (and each of the drop(p) is an exit from the previous region?)

This comment has been minimized.

@zwarich

zwarich Oct 17, 2014

Contributor

Yes, that's the scenario I'm describing. It is initialized in two different program locations, but only once on every path through the program.

This comment has been minimized.

@gulbanana

gulbanana Oct 17, 2014

ok, that makes sense

}
```

This doesn't really seem useful enough to warrant the complexity. It would also

This comment has been minimized.

@gulbanana

gulbanana Oct 16, 2014

if so, i agree that it's a niche usage compared to what SEME enables.

we would have to track exit edges rather than exit vertices), and unfortunately
it becomes very difficult to compute the lattice operations given a
representation of two regions in terms of entry and exit points. The problem is
that when shrinking or extending a region, there is no easy way to compute the

This comment has been minimized.

@gulbanana

gulbanana Oct 16, 2014

is shrinking/extending part of the lifetime analysis implementation in rustc? if it was instead implemented to gather entry/exit points then solve them all at once, would this make the trivial extension (defining exits as graph edges, rather than dominated vertices) feasible?

This comment has been minimized.

@zwarich

zwarich Oct 17, 2014

Contributor

Yes, the lifetime inference algorithm requires extending regions to include a new point, taking meet and join of regions, etc. Any lifetime inference algorithm based on any of the standard approaches to inference with subtyping will require this.

reformulate it to make it more palatable. We say that a vertex V *dominates* a
vertex W if every path from the CFG entry to W goes through V, and that it
strictly dominates W if V is distinct from W. By induction, every vertex V other
than the CFG entry has an *immediate dominator* that does not dominate any other

This comment has been minimized.

@gulbanana

gulbanana Oct 16, 2014

you mean "does not strictly dominate", right?

This comment has been minimized.

@zwarich

zwarich Oct 17, 2014

Contributor

It's sort of implied, since I say 'does not dominate any other...', but maybe it would be more clear the way that you suggest.

If R is a SEME region rooted at the borrow used in the definition of `p` that
contains the entry point of the loop, then it must contain the entire loop. This
is because in the loop backedge V -> W, V doesn't strictly dominate W. You can
use similar reasoning to inductively show that the entire loop is contained in R.

This comment has been minimized.

@gulbanana

gulbanana Oct 16, 2014

this would be clearer if it noted that this is for nodes W where the node is part of the other three lines in the loop

We can then reformulate the definition above to say that R is a *single-entry /
multiple-exit region* if the following conditions hold:

1. R is a subtree (i.e. a connected subgraph) of the dominator tree.

This comment has been minimized.

@gulbanana

gulbanana Oct 16, 2014

this is a very powerful idea. treating regions as subgraphs of the dominator graph could open up other optimisations and analyses - i think it would be worthwhile as a representation even without this specific motivation.


# Unresolved questions

Does there exist an efficient sparse representation for SEME regions?

This comment has been minimized.

@gulbanana

gulbanana Oct 16, 2014

half-formed representation ideas:

  • tree branch-choice lists, ala huffman coding
  • pointers back to the control flow graph for each exit, tagged with their common "up-tree" nodes (the greatest common of which would be the entry)
  • take advantage of the fact that only certain CFG node types can be region entry/exit points and thin out the dominator graph itself!
a.foo(bar, baz);
```

All this is doing is making the evaluation of temporaries explicit, so it

This comment has been minimized.

@huonw

huonw Oct 16, 2014

Member

It's possibly changing the order of evaluation, e.g. changing a.foo().bar(a.baz()) to let tmp = a.baz(); a.foo().bar(tmp) runs things in different orders.

(It may be necessary to change it in this manner, since foo could return a mutable borrow of a.)

This comment has been minimized.

@eddyb

eddyb Oct 17, 2014

Member

I'm not sure the actual problem is presented properly, maybe @nikomatsakis could confirm this, but my intuition is that the auto-borrow of an lvalue receiver can be delayed to after the evaluation of the arguments, without breaking backwards compatibility or creating unsoundness.

This comment has been minimized.

@zwarich

zwarich Oct 17, 2014

Contributor

If you have to call deref_mut in order to borrow the receiver, then changing it from its current order will technically be backwards incompatible. Whether this ordering matters in practice is anyone's guess. Like destructor ordering of temporaries, it may be something that can be slightly tweaked in the future without breaking anyone's code.

You could do something crazy like only pick a different order in the case where the program would've failed to compile under the current rules, but then you're adding an unpredictable inconsistency for the sake of strict backwards compatibility.

I realized this after I posted this RFC, and was already planning on updating the RFC soon to add a section that accounts for this. There isn't a great description of the current behavior besides the implementation.

This comment has been minimized.

@eddyb

eddyb Oct 17, 2014

Member

Calling deref_mut would be auto-deref, not auto-borrow (which produces a &T or &mut T rvalue from a T value), and given proper desugaring, analysis passes would see x.deref_mut().foo(x.deref().bar()).

This comment has been minimized.

@zwarich

zwarich Oct 18, 2014

Contributor

The program

struct A;
struct B {
    label: uint,
    a: A,
}

impl A {
    fn foo(&self, other1: (), other2: ()) { }
    fn bar(&self) { }
}

impl Deref<A> for B {
    fn deref(&self) -> &A {
        println!("deref: {}", self.label);
        &self.a
    }
}

fn main() {
    let b1 = B {
        label: 1,
        a: A,
    };
    let b2 = B {
        label: 2,
        a: A,
    };
    let b3 = B {
        label: 3,
        a: A,
    };

    b1.foo(b2.bar(), b3.bar());
}

currently prints

deref: 1
deref: 2
deref: 3

with the receiver being deref'd first. If we want to accept as many programs as possible then the order will have to be changed to evaluate the arguments first, changing the order in which deref_mut gets called on each of the structs.

This comment has been minimized.

@retep998

retep998 Dec 31, 2014

Member

I am in favor of having parameters evaluated before the receiver.

This comment has been minimized.

@eddyb

eddyb Jan 1, 2015

Member
 // doesn't work today
vec.push(vec[5])
 // making the receiver mut borrow explicit
Vec::push(&mut vec, *Index::index(&vec, &5))
// now let's use imaginary syntax for what happens in the compiler
lvalue vec = ...;
rvalue a = &mut vec;
rvalue b = &vec; // error, vec is mutably borrowed by x
rvalue c = &5
rvalue d = Index::index(b, c);
lvalue e = *d;
rvalue f = e; // copies the value from the vector
Vec::push(a, f);
// we can observe that evaluating `a` has no side-effect, thus
// it can be reordered after normal evaluation.
lvalue vec = ...;
lvalue a = vec;
rvalue b = &vec; // immutable borrow of vec starts
rvalue c = &5
rvalue d = Index::index(b, c);
lvalue e = *d;
rvalue f = e; // copies the value from the vector
// the borrow of vec through b ends here
rvalue a' = &mut a;
rvalue f' = f;
Vec::push(a', f');

This comment has been minimized.

@viesturz

viesturz Jan 1, 2015

Is there a reason you want to go with this specific evaluation order? A Google search did not produce any documentation on this.

Your workaround would work for simple cases. It will break apart unpredictably at more complex cases. Such unpredictably would really frustrate the users.
What about debug mode with optimizations disabled? Would it still reorder stuff?

What to do if you need a mutable borrow to evaluate parameters:

obj.paint(obj.x, obj.y, obj.getCachedOrDownloadImage())

Or a mutable borrow to get to the function to call:

obj.getSomethingMutablyBorrowsObj().someFunction(obj.name)  

To generalize self is just another parameter. If a parameter needs to borrow obj after a mutable borrow is done, there is no way out.

We can either define an order that would work for most cases or leave it undefined and let compiler to figure out if there is a evaluation order that complies with borrow restrictions.

It's much more likely that self will need a mutable borrow of obj than the other parameters.
Thus I'm for evaluating self last to allow more code to work.

This comment has been minimized.

@eddyb

eddyb Jan 1, 2015

Member

My ordering would allow strictly more programs to compile, it doesn't affect side effects, and it's quite deterministic.
Evaluating the receiver and the auto-borrow on it are not coupled. The evaluation includes side effects and reodering it would be disastrous, while an lvalue auto-borrow (and even an explicit borrow) can be placed anywhere between the evaluation of the lvalue and when it is actually needed (i.e. the argument is required for the call).
Moving the borrows later effectively shrinks the duration of the borrow, potentially removing overlaps with other borrows, but never adding new overlaps.

On top of that, we can also reorder just the evaluation of the autoderef on the receiver, not the entire receiver, spec'ing it as being evaluated as part of the second pass over the arguments. Even if restricting Deref impls to have no side-effects (using an unsafe trait) to get this behavior is doable, I don't think it's necessary.

This comment has been minimized.

@viesturz

viesturz Jan 1, 2015

Okay, this is getting out of my comfort zone. Possibly you are right.

Let me elaborate on the last example a bit:

obj.getSomethingMutablyBorrowsObj().someFunction(obj.name)  

This can be rewritten as

let &mut  foo = obj.getSomethingMutablyBorrowsObj();
foo.someFunction(obj.name);

And if getSomethingMutablyBorrowsObj is something like this:

 fn getSomethingMutablyBorrowsObj(&'a mut self) -> &'a mut Something {
    self.a ? self.obj1 : self.obj2
 }

I understand that obj is mutably borrowed for lifetime of foo. The receiver in this case is foo and borrowing of obj has already happened while evaluating getSomething.

@Ericson2314

This comment has been minimized.

Copy link
Contributor

Ericson2314 commented Oct 22, 2014

I imagine this would impact the static/dynamic drop debate?

@tikue

This comment has been minimized.

Copy link

tikue commented Jan 19, 2015

Any status on this? Because this affects a lot of common code and could have a great impact on how beginners of Rust perceive the language, it seems prudent to introduce any required backwards-incompatible changes before 1.0.

@zwarich zwarich removed their assignment Jan 31, 2015

@zwarich

This comment has been minimized.

Copy link
Contributor

zwarich commented Jan 31, 2015

@Ericson2314 This shouldn't impact the static / dynamic drop debate. It will allow more moves due to borrow scopes being more precise, however.

@tikue I am no longer involved in Rust development, but it is my understanding that this is not going to happen for 1.0. To me it seems that the evaluation order issue is backwards incompatible, but @nikomatsakis has mentioned that he thinks he has a way to resolve this; I don't know what it is, but it might be along the lines of what @eddyb discusses above.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Feb 5, 2015

We'd love to have this, but we do not think we can spend time working on it before the release.

Postponing

@pnkfelix pnkfelix closed this Feb 5, 2015

@pnkfelix pnkfelix added the postponed label Feb 5, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment