Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upRFC: Single-entry / multiple-exit regions for borrows #396
Conversation
This comment has been minimized.
This comment has been minimized.
lilyball
reviewed
Oct 14, 2014
| from the function entry to V contains Entry. | ||
|
|
||
| 2. If V is in R and P is a path from Entry to V that does not leave R and | ||
| reenter through Entry, then every vertex of P is in R. |
This comment has been minimized.
This comment has been minimized.
lilyball
Oct 14, 2014
Contributor
This sounds tautological. "If [...] P is a path [...] that does not leave R [...], then every vertex of P is in R". Surely the condition "P is a path [...] that does not leave R" is identical to "P is a path where every vertex of P is in R", which makes this "If P is a path where every vertex of P is in R, then every vertex of P is in R".
This comment has been minimized.
This comment has been minimized.
zwarich
Oct 14, 2014
Author
Contributor
I first came up with the more technical definition below and then tried to make a plain English counterpart.
What I was attempting to capture was the part about reentering through Entry. For example, in a program fragment like
loop {
[Entry]
[Exit]
}there is allowed to be a path from Entry to Exit, back around the outer loop that reenters Entry, without all of the vertices on that path being in the region. On second thought, it seems that this would be fine:
If V is in R and P is a path from Entry to V that does not contain Entry multiple times, then every vertex of P is in R.
This comment has been minimized.
This comment has been minimized.
lilyball
Oct 14, 2014
Contributor
The idea of counting uses of Entry seems a little odd. Could you not use something like the following:
For every pair of vertices V and U in R, every path from V to U that contains vertices not in R also contains Entry.
This comment has been minimized.
This comment has been minimized.
zwarich
Oct 14, 2014
Author
Contributor
That seems equivalent and a bit simpler, so I'll go with that.
This comment has been minimized.
This comment has been minimized.
lilyball
Oct 14, 2014
Contributor
Or the more general definition, which subsumes the first condition:
For all vertices V in R, for all vertices U not in R, every path from U to V contains Entry.
Instead of evaluating paths only from the CFG entry to V, evaluating from all nodes not in R allows you to avoid doing any counting.
This comment has been minimized.
This comment has been minimized.
|
You shouldn't need indentation AND fences, just use fences. |
This comment has been minimized.
This comment has been minimized.
|
@steveklabnik You didn't leave a line comment, but if you're referring to what I think you're referring to then it was necessary to use both indentation and fences in order to keep the code block indented to the same level as the rest of the list item. |
This comment has been minimized.
This comment has been minimized.
|
On that note, line 29 is indented by a single space, which is a little confusing. It does seem to result in a paragraph within the list item, but it does make the following code block at line 33 appear to be incorrectly indented. Other paragraphs in these list items are also similarly indented by a single space. They should probably be indented by a standard amount; either 4 spaces, if that's truly necessary for the code block, or 3 spaces if that works, to line up with the first paragraph in the list item. |
nrc
assigned
zwarich
Oct 15, 2014
This comment has been minimized.
This comment has been minimized.
|
@kballard As far as I can tell, the output looks correct, so I'm not really inclined to be overly pedantic regarding Markdown formatting when Markdown is so ill-defined to begin with. |
ntrel
referenced this pull request
Oct 15, 2014
Closed
borrow scopes should not always be lexical #6393
This comment has been minimized.
This comment has been minimized.
gulbanana
commented
Oct 16, 2014
|
Excellent idea, this extends checkability to some major cases with a sound foundation. I've got a few questions & will try to figure out how to do the line-comments thing |
gulbanana
reviewed
Oct 16, 2014
| let mut i = 0i; | ||
| let p = &mut i; | ||
| let q: &mut int; |
This comment has been minimized.
This comment has been minimized.
gulbanana
Oct 16, 2014
it's unclear to me what this fragment is demonstrating. q is an immutable binding, so it has to be initialised once-and-only-once subsequently, right? is the idea that MEME would allow the q= lines in each branch to borrowcheck because they're each an entry to the same region (and each of the drop(p) is an exit from the previous region?)
This comment has been minimized.
This comment has been minimized.
zwarich
Oct 17, 2014
Author
Contributor
Yes, that's the scenario I'm describing. It is initialized in two different program locations, but only once on every path through the program.
This comment has been minimized.
This comment has been minimized.
gulbanana
reviewed
Oct 16, 2014
| } | ||
| ``` | ||
|
|
||
| This doesn't really seem useful enough to warrant the complexity. It would also |
This comment has been minimized.
This comment has been minimized.
gulbanana
reviewed
Oct 16, 2014
| we would have to track exit edges rather than exit vertices), and unfortunately | ||
| it becomes very difficult to compute the lattice operations given a | ||
| representation of two regions in terms of entry and exit points. The problem is | ||
| that when shrinking or extending a region, there is no easy way to compute the |
This comment has been minimized.
This comment has been minimized.
gulbanana
Oct 16, 2014
is shrinking/extending part of the lifetime analysis implementation in rustc? if it was instead implemented to gather entry/exit points then solve them all at once, would this make the trivial extension (defining exits as graph edges, rather than dominated vertices) feasible?
This comment has been minimized.
This comment has been minimized.
zwarich
Oct 17, 2014
Author
Contributor
Yes, the lifetime inference algorithm requires extending regions to include a new point, taking meet and join of regions, etc. Any lifetime inference algorithm based on any of the standard approaches to inference with subtyping will require this.
gulbanana
reviewed
Oct 16, 2014
| reformulate it to make it more palatable. We say that a vertex V *dominates* a | ||
| vertex W if every path from the CFG entry to W goes through V, and that it | ||
| strictly dominates W if V is distinct from W. By induction, every vertex V other | ||
| than the CFG entry has an *immediate dominator* that does not dominate any other |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
zwarich
Oct 17, 2014
Author
Contributor
It's sort of implied, since I say 'does not dominate any other...', but maybe it would be more clear the way that you suggest.
gulbanana
reviewed
Oct 16, 2014
| If R is a SEME region rooted at the borrow used in the definition of `p` that | ||
| contains the entry point of the loop, then it must contain the entire loop. This | ||
| is because in the loop backedge V -> W, V doesn't strictly dominate W. You can | ||
| use similar reasoning to inductively show that the entire loop is contained in R. |
This comment has been minimized.
This comment has been minimized.
gulbanana
Oct 16, 2014
this would be clearer if it noted that this is for nodes W where the node is part of the other three lines in the loop
gulbanana
reviewed
Oct 16, 2014
| We can then reformulate the definition above to say that R is a *single-entry / | ||
| multiple-exit region* if the following conditions hold: | ||
|
|
||
| 1. R is a subtree (i.e. a connected subgraph) of the dominator tree. |
This comment has been minimized.
This comment has been minimized.
gulbanana
Oct 16, 2014
this is a very powerful idea. treating regions as subgraphs of the dominator graph could open up other optimisations and analyses - i think it would be worthwhile as a representation even without this specific motivation.
gulbanana
reviewed
Oct 16, 2014
|
|
||
| # Unresolved questions | ||
|
|
||
| Does there exist an efficient sparse representation for SEME regions? |
This comment has been minimized.
This comment has been minimized.
gulbanana
Oct 16, 2014
half-formed representation ideas:
- tree branch-choice lists, ala huffman coding
- pointers back to the control flow graph for each exit, tagged with their common "up-tree" nodes (the greatest common of which would be the entry)
- take advantage of the fact that only certain CFG node types can be region entry/exit points and thin out the dominator graph itself!
huonw
reviewed
Oct 16, 2014
| a.foo(bar, baz); | ||
| ``` | ||
|
|
||
| All this is doing is making the evaluation of temporaries explicit, so it |
This comment has been minimized.
This comment has been minimized.
huonw
Oct 16, 2014
Member
It's possibly changing the order of evaluation, e.g. changing a.foo().bar(a.baz()) to let tmp = a.baz(); a.foo().bar(tmp) runs things in different orders.
(It may be necessary to change it in this manner, since foo could return a mutable borrow of a.)
This comment has been minimized.
This comment has been minimized.
eddyb
Oct 17, 2014
Member
I'm not sure the actual problem is presented properly, maybe @nikomatsakis could confirm this, but my intuition is that the auto-borrow of an lvalue receiver can be delayed to after the evaluation of the arguments, without breaking backwards compatibility or creating unsoundness.
This comment has been minimized.
This comment has been minimized.
zwarich
Oct 17, 2014
Author
Contributor
If you have to call deref_mut in order to borrow the receiver, then changing it from its current order will technically be backwards incompatible. Whether this ordering matters in practice is anyone's guess. Like destructor ordering of temporaries, it may be something that can be slightly tweaked in the future without breaking anyone's code.
You could do something crazy like only pick a different order in the case where the program would've failed to compile under the current rules, but then you're adding an unpredictable inconsistency for the sake of strict backwards compatibility.
I realized this after I posted this RFC, and was already planning on updating the RFC soon to add a section that accounts for this. There isn't a great description of the current behavior besides the implementation.
This comment has been minimized.
This comment has been minimized.
eddyb
Oct 17, 2014
Member
Calling deref_mut would be auto-deref, not auto-borrow (which produces a &T or &mut T rvalue from a T value), and given proper desugaring, analysis passes would see x.deref_mut().foo(x.deref().bar()).
This comment has been minimized.
This comment has been minimized.
zwarich
Oct 18, 2014
Author
Contributor
The program
struct A;
struct B {
label: uint,
a: A,
}
impl A {
fn foo(&self, other1: (), other2: ()) { }
fn bar(&self) { }
}
impl Deref<A> for B {
fn deref(&self) -> &A {
println!("deref: {}", self.label);
&self.a
}
}
fn main() {
let b1 = B {
label: 1,
a: A,
};
let b2 = B {
label: 2,
a: A,
};
let b3 = B {
label: 3,
a: A,
};
b1.foo(b2.bar(), b3.bar());
}currently prints
deref: 1
deref: 2
deref: 3
with the receiver being deref'd first. If we want to accept as many programs as possible then the order will have to be changed to evaluate the arguments first, changing the order in which deref_mut gets called on each of the structs.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
eddyb
Jan 1, 2015
Member
// doesn't work today
vec.push(vec[5])
// making the receiver mut borrow explicit
Vec::push(&mut vec, *Index::index(&vec, &5))
// now let's use imaginary syntax for what happens in the compiler
lvalue vec = ...;
rvalue a = &mut vec;
rvalue b = &vec; // error, vec is mutably borrowed by x
rvalue c = &5
rvalue d = Index::index(b, c);
lvalue e = *d;
rvalue f = e; // copies the value from the vector
Vec::push(a, f);
// we can observe that evaluating `a` has no side-effect, thus
// it can be reordered after normal evaluation.
lvalue vec = ...;
lvalue a = vec;
rvalue b = &vec; // immutable borrow of vec starts
rvalue c = &5
rvalue d = Index::index(b, c);
lvalue e = *d;
rvalue f = e; // copies the value from the vector
// the borrow of vec through b ends here
rvalue a' = &mut a;
rvalue f' = f;
Vec::push(a', f');
This comment has been minimized.
This comment has been minimized.
viesturz
Jan 1, 2015
Is there a reason you want to go with this specific evaluation order? A Google search did not produce any documentation on this.
Your workaround would work for simple cases. It will break apart unpredictably at more complex cases. Such unpredictably would really frustrate the users.
What about debug mode with optimizations disabled? Would it still reorder stuff?
What to do if you need a mutable borrow to evaluate parameters:
obj.paint(obj.x, obj.y, obj.getCachedOrDownloadImage())
Or a mutable borrow to get to the function to call:
obj.getSomethingMutablyBorrowsObj().someFunction(obj.name)
To generalize self is just another parameter. If a parameter needs to borrow obj after a mutable borrow is done, there is no way out.
We can either define an order that would work for most cases or leave it undefined and let compiler to figure out if there is a evaluation order that complies with borrow restrictions.
It's much more likely that self will need a mutable borrow of obj than the other parameters.
Thus I'm for evaluating self last to allow more code to work.
This comment has been minimized.
This comment has been minimized.
eddyb
Jan 1, 2015
Member
My ordering would allow strictly more programs to compile, it doesn't affect side effects, and it's quite deterministic.
Evaluating the receiver and the auto-borrow on it are not coupled. The evaluation includes side effects and reodering it would be disastrous, while an lvalue auto-borrow (and even an explicit borrow) can be placed anywhere between the evaluation of the lvalue and when it is actually needed (i.e. the argument is required for the call).
Moving the borrows later effectively shrinks the duration of the borrow, potentially removing overlaps with other borrows, but never adding new overlaps.
On top of that, we can also reorder just the evaluation of the autoderef on the receiver, not the entire receiver, spec'ing it as being evaluated as part of the second pass over the arguments. Even if restricting Deref impls to have no side-effects (using an unsafe trait) to get this behavior is doable, I don't think it's necessary.
This comment has been minimized.
This comment has been minimized.
viesturz
Jan 1, 2015
Okay, this is getting out of my comfort zone. Possibly you are right.
Let me elaborate on the last example a bit:
obj.getSomethingMutablyBorrowsObj().someFunction(obj.name)
This can be rewritten as
let &mut foo = obj.getSomethingMutablyBorrowsObj();
foo.someFunction(obj.name);
And if getSomethingMutablyBorrowsObj is something like this:
fn getSomethingMutablyBorrowsObj(&'a mut self) -> &'a mut Something {
self.a ? self.obj1 : self.obj2
}
I understand that obj is mutably borrowed for lifetime of foo. The receiver in this case is foo and borrowing of obj has already happened while evaluating getSomething.
This comment has been minimized.
This comment has been minimized.
|
I imagine this would impact the static/dynamic drop debate? |
alexcrichton
force-pushed the
rust-lang:master
branch
from
b9e2b8c
to
5020131
Oct 29, 2014
erickt
referenced this pull request
Dec 14, 2014
Merged
RFC: io and os reform: initial skeleton #517
Gankro
referenced this pull request
Jan 1, 2015
Closed
Should the mutable borrow of `self` in method call range over the arguments? #20403
This comment has been minimized.
This comment has been minimized.
tikue
commented
Jan 19, 2015
|
Any status on this? Because this affects a lot of common code and could have a great impact on how beginners of Rust perceive the language, it seems prudent to introduce any required backwards-incompatible changes before 1.0. |
zwarich
removed their assignment
Jan 31, 2015
This comment has been minimized.
This comment has been minimized.
|
@Ericson2314 This shouldn't impact the static / dynamic drop debate. It will allow more moves due to borrow scopes being more precise, however. @tikue I am no longer involved in Rust development, but it is my understanding that this is not going to happen for 1.0. To me it seems that the evaluation order issue is backwards incompatible, but @nikomatsakis has mentioned that he thinks he has a way to resolve this; I don't know what it is, but it might be along the lines of what @eddyb discusses above. |
This comment has been minimized.
This comment has been minimized.
|
We'd love to have this, but we do not think we can spend time working on it before the release. Postponing |
zwarich commentedOct 14, 2014
Rendered