RFC: deref coercions #241

Merged
merged 5 commits into from Jan 20, 2015

Member

aturon commented Sep 16, 2014

Add the following coercions:

  • From &T to &U when T: Deref<U>.
  • From &mut T to &U when T: Deref<U>.
  • From &mut T to &mut U when T: DerefMut<U>.

These coercions eliminate the need for "cross-borrowing" (things like &**v) and calls to as_slice.
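As a rough sketch, the three coercions look like this in present-day Rust, where `Deref` has since moved to an associated `Target` type rather than the `Deref<U>` form used in this thread. The helper functions `byte_len`, `sum` and `zero_first` are hypothetical and exist only for illustration:

```rust
// Hypothetical helpers; each takes a plain reference or slice.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn zero_first(xs: &mut [i32]) {
    if let Some(first) = xs.first_mut() {
        *first = 0;
    }
}

fn main() {
    let s = String::from("hello");
    let mut v = vec![1, 2, 3];

    // &String -> &str, because String derefs to str.
    assert_eq!(byte_len(&s), 5);

    // &mut Vec<i32> -> &[i32], because Vec<i32> derefs to [i32].
    assert_eq!(sum(&mut v), 6);

    // &mut Vec<i32> -> &mut [i32], because Vec<i32> implements DerefMut.
    zero_first(&mut v);
    assert_eq!(v, vec![0, 2, 3]);

    // The explicit cross-borrow that the coercion makes unnecessary.
    assert_eq!(byte_len(&*s), 5);
}
```

In every call the caller still writes `&` or `&mut`, so the borrow stays visible at the call site; only the deref steps are implicit.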

Rendered

Member Author

aturon commented Sep 16, 2014

Of course, method dispatch can implicitly execute code via `Deref`. But `Deref`
is a pretty specialized tool:

* Each type `T` can only deref to *one* other type.

sfackler Sep 16, 2014

Member

Is this a formal restriction or an informal one? AFAIK it's currently possible to do something like

trait Foo { ... }
impl Foo for int { ... }
impl Foo for uint { ... }

struct Bar;

impl<T> Deref<T> for Bar where T: Foo { ... }

aturon Sep 16, 2014

Author Member

Once associated types land, we'd make T an associated type, which would force it to be uniquely determined by the Self type.

That restriction is necessary for the coercion algorithm being proposed to work.
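For illustration, this is roughly how the associated-type form reads in today's Rust. The `Wrapper` type and `takes_str` function are hypothetical names, not part of the RFC:

```rust
use std::ops::Deref;

// Hypothetical wrapper type, used only for illustration.
struct Wrapper {
    inner: String,
}

impl Deref for Wrapper {
    // The target is an associated type, so `Wrapper` can name only one.
    type Target = String;

    fn deref(&self) -> &String {
        &self.inner
    }
}

fn takes_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let w = Wrapper { inner: String::from("abc") };
    // &Wrapper -> &String -> &str: each step has exactly one candidate.
    assert_eq!(takes_str(&w), 3);
}
```

A second `impl Deref for Wrapper` with a different `Target` is rejected as a conflicting implementation, so the chain of deref steps is fully determined by the starting type.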

aturon Sep 16, 2014

Author Member

@sfackler Updated the RFC to clarify.

This is a key difference from the
[cross-borrowing RFC](https://github.com/rust-lang/rfcs/pull/226).

### Limit implicitly execution of arbitrary code

pnkfelix Sep 16, 2014

Member

Typo: should be "implicit execution of" or "implicitly executed"

Member

pnkfelix commented Sep 16, 2014

Did any of the examples show an instance of the second bullet's coercion, namely "From &mut T to &U when T: Deref<U>"?

I ask because that is a case where something that looks like an ownership transfer (of the &mut itself) will actually not be so, unlike (I believe) the first and third bullets.

It could be that the trade off here is in the RFC's favor. I just want to make sure we show an example justifying the second bullet in particular.
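One possible example of the second bullet, written in present-day Rust syntax. The functions `largest` and `bump_all` are hypothetical and chosen only to show a `&mut Vec<u8>` being passed where a `&[u8]` is expected:

```rust
// Hypothetical read-only consumer.
fn largest(xs: &[u8]) -> Option<u8> {
    xs.iter().copied().max()
}

fn bump_all(v: &mut Vec<u8>) -> Option<u8> {
    for x in v.iter_mut() {
        *x += 1;
    }
    // `v` has type &mut Vec<u8>; it is coerced to &[u8] for this call.
    let m = largest(v);
    // The mutable reference is still usable afterwards:
    // it was reborrowed for the call, not moved.
    v.push(0);
    m
}

fn main() {
    let mut v = vec![1u8, 2, 3];
    assert_eq!(bump_all(&mut v), Some(4));
    assert_eq!(v, vec![2, 3, 4, 0]);
}
```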

```rust
let v = vec![0u8, 1, 2];
foo(v); // is v moved here?
bar(v); // is v still available?
```

nrc Sep 16, 2014

Member

My basic counter-argument to this is that the compiler tells you exactly this - if this is working code, then you know for sure that v is available in the call to bar, so you don't benefit from having to write &.

reem Oct 29, 2014

I think this is a fair assumption for experienced Rust developers, but this is a large disadvantage for people new to Rust, who will just be extremely confused when two function calls that look like they use v in exactly the same way actually use it completely differently.

proposes a more aggressive form of deref coercion: it would allow converting
e.g. `Box<T>` to `&T` and `Vec<T>` to `&[T]` directly. The advantage is even
greater convenience: in many cases, even `&` is not necessary. The disadvantage
is the change to local reasoning about ownership:

nrc Sep 16, 2014

Member

The primary advantage as I see it is that we continue to respect the level of indirection, we would never implicitly change it, whereas with this proposal we do.

fn use_nested(t: Rc<Box<T>>) {
use_ref(&**t); // what you have to write today
use_ref(&t); // what you'd be able to write (note: recursive deref)

nrc Sep 16, 2014

Member

This strikes me as a bit scary - you have to know which types implement Deref to know what type you're going to get.

pnkfelix Sep 16, 2014

Member

Are we expecting things besides smart pointer types to implement Deref? I thought that was the thrust of the argument regarding implicit calls to user defined code.

nrc Sep 16, 2014

Member

I realise that should generally just be 'smart pointers', but looking at what does already, that is not the case: there are also 'smart pointer helpers'. If we add it for Vec/String too, then it will be 'smart pointers' and 'smart pointer-ish things' and 'some collections', which maybe matches the intuition, but seems a little vague.

nikomatsakis Sep 16, 2014

Contributor

How is this any different from any other cross-borrowing proposal based on Deref? Ah, I suppose because it's willing to do multiple levels of deref?
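A small sketch of the multi-level case in today's Rust, mirroring the `use_nested` snippet from the RFC text with the concrete type `i32` substituted for `T` (an assumption made only so the example compiles on its own):

```rust
use std::rc::Rc;

fn use_ref(n: &i32) -> i32 {
    *n + 1
}

fn use_nested(t: Rc<Box<i32>>) -> (i32, i32) {
    // Explicit: deref the Rc, deref the Box, then borrow.
    let explicit = use_ref(&**t);
    // Coerced: &Rc<Box<i32>> -> &Box<i32> -> &i32, two deref steps.
    let coerced = use_ref(&t);
    (explicit, coerced)
}

fn main() {
    assert_eq!(use_nested(Rc::new(Box::new(41))), (42, 42));
}
```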

Member

pnkfelix commented Sep 16, 2014

I ask because that is a case where something that looks like an ownership transfer (of the &mut itself) will actually not be so, unlike (I believe) the first and third bullets.

Actually I just remembered: of course there are other cases where passing a &mut is reborrowed rather than transferred. So I overstated the situation above.

Still, adding the requested example would benefit the RFC presentation.

The design satisfies both of the principles laid out in the Motivation:

* It does not introduce implicit borrows of owned data, since it only applies to
already-borrowed data.

nrc Sep 16, 2014

Member

This is really nice!

for implicitly running unknown code; together with the expectation that
programmers are generally aware when they are using `Deref` types, this should
retain the kind of local reasoning Rust programmers can do about
function/method invocations today.

nrc Sep 16, 2014

Member

This too, I'm glad we agree on Deref for the basis of these coercions.

surrounding `&`, and in particular somewhat muddies the idea that it creates a
pointer. This change could make Rust more difficult to learn (though note that
it puts *more* attention on ownership), though it would make it more convenient
to use in the long run.

nrc Sep 16, 2014

Member

I'm still not sold on this idea; you are suggesting that the ownership intuition in Rust trumps being explicit about indirection. While I agree that ownership is a very important principle, I feel it comes second to indirection. In a language with explicit memory management (which I think Rust is, even if you don't explicitly call 'free' yourself) I think programmers (beginner or otherwise) have to be extremely conscious of indirection and that this is primary to ownership. I fear losing that explicitness in Rust.

I believe it will make programs more convenient to write in the long run, but harder to read, and easier to introduce bugs.

I really like that this proposal doesn't introduce implicit borrows (cf 226). I think that would pay for the extra &s (which I'm otherwise not a fan of). But losing the explicitness around indirection would make me sad.

Contributor

nikomatsakis commented Sep 16, 2014

@pnkfelix actually, coercing &mut T to &T is a kind of ownership transfer -- in particular, the &mut T is unusable during the time it is re-borrowed as an &T. So it's the same sort of transfer in particular as reborrowing an &mut T to another &mut T.
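A minimal sketch of that reborrow, as it behaves under today's borrow checker (which ends a borrow at its last use, unlike the lexical rules in force when this thread was written). The names `read` and `demo` are hypothetical:

```rust
fn read(n: &i32) -> i32 {
    *n
}

fn demo(m: &mut i32) -> i32 {
    // Reborrow `*m` as shared; `m` cannot be written through while `shared` is live.
    let shared: &i32 = &*m;
    let seen = read(shared);
    // `shared` is not used past this point, so `m` is usable again.
    *m += 1;
    seen
}

fn main() {
    let mut x = 1;
    assert_eq!(demo(&mut x), 1);
    assert_eq!(x, 2);
}
```

Moving the `*m += 1` line above `read(shared)` would be rejected by the compiler, which is the sense in which the `&mut` is temporarily given up.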

Contributor

nikomatsakis commented Sep 16, 2014

@nick29581 it seems like the key question is whether, indeed, programmers need to be very aware of the levels of indirection. I contend that they do not.

In terms of capabilities, there is virtually no difference between a T and a Box<T>. In fact, the only difference between a T and a Rc<T> is that a T can be mutated ("unique ownership" vs "shared ownership") -- basically let x: Rc<T> and let y: T are equivalent to one another, except that the T in y can be moved.

Speaking personally, I find accounting for the precise amount of indirection to be tedious and I don't think it adds much information. It's not a big deal, but converting from &x to &*x to &**x is just a kind of regular irritant. When reading code, I basically ignore the number of * that I see. That is, if I see code like this (from rustc): check_expr_coercable_to_type(fcx, &**arg, formal_ty);, do you really care whether it says &*arg or &arg? The gist is the same. Certainly my eyes glaze over the distinction, and I tend to always just write & or &* and then let the compiler correct me. As further evidence, in other parts of the language, we often obscure the precise amount of indirection. Autoderef and closures come to mind, as well as autoref for operator overloading (though perhaps that should change, at least in some cases, but for independent reasons).

However, when we used to have cross-borrowing, I did find that confusing. I remember distinctly being surprised to see vectors that looked like they were being moved, but were in fact being (implicitly) borrowed, and also being surprised that a Box<T> behaved differently from a plain T. Of course this is anecdotal, but I think it's indicative of the kind of thinking we want to encourage: one that focuses on ownership and borrowing. Put another way, pointers are the means, not the end.

Member

nrc commented Sep 16, 2014

@nikomatsakis yes, totally agree it boils down to whether programmers need to be concerned about the level of indirection. I could be persuaded. My experience is mostly from C++ where it is deadly important to know.

I agree that at first blush I ignore the &** stuff on actual parameters. When I do care is when I have a bug and I'm trying to work out what is going wrong with some code which compiles and gives the wrong result. In that case I really want to know exactly where something can be mutated or referenced. It seems to me that being imprecise about indirection will make that harder (perhaps there is a case that we will never obscure the difference between a pointer and a value with this system, only between pointers with different levels of indirection, so it is all OK).

I want to believe that we can not worry about pointers and only about ownership, it seems a much more attractive model. But my experience in the past has always been that if I don't know precisely what is going on, there is confusion, and that makes debugging harder. I guess I think of borrowing, deep down, as just a pointer to something, so trying to not think of pointers is difficult for me.

I'll try and think only of ownership and borrowing for a while and see if I can warm to the idea of reasoning without pointers :-)

Contributor

nikomatsakis commented Sep 16, 2014

On Tue, Sep 16, 2014 at 04:31:35AM -0700, Nick Cameron wrote:

I agree that at first blush I ignore the &** stuff on actual parameters. When I do care, is when I have a bug and I'm trying to work out what is going wrong with some code which compiles and gives the wrong result. In that case I really want to know exactly where something can be mutated or referenced.

That's interesting. I don't think I've ever had a bug in Rust code
that boiled down to a mistake in layers of indirections. I could be
just failing to remember. It feels like these mismatches always
manifest at compilation time. Can you give me an example of the sort
of bug you are thinking of?

The closest example I can come up with is that I have seen errors
where something containing a Cell was implicitly being copied, and
thus mutation was occurring on a copy of the cell and not the original.
This was basically fixed by making Cell linear (in general I contend
that anything with interior mutability has "identity" and hence should
be linear). However, it seems to argue in favor of this
"ownership-focused" proposal, since the important thing is whether the
underlying value is being moved or borrowed.
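To make the Cell point concrete, here is a small sketch against today's standard library, where `Cell` is not `Copy` and a duplicate has to be requested with `clone()`. The `bump` helper is hypothetical:

```rust
use std::cell::Cell;

fn bump(c: &Cell<u32>) {
    c.set(c.get() + 1);
}

fn main() {
    let original = Cell::new(0);

    // Mutating through a borrow affects the original.
    bump(&original);
    assert_eq!(original.get(), 1);

    // A duplicate must be made explicitly...
    let copy = original.clone();
    bump(&copy);
    // ...and mutating the duplicate leaves the original untouched.
    assert_eq!(copy.get(), 2);
    assert_eq!(original.get(), 1);
}
```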

Member

nrc commented Sep 16, 2014

I am still not decided on whether this is the best approach, but a couple of thoughts if it is:

  • if we need a & to convert from String to &str and vecs (as in this proposal) I prefer to either not do it (i.e., don't implement Deref for String and Vec) or to remove the empty slicing notation. Otherwise you always have this choice of writing &s or s[] and neither is an obvious choice (I think converting &String to &str etc., would be rare, but perhaps I'm wrong).
  • Which leads to... do you have an idea how common it is to convert Smaht<T> to &T vs &Smaht<T> to &T? My intuition is the former is far more common, but the latter happens sometimes (i.e., I see a lot more &* than &**, but a non-negligible number of &**)?
  • I ask the above because, I am warming to the general idea here, but I think I would prefer if this were implemented as a change to & semantics rather than a coercion. The brief idea is that & would be the borrow operator, rather than the address of operator. It would do as many derefs as possible (i.e., using a built-in deref or with the trait), which might be zero and one address-of operation. So assuming T does not implement Deref, here is a table of examples of the type of e and &e (where e is any expression):

| e | &e |
|---|---|
| T | &T |
| &T | &T |
| Rc<T> | &T |
| &Rc<T> | &T |

and so forth. I think this gives similar results to this RFC, but also works for &T types. This does mean that if you need a &T you can always write &e no matter the type of e and it should work. This might lead people to be sloppy, I'm not sure how bad that is. I think this is as close as we can get to auto-borrowing in Rust (i.e., more so than this RFC) without losing some vital distinctions between values and references. Note that * would still do a single deref. To do an address-of rather than a borrow, I propose &(e). That is a little ambiguous, though (I think we can parse it sensibly, but it might not be very ergonomic).

I believe the borrow operator has the following advantages over a cross-borrow-deref coercion:

  • clearer behaviour since it is not type driven,
  • better interaction with coercions (because you would still get the ones we want (e.g., unsizing) after applying the & operator, but we don't make coercions more complicated, nor check after every iteration),
  • makes the principle of borrowing even stronger because we have an operator for it,
  • better interaction with type inference (you can write let x = &y;, a coercion requires an explicit type here),
  • easier to explain.

Does this sound like a sensible alternative or is it silly? If it is the former I can write up an RFC so we have a place to discuss, rather than doing so here.

@aturon aturon force-pushed the aturon:deref-coercions branch from aba44ab to 896e453 Sep 16, 2014

Member Author

aturon commented Sep 16, 2014

@nick29581

if we need a & to convert from String to &str and vecs (as in this proposal) I prefer to either not do it (i.e., don't implement Deref for String and Vec) or to remove the empty slicing notation. Otherwise you always have this choice of writing &s or s[] and neither is an obvious choice (I think converting &String to &str etc., would be rare, but perhaps I'm wrong).

I tend to agree; we could drop []. Converting from &String to &str is rare, but &mut Vec<T> to &[T] is I suspect somewhat more common.

Which leads to... do you have an idea how common it is to convert Smaht<T> to &T vs &Smaht<T> to &T? My intuition is the former is far more common, but the latter happens sometimes (i.e., I see a lot more &* than &**, but a non-negligible number of &**)?

Yes, though it depends on what implements Deref. For example, working with &Box<T> and &Vec<T> (or mutable versions) is not so uncommon.

I ask the above because, I am warming to the general idea here, but I think I would prefer if this were implemented as a change to & semantics rather than a coercion.

(details elided)

Does this sound like a sensible alternative or is it silly? If it is the former I can write up an RFC so we have a place to discuss, rather than doing so here.

It's an intriguing idea, but seems problematic. Consider cases like the following:

fn wants_vec_ref(v: &mut Vec<u8>) { ... }
fn has_vec(mut v: Vec<u8>) {
    wants_vec_ref(&mut v)
}

If Vec implements Deref, we'd have to use the special "address of" notation even though we really do want a borrow -- a borrow of the "smart pointer" itself rather than the slice it points to.
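For comparison, here is a sketch of how the same situation plays out under the coercion-based design of this RFC, in today's Rust: `&mut v` keeps its ordinary meaning, and a deref step is added only when the expected type asks for one. The function bodies and `wants_slice` are assumptions added for illustration:

```rust
fn wants_vec_ref(v: &mut Vec<u8>) {
    // Needs the Vec itself, not just a slice, in order to grow it.
    v.push(9);
}

// Hypothetical second consumer that only needs the elements.
fn wants_slice(s: &mut [u8]) {
    s[0] = 7;
}

fn has_vec(mut v: Vec<u8>) -> Vec<u8> {
    wants_vec_ref(&mut v); // types already match: no coercion
    wants_slice(&mut v);   // &mut Vec<u8> -> &mut [u8] by deref coercion
    v
}

fn main() {
    assert_eq!(has_vec(vec![1, 2]), vec![7, 2, 9]);
}
```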

(Just wanted to jot that down for now -- more later.)

@aturon aturon force-pushed the rust-lang:master branch from 4c0bebf to b1d1bfd Sep 16, 2014

Contributor

CloudiDust commented Sep 17, 2014

@aturon @nick29581, another alternative would be making coercions explicit, but not too explicit.

Some ideas here: Semi-explicit coercion control with ~.

Contributor

nikomatsakis commented Sep 17, 2014

On Tue, Sep 16, 2014 at 01:43:23PM -0700, Nick Cameron wrote:

I think this gives similar results to this RFC, but also works for &T types.

Why does the current RFC not work for &T types? I mean, presumably
&T implements Deref<T> (it may not today, but obviously it
should), and hence &&T can be coerced to &T just as &Rc<T> can
be coerced to &T.

Regarding the slice notation, I agree that [] is not so
important. In fact, I'm wondering if maybe we actually do want
foo[a..b] to act "like an lvalue", so that one writes &foo[3..] or
&mut foo[3..]. The latter would replace foo[mut 3..]. Before I
thought this had bad ergonomics, but now it seems consistent with this
general trend towards making borrows apparent with the & operator,
as well as with &vec being (basically) the way to do what was
vec[].
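Both points can be sketched in present-day Rust, where `&T` does implement `Deref` and range indexing is written with an explicit `&` or `&mut`. The helper functions `first` and `total` are hypothetical:

```rust
fn first(x: &i32) -> i32 {
    *x
}

fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let n = 5;
    let r: &i32 = &n;
    // &&i32 -> &i32, because &i32 itself implements Deref with target i32.
    assert_eq!(first(&r), 5);

    let mut foo = vec![1, 2, 3, 4, 5];
    // Range indexing yields a place; the borrow is written with & or &mut.
    assert_eq!(total(&foo[3..]), 9);
    let tail: &mut [i32] = &mut foo[3..];
    tail[0] = 0;
    assert_eq!(foo, vec![1, 2, 3, 0, 5]);

    // &foo coerces to a slice of the whole vector.
    assert_eq!(total(&foo), 11);
}
```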

@aturon aturon referenced this pull request Sep 17, 2014

Merged

RFC: Collections reform #235


Thiez commented Jan 12, 2015

@Valloric since Vec derefs to a slice you can just do &*foo.bar(y).zoo(z).goo(w) or foo.bar(y).zoo(z).goo(w).as_slice(), and have it only on one side of your choice. I imagine your case is relatively rare, since you usually want to store a Vec returned from a bunch of methods somewhere, or it'll get dropped immediately and your slice will not live long enough.
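The spellings under discussion can be put side by side in today's Rust. `build` and `max_of` are hypothetical stand-ins for the method chain and for a slice-taking function:

```rust
// Hypothetical stand-in for a chain of calls that returns a Vec.
fn build() -> Vec<i32> {
    vec![3, 1, 2]
}

fn max_of(xs: &[i32]) -> Option<i32> {
    xs.iter().copied().max()
}

fn main() {
    // The Vec is kept alive by a binding, so the slices may outlive the statement.
    let v = build();
    let a: &[i32] = &*v;          // explicit deref, then borrow
    let b: &[i32] = v.as_slice(); // method form
    let c: &[i32] = &v;           // deref coercion
    assert_eq!(a, b);
    assert_eq!(b, c);

    // On a temporary, the slice is only valid within the same expression.
    assert_eq!(max_of(&build()), Some(3));
}
```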

I oppose this RFC because I don't think having to type as_slice less often is awesome enough to make the semantics of & more complex.

Contributor

SimonSapin commented Jan 13, 2015

I was +0 when I first read this, but since then I’ve repeatedly written code that would have been nicer with it. Big +1.


Valloric commented Jan 13, 2015

@Thiez Syntax like &*foo would rightfully produce a WTF-stream from any Rust newcomer. It's not the API we should be proposing for a common operation like taking a slice out of a vector.


Thiez commented Jan 14, 2015

@Valloric And syntax like &'a foo would also produce a WTF-stream. And syntax like |&: n| { ... }. I think once a person understands the basics of pointers in Rust, (which are quite simple: & creates a reference, and * dereferences) and knows about Deref they should be able to understand &*foo, because it is very simple and works exactly how one would expect.

If you're going to learn a new language, you have to learn the syntax. That is not a bad thing. And while the syntax certainly shouldn't be made more complex than it needs to be, I don't think making the semantics of & more complex is going to make the language easier to learn.

Contributor

liigo commented Jan 14, 2015

+1
On Jan 14, 2015, at 7:19 AM, "Val Markovic" notifications@github.com wrote:

@Thiez https://github.com/Thiez Syntax like &*foo would rightfully
produce a WTF-stream from any Rust newcomer. It's not the API we should be
proposing for a common operation like taking a slice out of a vector.



eddyb added a commit to eddyb/rust that referenced this pull request Jan 18, 2015

@eddyb eddyb referenced this pull request Jan 18, 2015

Merged

Implement deref coercions. #21351

eddyb added a commit to eddyb/rust that referenced this pull request Jan 18, 2015

Member

retep998 commented Jan 18, 2015

+1
This RFC seems like it manages to retain enough explicitness while making a relatively common task simpler to write.
I have one question though, given T: Deref<U> if a function takes a generic which is impl'd for both T and U, and I pass &T, does it result in ambiguity or simply take the option with the least derefs?

Member

nrc commented Jan 19, 2015

There is an implementation now, and there seems to be a lot of positive sentiment for this change, should we push forward with it?

@aturon are there any open questions you're aware of? (I haven't re-read the whole conversation here, but I will if we are going to move on to this. But I'm mostly checking you don't have any new concerns which aren't already here).

I'm still a little uneasy about the 'searching' aspect of the coercion, but I think it is worth it for the ergonomic benefit.

Member Author

aturon commented Jan 20, 2015

@retep998

I have one question though, given T: Deref<U> if a function takes a generic which is impl'd for both T and U, and I pass &T, does it result in ambiguity or simply take the option with the least derefs?

The & operator by itself keeps the same meaning it has always had: it gives you a &T. The only time the deref coercion kicks in is if you then use this &T value in a place where some &U is expected and T != U.
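A short sketch of that rule in today's Rust; `expects_str` is a hypothetical function used to supply an expected type:

```rust
// Hypothetical function whose parameter supplies an expected type of &str.
fn expects_str(s: &str) -> &str {
    s
}

fn main() {
    let s = String::from("hi");

    // No expected type: `&s` is simply a &String.
    let a = &s;
    let _: &String = a;

    // An expected type of &str triggers the coercion &String -> &str.
    let b: &str = &s;
    assert_eq!(a.as_str(), b);

    // Function arguments are another place where a type is expected.
    assert_eq!(expects_str(&s), "hi");
}
```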

Member Author

aturon commented Jan 20, 2015

@retep998

Note that this is one reason why the slice syntax &v[] is still desirable: when you're invoking a generic function and want to pass a slice, not a reference to the vector.
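The generic case can be sketched as follows in today's Rust, where the full-range slice is spelled `&v[..]` rather than the `&v[]` of early 2015. The `Describe` trait and the free function `describe` are hypothetical:

```rust
// Hypothetical trait implemented for both a Vec and the slice it derefs to.
trait Describe {
    fn describe(&self) -> &'static str;
}

impl Describe for Vec<u8> {
    fn describe(&self) -> &'static str {
        "vec"
    }
}

impl Describe for [u8] {
    fn describe(&self) -> &'static str {
        "slice"
    }
}

fn describe<T: Describe + ?Sized>(x: &T) -> &'static str {
    x.describe()
}

fn main() {
    let v = vec![1u8, 2, 3];
    // T is inferred from the argument as Vec<u8>; no coercion is attempted.
    assert_eq!(describe(&v), "vec");
    // To pass the slice, the caller has to ask for it explicitly.
    assert_eq!(describe(&v[..]), "slice");
}
```

Because the parameter type is a bare `&T`, there is no mismatch for a coercion to resolve, so there is no ambiguity between the two impls.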

Member Author

aturon commented Jan 20, 2015

@nick29581

Yes, we're ready to move forward with this; this way we can gain some experience before the beta.

@nikomatsakis nikomatsakis merged commit 434552d into rust-lang:master Jan 20, 2015

Member Author

aturon commented Jan 20, 2015

Note: this RFC has now been merged, after some significant digestion on all our parts :-)

I won't recap the motivation/arguments here, which are well-represented in the RFC and thread. However, I will note that an implementation is nearly ready to land, and so we should have time to gain some experience here before shipping beta.

eddyb added a commit to eddyb/rust that referenced this pull request Jan 21, 2015

eddyb added a commit to eddyb/rust that referenced this pull request Jan 29, 2015

eddyb added a commit to eddyb/rust that referenced this pull request Jan 29, 2015

daramos added a commit to daramos/rust-memcmp that referenced this pull request Feb 18, 2015

@richo richo referenced this pull request Mar 25, 2015

Merged

Update pointers.md #23690

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
