RFC for a MIR operator to take a raw reference #2582

Open
wants to merge 14 commits into base: master


@RalfJung
Member

RalfJung commented Nov 1, 2018

Introduce a new primitive operator on the MIR level: `&[mut|const] raw <place>`, which creates a raw pointer to the given place (this is not surface syntax, it is just how MIR might be printed). Desugar the surface syntax `&[mut] <place> as *[mut|const] _`, as well as coercions from references to raw pointers, to use this operator instead of two MIR statements (first take a normal reference, then cast).
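This thread predates any surface syntax for the operation, so the sketch below assumes the `std::ptr::addr_of!`/`addr_of_mut!` macros that were stabilized later (Rust 1.51) and lower to a raw borrow. It obtains a raw pointer to a packed field without creating an intermediate `&u32`; `Packed`, `read_field`, and `write_field` are hypothetical names.

```rust
use std::ptr;

// Hypothetical packed struct: `field` sits at offset 1, so it is
// generally not aligned for `u32`.
#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub field: u32,
}

/// Reads `field` without materializing a `&u32` (which would have to be aligned).
pub fn read_field(p: &Packed) -> u32 {
    // Raw borrow of the place: no intermediate reference is created.
    let raw: *const u32 = ptr::addr_of!(p.field);
    // The pointer may be misaligned, so use an unaligned read.
    unsafe { raw.read_unaligned() }
}

/// Writes `field` through a raw pointer, again with no intermediate reference.
pub fn write_field(p: &mut Packed, value: u32) {
    let raw: *mut u32 = ptr::addr_of_mut!(p.field);
    unsafe { raw.write_unaligned(value) }
}
```

Writing `&p.field as *const u32` instead is the two-step "take reference, then cast" form this RFC is about.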

Rendered

Cc @Centril @rust-lang/wg-unsafe-code-guidelines

EDIT: rfcbot comment is here

@camlorn

camlorn commented Nov 1, 2018

I've not been following Rust enough lately to judge this RFC from the technical side, but from the usability/ergonomics side, I think we should hide this semantic behind a magic function and/or macro, then tell people to use the macro. I think it's a bad idea to have a language construct where whether or not a part of an expression is stored in a variable changes defined behavior to undefined behavior.

In this case I understand why we want the semantics. What I'm saying is: do both. Put the semantics in the Rustonomicon or somewhere else suitably advanced, and direct people to the macro instead.

@ubsan

Contributor

ubsan commented Nov 1, 2018

MIR is not a part of the language, and therefore, this does not require an RFC, afaik. It's simply a way of encoding the semantics we want into the compiler.

@Centril

Contributor

Centril commented Nov 1, 2018

@ubsan From my reading of this RFC, this affects Rust's abstract machine (in the sense that it makes some things defined behavior and some things explicitly UB...) and makes certain things that weren't hard errors into hard errors. Thus, it requires an RFC.

@ubsan

Contributor

ubsan commented Nov 1, 2018

@Centril I disagree - it was always the case that we wanted &<lvalue expression> as *const T to be defined for all lvalue expressions, at least imo. The abstract machine isn't defined enough for this to be necessary; this should be done in the Unsafe Guidelines WG.

@eddyb

Member

eddyb commented Nov 1, 2018

I might not be able to keep track of the discussion but I should at least note down two things I thought of:

  • implicit reference -> pointer coercion is just as important, if not more
  • we don't have to make this about syntax, we could talk about the reference value being used always as the input to a coercion/cast to a raw pointer
    • we can compute this before building MIR, so MIR would still have a special operation
    • note that for a &expr expression being directly coerced/cast, this is trivially always the case, making this a superset of the original proposal
    • but also, these would be equivalent (iff r is never named again):
```rust
let r = &*p;
let p2 = r as *const _;
let p3 = r as *const _;
// would be equivalent to:
let p2 = &*p as *const _;
let p3 = &*p as *const _;
```

A more realistic scenario that this could enable is r being passed to two function calls that take raw pointers (so there are coercions in the caller, similar to the explicit casts above).

We might even be able to relax this so other operations involving the reference are also allowed, but I haven't fully considered the implications.
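A small sketch of the shapes described above, in stable Rust: `takes_raw`, `direct`, and `via_named_ref` are hypothetical functions. Because `takes_raw` has a raw-pointer parameter, passing `&*x` or a named reference `r` inserts an implicit reference-to-pointer coercion at the call site.

```rust
/// Hypothetical callee that only ever sees a raw pointer.
pub fn takes_raw(p: *const u32) -> usize {
    p as usize
}

/// The borrow expression is coerced directly at each call site.
pub fn direct(x: &u32) -> (usize, usize) {
    (takes_raw(&*x), takes_raw(&*x))
}

/// A named reference `r` that is only ever used as the input of a coercion.
pub fn via_named_ref(x: &u32) -> (usize, usize) {
    let r: &u32 = &*x;
    (takes_raw(r), takes_raw(r))
}
```

Both variants compile today. The question raised here is only whether `via_named_ref` should also be lowered to raw borrows, since `r` is never used at reference type.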

@nikomatsakis

Contributor

nikomatsakis commented Nov 1, 2018

I do believe an RFC is appropriate here -- but perhaps I mistake the purpose of the RFC. I presumed that the RFC is more about the surface Rust syntax that generates this new MIR operator than the operator itself.

It seems to me that there are four main alternatives here:

  • What this RFC advocates, which is using &foo in conjunction with some other operator (as).
    • We probably do want to decide to what extent coercions apply here too.
  • Adding a very magic intrinsic which treats its argument as a place (lvalue) expression and not a value (rvalue) expression. That seems strictly more surprising to me.
  • Adding a brand new operator (ref x in an expression, for example) that produces a *mut directly.
  • Find some way to defer the validation of an &T reference which is created until it is first used or otherwise somehow is "confirmed" as a safe reference. That is basically the same as this RFC, which is proposing a fairly conservative rule in this regard (one that I think we could loosen in the future, if I'm not mistaken?).

Of these, I currently prefer the first. @RalfJung, am I correct that this would leave us room in the future to extend the rules to more cases (e.g., if coerced etc)?

One open question mark for me is how best to manage the design process here. This applies to all the efforts of the Unsafe Code Guidelines work, in my view. I will leave a separate comment on how I think that should work, actually.

@nikomatsakis

Contributor

nikomatsakis commented Nov 1, 2018

Regarding the meta question of how the UCG group should be approaching RFCs. The challenge is that we have a lot of "little pieces" that have to be put together that affect one another. I don't think we should be working out a huge "all or nothing" proposal, but I also think it's hard to reason about random RFCs for one tiny piece without seeing the whole picture.

I think I'd like to compromise somehow, by having

  • a sketch of the current big picture in the UCG repo
  • RFCs like this for key pieces that are collected in a sort of "preliminary state"
    • like any RFC, I would expect this decision to have a "stabilization FCP" before it becomes final
    • and I would want to do that once we have the "stacked borrows" concept worked out (or whichever concept we end up with), so that you can see the whole picture

I'm not entirely sure where these RFCs should be opened. I'd like to be trying out the "phase RFC" procedure, which I think basically implies that we are opening up RFCs on this repo as we move between "phases" -- so this RFC in particular might correspond to moving from the "proposal" (nee spitballing) to "prototyping" -- so it's not really a final decision.

But anyway I suppose this isn't the right forum for this comment. Perhaps I should open an internals thread to talk about it. I think it'd be good to figure this out, though, as it seems relevant to a lot of things the UCG team will be producing.

@ubsan

Contributor

ubsan commented Nov 1, 2018

I would say the best thing would probably be that the result of the &t expression shouldn't cause UB until it is bound to an &T value somehow; if it's bound to a *const T value, it'll be okay; but all of the following should be UB:

```rust
fn foo() -> &T {
  &<invalid lvalue expression>
}

let x = &<invalid lvalue expression>;

fn bar(x: &T) {}
bar(&<invalid lvalue expression>); // note: this is UB at the call site, not in bar

&<invalid lvalue expression>;
  // this is equivalent to `let __tmp = &t; drop(__tmp);`
&<invalid lvalue expression> as &T as *const T
  // this creates a value of reference type with the `as &T`
```

@AllenChong

AllenChong commented Nov 2, 2018

👍

@RalfJung

Member Author

RalfJung commented Nov 2, 2018

@ubsan

MIR is not a part of the language, and therefore, this does not require an RFC, afaik. It's simply a way of encoding the semantics we want into the compiler.

This RFC is mostly about changing/specifying those semantics. MIR is a good way to do that, as it provides a much clearer language to talk about what happens when a Rust program gets executed than if we tried to do the same thing in the surface language.

This is not the first RFC to talk about MIR or use MIR as means of specification, either.

it was always the case that we wanted &<lvalue expression> as *const T to be defined for all lvalue expressions

Which RFC/reference is saying that?

@RalfJung

Member Author

RalfJung commented Nov 2, 2018

implicit reference -> pointer coercion is just as important, if not more

I do not understand. This RFC is not primarily about the "cast" part of "take-ref-then-cast", it is about the "take-ref" part.

Or are you saying that we should also compile let x: *const _ = &packed.field; to taking a raw reference? I would agree with that.

we don't have to make this about syntax, we could talk about the reference value being used always as the input to a coercion/cast to a raw pointer

This is about defining the semantics, which starts with the syntax.
If we make this about how the reference value is used, then fundamentally we are allowing an unaligned value at reference type to exist. We could decide that, but then we'd have to dial back the amount of aligned and dereferenceable annotations we are sending to LLVM.
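To see why such a value could not carry an alignment annotation, here is a minimal sketch (hypothetical `Frame`/`Aligned4` types, using the later `ptr::addr_of!` macro) that places a packed struct at a 4-aligned address and measures how far its `u32` field is from 4-byte alignment, without ever creating a `&u32`.

```rust
use std::mem::align_of;
use std::ptr;

// Hypothetical packed layout: `value` lives at offset 1.
#[repr(C, packed)]
pub struct Frame {
    pub tag: u8,
    pub value: u32,
}

// Wrapper forcing the packed struct onto a 4-aligned address.
#[repr(C, align(4))]
pub struct Aligned4(pub Frame);

/// Returns `address_of(value) % align_of::<u32>()` using only a raw borrow.
pub fn misalignment_of_value(w: &Aligned4) -> usize {
    let addr = ptr::addr_of!(w.0.value) as usize;
    addr % align_of::<u32>()
}
```

On targets where `u32` has 4-byte alignment the result is 1, so a `&u32` to this field would violate the alignment that the reference type promises.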

@RalfJung, am I correct that this would leave us room in the future to extend the rules to more cases (e.g., if coerced etc)?

This RFC is first and foremost about adding such a primitive operation to the MIR -- acknowledging the need for it. Secondly, it is about how the user can write code that ends up generating this operation. It would certainly be easy to later add more cases where we take a raw reference instead of a safe one, that can only make more code defined.
(Or, to use terminology more like what @ubsan would likely use, this RFC is first and foremost about acknowledging that "taking a raw reference" is a primitive operation that exists in Rust, and about specifying some situations where this operation is what is encoded by surface Rust. I am using MIR for terminology here merely as a more precise way to talk about the semantics of Rust: In my experience and in what I see in several other language specifications that actually had some success with formalization, semantics of a complex language are best specified by means of translation to a simpler language.)

I don't think we should be working out a huge "all or nothing" proposal, but I also think it's hard to reason about random RFCs for one tiny piece without seeing the whole picture.

I felt this is somewhat different from the kind of "let us decide about this type's layout/invariant"-RFC that we (UCG) will likely be producing eventually: The invariant this refers to is already encoded in rustc, and the RFC proposes to add a new ingredient to our "toolbox for defining surface language semantics", namely taking a raw borrow.

the result of the &t expression shouldn't cause UB until it is bound to an &T value somehow

I and the RFC agree these should all be UB. More interesting would be some examples you think should not be UB that the RFC leaves UB.

@glaebhoerl

Contributor

glaebhoerl commented Nov 2, 2018

Another possibility, just so that it doesn't go unmentioned:

Currently the &x operator produces an &T, with &T having an implicit coercion into *const T. Another way to go would be to reframe this as &x being a kind of polymorphic literal similarly to how numeric constants work, with its type initially being a type metavariable, until further constraints determine whether it is actually &T or *const T -- with a fallback default to the former if unconstrained, much like how numeric constants default to i32.

This might be a way to codify the "result of the &t expression shouldn't cause UB until it is bound to an &T value somehow" intuition.

It's not obvious to me whether there would be backwards compatibility edge cases to worry about?

(To be clear this neither would nor could entirely replace the reference-to-pointer coercion, as that applies to any expression, whereas this only applies to literals.)
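The numeric-literal behaviour this analogy leans on can be checked in stable Rust: an unconstrained integer literal falls back to `i32`, while a constrained one takes the expected type. For `&x`, the closest existing analogue is that an expected type at a coercion site decides whether the `&T` is turned into a `*const T`; the polymorphic-literal treatment itself is only a proposal and is not modelled here. All function names are hypothetical.

```rust
use std::mem::size_of_val;

/// Unconstrained integer literal: falls back to `i32` (4 bytes).
pub fn unconstrained_literal_size() -> usize {
    let n = 1;
    size_of_val(&n)
}

/// Constrained by an annotation: the same literal is inferred as `u8`.
pub fn constrained_literal_size() -> usize {
    let n: u8 = 1;
    size_of_val(&n)
}

/// `&*x` has type `&u32`; the expected type `*const u32` then triggers
/// a reference-to-pointer coercion at this `let`.
pub fn coerce_borrow(x: &u32) -> *const u32 {
    let p: *const u32 = &*x;
    p
}
```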

@Diggsey

Contributor

Diggsey commented Nov 2, 2018

This seems really important to have. I even have a crate which already relies on this behaviour. One thing I am concerned about is the implicitness - it would be very easy for someone not familiar with this corner case to come along and accidentally refactor the code in a way which "broke" the guarantee, even with changes that would in most languages be semantically identical.

As a way to solve this, would it make sense to relax the constraint on values being initialised if they are only used in this way?

eg. start allowing:

```rust
fn main() {
    let x: u32;

    let y = &x as *const _; // rejected today: `x` is not initialized
    println!("{:?}", y);
}
```

This would allow writing code that would turn into a hard error if you somehow tried to actually do something with the value, and would also make it possible to calculate field offsets from completely safe code: we would no longer have to do hacks with mem::uninitialized to pretend to construct a value.
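The code above is still rejected by the borrow checker (E0381), and as far as current compilers go, `ptr::addr_of!(x)` on an uninitialized local is rejected too. What the raw-borrow operation did enable is the following pattern, sketched under the assumption of stable Rust 1.51+: compute a field offset from a `MaybeUninit` without `mem::uninitialized` and without a reference to uninitialized memory. `Header` and `offset_of_len` are hypothetical; newer toolchains also provide `core::mem::offset_of!` for this purpose.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical `repr(C)` type with a predictable layout.
#[repr(C)]
pub struct Header {
    pub tag: u8,
    pub len: u32,
}

/// Byte offset of `Header::len`, computed without reading or referencing
/// uninitialized memory.
pub fn offset_of_len() -> usize {
    let uninit = MaybeUninit::<Header>::uninit();
    let base: *const Header = uninit.as_ptr();
    // Raw borrow of the field place: no `&Header` or `&u32` is created.
    let field: *const u32 = unsafe { ptr::addr_of!((*base).len) };
    field as usize - base as usize
}
```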

@eddyb

Member

eddyb commented Nov 2, 2018

We could decide that, but then we'd have to dial back the amount of aligned and dereferenceable annotations we are sending to LLVM.

But the only cases where we'd not treat a borrow as being of a reference type is if it's only ever used as a raw pointer, in my value-use-based model. Anything else (including calling a function with the reference as an argument) would still impose reference requirements.

I actually think @glaebhoerl's formulation of a polymorphic borrow operator is equivalent to mine in behavior but it might be easier to implement it as an analysis in order to construct the MIR (as I've suggested) instead of during type-checking.

@ubsan

Contributor

ubsan commented Nov 2, 2018

@RalfJung yeah, alright, that makes sense. I don't believe there are any examples on which the RFC and I disagree, although if you can think of any, that'd be useful to know. I would like to see that list put in the RFC, if you don't mind, since I feel it gives a nice overview from a Rust pov; I also don't feel the transform is defined well enough.

I think discussing "binding to reference values" is probably more useful, instead of talking about "the same statement", since that would make your let example valid if written like this:

```rust
let x: *const T = {
  let y: &T = &*null;
  y as *const T
};
```

since y and the cast to *const T are in the same statement.

It's also important to think about when the coercion actually occurs - for example, does

```rust
let x: *const T = {&*null};
```

translate to

```rust
let x: *const T = {&*null} as *const T;
// or
let x: *const T = {(&*null) as *const T};
```

If the former, it should be UB. If the latter, it should not be.
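On the type level, Rust's coercion rules already say where the coercion lands: when a block has an expected type, the block's tail expression is itself a coercion site, so the reference-to-pointer coercion applies to the `&...` inside the braces (the second reading). That settles where the coercion is inserted, not whether the RFC's lowering would treat it as a raw borrow. A minimal check with a valid place, where the hypothetical `&*x` stands in for `&*null`:

```rust
/// The expected type `*const u32` reaches the block's tail expression,
/// so the `&u32` produced by `&*x` is coerced inside the block.
pub fn tail_coercion(x: &u32) -> *const u32 {
    let p: *const u32 = { &*x };
    p
}

/// The explicit-cast spelling, written inside the block.
pub fn explicit_inner_cast(x: &u32) -> *const u32 {
    let p: *const u32 = { &*x as *const u32 };
    p
}
```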

@nikomatsakis

Contributor

nikomatsakis commented Nov 2, 2018

@glaebhoerl

It's not obvious to me whether there would be backwards compatibility edge cases to worry about?

Certainly we could not today just make &x yield up an inference variable with fallback: fallback occurs quite late in the type inferencing stage, and we sometimes need to know types before that (e.g., with the . operator).

I do sort of like the idea of making &x an "overloaded operator", and we definitely have need of improving our handling of type inferencing in any case, so it's conceivable it may be possible at some point. (I would like in general to make our coercions less eager, which carries precisely the same challenges.)

@nikomatsakis

Contributor

nikomatsakis commented Nov 2, 2018

@RalfJung

This RFC is first and foremost about adding such a primitive operation to the MIR -- acknowledging the need for it. Secondly, it is about how the user can write code that ends up generating this operation.

Yes. I'm not sure I agree about the relative importance of those two points, but it doesn't really matter very much.

I would argue that simply from a backwards compatibility point of view we really want to make &x as *const u32 "work", because there is tons of code in the wild that does this. (The same argument applies I suspect to coercions like let p: *const T = &x.)

There seem to be two "basic ways" we can make such code work:

  • Statically, as you propose: we identify (somehow) some set of cases where &x is really a "raw borrow" operation that is semantically distinct. We could in theory even add an explicit operator for this.
  • or dynamically, where there is one "borrow operation" and we somehow define the machine semantics such that insta-UB does not occur.

I don't think anybody has a good idea how to make the dynamic idea work. It seems "imaginable", though. To start, however, we'd probably have to remove the various annotations we give to LLVM.

Presuming that we are going to go with the static option, then we return to: what is this static subset? As I wrote above, I think that for backwards compat reasons it basically has to include cases where the &u32 is coerced to a raw pointer, or we will rule out a lot of extant code.

It is conceivable that we might go further and add an explicit Rust syntax for this. I had a strawman proposal of ref <place> -- so e.g. let p: *mut _ = &x would be written let p = ref x. You could then imagine deprecating the "two part" syntax and linting for people to write ref explicitly. (Note that I am not proposing this, just pointing out that it is a possible future extension of this RFC.)

Does all that make sense? Do we all agree that &x as *const _ must work even for potentially unaligned things, just for backwards compatibility reasons? (I suppose it is probably technically UB today, but still..)

@ubsan

Contributor

ubsan commented Nov 2, 2018

@nikomatsakis I would argue that it should be valid for all (possibly-invalid) lvalue expressions, since we have no guarantees on raw pointers - i.e., let x = null(); let y = &*x as *const T; should, indeed, be valid, imo.

@RalfJung

Member Author

RalfJung commented Nov 4, 2018

@eddyb

But the only cases where we'd not treat a borrow as being of a reference type is if it's only ever used as a raw pointer, in my value-use-based model. Anything else (including calling a function with the reference as an argument) would still impose reference requirements.

But in let x: &T = &s.field; there is a use as a reference: the value gets copied into x at type &T. What you are suggesting then is to NOT desugar let x = &s.field as *const _ into let _tmp = &s.field; let x = _tmp as *const _;, and that is exactly what I am proposing as well. I am just trying to be precise about what happens, whereas I cannot deduce a precise spec from what you said.

Another way to go would be to reframe this as &x being a kind of polymorphic literal similarly to how numeric constants work, with its type initially being a type metavariable, until further constraints determine whether it is actually &T or *const T -- with a fallback default to the former if unconstrained, much like how numeric constants default to i32.

It seems to me that my proposal is a prerequisite for yours. You are also suggesting that there be a way to create a raw pointer to a field without creating an intermediate reference. We need a way to represent your inference after some kind of desugaring -- we need a primitive operation to "take a raw reference". You are just going further than I did in terms of when we use that operation, i.e. when we take a raw reference vs a safe one.

I will add a remark to the RFC saying that we might want to use the new operation for more cases. But I do not see a way to realize any of the proposals (by @glaebhoerl and @eddyb) without having this new operation that is distinct from any operation we can express so far; and if we do have such an operation it should explicitly show up in the MIR. Making such things explicit is part of what MIR is about.

@RalfJung

Member Author

RalfJung commented Nov 4, 2018

My problem with this stronger inference proposed by @glaebhoerl is that if someone relies on this behavior, there is a danger of accidentally adding a non-raw-ptr use to a reference, which would then rather subtly make the program have UB. If we say that you have to cast immediately, things cannot be correct for "subtle" reasons. But I guess we could have a lint against any "taking a raw reference" that is not immediately followed by a cast: Then, more existing code works (because we take raw references in more cases), but it is less likely that people will accidentally break their code because they relied on this behavior.

We might even make this an err-by-default lint after some transition period?


@ubsan

since that would make your let example valid if written like this:

```rust
let x: *const T = {
  let y: &T = &*null;
  y as *const T
};
```

since y and the cast to *const T are in the same statement.

I never said that the new operation is used when the cast happens in the same statement. I said that it is used when the reference is "immediately cast [...] to a raw pointer":

When translating HIR to MIR, we recognize &[mut] <place> as *[mut|const] _ as
a special pattern and turn it into a single MIR Rvalue that takes the address
and produces it as a raw pointer

Anyway, I will add some examples.

```rust
let x: *const T = {&*null};
```

That is an interesting example indeed. During my experiments, I noticed a similar problem with implicit coercions: a coercion from &mut to *const goes through & -- which is a problem because I plan to assign meaning to a mut-to-shr cast (namely, this is when the location gets frozen so it may not be mutated again).
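For reference, these are the two surface spellings involved, as a minimal sketch in stable Rust with hypothetical function names. The implicit `&mut u32` to `*const u32` coercion compiles, and according to the comment above (as of this discussion) it is performed via an intermediate shared reborrow; the explicit two-step cast stays on the mutable side by going through `*mut u32`. Both yield the same address; they differ only in which intermediate operations the compiler emits.

```rust
/// Implicit coercion from `&mut u32` straight to `*const u32`.
pub fn implicit(x: &mut u32) -> *const u32 {
    let p: *const u32 = x;
    p
}

/// Explicit route that avoids a shared reborrow: `&mut` -> `*mut` -> `*const`.
pub fn via_mut_ptr(x: &mut u32) -> *const u32 {
    x as *mut u32 as *const u32
}
```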

I would argue that it should be valid for all (possibly-invalid) lvalue expressions, since we have no guarantees on raw pointers - i.e., let x = null(); let y = &*x as *const T; should, indeed, be valid, imo.

Not sure which part of @nikomatsakis' posts you are referring to here, but assuming null() returns a raw pointer, I agree that code has defined behavior.

@RalfJung

Member Author

RalfJung commented Nov 4, 2018

@nikomatsakis Very well put, I do not have anything to add. :)

@eddyb

Member

eddyb commented Nov 4, 2018

@RalfJung What I'm suggesting is a pre-MIR analysis, not considering those copies.

Alternatively, a dataflow analysis on MIR, where copies are considered noops, and which rewrites the MIR to "weaken" borrows, as needed.

Something I have not considered is interaction with mutable state, but I suspect dataflow analysis would be able to understand that.
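A toy version of that analysis, purely illustrative: the mini-IR below (`Local`, `Stmt`, `weaken_borrows`) is hypothetical and far simpler than real MIR. It assumes straight-line code where every local is assigned once, and it ignores mutable state. Copies are treated as no-ops by tracking which `Ref` each local's value originates from; a `Ref` is rewritten to `RawRef` only if it is used at least once and every use (through any chain of copies) is a cast to a raw pointer.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical local-variable index in the toy IR.
pub type Local = usize;

/// Hypothetical toy statements; each local is assigned at most once.
#[derive(Clone, Debug, PartialEq)]
pub enum Stmt {
    /// `dst = &place` (safe reference)
    Ref { dst: Local, place: Local },
    /// `dst = &raw place` (the proposed primitive)
    RawRef { dst: Local, place: Local },
    /// `dst = src` (a copy; treated as a no-op by the analysis)
    Copy { dst: Local, src: Local },
    /// `dst = src as *const _`
    CastToRaw { dst: Local, src: Local },
    /// Any other use of `src` at reference type (call argument, deref, ...)
    UseAsRef { src: Local },
}

pub fn weaken_borrows(body: &[Stmt]) -> Vec<Stmt> {
    // Index of the `Ref` statement each local's value originates from.
    let mut origin: HashMap<Local, usize> = HashMap::new();
    let mut cast_uses: HashSet<usize> = HashSet::new();
    let mut ref_uses: HashSet<usize> = HashSet::new();

    for (i, stmt) in body.iter().enumerate() {
        match stmt {
            Stmt::Ref { dst, .. } => {
                origin.insert(*dst, i);
            }
            Stmt::Copy { dst, src } => {
                if let Some(&o) = origin.get(src) {
                    origin.insert(*dst, o);
                }
            }
            Stmt::CastToRaw { src, .. } => {
                if let Some(&o) = origin.get(src) {
                    cast_uses.insert(o);
                }
            }
            Stmt::UseAsRef { src } => {
                if let Some(&o) = origin.get(src) {
                    ref_uses.insert(o);
                }
            }
            Stmt::RawRef { .. } => {}
        }
    }

    // Rewrite ("weaken") borrows that are only ever cast to raw pointers.
    body.iter()
        .enumerate()
        .map(|(i, stmt)| match stmt {
            Stmt::Ref { dst, place } if cast_uses.contains(&i) && !ref_uses.contains(&i) => {
                Stmt::RawRef { dst: *dst, place: *place }
            }
            other => other.clone(),
        })
        .collect()
}
```

Requiring at least one cast use keeps a borrow that is never used at all as a safe reference, matching the view earlier in the thread that a bare `&<invalid place>;` should remain UB.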

I will add a remark to the RFC saying that we might want to use the new operation for more cases.

Yes, I never said anything about the MIR operation not being needed, but rather that the syntactic condition for producing it could be relaxed to something more general.

@RalfJung

Member Author

RalfJung commented Nov 4, 2018

Yes, I never said anything about the MIR operation not being needed, but rather that the syntactic condition for producing it could be relaxed to something more general.

I see, okay. I do not fully understand which syntactic condition you have in mind, but if the result is that at some point (pre-borrowck, I would guess) we have the result of this inference encoded explicitly in the MIR, then I think I am fine.

I'd prefer this condition to be as simple and hence predictable as possible, but I'd be basically satisfied with any syntactic condition.

@gnzlbg

Contributor

gnzlbg commented Jan 23, 2019

@joshtriplett

I don't want to check a box for "merge" based on a github comment that goes in a completely different direction than the RFC and that hasn't been reflected in the RFC.

Could you elaborate on which aspects of "what's being merged" deviate from the RFC?

@HeroicKatora

HeroicKatora commented Jan 23, 2019

@RalfJung With a good night's sleep and further consideration, I agree that the concerns of surface syntax and underlying MIR syntax are orthogonal enough to prioritize having at least one surely defined way of getting a pointer to a place without an intermediate reference, even if the surface-level rules are not fully generalized but only cover the common and obvious cases 👍

There's one curious paragraph here:

```rust
let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior
```

There is no situation in which the above code is correct, and hence it is a hard error to write this.

What about:

```rust
assert!(std::mem::align_of::<Field>() == 1);
let x = unsafe { &packed.field };
```

That looks like it correctly and manually upholds the alignment requirements (even though execution will rarely reach the unsafe code), assuming the field is correctly initialized through other means.
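Current compilers draw the line statically rather than via a runtime check like the `assert!` above: borrowing a field of a packed struct is a hard error when the field's type needs more alignment than the packing guarantees, but it is accepted when that type has alignment 1. A minimal sketch with a hypothetical `Msg` type whose `payload` field is a byte array (a `u32` field borrowed the same way would be rejected):

```rust
#[repr(C, packed)]
pub struct Msg {
    pub kind: u8,
    // `[u8; 4]` has alignment 1, so a reference to it is always aligned.
    pub payload: [u8; 4],
}

/// Borrowing `payload` compiles: its type cannot be misaligned.
pub fn payload_ref(msg: &Msg) -> &[u8; 4] {
    &msg.payload
}
```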

@oli-obk

Contributor

oli-obk commented Jan 23, 2019

@rfcbot reviewed

@RalfJung

Member Author

RalfJung commented Jan 23, 2019

That looks like it correctly and manually upholds the alignment requirements (even though execution will rarely reach the unsafe code), assuming the field is correctly initialized through other means.

Fair enough, I edited the RFC.

(I almost missed this because you edited this in later, and so this wasn't in the email notification. When substantially changing the content of a post, it is better to add a second reply than edit the existing one, so people reading along via email don't miss the edit.)

@mjbshaw

mjbshaw commented Jan 23, 2019

Just to clarify, this doesn't offer any protections against accidentally going through Deref (which would create a proper (non-raw) reference), right? So &mut (*x).field as *mut _ might be well-defined behavior or not, depending on the context and whether the field access requires going through DerefMut.

@joshtriplett

Member

joshtriplett commented Jan 23, 2019

Misunderstanding on my part, between MIR primitives and surface syntax. Sorry for the confusion.

@rfcbot resolved rfc-text-does-not-reflect-what-is-being-merged
@rfcbot reviewed

@RalfJung

Member Author

RalfJung commented Jan 23, 2019

@mjbshaw Deref does not automatically happen for raw pointers. I don't think the case you describe can happen, but if you find an example let us know!

@joshtriplett Do you have suggestions for how to improve the wording in the RFC to help avoid such confusion?

@HeroicKatora

HeroicKatora commented Jan 23, 2019

@RalfJung I'm not sure if the edited version is undefined enough. std::mem::align_of::<Packed>() == 1 and thus it could, in theory, on some target get a stack place with uneven address. But there is a more pressing point about this. I don't think we should ever refuse to compile this. Specifically, this really is definitely undefined on most targets only because it is a value on the stack. But I could manually allocate some global memory and place such a struct at an uneven address, and thus align the second member. Then get a reference to it and I get type-level equivalent code that must be able to compile. This is exactly the kind of bit mangling and trickery unsafe code is really necessary and useful for.

@HeroicKatora

HeroicKatora commented Jan 23, 2019

Besides, I think the reasoning on the MIR level alone should be a compelling reason to see this in a positive light. I think the introduction should instead focus on the core point, using a union or a pointer to uninitialized memory to make it: references must not be involved in intermediate steps, but current MIR forces them to be created.
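The uninitialized-memory version of the argument is the common one in practice: initializing a struct field by field. A minimal sketch, assuming stable Rust with `MaybeUninit` and `ptr::addr_of_mut!` (both added after this thread); `Config` and `make_config` are hypothetical. Writing through `&mut (*p).name` instead would create a reference to an uninitialized field, which is exactly the intermediate reference the raw borrow avoids.

```rust
use std::mem::MaybeUninit;
use std::ptr;

pub struct Config {
    pub name: String,
    pub retries: u32,
}

pub fn make_config() -> Config {
    let mut slot = MaybeUninit::<Config>::uninit();
    let p: *mut Config = slot.as_mut_ptr();
    unsafe {
        // `addr_of_mut!` yields a raw pointer to each field without creating
        // a `&mut Config` or `&mut String` to uninitialized memory.
        ptr::addr_of_mut!((*p).name).write(String::from("default"));
        ptr::addr_of_mut!((*p).retries).write(3);
        // Every field has been written, so the value is fully initialized.
        slot.assume_init()
    }
}
```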

@RalfJung

Member Author

RalfJung commented Jan 23, 2019

I'm not sure if the edited version is undefined enough. std::mem::align_of::<Packed>() == 1 and thus it could, in theory, on some target get a stack place with uneven address.

You cannot make any assumptions about stack layout. This is UB enough.

I don't think we should ever refuse to compile this. Specifically, this really is definitely undefined on most targets only because it is a value on the stack. But I could manually allocate some global memory and place such a struct at an uneven address, and thus align the second member. Then get a reference to it and I get type-level equivalent code that must be able to compile. This is exactly the kind of bit mangling and trickery unsafe code is really necessary and useful for.

Then just write &mut packed.field as *mut T. That avoids the unaligned reference and hence will remain allowed.

But maybe a hard error is indeed too strict -- such details can be resolved during stabilization.

@joshtriplett

Member

joshtriplett commented Jan 23, 2019

RalfJung changed the title from "RFC for an operator to take a raw reference" to "RFC for a MIR operator to take a raw reference" on Jan 23, 2019

@RalfJung

Member Author

RalfJung commented Jan 23, 2019

Ah, that's the confusion.

I edited the PR title. I am not sure if it is okay to rename the file this late in the process? @Centril?

@Centril

Contributor

Centril commented Jan 23, 2019

I edited the PR title. I am not sure if it is okay to rename the file this late in the process? @Centril?

It's totally fine (remember to update the PR description too tho) :)

@RalfJung

Member Author

RalfJung commented Jan 24, 2019

It's totally fine

Done.

@mjbshaw

mjbshaw commented Jan 24, 2019

@mjbshaw Deref does not automatically happen for raw pointers. I don't think the case you describe can happen, but if you find an example let us know!

@RalfJung But in the expression &mut (*x).field as *mut _, x is being dereferenced, so it's eligible for auto Deref. The last line of foo() in this code goes through DerefMut.

To be clear, I'm not objecting to this RFC at all (I'm actually really excited for it!). I'm just trying to make sure I understand potential accidents that could happen. If I understand this RFC correctly, the intention is for the expression &mut (*x).field as *mut _ to be well-defined: it creates a pointer to field without creating any intermediate references (please correct me if I've misunderstood the RFC and discussion). But that's only true if .field isn't obtained through Deref/DerefMut, because going through those traits most definitely requires creating a real reference. The code &mut (*x).field as *mut _ gives no indication whether it's going through Deref/DerefMut (and thus whether or not there are any intermediate references created), and so the onus is on the author to ensure that .field is accessible without going through Deref (both at the initial time of writing that expression, as well as later when updating code or crate dependencies). If developers aren't careful, they might accidentally create a reference they had been hoping to avoid.

I just wanted to make sure I'm understanding the caveat correctly (because it's entirely possible (and probable!) I'm misunderstanding something).

@RalfJung

Member Author

RalfJung commented Jan 25, 2019

But in the expression &mut (*x).field as *mut _, x is being dereferenced, so it's eligible for auto Deref. The last line of foo() in this code goes through DerefMut.

Oh, that is interesting. I was not aware of this. So once we have a place, even if it comes from a raw ptr, we'll auto-deref further.

If I understand this RFC correctly, the intention is for the expression &mut (*x).field as *mut _ to be well-defined:

The intention is for that post-expansion expression to be well-defined. The meaning of programs, IMO, is best defined e.g. on MIR. We can then try to back-translate that to the surface language.

In your example, the pointer is turned into a reference to call deref_mut. The post-expansion code is something like

```rust
&mut DerefMut::deref_mut(&mut *x).field as *mut _
```

So, a reference is materialized, and that reference must be valid (aligned and dereferenceable and possibly more). I don't see any other reasonable definition.
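The caveat can be observed in stable Rust. In this sketch (hypothetical `Wrapper`/`Inner` types, using the later `ptr::addr_of_mut!` macro), `Wrapper` has no field called `field`, so auto-deref inserts a `deref_mut` call even though the expression starts from a raw pointer and ends in a raw borrow. A counter records that the call, and therefore an intermediate `&mut Wrapper`, happened.

```rust
use std::cell::Cell;
use std::ops::{Deref, DerefMut};
use std::ptr;

pub struct Inner {
    pub field: u32,
}

pub struct Wrapper {
    pub inner: Inner,
    // Counts calls to `deref`/`deref_mut`, i.e. how many references were materialized.
    pub deref_calls: Cell<u32>,
}

impl Deref for Wrapper {
    type Target = Inner;
    fn deref(&self) -> &Inner {
        self.deref_calls.set(self.deref_calls.get() + 1);
        &self.inner
    }
}

impl DerefMut for Wrapper {
    fn deref_mut(&mut self) -> &mut Inner {
        self.deref_calls.set(self.deref_calls.get() + 1);
        &mut self.inner
    }
}

/// Looks like a pure raw-pointer projection, but auto-deref turns it into
/// roughly `&raw mut (*DerefMut::deref_mut(&mut *x)).field`.
pub fn field_ptr(x: *mut Wrapper) -> *mut u32 {
    unsafe { ptr::addr_of_mut!((*x).field) }
}
```

A lint along the lines suggested here would flag `field_ptr`, since the only reason a reference is created is that auto-deref kicked in.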

I just wanted to make sure I'm understanding the caveat correctly (because it's entirely possible (and probable!) I'm misunderstanding something).

I think you are. And thanks for asking! This is a catch. I will try to incorporate it into the RFC text. I think it'd also be worth having a lint for this? Basically, if the only reason a reference is created is that auto-deref kicked in, that seems problematic. It is probably too late to make auto-deref less aggressive...

@nagisa

Contributor

nagisa commented Jan 25, 2019

@rfcbot reviewed

Strongly in favour.

@RalfJung

Member Author

RalfJung commented Jan 26, 2019

I added some comments about the interaction with auto-deref.

@Centril

Contributor

Centril commented Jan 26, 2019

@rfcbot concern centril-needs-time-to-reread-reflect

@arielb1

Contributor

arielb1 commented Jan 26, 2019

I added some comments about the interaction with auto-deref.

I agree that a lint would be the right solution here.

@varkor

Member

varkor commented Feb 4, 2019

@rfcbot reviewed

@estebank

Contributor

estebank commented Feb 5, 2019

@rfcbot reviewed
