Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for an operator to take a raw reference #2582

Open
wants to merge 20 commits into
base: master
from

Conversation

Projects
None yet
@RalfJung
Copy link
Member

RalfJung commented Nov 1, 2018

Introduce new variants of the & operator: &raw mut <place> to create a *mut <T>, and &raw const <place> to create a *const <T>. This creates a raw pointer directly, as opposed to the already existing &mut <place> as *mut _/&<place> as *const _, which create a temporary reference and then cast that to a raw pointer. As a consequence, the existing expressions <term> as *mut <T> and <term> as *const <T> where <term> has reference type are equivalent to &raw mut *<term> and &raw const *<term>, respectively. Moreover, add a lint to existing code that could use the new operator, and treat existing code that creates a reference and immediately casts or coerces it to a raw pointer as if it had been written with the new syntax.

As an option, we could treat &mut <place> as *mut _/&<place> as *const _ as if they had been written with &raw to avoid creating temporary references when that was likely not the intention.

Rendered

The RFC got half-rewritten; click here to jump to the beginning of the post-rewrite discussion.

Cc @Centril @rust-lang/wg-unsafe-code-guidelines

@camlorn

This comment has been minimized.

Copy link

camlorn commented Nov 1, 2018

I've not been following Rust enough lately to judge this RFC from the technical side, but from the usability/ergonomics side, I think we should hide this semantic behind a magic function and/or macro, then tell people to use the macro. I think it's a bad idea to have a language construct where whether or not a part of an expression is stored in a variable changes defined behavior to undefined behavior.

In this case I understand why we want the semantic. What I'm saying is do both, put the semantic in the rustonomicon or somewhere else suitably advanced, and direct people to the macro instead.

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 1, 2018

MIR is not a part of the language, and therefore, this does not require an RFC, afaik. It's simply a way of encoding the semantics we want into the compiler.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Nov 1, 2018

@ubsan From my reading of this RFC, this affects Rust's abstract machine (in the sense that it makes some things defined behavior and some things explicitly UB...) and makes certain things that weren't hard errors into hard errors. Thus, it requires an RFC.

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 1, 2018

@Centril I disagree - it was always the case that we wanted &<lvalue expression> as *const T to be defined forall lvalue expressions, at least imo. The abstract machine isn't defined enough for this to be necessary, this should be done in the Unsafe Guidelines WG.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Nov 1, 2018

I might not be able to keep track of the discussion but I should at least note down two things I thought of:

  • implicit reference -> pointer coercion is just as important, if not more
  • we don't have to make this about syntax, we could talk about the reference value being used always as the input to a coercion/cast to a raw pointer
    • we can compute this before building MIR, so MIR would still have a special operation
    • note that for a &expr expression being directly coerced/cast, this is trivially always the case, making this a superset of the original proposal
    • but also, these would be equivalent (iff r is never named again):
let r = &*p;
let p2 = r as *const _;
let p3 = r as *const _;
let p2 = &*p as *const _;
let p3 = &*p as *const _;

A more realistic scenario that this could enable is r being passed to two function calls that take raw pointers (so there are coercions in the caller, similar to the explicit casts above).

We might even be able to relax this so other operations involving the reference are also allowed, but I haven't fully considered the implications.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 1, 2018

I do believe an RFC is appropriate here -- but perhaps I mistake the purpose of the RFC. I presumed that the RFC is more about the surface Rust syntax that generates this new MIR operator than the operator itself.

It seems to me that there are four main alternatives here:

  • What this RFC advocates, which is using &foo in conjunction with some other operator (as).
    • We probably do want to decide to what extent coercions apply here too.
  • Adding a very magic intrinsic which treats its argument as a place (lvalue) expression and not a value (rvalue) expression. That seems strictly more surprising to me.
  • Adding a brand new operator (ref x in an expression, for example) that produces a *mut directly.
  • Find some way to defer the validation of an &T reference which is created until it is first used or otherwise somehow is "confirmed" as a safe reference. That is basically the same as this RFC, which is proposing a fairly conservative rule in this regard (one that I think we could loosen in the future, if I'm not mistaken?).

Of these, I currently prefer the first. @RalfJung, am I correct that this would leave us room in the future to extend the rules to more cases (e.g., if coerced etc)?

One open question mark for me is how best to manage the design process here. This applies to all the efforts of the Unsafe Code Guidelines work, in my view. I will leave a separate comment on how I think that should work, actually.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 1, 2018

Regarding the meta question of how the UCG group should be approaching RFCs. The challenge is that we have a lot of "little pieces" that have to be put together that affect one another. I don't think we should be working out a huge "all or nothing" proposal, but I also think it's hard to reason about random RFCs for one tiny piece without seeing the whole picture.

I think I'd like to compromise somehow but having

  • a sketch of the current big picture in the UCG repo
  • RFCs like this for key pieces that are collected in a sort of "preliminary state"
    • like any RFC, I would expect this decision to have a "stabilization FCP" before it becomes final
    • and I would want to do that once we have the "stacked borrows" concept worked out (or whichever concept we end up with), so that you can see the whle

I'm not entirely sure where these RFCs should be opened. I'd like to be trying out the "phase RFC" procedure, which I think basically implies that we are opening up RFCs on this repo as we move between "phases" -- so this RFC in particular might correspond to moving from the "proposal" (nee spitballing) to "prototyping" -- so it's not really a final decision.

But anyway I suppose this isn't the right forum for this comment. Perhaps I should open an internals thread to talk about it. I think it'd be good to figure this out, though, as it seems relevant to a lot of things the UCG team will be producing.

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 1, 2018

I would say the best thing would probably be that the result of the &t expression shouldn't cause UB until it is bound to an &T value somehow; if it's bound to a *const T value, it'll be okay; but all of the following should be UB:

fn foo() -> &T {
  &<invalid lvalue expression>
}

let x = &<invalid lvalue expression>;

fn bar(x: &T) {}
bar(&<invalid lvalue expression>); // note: this is UB at the call site, not in bar

&<invalid lvalue expression>;
  // this is equivalent to `let __tmp = &t; drop(__tmp);
&<invalid lvalue expression> as &T as *const T
  // this creates a value of reference type with the 'as &T'
@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Nov 2, 2018

@ubsan

MIR is not a part of the language, and therefore, this does not require an RFC, afaik. It's simply a way of encoding the semantics we want into the compiler.

This RFC is mostly about changing/specifying those semantics. MIR is a good way to do that, as it provides a much clearer language to talk about what happens when a Rust program gets executed than if we tried to do the same thing in the surface language.

This is not the first RFC to talk about MIR or use MIR as means of specification, either.

it was always the case that we wanted & as *const T to be defined forall lvalue expressions

Which RFC/reference is saying that?

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Nov 2, 2018

implicit reference -> pointer coercion is just as important, if not more

I do not understand. This RFC is not primarily about the "cast" part of "take-ref-then-cast", it is about the "take-ref" part.

Or are you saying that we should also compile let x: *const _ = &packed.field; to taking a raw reference? I would agree with that.

we don't have to make this about syntax, we could talk about the reference value being used always as the input to a coercion/cast to a raw pointer

This is about defining the semantics, which starts with the syntax.
If we make this about how the reference value is used, then fundamentally we are allowing an unaligned value at reference type to exist. We could decide that, but then we'd have to dial back the amount of aligned and dereferencable annotations we are sending to LLVM.

@RalfJung, am I correct that this would leave us room in the future to extend the rules to more cases (e.g., if coerced etc)?

This RFC is first and foremost about adding such a primitive operation to the MIR -- acknowledging the need for it. Secondly, it is about how the user can write code that ends up generating this operation. It would certainly be easy to later add more cases where we take a raw reference instead of a safe one, that can only make more code defined.
(Or, to use terminology more like what @ubsan would likely use, this RFC is first and foremost about acknowledging that "taking a raw reference" is a primitive operation that exists in Rust, and about specifying some situations where this operation is what is encoded by surface Rust. I am using MIR for terminology here merely as a more precise way to talk about the semantics of Rust: In my experience and in what I see in several other language specifications that actually had some success with formalization, semantics of a complex language are best specified by means of translation to a simpler language.)

I don't think we should be working out a huge "all or nothing" proposal, but I also think it's hard to reason about random RFCs for one tiny piece without seeing the whole picture.

I felt this is somewhat different from the kind of "let us decide about this type's layout/invariant"-RFC that we (UCG) will likely be producing eventually: The invariant this refers to is already encoded in rustc, and the RFC proposes to add a new ingredient to our "toolbox for defining surface language semantics", namely taking a raw borrow.

the result of the &t expression shouldn't cause UB until it is bound to an &T value somehow

I and the RFC agree these should all be UB. More interesting would be some examples you think should not be UB that the RFC leaves UB.

@glaebhoerl

This comment has been minimized.

Copy link
Contributor

glaebhoerl commented Nov 2, 2018

Another possibility, just so that it doesn't go unmentioned:

Currently the &x operator produces an &T, with &T having an implicit coercion into *const T. Another way to go would be to reframe this as &x being a kind of polymorphic literal similarly to how numeric constants work, with its type initially being a type metavariable, until further constraints determine whether it is actually &T or *const T -- with a fallback default to the former if unconstrained, much like how numeric constants default to i32.

This might be a way to codify the "result of the &t expression shouldn't cause UB until it is bound to an &T value somehow" intuition.

It's not obvious to me whether there would be backwards compatibility edge cases to worry about?

(To be clear this neither would nor could entirely replace the reference-to-pointer coercion, as that applies to any expression, whereas this only applies to literals.)

@Diggsey

This comment has been minimized.

Copy link
Contributor

Diggsey commented Nov 2, 2018

This seems really important to have. I even have a crate which already relies on this behaviour. One thing I am concerned about is the implicitness - it would be very easy for someone not familiar with this corner case to come along and accidentally refactor the code in a way which "broke" the guarantee, even with changes that would in most languages be semantically identical.

As a way to solve this, would it make sense to relax the constraint on values being initialised if they are only used in this way?

eg. start allowing:

fn main() {
    let x: u32;
    
    let y = &x as *const _;
    println!("{:?}", y);
}

This would allow writing code that would turn into a hard error if you somehow tried to actually do something with the value, and would also make it possible to calculate field offsets from completely safe code: we would no longer have to do hacks with mem::uninitialized to pretend to construct a value.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Nov 2, 2018

We could decide that, but then we'd have to dial back the amount of aligned and dereferencable annotations we are sending to LLVM.

But the only cases where we'd not treat a borrow as being of a reference type is if it's only ever used as a raw pointer, in my value-use-based model. Anything else (including calling a function with the reference as an argument) would still impose reference requirements.

I actually think @glaebhoerl's formulation of a polymorphic borrow operator is equivalent to mine in behavior but it might be easier to implement it as an analysis in order to construct the MIR (as I've suggested) instead of during type-checking.

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 2, 2018

@RalfJung yeah, alright that makes sense. I don't believe there are any examples on which me and the RFC disagree, although if you can think of any, that'd be useful to know. I would like to see that list put in the RFC, if you don't mind, since I feel it gives a nice overview from a Rust pov; I also don't feel the transform is defined well enough

I think discussing "binding to reference values" is probably more useful, instead of talking about "the same statement", since that would make your let example valid if written like this:

let x: *const T = {
  let y: &T = &*null;
  y as *const T
};

since y and the cast to *const T are in the same statement.

It's also important to think about when the coercion actually occurs - for example, does

let x: *const T = {&*null};

translate to

let x: *const T = {&*null} as *const T;
// or
let x: *const T = {(&*null) as *const T};

if the former, it should be UB. If the latter, it should not be.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 2, 2018

@glaebhoerl

It's not obvious to me whether there would be backwards compatibility edge cases to worry about?

Certainly we could not today just make &x yield up an inference variable with fallback: fallback occurs quite late in the type inferencing stage, and we sometimes need to know types before that (e.g., with the . operator).

I do sort of like the idea of making &x an "overloaded operator", and we definitely have need of improving our handling of type inferencing in any case, so it's conceivable it may be possible at some point. (I would like in general to make our coercions less eager, which carries precisely the same challenges.)

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 2, 2018

@RalfJung

This RFC is first and foremost about adding such a primitive operation to the MIR -- acknowledging the need for it. Secondly, it is about how the user can write code that ends up generating this operation.

Yes. I'm not sure I agree about the relative importance of those two points, but it doesn't really matter very much.

I would argue that simply from a backwards compatibility point of view we really want to make &x as *const u32 "work", because there is tons of code in the wild that does this. (The same argument applies I suspect to coercions like let p: *const T = &x.)

There seem to be two "basic ways" we can make such code work:

  • Statically, as you propose: we identify (somehow) some set of cases where &x is really an "raw borrow" operation that is semantically distinct. We could in theory even add an explicit operator for this.
  • or dynamically, where there is one "borrow operation" and we somehow define the machine semantics such that insta-UB does not occur.

I don't think anybody has a good idea how to make the dynamic idea work. It seems "imaginable", though. To start, however, we'd probably have to remove the various annotations we give to LLVM.

Presuming that we are going to go with the static option, then we return to: what is this static subset? As I wrote above, I think that for backwards compat reasons it basically has to include cases where the &u32 is coerced to a raw pointer, or we will rule out a lot of extant code.

It is conceivable that we might go further and add an explicit Rust syntax for this. I had a strawman proposal of ref <place> -- so e.g. let p: *mut _ = &x would be written let p = ref x. You could then imagine deprecating the "two part" syntax and linting for people to write ref explicitly. (Note that I am not proposing this, just pointing out that it is a possible future extension of this RFC.)

Does all that make sense? Do we all agree that &x as *const _ must work even for potentially unaligned things, just for backwards compatibility reasons? (I suppose it is probably technically UB today, but still..)

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Nov 2, 2018

@nikomatsakis I would argue that it should be valid for all (possibly-invalid) lvalue expressions, since we have no guarantees on raw pointers - i.e., let x = null(); let y = &*x as *const T; should, indeed, be valid, imo.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Nov 4, 2018

@eddyb

But the only cases where we'd not treat a borrow as being of a reference type is if it's only ever used as a raw pointer, in my value-use-based model. Anything else (including calling a function with the reference as an argument) would still impose reference requirements.

But in let x: &T = &s.field; there is a use a a reference: The value gets copied into x at type &T. What you are suggesting then is to NOT desugar let x = &s.field as *const _ into let _tmp = &s.field; let x = _tmp as *const _;, and that is exactly what I am proposing as well. I am just trying to be precise about what happens, whereas I cannot deduce a precise spec from what you said.

Another way to go would be to reframe this as &x being a kind of polymorphic literal similarly to how numeric constants work, with its type initially being a type metavariable, until further constraints determine whether it is actually &T or *const T -- with a fallback default to the former if unconstrained, much like how numeric constants default to i32.

It seems to me that my proposal is a prerequisite for yours. You are also suggesting that there be a way to create a raw pointer to a field without creating an intermediate reference. We need a way to represent your inference after some kind of desugaring -- we need a primitive operation to "take a raw reference". You are just going further than I did in terms of when we use that operation, i.e. when we take a raw reference vs a safe one.

I will add a remark to the RFC saying that we might want to use the new operation for more cases. But I do not see a way to realize any of the proposals (by @glaebhoerl and @eddyb) without having this new operation that is distinct from any operation we can express so far; and if we do have such an operation it should explicitly show up in the MIR. Making such things explicit is part of what MIR is about.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Nov 4, 2018

My problem with this stronger inference proposed by @glaebhoerl is that if someone relies on this behavior, there is a danger of accidentally adding a non-raw-ptr use to a reference, which would then rather subtly make the program have UB. If we say that you have to cast immediately, things cannot be correct for "subtle" reasons. But I guess we could have a lint against any "taking a raw reference" that is not immediately followed by a cast: Then, more existing code works (because we take raw references in more cases), but it is less likely that people will accidentally break their code because they relied on this behavior.

We might even make this an err-by-default lint after some transition period?


@ubsan

since that would make your let example valid if written like this:

let x: *const T = {
  let y: &T = &*null;
  y as *const T
};

since y and the cast to *const T are in the same statement.

I never said that the new operation is used when the cast happens in the same statement. I said that it is used when the reference is "immediately cast [...] to a raw pointer":

When translating HIR to MIR, we recognize &[mut] <place> as *[mut|const] _ as
a special pattern and turn it into a single MIR Rvalue that takes the address
and produces it as a raw pointer

Anyway, I will add some examples.

let x: *const T = {&*null};

That is an interesting example indeed. During my experiments, I noticed a similar problem with implicit coercions, namely coercion &mut to *const will happen through & -- which is a problem because I plan to assign meaning to a mut-to-shr cast (namely, this is when the location gets frozen so it may not be mutated again).

I would argue that it should be valid for all (possibly-invalid) lvalue expressions, since we have no guarantees on raw pointers - i.e., let x = null(); let y = &*x as *const T; should, indeed, be valid, imo.

Not sure which part of @nikomatsakis posts' you are referring to here, but assuming null() returns a raw pointer, I agree that code has defined behavior.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Nov 4, 2018

@nikomatsakis Very well put, I do not have anything to add. :)

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Nov 4, 2018

@RalfJung What I'm suggesting is a pre-MIR analysis, not considering those copies.

Alternatively, a dataflow analysis on MIR, where copies are considered noops, and which rewrites the MIR to "weaken" borrows, as needed.

Something I have not considered is interaction with mutable state, but I suspect dataflow analysis would be able to understand that.

I will add a remark to the RFC saying that we might want to use the new operation for more cases.

Yes, I never said anything about the MIR operation not being needed, but rather that the syntactic condition for producing it, could be relaxed to something more general.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Nov 4, 2018

Yes, I never said anything about the MIR operation not being needed, but rather that the syntactic condition for producing it, could be relaxed to something more general.

I see, okay. I do not fully understand which syntactic condition you have in mind, but if the result is that at some point (pre-borrowck, I would guess) we have the result of this inference encoded explicitly in the MIR, then I think I am fine.

I'd prefer this condition to be as simple and hence predictable as possible, but I'd be basically satisfied with any syntactic condition.

@glaebhoerl

This comment has been minimized.

Copy link
Contributor

glaebhoerl commented Nov 4, 2018

It seems to me that my proposal is a prerequisite for yours.

I didn't mean to imply otherwise :)

there is a danger of accidentally adding a non-raw-ptr use to a reference, which would then rather subtly make the program have UB

It seems we are concerned about different things, both of which however seem worth being concerned about:

  1. If one cannot visually determine at a glance whether UB is invoked, that is bad

  2. If one performs a transformation (such as lifting a temporary into a let) which seems "obviously benign" but actually ends up introducing UB, that is also bad

I guess the latter is easier to lint against than the former. But otherwise as long as we wish to re-use the & operator for creating raw pointers, it seems like the tension between them is inherent. Which leads me to...


It is conceivable that we might go further and add an explicit Rust syntax for this.

(Once again, this is in the spirit of "so that the option doesn't go unmentioned", and I'm not sure how well I like it.)

How about, as a (drawback) slightly jarring but otherwise (advantage) extremely simple and straightforward alternative: *const x creates a *const T pointing to x, and likewise *mut x creates a *mut T pointing to x?

It is slightly jarring because of course * on its own means dereferencing. But I don't think *const and *mut are otherwise valid syntax in expressions, so it's unambiguous; and it's entirely consistent, and even predictable, with respect to "literal syntax follows type syntax". And I suppose one could argue that the jarringness stone here was originally cast back when we decided to name our reference types after the referencing operator and our pointer types after the dereferencing operator.

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Nov 4, 2018

As a non-expert on all of these issues who's hoping for Rust to become the first language with UB where non-wizards can actually figure out whether or not they're invoking UB, I'm strongly in favor of only making the &x as *... syntax do this magic for now. It's very easy for me to understand, remember and apply that rule. At least right now, all of the options for generalizing it feel pretty spooky. We're just too far away from having "complete" unsafe code guidelines, or from knowing how good programmatic UB-checking will end up being in practice.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 18, 2019

We don't warn against unsafe { &mut x.foo as *mut _ }.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 18, 2019

Why would anyone write that code?

What people were writing before was: &mut x.foo and &mut x.foo as *mut _, without unsafe.

Then we added a warning saying that that code probably has UB, and that they should write unsafe { &mut x.foo as *mut _ } instead. However, this warning also has a note that says that that iff &mut x.foo has UB, then unsafe { &mut x.foo as *mut _ } also has UB because right now, both snippets would be creating a misaligned reference.

So why would anyone upgrade their code to use an unsafe { ... } block here ? At the very least, those who read the warning will try to assert! that the field is indeed properly aligned, but there is no practical way to check that in Rust. So if their goal with the fix was to avoid UB, then the unsafe { ... } approach doesn't achieve anything at all.

What people have done instead, is copy the struct fields out of the struct into aligned storage, operate on them, and then copy them back. This shouldn't be necessary, but that's the only reasonable thing to do here in Rust today.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 18, 2019

So if we do a crater run today, the only thing I expect to see is the code that still uses &mut x.foo or &mut x.foo as *mut _ without unsafe, which is probably just unmaintained code that doesn't care about the warning. I don't expect to see much unsafe { &mut x.foo as *mut _ } in practice, but if some people are writing that code, without this RFC we should be probably telling them that the change they made didn't actually fix anything unless they are somehow ensuring that the field is aligned (which again, can't be easily checked in practice).

If we wanted to look up for anything meaningful, we would need to be looking out for code that unnecessarily copies fields of a packed struct into aligned storage, just to create a reference from them, or patterns like that. This is the kind of code that libc has, and what I expect others to be doing as well. All other options don't really solve the problem.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 18, 2019

So why would anyone upgrade their code to use an unsafe { ... } block here ?

Because it is the easiest way to silence the warning.

What people have done instead, is copy the struct fields out of the struct into aligned storage, operate on them, and then copy them back. This shouldn't be necessary, but that's the only reasonable thing to do here in Rust today.

Have they? Can you point at projects (that you are not involved in) that did this change?

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 18, 2019

Can you point at projects (that you are not involved in) that did this change?

I can only point at projects I know, and have already pointed at them. Those either copy the fields back and forth, or just use &mut T as *mut _ without unsafe { }. None of them uses unsafe { &mut T as *mut _ } because doing so does not make any sense.

Because it is the easiest way to silence the warning.

The easiest way to silence the warning is to just write: #![allow(safe_packed_borrows)], but easier than silencing the warning is, for example, doing absolutely nothing - that's actually what some of the projects I know do.


FYI I'm unsubscribing here, this problem has existed for so long that I wonder whether its worth fixing. The status quo allows &mut T as *mut _ without unsafe, this works for me, and #![allow(safe_packed_borrows)] is a minimal change that makes it pleasant. If the status quo ever change it, I might revisit this, but in the mean time there are other issues that are more important to me.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 18, 2019

 I can only point at projects I know, and have already pointed at them. Those either copy the fields back and forth, or just use &mut T as *mut _ without unsafe { }. None of them uses unsafe { &mut T as *mut _ } because doing so does not make any sense.

(The unsafe does not matter for the crater experiment, FWIW. I mentioned it just because it silences the warning.)

Well, the crater run showed that not many people do &mut foo.bar as *mut _ on packed structs. So according to your hypothesis, that'd mean they all moved to doing copies, or maybe there just isn't much affected code to begin with.

The status quo allows &mut T as *mut _ without unsafe

Not really, the status quo says this is UB but we don't exploit that. We are trying to gather data for how often this gets used despite being UB, and it seems the answer is "not very often", at least when you ignore code that is UB for other reasons.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Mar 18, 2019

@gnzlbg

It feels to me that everybody agrees that the minimal incremental step that this RFC proposes is something that we'd like to do anyway, and doing this solves a real problem right now (or last year, had it been merged then): there is no way to create a raw pointer without going through a reference.

Everybody agrees that we want to support this somehow. Based on the crater data I think there's little justification for baking it into the concrete syntax as proposed. The number of regressions in total are quite few and even fewer would be the cases helped by this specific RFC.

@RalfJung

The alternative is to expand this RFC in scope to directly propose a new syntax &raw const and &raw mut.

Yes that would be my strong preference. I think it is clean and easy-to-reason-about surface syntax that is even more ergonomic than &x as const _. We may want to try a crater run looking for &raw which is followed by anything that could possibly be a place expression and see if we can get away with &raw x. It's not that unlikely based on an initial source-graph test.

However, this is quite a different RFC and I think we should re-run the FCP process for that (cc @nikomatsakis).

That still leaves open the question whether &mut foo as *mut _ should be sugar for &raw mut foo, but maybe at that point that could be something to be evaluated after implementation and before stabilization.

I would be willing to accept this on a temporary basis together with linting under the understanding (which will need to make itself into the RFC text) that it will eventually be removed in favor of the new syntax.

@gnzlbg

The easiest way to silence the warning is to just write: #![allow(safe_packed_borrows)], but easier than silencing the warning is, for example, doing absolutely nothing - that's actually what some of the projects I know do.

FYI I'm unsubscribing here, this problem has existed for so long that I wonder whether its worth fixing. The status quo allows &mut T as *mut _ without unsafe, this works for me, and #![allow(safe_packed_borrows)] is a minimal change that makes it pleasant. If the status quo ever change it, I might revisit this, but in the mean time there are other issues that are more important to me.

This is exactly something we don't want to happen and is a reason that allow(foo) should not stop warnings from happening for future compatibility lints. cc @pnkfelix

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 18, 2019

Backing up a bit here, it seems like there is no consensus for the RFC as-is, due to a lack of evidence. Absence of evidence is not evidence of absence; there is still no data on code that uses references to uninitialized data. I am not even that worried about such code because I think such code should be legal anyway; it is @Centril who proposes to make such code illegal -- hence I am somewhat surprised that he is also objecting this RFC, which is the only avenue that we know of to tell people today how to write code that will be compatible with the model he is proposing.

At the same time, we came up with what seems like a possible syntax for creating raw references: &raw mut place and &raw const place. That's awesome! I think I should expand the RFC to make that syntax the main concern. Is it possible to abort the FCP process, so that we can restart a new one once I updated the RFC?

I imagine under this RFC code like &mut place as *mut _ would be linted against, telling people to write &raw mut place instead. The main open question would be whether such code would be additionally implicitly treated as actually equivalent to &raw mut place. The purpose of doing so would be (a) compatibility with existing code (not a lot of code when it comes to packed structs, but an entirely unknown amount of code when it comes to dangling references or references to uninitialized memory), and (b) being able to give people a pattern they can write today, something we can put into the docs of MaybeUninit.

I think the latter point is valuable on its own. Without this special case, once/if &raw mut place stabilizes, only code that is willing to drop support for old Rust versions will be able to use it. That will exclude some fundamental ecosystem crates. Moreover, I would find it very disappointing if we stabilized MaybeUninit without being able to actually initialize a struct field-by-field. IMO that would be failure on our side.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Mar 18, 2019

(I'll address other bits later... but for now:)

Is it possible to abort the FCP process, so that we can restart a new one once I updated the RFC?

@rfcbot cancel

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Mar 18, 2019

@Centril proposal cancelled.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 21, 2019

I think I should expand the RFC to make that syntax the main concern.

I have done that. The RFC is now mainly about introducing a new syntax &raw [const|mut] to create a raw pointer, and makes the desugaring of references immediately cast into raw pointers just an option.

Please read this and tell me what you think!

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Mar 21, 2019

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 21, 2019

A surface syntax was previously mentioned as a "future possibility". However, there were enough reservations against the pattern, while there was also general agreement that an operation like this is needed, and a potentially workable syntax was discovered during the discussion, that I felt a dedicated syntax is the better approach.

@HeroicKatora

This comment has been minimized.

Copy link

HeroicKatora commented Mar 22, 2019

Somewhat caught me offguard, I was under the impression you want to leave surface syntax open for discussion. I had been looking into using match and patterns for this cause, specifically because it avoids any issues with auto-deref and user defined indexing interfering with readability and expressivity. This seems to have progressed in a similar timeline, also finished today. See #2666

@mjbshaw

This comment has been minimized.

Copy link

mjbshaw commented Mar 22, 2019

Personally, I'd love to see @glaebhoerl's idea adopted instead of new surface syntax.

If that's infeasible, I think I might prefer the alternative option of using an offset_of! macro (one provided by core Rust, not a third-party one) instead of new surface syntax. That way pointers to fields could be made like so:

let uninit: MaybeUninit<T> = MaybeUninit::uninit();
let offset = offset_of!(T, field);
let field_ptr = (uninit.as_mut_ptr() as *mut u8).wrapping_offset(offset) as *mut FieldType;

It's a bit verbose, but a small user-level macro could make it concise.

@RalfJung RalfJung changed the title RFC for a MIR operator to take a raw reference RFC for an operator to take a raw reference Mar 22, 2019

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 22, 2019

Personally, I'd love to see @glaebhoerl's idea adopted instead of new surface syntax.

So that would be basically inferring as part of type inference whether &T is &raw const T or has its current meaning?
I have nowhere near enough knowledge of type inference to say whether that is realistic. I am a bit worried about implicitly inferring things that are this vital for UB. So far, I feel when UB-related things get inferred (such as let x: *const T = &mut t, which creates a shared reference and hence means we must not write through the resulting raw pointer), it was a mistake.

If that's infeasible, I think I might prefer the alternative option of using an offset_of! macro (one provided by core Rust, not a third-party one) instead of new surface syntax. That way pointers to fields could be made like so:

So you are suggesting we entirely stop using the type system when it comes to writing subtle unsafe code? I cannot see how that is a good idea.
I cringe every time I see a raw pointer cast, because every one of them is potentially a transmute and also a footgun for later changes (given that the cast will keep working no matter how the source type changes, as long as it remains a raw pointer). (uninit.as_mut_ptr() as *mut u8).wrapping_offset(offset) as *mut FieldType; is a disaster from a readability and maintenance stand-point.

@mjbshaw

This comment has been minimized.

Copy link

mjbshaw commented Mar 22, 2019

I am a bit worried about implicitly inferring things that are this vital for UB.

I think (but could be wrong) UB could be avoided by making the (implicit or explicit) cast from a polymorphic reference to a &T/&mut T a warning or error. Going through the items listed in the motivation section of the RFC:

  1. Unaligned structs and packed fields. Creating a &T/&mut T should be detectable at compile time and could be turned into a hard error or warning.
  2. Uninitialized data. If people use MaybeUninit, they can't write &mut uninit.field as *mut _ anyway. They have to work with pointers (which forces them to use something like offset_of!, which is part of why I brought it up in my last comment, though I didn't expand on it very well).
  3. Computing the address/offset of a field. I hope these cases are made obsolete by an offset_of! macro being added to Rust's core library.

For each of these motivating cases, UB can be avoided.

@HeroicKatora

This comment has been minimized.

Copy link

HeroicKatora commented Mar 22, 2019

@mjbshaw Polymorphic references are a backwards compatibility mess. I tried. The problem is coercion, and type inference both.

// Must be typed as `Fn() -> &_`
let f = || &some.place;
// Must work, does this already coerce to reference now?
// Those branches may have potentially different lifetimes. Is this a problem?
let foo = if condition {
    &some.a
} else {
    &some.b
}
// Is this a reference? What if we had two breaks instead?
let foo = loop {
    ...
    if should_term {
        break val;
    }
};

// Has this now realized foo as a reference? Or was it already a reference before?
let bar = &foo.sub;
// What about now? Is `foo` a reference? Is `bar` a refernce? Both? Neither?
bar.method();
let partial = unsafe { &(*ptr).field };
// Technically, the reference is only realized here:
// But should safe code be allowed to do this? If not, how do we prevent it?
// Keep backwards compatibility in mind, this does work in current code.
partial.some_method();

// What if the call only occurs in one particular branch
if condition {
    // Is it fine if `condition` is always false when reference would be UB?
    // What if `partial` was supposed to be `&mut` which is not `Copy`?
    partial.method();
}

Uninitialized data. If people use MaybeUninit, they can't write &mut uninit.field as *mut _ anyway. They have to work with pointers (which forces them to use something like offset_of!, which is part of why I brought it up in my last comment, though I didn't expand on it very well).

But they can write &mut (*uninit).field as *mut. That's even exactly what reference desugar to and it becomes obvious if you read the resulting MIR of both. The only difference is that a) pointers don't auto-dereference b) pointer dereference needs unsafe block.

Computing the address/offset of a field. I hope these cases are made obsolete by an offset_of! macro being added to Rust's core library.

I have major problems with offset_of!:

  • This commits to a representation where every field starts at byte-offsets

  • Rust has algebraic data types that are seemingly not captured in offset_of. that is understandable, as it is, imho a hack, introduced in C as a 'least burden' on an ecosystem with a multitude of compilers. Rust should not have this problem and rather deliver a solution that works in its type system.

  • It still requires pointer-manipulation. Since offset_of! is a byte offset, you need to

    1. Cast the original pointer to *u8 (what are memory regions and DSTs anyways)
    2. Manually apply the byte offset (unsafe pointer::add or 'safe' pointer::wrapping_add)
    3. Cast the pointer back to the type of the selected member. And you better never change its type because all hell will silently break loose.

    Even C++ member pointers are nicer than that.

@mjbshaw

This comment has been minimized.

Copy link

mjbshaw commented Mar 22, 2019

// Must be typed as `Fn() -> &_`
let f = || &some.place;

When you say "must be typed" do you mean explicitly? Because I don't think explicit typing would be necessary. It would be Fn() -> _, where _ is later deduced to &_ or *const _.

// Must work, does this already coerce to reference now?
// Those branches may have potentially different lifetimes. Is this a problem?
let foo = if condition {
    &some.a
} else {
    &some.b
}

Sorry, I don't see the problem in this example.

// Is this a reference? What if we had two breaks instead?

Depends on what val is and what the break expressions are.

// Has this now realized foo as a reference? Or was it already a reference before?
let bar = &foo.sub;

I would say yes. Any attempt to access a field would cause foo to be typed as a reference, not a pointer.

// What about now? Is `foo` a reference? Is `bar` a refernce? Both? Neither?
bar.method();

I would say yes, bar is now a reference. Any attempt to access a field or a method call would cause the type to be deduced as a reference.

The only caveat is if method takes an arbitrary self type that is a pointer, in which case one could argue that bar could be deduced as a pointer. I personally would be okay with defaulting to making bar a reference and require an explicit type coercion to pointer if that is what is desired.

let partial = unsafe { &(*ptr).field };
// Technically, the reference is only realized here:
// But should safe code be allowed to do this? If not, how do we prevent it?
partial.some_method();

I'm okay with safe code doing that. If partial isn't valid to dereference, then I don't think ptr was valid to dereference in the first place.

But they can write &mut (*uninit).field as *mut.

I believe that is UB.

  • This commits to a representation where every field starts at byte-offsets

Rust already needs to do that, otherwise you couldn't pass a pointer to a field through FFI.

  • Rust has algebraic data types that are seemingly not captured in offset_of.

That's an interesting point. I'm not sure how useful this would be in practice, but I'll have to give it more thought.

@HeroicKatora

This comment has been minimized.

Copy link

HeroicKatora commented Mar 22, 2019

// Must be typed as `Fn() -> &_`
let f = || &some.place;

You must already fix the result type, because the equivalent of this is currently a valid program. For your case, you'd replace 0 with some reference operation. AsRef is implemented for &_ but not *const. If the result of the closure is subject to type inference, then the program should error just like the unknown integer, but it mustn't because of compatibility. Looser type inference + backwards compatibility = bad.

// Must work, does this already coerce to reference now?
// Those branches may have potentially different lifetimes. Is this a problem?
let foo = if condition {
    &some.a
} else {
    &some.b
}

If this already turns both into a true reference, you make the raw-concept a lot less useful in terms of type-inference at the same time as making the reference cast more implicit and at (imho) surprising locations. However, if you keep the result of that if-block to also depend on type inference, then match or loops with multiple breaks suddenly become a death-trap: One single result returning a true reference silently moves all your reference casts to a much stricter location.

This commits to a representation where every field starts at byte-offsets
Rust already needs to do that, otherwise you couldn't pass a pointer to a field through FFI.

This only commits to every currently availabe type to have a pointer representation that can be cast from their reference representation, but not that there may not be future types for which you can not construct a thin pointer or even a pointer at all (though the latter is maybe unlikely to not be possible). In fact, we already have different kinds of pointers, not all of which are ffi-safe and not all can be simply cast from a usize. offset_of however asserts that the information in the pointer representation of a field must be a proper subset of the pointer information of the containing struct. This is currently true because unsized types/DST contain exactly one member that has the same kind of additional information, namely the length of the referred to memory region. offset_of would more or less 'accidentally' be enough here because we can subtract the offset from the length while applying the offset. It's just even uglier than the thin pointer case.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Mar 29, 2019

I'm still trying to go back and wrap my head around the new "raw reference" operator. But in the meantime, a question: is there any plan, on the 2015 or 2018 edition, to make &foo as *const _ or &mut foo as *mut _ stop working entirely? (Even via a lint that becomes "deny"?) Because that pattern appears extensively throughout the ecosystem, and I think such a transition would be extremely disruptive even as part of an edition, let alone outside of one.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Mar 30, 2019

is there any plan, on the 2015 or 2018 edition, to make &foo as *const _ or &mut foo as *mut _ stop working entirely? (Even via a lint that becomes "deny"?) Because that pattern appears extensively throughout the ecosystem,

What do you mean by "stop working"? It will keep compiling for sure. The question is whether it has the semantics of creating an intermediate reference (that has to satisfy whatever invariants we want for these), or not.

The previous version of this RFC failed because we did not have enough evidence of uses of that pattern that would rely on not satisfying reference invariants to convince @Centril that it is worth special-casing that pattern. Nobody spoke up to provide more evidence, so I concluded I should take another route. Was that premature?
The current version of this RFC is compatible with both (special casing or not special casing that pattern). I am fine either way, but I'd appreciate if the lang team could come to an agreement here. ;) I feel a bit like being tossed back and forth.

I also assumed that having an explicit syntax for this would be welcomed by most everyone, if only because it makes this entire thing much easier to talk about. But @joshtriplett I get the vibe that you'd prefer only special-casing the pattern and not having dedicated syntax? I'd appreciate clarification. :)

EDIT: Ah, I think I understand what you are replying to here now. I forgot about the "unresolved question" concerning the lint level. I agree that if the lint ends up applying to &mut foo as *mut _ it cannot be deny.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.