RFC for an operator to take a raw reference #2582
This comment has been minimized.
camlorn commented Nov 1, 2018
I've not been following Rust enough lately to judge this RFC from the technical side, but from the usability/ergonomics side, I think we should hide this semantic behind a magic function and/or macro, then tell people to use the macro. I think it's a bad idea to have a language construct where whether or not a part of an expression is stored in a variable changes defined behavior to undefined behavior. In this case I understand why we want the semantic. What I'm saying is do both, put the semantic in the rustonomicon or somewhere else suitably advanced, and direct people to the macro instead. |
Centril added T-lang T-compiler labels Nov 1, 2018
This comment has been minimized.
|
MIR is not a part of the language, and therefore, this does not require an RFC, afaik. It's simply a way of encoding the semantics we want into the compiler. |
This comment has been minimized.
|
@ubsan From my reading of this RFC, this affects Rust's abstract machine (in the sense that it makes some things defined behavior and some things explicitly UB...) and makes certain things that weren't hard errors into hard errors. Thus, it requires an RFC. |
This comment has been minimized.
|
@Centril I disagree - it was always the case that we wanted |
Centril assigned nikomatsakis Nov 1, 2018
This comment has been minimized.
|
I might not be able to keep track of the discussion but I should at least note down two things I thought of:
let r = &*p;
let p2 = r as *const _;
let p3 = r as *const _;

let p2 = &*p as *const _;
let p3 = &*p as *const _;

A more realistic scenario that this could enable is We might even be able to relax this so other operations involving the reference are also allowed, but I haven't fully considered the implications. |
This comment has been minimized.
|
I do believe an RFC is appropriate here -- but perhaps I mistake the purpose of the RFC. I presumed that the RFC is more about the surface Rust syntax that generates this new MIR operator than the operator itself. It seems to me that there are four main alternatives here:
Of these, I currently prefer the first. @RalfJung, am I correct that this would leave us room in the future to extend the rules to more cases (e.g., if coerced etc)? One open question mark for me is how best to manage the design process here. This applies to all the efforts of the Unsafe Code Guidelines work, in my view. I will leave a separate comment on how I think that should work, actually. |
This comment has been minimized.
|
Regarding the meta question of how the UCG group should be approaching RFCs. The challenge is that we have a lot of "little pieces" that have to be put together that affect one another. I don't think we should be working out a huge "all or nothing" proposal, but I also think it's hard to reason about random RFCs for one tiny piece without seeing the whole picture. I think I'd like to compromise somehow but having
I'm not entirely sure where these RFCs should be opened. I'd like to be trying out the "phase RFC" procedure, which I think basically implies that we are opening up RFCs on this repo as we move between "phases" -- so this RFC in particular might correspond to moving from the "proposal" (nee spitballing) to "prototyping" -- so it's not really a final decision. But anyway I suppose this isn't the right forum for this comment. Perhaps I should open an internals thread to talk about it. I think it'd be good to figure this out, though, as it seems relevant to a lot of things the UCG team will be producing. |
This comment has been minimized.
|
I would say the best thing would probably be that the result of the
fn foo() -> &T {
    &<invalid lvalue expression>
}
let x = &<invalid lvalue expression>;
fn bar(x: &T) {}
bar(&<invalid lvalue expression>); // note: this is UB at the call site, not in bar
&<invalid lvalue expression>;
// this is equivalent to `let __tmp = &t; drop(__tmp);`
&<invalid lvalue expression> as &T as *const T
// this creates a value of reference type with the 'as &T' |
This comment has been minimized.
This RFC is mostly about changing/specifying those semantics. MIR is a good way to do that, as it provides a much clearer language to talk about what happens when a Rust program gets executed than if we tried to do the same thing in the surface language. This is not the first RFC to talk about MIR or use MIR as means of specification, either.
Which RFC/reference is saying that? |
This comment has been minimized.
I do not understand. This RFC is not primarily about the "cast" part of "take-ref-then-cast", it is about the "take-ref" part. Or are you saying that we should also compile
This is about defining the semantics, which starts with the syntax.
This RFC is first and foremost about adding such a primitive operation to the MIR -- acknowledging the need for it. Secondly, it is about how the user can write code that ends up generating this operation. It would certainly be easy to later add more cases where we take a raw reference instead of a safe one, that can only make more code defined.
I felt this is somewhat different from the kind of "let us decide about this type's layout/invariant"-RFC that we (UCG) will likely be producing eventually: The invariant this refers to is already encoded in rustc, and the RFC proposes to add a new ingredient to our "toolbox for defining surface language semantics", namely taking a raw borrow.
I and the RFC agree these should all be UB. More interesting would be some examples you think should not be UB that the RFC leaves UB. |
This comment has been minimized.
|
Another possibility, just so that it doesn't go unmentioned: Currently the This might be a way to codify the "result of the It's not obvious to me whether there would be backwards compatibility edge cases to worry about? (To be clear this neither would nor could entirely replace the reference-to-pointer coercion, as that applies to any expression, whereas this only applies to literals.) |
This comment has been minimized.
|
This seems really important to have. I even have a crate which already relies on this behaviour. One thing I am concerned about is the implicitness - it would be very easy for someone not familiar with this corner case to come along and accidentally refactor the code in a way which "broke" the guarantee, even with changes that would in most languages be semantically identical. As a way to solve this, would it make sense to relax the constraint on values being initialised if they are only used in this way? e.g. start allowing:
fn main() {
    let x: u32;
    let y = &x as *const _;
    println!("{:?}", y);
}
This would allow writing code that would turn into a hard error if you somehow tried to actually do something with the value, and would also make it possible to calculate field offsets from completely safe code: we would no longer have to do hacks with |
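As a rough illustration of the field-offset use case mentioned above, here is a minimal sketch that computes an offset without ever creating a reference to uninitialized data. It assumes `std::ptr::addr_of!`, a standard-library macro that did not exist when this thread was written, and it needs an `unsafe` block, so it is not the "completely safe" version the comment hopes for. The `Pair` struct and `offset_of_b` function are hypothetical names chosen for the example.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical example type; `repr(C)` makes the field layout predictable.
#[repr(C)]
struct Pair {
    a: u8,
    b: u32,
}

/// Byte offset of `Pair::b`, computed from an uninitialized `Pair`
/// without reading it and without creating a `&Pair` or `&u32`.
fn offset_of_b() -> usize {
    let uninit = MaybeUninit::<Pair>::uninit();
    let base: *const Pair = uninit.as_ptr();
    // SAFETY: `base` points into a live allocation large enough for `Pair`.
    // `addr_of!` only computes the field address; it does not read the field
    // and does not create an intermediate reference.
    let field: *const u32 = unsafe { ptr::addr_of!((*base).b) };
    (field as usize) - (base as usize)
}
```

With `repr(C)`, the one-byte field `a` is followed by padding up to the alignment of `u32`, so `offset_of_b()` should equal `std::mem::align_of::<u32>()` (4 on common targets).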
This comment has been minimized.
But the only cases where we'd not treat a borrow as being of a reference type is if it's only ever used as a raw pointer, in my value-use-based model. Anything else (including calling a function with the reference as an argument) would still impose reference requirements. I actually think @glaebhoerl's formulation of a polymorphic borrow operator is equivalent to mine in behavior but it might be easier to implement it as an analysis in order to construct the MIR (as I've suggested) instead of during type-checking. |
This comment has been minimized.
|
@RalfJung yeah, alright that makes sense. I don't believe there are any examples on which me and the RFC disagree, although if you can think of any, that'd be useful to know. I would like to see that list put in the RFC, if you don't mind, since I feel it gives a nice overview from a Rust pov; I also don't feel the transform is defined well enough
I think discussing "binding to reference values" is probably more useful, instead of talking about "the same statement", since that would make your
let x: *const T = {
    let y: &T = &*null;
    y as *const T
};
since It's also important to think about when the coercion actually occurs - for example, does
let x: *const T = {&*null};
translate to
let x: *const T = {&*null} as *const T;
// or
let x: *const T = {(&*null) as *const T};
if the former, it should be UB. If the latter, it should not be. |
This comment has been minimized.
Certainly we could not today just make I do sort of like the idea of making |
This comment has been minimized.
Yes. I'm not sure I agree about the relative importance of those two points, but it doesn't really matter very much. I would argue that simply from a backwards compatibility point of view we really want to make There seem to be two "basic ways" we can make such code work:
I don't think anybody has a good idea how to make the dynamic idea work. It seems "imaginable", though. To start, however, we'd probably have to remove the various annotations we give to LLVM. Presuming that we are going to go with the static option, then we return to: what is this static subset? As I wrote above, I think that for backwards compat reasons it basically has to include cases where the It is conceivable that we might go further and add an explicit Rust syntax for this. I had a strawman proposal of Does all that make sense? Do we all agree that |
This comment has been minimized.
|
@nikomatsakis I would argue that it should be valid for all (possibly-invalid) lvalue expressions, since we have no guarantees on raw pointers - i.e., |
This comment has been minimized.
But in
It seems to me that my proposal is a prerequisite for yours. You are also suggesting that there be a way to create a raw pointer to a field without creating an intermediate reference. We need a way to represent your inference after some kind of desugaring -- we need a primitive operation to "take a raw reference". You are just going further than I did in terms of when we use that operation, i.e. when we take a raw reference vs a safe one. I will add a remark to the RFC saying that we might want to use the new operation for more cases. But I do not see a way to realize any of the proposals (by @glaebhoerl and @eddyb) without having this new operation that is distinct from any operation we can express so far; and if we do have such an operation it should explicitly show up in the MIR. Making such things explicit is part of what MIR is about. |
This comment has been minimized.
|
My problem with this stronger inference proposed by @glaebhoerl is that if someone relies on this behavior, there is a danger of accidentally adding a non-raw-ptr use to a reference, which would then rather subtly make the program have UB. If we say that you have to cast immediately, things cannot be correct for "subtle" reasons. But I guess we could have a lint against any "taking a raw reference" that is not immediately followed by a cast: Then, more existing code works (because we take raw references in more cases), but it is less likely that people will accidentally break their code because they relied on this behavior. We might even make this an err-by-default lint after some transition period?
I never said that the new operation is used when the cast happens in the same statement. I said that it is used when the reference is "immediately cast [...] to a raw pointer":
Anyway, I will add some examples.
That is an interesting example indeed. During my experiments, I noticed a similar problem with implicit coercions, namely coercion
Not sure which part of @nikomatsakis' posts you are referring to here, but assuming |
This comment has been minimized.
|
@nikomatsakis Very well put, I do not have anything to add. :) |
This comment has been minimized.
|
@RalfJung What I'm suggesting is a pre-MIR analysis, not considering those copies. Alternatively, a dataflow analysis on MIR, where copies are considered noops, and which rewrites the MIR to "weaken" borrows, as needed. Something I have not considered is interaction with mutable state, but I suspect dataflow analysis would be able to understand that.
Yes, I never said anything about the MIR operation not being needed, but rather that the syntactic condition for producing it, could be relaxed to something more general. |
This comment has been minimized.
I see, okay. I do not fully understand which syntactic condition you have in mind, but if the result is that at some point (pre-borrowck, I would guess) we have the result of this inference encoded explicitly in the MIR, then I think I am fine. I'd prefer this condition to be as simple and hence predictable as possible, but I'd be basically satisfied with any syntactic condition. |
This comment has been minimized.
I didn't mean to imply otherwise :)
It seems we are concerned about different things, both of which however seem worth being concerned about:
I guess the latter is easier to lint against than the former. But otherwise as long as we wish to re-use the
(Once again, this is in the spirit of "so that the option doesn't go unmentioned", and I'm not sure how well I like it.) How about, as a (drawback) slightly jarring but otherwise (advantage) extremely simple and straightforward alternative: It is slightly jarring because of course |
This comment has been minimized.
|
As a non-expert on all of these issues who's hoping for Rust to become the first language with UB where non-wizards can actually figure out whether or not they're invoking UB, I'm strongly in favor of only making the |
This comment has been minimized.
|
We don't warn against |
This comment has been minimized.
|
Why would anyone write that code? What people were writing before was: Then we added a warning saying that that code probably has UB, and that they should write So why would anyone upgrade their code to use an What people have done instead, is copy the struct fields out of the struct into aligned storage, operate on them, and then copy them back. This shouldn't be necessary, but that's the only reasonable thing to do here in Rust today. |
This comment has been minimized.
|
So if we do a crater run today, the only thing I expect to see is the code that still uses If we wanted to look up for anything meaningful, we would need to be looking out for code that unnecessarily copies fields of a packed struct into aligned storage, just to create a reference from them, or patterns like that. This is the kind of code that |
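To make the packed-struct concern concrete, here is a minimal sketch of operating on a packed field in place through a raw pointer, rather than copying it out into aligned storage and back. It relies on `std::ptr::addr_of!` and `addr_of_mut!`, standard-library macros that postdate this discussion, so treat it as an assumption about how the idea looks in later Rust rather than something the commenters wrote. `Packed`, `read_value`, and `bump_value` are hypothetical names.

```rust
use std::ptr;

// Hypothetical packed struct: `value` sits at offset 1 and is therefore
// not aligned for `u32`, so `&p.value` would be an invalid reference.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

/// Reads the unaligned field through a raw pointer, never via `&u32`.
fn read_value(p: &Packed) -> u32 {
    let field: *const u32 = ptr::addr_of!(p.value);
    // SAFETY: `field` is valid for reads; `read_unaligned` does not
    // require the pointer to be aligned.
    unsafe { field.read_unaligned() }
}

/// Increments the unaligned field in place through a raw pointer.
fn bump_value(p: &mut Packed) {
    let field: *mut u32 = ptr::addr_of_mut!(p.value);
    // SAFETY: `field` is valid for reads and writes; both accesses use
    // the unaligned variants.
    unsafe { field.write_unaligned(field.read_unaligned().wrapping_add(1)) }
}
```

Calling `bump_value` on a `Packed { tag: 1, value: 41 }` and then `read_value` yields 42, with no reference to the misaligned field created at any point.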
This comment has been minimized.
Because it is the easiest way to silence the warning.
Have they? Can you point at projects (that you are not involved in) that did this change? |
This comment has been minimized.
I can only point at projects I know, and have already pointed at them. Those either copy the fields back and forth, or just use
The easiest way to silence the warning is to just write: FYI I'm unsubscribing here, this problem has existed for so long that I wonder whether it's worth fixing. The status quo allows |
This comment has been minimized.
(The Well, the crater run showed that not many people do
Not really, the status quo says this is UB but we don't exploit that. We are trying to gather data for how often this gets used despite being UB, and it seems the answer is "not very often", at least when you ignore code that is UB for other reasons. |
This comment has been minimized.
Everybody agrees that we want to support this somehow. Based on the crater data I think there's little justification for baking it into the concrete syntax as proposed. The number of regressions in total are quite few and even fewer would be the cases helped by this specific RFC.
Yes that would be my strong preference. I think it is clean and easy-to-reason-about surface syntax that is even more ergonomic than However, this is quite a different RFC and I think we should re-run the FCP process for that (cc @nikomatsakis).
I would be willing to accept this on a temporary basis together with linting under the understanding (which will need to make itself into the RFC text) that it will eventually be removed in favor of the new syntax.
This is exactly something we don't want to happen and is a reason that |
This comment has been minimized.
|
Backing up a bit here, it seems like there is no consensus for the RFC as-is, due to a lack of evidence. Absence of evidence is not evidence of absence; there is still no data on code that uses references to uninitialized data. I am not even that worried about such code because I think such code should be legal anyway; it is @Centril who proposes to make such code illegal -- hence I am somewhat surprised that he is also objecting to this RFC, which is the only avenue that we know of to tell people today how to write code that will be compatible with the model he is proposing. At the same time, we came up with what seems like a possible syntax for creating raw references: I imagine under this RFC code like I think the latter point is valuable on its own. Without this special case, once/if |
This comment has been minimized.
|
(I'll address other bits later... but for now:)
@rfcbot cancel |
This comment has been minimized.
rfcbot commented Mar 18, 2019
@Centril proposal cancelled. |
rfcbot removed proposed-final-comment-period disposition-merge labels Mar 18, 2019
RalfJung added some commits Mar 21, 2019
This comment has been minimized.
I have done that. The RFC is now mainly about introducing a new syntax Please read this and tell me what you think! |
This comment has been minimized.
|
On March 21, 2019 11:54:39 AM PDT, Ralf Jung ***@***.***> wrote:
> I think I should expand the RFC to make that syntax the main concern.
I have done that. The RFC is now mainly about introducing a new syntax
`&raw [const|mut]` to create a raw pointer, and makes the desugaring of
references immediately cast into raw pointers just an option.
Please read this and tell me what you think!
That very much changes the nature of this RFC. A MIR operation and a pattern that turned into that operation seemed fine. That wasn't previously a proposal for surface syntax.
|
This comment has been minimized.
|
A surface syntax was previously mentioned as a "future possibility". However, there were enough reservations against the pattern, while there was also general agreement that an operation like this is needed, and a potentially workable syntax was discovered during the discussion, that I felt a dedicated syntax is the better approach. |
HeroicKatora referenced this pull request Mar 22, 2019 (Open): RFC for a match based surface syntax to get pointer-to-field #2666
This comment has been minimized.
HeroicKatora commented Mar 22, 2019
Somewhat caught me off guard, I was under the impression you want to leave surface syntax open for discussion. I had been looking into using |
This comment has been minimized.
mjbshaw commented Mar 22, 2019
Personally, I'd love to see @glaebhoerl's idea adopted instead of new surface syntax. If that's infeasible, I think I might prefer the alternative option of using an
// `uninit` must be `mut` because `as_mut_ptr` takes `&mut self`
let mut uninit: MaybeUninit<T> = MaybeUninit::uninit();
let offset = offset_of!(T, field);
// `wrapping_offset` takes an `isize`, so the byte offset is cast explicitly
let field_ptr = (uninit.as_mut_ptr() as *mut u8).wrapping_offset(offset as isize) as *mut FieldType;
It's a bit verbose, but a small user-level macro could make it concise. |
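For comparison with the offset-based approach above, here is a minimal sketch of initializing a struct field by field through raw field pointers, with no reference to uninitialized memory. It assumes `std::ptr::addr_of_mut!` from the standard library, which was added after this comment was written; `Config` and `build_config` are hypothetical names for the example, not anything proposed in the thread.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical example type with a non-trivial field (`String`).
struct Config {
    id: u32,
    name: String,
}

/// Builds a `Config` by writing each field through a raw pointer.
fn build_config(id: u32, name: &str) -> Config {
    let mut slot = MaybeUninit::<Config>::uninit();
    let base: *mut Config = slot.as_mut_ptr();
    // SAFETY: `base` points to properly aligned storage for `Config`.
    // `addr_of_mut!` produces field pointers without creating `&mut`
    // references to uninitialized fields, and `write` does not drop
    // the (uninitialized) previous contents. Every field is written
    // before `assume_init` is called.
    unsafe {
        ptr::addr_of_mut!((*base).id).write(id);
        ptr::addr_of_mut!((*base).name).write(name.to_string());
        slot.assume_init()
    }
}
```

The type system still tracks the field types here (`*mut u32`, `*mut String`), which is the property the reply below is concerned about losing with manual byte offsets.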
RalfJung changed the title from "RFC for a MIR operator to take a raw reference" to "RFC for an operator to take a raw reference" Mar 22, 2019
This comment has been minimized.
So that would be basically inferring as part of type inference whether
So you are suggesting we entirely stop using the type system when it comes to writing subtle unsafe code? I cannot see how that is a good idea. |
This comment has been minimized.
mjbshaw commented Mar 22, 2019
I think (but could be wrong) UB could be avoided by making the (implicit or explicit) cast from a polymorphic reference to a
For each of these motivating cases, UB can be avoided. |
This comment has been minimized.
HeroicKatora commented Mar 22, 2019
@mjbshaw Polymorphic references are a backwards compatibility mess. I tried. The problem is coercion, and type inference both.
But they can write
I have major problems with
|
This comment has been minimized.
mjbshaw commented Mar 22, 2019
When you say "must be typed" do you mean explicitly? Because I don't think explicit typing would be necessary. It would be
Sorry, I don't see the problem in this example.
Depends on what
I would say yes. Any attempt to access a field would cause
I would say yes, The only caveat is if
I'm okay with safe code doing that. If
I believe that is UB.
Rust already needs to do that, otherwise you couldn't pass a pointer to a field through FFI.
That's an interesting point. I'm not sure how useful this would be in practice, but I'll have to give it more thought. |
This comment has been minimized.
HeroicKatora commented Mar 22, 2019
You must already fix the result type, because the equivalent of this is currently a valid program. For your case, you'd replace
If this already turns both into a true reference, you make the raw-concept a lot less useful in terms of type-inference at the same time as making the reference cast more implicit and at (imho) surprising locations. However, if you keep the result of that
This only commits to every currently available type to have a pointer representation that can be cast from their reference representation, but not that there may not be future types for which you cannot construct a thin pointer or even a pointer at all (though the latter is maybe unlikely to not be possible). In fact, we already have different kinds of pointers, not all of which are ffi-safe and not all can be simply cast from a |
This comment has been minimized.
|
I'm still trying to go back and wrap my head around the new "raw reference" operator. But in the meantime, a question: is there any plan, on the 2015 or 2018 edition, to make |
This comment has been minimized.
What do you mean by "stop working"? It will keep compiling for sure. The question is whether it has the semantics of creating an intermediate reference (that has to satisfy whatever invariants we want for these), or not. The previous version of this RFC failed because we did not have enough evidence of uses of that pattern that would rely on not satisfying reference invariants to convince @Centril that it is worth special-casing that pattern. Nobody spoke up to provide more evidence, so I concluded I should take another route. Was that premature? I also assumed that having an explicit syntax for this would be welcomed by most everyone, if only because it makes this entire thing much easier to talk about. But @joshtriplett I get the vibe that you'd prefer only special-casing the pattern and not having dedicated syntax? I'd appreciate clarification. :) EDIT: Ah, I think I understand what you are replying to here now. I forgot about the "unresolved question" concerning the lint level. I agree that if the lint ends up applying to |
RalfJung commented Nov 1, 2018 (edited)
Introduce new variants of the `&` operator: `&raw mut <place>` to create a `*mut <T>`, and `&raw const <place>` to create a `*const <T>`. This creates a raw pointer directly, as opposed to the already existing `&mut <place> as *mut _`/`&<place> as *const _`, which create a temporary reference and then cast that to a raw pointer. As a consequence, the existing expressions `<term> as *mut <T>` and `<term> as *const <T>` where `<term>` has reference type are equivalent to `&raw mut *<term>` and `&raw const *<term>`, respectively. Moreover, add a lint to existing code that could use the new operator, and treat existing code that creates a reference and immediately casts or coerces it to a raw pointer as if it had been written with the new syntax.
As an option, we could treat `&mut <place> as *mut _`/`&<place> as *const _` as if they had been written with `&raw` to avoid creating temporary references when that was likely not the intention.
Rendered
The RFC got half-rewritten; click here to jump to the beginning of the post-rewrite discussion.
Cc @Centril @rust-lang/wg-unsafe-code-guidelines
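The equivalence stated in the summary (a cast of a reference-typed term versus a raw borrow of its dereference) can be sketched as follows. This uses `std::ptr::addr_of!`, the standard-library macro form of a raw borrow that was added after this RFC, as a stand-in for the proposed `&raw const` syntax; that substitution and the function name `cast_matches_raw_borrow` are assumptions made for the example.

```rust
use std::ptr;

/// For a valid reference, casting it to a raw pointer and taking a raw
/// borrow of the place it points to should yield the same address.
fn cast_matches_raw_borrow(r: &i32) -> bool {
    // `<term> as *const <T>` where `<term>` has reference type.
    let by_cast: *const i32 = r as *const i32;
    // Raw borrow of `*<term>`; stands in for `&raw const *r`.
    let by_raw: *const i32 = ptr::addr_of!(*r);
    by_cast == by_raw
}
```

The two forms only differ observably when the place is not valid for a reference (unaligned, uninitialized, or dangling), which is the case the RFC is designed for and which this sketch deliberately does not exercise.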