Coercing &mut to *const should not create a shared reference #56604

Open
RalfJung opened this issue Dec 7, 2018 · 58 comments
Labels
A-raw-pointers Area: raw pointers, MaybeUninit, NonNull T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@RalfJung
Member

RalfJung commented Dec 7, 2018

It has long been a rule in Rust that you must not mutate through a shared reference, or a raw pointer obtained from a shared reference.

Unfortunately, that rule currently forbids the following code:

fn direct_mut_to_const_raw() {
    let x = &mut 0;
    let y: *const i32 = x;
    unsafe { *(y as *mut i32) = 1; }
    assert_eq!(*x, 1);
}

The reason for this is that coercing &mut T to *const T implicitly first creates a shared reference and then coerces that to *const T, meaning y in the example above is technically a raw pointer obtained from a shared reference.

We should fix our coercion logic to no longer create this intermediate shared reference.
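
For illustration (a sketch added for clarity, not part of the original report): as noted later in this thread, the same program is currently accepted if the mutable reference is first cast to `*mut i32` explicitly, since that route creates no intermediate shared reference. The function name below is hypothetical.

```rust
// Sketch: go through `*mut i32` explicitly before converting to `*const i32`.
// The function name is made up for this example.
fn mut_to_const_via_mut_raw() -> i32 {
    let x = &mut 0;
    // `x as *mut i32` is a mutable reborrow turned into a raw pointer;
    // the following `*mut` to `*const` coercion only changes the type.
    let y: *const i32 = x as *mut i32;
    // Writing through `y` is fine here because `y` was derived from a
    // mutable reference, not from a shared one.
    unsafe { *(y as *mut i32) = 1; }
    *x
}
```

Calling `mut_to_const_via_mut_raw()` returns `1`. The only difference from the example above is the explicit `as *mut i32`.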

See #56161 for how we uncovered this problem.

Cc @eddyb @nikomatsakis

@RalfJung
Member Author

RalfJung commented Dec 7, 2018

I believe the relevant code might be

coerce_mutbls(mt_a.mutbl, mutbl_b)?;
// Although references and unsafe ptrs have the same
// representation, we still register an Adjust::DerefRef so that
// regionck knows that the region for `a` must be valid here.
if is_ref {
    self.unify_and(a_unsafe, b, |target| {
        vec![Adjustment {
            kind: Adjust::Deref(None),
            target: mt_a.ty
        }, Adjustment {
            kind: Adjust::Borrow(AutoBorrow::RawPtr(mutbl_b)),
            target
        }]
    })
} else if mt_a.mutbl != mutbl_b {
    self.unify_and(a_unsafe, b, simple(Adjust::MutToConstPointer))
} else {
    self.unify_and(a_unsafe, b, identity)
}

but I am not sure because this is type checking. I have no idea where the code that decides about lowering of coercions to reborrows/casts lives.

@eddyb
Member

eddyb commented Dec 7, 2018

I have no idea where the code that decides about lowering of coercions to reborrows/casts lives.

What do you mean? That code is generating a deref and a borrow, each of those gets turned into the equivalent MIR later, and the result is as if the user wrote &* explicitly.

@eddyb
Member

eddyb commented Dec 7, 2018

@nikomatsakis I believe the comment above if is_ref { has been outdated for quite a while (and makes no sense for NLL), can you confirm?

@RalfJung
Member Author

RalfJung commented Dec 7, 2018

@eddyb that code generates "adjustments". I have no idea what that means. I found no use of AutoBorrow::RawPtr that would turn this into casts or reborrows or so.

@eddyb
Member

eddyb commented Dec 7, 2018

Oh I got confused as to what this is doing, I missed the AutoBorrow::RawPtr bit, looks like "borrow" and "cast reference to raw pointer" are fused into one adjustment. The lowering is here, btw:

Adjust::Borrow(AutoBorrow::RawPtr(m)) => {
    // Convert this to a suitable `&foo` and
    // then an unsafe coercion. Limit the region to be just this
    // expression.
    let region = ty::ReScope(region::Scope {
        id: hir_expr.hir_id.local_id,
        data: region::ScopeData::Node
    });
    let region = cx.tcx.mk_region(region);
    expr = Expr {
        temp_lifetime,
        ty: cx.tcx.mk_ref(region,
                          ty::TypeAndMut {
                              ty: expr.ty,
                              mutbl: m,
                          }),
        span,
        kind: ExprKind::Borrow {
            region,
            borrow_kind: m.to_borrow_kind(),
            arg: expr.to_ref(),
        },
    };
    let cast_expr = Expr {
        temp_lifetime,
        ty: adjustment.target,
        span,
        kind: ExprKind::Cast { source: expr.to_ref() }
    };

So looking again at the testcase, y as *mut i32 still goes through the reference->raw pointer coercion logic, but it does so for coercing &mut T to *mut T which does a mutable reborrow instead of an immutable reborrow (and then there's a separate *mut T to *const T coercion).

Changing AutoBorrow::RawPtr(mutbl_b) to AutoBorrow::RawPtr(mt_a.mutbl) would work, but you then also need to push Adjust::MutToConstPointer to that vec![...], if mt_a.mutbl != mutbl_b.

EDIT: this may be backwards incompatible if the mutable reference being coerced was also already immutably borrowed, since you'd be introducing a mutable reborrow.

Maybe we can avoid reborrows altogether here? But I'd leave that to @nikomatsakis.

@RalfJung
Member Author

RalfJung commented Dec 7, 2018

Changing AutoBorrow::RawPtr(mutbl_b) to AutoBorrow::RawPtr(mt_a.mutbl) would work, but you then also need to push Adjust::MutToConstPointer to that vec![...], if mt_a.mutbl != mutbl_b.

I did the first but not the last and tests seem to still pass...^^

But yeah, there is a compatibility problem with outstanding shared references. Ouch.

@eddyb
Member

eddyb commented Dec 7, 2018

Without Adjust::MutToConstPointer the MIR may end up malformed (using *mut T where *const T is expected), I'm surprised you don't get any errors!

@RalfJung
Member Author

RalfJung commented Dec 7, 2018

using *mut T where *const T is expected

Yeah I figured. run-pass and ui tests all pass, so either the change did nothing or nothing checks the MIR^^

For the backwards compatibility issue:
I think my inclination is that we don't want to reborrow on a cast-to-raw. That would also make testing Stacked Borrows in miri easier, these implicit unavoidable reborrows are a pain. :P We'd have to make sure though that the borrow checker understands that after a (x: &mut T) as *mut T, x is still there -- it hasn't been moved away.

@eddyb
Member

eddyb commented Dec 7, 2018

I guess the tricky bit with making it a cast is that what we have to work with is Operand::{Move,Copy}.

Instead, we could just add to the MIR a sort of "borrow to raw pointer" (a lot like AutoBorrow::RawPtr, really), and then we just need to make sure the old borrowck doesn't treat even a mutable AutoBorrow::RawPtr like a real borrow.

@RalfJung
Member Author

RalfJung commented Dec 8, 2018

"borrow to raw pointer"

You mean like rust-lang/rfcs#2582? :D

@eddyb
Member

eddyb commented Dec 8, 2018

@RalfJung Yupp, that's what I mean, that seems like the perfect solution here.

@RalfJung RalfJung closed this as completed Dec 9, 2018
@RalfJung RalfJung reopened this Dec 9, 2018
@RalfJung
Member Author

RalfJung commented Dec 9, 2018

(Sorry, that was the wrong button.)

Notice that the lint discussed in rust-lang/rfcs#2582, at least when implemented on the MIR, would actually flag these reborrows that are in our way now: The new reference is created just to turn it into a raw pointer, it has no other use.

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Dec 11, 2018
@RalfJung
Member Author

Instead, we could just add to the MIR a sort of "borrow to raw pointer"

Following this idea, I have updated miri to treat the case of "escaping a ptr to be usable for raw accesses" as just another reborrow: Both trigger a retag; the retag after a cast-to-raw is a bit special because it also retags raw pointers.

So, I am convinced now that the best way forward is to entirely ditch ref-to-raw casts, and encode them all as reborrow-to-raw.

@RalfJung
Member Author

This is actually more subtle than I thought. The thing that I did not know when discovering this problem is that the borrow checker treats *const and *mut differently. The following is safe code:

let x = &mut 0;
let shared = &*x;
let y = x as *const i32; // if we use *mut here instead, this stops compiling
let _val = *shared;

IOW, the borrow checker considers a cast to a const raw pointer as just a read access. Also see rust-lang/unsafe-code-guidelines#106 and Cc @matthewjasper who brought this to my attention.

This means that if we pull through with the plan to make x as *const i32 (where x is a mutable reference) behave like x as *mut i32 as *const i32, that code will stop compiling! Ouch.

So, everybody in rust-lang/unsafe-code-guidelines#106 (including me and @nikomatsakis but also in particular @SimonSapin) seemed to feel that the following code should be allowed (that's from the OP here):

// example 1
let x = &mut 0;
let y: *const i32 = x;
unsafe { *(y as *mut i32) = 1; }

But how do we all feel about this code?

// example 2
let x = &mut 0;
let shared = &*x;
let y: *const i32 = x;
let _val = *shared;
unsafe { *(y as *mut i32) = 1; }

Here, we cannot possibly maintain the idea that *const vs *mut does not matter, because the code stops compiling if we use *mut instead.

Does that mean that we are fine with example 2 being UB? And if it does, does that change our position about example 1?

My opinion

Personally I am less sure now about example 1, and feel maybe it should actually be UB, after all. The reason for this is that it leads to the cleanest model, with fewer cases to consider. I will go into several possible models for a bit here.

Clean and strict

The IMO most clean model (with the usual caveat that we'll not know how clean this ends up being until it got implemented and tested) is that the behavior of cast-to-raw depends only on the type of the raw pointer (mut vs const). We would have a rule that "writing through a raw pointer that was originally created as a *const is UB" (except for UnsafeCell). This is nice because it would replace/subsume the current rule that "writing through a raw pointer created from a shared reference is UB" (UnsafeCell yada yada), since shared references can only be cast to *const -- so it would be very consistent in that regard.

Basically, x as *const T would be the same as &*x as *const T, including all quirks around UnsafeCell.
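
To make the two spellings concrete, here is a small sketch added for illustration (the function name is hypothetical). It only reads through the resulting pointers, which every model discussed in this thread permits, so it does not depend on how the open question is resolved.

```rust
// Sketch: both spellings yield a pointer to the same location.
// Only reads are performed, so no model under discussion rejects this.
fn read_via_both_spellings() -> (bool, i32) {
    let x = &mut 5;
    // Direct cast of the mutable reference to a const raw pointer.
    let a = x as *const i32;
    // Explicit shared reborrow, then cast.
    let b = &*x as *const i32;
    let sum = unsafe { *a + *b };
    (a == b, sum)
}
```

Under the "clean and strict" proposal the two pointers would also carry the same permissions: read-only, except for `UnsafeCell`.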

However, this would rule out the example in the OP, and run counter to the idea (that I repeated a lot) that *const vs *mut should not matter.

Extrapolate from current behavior

Currently, we accept example 1 if it gets changed to let y: *const i32 = x as *mut i32;. Applying the same rewrite to example 2 does not work, the borrow checker rejects that. Stacked Borrows ignores *const vs *mut and just checks whether the location is currently frozen; if it is, then the resulting raw pointer cannot be used for writing. If we want to continue along this line, we do have to consider a &mut to *const cast as a primitive operation instead of being composed of two steps, &mut to *mut and then *mut to *const. So we'd have three reference-to-raw-pointer-casts as three distinct primitive operations:

  • &mut to *mut: Can always be written to (so memory gets unfrozen and a Raw tag pushed, in the model).
  • &mut to *const: Can be written to (except UnsafeCell) only if there are no outstanding shared references. If memory is not frozen, we push a Raw and you can write through this, but if memory is frozen we just allow read access.
  • & to *const: Cannot be written to (except UnsafeCell).

This would allow example 1, disallow example 2, and it would also disallow the following:

// example 3
let x = &mut 0;
let shared = &*x;
let y: *const i32 = x;
unsafe { *(y as *mut i32) = 1; }

shared never gets used again, but we cannot know that when y gets created, so we cannot make y a writable pointer. This might seem strange because here, the rewrite to let y: *const i32 = x as *mut i32; actually works, but that is because the borrow checker "knows the future" and determines that shared does not get used again. We cannot do this in a dynamic semantics such as Stacked Borrows.

Two-phase

@matthewjasper suggested it might be possible to treat cast-to-const as something like a two-phase borrow. I don't think that makes me happy.^^

@matthewjasper
Contributor

@matthewjasper suggested it might be possible to treat cast-to-const as something like a two-phase borrow. I don't think that makes me happy.^^

No, I was saying that a model that accepts both example 1 and 2 would effectively be modelling two-phase borrows. I can't see any way to allow this without complicating the model significantly.

@elichai
Contributor

elichai commented Mar 11, 2020

I'd love to see this change.
The current status is too foot-gunny IMHO.

I really like the current status of Rust that pointers are basically numbers, and they only matter when you dereference them; even then we have no strict aliasing rules etc., just alignment and size (and the data must be defined, which might have implications for padding).

@RalfJung
Member Author

RalfJung commented Mar 11, 2020

I really like the current status of rust that pointers are basically numbers

That is not true at all I am afraid. Even if you are talking about raw pointers only. See this brief introduction to pointer provenance. Also, the fact that pointers have provenance is entirely off-topic here.

"Pointers are just numbers" is plain impossible with LLVM as a backend.

I'd love to see this change.

So I am wondering now what you mean by "this". My assumption of course is that you mean this issue, i.e. specifically casting a mutable reference to a const raw pointer. But then you go on talking about things way broader than that (pointer provenance in general) and entirely out of scope for this issue, so I am confused.

Also the reason this issue is open is that we agree it's a foot-gun, what we are looking for is a good solution.

@elichai
Contributor

elichai commented Mar 11, 2020

That is not true at all I am afraid. Even if you are talking about raw pointers only. See this brief introduction to pointer provenance. Also, the fact that pointers have provenance is entirely off-topic here.

Will read. I have heard that a lot, so it was more of a quote for me, but it is interesting to understand. Thanks.

Also the reason this issue is open is that we agree it's a foot-gun, what we are looking for is a good solution.

Yes, I mostly wanted to say this because I have encountered this UB in the past, mostly in types similar to NonNull that accepted *const T instead of accepting *mut T and casting. You would then call accepts_const_ptr(&mut something); without realizing that the coercion goes through a shared reference.
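
A minimal sketch of the pattern described above, added for illustration. `ConstHandle`, `set`, and `set_through_handle` are hypothetical names, not an API from any real crate. The sketch assumes the caller casts through `*mut` explicitly, which is the spelling this thread reports as currently accepted.

```rust
/// Hypothetical `NonNull`-like wrapper that stores a `*const T`
/// but later writes through it.
struct ConstHandle<T>(*const T);

impl<T> ConstHandle<T> {
    fn new(ptr: *const T) -> Self {
        ConstHandle(ptr)
    }

    /// Safety: the stored pointer must be valid for writes, which in
    /// particular means it must not have been derived from a shared
    /// reference.
    unsafe fn set(&self, value: T) {
        unsafe { (self.0 as *mut T).write(value) }
    }
}

fn set_through_handle() -> i32 {
    let mut something = 0;
    // `ConstHandle::new(&mut something)` would also compile, but that
    // coercion currently goes through a shared reference, so writing
    // through the handle afterwards is the foot-gun discussed here.
    // Casting via `*mut i32` first avoids the intermediate shared reference.
    let handle = ConstHandle::new(&mut something as *mut i32 as *const i32);
    unsafe { handle.set(7) };
    something
}
```

An API that intends to write would arguably be better off taking `*mut T` in the first place, so that callers cannot hit the implicit coercion by accident.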

@RalfJung
Member Author

(and the compiler tries to insert the most permissive borrows it can, when the borrow is inserted as a result of a coercion)

One surprising consequence of this is that x as *const _ and &raw const *x are not the same. Or maybe that's just surprising to me?

@nikomatsakis
Contributor

I agree that is somewhat surprising.

@camelid camelid added the A-raw-pointers Area: raw pointers, MaybeUninit, NonNull label Oct 31, 2021
tmiasko added a commit to tmiasko/rust that referenced this issue Nov 3, 2021
The exact set of permissions granted when forming a raw reference is
currently undecided rust-lang#56604.

To avoid presupposing any particular outcome, adjust the const
qualification to be compatible with decision where raw reference
constructed from `addr_of!` grants mutable access.
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 3, 2021
…li-obk

`addr_of!` grants mutable access, maybe?

The exact set of permissions granted when forming a raw reference is
currently undecided rust-lang#56604.

To avoid presupposing any particular outcome, adjust the const
qualification to be compatible with decision where raw reference
constructed from `addr_of!` grants mutable access.

Additionally, to avoid keeping `MaybeMutBorrowedLocals` in sync with
const qualification, remove it. It's no longer used.

`@rust-lang/wg-const-eval`
@RalfJung
Member Author

@chorman0773 noticed that &mut [0i32] as *const i32 actually generates a &raw mut, while &mut 0i32 as *const i32 generates a &raw const leading to this issue -- that seems odd? I have no idea what happens here during MIR building though.

@joshlf
Contributor

joshlf commented May 12, 2024

I've suggested adding a lint for this: rust-lang/rust-clippy#12791

@briansmith
Contributor

fn direct_mut_to_const_raw() {
    let x = &mut 0;
    let y: *const i32 = x;
    unsafe { *(y as *mut i32) = 1; }
    assert_eq!(*x, 1);
}

The reason for this is that coercing &mut T to *const T implicitly first creates a shared reference and then coerces that to *const T, meaning y in the example above is technically a raw pointer obtained from a shared reference.

We should fix our coercion logic to no longer create this intermediate shared reference.

ptr::from_ref(r) is documented as "This is equivalent to r as *const T" and ptr::from_ref takes a shared reference as its parameter. Thus we are free to write any r as *const T as ptr::from_ref(r). When r is a &mut T, this will coerce r to a shared reference. Consequently, r as *const T must also coerce r to a shared reference. Otherwise, ptr::from_ref would not be equivalent to it.

@RalfJung
Member Author

RalfJung commented Jun 1, 2024 via email

@briansmith
Contributor

ptr::from_ref(r) is documented as "This is equivalent to r as *const T" and ptr::from_ref takes a shared reference as its parameter. Thus we are free to write any r as *const T as ptr::from_ref(r). 
This argument holds only if r has argument type &T.

There are various ways of reading the documentation. One way is to assume that r is qualified by the r: &T argument in the signature of the function. The other is to read the statement independently. Not everybody is going to read it the same way as you. And, more importantly, people have already been refactoring code based on the other reading.

@RalfJung
Member Author

RalfJung commented Jun 2, 2024

Given that the statement contains r, I didn't see any other possible interpretation than to take the domain of quantification of r from the type signature. There are obvious counterexamples otherwise: for r: *mut T, one of the options does not even compile, so clearly they are not equivalent.

Looks like we should clarify the docs then.
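
For illustration (a sketch added here, not taken from the docs): with `r: &T` the two spellings agree, and for a mutable reference `std::ptr::from_mut` followed by `cast_const` gives a `*const T` without passing through a shared reference. This assumes a toolchain where `ptr::from_ref` and `ptr::from_mut` are stable (Rust 1.76 or later); the function names are hypothetical.

```rust
use std::ptr;

// For a shared reference, `ptr::from_ref(r)` and `r as *const T` agree.
fn from_ref_matches_cast() -> bool {
    let v = 3;
    let r: &i32 = &v;
    ptr::from_ref(r) == r as *const i32
}

// For a mutable reference, `ptr::from_mut` yields a `*mut T`, and
// `cast_const` changes only the pointer type.
fn write_via_from_mut() -> i32 {
    let mut v = 0;
    let r = &mut v;
    let p: *const i32 = ptr::from_mut(r).cast_const();
    // `p` was derived from a mutable reference, so casting back and
    // writing is accepted.
    unsafe { p.cast_mut().write(1) };
    v
}
```

Passing a `&mut T` to `ptr::from_ref` compiles too, but the argument is then coerced to `&T` at the call site, which is exactly the case the two readings of the documentation disagree about.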

@briansmith
Contributor

briansmith commented Jun 2, 2024

The reason for this is that coercing &mut T to *const T implicitly first creates a shared reference and then coerces that to *const T, meaning y in the example above is technically a raw pointer obtained from a shared reference.

As I mentioned in #125897, there are multiple interpretations of r as *const T for r: &mut T:

  1. ((r as *mut T) as *const T) using the transitive rule with rule &mut T to *mut T and then rule *mut T to *const T.
  2. ((r as &T) as *const T) using the transitive rule with rule &mut T to &T and then rule &T to *const T.
  3. Deref::deref(r) as *const T using transitive rule with the rule for Deref and then the rule &T to *const T

Maybe the solution here is to precisely define the precedence of the rules.
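
The three routes can be spelled out explicitly, as in the sketch below (added for illustration; the function name is hypothetical). Route 2 is written as `&*r` and route 3 uses a fully qualified `Deref` call on `&r`, which are assumptions made so that the example type-checks unambiguously. All three yield the same address. They differ only in which intermediate reference or pointer is created, and that is the part that matters for aliasing models.

```rust
use std::ops::Deref;

fn three_routes_same_address() -> bool {
    let mut v = 0i32;
    let r = &mut v;
    // 1. `&mut T` to `*mut T`, then `*mut T` to `*const T`.
    let p1 = (r as *mut i32) as *const i32;
    // 2. `&mut T` to `&T` (written as an explicit reborrow), then to `*const T`.
    let p2 = &*r as *const i32;
    // 3. Through `Deref` for `&mut i32`, then `&T` to `*const T`.
    let p3 = <&mut i32 as Deref>::deref(&r) as *const i32;
    p1 == p2 && p2 == p3
}
```

Because the addresses are equal, no test on pointer values can tell the routes apart; only the permissions attached to the resulting pointer differ.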

@RalfJung
Member Author

RalfJung commented Jun 2, 2024

That is indeed exactly the open question that is tracked here.
