Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What about: use-after-move and (maybe) use-after-drop #188

Open
RalfJung opened this issue May 31, 2019 · 40 comments
Open

What about: use-after-move and (maybe) use-after-drop #188

RalfJung opened this issue May 31, 2019 · 40 comments
Labels
A-memory Topic: Related to memory accesses A-uninitialized Topic: Related to uninitialized memory C-open-question Category: An open question that we should revisit S-pending-design Status: Resolving this issue requires addressing some open design questions

Comments

@RalfJung
Copy link
Member

RalfJung commented May 31, 2019

Miri currently considers Copy and Move the same operation. There are some proposals in particular by @eddyb to make them not the same: it might be beneficial for optimizations to be able to rely on the data not being read again after a Move. This would correspond to replacing the data by Undef. It seems reasonable to do similar things for drop -- though that requires a precise definition of what is actually considered a drop (does that include just the Drop terminator or also calls to drop_in_place?).

(Miri concern: One reason why I am a bit hesitant about this is that this makes read have a side-effect, so we'd have to make all read methods in Miri take the interpreter context by &mut self. Reads already have a side-effect in Stacked Borrows but that is fairly local and we just use a RefCell there.)

@eddyb what would be such optimizations that benefit from this, where StorageDead is not happening so we need to rely on Move?

Potential blockers that would prevent deinit-on-move:

Reasons to do deinit-on-move:

Testcases:

@eddyb
Copy link
Member

eddyb commented Jun 1, 2019

I was actually betting on having a Move invalidate borrows, I'm not sure it actually helps optimizations to consider the value as undef.

But that doesn't appear to be happening, so this would at most be a correctness check.

As for reads becoming &mut - isn't Operand relatively high-level? Or is the problem that the actual read is performed way after evaluating the Operand into its miri form?

@RalfJung
Copy link
Member Author

RalfJung commented Jun 2, 2019

I was actually betting on having a Move invalidate borrows, I'm not sure it actually helps optimizations to consider the value as undef.

Oh, interesting. I suppose we could treat a Move as a write to the source. A write to a local invalidates all pointers to that local.

As for reads becoming &mut - isn't Operand relatively high-level? Or is the problem that the actual read is performed way after evaluating the Operand into its miri form?

Not sure what you mean. The actual read is performed by methods in operand.rs that take an interpret::Operand and &self.

We'd have to preserve on an interpret::Operand whether this was a "copy" or "move" operand, and then still change all those methods to &mut self to do the writing. I was so happy when we were able to make them all &self.^^

@RalfJung
Copy link
Member Author

RalfJung commented Jun 2, 2019

If we make it just act like a write for Stacked Borrows, but not actually change memory, that would work with &self though. Seems like a rather hacky solution though, I'd rather not pollute the spec with such concerns.

StorageDead has the effect you want. But I guess there are examples where that is not appropriate?

@oli-obk
Copy link

oli-obk commented Jul 11, 2019

We could run a second visitor after the evaluation of a statement whose sole purpose is to find Operand::Moves and clear the moved from memory.

We can't use StorageDead for partial moves

@RalfJung
Copy link
Member Author

It would be hard to guarantee that evaluating the statement did not end up changing what those Operand::Move evaluate to. MIR is not pure enough for this.

For partial moves, how would that ever have the effect of invalidating pointers? The neighboring fields have to stay in place after all!

@RalfJung
Copy link
Member Author

To mirror what I said on Zulip: I would like to see an example of an optimization that this enables. Since opening this issue, I've come around to realize that the examples where I thought it would help, are actually not valid.

@RalfJung RalfJung transferred this issue from rust-lang/miri Aug 2, 2019
@RalfJung
Copy link
Member Author

RalfJung commented Aug 2, 2019

I moved this to the UCG repo because really this is a question of what the semantics should be, not how to implement it in Miri.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 2, 2019

Jim Blandy asked via email if similar things to what has been proposed for Move, apply to ptr::read: should that leave the memory from which things came as Undef? I think for Copy types the answer is definitely "no", but other types, if we decide to make Move "kill" the memory it came from, we might or might not want to do the same for ptr::read.

@RalfJung RalfJung changed the title Properly check for use-after-move and (maybe) use-after-drop What about: use-after-move and (maybe) use-after-drop Aug 2, 2019
@RalfJung RalfJung added the C-open-question Category: An open question that we should revisit label Aug 2, 2019
@RalfJung RalfJung added A-memory Topic: Related to memory accesses A-uninitialized Topic: Related to uninitialized memory labels Aug 14, 2019
m-ou-se added a commit to m-ou-se/rust that referenced this issue Dec 30, 2020
…r=m-ou-se

Do not create dangling &T in Weak<T>::drop

Since at this point all strong pointers have been dropped, the wrapped `T` has also been dropped. As such, creating a `&T` to the dropped place is negligent at best (language UB at worst). Since we have `Layout::for_value_raw` now, use that instead of `Layout::for_value` to avoid creating the `&T`.

This does have implications for custom (potentially thin) DSTs, though much less severe than those discussed in rust-lang#80407. Specifically, one of two things has to be true:

- It has to be possible to use a `*const T` to a dropped (potentially custom, potentially thin) unsized tailed object to determine the layout (size/align) of the object. This is what is currently implemented (though with `&T` instead of `&T`). The validity of reading some location after it has been dropped is an open question IIUC (rust-lang/unsafe-code-guidelines#188) (except when the whole type is `Copy`, per `drop_in_place`'s docs).
  In this design, custom DSTs would get a `*mut T` and use that to return layout, and must be able to do so while in the "zombie" (post-drop, pre-free) state.
- `RcBox`/`ArcInner` compute and store layout eagerly, so that they don't have to ask the type for its layout after dropping it.

Importantly, this is already true today, as you can construct `Rc<DST>`, create a `Weak<DST>`, and drop the `Rc` before the `Weak`. This PR is a strict improvement over the status quo, and the above question about potentially thin DSTs will need to be resolved by any custom DST proposal.
@RalfJung
Copy link
Member Author

RalfJung commented Jan 8, 2021

Regarding use-after-drop: as @CAD97 pointed out, rc::Weak is making the assumption that size_of_val_raw can be called on some value after it has been drop_in_placed. If we ever have thin DSTs with a vtable in the pointee, that would entail reading the vtable pointer after the object has been dropped.

@RalfJung
Copy link
Member Author

See rust-lang/rust#71117 (comment) for a possible reason why we might need moves to deinit their source.

@RalfJung
Copy link
Member Author

On the other hand, it looks like we actually need to be able to read enum discriminants after partial moves out of an enum (rust-lang/rust#91029, rust-lang/rust#89764 (comment)). That is a problem if that move de-inited the part of the enum where the discriminant is stored -- can that happen with niche optimizations?

@Lokathor
Copy link
Contributor

An example would probably be Option<Vec>. Vec is a ptr/len/capacity but the ptr is always a non-null so that makes a niche where the None part of Option<Vec<T>> goes. I'm not sure if that entirely fits the scenario you were imagining.

@RalfJung
Copy link
Member Author

Well, the motivation for 'de-init on move' came up in rust-lang/rust#71117 (comment), so the question is if that would ever be in conflict with something like Option<Vec>.

@Lokathor
Copy link
Contributor

Ah, i think that part of it is a little too theoretical for me.

(though "some kind of aliasing rules" like you suggest at the end of your comment is often how I've imagined it to work)

@arielb1
Copy link

arielb1 commented Jan 5, 2023

If rustc performance wasn't an issue, I would lower
{ b }
into

_temp = b
b = poison

And as for address inequality, I'm for that if a local is currently deinitialized (i.e. trying to borrow it would give you a "use of moved value" error), then it is allowed for other locals to have the same address as it. Of course, if you are in a part of the compiler that does not track "use of move value", you have to assume that locals are not deinitialized.

Again, this is only for HAIR moves. The function ptr::read is ptr::read, not ptr::read_and_poison.

If I understand things correctly, stacked borrows should make writing poison over a local should kill all pointers that people managed to get to it and make it not-live.

And then we can remove the b = poison whenever we want since it only makes code more defined.

Since that might have a deleterious effect on rustc performance, from an implementation perspective it might be better to regard "early MIR" as having a _temp = move b always followed by b = poison - for example, have it as a flag in MIR.

@afetisov
Copy link

afetisov commented Jan 5, 2023

With regards to destructive move, that's the semantics of the surface language, so personally I would expect it to also happen at the MIR level. Is there a reason not to implement it, other than implementation-level difficulties?

With regards to ptr::read, its safety contract doesn't forbid multiple reads, so deinitializing the place seems unwarranted, and likely to break unsafe code. We must make sure that the destructor runs only once, but there is nothing preventing us from doing multiple reads and forgetting them all. In fact, the documentation explicitly states that

Reads the value from src without moving it. This leaves the memory in src unchanged.

which implies that the memory is left initialized, even for !Copy types, and can be read several times.

In fact, here is an example from alloc::collections::btree which does a similar thing, reading the same location twice.

@RalfJung
Copy link
Member Author

RalfJung commented Jan 6, 2023

Is there a reason not to implement it, other than implementation-level difficulties?

Yes. It makes MIR a lot harder to analyze, since the order in which operands are evaluated suddenly matters.

Also IMO this is the wrong question: when it comes to adding UB, we should always be asking whether there is a good reason to have this be UB, and make it not UB otherwise.

@afetisov
Copy link

afetisov commented Jan 6, 2023

Whether it is technically UB or not doesn't matter. The basic rules of Rust are that moves are destructive, so the old location must never be used again. Even if it's not UB, reading the moved-from place would still be a major logical error, because that moved object can have all kinds of external resources attached or reasons to be non-cloneable. I expect tools like Miri to help in finding double drops, regardless whether it is UB in the technical sense which could affect generated code.

It makes MIR a lot harder to analyze, since the order in which operands are evaluated suddenly matters.

That may be true, but since moves must be destructive, you should still encode the invalidation of old places in some way, If the question is "do we invalidate on move, or do we encode it is a separate operation", then I have no opinion which implementation strategy is better, but if the question is "do we keep moved-from places initialized and usable", then it violates the expected semantics of moves and drops.

@RalfJung
Copy link
Member Author

RalfJung commented Jan 6, 2023

This repository, and Miri, are all about whether code is "technically" UB or not. That is literally the only question that matters here.

since moves must be destructive

You are presupposing that we have established this as consensus. We have not. That's why this issue is open.

@zeegomo
Copy link

zeegomo commented Jan 24, 2023

Looks like this discussion mostly revolves around moves, while maybe the thing for drops is a little bit easier (and some recent proposals try to decouple those).
For types with destructor it seems to me that reading the value again should be undefined behavior (for example because the drop implementation violates the, possibly custom, guarantees of the type, e.g. by releasing the backing memory of a Vec).

Not sure if it's helpful, but I have this weird idea that a drop (both TerminatorKind::drop and drop_in_place, as they both execute the destructor) might turn the type of the dropped value from T to a special Dropped([u8; size_of::<T>()]).
This would release all type invariants on the value, while still preserving the raw bytes and size of the type (not sure there's a way for the destructor to mess with that)

@JakobDegen
Copy link
Contributor

@zeegomo I don't see why that means you need any amount of UB - using a value of an unknown type after it has been dropped is clearly unsound, there's nothing that making it UB gets you.

@zeegomo
Copy link

zeegomo commented Jan 24, 2023

Not sure what you mean by unknown type.
The caller of drop_in_place might call it on a vec and then try to access it again. That subsequent use of vec should be UB, as should a use of any type with a destructor.

Have I misunderstood something?

@CAD97
Copy link

CAD97 commented Jan 25, 2023

The difference between library UB and language UB. Using a value after it's been dropped can obviously cause UB. But that's library UB, the same as using ptr::copy to duplicate a value and using both can cause UB.

Library UB is when you've misused an unsafe API and now the use of safe functions could cause language UB (or using a different version of the library, or of course immediately, etc); you've violated an API requirement and the library no longer guarantees freedom from language UB.

Library UB doesn't mean that the program execution is necessarily undefined, though. As a simple example, consider that a possible implementation of any unsafe new_unchecked is to just call the safe new, which obviously has defined behavior. Language UB is an absolute terror; the result is mathematically undefined, the compiler assumes that it never happens, and as such if UB actually does happen then retroactively all guarantees about your program are voided and your debugger has been lying to you since the beginning.

By saying that reading memory after it's been dropped is UB, you're saying that the example program of

let p: *mut Box<i32> = Box::into_raw(Box::new(Box::new(0)));
drop_in_place(p);
let x: *mut i32 = *(p as *mut *mut i32);
dbg!(x);

should be UB and that there is no incorrect compilation of this program. The semantically very similar

let p: *mut Box<i32> = Box::into_raw(Box::new(Box::new(0)));
drop(ptr::copy(p));
let x: *mut i32 = *(p as *mut *mut i32);
dbg!(x);

is unambiguously a defined program.

The extra fun version is that by saying reading the memory is UB, that's an even stronger requirement than saying the memory is uninitialized. Abstract bytes now have a 258th state (still ignoring provenance) where the mere act of reading them even as MaybeUninit<u8> is UB.

Personally, my current leaning is that running drop glue on a place (via drop_in_place or otherwise) should be defined to leave the bytes in the state that the Drop::drop implementation(s) leave(s) them. This is potentially necessary for Rc<T> for certain new kinds of dynamically sized T. (Rc's current implementation relies on the fact that it's sufficient for validity to call Layout::for_value_raw on a pointer to dropped value.

I'm also of the personal opinion that a move should be properly destructive and uninitialize the memory being moved from. (Probably in separate typed copy and uninitialize steps at the opsem level, but imo ideally in a way such that the two places are allowed to have the same address if they're both stack allocated objects.)

@zeegomo
Copy link

zeegomo commented Jan 25, 2023

By saying that reading memory after it's been dropped is UB,

I might have used the wrong terminology, but that's not what I meant to say. The destructor might leave some fixed byte string in place of the value, but that fixed value has no special meaning besides being a byte string (i.e. is the destructor allowed to leave the value in a state that violates type guarantees?). Using that value again as a value of any type T requires great care and possibly cause UB, as would transmute::<[u8], T>()

@digama0
Copy link

digama0 commented Jan 25, 2023

The destructor might leave some fixed byte string in place of the value, but that fixed value has no special meaning besides being a byte string (i.e. is the destructor allowed to leave the value in a state that violates type guarantees?).

This is the case for any unsafe function which can obtain a pointer to the allocation. It is not language UB to write 2 to a stack variable of type bool, it is only UB if you perform a typed copy of that invalid value at the original type, for example by returning it from a function or reading the variable after it was trashed. If you just let it go out of scope then this is DB (since bool doesn't have a destructor).

@nbdd0121
Copy link

nbdd0121 commented Jun 1, 2023

I recently run into this being considered !Send even with -Zdrop-tracking-mir

struct B();
impl !Send for B {}

fn do_sth(_: &B) {}

async fn test() {
    let b = B();
    do_sth(&b);
    drop(b);
    yield_now().await;
}

which lead me to rust-lang/rust#94067. I saw a PR rust-lang/rust#110420 which was rejected because @JakobDegen points out that it's undecided whether use-after-move is allowed (this issue).

In the case above, because b was borrowed, the storage of b is still alive at the await point, and thus B: !Send implies test(): !Send. I feel that this example gives a fairly strong motivation?

@RalfJung
Copy link
Member Author

RalfJung commented Jun 2, 2023

I think there is a somewhat separate question here. Even if b is guaranteed to have storage until it goes out of scope, should its 'type' really matter after it has been dropped? Arguably it is just raw storage at this point, and could be considered Send.

Note that this is not even primarily a question of motivation. Having storage disappear earlier is hard, I haven't even seen a proposal that actually achieves this. We'd need to emit suitable MIR statements indicating that b is deallocated, so that Miri can flag appropriate UB. It is certainly not sufficient to just change the generator lowering, we need the following code to be UB (flagged by Miri):

struct B(i32);

fn main() {
    let mut b = B(0);
    let ptr = std::ptr::addr_of_mut!(b);
    drop(b);
    unsafe { ptr.write(B(1)); }
}

@saethlin
Copy link
Member

saethlin commented Jun 2, 2023

That sounds to me like tightening up the way we handle StorageLive/StorageDead. Or just revisiting that whole system. Selfishly, I would really like to see that area of MIR semantics revisited because it is incredibly hard to work with in MIR optimizations.

@nbdd0121
Copy link

nbdd0121 commented Jun 2, 2023

I think there is a somewhat separate question here. Even if b is guaranteed to have storage until it goes out of scope, should its 'type' really matter after it has been dropped? Arguably it is just raw storage at this point, and could be considered Send.

Yeah that could be one way to tackle the Send problem. However we still need to make b's storage dead in order to reduce the size of the generator.

@RalfJung
Copy link
Member Author

I created a new issue specifically for "what is the semantics of move (in current codegen, and beyond)": #416.

@JakobDegen JakobDegen added the S-pending-design Status: Resolving this issue requires addressing some open design questions label Aug 1, 2023
@JakobDegen
Copy link
Contributor

Visiting for backlog bonanza. This may need deduplication with some other open issues, but leaving open as the canonical one

@RalfJung
Copy link
Member Author

RalfJung commented Aug 2, 2023

Above and elsewhere, I said things like

making a move count like a Stacked Borrows write, i.e., invalidating other pointers

I actually today realized that while this is true for locals, it is not true for places inside a Box or &mut. Consider:

mod other {
    /// Some private memory to store stuff in.
    static mut S: *mut i32 = 0 as *mut i32;

    pub fn lib1(x: & &mut i32) { unsafe {
        S = (x as *const &mut i32).cast::<*mut i32>().read();
    } }
    
    pub fn lib2() { unsafe {
        *S = 1337;
    } }
}

fn main() {
    let x = &mut 0;
    other::lib1(&x);
    *x = 42; // a write to x -- invalidates other pointers?
    other::lib2();
    assert_eq!(*x, 1337); // oops, the value changed
}

Giving away &x lets the other module take a copy of the tag of x, which can later be used.

So at least for move(*place), we don't get a guarantee that all other pointers have been invalidated.

And with Tree Borrows, even for move(local), we don't get such a guarantee. This is because Tree Borrows does not retag on addr_of_mut!(local). The fact that Stacked Borrows retags here causes tons of problems so I strongly prefer the Tree Borrows semantics.


I also realized I didn't reply to this part by @arielb1

And as for address inequality, I'm for that if a local is currently deinitialized (i.e. trying to borrow it would give you a "use of moved value" error), then it is allowed for other locals to have the same address as it.

If a local has storage then it must have a distinct address, even if that storage is uninitialized.

The way to achieve potential equal addresses is to remove its storage. We already have a way to express that: StorageDead. I'm all for having HIR-to-MIR lowering move StorageDead earlier, and we need to carefully specify how early. But having a move be StorageDead makes little sense since move works on arbitrary places, such as local.field, and those cannot in general be deallocated.

So basically, replace the b = poison in your examples with StorageDead(b) and I agree. My position is still that any kind of use-after-move/drop UB should be fully explicitly encoded by deallocating the backing memory.

Now, you bring up rustc performance. Given that we have explicit StorageDead for many locals already, I wouldn't expect that moving that to earlier (namely when the move occurs, instead of when the local goes out of scope) would make MIR so much bigger.

bors added a commit to rust-lang-ci/rust that referenced this issue Oct 4, 2023
Stop considering moved-out locals when computing auto traits for generators

Addresses rust-lang#94067 (but does not fix it since drop-tracking-mir is unstable)

This PR, unlike rust-lang#110420 or rust-lang#112050, does not attempt to reduce the number of live locals across suspension points. It however ignores moved-out locals for trait purposes. So this PR solves the non-send issue, but not the size issue.

Suggested by `@RalfJung` in [rust-lang/unsafe-code-guidelines#188](rust-lang/unsafe-code-guidelines#188 (comment))

cc `@b-naber` who's working on this from a different perspective.

r? `@cjgillot`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-memory Topic: Related to memory accesses A-uninitialized Topic: Related to uninitialized memory C-open-question Category: An open question that we should revisit S-pending-design Status: Resolving this issue requires addressing some open design questions
Projects
None yet
Development

No branches or pull requests