Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lift internal assumption is false #14

Closed
noamtashma opened this issue Oct 22, 2022 · 5 comments
Closed

Lift internal assumption is false #14

noamtashma opened this issue Oct 22, 2022 · 5 comments

Comments

@noamtashma
Copy link
Contributor

In lift.rs, in the implementation of the lift function, there is this line:

 debug_assert_ne!(slot as *const _, &root as *const _);

slot, &root presumably can't point to the same location because root owns a value whereas slot is a mutable reference to a value (of the same type), and both of them being the same would violate Rust's aliasing guarantees.

However, this is false in this example:

fn lift_counterexample() {
    GhostToken::new(|mut token| {
        let cell = GhostCell::new(Box::new(7));
        lift_with_mut(cell, &mut token, |cell_ref, token_ref| {
            GhostCell::from_mut(cell_ref.borrow_mut(token_ref))
        });
    })
}

Here, a &GhostCell<T> is safely aliasing a &mut GhostCell<T>. This happens by the chain of &GhostCell<T> -> &mut T -> &mut GhostCell<T>, which can be called in regular code without using the lift function.
To my understanding, this is safe because

  • GhostCell ensures these references cannot be used at the same time, by the use of the token.
  • GhostCell is/contains an UnsafeCell which causes some things to be allowable, both in terms of stacked borrows and in terms of rustc assuming less things.

Therefore, the code of lift is wrong to assume that this can never happen.

@noamtashma
Copy link
Contributor Author

I believe that this is unsound, because then in the replace function, the function will read a moved (logically uninitialized) value from dst, since it has been moved in the src argument of the function.

I didn't manage to cause any concrete unsoundness or to cause miri to catch it.

Here's some speculation on what may actually happen in practice:

Note that It will then also "leak" the src argument into dst. (This is a "leak", since dst actually points to a logically-uninitialized value, which will therefore be ignored in the future).

I'm guessing that's because it will end up copying the logically uninitialized value in dst which will turn out to be the same as src, and then leak the src value, resulting in the net effect of just moving from src.
However, to my understanding that's definitely unsound, even if it's hard to observe.

@matthieu-m
Copy link
Owner

You are definitely correct about the internal assumption, but I do not think there is a soundness issue.

As far as I know there is no notion of "logically" uninitialized value in Rust. The ptr::read or ptr::copy functions explicitly create multiple copies of a value, with a caveat that unless the value is Copy, then only one of the copies must be used (and dropped). This is exactly what happens here: the same value is copied in multiple places, but at the end, only one copy is returned and everything else is forgotten.

In fact, should there be "logically" uninitialized values, the entire lift would be unsound: slot is logically borrowing root, then root is moved (into slot) whilst borrowed, which should mean that slot is pointing into a "logically" uninitialized value.


I see multiple axes of improvement:

  1. The internal assertions should be removed, they do not hold.
  2. Tests should be created to exercise this usecase.
  3. Comments, and in particular safety comments, should be refined. This is not about initialized values, but about valid for read/write pointers.

Thoughts?

@noamtashma
Copy link
Contributor Author

To clarify, by "logically uninitialized" I mean it in the sense that a variable that has been moved out from, still exists as a variable, but behaves like it's uninitialized, since no one can use it. You can see this term in the rustonomicon

Specifically, in this case, root is moved in the replace call:

pub fn lift<F, R>(root: R, fun: F) -> R
where
    F: for<'a> FnOnce(&'a R) -> &'a mut R,
{
    let root = ManuallyDrop::new(root);
    let slot = fun(&root) as *mut R as *mut ManuallyDrop<R>;
    // Assume that they do point to the same point, and that we're in release mode
    debug_assert_ne!(slot as *const _, &root as *const _);

    unsafe {
        replace(slot, /* root is moved here */ root)
        // `slot` now points to a moved-from, (logically) uninitialized place.
    }
}

unsafe fn replace<T>(dest: *mut ManuallyDrop<T>, src: ManuallyDrop<T>) -> T {
    // Now `dest` points to the `lift` function's `root`, which is uninitialized, and different than this functions `src`.
    //  This reads from an uninitialized place
    let result = ptr::read(dest);
    ptr::copy(/* doesn't necessarilly point to the same position as `dest` */ &src as *const _, dest, 1);

    ManuallyDrop::into_inner(result)
}

I believe that rustc would be justified in marking root as uninitialized / llvm poison at the entrance to replace, which would then make subsequently reading from it UB.

However, miri doesn't report UB. Remember though, that miri isn't guaranteed to catch UB.
If the compiler decided to just leave the value of root as is even while it's considered uninitialized, then even though it's UB on an intermediate language level, it might perform as you expect.
Additionally, if you inline the call, you get rid of the problematic move as well:

pub fn lift<F, R>(root: R, fun: F) -> R
where
    F: for<'a> FnOnce(&'a R) -> &'a mut R,
{
    let root = ManuallyDrop::new(root);
    let slot = fun(&root) as *mut R as *mut ManuallyDrop<R>;
    // Assume that they do point to the same point, and that we're in release mode
    debug_assert_ne!(slot as *const _, &root as *const _);

    unsafe {
        let slot_ptr = slot as *const _;
        let result = ptr::read(slot_ptr);
        ptr::copy(/* This now points to the same place as `slot` */ &root as *const _, slot_ptr, 1);
        ManuallyDrop::into_inner(result)
    }
}

In the inlined version, you temporarily get two copies of the same value, which is then immediately forgotten. Since panicking cannot happen during that time, it seems okay.

In fact, I think inlining the call is probably a good way to solve this issue, and I propose to solve it this way.

@matthieu-m
Copy link
Owner

Inlining is indeed a possibility, but it's a code-block that'd have to be repeated 3 times, which makes it unpalatable.

The reason that inlining works is that replace doesn't actually use the fact that src is moved into it, and instead immediately turns it into a pointer; as such, I'll instead switch things around and simply pass src by pointer into replace, thereby not moving in the various lift functions.

matthieu-m added a commit that referenced this issue Oct 28, 2022
* Changes:

- Remove debug assertions that the user-provided function cannot return
  a reference to root in the lift functions.
- Remove useless move of root variable, to avoid muddying waters with
  regard to references to it.
- Add tests for self-lifting.
- Clarify safety comments, moving from initialized to valid, as
  ptr::read and ptr::copy are defined in terms of validity.

* Motivation:

@noamtashma demonstrated in #14 that it was possible -- using GhostCell
or similar -- for the user-provided function to return an alias to the
root, so ensure this usecase is supported correctly.
@matthieu-m
Copy link
Owner

Fixed in v0.6.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants