Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CTFE: there is no way to compute the difference between two ptrs in the same allocation if they might be out-of-bounds #92512

Open
RalfJung opened this issue Jan 3, 2022 · 19 comments
Labels
A-const-eval Area: constant evaluation (mir interpretation) A-raw-pointers Area: raw pointers, MaybeUninit, NonNull T-opsem Relevant to the opsem team

Comments

@RalfJung
Copy link
Member

RalfJung commented Jan 3, 2022

offset_from is the only way to do pointer subtraction during CTFE (since casting pointers to integers is not possible). However, offset_from requires both pointers to be in-bounds of the same allocation, making it unsuited for code that handles possibly out-of-bounds pointers. The need for this arises, for example, in the typical way that array/slice iterators support ZSTs, see e.g. slightlyoutofphase/staticvec#48.

We should provide some way for const code to subtract pointers to the same allocation even if they are not in-bounds. Example code:

const C: isize = {
    let start_ptr = &() as *const ();
    let length = 10;
    let end_ptr = (start_ptr as *const u8).wrapping_add(length) as *const ();
    unsafe { (end_ptr as *const u8).offset_from(start_ptr as *const u8) }
};

Also see this Zulip discussion.
Cc @rust-lang/wg-const-eval

@slightlyoutofphase
Copy link
Contributor

slightlyoutofphase commented Jan 3, 2022

One thing worth noting here again is that you can't actually use regular some_ptr.offset_from(some_other_ptr) on a typed ZST pointer anyways, as it explicitly panics when encountering them. ptr_offset_from(some_ptr, some_other_ptr) on the other hand does not panic, but still only actually works properly if you cast the ZST pointers to *const u8.

I'd also want to point out that the result of the function discussed in the issue you linked is mathematically correct in 100% of "runtime context where T actually is a ZST" cases, as I was careful to verify in tests like this one and various others.

If actually called in a const context specifically with a ZST though, my internal distance_between function already errors at compile time like this:

error[E0080]: evaluation of constant value failed
  --> src/main.rs:11:19
   |
11 |     0 => unsafe { ptr_offset_from(dest as *const u8, origin as *const u8) as usize },
   |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |                   |
   |                   ptr_offset_from cannot compute offset of pointers into different allocations.

Which is fine with me as all I care about with regards to it is just "this is a single function that simultaneously works properly for ZST array element pointers at runtime and also non-ZST array element pointers at compile time". The original implementation (which no longer compiles of course) served that purpose just as well, producing the exact same valid results for ZSTs at runtime:

/// An internal function for calculating pointer offsets as usizes, while accounting
/// directly for possible ZSTs. This is used specifically in the iterator implementations.
#[inline(always)]
pub(crate) const fn distance_between<T>(dest: *const T, origin: *const T) -> usize {
  match size_of::<T>() {
    0 => (dest as usize).wrapping_sub(origin as usize),
    // Safety: this function is used strictly with linear inputs
    // where dest is known to come after origin.
    _ => unsafe { ptr_offset_from(dest, origin) as usize },
  }
}

Edit: actually, I wasn't entirely correct above: my distance_between function is callable without compile time errors on ZST pointers in some very particular const situations, and does return the correct result then as well:

#![feature(core_intrinsics, const_ptr_offset_from, const_ptr_offset)]

use core::intrinsics::*;

#[inline(always)]
pub(crate) const fn distance_between<T>(dest: *const T, origin: *const T) -> usize {
    // Safety: this function is used strictly with linear inputs where `dest` is known to come after
    // `origin`.
    match size_of::<T>() {
        // If `T` is a ZST, we cannot use typed pointers as it would either return an incorrect value
        // or in some cases segfault.
        0 => unsafe { ptr_offset_from(dest as *const u8, origin as *const u8) as usize },
        // For all other sizes of `T`, however typed pointers are just fine.
        _ => unsafe { ptr_offset_from(dest, origin) as usize },
    }
}

struct S;

const X: [S; 4] = [S, S, S, S];
const Y: *const S = X.as_ptr();
// The call to `distance_betweeen` will return 0 *unless* we set this pointer
// up as a u8. Also, const eval fails and aborts the compilation if regular
// `add` is used below instead of `wrapping_add`.
const Z: *const S = (Y as *const u8).wrapping_add(2) as *const S;
const W: usize = distance_between(Z, Y);

fn main() {
    // Prints "2".
    println!("{}", W);
}

Playground link.

@RalfJung
Copy link
Member Author

RalfJung commented Jan 3, 2022

One thing worth noting here again is that you can't actually use regular some_ptr.offset_from(some_other_ptr) on a typed ZST pointer anyways, as it explicitly panics when encountering them. ptr_offset_from(some_ptr, some_other_ptr) on the other hand does not panic, but still only actually works properly if you cast the ZST pointers to *const u8.

The internal intrinsic ptr_offset_from is UB when the pointers point to a ZST.

I'd also want to point out that the result of the function discussed in the issue you linked is mathematically correct in 100% of "runtime context where T actually is a ZST" cases, as I was careful to verify in tests like this one and various others.

No it is not -- it is UB. The UB just happens to give the correct result for current compilers.

actually, I wasn't entirely correct above: my distance_between function is callable without compile time errors on ZST pointers in some very particular const situations, and does return the correct result then as well:

Yes, that's what I would expect: when the two pointers are derived from the same pointer in a provenance-preserving way, CTFE will correctly compute their difference. However, it only does so because it fails to check for UB. Once that UB check is implemented, your last example would also error.

@slightlyoutofphase
Copy link
Contributor

slightlyoutofphase commented Jan 3, 2022

No it is not -- it is UB. The UB just happens to give the correct result for current compilers.

It's a completely logical contexual result, at the very least. My actual code does solely consist of what you called the "provenance-preserving way" when it comes to ZSTs and I would not expect anything else to work.

Don't get me wrong though, it's not like I don't want to do "the right thing" (which I think I was in the first place with my original usize-cast implementation of that function, as it was consistent with the standard non-const-context usize cast approach I continue to use pretty much everywhere else for ZSTs).

What that did for ZST pointers in practice is exactly the same thing as what ptr_offset_from does currently, in any case, so perhaps the solution is to re-allow that sort of casting to usize in const contexts but only for ZST pointers rather than pointers to any sized type?

It doesn't seem like any valid future solution for this overall issue could actually be any different than the current behaviour while still producing desirable results, also.

@RalfJung
Copy link
Member Author

RalfJung commented Jan 3, 2022

perhaps the solution is to re-allow that sort of casting to usize in const contexts but only for ZST pointers rather than pointers to any sized type?

No, that won't work -- then one can cast an arbitrary *const T to usize by going through *const ().

I see two solutions to the problem laid out in the issue description:

  • Weakening the requirements of offset_from to just demand that both pointers are derived from the same pointer (to the same allocation). As you noted, this indeed already matches the current implementation, so this would be a documentation-only change. That's what I suggested on Zulip.
  • Adding a wrapping_offset_from that has the weaker precondition. Though this is probably a bad name since the function would still have to be unsafe. So it's not clear that we really want two variants of this method, and I don't think the extra in-bounds guarantee is useful enough for optimizations that it justifies having two methods.

@slightlyoutofphase
Copy link
Contributor

  • Weakening the requirements of offset_from to just demand that both pointers are derived from the same pointer (to the same allocation).

That seems like the best way to go to me, although I should be clear that my own use case for this does not extend beyond "something that works properly for ZST array element pointers (either directly or indirectly) without panicking".

So it's not clear that we really want two variants of this method, and I don't think the extra in-bounds guarantee is useful enough for optimizations that it justifies having two methods.

It seems to already be as optimized as it could be if you ask me, so you're likely right.

@RalfJung
Copy link
Member Author

RalfJung commented Jan 3, 2022

It seems to already be as optimized as it could be if you ask me, so you're likely right.

FWIW, the optimizations I alluded to would be optimizations of the surrounding code that could exploit the fact that these pointers are in-bounds.

But that, too, seems rather speculative, and is probably not the best way for the programmer to make such a guarantee to the compiler.

@slightlyoutofphase
Copy link
Contributor

I'm not sure there actually even are too many useful scenarios where a person would call ptr_offset_from or offset_from on pointers that were not either to proper sized array elements or at least conceptually in bounds as in the case of ZST array elements, anyways, which probably helps the compiler along with knowing their validity I'd think.

Like, I don't know what you'd expect to do with the value returned from calling those functions on arbitrary unrelated pointers.

Until this all gets figured out though, should I bother changing the function you initially linked to something more like the following? It does the same thing to the extent I really care about, while objectively not being UB (as far as I'm aware):

#![feature(
  const_eval_select,
  core_intrinsics,
  const_ptr_offset,
  const_ptr_offset_from,
  inline_const
)]

use core::hint::unreachable_unchecked;
use core::intrinsics::*;

#[inline(always)]
pub(crate) fn runtime_zst_handler<T>(dest: *const T, origin: *const T) -> usize {
  (dest as usize).wrapping_sub(origin as usize)
}

#[inline(always)]
pub(crate) const fn compiletime_zst_handler<T>(_dest: *const T, _origin: *const T) -> usize {
  match size_of::<T>() {
    0 => panic!("`distance_between` is only currently usefully `const` for non-ZSTs!"),
    _ => unsafe { unreachable_unchecked() },
  }
}

#[inline(always)]
pub(crate) const fn distance_between<T>(dest: *const T, origin: *const T) -> usize {
  // Safety: this function is used strictly with linear inputs where `dest` is known to come after
  // `origin`.
  match size_of::<T>() {
    0 => unsafe {
      const_eval_select((dest, origin), compiletime_zst_handler, runtime_zst_handler)
    },
    _ => unsafe { ptr_offset_from(dest, origin) as usize },
  }
}

struct S;

const X: [S; 4] = [S, S, S, S];
const Y: *const S = X.as_ptr();
const Z: *const S = (Y as *const u8).wrapping_add(2) as *const S;

// The next line does the panic at compile time if uncommented.
// const W: usize = distance_between(Z, Y);

pub fn main() {
  // This one works properly.
  println!("{}", distance_between(Z, Y));
}

@RalfJung
Copy link
Member Author

RalfJung commented Jan 3, 2022

Yeah, that would be the Technically Correct thing to do. I'd recommend to call the offset_from method though, instead of directly calling ptr_offset_from. core_intrinsics is one of those feature gates one should rather avoid -- directly calling intrinsics is not going to ever be stabilized and their API contract, including what is and is not UB, is not stable and not even always properly documented.

  // Safety: this function is used strictly with linear inputs where `dest` is known to come after
  // `origin`.

Note that it does not have to "come after", but both do have to point into the same allocation (the same array, in your case).

@slightlyoutofphase
Copy link
Contributor

slightlyoutofphase commented Jan 3, 2022

I'd recommend to call the offset_from method though, instead of directly calling ptr_offset_from.

Yeah, I think the original reason I was using it is that the method-like offset_from wasn't const at the time, while the intrinsic was. Guess I can just switch it over now, though.

Note that it does not have to "come after", but both do have to point into the same allocation (the same array, in your case).

I think I wrote that comment moreso in reference to the fact I'm casting the isize result to a usize, which is part of what makes it a useful "convenience" function.

@RalfJung
Copy link
Member Author

RalfJung commented Apr 9, 2022

@Gankra since you spoke up around offset_from in #95837, I wonder what your thoughts are on relaxing the requirements of offset_from to just demand that both pointers are derived from the same pointer (to the same allocation), rather than actually requiring them to be in-bounds.

Knowing your stance on wrapping_offset I assume you won't like it, OTOH this closes a fundamental expressiveness gap in const code around subtracting out-of-bounds pointers -- which legitimately happens in slice iterators that support ZST (e.g. here, we currently use addr but that would not work for const). Looking at that code now it seems like we would also have to allow wrapping, or at least allow unsigned_offset_from to actually be used on pointers more than isize::MAX apart.

@Gankra
Copy link
Contributor

Gankra commented Apr 9, 2022

Ah I see, the offsets going wonky is "fine" for miri as long as the provenance is known, because then everything is well-defined offsets? I definitely don't like encouraging wrapping_offset stuff, but I believe this would work even under CHERI, because they specifically guarantee that the address is always right, even if compression scheme falls over.

(The "slice" in the metadata is effectively defined to be relative to the address of the "actual" pointer, so the slice is the thing that goes wonky, but in a way that also self-heals if you go back in-bounds. Unfortunately the "corrupt" bit is sticky, so you lose if you actually load/store with it.)

@RalfJung
Copy link
Member Author

RalfJung commented Apr 9, 2022

Yeah I think for CHERI none of this matters since we're ultimately not doing any memory accesses and producing an integer.

@onestacked
Copy link
Contributor

onestacked commented Feb 25, 2023

I ran into this when trying to implementing const Iteratror for slices.

@RalfJung would it be possible to somehow allow wrapping offsets only for ZST pointers for now?

@RalfJung
Copy link
Member Author

The type of the pointer doesn't really matter, so I don't see how it helps for them to be ZST.

@onestacked
Copy link
Contributor

Well I thought since ZST pointers cant be used to access memory anyways it would not be a problem if they would be considerd invalid if they ever go out of bounds

@RalfJung
Copy link
Member Author

You can (safely) cast them back to a non-ZST pointer though. And anyway this issue here is not about accessing memory to begin with, it's about the offset_from method.

Can you explain in more detail what kind of code you'd like to write?

@onestacked
Copy link
Contributor

I want to implement the is_empty! and len! macros of core::slice::Iter for const contexts.

https://github.com/rust-lang/rust/pull/104491/files#diff-4a1a6c5ac9bcf4c877565c6aa0f6d649a84907b9778c9b7cabebbc314acb8d06

@RalfJung
Copy link
Member Author

All right, that's similar to what prompted this issue in the first place. But the ZST part won't help.

What this code needs is basically to remove this requirement from offset_of

Both the starting and other pointer must be either in bounds or one byte past the end of the same allocated object.

We have to still keep this one

Both pointers must be derived from a pointer to the same object. (See below for an example.)

and we'll have to say something about ptr::invalid(X) and X as *const/mut _ pointers...

@jyn514 jyn514 added A-const-eval Area: constant evaluation (mir interpretation) A-raw-pointers Area: raw pointers, MaybeUninit, NonNull T-opsem Relevant to the opsem team labels Apr 26, 2023
@RalfJung
Copy link
Member Author

RalfJung commented Jul 4, 2023

Looking at the current code in libcore, I think slice iterators can handle ZST without ever doing a pointer subtraction in the ZST case, so I think the only concrete motivation we had for this issue vaporized? Only the non-ZST path uses ptr_sub but that is fine even during CTFE as all pointers are in-bounds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: constant evaluation (mir interpretation) A-raw-pointers Area: raw pointers, MaybeUninit, NonNull T-opsem Relevant to the opsem team
Projects
None yet
5 participants