New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for ptr::offset_from #41079

Open
Amanieu opened this Issue Apr 5, 2017 · 26 comments

Comments

Projects
None yet
8 participants
@Amanieu
Contributor

Amanieu commented Apr 5, 2017

PR: #40943

Adds an offset_to method to calculate the distance between two raw pointers.

List o' stuff:

  • Rename to offset_from and make unsafe (with wrapping_offset_from for the safe one) #41079 (comment) -- done in #49297
    • Should offset_from abort instead of panic on ZSTs?
  • Given that the difference between arbitrary pointers is meaningless and subject to change at the whims of LLVM anyway, do we even want the safe version? #41079 (comment) Experience reports from the field appreciated.
  • Should there be more methods like this with different restrictions (to allow nuw/nsw, perhaps) or that return usize (like how isize-taking offset is more conveniently done with usize-taking add these days)
@scottmcm

This comment has been minimized.

Member

scottmcm commented Jul 21, 2017

I wonder whether this should have the unsafe + wrapping_ split that offset does.

The unsafe one could, for example, use sdiv exact by requiring that the pointers are to the beginning or past-the-end of real objects inside the same allocated object.

@ubsan

This comment has been minimized.

Contributor

ubsan commented Jul 22, 2017

It seems suspicious at best to calculate the offset between two unrelated pointers... I agree with @scottmcm

@joshtriplett

This comment has been minimized.

Member

joshtriplett commented Oct 17, 2017

@pornel made the point that this should be offset_from, with the order of the subtraction reversed, which I agree with. (Or, perhaps, both methods should exist.)

@joshtriplett

This comment has been minimized.

Member

joshtriplett commented Oct 17, 2017

Also, I find it unfortunate that this forces handling the ZST case even if you know the pointer type and know that it doesn't point to a zero-sized type.

Could we use traits to provide a method that only exists for non-zero-sized types, and then returns isize?

@sunfishcode

This comment has been minimized.

Contributor

sunfishcode commented Nov 10, 2017

I like @scottmcm's suggestion to have an unsafe ptr::offset_to that uses sdiv exact. This is the situation sdiv exact was designed for, and it often reduces a 5-instruction sequence down to 2. And, if the unsafe version could be documented to have UB if the pointers don't point into the same array (or one-past-the-end), it would make me less worried about breaking pervasive assumptions that LLVM makes about pointer differences [0]. And it would be sufficient for Vec and related use cases.

I suggest omitting the wrapping_ form though. Even if the implementation is entirely "safe" code, I agree with @ubsan that it's suspicious at best and doesn't seem like the kind of thing that should be encouraged via a convenient standard library function.

[0] For example, GetUnderlyingObject, which is used throughout the optimizer, assumes that getelementptr X, Y aliases a subset of X. This doesn't work if Y can be the difference from X to an independent object.

@iitalics

This comment has been minimized.

iitalics commented Jan 4, 2018

I think that returning None for the zero-sized-type case is unnecessary. It is a very unlikely event, and a majority of the time the user knows they will never get the None case, so unwrapping is just clutter.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Mar 17, 2018

Renaming to offset_from and swapping the subtraction order seems easy enough.

I don’t really understand the unsafe / sdiv exact aspect, though. I’ll leave that decision to someone else.

@Amanieu since you added this, what do you think?

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Mar 17, 2018

I have no objection to renaming it to offset_from.

The unsafe / sdiv exact issue basically boils around one question: what happens if the distance between the two given pointers is not a multiple of sizeof_of::<T>(). There's only 2 ways we can realistically handle this:

  • Make this situation UB (in which case offset_from becomes an unsafe function).
  • Round the result of the division towards zero.

The current implementation rounds towards zero, however I am considering going with the UB route instead. Note that this situation can only happen if one of the pointers is misaligned, so we could just require that both pointers be properly aligned for their type.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Mar 19, 2018

this situation can only happen if one of the pointers is misaligned

Can’t it also happen with aligned pointers, if size_of::<T>() > align_of::<T>()?

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Mar 19, 2018

That's a good point, I'm not sure why I thought of alignment there. You are correct.

@sunfishcode

This comment has been minimized.

Contributor

sunfishcode commented Mar 19, 2018

@SimonSapin That's a good point. C manages to avoid that issue, because in C it's undefined behavior if the pointers aren't both pointing to elements of the same array (6.5.6p9). I don't know whether Rust's offset_from wants a similar rule, though you can get into dangerous territory with pointer aliasing if you access one object using a pointer to another plus the distance between the two.

@ubsan

This comment has been minimized.

Contributor

ubsan commented Mar 20, 2018

@sunfishcode it seems... incredibly suspect to allow offset_from between two pointers from different objects.

@scottmcm scottmcm referenced this issue Mar 23, 2018

Merged

Introduce unsafe offset_from on pointers #49297

3 of 3 tasks complete
@scottmcm

This comment has been minimized.

Member

scottmcm commented Mar 23, 2018

I started on a PR for this at #49297; feedback and suggestions appreciated.

@scottmcm

This comment has been minimized.

Member

scottmcm commented Mar 24, 2018

Amanieu and sunfishcode commented that doing fn(*const _, *const _) -> isize needs a subtraction that doesn't fit with any of the "does not overflow" flags available for LLVM sub (like the GEP rules are specific to it, and not available on a normal add instruction).

As such, should this API change form to something where it can be -> usize, and thus use nuw by requiring an order between the arguments? Such a thing probably shouldn't use the word offset; my strawman proposal is something involving distance (since I'm primed by C++).

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Mar 24, 2018

Keep in mind that std::distance and pointer subtraction in C++ both return a signed value. This is one of the reasons why I proposed making offset_to return a signed value.

LLVM IR generated by Clang is just a sub (without nsw/nuw) and a sdiv exact, so we currently don't lose anything compared to C.

I considered the name distance, however I'm not a big fan of it because the name somewhat implies that the order of parameters doesn't matter and that an absolute value is returned.

bors added a commit that referenced this issue Mar 26, 2018

Auto merge of #49297 - scottmcm:offset-from, r=dtolnay
Introduce unsafe offset_from on pointers

Adds intrinsics::exact_div to take advantage of the unsafe, which reduces the implementation from
```asm
    sub rcx, rdx
    mov rax, rcx
    sar rax, 63
    shr rax, 62
    lea rax, [rax + rcx]
    sar rax, 2
    ret
```
down to
```asm
    sub rcx, rdx
    sar rcx, 2
    mov rax, rcx
    ret
```
(for `*const i32`)

See discussion on the `offset_to` tracking issue #41079

Some open questions
- Would you rather I split the intrinsic PR from the library PR?
- Do we even want the safe version of the API?  #41079 (comment)  I've added some text to its documentation that even if it's not UB, it's useless to use it between pointers into different objects.

and todos
- [x] ~~I need to make a codegen test~~ Done
- [x] ~~Can the subtraction use nsw/nuw?~~ No, it can't #49297 (comment)
- [x] ~~Should there be `usize` variants of this, like there are now `add` and `sub` that you almost always want over `offset`?  For example, I imagine `sub_ptr` that returns `usize` and where it's UB if the distance is negative.~~ Can wait for later; C gives a signed result #41079 (comment), so we might as well, and this existing to go with `offset` makes sense.
@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Mar 28, 2018

In Nightly this is the tracking issue for three different inherent methods of both *const T and *mut T.

pub fn offset_to(self, other: *const T) -> Option<isize> where T: Sized {…}
pub unsafe fn offset_from(self, origin: *const T) -> isize where T: Sized {…}
pub fn wrapping_offset_from(self, origin: *const T) -> isize where T: Sized {…}

(Note: the *mut T methods also take a *const T parameter. I don’t know if this is intentional or if the parameter’s type should be changed to Self.)

The libs team discussed this and it wasn’t clear from this tracking issue or the implementation PR what is the motivation for this feature.

@Amanieu Could you comment on what situations this would be used in, why it should be in the standard library, and why three different methods are needed?

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Mar 28, 2018

My understanding is that offset_from is the "new" form which is intended to replace the old offset_to.

These methods are useful when converting between slices and raw pointers, which can happen when building data structures or with FFI. offset_to is currently used in two places in the standard library:

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Mar 31, 2018

If this is a replacement, should offset_to be removed? It’s also unstable.

@scottmcm

This comment has been minimized.

Member

scottmcm commented Mar 31, 2018

I'll make a PR for that and to move vec&slice to offset_from. (Unless it should be deprecated for a bit first? I don't remember what the rules are for nightly things...)

@scottmcm

This comment has been minimized.

Member

scottmcm commented Apr 1, 2018

PR up at #41079

That did reinforce my feeling that isize, while good for consistency with offset, might not be the best signature, as the only places that use it in core & alloc know which pointer is higher and want usize.

bors added a commit that referenced this issue Apr 12, 2018

Auto merge of #49551 - scottmcm:deprecate-offset_to, r=KodrAus
Deprecate offset_to; switch core&alloc to using offset_from instead

Bonus: might make code than uses `.len()` on slice iterators faster

cc #41079
@ghost

This comment has been minimized.

ghost commented Aug 20, 2018

I tried using the offset_from() method on a pointer today. I like the name and API; it seemed rather intuitive. The only issue I have is that the implementations of offset_from() and wrapping_offset_from() use assert!() rather than debug_assert!(). The result is that offset_from() is a bit slower than it could be when debug = false.

Is there a reason to prefer assert!()?

@scottmcm

This comment has been minimized.

Member

scottmcm commented Aug 20, 2018

@dtrebbien The assert is on the size of the type, which is a compile-time constant, so I would expect it to always get optimized out. Are you seeing otherwise?

@ghost

This comment has been minimized.

ghost commented Aug 20, 2018

It looks like I am seeing a slight difference, but I am not entirely sure.

I am currently working on a piece of code where offset_from() would be used rather frequently. When I run my benchmark using:

let all_code_units_ptr = all_code_units.as_ptr();

//...

let code_unit_index = unsafe { iter.as_slice().as_ptr().offset_from(all_code_units_ptr) } as usize - 1;

//...

let end_code_unit_index = unsafe { slice.as_ptr().offset_from(all_code_units_ptr) } as usize - 1;

.. then my benchmark numbers are:

bench:   6,136,818 ns/iter (+/- 344,438) = 751 MB/s
bench:   6,156,946 ns/iter (+/- 349,123) = 748 MB/s
bench:   6,140,547 ns/iter (+/- 298,704) = 750 MB/s
bench:   6,135,250 ns/iter (+/- 248,410) = 751 MB/s
bench:   6,141,995 ns/iter (+/- 228,005) = 750 MB/s

When I use this "inlined" version instead:

let all_code_units_ptr_as_usize = all_code_units.as_ptr() as usize;

//...

let code_unit_index = unsafe { intrinsics::exact_div(iter.as_slice().as_ptr() as usize - all_code_units_ptr_as_usize, mem::size_of::<CodeUnitT>()) } - 1;

//...

let end_code_unit_index = unsafe { intrinsics::exact_div(slice.as_ptr() as usize - all_code_units_ptr_as_usize, mem::size_of::<CodeUnitT>()) } - 1;

.. then my benchmark numbers are:

bench:   6,120,998 ns/iter (+/- 205,285) = 753 MB/s
bench:   6,108,348 ns/iter (+/- 198,379) = 754 MB/s
bench:   6,109,708 ns/iter (+/- 223,981) = 754 MB/s
bench:   6,113,860 ns/iter (+/- 230,122) = 754 MB/s
bench:   6,123,206 ns/iter (+/- 221,520) = 752 MB/s

Here, CodeUnitT is u8.

@scottmcm

This comment has been minimized.

Member

scottmcm commented Aug 21, 2018

Now that I'm not on my phone, I tried this out explicitly in playground.

pub unsafe fn demo(x: *const u8, y: *const u8) -> isize {
    x.offset_from(y)
}
; playground::demo
; Function Attrs: norecurse nounwind readnone uwtable
define i64 @_ZN10playground4demo17h3547ff97a5dad82dE(i8* %x, i8* %y) unnamed_addr #0 {
start:
  %0 = ptrtoint i8* %x to i64
  %1 = ptrtoint i8* %y to i64
  %2 = sub i64 %0, %1
  ret i64 %2
}

Which, as expected, has no sign of the assert. So while there may be some other issue here, it's more complex than just the debug_assert! vs assert!.

Do you have a bench you can share? Are you building with incremental? With LTO? With multiple codegen units? Do you see something different with a locally-built nightly without the assert!?

@ghost

This comment has been minimized.

ghost commented Aug 21, 2018

Thanks for looking into this. I can see that you are right; there isn't an effect of debug_assert!() vs. assert!(). There must be something else at play.

My benchmark is not public at the moment, but I am working to try to clean up everything so that I can eventually publish it.

@Amanieu Amanieu changed the title from Tracking issue for ptr::offset_to to Tracking issue for ptr::offset_from Oct 23, 2018

@Amanieu

This comment has been minimized.

Contributor

Amanieu commented Oct 25, 2018

Here are my thoughts on the remaining issues:

  • Regarding usize/isize, I think that we can defer this concern: regardless of whether we add a usize API later, having a function returning isize will always be useful, and match the behavior of pointer subtraction in C.
  • Regarding wrapping_offset_from, I personally feel that there isn't much point to this function. The only thing that I can think of could be to calculate the position of a &T in a slice without using any unsafe code, but then again this is sufficiently niche that I don't feel it is worth adding a function just for that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment