RFC: Subslice-offset - Get the offset of references into a slice. #2796

Open
wants to merge 1 commit into
base: master
from


@m-ou-se

m-ou-se commented Oct 28, 2019

Rendered

(This idea was mentioned in the 'Future possibilities' section of #2791)


## Raw Pointers

By requiring a `&T` or a `&[T]` in these functions, we needlessly restrict them

@rkruppe

rkruppe Oct 28, 2019

Member

This restriction to dereferenceable pointers side-steps potential problems with pointer comparisons: compiler optimizations can change the results of pointer comparisons to be different from the order of the runtime addresses when the pointers are dangling. See for example section 4.6 of this paper.

It is not immediately apparent to what degree LLVM performs such optimizations (especially on < and >, as opposed to ==), but it seems risky to assume that it never will. LLVM docs say icmp on pointers works "as if [the pointers] were integers" but this is imprecise and apparently contradicted by the fact that some comparisons which can be true at runtime under -O0 are folded to false at -O2.

cc @rust-lang/wg-unsafe-code-guidelines

@m-ou-se

m-ou-se Oct 28, 2019

Author

Thanks for the feedback!

I mention something similar briefly, but it's a bit hidden at the end of 'option 4':

> The downside is that we lose the guarantees of a `&[T]`, and can no longer make assumptions such as `start <= end` or about the maximum size of a slice (which is needed to safely use `pointer::offset_from`).

It's part of the reason why I'm proposing the versions with references, and only mention the pointers as alternatives. I should probably mention this more clearly right at the start of the raw pointers section.

@rkruppe

rkruppe Oct 28, 2019

Member

I did miss that part, thanks for pointing it out.

Those problems you list seem of a different nature, though. Not knowing that start <= end and that the slice spans no more than isize::MAX bytes requires a different, slightly less efficient implementation (e.g., some more comparisons). In contrast, it is not clear to me whether the problem of non-deterministic pointer comparison can be circumvented at all, since e.g. ptr->int casts do not currently stop the LLVM optimization I showcased earlier. That may have to change anyway to make LLVM's memory model consistent, but at present I don't have confidence the raw pointer version can be implemented correctly, at least according to the naive specification of comparing the pointer addresses numerically.

@RalfJung

RalfJung Nov 3, 2019

Member

@rkruppe unfortunately I don't think restricting to dereferenceable pointers entirely side-steps the problems here when ZSTs are involved. ZST pointers are nominally "dangling" as far as LLVM is concerned (or at least, they might be). But then, as @HeroicKatora points out, there are larger problems with ZSTs.

@RalfJung

RalfJung Nov 3, 2019

Member

> That may have to change anyway to make LLVM's memory model consistent, but at present I don't have confidence the raw pointer version can be implemented correctly, at least according to the naive specification of comparing the pointer addresses numerically.

Indeed LLVM is currently in a sad state where it is not possible to reliably compare the integer addresses of two pointers. Casting to integers first and then comparing should work but LLVM incorrectly "optimizes" that to comparing at pointer type instead. We could conceivably hack around that by "obfuscating" the comparison enough, which however would likely be catastrophic for performance. Also see these LLVM bugs: [1], [2].

@jeekobu


jeekobu commented Oct 30, 2019

I feel these would be useful. I've implemented the following less-generic analog before and was surprised it didn't exist.

fn offset_within(parent: &str, child: &str) -> Option<usize> { ... }
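For illustration only, a helper with this signature could be written with plain address arithmetic. This is a sketch and not the code referred to above, whose body is elided; the approach (casting both start pointers to `usize` and bounds-checking the result) is an assumption. It uses no `unsafe`, so a mismatch yields `None` rather than undefined behavior, although an empty `child` that happens to sit directly next to `parent` in memory may still be reported as contained.

```rust
/// Hypothetical sketch: byte offset of `child` inside `parent`, found by
/// comparing addresses rather than contents.
pub fn offset_within(parent: &str, child: &str) -> Option<usize> {
    // Work on integer addresses so that no pointer arithmetic rules apply.
    let parent_start = parent.as_ptr() as usize;
    let child_start = child.as_ptr() as usize;

    // `checked_sub` rejects a child that starts before the parent.
    let offset = child_start.checked_sub(parent_start)?;

    // The child must also end inside the parent.
    let end = offset.checked_add(child.len())?;
    if end <= parent.len() {
        Some(offset)
    } else {
        None
    }
}
```

With `let s = "hello world";`, calling `offset_within(s, &s[6..])` yields `Some(6)`, while a non-empty string from a separate allocation yields `None`.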

HeroicKatora left a comment

The feature might be confusing for slices of ZSTs. In particular, all elements are virtually indistinguishable by their pointers alone, as they have the same address. The only useful return values of index_of seem to be Some(0) or None.
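A small sketch of why this happens: elements of a zero-sized type have a stride of zero bytes, so every element of the slice sits at the base address. The function name below is hypothetical and exists only for this illustration.

```rust
use std::mem;

/// Hypothetical demo: the first and last elements of an array of `()`
/// have the same address, so a pointer alone cannot identify an index.
pub fn zst_elements_share_address() -> bool {
    let units = [(); 4];
    let first: *const () = &units[0];
    let last: *const () = &units[3];

    // size_of::<()>() is 0, so element i lives at base + i * 0 = base.
    mem::size_of::<()>() == 0 && first == last
}
```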

```rust
impl<T> [T] {
    pub fn range_of(&self, subslice: &[T]) -> Option<Range<usize>>
}
```

@HeroicKatora

HeroicKatora Oct 31, 2019

The implementation may be more tricky for range_of in the ZST case. One should be able to expect that if range_of(slice).is_some() then the returned range has the same length as the subslice given. Finding the index of the first and last element of the subslice each with index_of will return a wrong result from this point of view.

let element = element as *const _;
let range = self.as_ptr_range();
if range.contains(&element) {
unsafe { Some(element.offset_from(range.start) as usize) }

@HeroicKatora

HeroicKatora Oct 31, 2019

This implementation panics for ZSTs (but would otherwise be unsound anyways). The requirements of offset_from state:

> Both the starting and other pointer must be either in bounds or one byte past the end of the same allocated object. Note that in Rust, every (stack-allocated) variable is considered a separate allocated object.
>
> [...]
>
> Panics
>
> This function panics if T is a Zero-Sized Type ("ZST").

References to ZSTs are always dangling pointers, and thus offset_from is never valid for them, as they never point into any allocation. The non-ZST case could be improved with an explicit comment detailing soundness justifications.
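One way to avoid both the ZST panic and the preconditions of offset_from is sketched below. It is not the RFC's implementation: it swaps offset_from for integer address arithmetic, is written as a free function rather than a method, and returns `None` for every ZST, which is only one possible policy. All of these are assumptions made for illustration.

```rust
use std::mem;

/// Hypothetical sketch of `index_of` without `unsafe`.
pub fn index_of<T>(slice: &[T], element: &T) -> Option<usize> {
    let size = mem::size_of::<T>();
    if size == 0 {
        // Assumed policy: ZST elements all share one address, so no index
        // can be recovered from the pointer.
        return None;
    }

    let start = slice.as_ptr() as usize;
    let addr = element as *const T as usize;

    // Reject elements located before the start of the slice.
    let byte_offset = addr.checked_sub(start)?;

    // Reject references that point into the middle of an element, which
    // is possible when size_of::<T>() is larger than the alignment of T.
    if byte_offset % size != 0 {
        return None;
    }

    let index = byte_offset / size;
    if index < slice.len() {
        Some(index)
    } else {
        None
    }
}
```

Because both arguments are live references, the addresses compared here belong to valid objects; the sketch makes no claim about dangling raw pointers.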

Also note that even with a special-case implementation, slices of ZSTs will return Some(0) in many more cases than the caller might realize, sometimes depending on optimization levels.

let not_an_allocation = vec![(); 16];
let some_new = &();
// Entirely indeterminate: may or may not hold, depending on the compiler
assert!(not_an_allocation.index_of(&some_new).is_some());

pub fn range_of(&self, subslice: &[T]) -> Option<Range<usize>> {
let range = self.as_ptr_range();
let subrange = subslice.as_ptr_range();
if subrange.start >= range.start && subrange.end <= range.end {

@HeroicKatora

HeroicKatora Oct 31, 2019

Again, this does not consider the ZST case. As outlined above, pointers may compare equal even though they logically refer to different regions in the slice. A sensible choice if the operation should succeed could be to assume a subslice starting at index 0 and having a length of subslice.len().

This shows that the implementation is entirely non-trivial. This is a major motivation for having such an interface in std, in my opinion.

@HeroicKatora

HeroicKatora Oct 31, 2019

Also, it is unsound in general.

Consider that the subslice may be empty. In that case, the start and end pointer can fulfil the pointer comparison but not in fact be part of the same allocation. Note that this may be the case even for standard/non-ZST types. Then the offset_from calls are UB and do not panic. The crucial difference is that unlike the single element case even the start pointer may be a one-past-the-end pointer.

@RalfJung

RalfJung Nov 3, 2019

Member

Agreed -- if subslice is empty this should short-circuit and return None (or maybe Some(0..0)?).
Otherwise it should use index_of(subrange.start) and then add subslice.len() to get the end index.
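A rough sketch of that shape, under stated assumptions: it is a free function rather than a method, it picks `None` (not `Some(0..0)`) for an empty subslice, it also returns `None` for ZSTs, and it locates the first element with integer address arithmetic instead of offset_from. None of these choices is settled by the discussion above.

```rust
use std::mem;
use std::ops::Range;

/// Hypothetical sketch of `range_of`: short-circuit on empty input, find
/// the index of the first element, then add the subslice length.
pub fn range_of<T>(slice: &[T], subslice: &[T]) -> Option<Range<usize>> {
    let size = mem::size_of::<T>();
    // Assumed policy: an empty subslice or a ZST carries no usable
    // position, so give up early.
    if size == 0 || subslice.is_empty() {
        return None;
    }

    let start = slice.as_ptr() as usize;
    let sub_start = subslice.as_ptr() as usize;

    // Index of the first element of `subslice` within `slice`.
    let byte_offset = sub_start.checked_sub(start)?;
    if byte_offset % size != 0 {
        return None;
    }
    let first = byte_offset / size;

    // The end index is the start index plus the subslice length.
    let end = first.checked_add(subslice.len())?;
    if end <= slice.len() {
        Some(first..end)
    } else {
        None
    }
}
```

The returned range always has the same length as the subslice, which matches the expectation raised earlier in the review.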
