ptr::offset should explicitly clarify 0-sized offset semantics #65108

Closed
Gankra opened this issue Oct 4, 2019 · 3 comments · Fixed by #117329
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

Contributor

Gankra commented Oct 4, 2019

https://doc.rust-lang.org/std/primitive.pointer.html#method.offset

I can't remember the past arguments we definitely had about this, but it would be nice to explicitly call out the answer in the docs.

As a relevant example, Vec::into_iter currently unconditionally computes the "end" pointer of the array by offsetting by len (as long as size_of::<T>() > 0). This means that we offset a dangling pointer by 0 when iterating an empty Vec. This is obviously useful to support and annoying to have to guard against, so I would hope that's well-defined.

https://doc.rust-lang.org/src/alloc/vec.rs.html#1860

```rust
fn into_iter(mut self) -> IntoIter<T> {
    unsafe {
        let begin = self.as_mut_ptr();
        let end = if mem::size_of::<T>() == 0 {
            arith_offset(begin as *const i8, self.len() as isize) as *const T
        } else {
            // SAFE when `begin` dangles and `len == 0`???
            begin.add(self.len()) as *const T
        };
        let cap = self.buf.capacity();
        mem::forget(self);
        IntoIter {
            buf: NonNull::new_unchecked(begin),
            phantom: PhantomData,
            cap,
            ptr: begin,
            end,
        }
    }
}
```
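
The question can be reduced to a small standalone sketch. The helper `raw_range` below is hypothetical and is not the standard library's code. It computes the same begin/end pair for any `Vec`, so an empty, unallocated `Vec` ends up offsetting a dangling pointer by zero. Whether that offset is well-defined is exactly what this issue asks; the sketch assumes the answer is yes, as the resolution recorded later in this thread states.

```rust
/// Hypothetical helper: returns the (begin, end) raw pointer pair for a Vec.
fn raw_range<T>(v: &Vec<T>) -> (*const T, *const T) {
    let begin = v.as_ptr();
    // When `v` is empty and unallocated, `begin` dangles and this is an
    // offset by zero; otherwise it is the usual one-past-the-end pointer.
    let end = unsafe { begin.add(v.len()) };
    (begin, end)
}

fn main() {
    let empty: Vec<u32> = Vec::new();
    let (begin, end) = raw_range(&empty);
    assert_eq!(begin, end); // nothing to iterate

    let full = vec![1u32, 2, 3];
    let (begin, end) = raw_range(&full);
    assert_eq!(unsafe { end.offset_from(begin) }, 3);
}
```

Guarding the empty case would mean an extra branch on `len == 0` in every such call site, which is the annoyance described above.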
Contributor Author

Gankra commented Oct 4, 2019

cc @RalfJung

Contributor Author

Gankra commented Oct 4, 2019

(For anyone driving by, the ZST branch is an irrelevant distraction. We guarantee ZST ops do nothing, and this code is specially making the offset non-zero so it can use the pointer difference as a counter.)
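
To illustrate what "pointer difference as a counter" means, here is a rough sketch with two hypothetical helpers, `zst_end` and `zst_remaining`. It uses the stable `wrapping_add` on a byte pointer in place of the internal `arith_offset` intrinsic, which is an assumption made so the example compiles outside the standard library. The pointers are never dereferenced; only their addresses matter.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: encode `len` as a byte distance from `begin`.
/// Only meaningful for zero-sized `T`.
fn zst_end<T>(begin: *const T, len: usize) -> *const T {
    assert_eq!(std::mem::size_of::<T>(), 0);
    // Step byte-wise with wrapping arithmetic; the result is only a counter.
    begin.cast::<u8>().wrapping_add(len).cast::<T>()
}

/// Hypothetical helper: number of elements left between `ptr` and `end`.
fn zst_remaining<T>(ptr: *const T, end: *const T) -> usize {
    (end as usize).wrapping_sub(ptr as usize)
}

fn main() {
    let begin: *const () = NonNull::<()>::dangling().as_ptr();
    let end = zst_end(begin, 5);
    assert_eq!(zst_remaining(begin, end), 5);
}
```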

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 4, 2019
Member

RalfJung commented Oct 9, 2019

The problem is that we don't really know what LLVM's rules are for getelementptr inbounds with a 0 offset. Also see #54857 of which this seems to be a duplicate, the corresponding UCG issue rust-lang/unsafe-code-guidelines#93, and this thread (spread across 3 months) on the LLVM list:
https://lists.llvm.org/pipermail/llvm-dev/2019-February/130452.html, https://lists.llvm.org/pipermail/llvm-dev/2019-March/130831.html, https://lists.llvm.org/pipermail/llvm-dev/2019-April/131693.html.

What Miri currently implements is the most restrictive version that allows current code to work: if you cast an int to a ptr, you may offset that by 0 and it's okay unless that int is 0 (NULL ptrs are special). If you have a "real" pointer, you may only offset it by 0 if it is inbounds of an allocation. With ptr-int-casts this distinction between "real pointers" and integer pointers gets blurry, however...
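
The two cases distinguished here can be written out as a small sketch. Both function names are hypothetical. As I read the description above, the 2019 Miri rules would accept the first function (for a non-zero address) and reject the second, because the "real" pointer is no longer inbounds of a live allocation. The sketch assumes the rules adopted later in this thread, under which a zero-byte offset is allowed on any pointer.

```rust
/// Hypothetical: offset an integer-derived pointer by zero.
fn offset_zero_on_int_ptr(addr: usize) -> *const u8 {
    let p = addr as *const u8;
    // No allocation is involved; the offset is zero bytes.
    unsafe { p.add(0) }
}

/// Hypothetical: offset a "real" pointer by zero after its allocation is freed.
/// Returns the addresses before and after the offset.
fn offset_zero_after_free() -> (usize, usize) {
    let b = Box::new(7u8);
    let p: *const u8 = &*b;
    drop(b); // `p` now dangles, but is never dereferenced
    let q = unsafe { p.add(0) };
    (p as usize, q as usize)
}

fn main() {
    assert_eq!(offset_zero_on_int_ptr(16) as usize, 16);
    let (before, after) = offset_zero_after_free();
    assert_eq!(before, after);
}
```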

bors added a commit to rust-lang-ci/rust that referenced this issue May 22, 2024
…cottmcm

offset: allow zero-byte offset on arbitrary pointers

As per prior `@rust-lang/opsem` [discussion](rust-lang/opsem-team#10) and [FCP](rust-lang/unsafe-code-guidelines#472 (comment)):

- Zero-sized reads and writes are allowed on all sufficiently aligned pointers, including the null pointer
- Inbounds-offset-by-zero is allowed on all pointers, including the null pointer
- `offset_from` on two pointers derived from the same allocation is always allowed when they have the same address
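
A minimal sketch of the first two bullets, assuming a toolchain that includes this change. The function names are hypothetical. The zero-sized access uses a dangling but aligned pointer rather than null, which is a deliberately conservative choice for the example.

```rust
use std::ptr::{self, NonNull};

/// Hypothetical: write and read a zero-sized value through a dangling,
/// well-aligned pointer. No memory is touched.
fn zero_sized_roundtrip() -> [u64; 0] {
    let p: *mut [u64; 0] = NonNull::<[u64; 0]>::dangling().as_ptr();
    unsafe {
        p.write([]);
        p.read()
    }
}

/// Hypothetical: offset the null pointer by zero elements.
fn null_plus_zero() -> *const u32 {
    let null: *const u32 = ptr::null();
    unsafe { null.add(0) }
}

fn main() {
    let empty = zero_sized_roundtrip();
    assert_eq!(empty.len(), 0);
    assert!(null_plus_zero().is_null());
}
```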

This removes surprising UB (in particular, even C++ allows "nullptr + 0", which we currently disallow), and it brings us one step closer to an important theoretical property for our semantics ("provenance monotonicity": if operations are valid on bytes without provenance, then adding provenance can't make them invalid).

The minimum LLVM we require (v17) includes https://reviews.llvm.org/D154051, so we can finally implement this.

The `offset_from` change is needed to maintain the equivalence with `offset`: if `let ptr2 = ptr1.offset(N)` is well-defined, then `ptr2.offset_from(ptr1)` should be well-defined and return N. Now consider the case where N is 0 and `ptr1` dangles: we want to still allow offset_from here.
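
The equivalence can be sketched as a round trip. `roundtrip` is a hypothetical helper, and the dangling case assumes a toolchain with this change applied.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: offset by `n`, then measure the distance back.
///
/// Safety: the caller must ensure `ptr1.offset(n)` is itself allowed.
unsafe fn roundtrip(ptr1: *const u32, n: isize) -> isize {
    unsafe {
        let ptr2 = ptr1.offset(n);
        ptr2.offset_from(ptr1)
    }
}

fn main() {
    // In-bounds case: both pointers belong to the same array.
    let arr = [10u32, 20, 30];
    assert_eq!(unsafe { roundtrip(arr.as_ptr(), 2) }, 2);

    // N == 0 with a dangling `ptr1`: the case discussed above.
    let dangling: *const u32 = NonNull::<u32>::dangling().as_ptr();
    assert_eq!(unsafe { roundtrip(dangling, 0) }, 0);
}
```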

I think we should change offset_from further, but that's a separate discussion.

Fixes rust-lang#65108
[Tracking issue](rust-lang#117945) | [T-lang summary](rust-lang#117329 (comment))

Cc `@nikic`
@bors bors closed this as completed in 5d328a1 May 22, 2024
github-actions bot pushed a commit to rust-lang/miri that referenced this issue May 23, 2024
offset: allow zero-byte offset on arbitrary pointers

flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue May 24, 2024
offset: allow zero-byte offset on arbitrary pointers

bors added a commit to rust-lang/rust-analyzer that referenced this issue Jun 20, 2024
offset: allow zero-byte offset on arbitrary pointers
