Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meeting proposal: inbounds requirements for offset (and potentially place projections) #10

Closed
RalfJung opened this issue Jun 14, 2023 · 3 comments
Labels
meeting-proposal Proposal for a discussion topic at a team meeting

Comments

@RalfJung
Copy link
Member

Summary

Currently, ptr.offset requires the input and output pointer to be in-bounds of the same live allocation (as in, stack variable / heap allocation / static). This creates subtle open questions around offset(0) and introduces a notion of "bounds" for a pointer that is separate from provenance (provenance might restrict a pointer to only be useful for a subrange of an allocation, but the actual allocation bounds are still relevant for offset).

If we weaken the semantics of place expressions to not have a "validity invariant" (see rust-lang/unsafe-code-guidelines#418), the semantics of offset could also become relevant for place projections, so the inbounds requirements would then be the reason why addr_of!(ptr::null::<Type>().field) is UB.

There is an obvious alternative here: make offset (and place projections, if they follow offset semantics) just require "no overflowing the edges of the address space". I'd like to gauge the team's opinion on this question. In particular, one problem here is that LLVM currently has no good way to express "offset that does not wrap": the only option is a plain getelementptr, which loses the "nowrap" information. If we have team consensus that "nowrap" is the semantics we want, maybe that would be a good starting point for organizing some work on the LLVM side to add support for getelementptr nowrap. (We might hence want to invite some LLVM people to this meeting.)

(There are other alternatives, e.g. getelementptr inbounds in LLVM actually allows arithmetic that is inbounds of a freed allocation; we could expose that in Rust.)

This does have some overlap with #9, but we can separate the discussion by just talking about offset/wrapping_offset semantics in #9 when defining place expression evaluation; we don't have to know whether the offset requirements will remain as they are or should be weakened.

Reading

Comment policy

These issues are meant to be used as an "announcements channel" regarding the proposal, and not as a
place to discuss the technical details. Feel free to subscribe to updates. We'll post comments when
reviewing the proposal in meetings or making a scheduling decision. In the meantime, if you have
questions or ideas, ping the proposers on Zulip (or elsewhere).

@RalfJung RalfJung added the meeting-proposal Proposal for a discussion topic at a team meeting label Jun 14, 2023
@RalfJung
Copy link
Member Author

Summary of the discussion:

  • Connor brings up a potential optimization that Rust's offset (but not LLVM's getelementptr inbounds) permits: deducing that the range between old and new pointer is dereferenceable, and using that to prefetch memory.
  • The counterpoint is that in particular for place projections, people often don't expect the inbounds requirement, and accidentally cause UB. For offset this is less common due to the detailed docs, but we'd rather not have place projection be a third offset operation distinct from wrapping_offset and offset.
  • The counter-counterpoint is that this mostly came up when people tried to hand-write offset_of, and hopefully that won't happen any more when offset_of is natively supported.
  • Overall we have fairly concrete evidence of people not expecting the UB, but only in code that hopefully won't be written any more, vs a very tentative idea that this could be useful for optimizations.
  • The general conclusion was that for now it doesn't seem clear that dropping the inbounds is the best call. There was consensus that offset(0) should be allowed always, so we will try to see if LLVM can commit to supporting this. Also we should keep our eyes open for real-world UB caused by the inbounds requirement that does not fall in the category of "hand-rolled offset_of implementation".

@RalfJung
Copy link
Member Author

The general conclusion was that for now it doesn't seem clear that dropping the inbounds is the best call. There was consensus that offset(0) should be allowed always, so we will try to see if LLVM can commit to supporting this.

That turns out to be more tricky than anticipated.

@RalfJung
Copy link
Member Author

Anyway, the meeting has been had, so we can close this.

RalfJung added a commit to minirust/minirust that referenced this issue Jul 30, 2023
This implements the decision from rust-lang/opsem-team#10 (zero-sized offset on null is allowed).
It goes further than that to keep the semantics consistent between zero-sized offsets and accesses.
bors added a commit to rust-lang-ci/rust that referenced this issue May 22, 2024
…cottmcm

offset: allow zero-byte offset on arbitrary pointers

As per prior `@rust-lang/opsem` [discussion](rust-lang/opsem-team#10) and [FCP](rust-lang/unsafe-code-guidelines#472 (comment)):

- Zero-sized reads and writes are allowed on all sufficiently aligned pointers, including the null pointer
- Inbounds-offset-by-zero is allowed on all pointers, including the null pointer
- `offset_from` on two pointers derived from the same allocation is always allowed when they have the same address

This removes surprising UB (in particular, even C++ allows "nullptr + 0", which we currently disallow), and it brings us one step closer to an important theoretical property for our semantics ("provenance monotonicity": if operations are valid on bytes without provenance, then adding provenance can't make them invalid).

The minimum LLVM we require (v17) includes https://reviews.llvm.org/D154051, so we can finally implement this.

The `offset_from` change is needed to maintain the equivalence with `offset`: if `let ptr2 = ptr1.offset(N)` is well-defined, then `ptr2.offset_from(ptr1)` should be well-defined and return N. Now consider the case where N is 0 and `ptr1` dangles: we want to still allow offset_from here.

I think we should change offset_from further, but that's a separate discussion.

Fixes rust-lang#65108
[Tracking issue](rust-lang#117945) | [T-lang summary](rust-lang#117329 (comment))

Cc `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this issue May 22, 2024
…cottmcm

offset: allow zero-byte offset on arbitrary pointers

As per prior `@rust-lang/opsem` [discussion](rust-lang/opsem-team#10) and [FCP](rust-lang/unsafe-code-guidelines#472 (comment)):

- Zero-sized reads and writes are allowed on all sufficiently aligned pointers, including the null pointer
- Inbounds-offset-by-zero is allowed on all pointers, including the null pointer
- `offset_from` on two pointers derived from the same allocation is always allowed when they have the same address

This removes surprising UB (in particular, even C++ allows "nullptr + 0", which we currently disallow), and it brings us one step closer to an important theoretical property for our semantics ("provenance monotonicity": if operations are valid on bytes without provenance, then adding provenance can't make them invalid).

The minimum LLVM we require (v17) includes https://reviews.llvm.org/D154051, so we can finally implement this.

The `offset_from` change is needed to maintain the equivalence with `offset`: if `let ptr2 = ptr1.offset(N)` is well-defined, then `ptr2.offset_from(ptr1)` should be well-defined and return N. Now consider the case where N is 0 and `ptr1` dangles: we want to still allow offset_from here.

I think we should change offset_from further, but that's a separate discussion.

Fixes rust-lang#65108
[Tracking issue](rust-lang#117945) | [T-lang summary](rust-lang#117329 (comment))

Cc `@nikic`
github-actions bot pushed a commit to rust-lang/miri that referenced this issue May 23, 2024
offset: allow zero-byte offset on arbitrary pointers

As per prior `@rust-lang/opsem` [discussion](rust-lang/opsem-team#10) and [FCP](rust-lang/unsafe-code-guidelines#472 (comment)):

- Zero-sized reads and writes are allowed on all sufficiently aligned pointers, including the null pointer
- Inbounds-offset-by-zero is allowed on all pointers, including the null pointer
- `offset_from` on two pointers derived from the same allocation is always allowed when they have the same address

This removes surprising UB (in particular, even C++ allows "nullptr + 0", which we currently disallow), and it brings us one step closer to an important theoretical property for our semantics ("provenance monotonicity": if operations are valid on bytes without provenance, then adding provenance can't make them invalid).

The minimum LLVM we require (v17) includes https://reviews.llvm.org/D154051, so we can finally implement this.

The `offset_from` change is needed to maintain the equivalence with `offset`: if `let ptr2 = ptr1.offset(N)` is well-defined, then `ptr2.offset_from(ptr1)` should be well-defined and return N. Now consider the case where N is 0 and `ptr1` dangles: we want to still allow offset_from here.

I think we should change offset_from further, but that's a separate discussion.

Fixes rust-lang/rust#65108
[Tracking issue](rust-lang/rust#117945) | [T-lang summary](rust-lang/rust#117329 (comment))

Cc `@nikic`
flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue May 24, 2024
offset: allow zero-byte offset on arbitrary pointers

As per prior `@rust-lang/opsem` [discussion](rust-lang/opsem-team#10) and [FCP](rust-lang/unsafe-code-guidelines#472 (comment)):

- Zero-sized reads and writes are allowed on all sufficiently aligned pointers, including the null pointer
- Inbounds-offset-by-zero is allowed on all pointers, including the null pointer
- `offset_from` on two pointers derived from the same allocation is always allowed when they have the same address

This removes surprising UB (in particular, even C++ allows "nullptr + 0", which we currently disallow), and it brings us one step closer to an important theoretical property for our semantics ("provenance monotonicity": if operations are valid on bytes without provenance, then adding provenance can't make them invalid).

The minimum LLVM we require (v17) includes https://reviews.llvm.org/D154051, so we can finally implement this.

The `offset_from` change is needed to maintain the equivalence with `offset`: if `let ptr2 = ptr1.offset(N)` is well-defined, then `ptr2.offset_from(ptr1)` should be well-defined and return N. Now consider the case where N is 0 and `ptr1` dangles: we want to still allow offset_from here.

I think we should change offset_from further, but that's a separate discussion.

Fixes rust-lang/rust#65108
[Tracking issue](rust-lang/rust#117945) | [T-lang summary](rust-lang/rust#117329 (comment))

Cc `@nikic`
bors added a commit to rust-lang/rust-analyzer that referenced this issue Jun 20, 2024
offset: allow zero-byte offset on arbitrary pointers

As per prior `@rust-lang/opsem` [discussion](rust-lang/opsem-team#10) and [FCP](rust-lang/unsafe-code-guidelines#472 (comment)):

- Zero-sized reads and writes are allowed on all sufficiently aligned pointers, including the null pointer
- Inbounds-offset-by-zero is allowed on all pointers, including the null pointer
- `offset_from` on two pointers derived from the same allocation is always allowed when they have the same address

This removes surprising UB (in particular, even C++ allows "nullptr + 0", which we currently disallow), and it brings us one step closer to an important theoretical property for our semantics ("provenance monotonicity": if operations are valid on bytes without provenance, then adding provenance can't make them invalid).

The minimum LLVM we require (v17) includes https://reviews.llvm.org/D154051, so we can finally implement this.

The `offset_from` change is needed to maintain the equivalence with `offset`: if `let ptr2 = ptr1.offset(N)` is well-defined, then `ptr2.offset_from(ptr1)` should be well-defined and return N. Now consider the case where N is 0 and `ptr1` dangles: we want to still allow offset_from here.

I think we should change offset_from further, but that's a separate discussion.

Fixes rust-lang/rust#65108
[Tracking issue](rust-lang/rust#117945) | [T-lang summary](rust-lang/rust#117329 (comment))

Cc `@nikic`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meeting-proposal Proposal for a discussion topic at a team meeting
Projects
None yet
Development

No branches or pull requests

1 participant