Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elaborate on the invariants for references-to-slices #121965

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Mar 4, 2024

The length limit on slices is clearly a safety invariant, and I'd like it to also be a validity invariant. With function parameter metadata making progress in LLVM, I'd really like to be able to use it when &[_] is passed as a scalar pair, in particular.

The documentation for references is cagey about what exactly is a validity invariant, so for now just elaborate on the consequences of the existing safety rules on slices -- the length restriction follows from the size_of_val restriction -- as a way to help discourage people from trying to violate them.

I also made the existing warning stronger, since I'm fairly sure it's already UB to violate at least the "references must be non-null" rule, rather than it just being that it "might be UB in the future".

cc @joshlf @RalfJung

@rustbot
Copy link
Collaborator

rustbot commented Mar 4, 2024

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 4, 2024
@joboet
Copy link
Contributor

joboet commented Mar 4, 2024

Given that slice::from_raw_parts already states that "the total size len * mem::size_of::<T>() of the slice must be no larger than isize::MAX" and that its behaviour is undefined otherwise, I'd say that this is entirely uncontroversial. Still, I'd appreciate some team sign-off on this, I think this concerns lang?

@rustbot label +I-lang-nominated
(feel free to remove/adjust so that this reaches the right people)

@rustbot rustbot added the I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. label Mar 4, 2024
@RalfJung RalfJung added T-opsem Relevant to the opsem team T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 4, 2024
@RalfJung
Copy link
Member

RalfJung commented Mar 4, 2024

The length limit on slices is clearly a safety invariant, and I'd like it to also be a validity invariant.

That sounds like we should go through t-opsem FCP, maybe joint with t-lang.

@scottmcm could you spell out the motivation for why it should be a validity invariant?
This does pose some extra complications for the opsem, since so far the validity invariants of metadata was always the same no matter whether it was in a raw pointer or a reference.

The authoritative location for validity invariants is the Reference ("behavior considered undefined"), so could you prepare an accompanying reference PR?

/// `isize::MAX / size_of::<E>()`. (Raw pointers may have longer lengths, but
/// references must not. For example, compare the documentation of
/// [`ptr::slice_from_raw_parts`](ptr/fn.slice_from_raw_parts.html) and
/// [`slice::from_raw_parts`](slice/fn.from_raw_parts.html).)
Copy link
Member

@RalfJung RalfJung Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to apply to all types with slice tails, or deliberately restricted to only slices and str?

In &(i32, [i32]), is the max length reduced by one to make sure that the entire type always fits into isize?

@Jules-Bertholet
Copy link
Contributor

Does this affect the design of the unstable Pointee trait? (As in, the validity invariant of reference metadata no longer matches that of Pointee::Metadata)

@joshlf
Copy link
Contributor

joshlf commented Mar 5, 2024

See also: #117474

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 23, 2024
Add `assume`s to slice length calls

Since `.len()` on slices is safe, let's see how this impacts things vs what we could do with rust-lang#121965 and LLVM 19
@joboet
Copy link
Contributor

joboet commented Mar 23, 2024

Does this affect the design of the unstable Pointee trait? (As in, the validity invariant of reference metadata no longer matches that of Pointee::Metadata)

I don't think so, as this is only an invariant for references, so ptr::slice_from_raw_parts and ptr::from_raw_parts wouldn't be affected.

@scottmcm
Copy link
Member Author

could you spell out the motivation for why it should be a validity invariant?
This does pose some extra complications for the opsem, since so far the validity invariants of metadata was always the same no matter whether it was in a raw pointer or a reference.

Can you elaborate on the complications it would cause? I don't know how to judge what is or isn't hard in there. I'd assumed that it wouldn't be substantially harder to deal with in opsem than how references have different validity invariants on the address+provenance part from pointers already.

(Talking about the pointee I think I understand how it's harder to enforce/check something, from the other conversations about that, but since the metadata isn't behind the pointer and thus it's right there to see when doing a typed copy, I didn't imagine it being substantially harder than checking things like the alignment of the pointer that's also reference-only, not applicable for pointers.)

I think the only reason I have to make it strictly a validity invariant would be for niches. That would be a better version of what RawVec is currently doing by hand, for example, with

#[repr(transparent)]
#[cfg_attr(target_pointer_width = "16", rustc_layout_scalar_valid_range_end(0x7fff))]
#[cfg_attr(target_pointer_width = "32", rustc_layout_scalar_valid_range_end(0x7fff_ffff))]
#[cfg_attr(target_pointer_width = "64", rustc_layout_scalar_valid_range_end(0x7fff_ffff_ffff_ffff))]
struct Cap(usize);

Today the very-common &[u8] has only the null available, but we could give it a whole isize::MAX more niches with a validity rule here. And having the niche rule for slice metadata would make it apply to Box<[T]> too, not just Vec<T>. (Plus I'd happily lose the ZST niche in Vec in exchange for everything else having a smarter one.)

Otherwise it's mostly convenience. Telling LLVM about the range restrictions is actually helpful, even just putting it on length checks, but also very expensive today -- both demonstrated in #122926

I'd love to just make it a validity invariant we can put on all loads, like we do with enums. (And soon we'll be able to do that for enum parameters too, which I'd love to do for slice lengths too.)

Is this meant to apply to all types with slice tails, or deliberately restricted to only slices and str?

Hmm, that's a very good question. I guess everything with a slice tail would be the most consistent, and thus really does need to be phrased in terms of the implied size of the object. (With the element count rule being the simple consequent case for plain slices.)

@RalfJung
Copy link
Member

RalfJung commented Mar 24, 2024

Can you elaborate on the complications it would cause? I don't know how to judge what is or isn't hard in there. I'd assumed that it wouldn't be substantially harder to deal with in opsem than how references have different validity invariants on the address+provenance part from pointers already.

It means we can't view metadata as "just a type" and check the metadata field as if it were a field of some type. Instead we have to view it as an inherent part of the pointer that cannot be described separately. (Or there are two separate types, one for 'metadata of reference' and one for 'metadata of raw pointer'.) We also get more degrees of freedom as we have to separately define the invariant for references and raw pointers.

It's not a big deal, so if there are clear benefits I don't think this should stop us. But we should clearly motivate breaking this symmetry.

In other words, it would have been nice to say that the validity of the metadata field of a pointer/reference to T is exactly that of <T as Pointee>::Metadata, but doesn't have to be like that as long as we are aware that we are breaking this property, we are doing so with sufficient motivation, and we are documenting this properly.

@saethlin
Copy link
Member

saethlin commented Apr 4, 2024

if there are clear benefits

Someone should diff the optimized IR for the compiler or some other large project before and after the PR that adds the assumes to see what optimizations are derived. I didn't notice this demonstrated in the PR, but I suspect with the assumes, LLVM optimizes code like &slice[idx..][..4] to one bounds check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants