Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add note that Vec::as_mut_ptr() does not materialize a reference to the internal buffer #113859

Merged
merged 7 commits into from
Aug 29, 2023

Conversation

Manishearth
Copy link
Member

See discussion on thomcc/rust-typed-arena#62 and t-opsem

This method already does the correct thing here, but it is worth guaranteeing that it does so it can be used more freely in unsafe code without having to worry about potential Stacked/Tree Borrows violations. This moves one more unsafe usage pattern from the "very likely sound but technically not fully defined" box into "definitely sound", and currently our surface area of the latter is woefully small.

I'm not sure how best to word this, opening this PR as a way to start discussion.

@rustbot
Copy link
Collaborator

rustbot commented Jul 19, 2023

r? @thomcc

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 19, 2023
@thomcc
Copy link
Member

thomcc commented Jul 19, 2023

I think this needs some work on wording (CC @rust-lang/opsem) and needs approval by libs-api (for which I'll nominate it).

I believe we have tests in the repo that already effectively guarantee this (they'd fail under miri if this were violated), but on the other hand some decision was made not to add equivalent methods to String in #97483 (although perhaps it seems that might have been premature?)

@thomcc thomcc added the I-libs-api-nominated The issue / PR has been nominated for discussion during a libs-api team meeting. label Jul 19, 2023
@RalfJung
Copy link
Member

I'm not sure how best to word this, opening this PR as a way to start discussion.

That's a good question. We are still quite far away from deciding on an aliasing model so we have to state things somewhat indirectly. We could either talk about not having intermediate references as you did, or we could describe the effects of this: when calling as_mut_ptr twice (without moving the Vec itself in the mean time -- Unique might cause trouble if we do), the resulting pointers can be used in arbitrarily aliasing ways. However really we'd have to list all the methods that can be called and what can be done with the references they return without invalidating the resulting raw pointer, which is a lot.

@saethlin
Copy link
Member

saethlin commented Jul 19, 2023

I think the most troubling example in this space is:

fn main() {
    let mut s = &mut [0, 1];
    unsafe {
        std::ptr::copy_nonoverlapping(s.as_ptr(), s.as_mut_ptr().add(1), 1);
    }
}

This program is UB under Stacked Borrows and well-defined in Tree Borrows. I think that's a big win for Tree Borrows.

The refactoring to make this work in Stacked Borrows that I encourage is:

fn main() {
    let mut s = &mut [0, 1];
    let ptr = s.as_mut_ptr();
    unsafe {
        std::ptr::copy_nonoverlapping(ptr, ptr.add(1), 1);
    }
}

I think this really unfortunate because the type signature of ptr::copy{_nonoverlapping} suggests, especially to a new Rust user, that the standard refactoring won't work. The types are wrong. I actually suspect that it is this intuition about the different types of the arguments that leads people to even write the troubling code anyway.

Ralf pointed out in this example: #97483 (comment) that Tree Borrows doesn't fully get us out of the "You must hoist calls to as{_mut}_ptr" as a good guideline for writing unsafe code.

What I would like to tell people is that inserting calls to these methods does not have any aliasing implications. Currently that is true of Vec only, and I'm wary of that, especially because we have common performance advice to turn &mut Vec<T> arguments into &mut [T] where possible. There is a clippy lint for this transformation, but it doesn't fire if the function calls .as_mut_ptr() (I didn't look into why).

So I would be happiest if we had consistency between Vec<T>, String, and [T]. And I'm wary of adding a note about how this is valid to the documentation of Vec<T>, because there is no such guarantee for the other types, which isn't called out and even if it were still makes code less stable under refactoring.

In short, I'm concerned that telling anyone about this property of Vec that you want documented opens the need to explain a frightening amount about aliasing in order to not make this a footgun.

I do not have any suggestions for how to adjust the docs to alleviate this concern in light of our current need to state things indirectly.

@thomcc
Copy link
Member

thomcc commented Jul 19, 2023

Notably these methods were added to Vec in #61114, and the motivation was to avoid materializing a reference to the internal buffer. It makes sense that if these methods exist solely for this reason, to document that.

In short, I'm concerned that telling anyone about this property of Vec that you want documented opens the need to explain a frightening amount about aliasing in order to not make this a footgun.

I'm not really convinced by this, mostly because it is already the case that you have to understand a frightening amount about aliasing to avoid footguns in unsafe code (and it's sadly unlikely that this will get much better in the future).

@Manishearth
Copy link
Member Author

because there is no such guarantee for the other types,

there should be 😄

mostly because it is already the case that you have to understand a frightening amount about aliasing to avoid footguns in unsafe code (and it's sadly unlikely that this will get much better in the future).

I agree, we're already in the situation of "if you're using raw pointers you need to understand this borrowing model that is still changing and know caveats about closed and open opsem discussions", having a semi-cryptic note about materializing references does not make things worse for people who are not fully up to speed, but for people who do understand these models it gives them clarity that a piece of code is Actually Correct.

(and as I mentioned in the issue, our surface area of useful unsafe things to do that are Definitely Sound is woefully small)

@RalfJung
Copy link
Member

RalfJung commented Jul 20, 2023

@saethlin so concretely, you are saying we should make this not UB?

fn main() {
    let s = &mut [0,1,2,3];
    let ptr = s.as_mut_ptr();
    unsafe { ptr.write(97) };
    let ptr2 = s.as_mut_ptr();
    unsafe { ptr2.write(0) };
    unsafe { ptr.write(97) };
}

I agree that would be nice. But it's not clear how to do that without allowing way more programs than we want to allow... I'll open a thread on Zulip.

The key difference between this example (using arrays/slices) and the ones with Vec/String is that for slices, fn as_mut_ptr(&mut self) has a noalias on the pointer pointing to the slice contents, while Vec/String don't.

@RalfJung
Copy link
Member

For the as_mut_ptr methods, I guess the question is whether we should insist that the aliasing model must allow such code, and hence having as_mut_ptr on Vec/String/Box is unnecessary, or whether we are accepting that such code might be UB and have Vec/String/Box methods to deal with that situation.

@thomcc
Copy link
Member

thomcc commented Jul 20, 2023

I think this is getting off topic for the discussion at hand. At the end of the day, these method exist for this purpose, so we might as well document that they serve that purpose.

@thomcc
Copy link
Member

thomcc commented Jul 25, 2023

This was discussed in the libs-api meeting. The conclusion was positive, but that we need much better wording (perhaps a link to our existing documentation on provenance, and/or examples of what is allowed vs not).

It will also need a libs-api FCP.

@thomcc thomcc added the needs-fcp This change is insta-stable, so needs a completed FCP to proceed. label Jul 25, 2023
@Manishearth
Copy link
Member Author

Manishearth commented Jul 25, 2023

Thanks! The wording I chose here was mostly as an example; do people have ideas as to what the wording should be? (There's already some discussion here)

@RalfJung
Copy link
Member

RalfJung commented Jul 25, 2023

As a start, it could talk more about what this actually means for users:

/// This method guarantees that when it is called multiple times without
/// the buffer being reallocated in the mean time, the returned pointer will
/// always be exactly the same, even for the purpose of the aliasing model.
/// That means the following is legal:
/// ```rust
/// let mut v = vec![0];
/// let ptr1 = v.as_mut_ptr();
/// ptr1.write(1);
/// let ptr2 = v.as_mut_ptr();
/// ptr2.write(2);
/// // Notably, the write to `ptr2` did *not* invalidate `ptr1`:
/// ptr1.write(3);
/// ```

@Manishearth Manishearth force-pushed the vec-as-mut-ptr-stacked-borrow branch from 845f633 to 778fdf2 Compare July 25, 2023 20:06
/// let mut v = vec![0];
/// let ptr = v.as_ptr();
/// let x = ptr.read();
/// v[0] = 5;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried to write a similar example for as_ptr and wanted to avoid using as_mut_ptr(), but I think this is also unsound, yes? @RalfJung

This comment was marked as resolved.

This comment was marked as resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it is up to t-libs if they want to guarantee this -- basically this means that the pointer returned here was not derived via a shared reference. By analogy with addr_of!, I think we want to allow this code. (Is it worth mentioning that analogy?)

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@Manishearth Manishearth force-pushed the vec-as-mut-ptr-stacked-borrow branch from 986595a to 21906f9 Compare July 31, 2023 19:17
@Manishearth
Copy link
Member Author

@RalfJung I think this is sufficiently clear without committing too much or explaining a thesis' worth of semantics, thoughts?

@rust-log-analyzer

This comment has been minimized.

@thomcc
Copy link
Member

thomcc commented Aug 16, 2023

I suppose this needs T-libs-api FCP and not T-libs, so I should reassign it.

r? @m-ou-se

@rustbot rustbot assigned m-ou-se and unassigned thomcc Aug 16, 2023
@dtolnay
Copy link
Member

dtolnay commented Aug 16, 2023

@rust-lang/libs-api:
@rfcbot fcp merge

We discussed this pull request in the library API meeting on July 25 and had no objections, other than improving the original wording with examples, which has been done.

@rfcbot
Copy link

rfcbot commented Aug 16, 2023

Team member @dtolnay has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 16, 2023
@rfcbot
Copy link

rfcbot commented Aug 19, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Aug 29, 2023
@rfcbot
Copy link

rfcbot commented Aug 29, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dtolnay
Copy link
Member

dtolnay commented Aug 29, 2023

@bors r+

@bors
Copy link
Contributor

bors commented Aug 29, 2023

📌 Commit 7cea69c has been approved by dtolnay

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 29, 2023
@dtolnay dtolnay assigned dtolnay and unassigned m-ou-se Aug 29, 2023
@bors
Copy link
Contributor

bors commented Aug 29, 2023

⌛ Testing commit 7cea69c with merge cedbe5c...

@bors
Copy link
Contributor

bors commented Aug 29, 2023

☀️ Test successful - checks-actions
Approved by: dtolnay
Pushing cedbe5c to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 29, 2023
@bors bors merged commit cedbe5c into rust-lang:master Aug 29, 2023
12 checks passed
@rustbot rustbot added this to the 1.74.0 milestone Aug 29, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (cedbe5c): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.7% [0.7%, 0.7%] 1
Improvements ✅
(primary)
-1.7% [-2.4%, -0.8%] 6
Improvements ✅
(secondary)
-1.7% [-2.4%, -0.7%] 5
All ❌✅ (primary) -1.7% [-2.4%, -0.8%] 6

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.9% [1.0%, 5.1%] 11
Regressions ❌
(secondary)
1.7% [1.0%, 2.4%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.9% [1.0%, 5.1%] 11

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 631.918s -> 630.078s (-0.29%)
Artifact size: 316.19 MiB -> 316.22 MiB (0.01%)

@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet