Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend Rc/Arc::from_raw() docs regarding unsafety #68099

Merged
merged 3 commits into from Mar 22, 2020

Conversation

@lukaslueg
Copy link
Contributor

lukaslueg commented Jan 10, 2020

This question on SO boils down to "is it safe to ::from_raw() a Rc<T>/Arc<T> using a dummy T even if T is never dereferenced via the new Rc/Arc?". It almost never is.

This PR amends the docs of from_raw() regarding this point.

Constructing an Rc/Arc is unsafe even if the wrapped `T`
is never dereferenced.
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Jan 10, 2020

r? @shepmaster

(rust_highfive has picked a reviewer for you, use r? to override)

@Dylan-DPC

This comment has been minimized.

Copy link
Member

Dylan-DPC commented Jan 11, 2020

I'm not sure if it applies to this PR as well, but worth noting this comment by @Mark-Simulacrum regarding something similar on another PR that added documentation about something being unsafe:
#67722 (comment)

/// This function is unsafe because improper use may lead to memory problems. For example, a
/// double-free may occur if the function is called twice on the same raw pointer.
/// This function is unsafe because improper use may lead to memory unsafety,
/// even if `T` is never accessed. For example, a double-free may occur if the function is

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Jan 11, 2020

Member

I guess this means "if Rc<T> is never accessed"`?

This comment has been minimized.

Copy link
@lukaslueg

lukaslueg Jan 11, 2020

Author Contributor

True as well, although the pitfall here is to assume accessing Rc<T> itself was safe if T is never touched.

even if Rc<T> or T is never accessed.

?

This comment has been minimized.

Copy link
@LukasKalbertodt

LukasKalbertodt Jan 16, 2020

Member

This whole paragraph seems a bit strange to me. Underspecified in a way. So what are the conditions for this method to be safe?

  • What's specified in the first paragraph (from a previous into_raw call)
  • Only call from_raw once per raw pointer.
  • T must (still) be initialized so that it can be dropped.

That's it, right? Shouldn't we explicitly document that? We can still mention the double free later.

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Jan 16, 2020

Member

It's worth noting (in my opinion) that the from_raw once per raw pointer makes relatively little sense, and IMO represents more hassle than it's worth. It means you can't technically store a pointer and then clone off copies of it, i.e., something like this would not be valid, but I think should be.

Other than that, I don't see any issues with the other points.

unsafe fn clone_arc(ptr: *const T) -> Arc<T> {
    let a = Arc::from_raw(ptr);
    let copy = a.clone();
    mem::forget(a);
    copy
}

let ptr = Arc::into_raw(some_arc);
clone_arc(ptr);
clone_arc(ptr);
clone_arc(ptr);

This comment has been minimized.

Copy link
@LukasKalbertodt

LukasKalbertodt Jan 17, 2020

Member

Good point. I guess we can just say "the user of from_raw has to make sure a specific T is only dropped once". Then it's the users responsibility.

@@ -573,10 +573,11 @@ impl<T: ?Sized> Rc<T> {
/// Constructs an `Rc` from a raw pointer.
///
/// The raw pointer must have been previously returned by a call to a
/// [`Rc::into_raw`][into_raw].
/// [`Rc::into_raw`][into_raw] using the same `T`.

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Jan 11, 2020

Member

I think this is not quite true, or at least is likely over-constraining. For example, I believe it is fine to go between the unsized representation and the sized representation via pointer casts (e.g,. Rc<T> unsizes to Rc<dyn Debug> and then you into_raw, cast to T, and from_raw).

I'm not sure what exact wording we want to guarantee here, though, maybe just same T is in practice both sufficient and good enough for most use cases. Maybe something like "layout and alignment equivalent to T; note that this is essentially a reference to reference transmute if the T differs from the one used by into_raw, and so the same conditions apply"?

This comment has been minimized.

Copy link
@lukaslueg

lukaslueg Jan 11, 2020

Author Contributor

Yes, I thought about that as well, it could be done. My thinking, however, is that it's too flaky to try to document under which conditions this is safe (and therefore binds all future changes to Rc/Arc).

This comment has been minimized.

Copy link
@LukasKalbertodt

LukasKalbertodt Jan 16, 2020

Member

How about this? (basically the same what @Mark-Simulacrum suggested)

The raw pointer must have been previously returned by a call to a Rc<U>::into_raw, where U must have the same alignment and memory layout as T (which is trivially satisfied if T == U). Note that if U is not T, this is basically like transmuting references of different types. See mem::transmute for more information on what restrictions apply in that case.

I usually wouldn't mind over-constraining, but if we document this exactly, this PR is just a doc bug fix.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Jan 11, 2020

I'm not sure if it applies to this PR as well, but worth noting this comment by @Mark-Simulacrum regarding something similar on another PR that added documentation about something being unsafe:
#67722 (comment)

No, that's specifically because we have not yet firmly decided whether Any should be an unsafe trait or not, so I didn't want to go adding documentation to that effect before we did so.

There's open questions here as well but I don't think they're "some other PR is changing things here as well" at this point.

I'm going to r? @LukasKalbertodt -- we'll likely want to nail down some proposed wording and then possibly FCP merge (since it's specifying, and likely narrowing in some sense, the allowed uses of into/from_raw).

@LukasKalbertodt

This comment has been minimized.

Copy link
Member

LukasKalbertodt commented Jan 16, 2020

then possibly FCP merge (since it's specifying, and likely narrowing in some sense, the allowed uses of into/from_raw).

But if this PR just specifies things that were true all along, this is just a doc bug fix, right? In that case an FCP shouldn't be necessary, should it? Not entirely familiar with the procedures yet.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Jan 16, 2020

I believe so.

However, I sort of expect we'll be moving from "unclear specification" to "fairly clear specification" -- in that case I would get sign off, personally. But if we end up with just general commentary without specification, then it seems fine to forgo FCP.

@JohnCSimon

This comment has been minimized.

Copy link
Member

JohnCSimon commented Jan 26, 2020

Ping from triage: @LukasKalbertodt any updates on this review?

@lukaslueg

This comment has been minimized.

Copy link
Contributor Author

lukaslueg commented Jan 27, 2020

New proposal for from_raw, including the comments above.

Constructs an Rc<T> from a raw pointer.

The raw pointer must have been previously returned by a call to Rc<U>::into_raw where U must have the same size and alignment as T. This is trivially true if T == U.
Note that if U is not T but has the same size and alignment, this is basically like transmuting references of different types. See mem::transmute for more information on what restrictions apply in this case.

The user of from_raw has to make sure the specific value of T is only dropped once.

This function is unsafe because improper use may lead to memory unsafety, even if the returned Rc<T> is never accessed.

The same would apply to Arc<T>::from_raw.

Notice that it says "... the same size and alignment ...", not "... the same alignment and memory layout ...", because the inner layout of T. does not need to be the same for from_raw to return a valid Rc. There may still be problems if it is dereferenced to T but there had been ample warning.

I'm unsure about the single sentence regarding "... only dropped once". Are people aware how to meet this requirement if they tinker with raw pointers?

@LukasKalbertodt

This comment has been minimized.

Copy link
Member

LukasKalbertodt commented Jan 27, 2020

@lukaslueg That sounds good to me!

I'm unsure about the single sentence regarding "... only dropped once". Are people aware how to meet this requirement if they tinker with raw pointers?

I think this should be sufficient. It's an unsafe method after all, people are expected to know what they are doing and research any unclear points.

If you push that as a commit, I'd be fine with starting an FCP merge (as was suggested above).

@lukaslueg

This comment has been minimized.

Copy link
Contributor Author

lukaslueg commented Jan 29, 2020

Pushed.

Quick recap: The point of this PR is to amend the docs regarding the behavior of [Arc/Rc]::from_raw; memory unsafety can occur if the inner type in from_raw is different from the original type, even if the resulting Arc/Rc is never accessed.

@LukasKalbertodt

This comment has been minimized.

Copy link
Member

LukasKalbertodt commented Jan 29, 2020

Let's see if I already have the rights to...

@rfcbot fcp merge

This is a rather minor documentation fix regarding the safety of {Arc, Rc}::from_raw. But since the new documentation is pretty specific and these are important types, @Mark-Simulacrum suggested going through an FCP with this.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Jan 29, 2020

@rfcbot fcp merge

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Jan 29, 2020

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

/// The raw pointer must have been previously returned by a call to
/// [`Rc<U>::into_raw`][into_raw] where `U` must have the same size
/// and alignment as `T`. This is trivially true if `U` is `T`.
Comment on lines +575 to +577

This comment has been minimized.

Copy link
@dtolnay

dtolnay Feb 2, 2020

Member

Is same size and align strong enough? What stops rustc from ordering the strong and weak fields of RcBox and ArcInner in a different order for different T?

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Feb 2, 2020

Member

Technically, nothing, but we know today that rustc will not do so I believe. I continue to believe that it would be a good idea to add #[repr(C)] or something like it to RcBox and ArcInner, since there should be no reason to not guarantee (internally at least) that the field layout is consistent.

Notably, the data field must be last, due to unsizing, and reodering two usize fields is not currently ever going to happen anyway, I believe.

This comment has been minimized.

Copy link
@dtolnay

dtolnay Feb 2, 2020

Member

reordering two usize fields is not currently ever going to happen anyway, I believe

Currently it doesn't, but rust-lang/unsafe-code-guidelines#36 is about whether to make this a guarantee, and I don't think it's decided yet. For example PGO-based field layout would absolutely reorder fields with the same size and align to keep frequently read data together and frequently written data in a different cache line.

I don't necessarily intend on blocking this PR on this, but I would feel more comfortable with #[repr(C)] on those types.

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Feb 2, 2020

Member

I would be in favor of adding #[repr(C)] as well. Doing so in this PR makes sense to me.

This comment has been minimized.

Copy link
@lukaslueg

lukaslueg Feb 3, 2020

Author Contributor

Amended RcBox and ArcInner with repr(C)

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Feb 7, 2020

Member

Could you also add a comment to those repr(C) annotations that notes that they're not for C-compat, but strictly so that the layout is equivalent for transmutable types (or so)?

This comment has been minimized.

Copy link
@lukaslueg

lukaslueg Feb 7, 2020

Author Contributor

force-pushed

lukaslueg added a commit to lukaslueg/rust that referenced this pull request Feb 3, 2020
Future-proof these types in case rustc reorders
the inner fields. As per discussion in PR rust-lang#68099.
Future-proof these types in case rustc reorders
the inner fields. As per discussion in PR #68099.
@lukaslueg lukaslueg force-pushed the lukaslueg:into_raw_unsafe branch from f90d359 to 586c7e3 Feb 7, 2020
/// This function is unsafe because improper use may lead to memory problems. For example, a
/// double-free may occur if the function is called twice on the same raw pointer.
/// The user of `from_raw` has to make sure a specific value of `T` is only
/// dropped once.

This comment has been minimized.

Copy link
@SimonSapin

SimonSapin Feb 13, 2020

Contributor

What does this mean? Isn’t Rc::drop the one responsible for dropping T when the refcount reaches zero?

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Feb 13, 2020

Member

The user is responsible for not calling from_raw (and allowing the returned value to drop) "too many times" is what this is getting at. I want to avoid saying something like "only once," since it can be incredibly useful to keep around a single pointer in FFI code and "clone out" Rc's from it, for example.

This comment has been minimized.

Copy link
@SimonSapin

SimonSapin Feb 13, 2020

Contributor

Ok, I see what you’re getting at but I feel this one sentence does not express it well. Maybe we could talk about how each Rc or Arc “owns” one strong reference, and into_raw/from_raw transfer ownership? I think that’s a better mental model, and double-drop is just one of the possible (bad) consequences.

(By the way I think there should be an alternative to from_raw that does not transfer ownership of the strong reference as a less fragile alternative to ManuallyDrop::new(Rc::from_raw(ptr)), but that’s out of scope for this PR. Maybe &*mut T -> &Rc<T>?)

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Feb 13, 2020

Member

Agreed on both points. I'm not sure of the exact wording, but maybe @lukaslueg can come up with some draft wording?

This comment has been minimized.

Copy link
@lukaslueg

lukaslueg Feb 13, 2020

Author Contributor

I'm a little hesitant on this point, for fear of convolution.

Calling into_raw() / from_raw() does not modify the internal reference counter, cloning and dropping the resulting Rc<T> does. One can think of these functions as temporarily lending exactly one "unit of ownership" to the raw pointer. The compiler can't check unique ownership through raw pointers, though, especially as all raw pointers are Copy. The user has to manually ensure T is always dropped exactly once, which is not the case if (among other things) T is dropped through the raw pointer or if from_raw() is called multiple times on inadvertently made copies of the raw pointer.

This comment has been minimized.

Copy link
@lukaslueg

lukaslueg Feb 25, 2020

Author Contributor

@Mark-Simulacrum any thoughts on the previous comment?

This comment has been minimized.

Copy link
@Mark-Simulacrum

Mark-Simulacrum Feb 25, 2020

Member

I think it's certainly more detailed :)

However, I think it should clarify that even if you know T is safe to drop lots of times (e.g., () unit type), it is still not okay to let Rc drop more than once for the "last" time.

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Mar 12, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Mar 22, 2020

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@LukasKalbertodt

This comment has been minimized.

Copy link
Member

LukasKalbertodt commented Mar 22, 2020

@bors r+ rollup

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Mar 22, 2020

📌 Commit 586c7e3 has been approved by LukasKalbertodt

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 22, 2020
Rollup of 10 pull requests

Successful merges:

 - rust-lang#68099 (Amend Rc/Arc::from_raw() docs regarding unsafety)
 - rust-lang#70172 (parse/lexer: support `StringReader::retokenize` called on external files.)
 - rust-lang#70209 (parser: recover on `for<'a> |...| body` closures)
 - rust-lang#70223 (fix type of const params in associated types.)
 - rust-lang#70229 (more clippy fixes)
 - rust-lang#70240 (Return NonZeroU64 from ThreadId::as_u64.)
 - rust-lang#70250 (Remove wrong entry from RELEASES.md)
 - rust-lang#70253 (Remove another wrong entry from RELEASES.md)
 - rust-lang#70254 (couple more clippy fixes (let_and_return, if_same_then_else))
 - rust-lang#70266 (proc_macro_harness: Use item header spans for errors)

Failed merges:

r? @ghost
@bors bors merged commit 0bc5fc9 into rust-lang:master Mar 22, 2020
4 checks passed
4 checks passed
pr Build #20200207.35 succeeded
Details
pr (Linux mingw-check) Linux mingw-check succeeded
Details
pr (Linux x86_64-gnu-llvm-7) Linux x86_64-gnu-llvm-7 succeeded
Details
pr (Linux x86_64-gnu-tools) Linux x86_64-gnu-tools succeeded
Details
@lukaslueg lukaslueg deleted the lukaslueg:into_raw_unsafe branch Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

You can’t perform that action at this time.