Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Arc::unwrap_or_drop for safely discarding Arcs without calling the destructor on the inner type. #75911

Closed
wants to merge 5 commits into from

Conversation

steffahn
Copy link
Member

@steffahn steffahn commented Aug 25, 2020

This is my first contribution. The commit includes tests and documentation. I have marked a few stylistic or technical questions with FIXME. In particular, I don’t know how the tracking issues for new unstable standard library functions work.

There was previously some discussion on IRLO.

Motivation

The functionality provided by this new “method” on Arc was previously not archievable with the Arc API. The function unwrap_or_drop is related to (and hence similarly named similar to) try_unwrap. The expression Arc::unwrap_or_drop(x) is almost the same as Arc::try_unwrap(x).ok(), however the latter includes two steps, the try_unwrap call and dropping the Arc, whereas unwrap_or_drop accesses the Arc atomically. Since this is an issue in multi-threaded settings only, a similar function on Rc is not strictly necessary but could be wanted nontheless for ergonomic and API-similarity reasons. (This PR currently only offers the function on Arc, but I could add one for Rc if wanted.) In the IRLO discussion, I also mentioned two more functions that could possibly extend this API.

The function Arc::unwrap_or_drop(this: Arc<T>) -> Option<T> offers a way to “drop” an Arc without calling the destructor on the contained type. When the Arc provided was the last strong pointer to its target, the target value is returned. Being able to do this is valueable around linear(-ish) types that should not or cannot just be dropped ordinarity, but require extra arguments, or operations that can fail or are async to properly get rid of.

Further Remarks

The current documentation is adapted from and compares this function to try_unwrap so there’s no mention of the motivation (dropping Arc without calling a destructor). I don’t know if this should be added.

The names try_unwrap and unwrap_or_drop are a bit unfortunate since these operations seem quite different from the unwrap methods on Option or Result. This functionality could be renamed around into_inner, for example as try_into_inner (instead of try_unwrap, which would be deprecated) and into_inner (instead of unwrap_or_drop). Some people favored this kind of naming scheme in the IRLO discussion. On the other hand, into_inner is usually more straightforward and deterministic than what unwrap_or_drop offers.

Rendered Documentation

Screenshot_20200825_214643

@steffahn steffahn force-pushed the drop_linear_arc branch 2 times, most recently from a6f80e6 to 8155a9a Compare August 25, 2020 19:06
Copy link
Contributor

@poliorcetics poliorcetics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution ! I'm not an official Rust team member but I still left some style comments and small nits. :)

library/alloc/src/sync.rs Outdated Show resolved Hide resolved
library/alloc/src/sync/tests.rs Outdated Show resolved Hide resolved
library/alloc/src/sync/tests.rs Outdated Show resolved Hide resolved
library/alloc/src/sync/tests.rs Outdated Show resolved Hide resolved
library/alloc/src/sync/tests.rs Outdated Show resolved Hide resolved
library/alloc/src/sync/tests.rs Outdated Show resolved Hide resolved
@poliorcetics
Copy link
Contributor

@pietroalbini highfive didn't appear, is it down ? (I think you are one of the person in charge of it, sorry if I'm wrong)

@poliorcetics

This comment has been minimized.

@rustbot rustbot added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 26, 2020
@pietroalbini
Copy link
Member

Highfive failed to assign someone here. I'll add more logging so we'll discover what's causing this in the future.

When something like this happens again, please ping the whole infra team. Thanks :)

Also add documentation and tests for it.
This commit has some minor unresolved questions and is intended to be amended.
// function, like it's done with drop_slow in drop?

// using `ptr::read` where `drop_slow` was using `ptr::drop_in_place`
let inner = unsafe { ptr::read(Self::get_mut_unchecked(&mut this)) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot: Please write a // SAFETY: comment for why this is safe here. It doesn't have to be very long, just explains why the invariants are checked. :) Here is an example from core.

Copy link
Member Author

@steffahn steffahn Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation is a bit more complex. It is safe because it is doing the same things that the drop implementation does, with the difference of doing ptr::read instead of ptr::drop_in_place, but those two operations are applicable in pretty much exactly the same situations. The reason why every step of drop is safe also includes some really really really long comments in drop about ordering of atomic operations, hence my general question, as stated in the FIXME in line 538: “should I copy [...] the comments from drop?”

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I've done before is say something along the lines of "this is safe for the same reason the drop impl is safe; see that for more info".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't answer earlier, I didn't get the notification for this comment.

The comments in the drop implementation should not change very much, except for improving them so referencing them is probably okay.

You can add a line about the differences between ptr::read and ptr::drop_in_place and why the former must be used instead of the later below and it should be good.

@jyn514
Copy link
Member

jyn514 commented Aug 26, 2020

r? @LukasKalbertodt

@steffahn
Copy link
Member Author

steffahn commented Aug 29, 2020

I’ve found another use-case while reading: this book about linked lists in rust.

The author presents code like this

impl<T> Drop for List<T> {
    fn drop(&mut self) {
        let mut head = self.head.take();
        while let Some(node) = head {
            if let Ok(mut node) = Rc::try_unwrap(node) {
                head = node.next.take();
            } else {
                break;
            }
        }
    }
}

to avoid stack overflows on drop. This code does use try_unwrap and drop the Rc in the Err case. On the next page they state “In order to get thread safety, we have to use Arc. [...] All we need to do [...] is replace every reference to Rc with std::sync::Arc. That's it. We're thread safe. Done!”

At least they acknowledge that in Rust, “[...] it's impossible to cause data races, (not to be mistaken with the more general issue of race conditions).” They do not seem to notice that by replacing Rc with Arc, they are introducing a race condition themselves that can cause a stack overflow.

This example can of course be solved by unwrap_or_drop. Since it comes from starting out with Rc and only migrating the code to Arc afterwards, it convinces me that Rc should get a version of unwrap_or_drop, too.

@jyn514
Copy link
Member

jyn514 commented Aug 29, 2020

The current documentation is adapted from and compares this function to try_unwrap so there’s no mention of the motivation (dropping Arc without calling a destructor). I don’t know if this should be added.

This would be nice to have; just by reading the type signature it wasn't clear why you'd want this over arc.try_unwrap().ok().

@steffahn
Copy link
Member Author

steffahn commented Aug 29, 2020

The current documentation is adapted from and compares this function to try_unwrap so there’s no mention of the motivation (dropping Arc without calling a destructor). I don’t know if this should be added.

This would be nice to have; just by reading the type signature it wasn't clear why you'd want this over arc.try_unwrap().ok().

I’ve been working on the documentation a bit. I also added a paragraph to try_unwrap and a second example to unwrap_or_drop. This might be overkill though:


Edit: Updated screenshot

screencapture-file-home-frank-forks-rust-build-x86-64-unknown-linux-gnu-doc-std-sync-struct-Arc-html-2020-08-29-23_39_08


I mean, in case you like it, I can commit and push it so that people can correct any typos, etc.

@poliorcetics
Copy link
Contributor

Don't hesitate to push, at worst you'll be asked to squash before the PR is accepted. :)

Just avoid using force pushing when possible while reviews are going on, it makes following changes harder

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Sep 2, 2020

Concern: naming

After reading this thread a bunch of days later, remembering a vague idea of the topic at hand, I've realised that or_drop is actually misleading. The whole point of doing this "atomically" is to ensure that the or case will just decrease ref_count / release a non-owning handle, so that the pointee will not be dropped in that case. Calling that branch drop seems misleading.

I thus think that the name ought to be changed (other than that, it does seem to me like a good API for this very specific but not far-fetched use case 👍).

My personal suggestions regarding the naming:

  • .unwrap_or_relinquish_ownership(). Quite mouthful, but for such specific use cases it seems to be worth it.

  • Shorter version: .unwrap_or_release(), but Rust does not have the habit of using the retain/ release terminology.

  • .unwrap_or_decr_strong_count(), based on .decr_strong_count(). This goes against abstraction, by exposing a very low-level detail of the implementation, but, again, given how specific the use case is, among all these three options, this one looks like the best to me.

@CAD97
Copy link
Contributor

CAD97 commented Sep 2, 2020

Note that this is always called as Arc::unwrap_or_xxx(handle), so it should be possible to infer from that, that Arc::xxx_or_drop means drop the Arc, not the T.

That said, I agree that there is an opening for misinterpretation (since, as you point out, the point of this is that the pointee will not be dropped), but I disagree that any of your proposed options are strictly better (except maybe unwrap_or_release, but I agree that doesn't fit with existing naming).

@poliorcetics
Copy link
Contributor

The method is defined as taking this: Self so it will always be called as Arc::unwrap_or_drop(my_value). IMO this is enough to show what's going on. The fact it feats the current naming style is a nice bonus.

Now that I think about it: @steffahn, what do you think about modifying the doc for Arc::try_unwrap to give a pointer to Arc::unwrap_or_drop about the Arc::try_unwrap(x).ok() case ? Or maybe only adding a FIXME in the new method for later, when it is eventually stabilised ?

@steffahn
Copy link
Member Author

steffahn commented Sep 2, 2020 via email

@poliorcetics
Copy link
Contributor

I already modified the docs of try_unwrap in one of my newer commits.

Oh sorry, I checked quickly but didn't see it !

@steffahn
Copy link
Member Author

steffahn commented Sep 2, 2020

@poliorcetics also see the updated screenshot in my comment further up for a rendered version of that documentation

/// // The following code could still cause a stack overflow
/// // despite the manual `Drop` impl if that `Drop` impl used
/// // `Arc::try_unwrap(arc).ok()` instead of `Arc::unwrap_or_drop(arc)`.
/// {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the extra-indentation and block are necessary here, you should be able to put everything at the same level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary, I know. I felt like it looked better this way, especially making clear that the comment is referencing the entire following code. I also wrote this when the two examples where not split yet. I have not reevaluated if it looks a bit less confusing to have this block on top level now that the examples are split.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put a vertical space between the overarching comment and the rest:

/// // The following code could still cause a stack overflow
/// // despite the manual `Drop` impl if that `Drop` impl used
/// // `Arc::try_unwrap(arc).ok()` instead of `Arc::unwrap_or_drop(arc)`.

/// other comments / code

The example end with this block so there should be no confusion about it. (I hope)

@crlf0710 crlf0710 added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 18, 2020
@crlf0710
Copy link
Member

Triage: I guess this is still waiting on review. Somehow it doesn't has a S-* tag.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Sep 18, 2020

⚠️ Note that I am still concerned about the naming.

I've found a way to better phrase the problem. We have:

Arc: unwrap [the payload] or drop [the handle]

As you can see, there is a double elision of the object of the action, and contrary to what natural languages do, at least, w.r.t. English (and French and Spanish), when multiple elisions happen it's only because they all refer to the same elided entity.

This is factually not the case here, so I insist that such a niche use case should favor being less terse and thus potentially confusing, and on the contrary, lean on the side of explicit-ness. Be it by using unwrap_or_release, or whatever other name somebody else can come up with 🙂

  • EDIT: the suggested .into_inner(), for instance, LGTM 👍

That is: although some people may not be confused by the current naming (good for them), do all of you honestly think that nobody out there will? What's the harm in having a slightly longer / more explicit method name?


I hope I have, this time, managed to better convey my feeling, which is only a question of being potentially overly cautious, rather than the opposite 😉. And if I haven't, so be it, I won't insist anymore 🙃

Copy link
Member

@LukasKalbertodt LukasKalbertodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR. Sorry for my late review.

The reasoning for adding this method makes sense and I'm on board merging it unstably. However, I strongly agree that the name still has to change. As @danielhenrymantilla said here, it's confusing that "unwrap" and "drop" in the name don't refer to the same thing. I think into_inner is a pretty good name. Whether we rename/deprecate try_unwrap is another question.

Regarding the implementation: I checked it briefly and couldn't find any problems. However, this is code using relaxed atomics and I don't feel comfortable approving this kind of addition. I don't have experience with it and generally, it's a hard topic and the standard library was always very conservative with these kinds of changes. However, here, this is mostly just a copy of existing code and it seems to make sense to me. (And yes, I might or might not have spent the last 4 hours trying to understand Arc::drop.) So before merging this, I will probably ping the team to have a few more eyes on this.

Lastly, I left a few inline comments.

/// t1.join().unwrap();
/// t2.join().unwrap();
/// }
/// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second long example feels slightly overkill for this kind of highly specialized method. I would assume that people reaching for this method don't need a motivating real world example anymore. However, now that it's here already, you don't need to remove it. Maybe it helps someone after all ^_^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I know, I called it overkill myself. Nontheless, more examples aren’t going to hurt, probably.

#[unstable(feature = "unwrap_or_drop", issue = "none")] // FIXME: add issue
// FIXME: should this copy all/some of the comments from drop and drop_slow?
pub fn unwrap_or_drop(this: Self) -> Option<T> {
// following the implementation of `drop` (and `drop_slow`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in std usually start uppercase. This also applies to other comments that I won't individually comment on.

Suggested change
// following the implementation of `drop` (and `drop_slow`)
// Following the implementation of `drop` (and `drop_slow`)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll still have to rework the comments a bit anyways, as suggested above. But thanks for the info anyways.

Comment on lines +621 to +622
// FIXME: should the part below this be moved into a seperate #[inline(never)]
// function, like it's done with drop_slow in drop?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary for now. We can still improve this later, if it seems useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly just curious what the reasoning for the separation in the drop implementation is good for. Possibly improved performance by reducing code size in the common case; but someone ought to have none some benchmarking determining that this is worth it somewhere, right? Perhaps I should look at the git blame and track down the original author and PR to find out...
I was just thinking that since unwrap_or_drop is pretty much also just a destructor for Arc, all the same performance considerations should apply there, too. On the other hand, try_unwrap does not do anything like this.

Comment on lines 118 to 119
let r_thread = std::thread::spawn(|| Arc::try_unwrap(x).ok());
let s_thread = std::thread::spawn(|| Arc::try_unwrap(y).ok());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawning a thread takes a while, so I would almost expect both try_unwraps to never run at the same time. Maybe try spawning both threads but then immediately blocking them to synchronize them. Then after both threads are spawned, in the main thread, you somehow signal to both threads that they may start now. I am not sure how best to do that, though. RwLock or rendezvous channel maybe?

But yeah, in any case, that's a really tricky test to write. Not sure what's best here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and the test does indeed fail more often than one would think (on my machine) if try_unwrap(x).ok() is used (at least on my machine). Which makes me notice... I totally forgot replacing it with unwrap_or_drop again after testing that these failures can happen!! Damn... this means also means that it didn’t fail once on the CI. But the hope was basically only that if unwrap_or_drop changed in the future and became broken (i.e. lost its guarantees of being atomic) then the test would fail for a lot of people (even if not for everyone) and the error would be noticed eventually.

@steffahn
Copy link
Member Author

Thanks a lot for the PR. Sorry for my late review.

The reasoning for adding this method makes sense and I'm on board merging it unstably. However, I strongly agree that the name still has to change. As @danielhenrymantilla said here, it's confusing that "unwrap" and "drop" in the name don't refer to the same thing. I think into_inner is a pretty good name. Whether we rename/deprecate try_unwrap is another question.

Regarding the implementation: I checked it briefly and couldn't find any problems. However, this is code using relaxed atomics and I don't feel comfortable approving this kind of addition. I don't have experience with it and generally, it's a hard topic and the standard library was always very conservative with these kinds of changes. However, here, this is mostly just a copy of existing code and it seems to make sense to me. (And yes, I might or might not have spent the last 4 hours trying to understand Arc::drop.) So before merging this, I will probably ping the team to have a few more eyes on this.

Lastly, I left a few inline comments.

Thanks for the review so far. I’m thinking into_inner is pretty good myself, too. I’m not sure if renaming try_unwrap is even necessary; I think I was thinking about that these functions must be named similarly for reasons like discoverability and cleanness. But super clean names are hard/almost impossibly anyways and discoverability is good now that I added a link to unwrap_or_drop to try_unwrap’s documentation. So I guess we could just live with try_unwrap + into_inner for now. What I also like about into_inner is that it is a shorter and more familiar name, suggesting that this operation isn’t obscure but instead exactly what you canonically want to do when getting rid of an Arc whilst caring about getting the inner value (if possible). And also there is no useful operation of type Arc<T> -> T (the common type of an into_inner method) that I can think of.


I would be curious to get your opinion on this, too:

Since this is an issue in multi-threaded settings only, a similar function on Rc is not strictly necessary but could be wanted nontheless for ergonomic and API-similarity reasons. (This PR currently only offers the function on Arc, but I could add one for Rc if wanted.)

[go to comment]

and

“In order to get thread safety, we have to use Arc. [...] All we need to do [...] is replace every reference to Rc with std::sync::Arc. That's it. We're thread safe. Done!”

[…] They do not seem to notice that by replacing Rc with Arc, they are introducing a race condition themselves that can cause a stack overflow.

This example can of course be solved by unwrap_or_drop. Since it comes from starting out with Rc and only migrating the code to Arc afterwards, it convinces me that Rc should get a version of unwrap_or_drop, too.

[go to comment]

@LukasKalbertodt
Copy link
Member

Mhhhh... good question regarding Rc::into_inner.

This example can of course be solved by unwrap_or_drop. Since it comes from starting out with Rc and only migrating the code to Arc afterwards, it convinces me that Rc should get a version of unwrap_or_drop, too.

That's a good point, but just adding Rc::into_inner won't solve the problem completely. Maybe the docs should have a section about "transitioning from Rc to Arc". (Not saying it should be included in this PR.) I don't know if there are other cases where just replacing Rc with Arc can lead to some problems.

I'm fine with adding Rc::into_inner in any case. It is a somewhat useful API addition either way. And unlike Arc::into_inner, Rc::into_inner will be trivial to implement (right?) and thus has only a tiny maintenance cost. We can still figure out if we actually want it before stabilization.

@Diggsey
Copy link
Contributor

Diggsey commented Oct 14, 2020

Bikeshed: I like atomic_unwrap() or unwrap_atomic() - from the name alone I had no idea what this did differently from try_unwrap. Upon reading further, it seems the important difference is that it's atomic.

@steffahn
Copy link
Member Author

In case anyone is wondering, sorry for not making any progress here at the moment, I'll have time for this again in about a week.

@jyn514 jyn514 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 30, 2020
@Dylan-DPC-zz
Copy link

@steffahn thanks for taking the time to contribute. I have to close this due to inactivity. If you wish and you have the time you can open a new PR with these changes and we'll take it from there. Thanks

@Dylan-DPC-zz Dylan-DPC-zz added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 1, 2020
@steffahn
Copy link
Member Author

steffahn commented Dec 3, 2020

opened #79665

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.