Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add {into,from}_raw to Rc and Arc #37192

Merged
merged 1 commit into from
Nov 8, 2016
Merged

Conversation

cristicbz
Copy link
Contributor

@cristicbz cristicbz commented Oct 15, 2016

These methods convert to and from a *const T for Rc and Arc similar to the way they work on Box. The only slight complication is that from_raw needs to offset the pointer back to find the beginning of the RcBox/ArcInner.

I felt this is a fairly small addition, filling in a gap (when compared to Box) so it wouldn't need an RFC. The motivation is primarily for FFI.

(I'll create an issue and update a PR with the issue number if reviewers agree with the change in principle Edit: done #37197)

Edit: This was initially {into,from}_raw but concerns were raised about the possible footgun if mixed with the methods of the same name of Box.

Edit: This was went from {into,from}_raw to {into,from}_inner_raw then back to {into,from}_raw during review.

@rust-highfive
Copy link
Collaborator

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@sfackler sfackler added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Oct 15, 2016
/// let x_ptr = Arc::into_raw(x);
/// assert_eq!(unsafe { *x_ptr }, 10);
/// ```
#[unstable(feature = "rc_raw", reason = "newly added", issue = "1")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason field is no longer required, fyi.

@sfackler
Copy link
Member

This seems reasonable to me I think - cc @rust-lang/libs.

@cristicbz
Copy link
Contributor Author

cristicbz commented Oct 15, 2016

Thanks @sfackler! I created the tracking issue #37197, updated the PR with the issue number and removed the reason field.

// `value` field from the pointer.

// Create a fake `ArcInner` to work with valid pointers only.
let invalid_arc_inner: ArcInner<T> = mem::zeroed();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps something like an offset_of! macro could be used instead of an arbitrary value here?

Copy link
Contributor Author

@cristicbz cristicbz Oct 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that's a common pattern in C++, but isn't that UB? Just like ptr.offset() is UB if going more than one past end of allocation? That's what the comment about valid pointers was meant to express. I could still wrap it in a macro though, but I'm not sure where it'd go if I wanted to share it between arc.rs and rc.rs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a safe version of offset_of! which doesn't rely on UB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Amanieu thanks for the link! That's pretty much what I'm doing here (bar the clever check against Deref), do you think I should add your macro and use it for clarity and DRY though?

BTW: isn't this at risk of overflowing though: val.$field as *const _ as isize. I do the casts to usize, compute the difference and then cast to isize, since I thought the absolute address could well be greater than isize::MAX. Or is there some kind of guarantee that's not the case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cristicbz nice catch, you're probably right about the overflow issue, I'll fix that.

The reason I used isize is that I then pass the resulting value to .offset to recover the address of the parent object. Using .offset is better than performing arithmetic on usize since it uses an LLVM GEP instruction, which helps with alias analysis. See the container_of! macro later in the same file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I could cast to isize and use ptr.offset(-offset), good point! My only other worry with your code (and the reason I used mem::zeroed) is that, technically reads of uninitialized memory are UB as well. There's a lang lawyer question about whether &undef.field is a read of undef, but i wouldn't risk it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mem::zeroed isn't much better than mem::uninitialized since many types (eg &T, Box<T>) are not allowed to have a value of zero. In any case this is irrelevant since the whole macro is optimized down to a constant by the compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mem::zeroed isn't much better than mem::uninitialized since many types (eg &T, Box) are not allowed to have a value of zero.

My (limited) understanding is that an invalid value (such as a null reference which would be invalid to dereference) is different in kind to a value obtained via mem::uninitialized which is undef as far as LLVM concerned. But this is over my head.

In any case this is irrelevant since the whole macro is optimized down to a constant by the compiler.

Agreed.

@alexcrichton
Copy link
Member

Seems quite nifty to me! I like how it mirrors Box and I've wanted this myself before in a few odd situations.

Let's see if this works...

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Oct 16, 2016

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@diwic
Copy link
Contributor

diwic commented Oct 16, 2016

Thumbs up from me - I already to this for Arc in RMBA and if this was in stdlib I would be less fragile w r t changes to Arc's internal layout.

@cristicbz
Copy link
Contributor Author

Thanks for the feedback @Amanieu & @alexcrichton! I moved the offset_of! code into a macro using ptr::offset instead of my usize addition. Also borrowed comments and the Deref-preventing pattern trick from @Amanieu's library (hope that's OK!).

I did keep my casts to usize to avoid overflow and kept mem::zeroed over mem::uninitialized to avoid UB potentially caused by a very pedantic understanding of 'reading undef'.

Didn't really know where to put these macros so I dropped them unexported in a liballoc/macros.rs module. Happy to move it anywhere else, of course!

pub unsafe fn from_raw(ptr: *const T) -> Self {
// To find the corresponding pointer to the `RcBox` we need to subtract the offset of the
// `value` field from the pointer.
Rc { ptr: Shared::new(ptr.offset(-offset_of_unsafe!(RcBox<T>, value)) as *mut _) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to cast the pointer to u8 before applying the offset. See my container_of! macro. You might want to add that macro to macros.rs as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh yes another reason i was doing usize arithmetic before!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(thanks for the heads up btw! :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Changed the name to unoffset_field_ptr since outside of intrusive containers container_of didn't seem that semantic for the operation.

@alexcrichton
Copy link
Member

This seems... overly complicated to me, but it's just an internal detail so seems fine.

@cristicbz
Copy link
Contributor Author

cristicbz commented Oct 18, 2016

@alexcrichton Yeah it did get a bit out of hand... Could simplify down to an offset_of macro + a (ptr as *const u8) .offset(-offset_of(RcBox<T>, data)) as *mut_ if you think that'd be a clearer.

@alexcrichton
Copy link
Member

Perhaps yeah, I find offset_of hard to follow here. I feel like in practice it's not UB to use null pointers and such because this is how I've always seen that macro defined in C ...

@Amanieu
Copy link
Member

Amanieu commented Oct 19, 2016

@alexcrichton A C version of the macro that dereferences a NULL pointer is still UB in C. This is why gcc defines offsetof as an alias for __builtin_offsetof, which avoids the UB.

@cristicbz
Copy link
Contributor Author

I pushed a simplified version which defines only the offset_of macro (as previously defined, using mem::zeroed and casts to usize) and inlines the (ptr as *const u8).offset(-offset_of!(RcBox<T>, value)) as *mut _) logic in {A,}Rc::from_raw implementations. I think this strikes a good balance, this was meant to be a very minimal change.

// Private macro to get the offset of a struct field in bytes from the address of the struct.
//
// This macro is identical to `offset_of!` but doesn't give a warning about unnecessary unsafe
// blocks when invoked from unsafe code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably fix this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, didn't do a last git add to that file, thanks!

@aturon
Copy link
Member

aturon commented Oct 22, 2016

@rfcbot fcp concern Footgun

I'm a little bit worried that this API is a footgun: the types and names strongly suggest that the only relevant raw representation is the T itself.

At the very least, it's a violation of the usual convention for raw, which is to give you the complete raw representation of an object. Perhaps into_inner_raw/from_inner_raw would be enough to signal what's happening?

@CryZe
Copy link
Contributor

CryZe commented Oct 22, 2016

I agree with the change to from_inner_raw. I recently wanted to create an "Arced slice", so I created a vector, turned it into a boxed slice, called Box::into_raw on that and would've called Arc::from_raw on it. Fortunately it didn't exist yet, cause that would've caused some bugs, that could be hard to find. Unfortunately this PR still doesn't provide any way to create an "Arced slice", but the additional indirection isn't too bad.

@SimonSapin
Copy link
Contributor

I agree with @aturon and @CryZe that types compatible with Box::{into,from}_raw but with different semantics (and memory allocator possibly relying on these semantics to avoid undefined behavior) is a footgun.

When I first thought of this feature, in my mind the into_raw methods returned *const ArcInner<T> and *const RcBox<T>. (Or perhaps *mut like Box::into_raw and Box::from_raw? Does it matter?)

ArcInner and RcBox would be made public, but with their fields kept private and no public API of their own. Probably one or both of them should be renamed so that their naming is consistent, but I don’t have an opinion on what name it should be.

@cristicbz
Copy link
Contributor Author

@SimonSapin If I understand your suggestion correctly, exposing ArcInner and RcBox ptrs without public fields would defeat the purpose of this change for my use case: roundtripping Rc as *T pointers through FFI (in particular this arose in an Rc<somelib_buffer_t> scenario where API methods operate on *const somelib_buffer_t and they allow a callback to free the buffer).

@aturon I can see the argument for the _inner_ variants and I'd be OK with it if that ends up being the consensus. Some counterarguments to consider:

  1. Footgun because it's incompatible with Box<T>. It's already the case that using any other pointer for Box::from_raw than one obtained using Box::into_raw is UB as the docs make clear, if nothing else because of allocator concerns. Something like Vec::from_raw_parts(Box::new(10).into_raw(), 1). Obviously the extra length parameter makes it less of a footgun, but my point is more that "Foo::from_raw is only compatible with Foo::into_raw" seems to be an already accepted rule.
  2. Foo::into_raw should return the entire representation of Foo. I'd argue that it does! Bear with me:
    • An Rc's representation is a single pointer: unlike the length in a slice's fat pointer, the ref count is not part of the Rc pointer, but is instead part of the allocation. I see the RcBox like jemalloc's allocation book-keeping information.
    • You could imagine an alternative implementation for Rc where a thread local HashMap<*const T, RefCounts> is used. This is what the semantics correspond to. The fact that the pointer->count mapping is performed via a pointer offset (i.e. member access) can be seen as an implementation detail.
    • This view also suggests that the footgun issue translates to the common "free-ing a pointer via a different allocator than the one it was malloc-d with".

All this said, I am sympathetic to the footgun argument. Just because we tell people not to pass an Box::into_raw result to an Rc::from_raw doesn't mean they won't (though the functions are unsafe so people should read the docs!).

I'd suggest into_rc_raw and from_rc_raw rather than inner to hammer home the fact that the methods are only compatible with Rc-s. Would that be acceptable?

@eternaleye
Copy link
Contributor

eternaleye commented Oct 22, 2016

@cristicbz

  • An Rc's representation is a single pointer: unlike the length in a slice's fat pointer, the ref count is not part of the Rc pointer, but is instead part of the allocation. I see the RcBox like jemalloc's allocation book-keeping information.

An Rc's representation is a single raw pointer to an RcBox<T>, not to a T. An RcBox<T> may not meet your needs (FFI), and it may be non-public, but that's not the same as it not being there - and unlike jemalloc's bookkeeping information, it really is what the pointer is to.

  • You could imagine an alternative implementation for Rc where a thread local HashMap<*const T, RefCounts> is used. This is what the semantics correspond to. The fact that the pointer->count mapping is performed via a pointer offset (i.e. member access) can be seen as an implementation detail.

You could imagine that, yes. However, that would have a different representation - a representation is part of the concrete implementation of a type, not the abstract behavior (and you're arguing that the latter being the same makes the former irrelevant).

@brson
Copy link
Contributor

brson commented Nov 3, 2016

This use case can be covered by a C API expecting *const Tand you have an Rc, right? That is, you could have a Rc pointer which you're passing ownership of one reference to into a C API. Later you're called to free that data, where you call from_raw, and it may or may not deallocate memory.

Yes. I was thinking you might as well just stash an Rc in safe code and hold ownership on behalf of the C code, but I understand wanting to give a share of the ownership to C.

That is, if you want to modify the reference count, you'd cast it back to a Rc and then back again once the operation is done.

If you have access to Rust. You can't use that to let C manage reference counts, which is more what I was thinking.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Nov 3, 2016

If you have access to Rust. You can't use that to let C manage reference counts, which is more what I was thinking.

That would require a #[no_mangle] extern "C" fn. That's pretty unprecedented in std. There's also the problem that it would either have to be generic (non-starter for being callable from C) or pick some fixed but wrong type for the Rc contents — I don't dare claim that this will be UB, but it will at least be extremely gross. Luckily, none of this is necessary, since any Rust crate can do the from_raw-clone-to_raw dance in its own #[no_mangle] extern "C" helper function if and when the C code it interacts with needs it, for whatever concrete types it deals with. (There always is some Rust code in the loop, otherwise the C code wouldn't be dealing with Rust's Rc type to begin with.)

@brson
Copy link
Contributor

brson commented Nov 3, 2016

Is *const really the correct type here? Arc's and Rc's are often filled with mutable things. And Shared, the type used to implement Rc, is for managing *mut (though weirdly actually contains a *const).

@brson
Copy link
Contributor

brson commented Nov 3, 2016

That would require a #[no_mangle] extern "C" fn

Or a pointer to the refcount.

My objections here are pretty squishy, that's its slightly different than other things, in naming and in the magicy implementation. But I understand the use case is real and we should offer some solution. I'd like an answer on *const vs *mut before accepting the fcp though.

@hanna-kruppe
Copy link
Contributor

Or a pointer to the refcount.

That may work for increasing the ref count (though I would rather not rely on random C code everywhere to correctly reproduce the atomic instruction necessary), but what about decreasing it? How do you invoke the destructor? (I just noticed that's also a problem for the "std provides a single #[no_mangle] extern "C" fn" approach but that solution was a straw man anyway.)

@alexcrichton
Copy link
Member

Yes. I was thinking you might as well just stash an Rc in safe code and hold ownership on behalf of the C code, but I understand wanting to give a share of the ownership to C.

At least with the use case I had in mio, it was perf sensitive, so stashing the pointer in safe code would have required extra storage requirements where passing ownership to C didn't require any extra data structures. In that sense I do think this'd be possible for many cases, but perhaps not all.

If you have access to Rust. You can't use that to let C manage reference counts, which is more what I was thinking.

Ah yeah that's true, but I'm not sure that C would ever expect to be able to do this? At least, I can't really think of a use case for that.

Is *const really the correct type here?

Hm yeah that's a good point, I might personally expect *mut


FWIW I slightly prefer the name into_raw as opposed to into_inner_raw like @brson, but I don't feel too strongly either way

@cristicbz
Copy link
Contributor Author

Hm yeah that's a good point, I might personally expect *mut

I changed this to *mut now, I'm not too fussed since mutability on raw ptr-s is a guideline at best :P.

My reasoning for *const T was that, mutating the value through the returned pointer breaks aliasing rules for any remaining Rc-s which expect the data they point to to be immutable. If a C function expects a non-const pointer that should be a warning sign that passing in the result of Rc::into_raw might not be correct.

FWIW I slightly prefer the name into_raw as opposed to into_inner_raw like @brson, but I don't feel too strongly either way

So do I, but I don't feel to strongly either---it was @aturon who requested it.

@aturon
Copy link
Member

aturon commented Nov 4, 2016

I'm willing to be overruled on the name. Let's just get this thing landed. :)

@alexcrichton
Copy link
Member

@brson, do you feel comfortable checking your box with an updated name?

@cristicbz cristicbz changed the title Add {into,from}_inner_raw to Rc and Arc Add {into,from}_raw to Rc and Arc Nov 5, 2016
@cristicbz
Copy link
Contributor Author

I changed this back to {into,from}_raw, let me know if there's anything else!

@cristicbz
Copy link
Contributor Author

cristicbz commented Nov 8, 2016

@aturon @alexcrichton @brson ping?

@alexcrichton
Copy link
Member

@cristicbz just waiting on @brson to check his box

@brson
Copy link
Contributor

brson commented Nov 8, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Nov 8, 2016

📌 Commit 651cf58 has been approved by brson

@cristicbz
Copy link
Contributor Author

Amazing, thanks for putting up with me and getting this merged!

@cristicbz
Copy link
Contributor Author

(also given the relative "controversy", I'd suggest relnotes may be appropriate to get another batch of input when TWiR comes around)

@bors
Copy link
Contributor

bors commented Nov 8, 2016

⌛ Testing commit 651cf58 with merge 0491a23...

bors added a commit that referenced this pull request Nov 8, 2016
Add `{into,from}_raw` to Rc and Arc

These methods convert to and from a `*const T` for `Rc` and `Arc` similar to the way they work on `Box`. The only slight complication is that `from_raw` needs to offset the pointer back to find the beginning of the `RcBox`/`ArcInner`.

I felt this is a fairly small addition, filling in a gap (when compared to `Box`) so it wouldn't need an RFC. The motivation is primarily for FFI.

(I'll create an issue and update a PR with the issue number if reviewers agree with the change in principle **Edit: done #37197**)

~~Edit: This was initially `{into,from}_raw` but concerns were raised about the possible footgun if mixed with the methods of the same name of `Box`.~~

Edit: This was went from `{into,from}_raw` to `{into,from}_inner_raw` then back to `{into,from}_raw` during review.
@aturon aturon added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 8, 2016
@aturon
Copy link
Member

aturon commented Nov 8, 2016

@cristicbz

Amazing, thanks for putting up with me and getting this merged!

No, thank you for sticking with it despite the lengthy process! Hope to see lots more to come.

@bors bors merged commit 651cf58 into rust-lang:master Nov 8, 2016
@cristicbz cristicbz deleted the rust-rc-into-raw branch November 15, 2016 22:24
@rkjnsn
Copy link
Contributor

rkjnsn commented Jan 9, 2017

I find @cristicbz's reasoning for initially going with *const T convincing, and am not sure how we ended up with *mut T. @brson says "Arc's and Rc's are often filled with mutable things", but as far I as can tell, this is only true when T has interior mutability of some sort. (The exception is get_mut, which is only usable if no other references exist.)

($container:path, $field:ident) => {{
// Make sure the field actually exists. This line ensures that a compile-time error is
// generated if $field is accessed through a Deref impl.
let $container { $field : _, .. };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could've used ref $field to avoid using field access below (came here from the tracking issue which I found from the docs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.