New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement `Arc`/`Rc` raw pointer conversions for `?Sized` #44073

Merged
merged 1 commit into from Sep 17, 2017

Conversation

Projects
None yet
@murarth
Contributor

murarth commented Aug 24, 2017

  • Add T: ?Sized bound to {Arc,Rc}::{from_raw,into_raw}
@rust-highfive

This comment has been minimized.

Show comment
Hide comment
@rust-highfive

rust-highfive Aug 24, 2017

Collaborator

r? @BurntSushi

(rust_highfive has picked a reviewer for you, use r? to override)

Collaborator

rust-highfive commented Aug 24, 2017

r? @BurntSushi

(rust_highfive has picked a reviewer for you, use r? to override)

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Aug 24, 2017

Contributor

I believe this change is totally backward compatible and it makes sense with the recently merged From impls to build Rc<str> and Rc<[T]> from &str and &[T]. Users will now be enabled to make more use of Arc/Rc<T> with unsized T and this change to from_raw/into_raw will help with that.

(It was actually a comment on another project referencing that PR which made me aware that from_raw/into_raw were implemented for T: Sized only.)

Contributor

murarth commented Aug 24, 2017

I believe this change is totally backward compatible and it makes sense with the recently merged From impls to build Rc<str> and Rc<[T]> from &str and &[T]. Users will now be enabled to make more use of Arc/Rc<T> with unsized T and this change to from_raw/into_raw will help with that.

(It was actually a comment on another project referencing that PR which made me aware that from_raw/into_raw were implemented for T: Sized only.)

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Aug 28, 2017

Member

cc @rust-lang/libs It seems like this is a good thing to have, but I need help reviewing the actual implementation. Could someone describe at a high level what's going on in the from_raw implementations? (Perhaps adding a corresponding comment in the code would be helpful.)

Member

BurntSushi commented Aug 28, 2017

cc @rust-lang/libs It seems like this is a good thing to have, but I need help reviewing the actual implementation. Could someone describe at a high level what's going on in the from_raw implementations? (Perhaps adding a corresponding comment in the code would be helpful.)

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Aug 28, 2017

Contributor

@BurntSushi: The implementation of from_raw works as follows:

First, the definition of struct RcBox<T> looks like this (and ArcInner<T> is basically the same):

struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

So, the implementation works like this:

  1. Casts the input *const T to create fake_ptr, a *mut RcBox<T>. This is done so that, for unsized types, we retain the fat pointer's non-data pointer. (This is most important for trait objects as the "vtable" pointer also contains data about alignment, which is required for the compiler to accurately find the offset of the value: T field in RcBox<T>.
  2. The "data" pointer of fake_ptr is set to a value equal to the type's alignment. This is a hack I've seen elsewhere. It really just needs to be a non-null value (dereferencing a null pointer is undefined behavior) that correctly aligns the RcBox<T> pointer in memory (unaligned memory access may also be undefined?).
    The code isn't actually dereferencing the pointer, of course. The "data" pointer is only altered to avoid the possibility of overflowing the pointer. Should the input pointer somehow be a value less than two usizes from usize::MAX, it would overflow in calculating the offset to value: T.
    The original implementation (for sized T) didn't run into this issue because the implementation of the private offset_of macro declared an RcBox<T> value on the stack to calculate offset. However, it's not possible to declare RcBox<T> for unsized T on the stack and, besides, the compiler needs the "vtable" pointer at runtime in order to accurately determine the offset.
  3. Just as the old implementation did, find the offset of value: T and apply this in reverse to get the true offset of *mut RcBox<T> from the input *const T. Because it uses fake_ptr to find the offset, the compiler will transparently use the original "vtable" pointer of a trait object to determine the true position of that field.

All of the weird trickery is used so that the same code that does the right thing for a trait object T will also work for a slice object or a Sized T.

Contributor

murarth commented Aug 28, 2017

@BurntSushi: The implementation of from_raw works as follows:

First, the definition of struct RcBox<T> looks like this (and ArcInner<T> is basically the same):

struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

So, the implementation works like this:

  1. Casts the input *const T to create fake_ptr, a *mut RcBox<T>. This is done so that, for unsized types, we retain the fat pointer's non-data pointer. (This is most important for trait objects as the "vtable" pointer also contains data about alignment, which is required for the compiler to accurately find the offset of the value: T field in RcBox<T>.
  2. The "data" pointer of fake_ptr is set to a value equal to the type's alignment. This is a hack I've seen elsewhere. It really just needs to be a non-null value (dereferencing a null pointer is undefined behavior) that correctly aligns the RcBox<T> pointer in memory (unaligned memory access may also be undefined?).
    The code isn't actually dereferencing the pointer, of course. The "data" pointer is only altered to avoid the possibility of overflowing the pointer. Should the input pointer somehow be a value less than two usizes from usize::MAX, it would overflow in calculating the offset to value: T.
    The original implementation (for sized T) didn't run into this issue because the implementation of the private offset_of macro declared an RcBox<T> value on the stack to calculate offset. However, it's not possible to declare RcBox<T> for unsized T on the stack and, besides, the compiler needs the "vtable" pointer at runtime in order to accurately determine the offset.
  3. Just as the old implementation did, find the offset of value: T and apply this in reverse to get the true offset of *mut RcBox<T> from the input *const T. Because it uses fake_ptr to find the offset, the compiler will transparently use the original "vtable" pointer of a trait object to determine the true position of that field.

All of the weird trickery is used so that the same code that does the right thing for a trait object T will also work for a slice object or a Sized T.

@cristicbz

This comment has been minimized.

Show comment
Hide comment
@cristicbz

cristicbz Aug 30, 2017

Contributor

This was discussed in #37197 . We didn't implement it for ?Sized types on purpose, because we couldn't figure out a way to do it without deref-ing an invalid pointer (which seems to be UB according to http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds).

I'm pretty convinced at this point that the correct way to do this is to have a built-in offsetof like C++ does. Though if you find a way of doing this without deref-ing an invalid pointer, I'd be super happy!

Contributor

cristicbz commented Aug 30, 2017

This was discussed in #37197 . We didn't implement it for ?Sized types on purpose, because we couldn't figure out a way to do it without deref-ing an invalid pointer (which seems to be UB according to http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds).

I'm pretty convinced at this point that the correct way to do this is to have a built-in offsetof like C++ does. Though if you find a way of doing this without deref-ing an invalid pointer, I'd be super happy!

@cristicbz

This comment has been minimized.

Show comment
Hide comment
@cristicbz

cristicbz Aug 30, 2017

Contributor

cc @Amanieu (they came up with a DST impl before as well, but I don't think anything fundamental has changed)

Contributor

cristicbz commented Aug 30, 2017

cc @Amanieu (they came up with a DST impl before as well, but I don't think anything fundamental has changed)

@Amanieu

This comment has been minimized.

Show comment
Hide comment
@Amanieu

Amanieu Aug 30, 2017

Contributor

Here's a link to my implementation for reference. It is still technically UB since it derefs an invalid pointer, but it works well enough in practice until we get proper support for offsetof.

Contributor

Amanieu commented Aug 30, 2017

Here's a link to my implementation for reference. It is still technically UB since it derefs an invalid pointer, but it works well enough in practice until we get proper support for offsetof.

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Aug 30, 2017

Contributor

Well, shoot. I didn't realize that getting an offset from an invalid pointer was technically a dereference and therefore undefined behavior.

So, it looks like a compiler-supported offset_of may be the only path forward. Is there anything holding that back from being implemented?

Contributor

murarth commented Aug 30, 2017

Well, shoot. I didn't realize that getting an offset from an invalid pointer was technically a dereference and therefore undefined behavior.

So, it looks like a compiler-supported offset_of may be the only path forward. Is there anything holding that back from being implemented?

@carols10cents

This comment has been minimized.

Show comment
Hide comment
@carols10cents

carols10cents Sep 11, 2017

Member

So, it looks like a compiler-supported offset_of may be the only path forward. Is there anything holding that back from being implemented?

@BurntSushi @cristicbz @rust-lang/libs any of yinz have an answer to @murarth's question? ^^

Member

carols10cents commented Sep 11, 2017

So, it looks like a compiler-supported offset_of may be the only path forward. Is there anything holding that back from being implemented?

@BurntSushi @cristicbz @rust-lang/libs any of yinz have an answer to @murarth's question? ^^

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Sep 11, 2017

Member

I'm out of depth on this one. I've assigned it to @alexcrichton

Member

BurntSushi commented Sep 11, 2017

I'm out of depth on this one. I've assigned it to @alexcrichton

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Sep 11, 2017

Member

@murarth ah yeah unfortunately I think that may still be holding this back. That being said though I don't think anything is blocking an implementation of offset_of as an intrinsic! If you're curious to implement that, I may be able to help out at least getting the intrinsic definition up and going :)

Member

alexcrichton commented Sep 11, 2017

@murarth ah yeah unfortunately I think that may still be holding this back. That being said though I don't think anything is blocking an implementation of offset_of as an intrinsic! If you're curious to implement that, I may be able to help out at least getting the intrinsic definition up and going :)

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 12, 2017

Contributor

@alexcrichton: I did run into a few issues trying to implement an offset_of intrinsic. The biggest issue I recall is that the intrinsic implementation requires pulling the literal string value (for the field name) out of its argument and fails if it receives anything else, including a parameter to a const fn. Unless I'm missing something, I think const generics may be required for a sane implementation.

Contributor

murarth commented Sep 12, 2017

@alexcrichton: I did run into a few issues trying to implement an offset_of intrinsic. The biggest issue I recall is that the intrinsic implementation requires pulling the literal string value (for the field name) out of its argument and fails if it receives anything else, including a parameter to a const fn. Unless I'm missing something, I think const generics may be required for a sane implementation.

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Sep 12, 2017

Member

That seems fine to me as an implementation detail for now, but an alternate route could be to use a custom macro like asm! with a custom AST node

Member

alexcrichton commented Sep 12, 2017

That seems fine to me as an implementation detail for now, but an alternate route could be to use a custom macro like asm! with a custom AST node

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 12, 2017

Contributor

If using a custom AST node is more flexible, that's probably a better idea than a weird intrinsic thing. Looking back at my attempted implementation, I realize the problem I ran into:

In order to have offset_of that works in any situation, there need to be two flavors: Classic offset_of, taking a type parameter and field name and yielding a compile-time constant; and what I had termed offset_of_val, which operates on trait objects (or a type containing a trait object, such as in this situation, Rc<T> with T: ?Sized).

Use of offset_of_val requires a *const Trait at runtime, using the size/alignment data in the vtable. This is where the intrinsic implementation strategy got tricky. offset_of_val needs to be evaluated at runtime, but it still needs the field name to be a compile-time constant. By this point in the intrinsic evaluation code, argument values have been reduced to LLVM ValueRef and trying to get a compile-time &str out of that seemed like a bad/impossible idea.

So, TL;DR: I'm not familiar with how AST expressions eventually become code, but I will look into it and see what I can do.

Contributor

murarth commented Sep 12, 2017

If using a custom AST node is more flexible, that's probably a better idea than a weird intrinsic thing. Looking back at my attempted implementation, I realize the problem I ran into:

In order to have offset_of that works in any situation, there need to be two flavors: Classic offset_of, taking a type parameter and field name and yielding a compile-time constant; and what I had termed offset_of_val, which operates on trait objects (or a type containing a trait object, such as in this situation, Rc<T> with T: ?Sized).

Use of offset_of_val requires a *const Trait at runtime, using the size/alignment data in the vtable. This is where the intrinsic implementation strategy got tricky. offset_of_val needs to be evaluated at runtime, but it still needs the field name to be a compile-time constant. By this point in the intrinsic evaluation code, argument values have been reduced to LLVM ValueRef and trying to get a compile-time &str out of that seemed like a bad/impossible idea.

So, TL;DR: I'm not familiar with how AST expressions eventually become code, but I will look into it and see what I can do.

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Sep 12, 2017

Member

Yeah I think using a custom AST node should help you solve that problem, for example the asm! macro expands to a custom AST variant that's not exposed syntactically in the language which then gets plumbed all the way through HAIR/HIR/MIR down to trans where all the various operands should be preserved to do w/e you need. The macro could then require a type argument and then an identifier for the field where the field is persisted all the way through to the end of compilation. (there could also be a variant that takes an expression perhaps?)

Another route would be to perhaps poke around with constant evaluation with an intrinsic to try to constant evaluate the field argument and that should be able to give you a string I think?

Member

alexcrichton commented Sep 12, 2017

Yeah I think using a custom AST node should help you solve that problem, for example the asm! macro expands to a custom AST variant that's not exposed syntactically in the language which then gets plumbed all the way through HAIR/HIR/MIR down to trans where all the various operands should be preserved to do w/e you need. The macro could then require a type argument and then an identifier for the field where the field is persisted all the way through to the end of compilation. (there could also be a variant that takes an expression perhaps?)

Another route would be to perhaps poke around with constant evaluation with an intrinsic to try to constant evaluate the field argument and that should be able to give you a string I think?

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 15, 2017

Contributor

@alexcrichton: I've run into an issue implementing offset_of as an asm!-like macro. When attempting to evaluate ExprKind::OffsetOf as a constant expression (here), a lifetime error arises, apparently from the use of a TyCtxt<'a, 'tcx, 'tcx> (note the double 'tcx) by LayoutCx.

error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
  --> src/librustc_mir/build/expr/as_constant.rs:35:15
   |
35 |             = expr;
   |               ^^^^
   |
note: first, the lifetime cannot outlive the lifetime 'tcx as defined on the impl at 22:1...
  --> src/librustc_mir/build/expr/as_constant.rs:22:1
   |
22 | / impl<'a, 'gcx, 'tcx> Builder<'a, 'gcx, 'tcx> {
23 | |     /// Compile `expr`, yielding a compile-time constant. Assumes that
24 | |     /// `expr` is a valid compile-time constant!
25 | |     pub fn as_constant<M>(&mut self, expr: M) -> Constant<'tcx>
...  |
70 | |     }
71 | | }
   | |_^
note: ...so that expression is assignable (expected hair::Expr<'_>, found hair::Expr<'tcx>)
  --> src/librustc_mir/build/expr/as_constant.rs:35:15
   |
35 |             = expr;
   |               ^^^^
note: but, the lifetime must be valid for the lifetime 'gcx as defined on the impl at 22:1...
  --> src/librustc_mir/build/expr/as_constant.rs:22:1
   |
22 | / impl<'a, 'gcx, 'tcx> Builder<'a, 'gcx, 'tcx> {
23 | |     /// Compile `expr`, yielding a compile-time constant. Assumes that
24 | |     /// `expr` is a valid compile-time constant!
25 | |     pub fn as_constant<M>(&mut self, expr: M) -> Constant<'tcx>
...  |
70 | |     }
71 | | }
   | |_^
note: ...so that types are compatible (expected &build::Builder<'_, '_, '_>, found &build::Builder<'a, 'gcx, 'tcx>)
  --> src/librustc_mir/build/expr/as_constant.rs:42:32
   |
42 |                 let hir = self.hir();
   |                                ^^^

Trying to resolve this by adding a 'gcx parameter to LayoutCx sets off a cascade of new 'gcx parameters, eventually running into another lifetime error that I'm having trouble resolving, in librustc/infer/mod.rs (here):

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
   --> src/librustc/infer/mod.rs:501:14
    |
501 |         self.infer_ctxt().enter(|infcx| {
    |              ^^^^^^^^^^
    |
note: first, the lifetime cannot outlive the lifetime 'a as defined on the impl at 473:1...
   --> src/librustc/infer/mod.rs:473:1
    |
473 | / impl<'a, 'gcx, 'tcx> TyCtxt<'a, 'gcx, 'tcx> {
474 | |     /// Currently, higher-ranked type bounds inhibit normalization. Therefore,
475 | |     /// each time we erase them in translation, we need to normalize
476 | |     /// the contents.
...   |
526 | |     }
527 | | }
    | |_^
note: ...so that types are compatible (expected ty::context::TyCtxt<'_, '_, '_>, found ty::context::TyCtxt<'a, 'gcx, 'tcx>)
   --> src/librustc/infer/mod.rs:501:14
    |
501 |         self.infer_ctxt().enter(|infcx| {
    |              ^^^^^^^^^^
note: but, the lifetime must be valid for the lifetime 'tcx as defined on the impl at 473:1...
   --> src/librustc/infer/mod.rs:473:1
    |
473 | / impl<'a, 'gcx, 'tcx> TyCtxt<'a, 'gcx, 'tcx> {
474 | |     /// Currently, higher-ranked type bounds inhibit normalization. Therefore,
475 | |     /// each time we erase them in translation, we need to normalize
476 | |     /// the contents.
...   |
526 | |     }
527 | | }
    | |_^
note: ...so that expression is assignable (expected &infer::InferCtxt<'_, '_, 'tcx>, found &infer::InferCtxt<'_, '_, '_>)
   --> src/librustc/infer/mod.rs:502:35
    |
502 |             value.trans_normalize(&infcx, param_env)
    |                                   ^^^^^^
Contributor

murarth commented Sep 15, 2017

@alexcrichton: I've run into an issue implementing offset_of as an asm!-like macro. When attempting to evaluate ExprKind::OffsetOf as a constant expression (here), a lifetime error arises, apparently from the use of a TyCtxt<'a, 'tcx, 'tcx> (note the double 'tcx) by LayoutCx.

error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
  --> src/librustc_mir/build/expr/as_constant.rs:35:15
   |
35 |             = expr;
   |               ^^^^
   |
note: first, the lifetime cannot outlive the lifetime 'tcx as defined on the impl at 22:1...
  --> src/librustc_mir/build/expr/as_constant.rs:22:1
   |
22 | / impl<'a, 'gcx, 'tcx> Builder<'a, 'gcx, 'tcx> {
23 | |     /// Compile `expr`, yielding a compile-time constant. Assumes that
24 | |     /// `expr` is a valid compile-time constant!
25 | |     pub fn as_constant<M>(&mut self, expr: M) -> Constant<'tcx>
...  |
70 | |     }
71 | | }
   | |_^
note: ...so that expression is assignable (expected hair::Expr<'_>, found hair::Expr<'tcx>)
  --> src/librustc_mir/build/expr/as_constant.rs:35:15
   |
35 |             = expr;
   |               ^^^^
note: but, the lifetime must be valid for the lifetime 'gcx as defined on the impl at 22:1...
  --> src/librustc_mir/build/expr/as_constant.rs:22:1
   |
22 | / impl<'a, 'gcx, 'tcx> Builder<'a, 'gcx, 'tcx> {
23 | |     /// Compile `expr`, yielding a compile-time constant. Assumes that
24 | |     /// `expr` is a valid compile-time constant!
25 | |     pub fn as_constant<M>(&mut self, expr: M) -> Constant<'tcx>
...  |
70 | |     }
71 | | }
   | |_^
note: ...so that types are compatible (expected &build::Builder<'_, '_, '_>, found &build::Builder<'a, 'gcx, 'tcx>)
  --> src/librustc_mir/build/expr/as_constant.rs:42:32
   |
42 |                 let hir = self.hir();
   |                                ^^^

Trying to resolve this by adding a 'gcx parameter to LayoutCx sets off a cascade of new 'gcx parameters, eventually running into another lifetime error that I'm having trouble resolving, in librustc/infer/mod.rs (here):

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
   --> src/librustc/infer/mod.rs:501:14
    |
501 |         self.infer_ctxt().enter(|infcx| {
    |              ^^^^^^^^^^
    |
note: first, the lifetime cannot outlive the lifetime 'a as defined on the impl at 473:1...
   --> src/librustc/infer/mod.rs:473:1
    |
473 | / impl<'a, 'gcx, 'tcx> TyCtxt<'a, 'gcx, 'tcx> {
474 | |     /// Currently, higher-ranked type bounds inhibit normalization. Therefore,
475 | |     /// each time we erase them in translation, we need to normalize
476 | |     /// the contents.
...   |
526 | |     }
527 | | }
    | |_^
note: ...so that types are compatible (expected ty::context::TyCtxt<'_, '_, '_>, found ty::context::TyCtxt<'a, 'gcx, 'tcx>)
   --> src/librustc/infer/mod.rs:501:14
    |
501 |         self.infer_ctxt().enter(|infcx| {
    |              ^^^^^^^^^^
note: but, the lifetime must be valid for the lifetime 'tcx as defined on the impl at 473:1...
   --> src/librustc/infer/mod.rs:473:1
    |
473 | / impl<'a, 'gcx, 'tcx> TyCtxt<'a, 'gcx, 'tcx> {
474 | |     /// Currently, higher-ranked type bounds inhibit normalization. Therefore,
475 | |     /// each time we erase them in translation, we need to normalize
476 | |     /// the contents.
...   |
526 | |     }
527 | | }
    | |_^
note: ...so that expression is assignable (expected &infer::InferCtxt<'_, '_, 'tcx>, found &infer::InferCtxt<'_, '_, '_>)
   --> src/librustc/infer/mod.rs:502:35
    |
502 |             value.trans_normalize(&infcx, param_env)
    |                                   ^^^^^^
@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Sep 16, 2017

Member

Ah unfortunately I may not be of much help there, but maybe @eddyb can help?

Member

alexcrichton commented Sep 16, 2017

Ah unfortunately I may not be of much help there, but maybe @eddyb can help?

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Sep 16, 2017

Member

You can use pointers inside an union instead of invalid ones, to compute offsetof.

Member

eddyb commented Sep 16, 2017

You can use pointers inside an union instead of invalid ones, to compute offsetof.

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 16, 2017

Contributor

@eddyb: I'm not sure what you're talking about. Can you give me an example? And would it work with unsized types, e.g. Rc<Trait>?

Contributor

murarth commented Sep 16, 2017

@eddyb: I'm not sure what you're talking about. Can you give me an example? And would it work with unsized types, e.g. Rc<Trait>?

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Sep 16, 2017

Member

@murarth So the second thing people usually go to when &(*(0 as *T)).field doesn't work is using an uninitialized value of type T on the stack. That's tricky, so one can do the same with unions:

union MaybeUninitialized<T> {
    value: T,
    dummy: ()
}
let uninit: MaybeUninitialized<RcBox<T>> = MaybeUninitialized { dummy: () }

I assume you had a solution for unsized T (for your example it'd be Trait I think?), with the invalid pointer strategy, right? I'd probably just use another union to overwrite the data pointer:

union FatThin<T: ?Sized> {
    fat: *mut T,
    thin: *mut ()
}
let mut combined = FatThin { fat: ptr as *mut RcBox<T> };
combined.thin = &uninit.value as *const _ as *const ();
let base_ptr = &uninit.value as *const _ as usize;
let data_ptr = &(*combined.fat).data as *const _ as usize;
let offset = data_ptr - base_ptr;

Oh, I see, writing this, uninit can't be unsized, and if starts out sized the resulting data pointer has to end up inside the allocation still. So while doable, this a bit of a stretch with unsized data.

OTOH, how were you going to pass the fat metadata to the compiler in an offset_of! macro?
If you weren't, then this union strategy should work just fine (assuming you use something like RcBox<()> followed by aligning the offset of .data up to align_of_val(&*ptr)).

Member

eddyb commented Sep 16, 2017

@murarth So the second thing people usually go to when &(*(0 as *T)).field doesn't work is using an uninitialized value of type T on the stack. That's tricky, so one can do the same with unions:

union MaybeUninitialized<T> {
    value: T,
    dummy: ()
}
let uninit: MaybeUninitialized<RcBox<T>> = MaybeUninitialized { dummy: () }

I assume you had a solution for unsized T (for your example it'd be Trait I think?), with the invalid pointer strategy, right? I'd probably just use another union to overwrite the data pointer:

union FatThin<T: ?Sized> {
    fat: *mut T,
    thin: *mut ()
}
let mut combined = FatThin { fat: ptr as *mut RcBox<T> };
combined.thin = &uninit.value as *const _ as *const ();
let base_ptr = &uninit.value as *const _ as usize;
let data_ptr = &(*combined.fat).data as *const _ as usize;
let offset = data_ptr - base_ptr;

Oh, I see, writing this, uninit can't be unsized, and if starts out sized the resulting data pointer has to end up inside the allocation still. So while doable, this a bit of a stretch with unsized data.

OTOH, how were you going to pass the fat metadata to the compiler in an offset_of! macro?
If you weren't, then this union strategy should work just fine (assuming you use something like RcBox<()> followed by aligning the offset of .data up to align_of_val(&*ptr)).

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 16, 2017

Contributor

offset_of! won't accept an expression. I intend to implement another macro, offset_of_val!, which will accept an expression as its first argument and yield a runtime-computed value. For now, I'm just trying to get the simpler offset_of! macro functionality working.

Manually calculating offset using align_of_val might work, but was discouraged as an implementation strategy in my From<&[T]> for Rc<[T]> PR.

Contributor

murarth commented Sep 16, 2017

offset_of! won't accept an expression. I intend to implement another macro, offset_of_val!, which will accept an expression as its first argument and yield a runtime-computed value. For now, I'm just trying to get the simpler offset_of! macro functionality working.

Manually calculating offset using align_of_val might work, but was discouraged as an implementation strategy in my From<&[T]> for Rc<[T]> PR.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Sep 16, 2017

Member

So you wanted to implemented a safe equivalent of getting the offset by subtracting the ptr argument from &(ptr as *mut RcBox<T>).data?
Also, why was that implementation strategy discouraged? It's not like anything supporting unsized types can actually have a non-deterministic layout, otherwise the unsizing wouldn't always work.

Member

eddyb commented Sep 16, 2017

So you wanted to implemented a safe equivalent of getting the offset by subtracting the ptr argument from &(ptr as *mut RcBox<T>).data?
Also, why was that implementation strategy discouraged? It's not like anything supporting unsized types can actually have a non-deterministic layout, otherwise the unsizing wouldn't always work.

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 16, 2017

Contributor

If it's safe to assume now and into the foreseeable future that data will always be at size_of<usize>*2 or align_of_val(&*ptr) (whichever's greater), then I'm fine using that to calculate the position of *mut RcBox<T> and avoid the invalid pointer dereference (and offset_of!) altogether.

@alexcrichton: Would that be acceptable for the implementation of this PR?

Contributor

murarth commented Sep 16, 2017

If it's safe to assume now and into the foreseeable future that data will always be at size_of<usize>*2 or align_of_val(&*ptr) (whichever's greater), then I'm fine using that to calculate the position of *mut RcBox<T> and avoid the invalid pointer dereference (and offset_of!) altogether.

@alexcrichton: Would that be acceptable for the implementation of this PR?

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Sep 16, 2017

Member

What I meant is more like aligning size_of::<usize>() * 2 to align_of_val(&*ptr), although since the former happens to always be a power of two, you're right this is the maximum of the two.

Also you could use size_of::<RcBox<()>>() instead of assuming its contents (for the unaligned size), or even getting the offset of data from RcBox { ..., data: () } if it's not too ugly.

Member

eddyb commented Sep 16, 2017

What I meant is more like aligning size_of::<usize>() * 2 to align_of_val(&*ptr), although since the former happens to always be a power of two, you're right this is the maximum of the two.

Also you could use size_of::<RcBox<()>>() instead of assuming its contents (for the unaligned size), or even getting the offset of data from RcBox { ..., data: () } if it's not too ugly.

@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 16, 2017

Contributor

Yeah, the power of two thing is what I was basing that on, but using size_of::<RcBox<()>> and a generic alignment function works, too.

Contributor

murarth commented Sep 16, 2017

Yeah, the power of two thing is what I was basing that on, but using size_of::<RcBox<()>> and a generic alignment function works, too.

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Sep 16, 2017

Member

This all sounds reasonable to me, thanks for the tips @eddyb!

Member

alexcrichton commented Sep 16, 2017

This all sounds reasonable to me, thanks for the tips @eddyb!

Implement `Arc`/`Rc` raw pointer conversions for `?Sized`
* Add `T: ?Sized` bound to {`Arc`,`Rc`}::{`from_raw`,`into_raw`}
@murarth

This comment has been minimized.

Show comment
Hide comment
@murarth

murarth Sep 17, 2017

Contributor

@alexcrichton: Tests passed on the new implementation.

Contributor

murarth commented Sep 17, 2017

@alexcrichton: Tests passed on the new implementation.

@alexcrichton

This comment has been minimized.

Show comment
Hide comment
@alexcrichton

alexcrichton Sep 17, 2017

Member

@bors: r+

A great trick!

Member

alexcrichton commented Sep 17, 2017

@bors: r+

A great trick!

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Sep 17, 2017

Contributor

📌 Commit 1cbb2b3 has been approved by alexcrichton

Contributor

bors commented Sep 17, 2017

📌 Commit 1cbb2b3 has been approved by alexcrichton

TimNN added a commit to TimNN/rust that referenced this pull request Sep 17, 2017

Rollup merge of rust-lang#44073 - murarth:rc-into-raw-unsized, r=alex…
…crichton

Implement `Arc`/`Rc` raw pointer conversions for `?Sized`

* Add `T: ?Sized` bound to {`Arc`,`Rc`}::{`from_raw`,`into_raw`}

@TimNN TimNN referenced this pull request Sep 17, 2017

Closed

Rollup of 19 pull requests #44649

bors added a commit that referenced this pull request Sep 17, 2017

Auto merge of #44649 - TimNN:rollup, r=TimNN
Rollup of 19 pull requests

- Successful merges: #44073, #44088, #44381, #44397, #44509, #44515, #44533, #44549, #44553, #44562, #44567, #44573, #44595, #44604, #44617, #44622, #44630, #44639, #44647
- Failed merges:
@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Sep 17, 2017

Contributor

⌛️ Testing commit 1cbb2b3 with merge 1d2511e...

Contributor

bors commented Sep 17, 2017

⌛️ Testing commit 1cbb2b3 with merge 1d2511e...

bors added a commit that referenced this pull request Sep 17, 2017

Auto merge of #44073 - murarth:rc-into-raw-unsized, r=alexcrichton
Implement `Arc`/`Rc` raw pointer conversions for `?Sized`

* Add `T: ?Sized` bound to {`Arc`,`Rc`}::{`from_raw`,`into_raw`}

TimNN added a commit to TimNN/rust that referenced this pull request Sep 17, 2017

Rollup merge of rust-lang#44073 - murarth:rc-into-raw-unsized, r=alex…
…crichton

Implement `Arc`/`Rc` raw pointer conversions for `?Sized`

* Add `T: ?Sized` bound to {`Arc`,`Rc`}::{`from_raw`,`into_raw`}

@TimNN TimNN referenced this pull request Sep 17, 2017

Closed

Rollup of 18 pull requests #44652

@TimNN

This comment has been minimized.

Show comment
Hide comment
@TimNN

TimNN Sep 17, 2017

Contributor

@bors retry

  • prioritising rollup
Contributor

TimNN commented Sep 17, 2017

@bors retry

  • prioritising rollup

bors added a commit that referenced this pull request Sep 17, 2017

Auto merge of #44652 - TimNN:rollup, r=TimNN
Rollup of 18 pull requests

- Successful merges: #44073, #44088, #44381, #44397, #44509, #44515, #44533, #44549, #44553, #44562, #44567, #44595, #44604, #44617, #44622, #44630, #44639, #44647
- Failed merges:
@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Sep 17, 2017

Contributor

⌛️ Testing commit 1cbb2b3 with merge 278c86f...

Contributor

bors commented Sep 17, 2017

⌛️ Testing commit 1cbb2b3 with merge 278c86f...

bors added a commit that referenced this pull request Sep 17, 2017

Auto merge of #44073 - murarth:rc-into-raw-unsized, r=alexcrichton
Implement `Arc`/`Rc` raw pointer conversions for `?Sized`

* Add `T: ?Sized` bound to {`Arc`,`Rc`}::{`from_raw`,`into_raw`}

TimNN added a commit to TimNN/rust that referenced this pull request Sep 17, 2017

Rollup merge of rust-lang#44073 - murarth:rc-into-raw-unsized, r=alex…
…crichton

Implement `Arc`/`Rc` raw pointer conversions for `?Sized`

* Add `T: ?Sized` bound to {`Arc`,`Rc`}::{`from_raw`,`into_raw`}

@TimNN TimNN referenced this pull request Sep 17, 2017

Merged

Rollup of 17 pull requests #44654

@TimNN

This comment has been minimized.

Show comment
Hide comment
@TimNN

TimNN Sep 17, 2017

Contributor

@bors retry

  • prioritising rollup
Contributor

TimNN commented Sep 17, 2017

@bors retry

  • prioritising rollup

bors added a commit that referenced this pull request Sep 17, 2017

Auto merge of #44654 - TimNN:rollup, r=TimNN
Rollup of 17 pull requests

- Successful merges: #44073, #44088, #44381, #44397, #44509, #44533, #44549, #44553, #44562, #44567, #44595, #44604, #44617, #44622, #44630, #44639, #44647
- Failed merges:
@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Sep 17, 2017

Contributor

⌛️ Testing commit 1cbb2b3 with merge 1cdd689...

Contributor

bors commented Sep 17, 2017

⌛️ Testing commit 1cbb2b3 with merge 1cdd689...

@bors bors merged commit 1cbb2b3 into rust-lang:master Sep 17, 2017

1 of 2 checks passed

homu Testing commit 1cbb2b3a8827a2c7d37c022da1e706229b0e23af with merge 1cdd68922d143c6d1f18f66572251b7078e9e850...
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@murarth murarth deleted the murarth:rc-into-raw-unsized branch Sep 17, 2017

@cristicbz

This comment has been minimized.

Show comment
Hide comment
@cristicbz

cristicbz Sep 17, 2017

Contributor

@eddyb @murarth

+        // Align the unsized value to the end of the ArcInner.
+        // Because it is ?Sized, it will always be the last field in memory.
+        let align = align_of_val(&*ptr);
+        let layout = Layout::new::<ArcInner<()>>();
+        let offset = (layout.size() + layout.padding_needed_for(align)) as isize;

Hmmm, is this actually correct if T is not unsized and layout randomization/optimization is turned on? I get why it's the last field when actually unsized, but if T is sized, after monomorphisation, is it not possible that data ends up somewhere else?

Contributor

cristicbz commented Sep 17, 2017

@eddyb @murarth

+        // Align the unsized value to the end of the ArcInner.
+        // Because it is ?Sized, it will always be the last field in memory.
+        let align = align_of_val(&*ptr);
+        let layout = Layout::new::<ArcInner<()>>();
+        let offset = (layout.size() + layout.padding_needed_for(align)) as isize;

Hmmm, is this actually correct if T is not unsized and layout randomization/optimization is turned on? I get why it's the last field when actually unsized, but if T is sized, after monomorphisation, is it not possible that data ends up somewhere else?

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Sep 17, 2017

Member

@cristicbz a pointer to Struct<T> can be unsized to a pointer to Struct<Trait> iff T: Trait, at which point the data can't be reordered, so all sized layouts have to be compatible with unsized ones.

Member

eddyb commented Sep 17, 2017

@cristicbz a pointer to Struct<T> can be unsized to a pointer to Struct<Trait> iff T: Trait, at which point the data can't be reordered, so all sized layouts have to be compatible with unsized ones.

@cristicbz

This comment has been minimized.

Show comment
Hide comment
@cristicbz

cristicbz Sep 17, 2017

Contributor

That's interesting... thanks for the explanation. I guess I assumed the vtable somehow implicitly expressed the offset to the unsized field, but I'd never given it much thought.

Contributor

cristicbz commented Sep 17, 2017

That's interesting... thanks for the explanation. I guess I assumed the vtable somehow implicitly expressed the offset to the unsized field, but I'd never given it much thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment