-
Notifications
You must be signed in to change notification settings - Fork 347
Makes the reference types sound via a DST #1532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The current implementation of `ArrayRef` and its cousins has them as sized types, which turns out to be a critical and unsound mistake. This commit is large, but its heart is small: change the `ArrayRef` implementation to be unsized by appending empty slices to the end of them.
It's easier inside the library to have ArrayParts be public and have LayoutRef be a type alias. Outside of the library, though, having the ArrayParts be a public interface exposes implementation details. If we ever wanted to switch to a trait object or another approach for DSTs, having ArrayParts be public would make that harder.
The doc tests now contain a check that ensures that mem::swap can't create unsound behavior, so it's critical we continue to monitor that going forward.
/// A possibly-unsized container for array parts. | ||
/// | ||
/// This type only exists to enable holding the array parts in a single | ||
/// type, which needs to be sized inside of `ArrayBase` and unsized inside | ||
/// of the reference types. | ||
#[derive(Debug)] | ||
struct ArrayParts<A, D, T: ?Sized> | ||
{ | ||
/// A non-null pointer into the buffer held by `data`; may point anywhere | ||
/// in its range. If `S: Data`, this pointer must be aligned. | ||
ptr: NonNull<A>, | ||
/// The lengths of the axes. | ||
dim: D, | ||
/// The element count stride per axis. To be parsed as `isize`. | ||
strides: D, | ||
_dst_control: T, | ||
} | ||
|
||
type ArrayPartsSized<A, D> = ArrayParts<A, D, [usize; 0]>; | ||
type ArrayPartsUnsized<A, D> = ArrayParts<A, D, [usize]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first major review point: defining a possibly-sized type to hold the array parts.
Thanks to the use of an explicitly 0-length array, the ArrayPartsSized
type is no larger than just the ptr
, dim
, and strides
packaged together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would recommend factoring out the ptr, dim, strides to a separate struct, as there is no guarantee that ArrayParts<_, _, T>
and ArrayParts<_, _, U>
have the same field order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah excellent callout, I'll do that shortly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this more, and realized don't require the Sized
and Unsized
versions of array types to be transmutable, we just rely on Unsized Coercions. The only memory-layout guarantee we rely on is on the #[repr(transparent)]
of the reference types.
I do think that this would be a cleaner internal API; however, it increases the verbosity of a good number of internal implementations. Since these are private types, I'm going to defer this change until there's a need to decouple the parts from the sizing control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rustc has our back here based on that T: ?Sized
forces T
to be the last field, which is used as a prereq for unsized coercion.
#[derive(Debug)] | ||
pub struct LayoutRef<A, D> | ||
#[repr(transparent)] | ||
pub struct LayoutRef<A, D>(ArrayPartsUnsized<A, D>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the second major review point: changing LayoutRef
(which is the "lowest" type in the reference chain) to be a DST. We do this by having it wrap the "unsized" array parts.
The #[repr(transparent)]
gives us the guarantee that the LayoutRef
will have the same layout as ArrayPartsUnsized
, which will be critical for the third major review point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no issues here from what i see
unsafe { &*(&self.layout as *const LayoutRef<S::Elem, D>).cast::<ArrayRef<S::Elem, D>>() } | ||
let parts: &ArrayPartsUnsized<S::Elem, D> = &self.parts; | ||
let ptr = (parts as *const ArrayPartsUnsized<S::Elem, D>) as *const ArrayRef<S::Elem, D>; | ||
unsafe { &*ptr } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The third, and final, major review point. This particular set of lines is the dereference from ArrayBase
to ArrayRef
; but the core idea is repeated many times across this file for all of the Deref
, DerefMut
, AsRef
, and AsMut
implementations.
Our first line is now an unsizing coercion - going from Sized
to a DST. The next line converts a reference to a pointer, then converts a pointer to ArrayPartsUnsized
to ArrayRef
. Since ArrayRef
is transparent to LayoutRef
which is transparent to ArrayPartsUnsized
, this is valid. Finally, we reborrow the pointer.
/// Tests that a mem::swap can't compile by putting it into a doctest | ||
/// | ||
/// ```compile_fail | ||
/// let mut x = Array1::from_vec(vec![0, 1, 2]); | ||
/// { | ||
/// let mut y = Array1::from_vec(vec![4, 5, 6]); | ||
/// let x_ref = x.as_layout_ref_mut(); | ||
/// let y_ref = y.as_layout_ref_mut(); | ||
/// core::mem::swap(x_ref, y_ref); | ||
/// } | ||
/// ``` | ||
#[allow(dead_code)] | ||
fn test_no_swap_via_doctests() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And one bonus section: a doctest to assert that the original use-after-free scenario no longer compiles.
/// use ndarray::{LayoutRef2, array}; | ||
/// | ||
/// fn aspect_ratio<T, A>(layout: &T) -> (usize, usize) | ||
/// where T: AsRef<LayoutRef2<A>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we could add + ?Sized
too maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely - I'll add + ?Sized
across the board in another documentation PR. I'm also going to write up an explainer for the release notes about how to write functions for LayoutRef
and RawRef
.
Nice work! What I can see this is all good, the changes are also well explained. I fear I can't do a full review justice, because I've paged most of this out of memory (or not been present for earlier layout changes). Getting feedback from other unsized type experts is always useful. |
looks good to me. im not fully sure about how the unsizing coercion part interacts with field reordering, but the rest looks good |
Thank you both! As best I can tell, the unsized coercion is independent from the field ordering for our application - we never transmute from the sized to the unsized versions. I'll add a second PR with additional documentation changes, and then make the necessary changes to introduce 0.17.1. |
The current implementation of
ArrayRef
and its cousins has them as sized types, which turns out to be a critical and unsound mistake. This PR is large, but its heart is small: change theArrayRef
implementation to be unsized.The approach this PR takes is to package the core array "metadata" - the pointer, dim, and strides - into a struct that can either be sized or unsized. This is done by appending a generic "
_dst_control
" field. For the "sized" version of the metadata, that field is a 0-length array. For the "unsized" version of the metadata, that sized field is a struct. This core type is private, so users cannot construct any other variants other than these two.We then put the sized version into the
ArrayBase
types, put the unsized version into the reference types, and perform an unsizing coercion to convert from one to the other. Because Rust has no (safe, supported) "resizing" coercion, this switch is irreversible. Sized types cannot be recovered from the unsized reference types.Although this PR contains considerable code changes, most are renames and minor changes from altering the internal structure of the array types. To make this PR easier to review, I'll give a list of major changes in the order I'd review them. Then, in comments below, I will highlight those changes in the same order:
ArrayParts
definitionLayoutRef
a newtypeCloses #1531.
An alternative method would be to use a trait object to create the DST. I'd originally used this approach, but was worried that embedding a vtable into the heart of the library's type could harm performance. One way around this would be to cast the trait object's fat pointer to a thin pointer, discarding the vtable. However, I could not find any guarantees in the Rust type layout documentation nor in the Rustonomicon on DSTs about the layout of DSTs. Relying on behavior that not even the Rustonomicon will guarantee gives me the heebie-jeebies, and the current method appears equally viable.