- Feature Name: from_bits
- Start Date: 2018-05-23
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Add the FromBits<T>
unsafe marker trait. U: FromBits<T>
indicates that they bytes any valid T
can be safely interpreted as a U
. Add the SizeLeq<T>
and AlignLeq<T>
unsafe marker traits. U: SizeLeq<T>
and U: AlignLeq<T>
indicate that U
is less than or equal to T
in size and alignment respectively. Add library- or compiler-supported custom derives for these traits, and various supporting library functions to make them more useful.
Various standard library types have from_bits
functions which construct a type from raw bits. However, none of these functions are generic. In domains such as SIMD, in which there are many conversions that are possible, and a given program will want to use many of them in practice, it is desirable to be able to write code which is generic on two types whose bits can be safely interchanged.
In the domain of parsing, a common task is to take an untrusted sequence of bytes and attempt to parse those bytes into an in-memory representation without violating memory safety. Today, that requires either explicit runtime checking and copying or unsafe
code and sound reasoning about the exact semantics of Rust's (undefined) memory model. The former is undesirable because it is slow, and there is a lot of boilerplate code, while the latter is undesirable because it introduces a high risk of memory unsafety. However, a type that implements FromBits<[u8]>
can safely be deserialized from any sufficiently-long untrusted byte sequence without any runtime overhead or risk of memory unsafety. In many cases, this enables the holy grail of "zero-copy" parsing.
Why are we doing this? What use cases does it support? What is the expected outcome?
A new unsafe marker trait, FromBits<T>
is introduced. If U: FromBits<T>
, then the bytes of any T
of sufficient length can be safely reinterpreted as a U
. A new custom derive is also introduced. Since FromBits
takes a type parameter T
, the custom derive also needs an argument indicating what values of T
FromBits
should be derived for. For example, the following asks the custom derive to emit unsafe impl FromBits<[u8]> for MyStruct {}
or cause a compile error if it would be unsafe.
#[derive(FromBits)]
#[from_bits_derive(T = [u8])]
#[repr(C)]
struct MyStruct {
a: u8,
b: u16,
c: u32,
}
Two new unsafe marker traits, SizeLeq<T> where T: Sized
and AlignLeq<T>
are introduced. If U: SizeLeq<T>
, then U
's size is less than or equal to T
's size. If U: AlignLeq<T>
, then U
's alignment requirements are less than or equal to T
's, implying that any address which satisfies T
's alignment also satisfies U
's.
Various library functions are introduced which add functionality for types implementing these traits. These include coerce<T, U>
, which is a safe variant of transmute
where U: FromBits<T>
, and coerce_ref
and coerce_mut
, which coerce references rather than values.
Explain the proposal as if it was already included in the language and you were teaching it to another Rust programmer. That generally means:
- Introducing new named concepts.
- Explaining the feature largely in terms of examples.
- Explaining how Rust programmers should think about the feature, and how it should impact the way they use Rust. It should explain the impact as concretely as possible.
- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
- If applicable, describe the differences between teaching this to existing Rust programmers and new Rust programmers.
For implementation-oriented RFCs (e.g. for compiler internals), this section should focus on how compiler contributors should think about the change, and give examples of its concrete impact. For policy RFCs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms.
A new unsafe auto marker trait, FromBits<T>
is introduced. If U: FromBits<T>
, then:
- If
T
andU
are bothSized
, thenstd::mem::transmute::<T, U>(t)
does not cause UB. - If
t: &T
andsize_of_val(t) >= size_of::<U>()
, thenstd::mem::transmute::<&T, &U>(t)
does not cause UB. - If
t: &mut T
andsize_of_val(t) == size_of::<U>()
andT: FromBits<U>
, thenstd::mem::transmute<&mut T, &mut U>(t)
does not cause UB.
In order for U: FromBits<T>
, the following must hold:
- If
T
andU
are bothSized
, thensize_of::<T>() >= size_of::<U>()
. - For any valid
t: T
such thatsize_of_val(t) >= size_of::<U>()
, the firstsize_of::<U>()
bytes oft
are a valid instance ofU
.
These requirements have the following corrolaries:
- If it is possible to construct a value of
T
which is as large asU
and which has uninitialized bytes (for example, from struct field padding) at byte offsets which correspond to actual values inU
, thenU: !FromBits<T>
. This is because accessing uninitialized memory is undefined behavior. For example, the followingFromBits<T>
implementation is unsound:
#[repr(C)]
struct T {
a: u8,
// padding byte
c: u16,
}
struct U {
a: u8,
b: u8, // at the same byte offset as the padding in T
c: u16,
}
unsafe impl FromBits<T> for U {} // invokes UB
FromBits<T>
is an auto trait, and is automatically implemented for any pair of types for which it's sound. Note that this precludes types with private fields - if a type has private fields, then implementing FromBits
for it is unsound because the implementation might use unsafe code and rely on internal invariants that a FromBits
implementation could violate. There is also a custom derive that allows a the author of a type with private fields to opt into FromBits
. Here is an example of its usage:
#[derive(FromBits)]
#[derive_from_bits(T = Bar)] // derive 'unsafe impl FromBits<Bar> for Foo {}'
#[repr(C)]
struct Foo(...);
Note that because, without a repr
, the memory layout of structs and enums are undefined, structs and enums without repr
are never automatically included in FromBits
impls, you can never derive FromBits
for them, and they can never appear as the type parameter to FromBits
.
Two new unsafe marker traits, SizeLeq<T> where T: Sized
and AlignLeq<T>
are introduced. If U: SizeLeq<T>
, then T
and U
are both Sized
, and size_of::<U>() <= size_of::<T>()
. If U: AlignLeq<T>
, then U
's minimum alignment requirement is less than or equal to T
's, and so any address which satisfies T
's alignment also satisfies U
's.
These are both auto traits, and are automatically implemented for any pair of types for which the property holds. For struct and enum types, this requires a repr
as with FromBits
. Struct field privacy does not affect the auto trait implementation, so you cannot derive
SizeLeq
or AlignLeq
.
The following safe library functions are added. They are all guaranteed to never panic.
coerce<T, U>(x: T) -> U where U: FromBits<T>
- interpret the firstsize_of::<U>()
bytes ofx
asU
and forgetx
coerce_ref<T, U>(x: &T) -> &U where U: FromBits<T> + AlignLeq<T>
coerce_ref_size_checked<T, U>(x: &T) -> Option<&U> where T: ?Sized, U: FromBits<T> + AlignLeq<T>
- likecoerce_ref
, butx
's size is checked at runtime, andNone
is returned ifsize_of_val(x) < size_of::<U>()
coerce_ref_align_checked<T, U>(x: &T) -> Option<&U> where U: FromBits<T>
- likecoerce_ref
, butx
's alignment is checked at runtime, andNone
is returned if it does not satisfyU
's alignment requirementscoerce_ref_size_align_checked<T, U>(x: &T) -> Option<&U> where T: ?Sized, U: FromBits<T>
- likecoerce_ref
, butx
's size and alignment are both checked at runtime, andNone
is returned ifx
is insufficient in either respectcoerce_mut<T, U>(x: &mut T) -> &mut U where T: FromBits<U>, U: FromBits<T> + AlignLeq<T>
- note that this differs fromcoerce_ref
in also requiring thatT: FromBits<U>
, which is necessary because the caller might write new values ofU
to the returned referencecoerce_mut_size_checked<T, U>(x: &mut T) -> Option<&mut U> where T: FromBits<U> + ?Sized, U: FromBits<T> + AlignLeq<T>
- unlikecoerce_ref_size_checked
, returnsNone
ifsize_of_val(x) != size_of::<U>()
coerce_mut_align_checked<T, U>(x: &mut T) -> Option<&mut U> where T: FromBits<U>, U: FromBits<T>
coerce_mut_size_align_checked<T, U>(x: &mut T) -> Option<&mut U> where T: FromBits<U> + ?Sized, U: FromBits<T>
- unlikecoerce_ref_size_align_checked
, returnsNone
ifsize_of_val(x) != size_of::<U>()
The following unsafe library functions are added. They are all equivalent to their checked counterparts, except that it is the caller's responsibility to ensure the unchecked property, and if the property does not hold, it may cause UB.
coerce_ref_size_unchecked<T, U>(x: &T) -> &U where T: ?Sized, U: FromBits<T> + AlignLeq<T>
coerce_ref_align_unchecked<T, U>(x: &T) -> &U where U: FromBits<T>
coerce_ref_size_align_unchecked<T, U>(x: &T) -> &U where T: ?Sized, U: FromBits<T>
coerce_mut_size_unchecked<T, U>(x: &mut T) -> &mut U where T: FromBits<U> + ?Sized, U: FromBits<T> + AlignLeq<T>
coerce_mut_align_unchecked<T, U>(x: &mut T) -> &mut U where T: FromBits<U>, U: FromBits<T>
coerce_mut_size_align_unchecked<T, U>(x: &mut T) -> &mut U where T: FromBits<U> + ?Sized, U: FromBits<T>
Note that, for all functions where T: Sized
, the U: FromBits<T>
bound implies that T
is at least as large as U
and, if a T: FromBits<U>
bound is also present, that T
and U
have equal size. The T: FromBits<U>
bound is used on _mut
functions since, without it, there's no guarantee that all valid instances of U
are valid instances of T
, and so code like *coerce_mut(&mut my_t) = U::constructor()
would not be sound.
This is the technical portion of the RFC. Explain the design in sufficient detail that:
- Its interaction with other features is clear.
- It is reasonably clear how the feature would be implemented.
- Corner cases are dissected by example.
The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.
Why should we not do this?
One of the significant benefits of this proposal is that it allows for zero-copy parsing. Consider, for example, the following simplified API for zero-copy parsing, a fuller version of which is implemented in a production system here:
pub struct LayoutVerified<'a, T, U>(&'a T, PhantomData<U>);
impl<'a, T, U> LayoutVerified<'a, T, U> {
pub fn new(x: &'a T) -> Option<LayoutVerified<'a, T, U>> { ... }
}
impl<'a, T, U> LayoutVerified<'a, T, U> where T: Sized, U: SizeLeq<T> {
pub fn new_sized(x: &'a T) -> Option<LayoutVerified<'a, T, U>> { ... }
}
impl<'a, T, U> LayoutVerified<'a, T, U> where U: AlignLeq<T> {
pub fn new_aligned(x: &'a T) -> Option<LayoutVerified<'a, T, U>> { ... }
}
impl<'a, T, U> LayoutVerified<'a, T, U> where T: Sized, U: SizeLeq<T> + AlignLeq {
pub fn new_sized_aligned(x: &'a T) -> LayoutVerified<'a, T, U> { ... }
}
impl<'a, T, U> Deref for LayoutVerified<'a, T, U> where U: FromBits<T> {
type Target = U;
fn deref(&self) -> &U { ... }
}
The idea is that LayoutVerified
carries a &T
which is guaranteed to be safe to convert to a &U
using the implementation of Deref
. Even though the Deref
impl itself only requires that U: FromBits<T>
, every LayoutVerified
constructor ensures that both size and alignment are valid before producing a LayoutVerified
. Thus, ownership of a LayoutVerified
is proof that the appropriate checks have been performed, whether at runtime or via the type system at compile time, and so Deref
can be implemented soundly (though unsafe code is required in Deref::deref
). Note that, since U: FromBits<T>
for T: Sized, U: Sized
implies U: SizeLeq<T>
, we don't use SizeLeq
in any of the library functions proposed above. However, we see its utility in this example: It allows us to write the new_sized
and new_sized_aligned
constructors. Without this, a LayoutVerified
-style type that used static compile-time size verification would need to be a distinct type.
Using LayoutVerified
, we can write zero-copy parsing code like the following, which is taken from a production networking stack, and requires no unsafe code (note that, in the real version, the derives are not implemented, and so unsafe code is required for implementing FromBits
and AlignLeq
):
#[derive(FromBits, AlignLeq)
#[derive_from_bits(T = [u8])]
#[derive_align_leq(T = [u8])]
#[repr(C)]
struct Header { ... }
pub struct UdpPacket<'a> {
header: LayoutVerified<'a, [u8], Header>,
body: &'a [u8],
}
impl<'a> UdpPacket<'a> {
pub fn parse(bytes: &'a [u8]) -> Result<UdpPacket<'a>, ()> {
let header = LayoutVerified::new_aligned(bytes).ok_or(())?;
...
}
}
-
Use a trait like
Pod
which simply guarantees that any byte sequence of lengthsize_of::<T>()
is a validT
. This is less general and, crucially, does not support the SIMD use case. -
Omit
SizeLeq
trait. We could still have all of the library functions proposed here. However, we would loose the ability to decouple the verification and utilization steps as described in theLayoutVerified
example, which would be a significant loss. -
Omit
AlignLeq
trait. We could still coerce by value, and we could still coerce by reference so long as alignment were checked at runtime. However, we could not express infallible reference coercions, which would significantly reduce the utility of this system for zero-copy parsing, as it would re-introduce a class of bugs that could not be caught at compile-time. -
Handle endianness. We don't currently include endianness in the model of what is considered "safe" because by "safe" we simply mean "not causing undefined behavior." However, under that definition "safe" doesn't mean "unsurprising." I think that the behavior that you want with respect to endianness is likely to be highly use case-specific, so I don't think it's appropriate in a general-purpose mechanism.
-
Why is this design the best in the space of possible designs?
-
What other designs have been considered and what is the rationale for not choosing them?
-
What is the impact of not doing this?
- pre-RFC FromBits/IntoBits
- Pre-RFC: Trait for deserializing untrusted input
Pod
trait- Safe conversions for DSTs
- Bit twiddling pre-RFC
Discuss prior art, both the good and the bad, in relation to this proposal. A few examples of what this can include are:
- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had?
- For community proposals: Is this done by some other community and what were their experiences with it?
- For other teams: What lessons can we learn from what other communities have done here?
- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.
This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages.
Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. Please also take into consideration that rust sometimes intentionally diverges from common language features.
-
Is it possible to have
U: FromBits<T>
if eitherT
orU
are structs or enums that don't have arepr
? -
What does
U: FromBits<T>
mean ifU: !Sized
? -
If
U: FromBits<T>
,t: &mut T
, andsize_of_val(t) > size_of::<U>()
(note the strict inequality), is it safe to convertt
to a&mut U
? I don't think so. -
If we have a composite type with internal padding,
T
, isT: FromBits<[u8]>
sound? In other words, is it OK to read arbitrary but meaingful bytes from a[u8]
into the padding bytes of aT
? -
We say that
FromBits
,SizeLeq
, andAlignLeq
only work for struct and enum types withrepr
s. Do anyrepr
s work, or do we need to narrow it down to specificrepr
s? -
Bikeshed "coerce" to mean "safe transmute".
-
What parts of the design do you expect to resolve through the RFC process before this gets merged?
-
What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
-
What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?