From 510e75c2a11386f8f69d044b4c93d2f6471ace80 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Thu, 10 Aug 2023 21:54:35 -0700
Subject: [PATCH 01/14] Specify bit validity and padding of some types

Specify the bit validity and padding of the primitive numeric types, bool, char, and pointer and reference types.

Closes #1291
---
 src/type-layout.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index 4c87954f3..b304f64f3 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -56,12 +56,23 @@ Most primitives are generally aligned to their size, although this is
 platform-specific behavior. In particular, on x86 u64 and f64 are only
 aligned to 32 bits.

+For the primitive numeric types (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
+`i64`, `u128`, `i128`, `usize`, `isize`, `f32`, and `f64`), every bit pattern
+represents a valid instance of the type (in other words,
+`transmute::<[u8; size_of::<T>()], T>(...)` is always sound). For the primitive
+numeric types and also for `bool` and `char`, every byte is guaranteed to be
+initialized (in other words, `transmute::<T, [u8; size_of::<T>()]>(...) is always
+sound).
+
 ## Pointers and References Layout

 Pointers and references have the same layout. Mutability of the pointer or
 reference does not change the layout.

-Pointers to sized types have the same size and alignment as `usize`.
+Pointers to sized types have the same size and alignment as `usize`. Every
+byte of a pointer to a sized type and of a reference to a sized type is
+initialized (in other words, for such a pointer or reference type, `P`,
+`transmute::<P, [u8; size_of::<P>()]>(...)` is always sound).

 Pointers to unsized types are sized. The size and alignment is guaranteed to be
 at least equal to the size and alignment of a pointer.
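
Reviewer note: the numeric-type paragraph added by this patch can be illustrated with a small round trip. This is only a sketch under stated assumptions: `roundtrip` is a hypothetical helper that does not appear in the patch, and `u32` and `f32` merely stand in for "any primitive numeric type".

```rust
use std::mem::{size_of, transmute};

/// Reinterprets four arbitrary bytes as a `u32`, then turns it back into bytes.
/// `roundtrip` is a hypothetical helper for illustration, not part of the patch.
fn roundtrip(bytes: [u8; size_of::<u32>()]) -> [u8; size_of::<u32>()] {
    // Sound per the added text: every initialized bit pattern is a valid `u32`.
    let n: u32 = unsafe { transmute::<[u8; size_of::<u32>()], u32>(bytes) };
    // Sound per the added text: every byte of a `u32` is initialized.
    unsafe { transmute::<u32, [u8; size_of::<u32>()]>(n) }
}

fn main() {
    let bytes = [0xde, 0xad, 0xbe, 0xef];
    assert_eq!(roundtrip(bytes), bytes);

    // The safe standard-library conversion yields the same value.
    let n: u32 = unsafe { transmute(bytes) };
    assert_eq!(n, u32::from_ne_bytes(bytes));

    // Floats accept every bit pattern too; all-ones is one of the NaNs.
    let f: f32 = unsafe { transmute::<[u8; 4], f32>([0xff; 4]) };
    assert!(f.is_nan());
}
```

In ordinary code, `from_ne_bytes` and `to_ne_bytes` are probably the better choice; the `transmute` calls are shown only because the patch text phrases the guarantee in those terms.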
From e2385f92ec393d6887630e1cafd4443388a269a8 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Thu, 10 Aug 2023 21:57:09 -0700
Subject: [PATCH 02/14] Update type-layout.md

---
 src/type-layout.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index b304f64f3..b2ed3f946 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -61,7 +61,7 @@ For the primitive numeric types (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
 represents a valid instance of the type (in other words,
 `transmute::<[u8; size_of::<T>()], T>(...)` is always sound). For the primitive
 numeric types and also for `bool` and `char`, every byte is guaranteed to be
-initialized (in other words, `transmute::<T, [u8; size_of::<T>()]>(...) is always
+initialized (in other words, `transmute::<T, [u8; size_of::<T>()]>(...)` is always
 sound).

 ## Pointers and References Layout

From d1850ea36b9aa3f3eadc33746cfba0e1b801b32b Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 14 Aug 2023 12:57:52 -0700
Subject: [PATCH 03/14] Clarify what `T` refers to

---
 src/type-layout.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index b2ed3f946..0c0d36d14 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -58,11 +58,11 @@ aligned to 32 bits.

 For the primitive numeric types (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
 `i64`, `u128`, `i128`, `usize`, `isize`, `f32`, and `f64`), every bit pattern
-represents a valid instance of the type (in other words,
-`transmute::<[u8; size_of::<T>()], T>(...)` is always sound). For the primitive
-numeric types and also for `bool` and `char`, every byte is guaranteed to be
-initialized (in other words, `transmute::<T, [u8; size_of::<T>()]>(...)` is always
-sound).
+represents a valid instance of the type (in other words, for every primitive numeric
+type, `T`, `transmute::<[u8; size_of::<T>()], T>(...)` is always sound). For the
+primitive numeric types and also for `bool` and `char`, every byte is guaranteed to be
+initialized (in other words, for every such type, `T`,
+`transmute::<T, [u8; size_of::<T>()]>(...)` is always sound).

 ## Pointers and References Layout


From 2e430462db76230e5753bcd41c4945769a921a2d Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Sat, 19 Aug 2023 13:41:35 -0700
Subject: [PATCH 04/14] Update type-layout.md

---
 src/type-layout.md | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index 0c0d36d14..b437b63e1 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -56,12 +56,16 @@ Most primitives are generally aligned to their size, although this is
 platform-specific behavior. In particular, on x86 u64 and f64 are only
 aligned to 32 bits.

-For the primitive numeric types (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
-`i64`, `u128`, `i128`, `usize`, `isize`, `f32`, and `f64`), every bit pattern
-represents a valid instance of the type (in other words, for every primitive numeric
-type, `T`, `transmute::<[u8; size_of::<T>()], T>(...)` is always sound). For the
-primitive numeric types and also for `bool` and `char`, every byte is guaranteed to be
-initialized (in other words, for every such type, `T`,
+For every primitive numeric type (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
+`i64`, `u128`, `i128`, `usize`, `isize`, `f32`, and `f64`), `T`, the bit validity
+of `T` is equivalent to the bit validity of `[u8; size_of::<T>()]`. `u8` has 256
+valid representations (namely, every 8-bit sequence). An uninitialized byte is not
+a valid u8. A byte at any offset in a reference or pointer type may not be a valid
+u8 (the semantics of transmuting a reference or pointer to a non-pointer type is
+currently undecided).
+
+For the primitive numeric types and also for `bool` and `char`, every byte is
+guaranteed to be initialized (in other words, for every such type, `T`,
 `transmute::<T, [u8; size_of::<T>()]>(...)` is always sound).

 ## Pointers and References Layout
@@ -69,10 +73,7 @@ initialized (in other words, for every such type, `T`,
 Pointers and references have the same layout. Mutability of the pointer or
 reference does not change the layout.

-Pointers to sized types have the same size and alignment as `usize`. Every
-byte of a pointer to a sized type and of a reference to a sized type is
-initialized (in other words, for such a pointer or reference type, `P`,
-`transmute::<P, [u8; size_of::<P>()]>(...)` is always sound).
+Pointers to sized types have the same size and alignment as `usize`.

 Pointers to unsized types are sized. The size and alignment is guaranteed to be
 at least equal to the size and alignment of a pointer.

From 2f82b7ff3b3df86c87477865fe16bf6d3c5da40f Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Sat, 19 Aug 2023 13:43:48 -0700
Subject: [PATCH 05/14] Update type-layout.md

---
 src/type-layout.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index b437b63e1..e0df3ba76 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -60,9 +60,9 @@ For every primitive numeric type (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
 `i64`, `u128`, `i128`, `usize`, `isize`, `f32`, and `f64`), `T`, the bit validity
 of `T` is equivalent to the bit validity of `[u8; size_of::<T>()]`. `u8` has 256
 valid representations (namely, every 8-bit sequence). An uninitialized byte is not
-a valid u8. A byte at any offset in a reference or pointer type may not be a valid
-u8 (the semantics of transmuting a reference or pointer to a non-pointer type is
-currently undecided).
+a valid `u8`. A byte at any offset in a reference or pointer type may not be a
+valid `u8` (the semantics of transmuting a reference or pointer to a non-pointer
+type is currently undecided).

 For the primitive numeric types and also for `bool` and `char`, every byte is
 guaranteed to be initialized (in other words, for every such type, `T`,

From 61c6349326691bdfb6fc29315e5e5b27c7def851 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 21 Aug 2023 13:43:26 -0700
Subject: [PATCH 06/14] Update src/type-layout.md

Co-authored-by: Ralf Jung
---
 src/type-layout.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index e0df3ba76..09be94bd4 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -64,9 +64,9 @@ a valid `u8`. A byte at any offset in a reference or pointer type may not be a
 valid `u8` (the semantics of transmuting a reference or pointer to a non-pointer
 type is currently undecided).

-For the primitive numeric types and also for `bool` and `char`, every byte is
+For `bool` and `char`, every byte is
 guaranteed to be initialized (in other words, for every such type, `T`,
-`transmute::<T, [u8; size_of::<T>()]>(...)` is always sound).
+`transmute::<T, [u8; size_of::<T>()]>(...)` is always sound -- but the inverse is not).

 ## Pointers and References Layout


From bbc8063811d67a22fc625fe523299f3cb5de3043 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 21 Aug 2023 13:57:34 -0700
Subject: [PATCH 07/14] Update type-layout.md

---
 src/type-layout.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index 09be94bd4..ac4830fa9 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -58,11 +58,10 @@ aligned to 32 bits.

 For every primitive numeric type (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
 `i64`, `u128`, `i128`, `usize`, `isize`, `f32`, and `f64`), `T`, the bit validity
-of `T` is equivalent to the bit validity of `[u8; size_of::<T>()]`. `u8` has 256
-valid representations (namely, every 8-bit sequence). An uninitialized byte is not
-a valid `u8`. A byte at any offset in a reference or pointer type may not be a
-valid `u8` (the semantics of transmuting a reference or pointer to a non-pointer
-type is currently undecided).
+of `T` is equivalent to the bit validity of `[u8; size_of::<T>()]`. An
+uninitialized byte is not a valid `u8`. A byte at any offset in a reference or
+pointer type may not be a valid `u8` (the semantics of transmuting a reference or
+pointer to a non-pointer type is currently undecided).

 For `bool` and `char`, every byte is
 guaranteed to be initialized (in other words, for every such type, `T`,

From d3a66eb69892b25794b2d82a1249ec01d8ead9f1 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 28 Aug 2023 11:12:57 -0700
Subject: [PATCH 08/14] Update boolean.md

---
 src/types/boolean.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/types/boolean.md b/src/types/boolean.md
index d8984025f..7ea99d185 100644
--- a/src/types/boolean.md
+++ b/src/types/boolean.md
@@ -92,6 +92,12 @@ boolean type for its operands, they evaluate using the rules of [boolean logic].
 * `a < b` is the same as `!(a >= b)`
 * `a <= b` is the same as `a == b | a < b`

+## Bit validity
+
+The single byte of a `bool` is guaranteed to be initialized (in other words,
+`transmute::<bool, u8>(...)` is always sound -- but since some bit patterns
+are invalid `bool`s, the inverse is not always sound).
+
 [boolean logic]: https://en.wikipedia.org/wiki/Boolean_algebra
 [enumerated type]: enum.md
 [expressions]: ../expressions.md

From 5795f3ad5d06f4ee005a91c78b5238b6bf0b1406 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 28 Aug 2023 11:13:07 -0700
Subject: [PATCH 09/14] Update textual.md

---
 src/types/textual.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/types/textual.md b/src/types/textual.md
index 65d563312..41ed35ea1 100644
--- a/src/types/textual.md
+++ b/src/types/textual.md
@@ -17,6 +17,12 @@ is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause
 Since `str` is a [dynamically sized type], it can only be instantiated through a
 pointer type, such as `&str`.

+## Bit validity
+
+Every byte of a `char` is guaranteed to be initialized (in other words,
+`transmute::<char, [u8; size_of::<char>()]>(...)` is always sound -- but since
+some bit patterns are invalid `char`s, the inverse is not always sound).
+
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
 [Undefined Behavior]: ../behavior-considered-undefined.md
 [dynamically sized type]: ../dynamically-sized-types.md

From 80ec1463a515858cbb49ac4c9f601a75992893b2 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 28 Aug 2023 11:13:11 -0700
Subject: [PATCH 10/14] Update numeric.md

---
 src/types/numeric.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/types/numeric.md b/src/types/numeric.md
index 8ab53a792..bd59daa6b 100644
--- a/src/types/numeric.md
+++ b/src/types/numeric.md
@@ -45,3 +45,8 @@ within an object along with one byte past the end.
 > `isize` are either 32-bit or 64-bit. As a consequence, 16-bit
 > pointer support is limited and may require explicit care and acknowledgment
 > from a library to support.
+
+## Bit validity
+
+For every numeric type, `T`, the bit validity of `T` is equivalent to the bit
+validity of `[u8; size_of::<T>()]`. An uninitialized byte is not a valid `u8`.
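
Reviewer note: the one-directional guarantee for `bool` and `char` in patches 08 and 09 can be sketched as follows. The helper names (`bool_byte`, `char_bytes`, `byte_to_bool`) are hypothetical and not part of the series. The sketch also relies on facts stated elsewhere in the Rust documentation rather than in these patches: `false` and `true` are stored as `0x00` and `0x01`, and a `char` has the same bytes as its scalar value as a `u32`.

```rust
use std::mem::{size_of, transmute};

/// Hypothetical helper: reads the single byte of a `bool`.
fn bool_byte(b: bool) -> u8 {
    // Sound per patch 08: the single byte of a `bool` is always initialized.
    unsafe { transmute::<bool, u8>(b) }
}

/// Hypothetical helper: reads the bytes of a `char`.
fn char_bytes(c: char) -> [u8; size_of::<char>()] {
    // Sound per patch 09: every byte of a `char` is initialized.
    unsafe { transmute::<char, [u8; size_of::<char>()]>(c) }
}

/// Hypothetical checked inverse: only two of the 256 byte values are valid
/// `bool`s, so the reverse direction must validate instead of transmuting.
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    assert_eq!(bool_byte(false), 0);
    assert_eq!(bool_byte(true), 1);
    assert_eq!(u32::from_ne_bytes(char_bytes('A')), 0x41);

    // The inverse direction is not always sound, so use checked conversions.
    assert_eq!(byte_to_bool(2), None);
    assert_eq!(char::from_u32(0xD800), None); // a surrogate is not a `char`
}
```

For `char`, the standard library's `char::from_u32` already plays the role of the checked inverse, which is why no extra helper is defined for it here.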
From 8f92a7d4959d88f287d53bb4c4a0f2ecec31057b Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 28 Aug 2023 15:48:33 -0700
Subject: [PATCH 11/14] Update pointer.md

---
 src/types/pointer.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/types/pointer.md b/src/types/pointer.md
index 4a74370a5..723111cae 100644
--- a/src/types/pointer.md
+++ b/src/types/pointer.md
@@ -50,6 +50,12 @@ Raw pointers can be created directly using [`core::ptr::addr_of!`] for `*const`
 The standard library contains additional 'smart pointer' types beyond references
 and raw pointers.

+## Bit validity
+
+Despite pointers and references being similar to `usize`s in the machine code emitted on most platforms,
+the semantics of transmuting a reference or pointer type to a non-pointer type is currently undecided.
+Thus, it may not be valid to transmute a pointer or reference type, `P`, to a `[u8; size_of::<P>()]`.
+
 [`core::ptr::addr_of!`]: ../../core/ptr/macro.addr_of.html
 [`core::ptr::addr_of_mut!`]: ../../core/ptr/macro.addr_of_mut.html
 [Interior mutability]: ../interior-mutability.md

From 746a359cca82ad461624e9050e7091ad39d7c06a Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Mon, 28 Aug 2023 15:48:44 -0700
Subject: [PATCH 12/14] Update type-layout.md

---
 src/type-layout.md | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/src/type-layout.md b/src/type-layout.md
index ac4830fa9..4c87954f3 100644
--- a/src/type-layout.md
+++ b/src/type-layout.md
@@ -56,17 +56,6 @@ Most primitives are generally aligned to their size, although this is
 platform-specific behavior. In particular, on x86 u64 and f64 are only
 aligned to 32 bits.

-For every primitive numeric type (`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`,
-`i64`, `u128`, `i128`, `usize`, `isize`, `f32`, and `f64`), `T`, the bit validity
-of `T` is equivalent to the bit validity of `[u8; size_of::<T>()]`. An
-uninitialized byte is not a valid `u8`. A byte at any offset in a reference or
-pointer type may not be a valid `u8` (the semantics of transmuting a reference or
-pointer to a non-pointer type is currently undecided).
-
-For `bool` and `char`, every byte is
-guaranteed to be initialized (in other words, for every such type, `T`,
-`transmute::<T, [u8; size_of::<T>()]>(...)` is always sound -- but the inverse is not).
-
 ## Pointers and References Layout

 Pointers and references have the same layout. Mutability of the pointer or

From ea1110b62298a839fb58b65534fc8b0f816dc9fd Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Wed, 6 Sep 2023 14:50:23 -0700
Subject: [PATCH 13/14] Apply suggestions from code review

Co-authored-by: Ralf Jung
---
 src/types/pointer.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/types/pointer.md b/src/types/pointer.md
index 723111cae..afb32ac60 100644
--- a/src/types/pointer.md
+++ b/src/types/pointer.md
@@ -56,6 +56,10 @@ Despite pointers and references being similar to `usize`s in the machine code em
 the semantics of transmuting a reference or pointer type to a non-pointer type is currently undecided.
 Thus, it may not be valid to transmute a pointer or reference type, `P`, to a `[u8; size_of::<P>()]`.

+For thin raw pointers (i.e., for `<T as Pointee>::Metadata == ()` and `P = *const T` or `P = *mut T`),
+the inverse direction (transmuting from an integer or array of integers to `P`) is always valid.
+However, the pointer produced via such a transmutation may not be dereferenced (not even if `T` has size zero).
+
 [`core::ptr::addr_of!`]: ../../core/ptr/macro.addr_of.html
 [`core::ptr::addr_of_mut!`]: ../../core/ptr/macro.addr_of_mut.html
 [Interior mutability]: ../interior-mutability.md

From 13b5af85c57df0898f808651f5126c7fe7568349 Mon Sep 17 00:00:00 2001
From: Joshua Liebow-Feeser
Date: Wed, 6 Sep 2023 14:51:37 -0700
Subject: [PATCH 14/14] Update pointer.md

---
 src/types/pointer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/types/pointer.md b/src/types/pointer.md
index afb32ac60..cbbf356e8 100644
--- a/src/types/pointer.md
+++ b/src/types/pointer.md
@@ -56,7 +56,7 @@ Despite pointers and references being similar to `usize`s in the machine code em
 the semantics of transmuting a reference or pointer type to a non-pointer type is currently undecided.
 Thus, it may not be valid to transmute a pointer or reference type, `P`, to a `[u8; size_of::<P>()]`.

-For thin raw pointers (i.e., for `<T as Pointee>::Metadata == ()` and `P = *const T` or `P = *mut T`),
+For thin raw pointers (i.e., for `P = *const T` or `P = *mut T` for `T: Sized`),
 the inverse direction (transmuting from an integer or array of integers to `P`) is always valid.
 However, the pointer produced via such a transmutation may not be dereferenced (not even if `T` has size zero).
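
Reviewer note: the final wording for thin raw pointers can be sketched like this. `ptr_from_addr` is a hypothetical helper, `u32` stands in for any `T: Sized`, and the address `0x1000` is an arbitrary value chosen for illustration. The sketch only inspects the resulting pointer and never dereferences it, which is what the added text permits.

```rust
use std::mem::transmute;

/// Hypothetical helper: builds a thin raw pointer from an integer.
fn ptr_from_addr(addr: usize) -> *const u32 {
    // Per the added text, integer-to-thin-raw-pointer transmutes are always
    // valid. The result must not be dereferenced.
    unsafe { transmute::<usize, *const u32>(addr) }
}

fn main() {
    let p = ptr_from_addr(0x1000);

    // Reading the address back with an `as` cast is fine; `*p` would not be.
    assert_eq!(p as usize, 0x1000);
    assert!(!p.is_null());
    assert!(ptr_from_addr(0).is_null());
}
```

Note that the address is read back with an `as` cast rather than a transmute, since the series leaves the pointer-to-integer transmute direction explicitly undecided.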