From 1a8f6cf3c645067f4104b67086eb9d7ecefc60c4 Mon Sep 17 00:00:00 2001
From: Thom Chiovoloni
Date: Mon, 29 Aug 2022 21:26:38 -0700
Subject: [PATCH 1/7] Add offset_of macro

---
 text/0000-offset_of.md | 480 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 480 insertions(+)
 create mode 100644 text/0000-offset_of.md

diff --git a/text/0000-offset_of.md b/text/0000-offset_of.md
new file mode 100644
index 00000000000..8e1c1f15c57
--- /dev/null
+++ b/text/0000-offset_of.md
@@ -0,0 +1,480 @@
+- Feature Name: `offset_of`
+- Start Date: 2022-08-29
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+Introduce a new macro `core::mem::offset_of!`, which evaluates to a constant
+containing the offset in bytes of a field inside some type.
+
+Specifically, this RFC allows usage like the following:
+
+```rs
+use core::mem::offset_of;
+
+const EXAMPLES: &[usize] = &[
+    offset_of!(Struct, b),
+    offset_of!(TupleStruct, 0),
+    offset_of!(Union, y),
+    offset_of!((i32, u32), 1),
+    offset_of!(inner::SubmodAndGeneric<String>, pub_field),
+];
+
+struct Struct { a: u64, b: &'static str }
+struct TupleStruct(u8, i32);
+union Union { x: u8, y: u64 }
+
+mod inner {
+    pub struct SubmodAndGeneric<T> {
+        private_field: T,
+        pub pub_field: u8,
+    }
+}
+```
+
+# Motivation
+[motivation]: #motivation
+
+Type layout information is very frequently needed in low level code, especially
+if it's performing serialization, FFI, or implementing a data structure.
+
+While often the needed information is limited to the size and required alignment
+of a given type, sometimes there is a need to access information about the
+fields of a type, most commonly (and most fundamentally) the offset (in bytes)
+at which the field may be found in the type which contains it.
+
+Currently, Rust's standard library provides good explicit APIs for providing
+information about the size and alignment of a given type (specifically,
+`core::mem` has `size_of`, `align_of`, `size_of_val`, and `align_of_val`).
+Unfortunately, it provides none for determining field offsets, leaving them to be
+computed based on implicitly-provided layout information.
+
+This is an unfortunate gap, one we've seen countless workarounds for, which have
+caused no end of trouble in the ecosystem. The problem is that while recovering
+layout information in this manner is completely possible in Rust (recovering the
+size and alignment would even be possible using the same technique), doing it
+correctly is very subtle. Most of the implementations which seem obvious are
+actually wrong, often because they invoke undefined behavior.
+
+Unfortunately, this also means they often tend to work at first, but risk
+being something of a "ticking time-bomb", which may break in a future release of
+Rust or LLVM.
+
+This is not a theoretical concern, and widespread breakage of incorrect
+`offset_of` implementations has happened in the past (e.g. when `mem::zeroed`
+started performing validity checks), and may happen again (e.g. the
+`deref_nullptr` lint revealed large bodies of code with incorrect
+implementations).
+
+Unfortunately, there hasn't previously been a great alternative. Generally, the
+recommendation users are given is to either:
+
+1. Use a crate, for example `memoffset` and `bytemuck` both have `offset_of!`
+   implementations.
+2. Hardcode the constant.
+
+Both of these have several downsides. But even if the operation could be
+flawlessly performed by library code, it's the opinion of the author of this RFC
+that this operation is fundamental enough that, at a minimum, the standard
+library should provide the implementation.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+In low level code, you may find you need to know the byte offset of a field
+within a type. This can be accomplished with the `core::mem::offset_of!` macro.
+
+`core::mem::offset_of!` takes two arguments: the type that holds the field, and
+the name of the field. For example, if you have:
+
+```rs
+#[repr(C)]
+struct Vertex {
+    tex: [u16; 2],
+    pos: [f32; 3],
+}
+```
+
+Then you can use `core::mem::offset_of!(Vertex, tex)` to get the offset in bytes
+where `tex` begins, and `core::mem::offset_of!(Vertex, pos)` to get the offset
+in bytes where `pos` begins.
+
+In this example, we also specified the layout algorithm to use, so we know that
+`offset_of!(Vertex, tex)` will be 0, and `offset_of!(Vertex, pos)` will be 4.
+
+However, if a `#[repr(...)]` is not used, the compiler is free to place the
+fields of `Vertex` in whatever order it prefers (which need not match the
+order the fields are written in the struct declaration), so there's no way
+to know in advance what the positions of the fields will be.
+
+Thankfully, `offset_of!` is still usable here:
+
+```rs
+// No `#[repr(...)]` needed!
+struct Vertex {
+    tex: [u16; 2],
+    pos: [f32; 3],
+}
+// This time let's define some constants containing the offset values,
+// which can be more readable if you need to use them several times.
+const OFFSET_VERTEX_TEX: usize = core::mem::offset_of!(Vertex, tex);
+const OFFSET_VERTEX_POS: usize = core::mem::offset_of!(Vertex, pos);
+```
+
+As you can see, the usage is the same as before, but because we didn't specify
+`#[repr(C)]`, the compiler may have changed the order or position, so the values may
+be different -- it's completely possible that `pos` is located at offset 0, for
+example! Thankfully, by using `core::mem::offset_of!`, this code is correct
+either way, and will continue to be correct, even if the layout algorithm
+changes in the future.
+
+## `offset_of!` On Other Types
+
+If your type doesn't have named fields, `offset_of!` can still be used. For
+tuples and tuple structs, the "name" of the field is the numeric index you use
+to access it. For example:
+
+```rs
+// Works with a tuple struct
+struct KeyVal(&'static str, Vec<u8>);
+const OFFSET_KV_KEY: usize = core::mem::offset_of!(KeyVal, 0);
+const OFFSET_KV_VAL: usize = core::mem::offset_of!(KeyVal, 1);
+// Or with an anonymous tuple.
+const OFFSET_ANON_KEY: usize = core::mem::offset_of!((&'static str, Vec<u8>), 0);
+const OFFSET_ANON_VAL: usize = core::mem::offset_of!((&'static str, Vec<u8>), 1);
+```
+
+Finally, `offset_of!` can be used to compute the offset of fields in unions too.
+While this may be surprising, the compiler is allowed to put padding in front of
+fields in unions which are not `#[repr(C)]`, which would lead to a non-zero
+field offset.
+
+```rs
+use core::mem::offset_of;
+union Buffer {
+    metadata: [u64; 3],
+    data: [u8; 1024 * 1024 * 32],
+}
+const METADATA_OFFSET: usize = offset_of!(Buffer, metadata);
+```
+
+## Limitations
+
+There are a few limitations worth mentioning. Some of these may be relaxed in
+the future, however.
+
+1. Perhaps unsurprisingly, it obeys privacy, so both the type and field you call
+   `offset_of!` on must be visible to the code calling `offset_of!`.
+
+2. The type holding the field must be `Sized`, so trying to compute where the
+   slice begins in something like `offset_of!((i32, [u32]), 1)` isn't supported.
+
+3. Compared to `offsetof` in C and C++, you can't access nested fields/arrays.
+   That is, instead of `offset_of!(Foo, quank.zoop.2.quank[4])`, you'll have to
+   compute the offsets of each step manually, and sum them.
+
+4. Finally, types other than tuples, structs, and unions are not currently
+   unsupported.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+`offset_of` is a new macro exported from `core::mem` which has an signature
+similar to the following:
+
+```rs
+pub macro offset_of($Container:ty, $field:tt $(,)?) {
+    // ...implementation defined...
+}
+```
+
+Invoking this macro expands to a constant expression of type `usize`, which
+evaluates to the offset in bytes from the beginning of `$Container` where
+`$field` is found.
+
+`$Container` must be visible and must be or resolve to one of the following
+types:
+
+1. A `struct` or `union` type with either named or anonymous/tuple-style fields.
+
+   In this case, `$field` must share a name or tuple index with a field which:
+   - Exists on `$Container`.
+   - Is visible at the location where `offset_of!` is invoked (but there is no
+     requirement that fields other than `$field` be visible there).
+
+2. An anonymous tuple type.
+
+   In this case, `$field` must be a tuple index (that is, an integer literal)
+   that exists on the tuple type in question.
+
+Use on other types is an error, although this may be relaxed in some cases in
+the future (see the [Future possibilities][future-possibilities] section).
+
+As a note: the implementation is strongly encouraged not to have runtime
+resource usage that depends on the choice of `$Container` or `$field`. In
+particular, the implementation should not allocate space for an instance of
+`$Container` on the runtime stack.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+1. This exposes layout information at compile time which is otherwise not
+   exposed until runtime. This can cause compatibility hazards similar to
+   `mem::size_of` or `mem::align_of`, but plausibly greater as it provides even
+   more information.
+
+   That said, this API allows querying information which (if needed at compile
+   time) would otherwise be hard-coded, so in some cases it may reduce the risk
+   of a compatibility hazard.
+
+2. This is a low level feature that most code won't need to use, so perhaps it
+   is better off left out.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+The general rationale is that it should remove the need to hardcode, hand-roll,
+or pull in a third-party crate in order to compute field offsets. This should
+hopefully remove as many barriers as possible to doing so correctly.
+
+That said, there are several alternatives to this, some of which were even
+considered:
+
+1. Do nothing, and tell users to use the [`memoffset`][memoffset] crate, or to
+   hard-code constant offsets.
+
+   This was not chosen as this operation seems fundamental enough to be provided
+   by the standard library, especially given how often it is incorrectly
+   implemented in the wild.
+
+2. Add `offset_of!`, but disallow use on `#[repr(Rust)]` types.
+
+   This would make `core::mem::offset_of!` have less functionality than the
+   implementation from `memoffset`, or the implementation they could implement
+   if they computed it manually.
+
+3. Require that all fields of `$Container` be visible at the invocation site,
+   rather than just requiring that `$field` is.
+
+   As above, this would make `core::mem::offset_of!` worse than the version
+   they'd have written themselves and/or an off-the-shelf implementation.
+
+4. Add `offset_of!`, but disallow use during constant evaluation.
+
+   This would mean that users who need const access to `offset_of!` must
+   continue to hardcode the field offsets as constants, which is undesirable,
+   error-prone, and can cause compatibility hazards.
+
+5. Try to make `addr_of!((*null::<$Container>()).$field) as usize` work for this:
+
+   Currently this is UB (due to dereferencing a null pointer) and does not
+   support use in const (due to accessing the address of a raw pointer).
+   Changing both of these issues would be challenging, but may be possible.
+
+   This was not chosen because seems difficult, and would be harder to teach
+   (or read) than `core::mem::offset_of`.
+
+6. Hold off until this can be integrated into some larger language feature, such
+   as C++-style pointer-to-field, Swift-style field paths, ...
+
+   Aside from avoiding scope creep, this wasn't pursued as `offset_of!` does
+   not prevent these in the future, and may not even be made redundant by them.
+
+7. Use `offset_of!($Container::$field)` as the syntax instead.
+
+   This wasn't chosen because it doesn't really work with tuples, and seems
+   like it may harm the quality of error messages (for example, if a user
+   forgets `::$field`, and does `offset_of!(crate::path::to::SomeType)`).
+
+   Additionally, this does not generalize as well to some of the extensions in
+   future work.
+
+# Prior art
+[prior-art]: #prior-art
+
+There is quite a bit of prior art here, which I've grouped into:
+
+1. Crates: Rust libraries that expose similar or equivalent functionality to
+   this proposal.
+2. Languages: Other languages that provide access to this information either as
+   a language builtin, or via a library.
+
+## Prior Art: Crates
+
+Several crates in the ecosystem have `offset_of!` implementations.
+[`memoffset`][memoffset] and [`bytemuck`][bmuckcrate] are probably the two most
+popular, and provide this functionality in different ways.
+
+- The [`memoffset`][memoffset] crate provides an `offset_of!` macro very similar
+  to this proposal. It is a fairly straightforward implementation that avoids
+  most pitfalls, although it does allocate an instance of the type on the stack,
+  which can cause stack overflow during debug builds (the optimizer typically
+  removes this in release builds).
+
+  On nightly, if the `unstable_const` cargo feature is enabled,
+  `memoffset::offset_of!` may be used during constant evaluation.
+
+- The [`bytemuck`][bmuckcrate] crate has an [`offset_of!`][bmuckoffset]
+  implementation which differs from the one in `memoffset` in that it takes
+  three arguments, where the first is an existing instance of the type (or, due
+  to a quirk in how it is implemented, a reference to one).
+
+  This is intended to allow an implementation that does not require `unsafe`
+  (as it was added at a time when it was unclear how to provide a sound
+  `offset_of!`).
+
+  Somewhat interestingly, this first parameter may be used to avoid a large
+  stack allocation by passing a reference to a const/static instead, for
+  example as in
+  `bytemuck::offset_of!(&SOME_STATIC, SomeTy, field)`.
+
+  It does not support use during constant evaluation.
+
+[memoffset]: https://crates.io/crates/memoffset/0.6.5
+[bmuckcrate]: https://crates.io/crates/bytemuck/1.12.1
+[bmuckoffset]: https://docs.rs/bytemuck/1.12.1/bytemuck/macro.offset_of.html
+
+## Prior Art: Languages
+
+Many languages which support low level programming have some equivalent to this
+functionality.
+
+- The C programming language supports this as an [`offsetof`][coffsetof] macro,
+  for example: `offsetof(struct some_struct, some_field)` is morally equivalent
+  to this proposal's `offset_of!(SomeStruct, some_field)`. It produces an integer
+  constant, so it can be used during C's equivalent of constant evaluation.
+
+  Notably, C's `offsetof` is more powerful than the `offset_of!` proposed in
+  this RFC, as it supports access to fields of nested types, and can even
+  project through arrays, for example `offsetof(some_type, foo.bar[1].baz)` is
+  completely allowed.
+
+  Extending `core::mem::offset_of` to support some of these use cases could be
+  done in the future, as discussed in the [Future possibilities][future-possibilities]
+  section below.
+
+- C++ has an [`offsetof`][cppoffsetof] macro which is essentially compatible
+  with C's, although it is only "conditionally supported" to use it on types
+  which are not "standard layout" (see the linked documentation for information
+  on what the quoted text means).
+
+  C++ also has support for getting a pointer to a field via its
+  pointer-to-member feature. This feature is powerful, and while it replaces
+  some uses of `offsetof`, it does not replace all of them.
+
+- Zig supports this via the [`@offsetOf`][zigoffsetof] function, which takes a
+  `type` and a `[]const u8` that contains the field name as a string, for example
+  `@offsetOf(SomeType, "some_field")` would be essentially equivalent to this
+  proposal's `core::mem::offset_of!(SomeType, some_field)`.
+
+  Zig also supports the [`@bitOffsetOf`][zigbitoffset] function, as Zig allows
+  structs to contain fields which are not byte-aligned (e.g. bitfields). The
+  syntax and semantics are otherwise equivalent.
+
+  Both are `comptime` functions, which means they may be used in
+  situations which are morally equivalent to Rust's constant evaluation.
+
+- The D language allows accessing the offset via a [property][doffsetof] of each
+  field. For example, `SomeType.some_field.offsetof` is essentially equivalent
+  to this proposal's `core::mem::offset_of!(SomeType, some_field)`.
+
+- Swift supports this via the [`MemoryLayout.offset(of:)`][swiftoffset] function
+  (note: the link contains a good overview of the design). For example,
+  `MemoryLayout<SomeType>.offset(of: \.some_field)` would be equivalent to
+  `core::mem::offset_of!(SomeType, some_field)`.
+
+  The `\.some_field` syntax is a partial key path (a Swift language feature).
+  This can grant access to fields of nested structs in a manner similar to C's
+  `offsetof`, for example: `MemoryLayout<SomeType>.offset(of: \.foo.bar.baz)`.
+
+[coffsetof]: https://en.cppreference.com/w/c/types/offsetof
+[cppoffsetof]: https://en.cppreference.com/w/cpp/types/offsetof
+[zigoffsetof]: https://ziglang.org/documentation/0.9.1/#offsetOf
+[zigbitoffset]: https://ziglang.org/documentation/0.9.1/#bitOffsetOf
+[doffsetof]: https://dlang.org/spec/struct.html#struct_field_properties
+[swiftoffset]: https://github.com/apple/swift-evolution/blob/ec2028964daeda2600e49aa89fd9e59d2363433b/proposals/0210-key-path-offset.md
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+1. Should any of the features listed as "Future possibilities" be supported initially?
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+This proposal is intentionally minimal, so there are a number of future
+possibilities.
+
+## Enum support (`offset_of!(SomeEnum::StructVariant, field_on_variant)`)
+
+Eventually, it may be desirable to allow `offset_of!` to access the fields
+inside the struct and tuple variants of certain enums (possibly limited to enums
+with a primitive integer representation, such as `#[repr(C)]`, `#[repr(int)]`,
+or `#[repr(C, int)]` -- where `int` is one of Rust's primitive integer types —
+u8, isize, u128, etc).
+
+For example, in the future somthing like the following could be allowed:
+
+```rs
+use core::mem::offset_of;
+
+#[repr(i8)]
+enum Event {
+    Key { pressed: bool, code: u32 },
+    Resize(u32, u32),
+}
+
+const EVENT_KEY_CODE: usize = offset_of!(Event::Key, code);
+const EVENT_KEY_PRESSED: usize = offset_of!(Event::Key, pressed);
+
+const EVENT_RESIZE_W: usize = offset_of!(Event::Resize, 0);
+const EVENT_RESIZE_H: usize = offset_of!(Event::Resize, 1);
+```
+
+In this example, the name/path of the variant is used as the first argument.
+While there are use-cases for this in low level FFI code (similar to the use
+cases for `#[repr(int)]` and `#[repr(C, int)]` enums), this may need further
+design work, and is left to the future.
+
+## Nested Field Access
+
+In C, expressions like `offsetof(struct some_struct, foo.bar.baz[3].quux)` are
+allowed, where `foo.bar.baz[3].quux` denotes a path to a derived field. This can
+be of somewhat arbitrary complexity, accessing fields of nested structs,
+performing array indexing (often this is used to access past the end of the
+array even), and so on. Similar functionality is offered by
+`MemoryLayout.offset` in Swift, where more complex language features are used to
+achieve it.
+
+This was omitted from this proposal because it is not commonly used, and can
+generally be replaced (at the cost of convenience) by multiple invocations of
+the macro.
+
+Additionally, in the future similar functionality could be added in a fully
+backwards-compatible way, either by directly allowing usage like
+`offset_of!(SomeStruct, foo.bar.baz[3].quux)`, or by requiring each field be
+comma-separated, as in `offset_of!(SomeStruct, foo, bar, baz, [3], quux)`.
+
+Note that while this example shows a combination that supports array indexing,
+it's unclear if this is actually desirable for Rust.
+
+## `memoffset::span_of!` Functionality
+
+The `memoffset` crate has support for a [`span_of!`][spanof] macro (used like
+`memoffset::span_of!(SomeType, some_field)`), which expands to a `Range<usize>`
+indicating which bytes of `SomeType` are from the field `some_field`.
+
+The use case for this is more limited than that of `offset_of!`, so it was
+omitted from this proposal. That said, should this prove sufficiently useful, it
+would be simple to add a similar macro to `core::mem` in the future.
+
+[spanof]: https://docs.rs/memoffset/0.6.5/memoffset/macro.span_of.html
+
+## Support for types with `?Sized` fields.
+
+Currently, we don't support `offset_of!((u8, [i32]), 1)`, as `(u8, [i32])` does
+not implement `Sized`.
+
+This is a mostly artificial restriction, and could be relaxed in the future.
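
For comparison with the library workarounds described in the Motivation and Prior Art sections, here is a minimal sketch of a sound, runtime-only field-offset macro in the style attributed to `memoffset` above: it creates uninitialized storage for the container on the stack and measures the distance to the field with `core::ptr::addr_of!`, which never creates a reference to (or reads) the uninitialized memory. The macro name `offset_of_sketch!` and the example types are hypothetical; this is not `memoffset`'s actual source, and it is not the proposed `core::mem::offset_of!`.

```rust
use core::mem::MaybeUninit;
use core::ptr::addr_of;

/// Hypothetical runtime-only field-offset macro (illustration only).
/// `$field` is a `tt` so that both named fields and tuple indices work.
macro_rules! offset_of_sketch {
    ($Container:ty, $field:tt) => {{
        // Stack storage for one instance; it is never initialized or read.
        let uninit = MaybeUninit::<$Container>::uninit();
        let base: *const $Container = uninit.as_ptr();
        // SAFETY: `addr_of!` only computes the address of the place; it does
        // not create a reference to, or read from, the uninitialized memory.
        let field = unsafe { addr_of!((*base).$field) };
        // Pointer-to-integer casts: fine at runtime, but not allowed in `const`.
        (field as *const u8 as usize) - (base as *const u8 as usize)
    }};
}

// Hypothetical example types; `#[repr(C)]` fixes their layout.
#[allow(dead_code)]
#[repr(C)]
struct Vertex {
    tex: [u16; 2],
    pos: [f32; 3],
}

#[allow(dead_code)]
#[repr(C)]
struct KeyVal(u8, u32);
```

For the `#[repr(C)]` `Vertex` from the Guide-level explanation, `offset_of_sketch!(Vertex, pos)` evaluates to 4, matching the value stated there. Two points discussed in this RFC show up directly in the sketch: its pointer-to-integer casts keep it out of constant evaluation, and it materializes stack space for an instance of `$Container`, which the Reference-level explanation encourages the standard implementation to avoid.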

From ade3091191c243cde1a3b2194a7e3fec94bb4949 Mon Sep 17 00:00:00 2001
From: Thom Chiovoloni
Date: Mon, 29 Aug 2022 22:16:37 -0700
Subject: [PATCH 2/7] Update RFC number

---
 text/0000-offset_of.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-offset_of.md b/text/0000-offset_of.md
index 8e1c1f15c57..872b346223b 100644
--- a/text/0000-offset_of.md
+++ b/text/0000-offset_of.md
@@ -1,6 +1,6 @@
 - Feature Name: `offset_of`
 - Start Date: 2022-08-29
-- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- RFC PR: [rust-lang/rfcs#3308](https://github.com/rust-lang/rfcs/pull/3308)
 - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

 # Summary

From e23240e8f53a86105dab420013ea765023d23e8b Mon Sep 17 00:00:00 2001
From: Thom Chiovoloni
Date: Tue, 30 Aug 2022 09:09:21 -0700
Subject: [PATCH 3/7] Review feedback

---
 text/0000-offset_of.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/text/0000-offset_of.md b/text/0000-offset_of.md
index 872b346223b..148d5465959 100644
--- a/text/0000-offset_of.md
+++ b/text/0000-offset_of.md
@@ -175,13 +175,13 @@ the future, however.
    That is, instead of `offset_of!(Foo, quank.zoop.2.quank[4])`, you'll have to
    compute the offsets of each step manually, and sum them.

-4. Finally, types other than tuples, structs, and unions are not currently
+4. Finally, types other than tuples, structs, and unions are currently
    unsupported.

 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation

-`offset_of` is a new macro exported from `core::mem` which has an signature
+`offset_of` is a new macro exported from `core::mem` which has a signature
 similar to the following:

 ```rs
@@ -229,8 +229,8 @@ particular, the implementation should not allocate space for an instance of
    time) would otherwise be hard-coded, so in some cases it may reduce the risk
    of a compatibility hazard.

-2. This is a low level feature that most code won't need to use, so perhaps it
-   is better off left out.
+2. This is a feature most code won't need to use, and it may be confusing to
+   users unfamiliar with low level programming.

 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives
@@ -273,8 +273,10 @@ considered:
    support use in const (due to accessing the address of a raw pointer).
    Changing both of these issues would be challenging, but may be possible.

-   This was not chosen because seems difficult, and would be harder to teach
-   (or read) than `core::mem::offset_of`.
+   This was not chosen because it seems difficult, would be harder to teach (or
+   read) than `core::mem::offset_of`, and is largely orthogonal to whether or
+   not a dedicated field offset API is provided (in other words, fixing those
+   issues seems unlikely to make `offset_of!` appear redundant).

 6. Hold off until this can be integrated into some larger language feature, such
    as C++-style pointer-to-field, Swift-style field paths, ...

From 0afa52aa3b20bde059e4a39dd187b52e2d7676e8 Mon Sep 17 00:00:00 2001
From: Thom Chiovoloni
Date: Fri, 2 Sep 2022 18:28:18 -0700
Subject: [PATCH 4/7] Respond to review feedback

---
 text/0000-offset_of.md | 110 +++++++++++++++++++++++++++++++----------
 1 file changed, 83 insertions(+), 27 deletions(-)

diff --git a/text/0000-offset_of.md b/text/0000-offset_of.md
index 148d5465959..c757df943dc 100644
--- a/text/0000-offset_of.md
+++ b/text/0000-offset_of.md
@@ -229,7 +229,16 @@ particular, the implementation should not allocate space for an instance of
    time) would otherwise be hard-coded, so in some cases it may reduce the risk
    of a compatibility hazard.

-2. This is a feature most code won't need to use, and it may be confusing to
+2. Similarly, this reduces the amount of dynamism that a Rust implementation
+   could use for `repr(Rust)` types.
+
+   For example, it forbids a Rust implementation from varying field offsets of
+   `repr(Rust)` types between executions of the same compiled program (for
+   example, by way of interpretation or code modification), unless it also
+   performs modifications to adjust the result of `offset_of!` (and recompute
+   the values of derived constants, and regenerate relevant code, ...).
+
+3. This is a feature most code won't need to use, and it may be confusing to
    users unfamiliar with low level programming.

 # Rationale and alternatives
@@ -255,6 +264,12 @@ considered:
    implementation from `memoffset`, or the implementation they could implement
    if they computed it manually.

+   Needing the offsets of fields on `#[repr(Rust)]` types is less common, but still
+   useful in code which needs to describe precise details of type layout to
+   some other system, including GPU APIs which accept configurable vertex
+   formats or binary serialization formats that contain descriptions of the
+   field offsets for the record types they contain, etc.
+
 3. Require that all fields of `$Container` be visible at the invocation site,
    rather than just requiring that `$field` is.

@@ -409,6 +424,28 @@ functionality.
 This proposal is intentionally minimal, so there are a number of future
 possibilities.

+## Nested Field Access
+
+In C, expressions like `offsetof(struct some_struct, foo.bar.baz[3].quux)` are
+allowed, where `foo.bar.baz[3].quux` denotes a path to a derived field. This can
+be of somewhat arbitrary complexity, accessing fields of nested structs,
+performing array indexing (often even to access past the end of an array),
+and so on. Similar functionality is offered by
+`MemoryLayout.offset(of:)` in Swift, where more complex language features are
+used to achieve it.
+
+This was omitted from this proposal because it is not commonly used, and can
+generally be replaced (at the cost of convenience) by multiple invocations of
+the macro.
+
+Additionally, in the future similar functionality could be added in a
+backwards-compatible way, either by directly allowing usage like
+`offset_of!(SomeStruct, foo.bar.baz[3].quux)`, or by requiring each field be
+comma-separated, as in `offset_of!(SomeStruct, foo, bar, baz, [3], quux)`.
+
+Note that while this example shows a combination that supports array indexing,
+it's unclear if this is actually desirable for Rust.
+
 ## Enum support (`offset_of!(SomeEnum::StructVariant, field_on_variant)`)

 Eventually, it may be desirable to allow `offset_of!` to access the fields
@@ -417,7 +454,7 @@ with a primitive integer representation, such as `#[repr(C)]`, `#[repr(int)]`,
 or `#[repr(C, int)]` -- where `int` is one of Rust's primitive integer types —
 u8, isize, u128, etc).

-For example, in the future somthing like the following could be allowed:
+For example, in the future something like the following could be allowed:

 ```rs
 use core::mem::offset_of;
@@ -440,27 +477,9 @@ While there are use-cases for this in low level FFI code (similar to the use
 cases for `#[repr(int)]` and `#[repr(C, int)]` enums), this may need further
 design work, and is left to the future.

-## Nested Field Access
-
-In C, expressions like `offsetof(struct some_struct, foo.bar.baz[3].quux)` are
-allowed, where `foo.bar.baz[3].quux` denotes a path to a derived field. This can
-be of somewhat arbitrary complexity, accessing fields of nested structs,
-performing array indexing (often this is used to access past the end of the
-array even), and so on. Similar functionality is offered by
-`MemoryLayout.offset` in Swift, where more complex language features are used to
-achieve it.
-
-This was omitted from this proposal because it is not commonly used, and can
-generally be replaced (at the cost of convenience) by multiple invocations of
-the macro.
-
-Additionally, in the future similar functionality could be added in a fully
-backwards-compatible way, either by directly allowing usage like
-`offset_of!(SomeStruct, foo.bar.baz[3].quux)`, or by requiring each field be
-comma-separated, as in `offset_of!(SomeStruct, foo, bar, baz, [3], quux)`.
-
-Note that while this example shows a combination that supports array indexing,
-it's unclear if this is actually desirable for Rust.
+A drawback is that it is unclear how to support these types in the "Nested Field
+Access" extension proposed above, so if in the future we decide to support one
+of these, a decision may need to be made about the other.

 ## `memoffset::span_of!` Functionality

@@ -474,9 +493,46 @@ would be simple to add a similar macro to `core::mem` in the future.

 [spanof]: https://docs.rs/memoffset/0.6.5/memoffset/macro.span_of.html

-## Support for types with `?Sized` fields.
+## Support for types with unsized fields
+
+### ... via `offset_of_val!`
+
+Currently, we don't support use with unsized types. That is, `(A, B, ..., [T])`
+and/or `(A, B, ..., dyn Foo)`, or their equivalent in structs.
+
+The reason for this is that the offset of the unsized field is not always known,
+such as in the case of the last field in `(Foo, dyn SomeTrait)`, where the
+offset depends on what the concrete type is. Notably, the compiler must read the
+alignment out of the vtable when you access such a field.
+
+This is similar to not being able to determine the size and/or alignment
+of `?Sized` types statically, which we solve by making the user provide the
+instance they're interested in, as in `core::mem::{size_of_val, align_of_val}`,
+so we could provide an analogous `core::mem::offset_of_val!($val, $Type, $field)`
+to support this case.
+
+It would be reasonable to add this in the future, but it is left out for now.
+
+### ... by only forbidding the edge case
+
+The only case where we currently do *not* know the offset of a field statically
+is when the user has requested the offset of the unsized field, and the unsized
+field is a trait object.
+
+There are valid reasons to want to get the offset of:
+1. The fields before the unsized field, as in `offset_of!((i32, dyn Send), 0)`.
+2. The unsized field if it's a `[T]`, `str`, or other case where the offset does
+   not depend on reading the metadata, as in `offset_of!((i32, [u16]), 1)`.
+
+Allowing these is somewhat inconsistent with `align_of`, which could provide the
+alignment in some cases, but forbids it for all `?Sized` types (admittedly,
+allowing `align_of::<[T]>()` is not particularly compelling, as it's always the
+same as `align_of::<T>()`).

-Currently, we don't support `offset_of!((u8, [i32]), 1)`, as `(u8, [i32])` does
-not implement `Sized`.
+Either way, it's trivially backwards compatible for us to eventually start
+allowing these, and for the trailing slice/str case, it seems difficult to pin
+down the cases where it's allowed without risk of complicating potential future
+features (like custom DSTs, extern types, or whatever other new unsized types we
+might want to add).

-This is a mostly artificial restriction, and could be relaxed in the future.
+As such, it's left for future work.

From cdafbed903c0b9a99c33f8acfb40f115c0dd5875 Mon Sep 17 00:00:00 2001
From: Thom Chiovoloni
Date: Fri, 2 Sep 2022 19:36:32 -0700
Subject: [PATCH 5/7] Mention `field-offset`, reword parts of the future work around unsized types

---
 text/0000-offset_of.md | 52 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/text/0000-offset_of.md b/text/0000-offset_of.md
index c757df943dc..3f14a268521 100644
--- a/text/0000-offset_of.md
+++ b/text/0000-offset_of.md
@@ -270,6 +270,9 @@ considered:
    formats or binary serialization formats that contain descriptions of the
    field offsets for the record types they contain, etc.

+   It is also useful for implementing field projection as a library feature, as
+   in the [`field-offset`][fieldoffset] crate.
+
 3. Require that all fields of `$Container` be visible at the invocation site,
    rather than just requiring that `$field` is.

@@ -308,6 +311,26 @@ considered:
    Additionally, this does not generalize as well to some of the extensions in
    future work.

+8. Expose a high level type-safe API instead, where `offset_of` returns a type
+   with phantom parameters for container and field (for example, see the
+   [`field-offset`][fieldoffset] crate, and the notes on it in the Prior Art
+   section below).
+
+   This is not pursued for two reasons:
+
+   1. Field projection is just one of several use cases for getting the offset
+      to a field, rather than the only one, or even the most common one. While
+      the other uses could be supported by a function which returns the
+      `usize`, it seems better to push this kind of thing into the ecosystem.
+
+   2. Adding this to the stdlib risks conflicting with or restricting our ability
+      to add a lang feature for field projection and/or pointer-to-member
+      functionality.
+
+   Neither of these is a deal-breaker, but it seems better to keep this simple and
+   limited. Such a type-safe API can be implemented on top of an `offset_of!`
+   which returns integers.
+ # Prior art [prior-art]: #prior-art @@ -349,9 +372,20 @@ popular, and provide this functionality in different ways. It does not support use during constant evaluation. +- The [`field-offset`][fieldoffset] crate provides a higher level type-safe API + for field offsets similar to the pointer-to-member functionality in C++. It + uses `memoffset` to implement `offset_of!`. + + Calling `field_offset::offset_of!` returns a `FieldOffset` + structure, which transparently wraps a `usize` while providing phantom + annotations to ensure it is used with the correct container and field type. + It uses this to provide some generic field projection functionality, mostly + around `Pin`. + [memoffset]: https://crates.io/crates/memoffset/0.6.5 [bmuckcrate]: https://crates.io/crates/bytemuck/1.12.1 [bmuckoffset]: https://docs.rs/bytemuck/1.12.1/bytemuck/macro.offset_of.html +[fieldoffset]: https://crates.io/crates/field-offset/0.3.4 ## Prior Art: Languages @@ -519,15 +553,19 @@ The only case where we currently do *not* know the offset of a field statically is when the user has requested the offset of the unsized field, and the unsized field is a trait object. -There are valid reasons to want to get the offset of: +It's possible for us to provide the offset for: + 1. The fields before the unsized field, as in `offset_of!((i32, dyn Send), 0)`. -2. The unsized field if it's a `[T]`, `str`, or other case where the offset does - not depend on reading the metadata, as in `offset_of!((i32, [u16]), 1)`. -Allowing these is somewhat inconsistent with `align_of`, which could provide the -alignment in some cases, but forbids it for all `?Sized` types (admittedly, -allowing `align_of::<[T]>()` is not particularly compelling, as it's always the -same as `align_of::<T>()`). +2.
The unsized field itself if it is a type whose offset is known without + reading the metadata, such as `[T]`, `str`, and types that end with them, as + in `offset_of!((i32, [u16]), 1)`, or `offset_of!((u16, (i64, str)), 1)`. + +Allowing these is somewhat inconsistent with `core::mem::align_of`, which could +provide the alignment in some cases such as slices, but instead you must use +`core::mem::align_of_val` for all `?Sized` types (admittedly, allowing +`align_of::<[T]>()` is perhaps not very compelling, as it's always the same as +`align_of::<T>()`). Either way, it's trivially backwards compatible for us to eventually start allowing these, and for the trailing slice/str case, it seems difficult to pin From db48e955742b8cce324d0a04592a40aa7a1cb1a5 Mon Sep 17 00:00:00 2001 From: Thom Chiovoloni Date: Tue, 13 Dec 2022 08:30:11 -0800 Subject: [PATCH 6/7] Add discussion about fields in traits --- text/0000-offset_of.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/text/0000-offset_of.md b/text/0000-offset_of.md index 3f14a268521..9c84d450886 100644 --- a/text/0000-offset_of.md +++ b/text/0000-offset_of.md @@ -574,3 +574,13 @@ features (like custom DSTs, extern types, or whatever other new unsized types we might want to add). As such, it's left for future work. + +## Fields in Traits + +If support for fields in traits is ever added, then it would be an open question +how `offset_of!` behaves when applied to a generic value of a trait type which +has fields. Similarly, if an `offset_of_val!` is added, it would interact with +trait objects of traits that have fields. + +In either case, this could be forbidden or allowed, but decisions along these +lines are deferred for now, as fields in traits do not yet exist.
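Alternative 8 and the `field-offset` entry under Prior Art both rest on the claim that a type-safe API can be layered over an `offset_of!` that returns integers. Below is a minimal sketch of that layering, assuming Rust 1.77 or later for `core::mem::offset_of!`. `TypedOffset`, `new_unchecked`, `get`, `project`, and `Point` are hypothetical names chosen for illustration; this is not the `field-offset` crate's actual API.

```rust
use core::marker::PhantomData;
use core::mem::offset_of;

/// Hypothetical typed field offset: a byte offset tagged with the container
/// type `C` and the field type `F`. Not the API of the `field-offset` crate.
pub struct TypedOffset<C, F> {
    offset: usize,
    _marker: PhantomData<fn(C) -> F>,
}

impl<C, F> TypedOffset<C, F> {
    /// # Safety
    /// `offset` must be the byte offset of a field of type `F` within `C`.
    pub const unsafe fn new_unchecked(offset: usize) -> Self {
        TypedOffset { offset, _marker: PhantomData }
    }

    /// The plain integer offset, as returned by `offset_of!`.
    pub const fn get(&self) -> usize {
        self.offset
    }

    /// Projects a reference to the container onto a reference to the field.
    pub fn project<'a>(&self, container: &'a C) -> &'a F {
        let base = (container as *const C).cast::<u8>();
        // SAFETY: the `new_unchecked` contract guarantees that an initialized,
        // properly aligned `F` lives `offset` bytes into every `C`.
        unsafe { &*base.add(self.offset).cast::<F>() }
    }
}

// Hypothetical container used only for this example.
#[allow(dead_code)]
#[repr(C)]
struct Point {
    x: u32,
    y: u16,
}

fn main() {
    // SAFETY: `offset_of!` gives the offset of `Point::y`, whose type is `u16`.
    let y_offset: TypedOffset<Point, u16> =
        unsafe { TypedOffset::new_unchecked(offset_of!(Point, y)) };

    let p = Point { x: 1, y: 2 };
    assert_eq!(y_offset.get(), 4);
    assert_eq!(*y_offset.project(&p), 2);
    println!("y is at byte {}", y_offset.get());
}
```

The `unsafe` constructor marks the one fact the wrapper cannot check on its own, namely that `F` really is the field's type. Closing that gap, for example with a macro or with language support for field projection, is the kind of design the RFC leaves to the ecosystem or to future work.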
From 56510ce212e0b31cab437250b38bf0cffad74efc Mon Sep 17 00:00:00 2001 From: Waffle Maybe Date: Tue, 10 Jan 2023 02:59:17 +0400 Subject: [PATCH 7/7] Fill in tracking issue and RFC numbers --- text/{0000-offset_of.md => 3308-offset_of.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename text/{0000-offset_of.md => 3308-offset_of.md} (99%) diff --git a/text/0000-offset_of.md b/text/3308-offset_of.md similarity index 99% rename from text/0000-offset_of.md rename to text/3308-offset_of.md index 9c84d450886..0edde3a5231 100644 --- a/text/0000-offset_of.md +++ b/text/3308-offset_of.md @@ -1,7 +1,7 @@ - Feature Name: `offset_of` - Start Date: 2022-08-29 - RFC PR: [rust-lang/rfcs#3308](https://github.com/rust-lang/rfcs/pull/3308) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) +- Rust Issue: [rust-lang/rust#106655](https://github.com/rust-lang/rust/issues/106655) # Summary [summary]: #summary