Can layout change at run-time? #97

RalfJung · 2019-03-12T19:20:23Z

In this forum discussion, the question came up whether layout can change at run-time. I am pretty sure it cannot, but it seems that's something worth saying explicitly. ;)

comex · 2019-03-12T23:41:45Z

Swift has types where the layout is not known until runtime, as part of its ABI stability system, although it can't actually change during runtime. But size_of being const kind of rules that out already in Rust, though it would take a stable const offsetof to be the final nail in the coffin (not a bad thing).

gnzlbg · 2019-03-13T17:17:51Z

@comex

Swift has types where the layout is not known until runtime, as part of its ABI stability system, although it can't actually change during runtime. But size_of being const kind of rules that out already in Rust,

Rust also has types whose layout is not known until runtime like, for example, slices, trait objects, etc. Since mem::size_of has a T: Sized bound it does not apply to them.

@RalfJung

whether layout can change at run-time

What would that even mean? I don't think it means that the layout of a type can "spontaneously" change (e.g. across two volatile reads of the same type).

So the question must imply that there is a certain action that one can take to change the layout of a type. The layout of T: Sized types is fixed at compile-time, so this must be talking about T: ?Sized types.

One can only materialize references to these types, and these references can be modified (e.g. the metadata of a &[T] such that size_of_val returns a different value), but one can't materialize these types themselves, so I don't know how one would be able to change their layout. For example I can have two &[T] to the same contiguous number of elements, and I can modify the length of one of these references (e.g. to contain one less element), but this modification does not change the actual number of elements in the [T].

AFAICT none of the custom DST proposals would allow to, e.g., drop and deallocate the last element behind a &mut [T] or anything similar, such that the length of the [T] would be able to change.

RalfJung · 2019-03-13T18:47:56Z

What would that even mean? I don't think it means that the layout of a type can "spontaneously" change (e.g. across two volatile reads of the same type).

The context here was about relative pointers, which (to my understanding) are basically member pointers. But I suggest you check the forum thread yourself; I linked it above.

RustyYato · 2019-03-14T23:10:58Z

The general question, is Rust going to guarantee that the layouts of Sized types will be determined at compile time and only depends on the type definition itself, and not on the context of where it is stored?

(I think when the discussion was on-going we didn't really think about how to word the question properly, as it was largely understood what we were talking about, so I reworded the question to be more clear about the specific issue)

For some context on how this came up:

Explanation of relative pointers for the unfamiliar

say I have some memory like so

[..., 0x01, 0x02, 0x03, 0x04, 0x03, 0xa4, 0x02, 0x04, 0x02, ...]

where the byte 0x01 is stored at address 0x000F0000 in memory.
Lets say that at address 0x000F0002 we have a 1 byte wide relative pointer to a u32 (this would be represented in my library as RelPtr<u32, i8>). Then that relative pointer will point to 0x000F0002 + 0x03 = 0x000F0005, the integer 0xa4020402. This is because it uses it's own address plus an offset to figure out where it is pointing to.

In my library the relative pointer is defined as such: RelPtr<Pointee, OffsetType> where OffsetType could be any signed integer, and defaults to isize

Context for the issue

I implemented relative pointers my library on crates.io. I advertised that you could use relative pointers to build safe movable self-referential types. This depends on type layouts being only depending on the type itself, and not on the context of where it is stored. One example that came up was if you moved a value into an aggregate, that aggregate may change the layout of the struct for efficiency. A concrete example of this would be a Struct of Arrays, where we could convert an array of structs into a struct of arrays. This would be a space optimization, and it would break my model of self-referential types.

Example of the problem

(note this example is not attempt to be a real world example)

struct Foo {
    value: u32,
    ptr: RelPtr<u32>
}

impl Foo {
    pub fn new(value: u32) -> Self {
        let mut this = Self { value, ptr: RelPtr::null() };
        this.ptr.set(&mut this.value);
        
        this
    }
    
    fn value(&self) -> u32 {
        // this should be safe because Foo's layout doesn't depend on it's context
        // only its definition
        unsafe { *self.ptr.as_ref_unchecked() }
    }
}

fn main() {
    let foo_array = [Foo::new(10), Foo::new(20)];
    
    println!("{}", foo_array[0].value());
    println!("{}", foo_array[1].value());
}

the two prints at the end would be UB if foo_array was allows to change the layouts of Foo to be of the form struct of arrays, as this would change the offsets of the fields inside of Foo, thus invalidating the relative pointer.

gnzlbg · 2019-03-14T23:43:15Z

The general question, is Rust going to guarantee that the layouts of Sized types will be determined at compile time and only depends on the type definition itself, and not on the context of where it is stored?

&T needs to work independently of where T is stored. How would field access, (&T).field, work if the layout of T was allowed to change depending on where the T is stored?

RustyYato · 2019-03-14T23:51:12Z

Yes, that is also what I thought, vorner brought up the issue, and that led to a minor debate which led to this issue.

We weren't sure because this wasn't explicitly mentioned anywhere.

comex · 2019-03-15T04:15:46Z

&T needs to work independently of where T is stored. How would field access, (&T).field, work if the layout of T was allowed to change depending on where the T is stored?

The field access itself could check the pointer value and act differently based on where it's stored, as opposed to the normal behavior of adding a constant offset. However, this is limited by the need to make types movable via memcpy.

gnzlbg · 2019-03-15T07:15:17Z

The field access itself could check the pointer value and act differently based on where it's stored,

I'm still having trouble understanding how we would implement this check. More concretely, suppose one has a Box<T> and a Box<(T, U)>, and the T within the (T, U) differs in layout from the T in Box<T>. Given a &T we don't know statically to which layout it would point.

How would the run-time look up the layout on field access ?

The compiler could automatically put types with different layouts behind an enum, where the discriminant specifies the active layout which is then checked on access, but that changes the layout of these types.

I don't know if it would be enough for the run-time to have a map from addresses to layouts. Different types can live at the same address (e.g. Option<T> and T when niche optimizations trigger), so maybe the run-time could maintain a multi-map, indexed by address first, and by the type of access second.

None of this sounds very realistic.

vorner · 2019-03-15T07:59:55Z

My point was, similar to how the compiler is allowed to create another instance of function with hard-coded specific value of one of the parameters or in C, if it does escape analysis and discovers nobody has a legal way to observe the actual order of fields, it can reorder the fields in the structure. In a similar way I was wondering if the compiler could eg. decide that there are two somehow distinct uses of a struct and generate two mostly independent types.

For that it would have to first prove that the reference of &T₁ (where ₁ is internal variant known by the compiler) never gets to a place where the refenence to &T₂ is expected and vice versa. But for any such proof, the compiler would likely be allowed to ignore any kind of shady or invalid operations and I wasn't 100% sure what the relative pointers did is guaranteed not to be invalid. However, all that checking and deciding about layout would be in compile time.

It could be however allowed to assign T₁ into T₂ with compiler-generated conversion routine (memcpy with reordering). An example could be storing (T₁, U) in a Vec (and interspersing them to save space), but when working with them on the stack, laying it out differently as T₂ for faster access.

I'm not sure if there's anything preventing the compiler from doing such optimisation given it can prove there's no mixup and what it needs to check in the first place.

Anyway, that's not to say I want to propose such optimisation. My point in the discussion was mostly „are you careful enough and do you have a good reason to believe the assumption not only look obvious, but are actually true?“ ‒ which seems to be the case.

gnzlbg · 2019-03-15T08:34:40Z

But for any such proof, the compiler would likely be allowed to ignore any kind of shady or invalid operations and I wasn't 100% sure what the relative pointers did is guaranteed not to be invalid.

These optimizations are valid as long as they don't break legal Rust programs.

I don't think this issue has anything to do with "layout changing at run-time" (these layout optimizations happen at compile-time), and if your program is legal, there is no way for you to tell that they happened (at least within Rust's abstract machine).

The question appears to be whether the example you show in #97 (comment) is legal.

Since the example is not self contained, it is impossible to tell with certainty.

A naive implementation of RelPtr has undefined behavior: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=aca43475408c4b7d1d88193685e2dc02 (just run it under miri).

The example materializes Foo's on the stack, and then moves them into an array and deallocates them from the stack. Since the addresses of the relative pointers point to freed memory on the stack when .value() is called (Rust doesn't have move constructors, so AFAICT this can't ever work correctly for moves), a dangling pointer is dereferenced and the behavior is undefined.

This has nothing to do with "layout changing at run-time". The unsafe code just assumes that after Foo::new the Foo will never be moved, and this assumption is incorrect. You'd need to use a Pin or similar.

gnzlbg · 2019-03-15T08:56:00Z

Maybe a better example is (playground):

struct Foo {
    value: u32,
    offset: isize,
}

impl Foo {
    pub fn new(value: u32) -> Self {
        let mut this = Self { value, offset: 0 };
        let this_ptr = &mut this as *mut _ as *mut u8;
        let value_ptr = &mut this.value as *mut _ as *mut u8;
        this.offset = unsafe { value_ptr.offset_from(this_ptr) };
        this
    }

    pub fn value(&self) -> u32 {
        let self_ptr = self as *const _ as *const u8;
        let value_ptr = unsafe { self_ptr.offset(self.offset) };
        unsafe { *(value_ptr as *const u32) }
    }
}

Given the offset o to a repr(Rust) struct T field f, is it always safe (for all values of type T) to access f by offesting a pointer to the struct by o ?

Right now, this is something we guarantee (https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/structs-and-tuples.md#default-layout-repr-rust):

the compiler can select any layout it likes for each struct on each compilation

I read this as, on each compilation, there is only one layout for each struct type. This does not prevent the compiler from optimizing struct layouts in particular cases when it can prove the optimizations to be correct, but for that it would need to ensure that the semantics of examples like the above, which use relative struct offsets to access struct fields, do not change.

Maybe we should add a note to that section about this?

JakobDegen · 2023-08-13T00:06:54Z

Layouts cannot change at runtime.

For each type, the layout of that type is a value that is provided as a parameter to the AM and stays the same throughout its execution. For example, this means that the offset of a field within a struct is always going to be the same.

However, general caveats:

You must read memory using a pointer with sufficient provenance for that memory. Using a pointer with insufficient provenance to read memory, even if you know which field is "supposed to be at that place in memory," remains UB with all the consequences that implies.
The as-if rule still applies. Specifically, if you use an "external" method such as a debugger to read machine memory, you are not guaranteed to get any particular value, regardless of what values are stored in AM memory.
At least for #[repr(Rust)] structs, there are no restrictions today on what the inputs to the compiler's layout computation may be. That means that the compiler may use things like PGO data or whole-program analyses to inform its layout decision.

@rfcbot close

digama0 · 2023-08-13T00:29:23Z

Is (3) a commitment to never have such restrictions, or just saying that we make no promises about it?

JakobDegen · 2023-08-13T00:35:30Z

I've clarified the text to indicate that it's not a commitment to never have them - that being said, I'm not sure that such a commitment would be meaningful, since we generally have no forwards compatibility promises around additional requirements placed on implementations

rfcbot · 2023-08-13T00:58:21Z

Team member @JakobDegen has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot · 2023-08-13T03:10:57Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2023-08-23T03:17:15Z

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

RalfJung added C-open-question Category: An open question that we should revisit A-layout Topic: Related to data structure layout (`#[repr]`) labels Aug 14, 2019

DeltaF1 mentioned this issue Jul 24, 2023

Do default repr structs have a contiguous memory representation? #431

Open

JakobDegen added S-pending-team-decision Status: There is broad agreement on the question T-opsem labels Jul 25, 2023

rfcbot added proposed-final-comment-period disposition-close final-comment-period labels Aug 13, 2023

rfcbot removed the proposed-final-comment-period label Aug 13, 2023

rfcbot added finished-final-comment-period and removed final-comment-period labels Aug 23, 2023

rfcbot added the to-announce label Aug 23, 2023

RalfJung closed this as completed in rust-lang/opsem-team@ff91b26 Aug 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can layout change at run-time? #97

Can layout change at run-time? #97

RalfJung commented Mar 12, 2019

comex commented Mar 12, 2019

gnzlbg commented Mar 13, 2019

RalfJung commented Mar 13, 2019

RustyYato commented Mar 14, 2019

gnzlbg commented Mar 14, 2019

RustyYato commented Mar 14, 2019

comex commented Mar 15, 2019

gnzlbg commented Mar 15, 2019 •

edited

Loading

vorner commented Mar 15, 2019

gnzlbg commented Mar 15, 2019 •

edited

Loading

gnzlbg commented Mar 15, 2019 •

edited

Loading

JakobDegen commented Aug 13, 2023 •

edited

Loading

digama0 commented Aug 13, 2023

JakobDegen commented Aug 13, 2023

rfcbot commented Aug 13, 2023 •

edited by RalfJung

Loading

rfcbot commented Aug 13, 2023

rfcbot commented Aug 23, 2023

Can layout change at run-time? #97

Can layout change at run-time? #97

Comments

RalfJung commented Mar 12, 2019

comex commented Mar 12, 2019

gnzlbg commented Mar 13, 2019

RalfJung commented Mar 13, 2019

RustyYato commented Mar 14, 2019

Explanation of relative pointers for the unfamiliar

Context for the issue

Example of the problem

gnzlbg commented Mar 14, 2019

RustyYato commented Mar 14, 2019

comex commented Mar 15, 2019

gnzlbg commented Mar 15, 2019 • edited Loading

vorner commented Mar 15, 2019

gnzlbg commented Mar 15, 2019 • edited Loading

gnzlbg commented Mar 15, 2019 • edited Loading

JakobDegen commented Aug 13, 2023 • edited Loading

digama0 commented Aug 13, 2023

JakobDegen commented Aug 13, 2023

rfcbot commented Aug 13, 2023 • edited by RalfJung Loading

rfcbot commented Aug 13, 2023

rfcbot commented Aug 23, 2023

gnzlbg commented Mar 15, 2019 •

edited

Loading

gnzlbg commented Mar 15, 2019 •

edited

Loading

gnzlbg commented Mar 15, 2019 •

edited

Loading

JakobDegen commented Aug 13, 2023 •

edited

Loading

rfcbot commented Aug 13, 2023 •

edited by RalfJung

Loading