Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can layout change at run-time? #97

Closed
RalfJung opened this issue Mar 12, 2019 · 17 comments
Closed

Can layout change at run-time? #97

RalfJung opened this issue Mar 12, 2019 · 17 comments
Labels
A-layout Topic: Related to data structure layout (`#[repr]`) C-open-question Category: An open question that we should revisit disposition-close finished-final-comment-period S-pending-team-decision Status: There is broad agreement on the question T-opsem to-announce

Comments

@RalfJung
Copy link
Member

In this forum discussion, the question came up whether layout can change at run-time. I am pretty sure it cannot, but it seems that's something worth saying explicitly. ;)

@comex
Copy link

comex commented Mar 12, 2019

Swift has types where the layout is not known until runtime, as part of its ABI stability system, although it can't actually change during runtime. But size_of being const kind of rules that out already in Rust, though it would take a stable const offsetof to be the final nail in the coffin (not a bad thing).

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 13, 2019

@comex

Swift has types where the layout is not known until runtime, as part of its ABI stability system, although it can't actually change during runtime. But size_of being const kind of rules that out already in Rust,

Rust also has types whose layout is not known until runtime like, for example, slices, trait objects, etc. Since mem::size_of has a T: Sized bound it does not apply to them.

@RalfJung

whether layout can change at run-time

What would that even mean? I don't think it means that the layout of a type can "spontaneously" change (e.g. across two volatile reads of the same type).

So the question must imply that there is a certain action that one can take to change the layout of a type. The layout of T: Sized types is fixed at compile-time, so this must be talking about T: ?Sized types.

One can only materialize references to these types, and these references can be modified (e.g. the metadata of a &[T] such that size_of_val returns a different value), but one can't materialize these types themselves, so I don't know how one would be able to change their layout. For example I can have two &[T] to the same contiguous number of elements, and I can modify the length of one of these references (e.g. to contain one less element), but this modification does not change the actual number of elements in the [T].

AFAICT none of the custom DST proposals would allow to, e.g., drop and deallocate the last element behind a &mut [T] or anything similar, such that the length of the [T] would be able to change.

@RalfJung
Copy link
Member Author

What would that even mean? I don't think it means that the layout of a type can "spontaneously" change (e.g. across two volatile reads of the same type).

The context here was about relative pointers, which (to my understanding) are basically member pointers. But I suggest you check the forum thread yourself; I linked it above.

@RustyYato
Copy link

The general question, is Rust going to guarantee that the layouts of Sized types will be determined at compile time and only depends on the type definition itself, and not on the context of where it is stored?

(I think when the discussion was on-going we didn't really think about how to word the question properly, as it was largely understood what we were talking about, so I reworded the question to be more clear about the specific issue)


For some context on how this came up:

Explanation of relative pointers for the unfamiliar

say I have some memory like so

[..., 0x01, 0x02, 0x03, 0x04, 0x03, 0xa4, 0x02, 0x04, 0x02, ...]

where the byte 0x01 is stored at address 0x000F0000 in memory.
Lets say that at address 0x000F0002 we have a 1 byte wide relative pointer to a u32 (this would be represented in my library as RelPtr<u32, i8>). Then that relative pointer will point to 0x000F0002 + 0x03 = 0x000F0005, the integer 0xa4020402. This is because it uses it's own address plus an offset to figure out where it is pointing to.

In my library the relative pointer is defined as such: RelPtr<Pointee, OffsetType> where OffsetType could be any signed integer, and defaults to isize

Context for the issue

I implemented relative pointers my library on crates.io. I advertised that you could use relative pointers to build safe movable self-referential types. This depends on type layouts being only depending on the type itself, and not on the context of where it is stored. One example that came up was if you moved a value into an aggregate, that aggregate may change the layout of the struct for efficiency. A concrete example of this would be a Struct of Arrays, where we could convert an array of structs into a struct of arrays. This would be a space optimization, and it would break my model of self-referential types.

Example of the problem

(note this example is not attempt to be a real world example)

struct Foo {
    value: u32,
    ptr: RelPtr<u32>
}

impl Foo {
    pub fn new(value: u32) -> Self {
        let mut this = Self { value, ptr: RelPtr::null() };
        this.ptr.set(&mut this.value);
        
        this
    }
    
    fn value(&self) -> u32 {
        // this should be safe because Foo's layout doesn't depend on it's context
        // only its definition
        unsafe { *self.ptr.as_ref_unchecked() }
    }
}

fn main() {
    let foo_array = [Foo::new(10), Foo::new(20)];
    
    println!("{}", foo_array[0].value());
    println!("{}", foo_array[1].value());
}

the two prints at the end would be UB if foo_array was allows to change the layouts of Foo to be of the form struct of arrays, as this would change the offsets of the fields inside of Foo, thus invalidating the relative pointer.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 14, 2019

The general question, is Rust going to guarantee that the layouts of Sized types will be determined at compile time and only depends on the type definition itself, and not on the context of where it is stored?

&T needs to work independently of where T is stored. How would field access, (&T).field, work if the layout of T was allowed to change depending on where the T is stored?

@RustyYato
Copy link

Yes, that is also what I thought, vorner brought up the issue, and that led to a minor debate which led to this issue.


We weren't sure because this wasn't explicitly mentioned anywhere.

@comex
Copy link

comex commented Mar 15, 2019

&T needs to work independently of where T is stored. How would field access, (&T).field, work if the layout of T was allowed to change depending on where the T is stored?

The field access itself could check the pointer value and act differently based on where it's stored, as opposed to the normal behavior of adding a constant offset. However, this is limited by the need to make types movable via memcpy.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 15, 2019

The field access itself could check the pointer value and act differently based on where it's stored,

I'm still having trouble understanding how we would implement this check. More concretely, suppose one has a Box<T> and a Box<(T, U)>, and the T within the (T, U) differs in layout from the T in Box<T>. Given a &T we don't know statically to which layout it would point.

How would the run-time look up the layout on field access ?

The compiler could automatically put types with different layouts behind an enum, where the discriminant specifies the active layout which is then checked on access, but that changes the layout of these types.

I don't know if it would be enough for the run-time to have a map from addresses to layouts. Different types can live at the same address (e.g. Option<T> and T when niche optimizations trigger), so maybe the run-time could maintain a multi-map, indexed by address first, and by the type of access second.

None of this sounds very realistic.

@vorner
Copy link

vorner commented Mar 15, 2019

My point was, similar to how the compiler is allowed to create another instance of function with hard-coded specific value of one of the parameters or in C, if it does escape analysis and discovers nobody has a legal way to observe the actual order of fields, it can reorder the fields in the structure. In a similar way I was wondering if the compiler could eg. decide that there are two somehow distinct uses of a struct and generate two mostly independent types.

For that it would have to first prove that the reference of &T₁ (where ₁ is internal variant known by the compiler) never gets to a place where the refenence to &T₂ is expected and vice versa. But for any such proof, the compiler would likely be allowed to ignore any kind of shady or invalid operations and I wasn't 100% sure what the relative pointers did is guaranteed not to be invalid. However, all that checking and deciding about layout would be in compile time.

It could be however allowed to assign T₁ into T₂ with compiler-generated conversion routine (memcpy with reordering). An example could be storing (T₁, U) in a Vec (and interspersing them to save space), but when working with them on the stack, laying it out differently as T₂ for faster access.

I'm not sure if there's anything preventing the compiler from doing such optimisation given it can prove there's no mixup and what it needs to check in the first place.

Anyway, that's not to say I want to propose such optimisation. My point in the discussion was mostly „are you careful enough and do you have a good reason to believe the assumption not only look obvious, but are actually true?“ ‒ which seems to be the case.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 15, 2019

But for any such proof, the compiler would likely be allowed to ignore any kind of shady or invalid operations and I wasn't 100% sure what the relative pointers did is guaranteed not to be invalid.

These optimizations are valid as long as they don't break legal Rust programs.

I don't think this issue has anything to do with "layout changing at run-time" (these layout optimizations happen at compile-time), and if your program is legal, there is no way for you to tell that they happened (at least within Rust's abstract machine).


The question appears to be whether the example you show in #97 (comment) is legal.

Since the example is not self contained, it is impossible to tell with certainty.

A naive implementation of RelPtr has undefined behavior: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=aca43475408c4b7d1d88193685e2dc02 (just run it under miri).

The example materializes Foo's on the stack, and then moves them into an array and deallocates them from the stack. Since the addresses of the relative pointers point to freed memory on the stack when .value() is called (Rust doesn't have move constructors, so AFAICT this can't ever work correctly for moves), a dangling pointer is dereferenced and the behavior is undefined.

This has nothing to do with "layout changing at run-time". The unsafe code just assumes that after Foo::new the Foo will never be moved, and this assumption is incorrect. You'd need to use a Pin or similar.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 15, 2019

Maybe a better example is (playground):

struct Foo {
    value: u32,
    offset: isize,
}

impl Foo {
    pub fn new(value: u32) -> Self {
        let mut this = Self { value, offset: 0 };
        let this_ptr = &mut this as *mut _ as *mut u8;
        let value_ptr = &mut this.value as *mut _ as *mut u8;
        this.offset = unsafe { value_ptr.offset_from(this_ptr) };
        this
    }

    pub fn value(&self) -> u32 {
        let self_ptr = self as *const _ as *const u8;
        let value_ptr = unsafe { self_ptr.offset(self.offset) };
        unsafe { *(value_ptr as *const u32) }
    }
}

Given the offset o to a repr(Rust) struct T field f, is it always safe (for all values of type T) to access f by offesting a pointer to the struct by o ?

Right now, this is something we guarantee (https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/structs-and-tuples.md#default-layout-repr-rust):

the compiler can select any layout it likes for each struct on each compilation

I read this as, on each compilation, there is only one layout for each struct type. This does not prevent the compiler from optimizing struct layouts in particular cases when it can prove the optimizations to be correct, but for that it would need to ensure that the semantics of examples like the above, which use relative struct offsets to access struct fields, do not change.

Maybe we should add a note to that section about this?

@RalfJung RalfJung added C-open-question Category: An open question that we should revisit A-layout Topic: Related to data structure layout (`#[repr]`) labels Aug 14, 2019
@JakobDegen JakobDegen added S-pending-team-decision Status: There is broad agreement on the question T-opsem labels Jul 25, 2023
@JakobDegen
Copy link
Contributor

JakobDegen commented Aug 13, 2023

Layouts cannot change at runtime.

For each type, the layout of that type is a value that is provided as a parameter to the AM and stays the same throughout its execution. For example, this means that the offset of a field within a struct is always going to be the same.

However, general caveats:

  1. You must read memory using a pointer with sufficient provenance for that memory. Using a pointer with insufficient provenance to read memory, even if you know which field is "supposed to be at that place in memory," remains UB with all the consequences that implies.
  2. The as-if rule still applies. Specifically, if you use an "external" method such as a debugger to read machine memory, you are not guaranteed to get any particular value, regardless of what values are stored in AM memory.
  3. At least for #[repr(Rust)] structs, there are no restrictions today on what the inputs to the compiler's layout computation may be. That means that the compiler may use things like PGO data or whole-program analyses to inform its layout decision.

@rfcbot close

@digama0
Copy link

digama0 commented Aug 13, 2023

Is (3) a commitment to never have such restrictions, or just saying that we make no promises about it?

@JakobDegen
Copy link
Contributor

I've clarified the text to indicate that it's not a commitment to never have them - that being said, I'm not sure that such a commitment would be meaningful, since we generally have no forwards compatibility promises around additional requirements placed on implementations

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 13, 2023

Team member @JakobDegen has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 13, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 23, 2023

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Topic: Related to data structure layout (`#[repr]`) C-open-question Category: An open question that we should revisit disposition-close finished-final-comment-period S-pending-team-decision Status: There is broad agreement on the question T-opsem to-announce
Projects
None yet
Development

No branches or pull requests

8 participants