Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix structs incorrect memory layout #208

Merged
merged 1 commit into from
Nov 23, 2023

Conversation

germag
Copy link
Collaborator

@germag germag commented Nov 15, 2023

Summary of the PR

Currently, some vhost-user-messages structs are defined with just repr(packed), and also implement the ByteValue trait implying that they can be initialized safely from an array of bytes.

This is not safe, because the default memory representation does not guarantee that the fields will not be reordered.

Note for reviewers

Another issue that I didn't address in this PR is the struct:

/// Single memory region descriptor as payload for ADD_MEM_REG and REM_MEM_REG
/// requests.
#[repr(C)]
#[derive(Copy, Clone, Default)]
pub struct VhostUserInflight {
    /// Size of the area to track inflight I/O.
    pub mmap_size: u64,
    /// Offset of this area from the start of the supplied file descriptor.
    pub mmap_offset: u64,
    /// Number of virtqueues.
    pub num_queues: u16,
    /// Size of virtqueues.
    pub queue_size: u16,
}
...
unsafe impl ByteValued for VhostUserInflight {}

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

@Ablu
Copy link
Collaborator

Ablu commented Nov 16, 2023

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

Wouldn't adding packed here fix that?

Copy link
Collaborator

@Ablu Ablu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike how easy it is to use ByteValued insafely :/. Are we the only one doing this? Does the ecosystem maybe have safer alternatives?

@germag
Copy link
Collaborator Author

germag commented Nov 16, 2023

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

Wouldn't adding packed here fix that?

Yes, but that requires changing the qemu's definition of this structure, and any other VMMs that use it.

We can also fix it by adding a 4-byte explicit padding field to be always 0 (reading an uninitialized explicit padding field is also UB)

@Ablu
Copy link
Collaborator

Ablu commented Nov 16, 2023

Yes, but that requires changing the qemu's definition of this structure, and any other VMMs that use it.

We can also fix it by adding a 4-byte explicit padding field to be always 0 (reading an uninitialized explicit padding field is also UB)

Ah. So QEMU expects a size that includes padding? Adding explicit padding sounds good to me then (it is done in other places too). I do not understand the remark about it always being 0... Why does that matter?

@germag
Copy link
Collaborator Author

germag commented Nov 16, 2023

Yes, but that requires changing the qemu's definition of this structure, and any other VMMs that use it.
We can also fix it by adding a 4-byte explicit padding field to be always 0 (reading an uninitialized explicit padding field is also UB)

Ah. So QEMU expects a size that includes padding? Adding explicit padding sounds good to me then (it is done in other places too). I do not understand the remark about it always being 0... Why does that matter?

Because we will have the same UB problem if the VMM sends a struct with uninitialized padding fields, so we need a guarantee that those fields are initialized to 0 (or any other value)

@Ablu
Copy link
Collaborator

Ablu commented Nov 16, 2023

Because we will have the same UB problem if the VMM sends a struct with uninitialized padding fields, so we need a guarantee that those fields are initialized to 0 (or any other value)

Hm... But for us, reading the memory, the padding will have some value. It may be any value, but it should not be undefined behavior (assuming it is an int type), right? Or do I misunderstand something here? I do not understand why we need to define this to be initialized with a specific value. In my understand we do not care where the memory comes from. We only care that we may access size_of::<Struct>() bytes and that the Struct is only containing integers (no bools, references, ...).

@germag
Copy link
Collaborator Author

germag commented Nov 16, 2023

Because we will have the same UB problem if the VMM sends a struct with uninitialized padding fields, so we need a guarantee that those fields are initialized to 0 (or any other value)

Hm... But for us, reading the memory, the padding will have some value. It may be any value, but it should not be undefined behavior (assuming it is an int type), right? Or do I misunderstand something here? I do not understand why we need to define this to be initialized with a specific value. In my understand we do not care where the memory comes from. We only care that we may access size_of::<Struct>() bytes and that the Struct is only containing integers (no bools, references, ...).

for rust "abstract machine" an area of memory with random bit pattern it's not the same as uninitialized memory

@stefano-garzarella
Copy link
Member

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

Wouldn't adding packed here fix that?

Yes, but that requires changing the qemu's definition of this structure, and any other VMMs that use it.

We can also fix it by adding a 4-byte explicit padding field to be always 0 (reading an uninitialized explicit padding field is also UB)

Is the padding defined in the spec?
If not, QEMU should use packed.

@germag
Copy link
Collaborator Author

germag commented Nov 16, 2023

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

Wouldn't adding packed here fix that?

Yes, but that requires changing the qemu's definition of this structure, and any other VMMs that use it.
We can also fix it by adding a 4-byte explicit padding field to be always 0 (reading an uninitialized explicit padding field is also UB)

Is the padding defined in the spec? If not, QEMU should use packed.

Nop, is not defined in the spec, IMO the correct thing to do is to change the spec to packed as you said

@stefano-garzarella
Copy link
Member

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

Wouldn't adding packed here fix that?

Yes, but that requires changing the qemu's definition of this structure, and any other VMMs that use it.
We can also fix it by adding a 4-byte explicit padding field to be always 0 (reading an uninitialized explicit padding field is also UB)

Is the padding defined in the spec? If not, QEMU should use packed.

mmm, it looks QEMU is not using packed and the padding is not defined :-(
So, maybe the best option is to update the spec with the padding, but old QEMU version will never fill with 0.

@stefano-garzarella
Copy link
Member

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

Wouldn't adding packed here fix that?

Yes, but that requires changing the qemu's definition of this structure, and any other VMMs that use it.
We can also fix it by adding a 4-byte explicit padding field to be always 0 (reading an uninitialized explicit padding field is also UB)

Is the padding defined in the spec? If not, QEMU should use packed.

Nop, is not defined in the spec, IMO the correct thing to do is to change the spec to packed as you said

But this will break the compatibility with old version. Or should we support both here?

@stefano-garzarella
Copy link
Member

Because we will have the same UB problem if the VMM sends a struct with uninitialized padding fields, so we need a guarantee that those fields are initialized to 0 (or any other value)

Hm... But for us, reading the memory, the padding will have some value. It may be any value, but it should not be undefined behavior (assuming it is an int type), right? Or do I misunderstand something here? I do not understand why we need to define this to be initialized with a specific value. In my understand we do not care where the memory comes from. We only care that we may access size_of::<Struct>() bytes and that the Struct is only containing integers (no bools, references, ...).

for rust "abstract machine" an area of memory with random bit pattern it's not the same as uninitialized memory

Even if it is repr(C)?

@Ablu
Copy link
Collaborator

Ablu commented Nov 16, 2023

for rust "abstract machine" an area of memory with random bit pattern it's not the same as uninitialized memory

Well, the volatile-memory abstraction exists exactly because of that. So our read_obj() stuff accounts for Rust's memory rules. Anything (safe) that leaves the volatile-memory abstraction is a copy from the guest memory. So I see no problems from that - as long as it is repr(C) of course.

@germag
Copy link
Collaborator Author

germag commented Nov 16, 2023

Because we will have the same UB problem if the VMM sends a struct with uninitialized padding fields, so we need a guarantee that those fields are initialized to 0 (or any other value)

Hm... But for us, reading the memory, the padding will have some value. It may be any value, but it should not be undefined behavior (assuming it is an int type), right? Or do I misunderstand something here? I do not understand why we need to define this to be initialized with a specific value. In my understand we do not care where the memory comes from. We only care that we may access size_of::<Struct>() bytes and that the Struct is only containing integers (no bools, references, ...).

for rust "abstract machine" an area of memory with random bit pattern it's not the same as uninitialized memory

Even if it is repr(C)?

Yes

@germag
Copy link
Collaborator Author

germag commented Nov 16, 2023

for rust "abstract machine" an area of memory with random bit pattern it's not the same as uninitialized memory

Well, the volatile-memory abstraction exists exactly because of that. So our read_obj() stuff accounts for Rust's memory rules. Anything (safe) that leaves the volatile-memory abstraction is a copy from the guest memory. So I see no problems from that - as long as it is repr(C) of course.

I thought it only took into account the "volatility" of memory, basically not being able to get a reference to memory that can change behind your back, and not elide the reads, etc. but I'm not sure if that allows operating with uninitialized memory, I think are different things.

EDIT:
the vm-memory's VolatileSlice rely on std::ptr::read_volatile and one of the safety requirements is:

pub unsafe fn read_volatile<T>(src: *const T) -> T
...

- src must point to a properly initialized value of type T.

@stefano-garzarella
Copy link
Member

Summary of the PR

Currently, some vhost-user-messages structs are defined with just repr(packed), and also implement the ByteValue trait implying that they can be initialized safely from an array of bytes.

This is not safe, because the default memory representation does not guarantee that the fields will not be reordered.

This PR LGTM, thanks!

Note for reviewers

Another issue that I didn't address in this PR is the struct:

/// Single memory region descriptor as payload for ADD_MEM_REG and REM_MEM_REG
/// requests.
#[repr(C)]
#[derive(Copy, Clone, Default)]
pub struct VhostUserInflight {
    /// Size of the area to track inflight I/O.
    pub mmap_size: u64,
    /// Offset of this area from the start of the supplied file descriptor.
    pub mmap_offset: u64,
    /// Number of virtqueues.
    pub num_queues: u16,
    /// Size of virtqueues.
    pub queue_size: u16,
}
...
unsafe impl ByteValued for VhostUserInflight {}

this struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB.

I think we should discuss this in a separated Issue/PR, but we can't use packed there since old QEMU version will not work.
(Or if we want to use packed we need to discard the padding when reading from the socket)

Currently, some vhost-user-messages structs are defined with just
'repr(packed)', and also implement the 'ByteValue' trait implying that
they can be initialized safely from an array of bytes.

This is not safe, because the default memory representation does not
guarantee that the fields will not be reordered.

Signed-off-by: German Maglione <gmaglione@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants