New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance virtio subsystem for better comformance to virtio spec #809
Enhance virtio subsystem for better comformance to virtio spec #809
Conversation
The VirtIO Spec 1.0, sec 2.4 defines the alignment constraints for VirtIO queue's descriptor table, available ring and used ring. So enforce the alignment constraints in Queue::is_valid(). Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
nit: change the commit summary of bbe0b4c from "virtio: fix a potential attack channel through MMIO transport layer" to something less sensational like: "virtio: harden the MMIO transport spec enforcements" or "virtio: enhance for better comformance to virtio spec". |
Rust style test failed, please run |
563a5f5
to
db07077
Compare
thanks for review, I have updated the changeset according to the review comments:) |
devices/src/virtio/mmio.rs
Outdated
LittleEndian::write_u32(&mut buf[..], 0x8f); | ||
d.write(0x70, &buf[..]); | ||
assert_eq!(d.driver_status, 0x8f); | ||
// TODO: assert!(d.device_activated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why assert!(d.device_activated)
again? Is there any path to de-activate an activated device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reminder that there still two cases unhandled yet:
- virtio device driver writes FALIED flags to driver_status field;
- virtio device driver writes 0 to driver_status fields to reset device;
Still thinking about how to handle above two state transitions which may affect the device_activated flag. The block issue is that VirtioDevice::reset() is defined as optional and the net/block virtio backend driver doesn't support reset yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create issues for outstanding TODOs and reference them in the codebase like:
// TODO (#xxx): issue-xxx-summary
Having said that, please be aware that we've yet found no customer need to support device reset.
devices/src/virtio/mmio.rs
Outdated
|
||
LittleEndian::write_u32(&mut buf[..], 0x0); | ||
d.write(0x70, &buf[..]); | ||
// TODO: assert_eq!(d.driver_status, 0x0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a valid check, uncomment it.
@@ -322,9 +322,6 @@ impl Queue { | |||
|
|||
/// A consuming iterator over all available descriptor chain heads offered by the driver. | |||
pub fn iter<'a, 'b>(&'b mut self, mem: &'a GuestMemory) -> AvailIter<'a, 'b> { | |||
if !self.is_valid(mem) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure about this. Your logic is right, but we need to be very confident there is no way to alter any of these queue params after device activation, otherwise removing this opens up a security hole.
Other reviewers, please think hard on this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed all accesses of fields validated by this function and they are indeed not modified anywhere after VirtDevice::activate()
👍
memory_model/src/guest_memory.rs
Outdated
where | ||
F: FnMut(&MemoryMapping, usize, usize) -> Result<()>, | ||
{ | ||
// Fast path: content doesn't cross MemoryRegion boundary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fast path is the same code as do_in_region()
. Why not call that here directly and in case of Err
move to the slow-path.
Would help de-duplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do_in_region_full() has been removed.
== (DEVICE_ACKNOWLEDGE | DEVICE_DRIVER | DEVICE_FEATURES_OK) => | ||
{ | ||
self.driver_status = v; | ||
if !self.device_activated && self.are_queues_valid() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a comment here:
// If the driver incorrectly sets up the queues, the following check will fail and take the device into an unusable state.
I think it's important to call out this scenario since it's something we can't currently recover from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
memory_model/src/guest_memory.rs
Outdated
@@ -424,6 +451,52 @@ impl GuestMemory { | |||
} | |||
Err(Error::InvalidGuestAddress(guest_addr)) | |||
} | |||
|
|||
/// Read the whole content from one or more MemoryRegions | |||
fn do_in_region_full<F>(&self, guest_addr: GuestAddress, size: usize, mut cb: F) -> Result<()> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see the point to this (big) function. It only adds value when trying to read/write an object which spans across multiple adjacent regions.
We never configure adjacent memory regions. There is either:
- A single contiguous region.
- A region followed by a hole followed by a second (non-adjacent) region.
I would drop do_in_region_full()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. It only matters when enabling memory hot-adding. Will remove do_in_region_full().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi acatangiu,
I just sent out an RFC to complete the virtio device state machine, which may help to review related patches. Will address other review comments later.
Thank!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting an object across memory regions only makes sense if the regions are adjacent. In our case, they aren't. If they were, we could simply merge them into a single region and remove some implementation complexity.
There are at most two memory regions, separated by the 32-bit PCI gap. Check out vmm/src/lib.rs::init_guest_memory
and x86_64/src/lib.rs::arch_memory_regions
for the mem setup code.
devices/src/virtio/mmio.rs
Outdated
if self.check_driver_status(DEVICE_FEATURES_OK, DEVICE_DRIVER_OK | DEVICE_FAILED) { | ||
self.with_queue_mut(f); | ||
} else { | ||
#[cfg(not(release))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn t we want to log this warning on release versions? The MmioDevice::write()
and MmioDevice::read()
have similar log messages and we do log them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to avoid injecting to many log messages to the log system by a malicious guest kernel. And it's ok to remove the conditional compilation.
devices/src/virtio/mmio.rs
Outdated
fn update_driver_status(&mut self, v: u32) { | ||
// match changed bits | ||
match !self.driver_status & v { | ||
DEVICE_ACKNOWLEDGE if self.driver_status == 0 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you say we declare a constant for the 0 value (i.e the value that is used to initialize the driver_status field).
Ok(MmioDevice {
device: device,
device_activated: false,
features_select: 0,
acked_features_select: 0,
queue_select: 0,
interrupt_status: Arc::new(AtomicUsize::new(0)),
interrupt_evt: Some(EventFd::new()?),
driver_status: 0,
config_generation: 0,
queues: queues,
queue_evts: queue_evts,
mem: Some(mem),
})
Even if the probability is small, If we were to change this value to something else in the init code, the state machine algorithm would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if self | ||
.check_driver_status(DEVICE_DRIVER, DEVICE_FEATURES_OK | DEVICE_FAILED) | ||
{ | ||
self.device.ack_features(self.acked_features_select, v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if after setting the DEVICE_DRIVER bit during initialization, a reset is issued by the guest os?
Should we still acknowledge features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A RESET command will cause the device goes into INIT state, and the guest OS need to reinitialize the device from start. So we don't acknowledge the features after a RESET until DEVICE_DRIVER is set again.
_ => { | ||
warn!("unknown virtio mmio register write: 0x{:x}", offset); | ||
return; | ||
} | ||
} | ||
} | ||
0x100...0xfff => return self.device.write_config(offset - 0x100, data), | ||
0x100...0xfff => { | ||
if self.check_driver_status(DEVICE_DRIVER, DEVICE_FAILED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the spec I understand that the config space can only be written after FEATURES_OK
has been set:
Step 4 from 3.1.1 section:
4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
fields to check that it can support the device before accepting it.
Step 7 from 3.1.1
7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and population of virtqueues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, so we only the device state on MmioDevice::write() and let it go in MmioDevice::read().
devices/src/virtio/mmio.rs
Outdated
LittleEndian::write_u32(&mut buf[..], 0x8f); | ||
d.write(0x70, &buf[..]); | ||
assert_eq!(d.driver_status, 0x8f); | ||
// TODO: assert!(d.device_activated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a TODO? Same for the other following TODOs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These TODOs have been covered by the last RFC patch.
memory_model/src/guest_memory.rs
Outdated
} | ||
|
||
/// Returns the address plus the offset if it is in range. | ||
pub fn checked_offset(&self, addr: GuestAddress, offset: usize) -> Option<GuestAddress> { | ||
addr.checked_add(offset) | ||
.and_then(|a| if a < self.end_addr() { Some(a) } else { None }) | ||
if let Some(tgt) = addr.checked_add(offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it would be nice to rename tgt
to something more intuitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed addr as base and tgt as addr, so it becomes addr = base + offset.
By using the above command you can check the coverage of the whole project:
Unfortunately, we still do not have a dynamic process in place for making sure that the coverage does not decrease from a commit to another. |
Great, will handle the coverage issue in next round. |
devices/src/virtio/mmio.rs
Outdated
@@ -183,6 +183,22 @@ impl MmioDevice { | |||
} | |||
} | |||
|
|||
fn reset(&mut self) { | |||
assert!(!self.device_activated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do assert
in release code. I suggest logging an error and moving on (in this case return
-ing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the leftover assert!
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for my mistake, fixed:)
devices/src/virtio/mmio.rs
Outdated
self.interrupt_status.store(0, Ordering::SeqCst); | ||
self.driver_status = 0; | ||
// . Keep interrupt_evt and queue_evts as is. There may be pending | ||
// notificaitons in those eventfds, but nothing will happen other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo in notifications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me (including the final RFC commit handling device reset). Please address outstanding comments and I will approve the PR.
@@ -322,9 +322,6 @@ impl Queue { | |||
|
|||
/// A consuming iterator over all available descriptor chain heads offered by the driver. | |||
pub fn iter<'a, 'b>(&'b mut self, mem: &'a GuestMemory) -> AvailIter<'a, 'b> { | |||
if !self.is_valid(mem) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed all accesses of fields validated by this function and they are indeed not modified anywhere after VirtDevice::activate()
👍
The MMIO based VirtIO devices support the VirtIO 1.0 capability, but the code is a little hard to be understood. Improve code readability by adding a comment. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
The VirtIO queue code in queue.rs assumes all data structures, including the descriptor table, available ring and used ring, are consistent and legal once Queue::is_valid() returns true for an queue. If this assumption is broken, it may cause panic. On the other hand, most fields in a working queue object, including size, ready, desc_table, avail_ring and used_ring, could be modified by the MmioDevice::write() method. That means an untrusted guest kernel could easily break the assumption made by queue.rs, and then cause the firecracker hypervisor panic. So enhance the VirtIO MMIO transport driver to strictly follow the state machine defined in the VirtIO Spec IO. It closes the way for an untrusted guest kernel to modify firecracker internal data structures. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
The lifecycle of VirtIO queue object is as below: 1) MMIO transport driver creates queues 2) MMIO transport validates all queues are valid by calling Queue::is_valid() 3) Passing queues to VirtDevice by calling VirtDevice::activate() 4) VirtDevice device driver creates iterator to access available descritpors 5) MMIO transport reset the VirtDevice by calling VirtDevice::reset() 6) MMIO transport changes/destroys queues The MMIO transport driver won't touch those queues between stage 3 and stage 5, so Queue::is_valid will always return true during stage 4. Get rid of the redundant invocation of Queue::is_valid() in Queue::iter(). Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Fetch the entire VirtIO descriptor all at once instead of fetching each field one by one. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Split the memory copy operations into to cases: 1) Full copy by do_in_region() which must copy all requested memory content or fails. Pass an additional parameter 'size' into function do_in_region(), and validate memory region before copying memory content. By this way the destination memory will be either fully copied or untouched. 2) Partial copy by do_in_region_patial() which copies memory as much as possible and returns size of copied memory content. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
GuestMemory::address_in_range() and GuestMemory::checked_offset() assumes memory address is continuous. It's a dangerous assumption because there is a black hole just below 4G when guest memory is about 4G or more. Then it may cause invalid memory access in VirtIO subsystem. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Complete the virtio device state machine by: 1) Handle writing FAILED flag into "driver status" control field. 2) Handle writing 0 into "driver status" control field to reset the device. If the the backend driver supports reset operation, the device will be reset. Otherwise the device will be put into FAILED state. With these changes, the virtio mmio transport layer implements a full state machine according to Virtio Spec. The backend driver is still optional to support reset operation. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
@jiangliu I see your comments on our review, but I think you forgot to push the changes as well 😄 |
7a9aeaf
to
0f720b6
Compare
|
Hi, |
@jiangliu |
@acatangiu the test is failing as part of our CI. We should take a look. |
Oooopsie, now I see that it actually failed in our C.I. |
@jiangliu I think it was just a system load hiccup. Please rebase your changes against latest upstream master |
e23954a
to
c3a0b37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the RFC
from the last commit summary. Code changes look good 👍 !
c3a0b37
to
680b910
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, @jiangliu - thank you!
Issue #, if available:
Description of changes:
There are some bugs in virtio subsystem which may cause firecracker hypervisor panicking, so enhance the virtio subsystem for better Spec conformance and avoid panic.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.