New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime-rs: update device pci info for vfio and virtio-blk devices #8284
runtime-rs: update device pci info for vfio and virtio-blk devices #8284
Conversation
Hey @amshinde Nice work on this PR! I'm also working on this feature, and I've submitted a WIP PR as well. My PR mainly implements the following features: (1) Due to the different ways that different VMMs handle PCI devices, we hope to provide a general PCIe topology Would you be interested in discussing the difference and a merge? Let me know what you think. Thx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @amshinde - a few comments.
Also, although this works with the default pmem driver, it doesn't wfm using DM:
$ sudo ctr run --runtime "io.containerd.kata.v2" --rm -t quay.io/prometheus/busybox:latest foo /bin/true;echo $?
0
$ sudo ctr run --runtime "io.containerd.kata.v2" --snapshotter devmapper --rm -t quay.io/prometheus/busybox:latest foo-dm /bin/true;echo $?
ctr: failed to create shim task: Others("failed to handler message create container\n\nCaused by:\n 0: create\n 1: handler rootfs\n 2: new block rootfs\n 3: do handle device failed.\n 4: failed to add deivce\n 5: Unexpected PCI address \"0000:00:05.0\" for clh device add"): unknown
1
@@ -7,16 +7,19 @@ | |||
use super::inner::CloudHypervisorInner; | |||
use crate::device::DeviceType; | |||
use crate::BlockDevice; | |||
use crate::HybridVsockConfig; | |||
use crate::HybridVsockDevice; | |||
//use crate::HybridVsockConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented out line.
//DeviceType::ShareFs(sharefs) => self.handle_share_fs_device(sharefs.config).await, | ||
//DeviceType::HybridVsock(hvsock) => self.handle_hvsock_device(&hvsock.config).await, | ||
//DeviceType::Block(block) => self.handle_block_device(block.config).await, | ||
//_ => Err(anyhow!("unhandled device: {:?}", device)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove these commented out lines.
Ok(()) | ||
} | ||
Err(e) => { | ||
self.decrease_attach_count().await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails, we'll return the count error, not the original one (line below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as previous behaviour. If we want to change this, lets discuss. It could go as part of a separate change.
/cc @quanweiZhou, @studychao for input on the |
@Apokleos Yes, lets discuss this and figure out a common approach. |
fb382a4
to
dce711c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.with_context(|| format!("fail to attach passthrough fs {:?}", source)); | ||
|
||
match result { | ||
Ok(__) => Ok(()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
Ok(__) => Ok(()), | |
Ok(_) => Ok(()), |
? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed :) It still compiled with that!
dce711c
to
46c6795
Compare
/test |
This is now fixed with this PR. This PR fixes the device addition bug seen with:
as well as
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @amshinde comments as below.
It's mainly for the removed code generate_guest_pci_path
.
} | ||
if self.bus_mode == VfioBusMode::PCI { | ||
for hostdev in self.devices.iter() { | ||
if hostdev.guest_pci_path.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @amshinde, in case of dragonball, the guest_pci_path is always None, as dragonball doesn't return device info as clh does.
IMO, the None
case of hostdev.guest_pci_path
needs be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -47,29 +44,6 @@ lazy_static! { | |||
Arc::new(RwLock::new(HashMap::new())); | |||
} | |||
|
|||
// map host/guest bdf and the mapping saved into `HOST_GUEST_MAP`, | |||
// and return PciPath. | |||
pub fn generate_guest_pci_path(bdf: String) -> Result<PciPath> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removed generate_guest_pci_path
which will help deduce the correct pci Path, it should be reserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I guess this would be used by dragonball. I'll add it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Apokleos I have added generate_guest_pci_path
back in case it is usefull for dragonball. Plese take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @amshinde
46c6795
to
ae321c1
Compare
/test |
@Apokleos I have addressed your comments. |
@@ -40,7 +40,7 @@ pub enum DeviceType { | |||
ShareFs(ShareFsDevice), | |||
ShareFsMount(ShareFsMountDevice), | |||
HybridVsock(HybridVsockDevice), | |||
Vsock(VsockDevice), | |||
//Vsock(VsockDevice), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now taken out the Vsock value entirely from the enum. I have added an explanation for it in the commit message. Please take a look at it and let me know if it makes sense.
To summarize, Vsock value was removed as the Vsock device is not really used anywhere nor can it be used in its current state as it does not implement the Device trait. This means that it cannot be attached or detached from the hypervisor. What I can deduce from the code is that one of the reasons is that Vsock does not really implement the Clone trait which looks like a prerequisite for the Device trait, as the async trait functions require Device copies to be passed between them.(Vsock struct currently has a File handle which does not implement Clone, so this would require some rework) This is also the reason that I decided to remove Vsock for now, as the change I am introducing here requires devices to implement Clone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bergwolf The implementation is still present, I just removed it from the DeviceType enum. With its current implementation it neither supports attach or detach Device traits, meaning it cant really be hotplugged to a hypervisor. @pmores can add it back in the Device list once he adds support for qemu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gkurz Was not aware of this change drastiaclly breaking @pmores PR. Apologies, I was eager to get this landed to unblock folks to get started with implementing a CI for clh which is borked without this change.I had a look at @pmores PR for qemu support and here are my suggestions to handle the conflicts:
Like I mentioned I have removed VsockDevice from the DeviceType enum list. I think these enums should only specify devices that implement the Device
trait (attach/detach) and are passed down by the DeviceManager. None of these two are true for VsockDevice. VsockDevice is a special case device that is useful internally for qemu for agent communication(currently) and will never be handled/created by the Device Manager. Nor will the device really need to implement the Device trait with attach and detach since it does not make sense to hotplug/unplug this device. What we really want is initializing code to get the guest cid.
From what I have seen from a quick walkthrough of @pmores PR the breakage is due to the fact that he currently maintains a device vector of type DeviceType and stores the VsockDevice inside it which would not work due to change in my PR (and my argument above for the change). My suggestion would be since we only need Vsock for agent communication(for now), would be to store this as an expicit reference to VsockDevice in the QemuInner struct and not handle this in the device list that is semantically meant to handle devices that can be attached/detached. This can then be used in the get_agent_socket
and get_agent_vsock_dev
methods. Something like:
#[derive(Debug)]
pub struct QemuInner {
/// sandbox id
id: String,
qemu_process: Option<Child>,
config: HypervisorConfig,
devices: Vec<DeviceType>,
vsock: VsockDevice,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @amshinde ! I'm aware that the problem this PR creates in the qemu driver can be solved by special-casing, like just about any problem for that matter. I'm not sure though that this is warranted.
I'm not aware of the close semantic ties in the runtime-rs design and current practice that you describe between DeviceType
, the Device
trait and DeviceManager
. Also from Rust standpoint, DeviceType
is an enum so no relations among the variant types are implied. In short, I don't see that a DeviceType
variant implies cloneability or attach/detach functionality.
Also, could you explain the specific problem that DeviceType::Vsock
causes to you? If add_device()
now takes and returns a DeviceType
by value, I think you can do that with copyable and non-copyable types just as well.
What do other runtime-rs devs think, @bergwolf , @studychao , @Tim-0731-Hzt perhaps?
444b5c1
to
f665869
Compare
/test |
Block(virtio-blk) and vfio devices are currently not handled correctly by the agent as the agent is not provided with correct PCI paths for these devices. The PCI paths for these devices can be inferred from the PCI information provided by the hypervisor when the device is added. Hence changing the add_device trait function to return a device copy with PCI info potentially provided by the hypervisor. This can then be provided to the agent to correctly detect devices within the VM. This commit includes implementation for PCI info update for cloud-hupervisor for virtio-blk devices with stubs provided for other hypervisors. Removing Vsock from the DeviceType enum as Vsock currently does not implement the Device Trait, it has no attach and detach trait functions among others. Part of the reason is because these functions require Vsock to implement Clone trait as these functions need cloned copies to be passed down the hypervisor. The change introduced for returning a device copy from the add_device hypervisor trait explicitly requires a device to implement Copy trait. Hence removing Vsock from the DeviceType enum for now, as its implementation is incomplete and not currently used. Note, one of the blockers for adding the Clone trait to Vsock is that it currently includes a file handle which cannot be cloned. For Clone and Device Traits to be implemented for Vsock, it requires an implementation change in the future for it to be cloneable. Fixes: kata-containers#8283 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
If PCI path for block device is not empty for a block device, use that as identifier for agent instead of virt path which is valid only for mmio devices. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Remove earlier functionality that tries to assign PCI path to vfio devices from the host assuming pci slots to start from 1. Get this from the hypervisor instead. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
f665869
to
036b778
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @amshinde LGTM!
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @amshinde
This basically just a partial revert of PR kata-containers#8284, the only new code is adding Vsock support to DeviceType::try_clone(). At this point, if any DeviceType but Vsock is handled by an add_device() implementation it will be cloned via try_clone() if necessary as before. If Vsock is ever passed to add_device() the call will properly return an error since Vsock devices apparently aren't meant to be handled by add_device() anyway. Fixes: kata-containers#8413 Signed-off-by: Pavel Mores <pmores@redhat.com>
runtime-rs: change hypervisor add_device trait to return device copy
Block(virtio-blk) and vfio devices are currently not handled correctly
by the agent as the agent is not provided with correct PCI paths for
these devices.
The PCI paths for these devices can be inferred from the PCI information
provided by the hypervisor when the device is added.
Hence changing the add_device trait function to return a device copy
with PCI info potentially provided by the hypervisor. This can then be
provided to the agent to correctly detect devices within the VM.
This commit includes implementation for PCI info update for
cloud-hupervisor for virtio-blk devices with stubs provided for other
hypervisors.
This PR fixes virtio-block device addition bug. The following works now:
as well as
The PR also fixes bugs around vfio device addition. End to end vfio device add now works:
Fixes: #8283