New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime-rs: refactor qemu driver #9353
Conversation
The idea of this function is to make sure O_CLOEXEC is not set on file descriptors that should be inherited by a child (=hypervisor) process. The approach so far is however rather heavy-handed - clearing *all* flags is unjustifiably aggresive for a low-level function with no knowledge of context whatsoever. This commit refactors the function so that it only does what's expected and renames it accordingly. It also clarifies some of its call sites. Signed-off-by: Pavel Mores <pmores@redhat.com>
generate_netdev_fds() takes NetworkConfig from which it however only needs a host-side network device name. This commit makes it take the device name directly, making the function useful to callers who don't have the whole NetworkConfig but do have the requisite device name. Signed-off-by: Pavel Mores <pmores@redhat.com>
In keeping with architecture of QemuCmdLine implementation we split the functionality into two objects: Netdev to represent and generate the -netdev part and DeviceVirtioNet for the -device virtio-net-<transport> part. This change is a pure refactor, existing functionality does not change. However, we do remove some stub generalizations and govmm-isms, notably: - we remove the NetDev enum since the only network interface types that kata seems to use with qemu are tuntap and macvtap, both of which are implemented by the same -netdev tap - enum DeviceDriver is also left out since it doesn't seem reasonable to try to represent VFIO NICs (which are completely different from virtio-net ones) with the same struct as virtio-net - we also remove VirtioTransport because there's no use for it so far, but with the expectation that it will be added soon. We also make struct Netdev the owner of any vhost-net and queue file descriptors so that their lifetime is tied ultimately to the lifetime of QemuCmdLine automatically, instead of returning the fds to the caller and forcing it to achieve the equivalent functionality but manually. Signed-off-by: Pavel Mores <pmores@redhat.com>
…tioNet This commit replaces the existing NetDevice-based implementation with one using Netdev and DeviceVirtioNet. Signed-off-by: Pavel Mores <pmores@redhat.com>
is_running_in_vm() is enough to figure out whether to disable_modern but it's clumsy and verbose to use. should_disable_modern() streamlines the usage by encapsulating the verbosity. Signed-off-by: Pavel Mores <pmores@redhat.com>
add_network_device() doesn't need to be passed NetworkInfo since it already has access to the full HypervisorConfig. Also, one of the goals of QemuCmdLine interface's design is to avoid coupling between QemuCmdLine and the hypervisor crate's device module, if at all possible. That's why add_network_device() shouldn't take device module's NetworkConfig but just parts that are useful in add_network_device()'s implementation. Signed-off-by: Pavel Mores <pmores@redhat.com>
Make file descriptors to be passed to qemu owned by QemuCmdLine. See commit 52958f17cd for more explanation. Signed-off-by: Pavel Mores <pmores@redhat.com>
The remaining code in network.rs was mostly moved to utils.rs which seems better home for these utility functions anyway (and a closely related function open_named_tuntap() has already lived there). ToString implementation for Address was removed after some consideration. Address should probably ideally implement Display (as per RFC 565) which would also supply a ToString implementation, however it implements Debug instead, probably to enable automatic implementation of Debug for anything that Address is a member of, if for no other reason. Rather than having two identical functions this commit simply switches to using the Debug implementation for printing Address on qemu command line. Fixes kata-containers#9352 Signed-off-by: Pavel Mores <pmores@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @pmores. I was also thinking of why we need a separate network.rs
for the command construction. I also verified that runtime-rs works with this change on s390x.
Thanks for your review @BbolroC ! I'm not saying though that we won't ever split it into a bunch of files at some point! When that happens though I'd like it to be done in a clean and symmetric way. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments but I don't see anything that would prevent merging, so let's go for it !
Thanks @pmores !
cur_flags | ||
))?; | ||
new_flags.remove(fcntl::FdFlag::FD_CLOEXEC); | ||
if let Err(err) = fcntl::fcntl(rawfd, fcntl::FcntlArg::F_SETFD(new_flags)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth to note that FD_CLOEXEC has been the only flag forever and I'm not sure it will ever change 😉 but this is indeed how things should done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this might look a bit pedantic. Still, not everybody is necessarily aware of how many possible flags there are (I know I wasn't until fairly recently ;-)), so I considered it better to state the intent directly, for the benefit of those readers if for no other reason.
impl DeviceVirtioNet { | ||
fn new(netdev_id: &str, mac_address: Address) -> DeviceVirtioNet { | ||
DeviceVirtioNet { | ||
device_driver: "virtio-net-pci".to_owned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this only works for PCI based platforms, i.e. x86_64 and POWER. s390x would need "virtio-net-ccw"
here. DeviceVirtioNet
should be added a bus_type: VirtioBusType
field just like DeviceVirtioBlk
for this to work.
I'm not asking this to be addressed now since this PR claims to be pure refactoring.
Cc @BbolroC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This is unfortunately how much of this kind of work has to be done though, now and in the future, since I can't really develop support for archs I have no access to and thus cannot test. Exactly as you write, this is partial support for PCI only, to be completed later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I can see the following log from go-runtime:
-device driver=virtio-net-ccw,netdev=network-0,mac=d6:f0:4c:10:8d:65,mq=on,devno=fe.0.0006
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there isn't an actual scenario that mandates to have a PCI device on the QEMU command line for s390x, I'd rather not support PCI at all there. @BbolroC can hopefully shed some light on what is needed for Z.
}; | ||
if nested { | ||
virtio_net_device.set_disable_modern(true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kinda invalidates the "No changes in actual behaviour are foreseen, this is a pure refactor."
claim in the PR description but since this is a valuable fix and this is still experimental code, I guess it's okay to hijack the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I didn't realise that.
@@ -65,15 +64,11 @@ impl QemuInner { | |||
info!(sl!(), "Starting QEMU VM"); | |||
let netns = self.netns.clone().unwrap_or_default(); | |||
|
|||
// CAUTION: since 'cmdline' contains file descriptors that have to stay | |||
// open until spawn() is called to launch qemu later in this function, | |||
// 'cmdline' has to live at least until spawn() is called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would obviously be the case if ownership goes to the command
variable using https://crates.io/crates/command-fds and this comment wouldn't be needed.
Possible improvement in a follow-up PR. In anycase, the change does the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I only added the comment because I thought you'd appreciate it. ;-)
This was discussed off-line so just to summarise my stance for the public record: for this to become a problem, someone would need to drop cmdline
and before doing so, store the cmdline
's output somewhere. I don't quite see how this could happen inadvertently, seeing as someone would need to work to make the problem happen.
Also, if that did actually happen somehow, the problem would be very conspicuous since it would break everything (among other problems, it would prevent runtime-rs from talking with kata-agent so launch would fail immediately and consistently).
All in all, if I'm wrong and this problem actually materialises I'll be very curious to know how it happened.
/test |
Since PR #8866 has been merged the qemu driver in runtime-rs features a reasonable support of networking. However the implementation leaves a bunch of things to be desired. It looks much more like govmm than runtime-rs's own qemu code, contains numerous stub abstractions that are unused and only make code harder to understand, conflicts with
QemuCmdLine
interface & implementation principles etc.This PR aims to fix those shortcomings and make network device support an integral part of qemu-rs. No changes in actual behaviour are foreseen, this is a pure refactor.
As a logical part of this effort, ownership and management of file descriptors to be passed to qemu is moved to QemuCmdLine (so far they have been managed by the
QemuInner::start_vm()
function's topmost scope which was quite inappropriate given the way, way higher level of abstraction this function operates at). Also this change is just a refactor, the expected runtime behaviour of the code does not change (the descriptors were and are closed onstart_vm()
exit).