VM disk hotplug issue (running out of hotplug slots) #1086
Comments
$ incus launch images:ubuntu/24.04 v1 --vm
Launching v1
$ incus config device add v1 etc disk source=/etc/ path=/mnt/etc
Device etc added to v1
$ incus exec v1 -- df -h /mnt/etc
Filesystem      Size  Used Avail Use% Mounted on
incus_etc        90G   25G   61G  29% /mnt/etc

Can you show a full incus config show --expanded ostree3?
Also, any chance you can test on an up to date version of Incus (6.0.1 for LTS, 6.3 for non-LTS)? |
I'm a bit time/resource constrained at the moment but will have a go. Looking at the code I did wonder if having other virtiofs shares already defined at start up was part of the problem?
|
I can't easily upgrade to 6.0.1 - I have some VMs running that I really don't want to stop. I will look at spinning up another Incus install somewhere though. For the moment, I did create a fresh VM, started it, and added two virtiofs mounts OK. I then shut it down and restarted it, added one more OK, and then the fourth failed. This might be a question of working backwards from the code to work out what the failing condition is. |
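For reference, a rough sketch of the sequence described above. The VM name, share names, and source paths are invented for illustration; the exact commands and paths from the report are not shown in the thread.

$ incus launch images:ubuntu/24.04 testvm --vm
$ incus config device add testvm share1 disk source=/srv/a path=/mnt/a
$ incus config device add testvm share2 disk source=/srv/b path=/mnt/b
$ incus stop testvm
$ incus start testvm
$ incus config device add testvm share3 disk source=/srv/c path=/mnt/c
$ incus config device add testvm share4 disk source=/srv/d path=/mnt/d   # this fourth hotplug is the one that failed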
Here's the PCI topology after the tests described above:
|
Oops, sorry, missed this. I assume you meant for the VM, not the (failed) virtiofs disk device:
|
We don't restart workloads on upgrade; only the control plane (API) goes down during the upgrade. |
Anyway, it's likely caused by the high-ish number of devices. |
Oh, worth knowing, thanks! I did wonder if that was the case but couldn't quickly turn up the right docs.
My brain isn't working brilliantly at the moment, but poking around in the code I did wonder if it should be trying to add devices to one of the existing buses rather than create a new bus? I also noticed the block devices don't seem to end up in qemu.conf - I assume they're now set up using the QEMU monitor when the VM is started? I wondered if the virtiofs stuff could take the same approach - at the moment I think there are different code paths for hotplug vs pre-configured mounts? But I very much did get lost in the code ... |
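To illustrate the kind of monitor-driven setup being speculated about here: the following is only a generic sketch of attaching a block device over QMP, with a made-up monitor socket path, node name, and bus ID; it is not a claim about the exact calls Incus makes internally.

$ socat - UNIX-CONNECT:/path/to/qemu.monitor <<'EOF'
{"execute": "qmp_capabilities"}
{"execute": "blockdev-add", "arguments": {"driver": "file", "filename": "/srv/disk1.img", "node-name": "disk1"}}
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "drive": "disk1", "bus": "qemu_pcie3", "id": "dev-disk1"}}
EOF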
BTW, can confirm this does happen on 6.0.1 too. |
I'll need to look at the logic again, but I thought we made all the disks be hotplug as we want the ability to add/remove them. The way things are supposed to work is that at startup we allocate PCIe root addresses for all the stuff that we're going to hotplug through QMP. Then we allocate an additional 8 PCIe root addresses to allow for things to be added later on. We can't alter the PCIe root once the VM is running, so given that limited hotplug/hot-remove works, the core of the logic seems fine. I suspect we just have an issue where we're somehow not properly pre-allocating some stuff, basically making your boot time disks already use the "spare" slots, at which point you'd have run out of slots and get the error. So basically a few different things:
|
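As an illustration of the pre-allocation idea described above: on a q35 machine, hotplug happens into PCIe root ports, so a pool of them has to exist before the guest starts. The flags, addresses, and chassis numbers below are invented for the sketch and are not Incus's actual generated configuration; only the qemu_pcieN naming pattern comes from the issue report.

$ qemu-system-x86_64 -machine q35 -nographic \
    -device pcie-root-port,id=qemu_pcie1,bus=pcie.0,addr=0x2.0x0,chassis=1,multifunction=on \
    -device pcie-root-port,id=qemu_pcie2,bus=pcie.0,addr=0x2.0x1,chassis=2 \
    -device pcie-root-port,id=qemu_pcie3,bus=pcie.0,addr=0x2.0x2,chassis=3

Devices hotplugged later through QMP get attached to one of these ports (e.g. bus=qemu_pcie3); whichever ports are still free after boot are the "spare" hotplug slots being discussed, and no new ports can be added once the guest is running.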
Looking into this one now |
Starting with an empty VM and attempting to add 10 disks which require a PCIe address (using io.bus=nvme to force that), I'm getting:
So we can see that we can add at most 3 additional devices before running out of slots. I'll have to tweak things a bit because the number is supposed to be 4, not 3, and we definitely want a much nicer error when hitting the limit of remaining hotplug slots. |
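A sketch of the kind of test being described, assuming a throwaway VM and file-backed disk sources; the VM name, device names, and source paths are illustrative rather than the exact commands that were run.

$ incus launch images:ubuntu/24.04 testvm --vm
$ for i in $(seq 1 10); do incus config device add testvm disk$i disk source=/srv/images/disk$i.img io.bus=nvme; done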
Doing a quick test here after doubling the number of hotplug slots to 8, we can see that there's something off in the logic as the first hotplug slot isn't used:
|
Okay, so that's where we get to what you were pointing out earlier: the logic is a bit limited in that it basically assumes that every device in the devices list uses a PCIe slot. It doesn't consider the fact that devices that were present at boot time don't count towards the hotplug quota, nor that a number of devices simply don't need a PCIe address at all. The cleanest option would be to fetch a list of addresses from QEMU directly, I'm going to look at what |
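For context on what "fetch a list of addresses from QEMU directly" can look like: query-pci is the QMP command that reports each PCI bus and the slots/devices currently in use on it. The socket path below is a placeholder, and this is only a generic QMP session sketch, not the code path Incus ended up using.

$ socat - UNIX-CONNECT:/path/to/qemu.monitor <<'EOF'
{"execute": "qmp_capabilities"}
{"execute": "query-pci"}
EOF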
Interesting, we actually do have a mapping for |
Got a reliable way to handle things which is also much simpler than the current logic, win-win. |
Use QMP PCI slot information rather than guessing at usable PCIe slots. Closes lxc#1086 Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
Required information
Distribution: incus in docker container running on Balena (!)
The output of "incus info" - incus-info.txt
Issue description
virtiofs hotplug seems to fail whenever I try to use it:
I'm wondering if this is a logic error in the code here: incus/internal/server/instance/drivers/driver_qemu.go, lines 2311 to 2332 at commit 3ce8af0.
I don't fully understand it, but if I look in the raw qemu config for this VM, the existing entries for all the virtiofs/9p devices show the bus as "qemu_pcie2", which makes me think this code should be doing ... something else!
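For anyone wanting to check the same thing, a sketch of how the generated config can be inspected, assuming the instance's log directory holds the qemu.conf that Incus generates; both the path and the exact driver string to grep for are assumptions and may differ between setups.

$ grep -B2 -A4 'vhost-user-fs' /var/log/incus/<instance>/qemu.conf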