The template below is mostly useful for bug reports and support questions.
Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
Required information
Distribution: Ubuntu
Distribution version: 22.04
The output of
lxc-start --version
5.0.0~git2209-g5a7b9ce67
lxc-checkconfig
LXC version 5.0.0~git2209-g5a7b9ce67
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-6.2.0-26-generic
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
--- Control groups ---
Cgroups: enabled
Cgroup namespace: enabled
Cgroup v1 mount points:
Cgroup v2 mount points:
/sys/fs/cgroup
Cgroup v1 systemd controller: missing
Cgroup v1 freezer controller: missing
Cgroup ns_cgroup: required
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled, not loaded
Macvlan: enabled, not loaded
Vlan: enabled, not loaded
Bridges: enabled, loaded
Advanced netfilter: enabled, loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, not loaded
FUSE (for use with lxcfs): enabled, not loaded
--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities:
Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig
uname -a
Linux q1 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Issue description
I have an Ubuntu 22.04 system with multiple GPUs and the NVIDIA 535 server driver installed; persistence mode is off. I pass individual GPUs to a VM via PCI passthrough. When I pass a single GPU to the VM, start the VM and then stop it, the GPU is not returned to the host (i.e. nvidia-smi no longer shows it). When I pass multiple GPUs to the VM, start it and then stop it, the GPU with the lowest PCI address on the host is not returned to the host (again, nvidia-smi no longer shows it), but the other GPUs are returned just fine.
When I start the VM again, the GPUs are visible inside it. However, if I start a container with nvidia-driver passthrough, only the GPUs currently visible on the host (i.e. all installed GPUs minus those that were not returned from the VM earlier) are visible in the container. The only hint I can find is the "Failed to stop device" error in syslog.
Steps to reproduce
Run nvidia-smi -L on the host.
Create a VM with a single GPU via passthrough.
Start the VM.
Stop the VM.
Run nvidia-smi -L on the host (the GPU that was passed through to the VM is no longer listed).
Create a VM with multiple GPUs via passthrough.
Start the VM.
Stop the VM.
Run nvidia-smi -L on the host (of the GPUs that were passed through to the VM, the one with the lowest PCI address on the host is also no longer listed).
Run a container with nvidia-driver passthrough (it sees the same GPUs as the host).
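The before/after comparison in the steps above can be automated. The following is a minimal sketch, assuming that nvidia-smi -L prints one line per GPU in the form "GPU <index>: <name> (UUID: <uuid>)". The function names are hypothetical helpers, not part of any NVIDIA or LXD tooling. GPUs are matched by UUID rather than by index, because the index of the remaining GPUs may shift once one of them disappears.

```python
import re

# Assumed line format of `nvidia-smi -L`: "GPU 0: <name> (UUID: GPU-...)".
GPU_LINE = re.compile(r"^GPU \d+: (?P<name>.+?) \(UUID: (?P<uuid>[^)]+)\)\s*$")


def parse_gpu_list(output: str) -> dict:
    """Map GPU UUID -> product name for every GPU line in the output."""
    gpus = {}
    for line in output.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus[match.group("uuid")] = match.group("name")
    return gpus


def missing_gpus(before: str, after: str) -> dict:
    """Return the GPUs listed in `before` that are absent from `after`."""
    seen_before = parse_gpu_list(before)
    seen_after = parse_gpu_list(after)
    return {
        uuid: name
        for uuid, name in seen_before.items()
        if uuid not in seen_after
    }
```

To use it, capture the output of nvidia-smi -L once before starting the VM and once after stopping it (for example with subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout) and pass both strings to missing_gpus.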
Information to attach
VM log (lxc info --show-log vm2)
Name: vm2
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Created: 2023/08/07 22:14 UTC
Last Used: 2023/08/07 23:19 UTC
Log:
qemu-system-x86_64: Issue while setting TUNSETSTEERINGEBPF: Invalid argument with fd: 83, prog_fd: -1
any relevant kernel output (syslog), the single GPU case
Aug 7 23:18:53 q1 kernel: [ 846.086912] vfio-pci 0000:ca:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none
Aug 7 23:18:53 q1 snapd[2334]: udevmon.go:149: udev event error: Unable to parse uevent, err: cannot parse libudev event: invalid env data
Aug 7 23:18:54 q1 kernel: [ 846.548326] xhci_hcd 0000:ca:00.2: remove, state 4
Aug 7 23:18:54 q1 kernel: [ 846.548343] usb usb10: USB disconnect, device number 1
Aug 7 23:18:54 q1 kernel: [ 846.549060] xhci_hcd 0000:ca:00.2: USB bus 10 deregistered
Aug 7 23:18:54 q1 kernel: [ 846.549083] xhci_hcd 0000:ca:00.2: remove, state 4
Aug 7 23:18:54 q1 kernel: [ 846.549091] usb usb9: USB disconnect, device number 1
Aug 7 23:18:54 q1 kernel: [ 846.550896] xhci_hcd 0000:ca:00.2: USB bus 9 deregistered
Aug 7 23:18:54 q1 kernel: [ 846.653021] kauditd_printk_skb: 9 callbacks suppressed
Aug 7 23:18:54 q1 kernel: [ 846.653026] audit: type=1400 audit(1691450334.129:54): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-vm2_</var/snap/lxd/common/lxd>" pid=5316 comm="apparmor_parser"
Aug 7 23:18:53 q1 snapd[2334]: message repeated 3 times: [ udevmon.go:149: udev event error: Unable to parse uevent, err: cannot parse libudev event: invalid env data]
Aug 7 23:18:55 q1 systemd[3823]: Started snap.lxd.lxc.b9b13195-c7c3-46d4-842a-856565db2c99.scope.
Aug 7 23:19:13 q1 kernel: [ 865.800363] vfio-pci 0000:ca:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 7 23:19:13 q1 kernel: [ 865.800386] vfio-pci 0000:ca:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 7 23:19:46 q1 systemd[3823]: Started snap.lxd.lxc.0a424bc8-95d2-4cb9-bdd0-468d3dbce737.scope.
Aug 7 23:19:51 q1 systemd[3823]: Started snap.lxd.lxc.63564057-7dd7-462c-9548-3a5153ddd1e7.scope.
Aug 7 23:19:51 q1 systemd[1]: Starting Cleanup of Temporary Directories...
Aug 7 23:19:51 q1 systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Aug 7 23:19:51 q1 systemd[1]: Finished Cleanup of Temporary Directories.
Aug 7 23:19:54 q1 kernel: [ 907.246377] vfio-pci 0000:ca:00.0: Relaying device request to user (#0)
Aug 7 23:20:01 q1 kernel: [ 913.710624] vfio-pci 0000:ca:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Aug 7 23:20:01 q1 kernel: [ 913.711376] vfio-pci 0000:ca:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Aug 7 23:20:01 q1 lxd.daemon[3076]: time="2023-08-07T23:20:01Z" level=error msg="Failed to stop device" device=gpu3 err="Failed probing device \"0000:ca:00.0\" via \"/sys/bus/pci/drivers_probe\": write /sys/bus/pci/drivers_probe: invalid argument" instance=vm2 instanceType=virtual-machine project=default
Aug 7 23:20:01 q1 systemd-networkd[2222]: mac6293c2ac: Link DOWN
Aug 7 23:20:01 q1 systemd-networkd[2222]: mac6293c2ac: Lost carrier
Aug 7 23:20:01 q1 kernel: [ 913.898141] audit: type=1400 audit(1691450401.373:55): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-vm2_</var/snap/lxd/common/lxd>" pid=10366 comm="apparmor_parser"
Aug 7 23:32:42 q1 systemd[3823]: Started snap.lxd.lxc.44a0582a-97eb-4f56-9149-a7b6f2afec5b.scope.
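The one LXD error in the log above is easy to miss among the kernel messages. Below is a small sketch for pulling such entries out of a syslog dump. The regular expression is derived only from the "Failed to stop device" line shown in this report, so it is an assumption about the message layout and not a stable LXD log format. The function name is hypothetical.

```python
import re

# Pattern modelled on the lxd.daemon line in this report (an assumption,
# not a documented format). The PCI address appears inside escaped quotes.
FAILED_STOP = re.compile(
    r'msg="Failed to stop device" device=(?P<device>\S+) '
    r'err="Failed probing device \\"(?P<pci>[0-9a-fA-F:.]+)\\"'
    r'.*? instance=(?P<instance>\S+)'
)


def failed_device_stops(log_text: str) -> list:
    """Return (instance, device, pci_address) for each matching log line."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_STOP.search(line)
        if match:
            results.append(
                (match.group("instance"), match.group("device"), match.group("pci"))
            )
    return results
```

Run over the syslog excerpts in this report, this would be expected to yield one entry per VM stop, naming the LXD device and the PCI address whose re-probe failed.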
any relevant kernel output (syslog), two GPU case
Aug 7 23:45:38 q1 kernel: [ 2450.861745] vfio-pci 0000:17:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none
Aug 7 23:45:38 q1 snapd[2334]: udevmon.go:149: udev event error: Unable to parse uevent, err: cannot parse libudev event: invalid env data
Aug 7 23:45:38 q1 kernel: [ 2451.339448] xhci_hcd 0000:17:00.2: remove, state 4
Aug 7 23:45:38 q1 kernel: [ 2451.339464] usb usb4: USB disconnect, device number 1
Aug 7 23:45:38 q1 kernel: [ 2451.340164] xhci_hcd 0000:17:00.2: USB bus 4 deregistered
Aug 7 23:45:38 q1 kernel: [ 2451.340188] xhci_hcd 0000:17:00.2: remove, state 4
Aug 7 23:45:38 q1 kernel: [ 2451.340197] usb usb3: USB disconnect, device number 1
Aug 7 23:45:38 q1 kernel: [ 2451.341944] xhci_hcd 0000:17:00.2: USB bus 3 deregistered
Aug 7 23:45:40 q1 kernel: [ 2453.384621] vfio-pci 0000:31:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none
Aug 7 23:45:41 q1 kernel: [ 2453.867449] xhci_hcd 0000:31:00.2: remove, state 4
Aug 7 23:45:41 q1 kernel: [ 2453.867464] usb usb6: USB disconnect, device number 1
Aug 7 23:45:41 q1 kernel: [ 2453.868123] xhci_hcd 0000:31:00.2: USB bus 6 deregistered
Aug 7 23:45:41 q1 kernel: [ 2453.868144] xhci_hcd 0000:31:00.2: remove, state 4
Aug 7 23:45:41 q1 kernel: [ 2453.868151] usb usb5: USB disconnect, device number 1
Aug 7 23:45:41 q1 kernel: [ 2453.869683] xhci_hcd 0000:31:00.2: USB bus 5 deregistered
Aug 7 23:45:41 q1 kernel: [ 2453.966981] audit: type=1400 audit(1691451941.446:56): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-vm2_</var/snap/lxd/common/lxd>" pid=11010 comm="apparmor_parser"
Aug 7 23:46:00 q1 kernel: [ 2472.883434] vfio-pci 0000:17:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 7 23:46:00 q1 kernel: [ 2472.883457] vfio-pci 0000:17:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 7 23:46:00 q1 kernel: [ 2473.055433] vfio-pci 0000:31:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 7 23:46:00 q1 kernel: [ 2473.055455] vfio-pci 0000:31:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 7 23:45:40 q1 snapd[2334]: message repeated 7 times: [ udevmon.go:149: udev event error: Unable to parse uevent, err: cannot parse libudev event: invalid env data]
Aug 7 23:47:16 q1 systemd[3823]: Started snap.lxd.lxc.c918eda7-03e8-4d84-9cb2-c9e1b4d6bfa2.scope.
Aug 7 23:49:01 q1 kernel: [ 2653.889634] vfio-pci 0000:31:00.0: Relaying device request to user (#0)
Aug 7 23:49:08 q1 kernel: [ 2660.602855] vfio-pci 0000:31:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Aug 7 23:49:08 q1 kernel: [ 2660.603292] nvidia 0000:31:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Aug 7 23:49:08 q1 kernel: [ 2660.690297] snd_hda_intel 0000:31:00.1: Disabling MSI
Aug 7 23:49:08 q1 kernel: [ 2660.690325] snd_hda_intel 0000:31:00.1: Handle vga_switcheroo audio client
Aug 7 23:49:08 q1 kernel: [ 2660.714786] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:30/0000:30:02.0/0000:31:00.1/sound/card0/input19
Aug 7 23:49:08 q1 kernel: [ 2660.714916] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:30/0000:30:02.0/0000:31:00.1/sound/card0/input20
Aug 7 23:49:08 q1 kernel: [ 2660.715088] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:30/0000:30:02.0/0000:31:00.1/sound/card0/input21
Aug 7 23:49:08 q1 kernel: [ 2660.715283] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:30/0000:30:02.0/0000:31:00.1/sound/card0/input22
Aug 7 23:49:08 q1 snapd[2334]: udevmon.go:149: udev event error: Unable to parse uevent, err: cannot parse libudev event: invalid env data
Aug 7 23:49:08 q1 kernel: [ 2660.726602] xhci_hcd 0000:31:00.2: xHCI Host Controller
Aug 7 23:49:08 q1 kernel: [ 2660.726615] xhci_hcd 0000:31:00.2: new USB bus registered, assigned bus number 3
Aug 7 23:49:08 q1 kernel: [ 2660.727221] xhci_hcd 0000:31:00.2: hcc params 0x0180ff05 hci version 0x110 quirks 0x0000000000000010
Aug 7 23:49:08 q1 kernel: [ 2660.727606] xhci_hcd 0000:31:00.2: xHCI Host Controller
Aug 7 23:49:08 q1 kernel: [ 2660.727610] xhci_hcd 0000:31:00.2: new USB bus registered, assigned bus number 4
Aug 7 23:49:08 q1 kernel: [ 2660.727613] xhci_hcd 0000:31:00.2: Host supports USB 3.1 Enhanced SuperSpeed
Aug 7 23:49:08 q1 kernel: [ 2660.727661] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.02
Aug 7 23:49:08 q1 kernel: [ 2660.727664] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Aug 7 23:49:08 q1 kernel: [ 2660.727666] usb usb3: Product: xHCI Host Controller
Aug 7 23:49:08 q1 kernel: [ 2660.727668] usb usb3: Manufacturer: Linux 6.2.0-26-generic xhci-hcd
Aug 7 23:49:08 q1 kernel: [ 2660.727669] usb usb3: SerialNumber: 0000:31:00.2
Aug 7 23:49:08 q1 kernel: [ 2660.727830] hub 3-0:1.0: USB hub found
Aug 7 23:49:08 q1 kernel: [ 2660.727837] hub 3-0:1.0: 2 ports detected
Aug 7 23:49:08 q1 kernel: [ 2660.727975] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
Aug 7 23:49:08 q1 kernel: [ 2660.727993] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.02
Aug 7 23:49:08 q1 kernel: [ 2660.727995] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Aug 7 23:49:08 q1 kernel: [ 2660.727997] usb usb4: Product: xHCI Host Controller
Aug 7 23:49:08 q1 kernel: [ 2660.727999] usb usb4: Manufacturer: Linux 6.2.0-26-generic xhci-hcd
Aug 7 23:49:08 q1 kernel: [ 2660.728000] usb usb4: SerialNumber: 0000:31:00.2
Aug 7 23:49:08 q1 kernel: [ 2660.728175] hub 4-0:1.0: USB hub found
Aug 7 23:49:08 q1 kernel: [ 2660.728184] hub 4-0:1.0: 4 ports detected
Aug 7 23:49:08 q1 snapd[2334]: message repeated 3 times: [ udevmon.go:149: udev event error: Unable to parse uevent, err: cannot parse libudev event: invalid env data]
Aug 7 23:49:08 q1 systemd[3823]: Reached target Sound Card.
Aug 7 23:49:08 q1 kernel: [ 2660.807453] vfio-pci 0000:17:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Aug 7 23:49:08 q1 kernel: [ 2660.807674] vfio-pci 0000:17:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Aug 7 23:49:08 q1 lxd.daemon[3076]: time="2023-08-07T23:49:08Z" level=error msg="Failed to stop device" device=gpu0 err="Failed probing device \"0000:17:00.0\" via \"/sys/bus/pci/drivers_probe\": write /sys/bus/pci/drivers_probe: invalid argument" instance=vm2 instanceType=virtual-machine project=default
Aug 7 23:49:08 q1 systemd-networkd[2222]: mac43379c64: Link DOWN
Aug 7 23:49:08 q1 systemd-networkd[2222]: mac43379c64: Lost carrier
Aug 7 23:49:08 q1 kernel: [ 2661.011901] audit: type=1400 audit(1691452148.495:57): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-vm2_</var/snap/lxd/common/lxd>" pid=13584 comm="apparmor_parser"
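In the two-GPU log above, 0000:31:00.0 is picked up by the nvidia driver again after the VM stops, while the last kernel messages for 0000:17:00.0 still come from vfio-pci. One way to check what state a device was left in is to read its sysfs entries directly. The sketch below assumes the standard Linux sysfs layout (/sys/bus/pci/devices/<address>/driver and driver_override). The function names and the sysfs_root parameter are hypothetical and exist only to make the helper testable.

```python
import os
from pathlib import Path
from typing import Optional


def _device_dir(pci_address: str, sysfs_root: str) -> Path:
    """Build the sysfs directory path for a PCI device."""
    return Path(sysfs_root) / "bus" / "pci" / "devices" / pci_address


def bound_driver(pci_address: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to the device, or None if unbound."""
    driver_link = _device_dir(pci_address, sysfs_root) / "driver"
    if not driver_link.is_symlink():
        return None
    # The symlink target ends in the driver's name, e.g. ".../drivers/vfio-pci".
    return Path(os.readlink(driver_link)).name


def driver_override(pci_address: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the raw driver_override contents, or None if the file is absent."""
    override_file = _device_dir(pci_address, sysfs_root) / "driver_override"
    if not override_file.is_file():
        return None
    return override_file.read_text().strip()
```

For example, bound_driver("0000:17:00.0") on the affected host would show whether the GPU is still bound to vfio-pci, bound to nvidia, or has no driver at all. That may help narrow down why the write to /sys/bus/pci/drivers_probe was rejected.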