VM won't start after shutdown: error allocating framebuffer #30

Open
midi1996 opened this issue Mar 16, 2023 · 3 comments

midi1996 commented Mar 16, 2023

Hello, I'm having an issue allocating the framebuffer even though there should be enough VRAM.

  • Host:
    • Ubuntu 22.04.2 LTS
    • Kernel: 5.19.0-35-generic
    • Nvidia Merged drivers: 525.85.05 (15.1) from vGPU Community Drivers patch (patched by the script)
    • CPU: Xeon E5-2680V4
    • GPU: Quadro M4000 (8GB)
    • RAM: 64GB
    • vGPU Profile:
  nvidia-14
    Available instances: 6
    Device API: vfio-pci
    Name: GRID M60-1B
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=8
  • Guest1:
    • Windows 11 22h2
      • 16 threads
      • 32GB RAM
      • vGPU slice with these changes in profile_override: (fb values taken from PolloLoco's Proxmox guide)
[mdev.2b6976dd-8620-49de-8d8d-ae9ba47a50df]
# Windows 11 4GB
cuda_enabled = 1
frl_enabled = 0
framebuffer = 0xEC000000
framebuffer_reservation = 0x14000000
  • Guest2:
    • Unraid 6.12
      • 8 threads
      • 10GB RAM
      • vGPU slice with these changes in profile_override:
[mdev.bcac6a5c-d5da-4f4d-a31f-ba7bc7de378c]
# Unraid 1GB
cuda_enabled = 1
frl_enabled = 0
framebuffer = 0x38000000
framebuffer_reservation = 0x8000000
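
As a sanity check on the override values above, here is a small Python sketch (not part of the original setup) that converts the hex values to MiB and sums each pair. It assumes that `framebuffer` plus `framebuffer_reservation` is meant to add up to the slice size named in each comment (4 GB and 1 GB), and it treats the 8 GB card as a nominal 8192 MiB.

```python
MIB = 1024 * 1024

# Values copied from the two profile_override entries above.
overrides = {
    "Windows 11": {"framebuffer": 0xEC000000, "framebuffer_reservation": 0x14000000},
    "Unraid": {"framebuffer": 0x38000000, "framebuffer_reservation": 0x8000000},
}


def slice_size_mib(entry):
    """Return framebuffer + reservation in MiB (assumed to be the total slice size)."""
    return (entry["framebuffer"] + entry["framebuffer_reservation"]) // MIB


for name, entry in overrides.items():
    fb = entry["framebuffer"] // MIB
    reservation = entry["framebuffer_reservation"] // MIB
    print(f"{name}: {fb} MiB + {reservation} MiB = {slice_size_mib(entry)} MiB")

# Nominal card size; the usable amount on a real card may be lower.
total = sum(slice_size_mib(entry) for entry in overrides.values())
print(f"Total requested: {total} MiB of 8192 MiB")
```

Under that assumption the two slices ask for 4096 MiB and 1024 MiB, 5120 MiB in total, which is consistent with the claim that the card has enough VRAM for both guests.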

How to reproduce:

  • Boot host
  • Start Unraid VM automatically on boot
  • Start Windows VM
  • Everything works as expected and both systems recognize the vGPU slices
  • Shut down both VMs
  • Start Unraid VM
  • Start Windows VM
  • Windows VM won't start and the error appears
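
The steps above could be scripted roughly as follows. This is only a sketch: it assumes the VMs are managed through libvirt and driven with `virsh`, and the domain names `unraid` and `win11` are hypothetical placeholders. It only builds and prints the command sequence; it does not run anything.

```python
# Hypothetical libvirt domain names; substitute the real ones.
UNRAID = "unraid"
WINDOWS = "win11"


def repro_commands(unraid=UNRAID, windows=WINDOWS):
    """Build the virsh command sequence for the steps above without running it."""
    return [
        ["virsh", "start", unraid],     # first round: both VMs start fine
        ["virsh", "start", windows],
        ["virsh", "shutdown", unraid],  # shut down both VMs
        ["virsh", "shutdown", windows],
        ["virsh", "start", unraid],     # second round
        ["virsh", "start", windows],    # this is the start that fails
    ]


for argv in repro_commands():
    print(" ".join(argv))
```

Note that `virsh shutdown` only requests a guest shutdown, so a real script would need to wait until both domains are shut off before starting them again.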

Why would I shut down Unraid? I had to change stuff on the Unraid USB drive, and doing that from the Unraid GUI/CLI takes a while. It doesn't happen often, but I need to do it from time to time.

Journalctl log:

Mar 16 17:32:10 X995 nvidia-vgpu-mgr[977]: Nv0000CtrlVgpuGetStartDataParams {
                                               mdev_uuid: {2b6976dd-8620-49de-8d8d-ae9ba47a50df},
                                               config_params: "vgpu_type_id=14",
                                               qemu_pid: 115078,
                                               gpu_pci_id: 0x300,
                                               vgpu_id: 2,
                                               gpu_pci_bdf: 768,
                                           }
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 2b6976dd-8620-49de-8d8d-ae9ba47a50df GPU PCI id 00:03:00.0 confi>
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=14
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_env_log: Successfully updated env symbols!
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: cmd: 0x20801322 failed.
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: cmd: 0x2080014b failed.
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: NvA081CtrlVgpuConfigGetVgpuTypeInfoParams {
                                                  vgpu_type: 14,
                                                  vgpu_type_info: NvA081CtrlVgpuInfo {
                                                      vgpu_type: 14,
                                                      vgpu_name: "GRID M60-1B",
                                                      vgpu_class: "NVS",
                                                      vgpu_signature: [],
                                                      license: "GRID-Virtual-PC,2.0;Quadro-Virtual-DWS,5.0;GRID-Virtual-WS,2.0;GRID-Virtual-WS-Ext,2.0",
                                                      max_instance: 8,
                                                      num_heads: 4,
                                                      max_resolution_x: 5120,
                                                      max_resolution_y: 2880,
                                                      max_pixels: 16384000,
                                                      frl_config: 45,
                                                      cuda_enabled: 0,
                                                      ecc_supported: 0,
                                                      gpu_instance_size: 0,
                                                      multi_vgpu_supported: 0,
                                                      vdev_id: 0x13f21177,
                                                      pdev_id: 0x13f2,
                                                      profile_size: 0x40000000,
                                                      fb_length: 0x38000000,
                                                      gsp_heap_size: 0x0,
                                                      fb_reservation: 0x8000000,
                                                      mappable_video_size: 0x400000,
                                                      encoder_capacity: 0x64,
                                                      bar1_length: 0x100,
                                                      frl_enable: 1,
                                                      adapter_name: "GRID M60-1B",
                                                      adapter_name_unicode: "GRID M60-1B",
                                                      short_gpu_name_string: "GM204GL-A",
                                                      licensed_product_name: "NVIDIA Virtual PC",
                                                      vgpu_extra_params: "",
                                                      ftrace_enable: 0,
                                                      gpu_direct_supported: 0,
                                                      nvlink_p2p_supported: 0,
                                                      multi_vgpu_exclusive: 0,
                                                      exclusive_type: 0,
                                                      exclusive_size: 0,
                                                      gpu_instance_profile_id: 0,
                                                  },
                                              }
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Applying profile nvidia-14 overrides
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/num_heads: 4 -> 1
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/max_resolution_x: 5120 -> 1920
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/max_resolution_y: 2880 -> 1080
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/max_pixels: 16384000 -> 2073600
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/cuda_enabled: 0 -> 1
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_length: 939524096 -> 939524096
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_reservation: 134217728 -> 134217728
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/frl_enable: 1 -> 0
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Applying mdev UUID 2b6976dd-8620-49de-8d8d-ae9ba47a50df profile overrides
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/cuda_enabled: 1 -> 1
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_length: 939524096 -> 3959422976
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_reservation: 134217728 -> 335544320
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/frl_enable: 0 -> 0
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: cmd: 0xa0810115 failed.
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_log: (0x0): gpu-pci-id : 0x300
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_log: (0x0): vgpu_type : NVS
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_log: (0x0): Framebuffer: 0xec000000
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_log: (0x0): Virtual Device Id: 0x13f2:0x1177
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_log: ######## vGPU Manager Information: ########
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: notice: vmiop_log: Driver Version: 525.85.07
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: cmd: 0x2080012f failed.
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: NVOS status 0x51
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: Assertion Failed at 0xc36bc541:143
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: 13 frames returned by backtrace
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv008399vgpu+0x35) [0x7fcac370eec5]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv008439vgpu+0x14e) [0x7fcac36bb73e]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv008520vgpu+0xe1) [0x7fcac36bc541]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xd1c67) [0x7fcac36d1c67]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xd4639) [0x7fcac36d4639]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: vgpu(+0x1893e) [0x56479981893e]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: vgpu(+0x19a39) [0x564799819a39]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: vgpu(+0x1389b) [0x56479981389b]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: vgpu(+0x11116) [0x564799811116]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: vgpu(+0x3e1a) [0x564799803e1a]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fcac3e29d90]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fcac3e29e40]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: vgpu(+0x3e5d) [0x564799803e5d]
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: (0x0): Failed to alloc guest FB memory
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 2 (vmiop-display: error allocating framebuffer)
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: (0x0): Initialization: init_device_instance failed error 2
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_log: display_init failed for inst: 0
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_env_log: (0x0): vmiope_process_configuration: plugin registration error
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: error: vmiop_env_log: (0x0): vmiope_process_configuration failed with 0x1a
Mar 16 17:48:29 X995 nvidia-vgpu-mgr[112701]: notice: vmiop_log: (0x0): vGPU license state: Unlicensed (Unrestricted)
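
To make the override lines in the log above easier to read, here is a small Python sketch that pulls the `Patching ...` entries out of the journal text and reports the final value per field in MiB and hex. The regular expression is an assumption based only on the line format visible in this log, not on any documented format.

```python
import re

# A few lines copied from the journalctl output above.
LOG = """\
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_length: 939524096 -> 939524096
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_reservation: 134217728 -> 134217728
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_length: 939524096 -> 3959422976
Mar 16 17:32:10 X995 nvidia-vgpu-mgr[115107]: Patching nvidia-14/fb_reservation: 134217728 -> 335544320
"""

# Matches numeric "Patching <profile>/<field>: <old> -> <new>" entries.
PATCH_RE = re.compile(
    r"Patching (?P<profile>[\w-]+)/(?P<field>\w+): (?P<old>\d+) -> (?P<new>\d+)"
)


def final_values(log_text):
    """Return the last patched value for each field; later lines win."""
    result = {}
    for match in PATCH_RE.finditer(log_text):
        result[match["field"]] = int(match["new"])
    return result


values = final_values(LOG)
for field, value in values.items():
    print(f"{field}: {value} bytes = {value // 2**20} MiB ({hex(value)})")
```

The final values come out as 0xec000000 and 0x14000000, so the per-mdev override for the Windows slice was applied on top of the profile-wide one before the allocation failed.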
Owner

mbilker commented Mar 19, 2023

Does this work without the merged driver?

Author

midi1996 commented Mar 19, 2023

I have no idea why, but I couldn't reproduce this successfully this time around, right after I made this post (I had tested it 4 times before writing and got the logs I needed). I installed the regular patched drivers (using PolloLoco's patches for Proxmox) and tried both Proxmox 7.3 and Ubuntu 22.04 hosts with non-merged drivers, and it works as expected (or the issue just didn't present itself again). I then moved back to the merged drivers on the Ubuntu 22.04 host, and it's working for now. I really have no idea how it fixed itself, and I'm not sure whether it actually fixed itself or will show up again later. I'll keep rebooting the VMs and running some graphical tasks before drawing a conclusion.

Update:
I'm not getting that framebuffer allocation issue right now, but I'm getting these errors

Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: notice: vmiop_log: ######## Guest NVIDIA Driver Information: ########
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: notice: vmiop_log: Driver Version: 528.24
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: notice: vmiop_log: vGPU version: 0x100001
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: notice: vmiop_log: (0x0): vGPU license state: Unlicensed (Unrestricted)
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:10 X995 nvidia-vgpu-mgr[10858]: error: guest attempted access to priv registers
Mar 19 17:32:11 X995 dnsmasq-dhcp[2154]: DHCPREQUEST(virbr0) 10.0.1.22 52:54:00:fb:81:9b
Mar 19 17:32:11 X995 dnsmasq-dhcp[2154]: DHCPACK(virbr0) 10.0.1.22 52:54:00:fb:81:9b X99V
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: NVOS status 0x56
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: Assertion Failed at 0xffd0ca6f:143
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: 9 frames returned by backtrace
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv008399vgpu+0x35) [0x7fd4ffd0eec5]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0x10a768) [0x7fd4ffd0a768]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv004520vgpu+0x11f) [0x7fd4ffd0ca6f]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv010761vgpu+0x19d2) [0x7fd4ffd0e4e2]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv006812vgpu+0x3fe) [0x7fd4ffd8b83e]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xb767d) [0x7fd4ffcb767d]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: vgpu(+0x15aa1) [0x564a58815aa1]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7fd500494b43]
Mar 19 17:32:14 X995 nvidia-vgpu-mgr[10858]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7fd500526a00]

They show up whenever the Windows VM starts, and I'm not sure why or whether they're relevant to the issue above.

Owner

mbilker commented Mar 27, 2023

I assume this is due to VRAM fragmentation when running with the merged driver. vGPU requires contiguous VRAM allocations, which is why it can report running out of memory even when enough VRAM is free in total.
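
A toy Python model can illustrate the fragmentation point. It is purely illustrative: the block layout is invented, and it says nothing about how the NVIDIA driver actually places allocations. It only shows how a contiguous request can fail while plenty of memory is free in total.

```python
def largest_free_run(total, used):
    """Return the largest contiguous free span, given (start, length) used blocks."""
    largest = 0
    cursor = 0
    for start, length in sorted(used):
        largest = max(largest, start - cursor)  # gap before this block
        cursor = max(cursor, start + length)
    return max(largest, total - cursor)         # gap after the last block


TOTAL_MIB = 8192  # nominal size of an 8 GB card

# Hypothetical layout: a 1 GiB slice at the start of VRAM, plus a small
# leftover allocation that happens to sit in the middle of VRAM.
used = [(0, 1024), (4608, 256)]

free_total = TOTAL_MIB - sum(length for _, length in used)
largest = largest_free_run(TOTAL_MIB, used)

print(f"free in total: {free_total} MiB")
print(f"largest contiguous run: {largest} MiB")
print("4096 MiB slice fits:", largest >= 4096)
```

In this made-up layout 6912 MiB is free, but the largest contiguous run is only 3584 MiB, so a 4096 MiB contiguous request fails.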

As for the "Assertion Failed at 0xffd0ca6f:143" error, I am not sure what causes it, and I assume it is unrelated.
