[BUG] If you add two vGPUs the VM won't boot #5289

noahgildersleeve · 2024-03-05T01:47:09Z

Describe the bug

When you add two vGPUs to a VM, it won't boot.

To Reproduce
Steps to reproduce the behavior:

1. Enable two vGPUs on an existing VM
2. Save, then restart (or start) the VM
3. Wait for the VM to start

Expected behavior

It should either boot or if we only support one vGPU it should

Support bundle

supportbundle_bf973e5e-935b-45fd-b911-432fb8a2a038_2024-03-05T01-35-13Z.zip

Environment

Harvester ISO version: v1.3.0-rc3
Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): 2 bare-metal nodes, DL360 servers. A102

Additional context
Found while testing #2764
The issue may be related to the following error, which I'm seeing in the events (I also included a screenshot):
Server error. command SyncVMI failed: "LibvirtError(Code=67, Domain=20, Message='unsupported configuration: Only one vgpu device can have 'ramfb' enabled')"


ibrokethecloud · 2024-03-05T06:35:56Z

I can confirm that a VM with multiple vGPUs can boot once the following virtualGPUOptions are applied to one of the devices:

virtualGPUOptions:
  display:
    ramFB:
      enabled: false

However, for this to work, the vGPU profile must support multiple vGPU allocation. Based on the documentation at https://docs.nvidia.com/grid/16.0/grid-vgpu-release-notes-generic-linux-kvm/index.html, our GPUs only support multiple vGPU allocation if Q-series vGPUs are used.

I was able to create a VM with two A2-4Q vGPU profiles

and attach them to the VM with the additional virtualGPUOptions set on one of the vGPUs.

After this change, the VM boots successfully and the devices are visible to the guest.
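
For readers applying the workaround, here is a minimal sketch of how the two devices might look in the VM spec. It assumes the standard KubeVirt layout, where GPUs are listed under spec.template.spec.domain.devices.gpus. The device names (vgpu-0, vgpu-1) and the resource name nvidia.com/NVIDIA_A2-4Q are illustrative assumptions and are not taken from this issue; substitute the names shown for your own vGPU devices.

```yaml
# Fragment of a VirtualMachine spec (under spec.template.spec.domain).
# All names below are placeholders for illustration only.
devices:
  gpus:
    - name: vgpu-0                          # hypothetical device name
      deviceName: nvidia.com/NVIDIA_A2-4Q   # assumed resource name for an A2-4Q vGPU
      # No virtualGPUOptions here, so ramfb stays at its default for this device.
    - name: vgpu-1                          # hypothetical device name
      deviceName: nvidia.com/NVIDIA_A2-4Q   # assumed resource name for an A2-4Q vGPU
      virtualGPUOptions:
        display:
          ramFB:
            enabled: false                  # the workaround described above
```

The point of the sketch is that only one vGPU keeps ramfb enabled, which matches the constraint in the libvirt error message ("Only one vgpu device can have 'ramfb' enabled").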

noahgildersleeve · 2024-03-13T19:18:24Z

I validated the workaround with version master-a2c98e96-head.

bk201 · 2024-03-14T08:14:17Z

The doc PR covers this: harvester/docs#526

harvesterhci-io-github-bot · 2024-03-14T08:14:33Z

bk201 · 2024-03-15T03:05:25Z

Doc published: https://docs.harvesterhci.io/v1.3/advanced/vgpusupport#attaching-multiple-vgpus

noahgildersleeve added the kind/bug, severity/2, and reproduce/always labels Mar 5, 2024

bk201 added this to the v1.3.0 milestone Mar 5, 2024

noahgildersleeve mentioned this issue Mar 5, 2024

[FEATURE] vGPU Support #2764

Closed

bk201 added the require/doc label Mar 5, 2024

bk201 added the not-require/test-plan label Mar 14, 2024

bk201 assigned ibrokethecloud Mar 14, 2024

bk201 closed this as completed Mar 15, 2024
