core: Add memtune hard_limit for q35 VMs with many CPUs #382

Merged
merged 1 commit into oVirt:master from vfio-hard-limit on May 19, 2022

Conversation

@mz-pdm (Member) commented May 18, 2022

When a VM with the q35 chipset has at least 256 maximum vCPUs and
contains VFIO devices, it can fail to start. This can happen for two
reasons:

  1. There are multiple VFIO devices in the same IOMMU group and their
     drivers don’t handle it properly.

  2. The memory locking limit needed to handle the VFIO devices is
     exceeded.

oVirt cannot do anything about 1.; such a situation must be fixed in
the lower-level layers. As for 2., it should be fixed in QEMU one day,
but we can work around it now by specifying a sufficiently high limit
on the amount of QEMU and guest memory locked in host memory (i.e.
memory that cannot be swapped out). The limit should be at least the
VM’s maximum RAM times the number of VFIO devices. We simply set it to
an extremely high value regardless of the presence and number of VFIO
devices, which should cause no trouble because we don’t want to swap
out VM memory anyway.

See https://bugzilla.redhat.com/show_bug.cgi?id=2048429#c9 for more
details.

Bug-Url: https://bugzilla.redhat.com/2081241
Bug-Url: https://bugzilla.redhat.com/2048429
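
The workaround ends up in the libvirt domain XML as a
<memtune><hard_limit> element. Below is a minimal Python sketch of the
idea, not the actual vdsm code; the helper name and the specific limit
value are assumptions chosen for illustration (libvirt expresses
hard_limit in KiB):

```python
# Hypothetical sketch of the workaround, not the actual vdsm change:
# append <memtune><hard_limit unit='KiB'>...</hard_limit></memtune>
# to a libvirt domain XML element.
import xml.etree.ElementTree as ET

# Assumed "extremely high" value: 2**38 KiB == 256 TiB, far above any
# realistic VM RAM size, so VM memory is effectively never swapped out.
HARD_LIMIT_KIB = 2**38

def add_memtune_hard_limit(domain: ET.Element) -> None:
    """Set a very high memory locking hard limit on the domain."""
    memtune = domain.find('memtune')
    if memtune is None:
        memtune = ET.SubElement(domain, 'memtune')
    hard_limit = ET.SubElement(memtune, 'hard_limit')
    hard_limit.set('unit', 'KiB')
    hard_limit.text = str(HARD_LIMIT_KIB)

# Usage: build a toy <domain> element and print the result.
dom = ET.Element('domain')
add_memtune_hard_limit(dom)
print(ET.tostring(dom).decode())
# <domain><memtune><hard_limit unit="KiB">274877906944</hard_limit></memtune></domain>
```

Setting the limit unconditionally, rather than computing "maximum RAM
times the number of VFIO devices" per VM, keeps the logic simple and is
harmless because the host is never expected to swap out VM memory.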

@ahadas (Member) commented May 19, 2022

/ost

@michalskrivanek (Member) left a comment
swapping is not desired at all anyway, so if it works as you say, it sounds like a good idea

@ahadas (Member) commented May 19, 2022

ok, everyone agrees with that, it passes ost - we can't ask for more

@ahadas ahadas merged commit 5d6017f into oVirt:master May 19, 2022
@mz-pdm mz-pdm deleted the vfio-hard-limit branch May 24, 2022 11:49