Add guest-to-request memory headroom ratio #9322
Conversation
/cc @acardace
Force-pushed bd64a18 to 4b28a47
api/openapi-spec/swagger.json
Outdated
@@ -16853,6 +16853,10 @@
      "description": "EvictionStrategy defines at the cluster level if the VirtualMachineInstance should be migrated instead of shut-off in case of a node drain. If the VirtualMachineInstance specific field is set it overrides the cluster level one.",
      "type": "string"
    },
+   "guestToRequestMemoryHeadroomRatio": {
+     "description": "GuestToRequestMemoryHeadroomRatio can be used to set a ratio between VM's memory guest and the memory allocated to the compute container. A higher ratio means that the VMs would be less compromised by node pressures, but would mean that fewer VMs could be scheduled to a node. If not set, the default is 1.",
Please add a note if the internal overhead calculations are still taken into account or not (ignored)
Will do.
FYI: they are.
done
Force-pushed 3bd2539 to ed8b899
Thanks for this PR @iholder101
I think we do need such an additional tunable for the admin to increase the calculated VM overhead in the cluster.
That said, I would expect the additional calculated "headroom" to simply be part of the GetMemoryOverhead function. That way the overhead calculation stays consistent, without the need to separately update the requests and limits.
When the requests/limits are calculated, virt-controller adds the overhead to them; it could also add the additional admin-provided overhead at the same step.
This would also work for the case when the guest overhead should be overcommitted, via OvercommitGuestOverhead.
I'm also not sure about the guestToRequestMemoryHeadroomRatio name.
When I think about the main purpose of this tunable, I think it is to allow the admin to provide additional overhead, knowing that our overhead calculation is "best effort".
Would it make sense to rename it to something like additionalGuestOverhead or increaseGuestOverheadRatio?
func GetMemoryOverhead(vmi *v1.VirtualMachineInstance, cpuArch string) *resource.Quantity {
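For illustration, the shape the reviewer suggests could look like the sketch below. This is hypothetical: the function name, signature, baseline formula, and all constants are invented for the example (the real GetMemoryOverhead works on the VMI spec and returns a resource.Quantity); only the idea of folding the admin-provided multiplier into the overhead calculation itself is taken from the discussion.

```go
package main

import "fmt"

// getMemoryOverheadBytes is a stand-in for KubeVirt's per-VMI overhead
// estimation (libvirtd, qemu, page tables, buffers, ...). The baseline
// here — a flat base plus ~1% of guest memory — is purely illustrative.
// The point is that the admin-provided ratio scales the overhead in one
// place, so requests and limits both pick it up consistently.
func getMemoryOverheadBytes(guestMemoryBytes int64, additionalOverheadRatio float64) int64 {
	overhead := int64(180*1024*1024) + guestMemoryBytes/100

	// Apply the admin-provided multiplier to the overhead only;
	// the guest-visible memory itself is untouched.
	if additionalOverheadRatio > 1.0 {
		overhead = int64(float64(overhead) * additionalOverheadRatio)
	}
	return overhead
}

func main() {
	guest := int64(8) * 1024 * 1024 * 1024 // 8Gi guest memory
	fmt.Println(getMemoryOverheadBytes(guest, 1.0))
	fmt.Println(getMemoryOverheadBytes(guest, 1.3))
}
```

With this shape, virt-controller would keep adding a single overhead value to the requests/limits, and the overcommit path (OvercommitGuestOverhead) would keep working unchanged.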
Force-pushed ed8b899 to eda7381
Hey @vladikr, thanks for your review!
That's a very good point! Thank you! I just want to note that now this field controls the overhead itself only, not the ratio between the guest memory and requests. IOW, the previous implementation did
We can discuss the name. WDYT about something like
Unit tests pass locally
@@ -1304,7 +1304,7 @@ func checkForKeepLauncherAfterFailure(vmi *v1.VirtualMachineInstance) bool {
}

func (t *templateService) VMIResourcePredicates(vmi *v1.VirtualMachineInstance, networkToResourceMap map[string]string) VMIResourcePredicates {
-	memoryOverhead := GetMemoryOverhead(vmi, t.clusterConfig.GetClusterCPUArch())
+	memoryOverhead := GetMemoryOverhead(vmi, t.clusterConfig.GetClusterCPUArch(), t.clusterConfig.GetConfig().GuestToRequestMemoryHeadroomRatio)
I wish there was an easy way to simply get the config from within the GetMemoryOverhead function...
Yeah, it was a bit painful to write..
I'm not sure how we can improve this in a straightforward way, as this function is called from different containers. We could always fetch the KubeVirt CR with an API call, but that would be costly in terms of performance.
👍
I understand your point. I didn't actually read it this way so far. To me, "Guest Overhead" on the VirtualMachineInstance reads as an overhead caused by the guest... Does that make sense? [1] https://kubevirt.io/user-guide/operations/node_overcommit/#overcommit-the-guest-overhead
Force-pushed df95da1 to 950213e
@vladikr changed the name to
api/openapi-spec/swagger.json
Outdated
@@ -16825,6 +16825,10 @@
      "description": "KubeVirtConfiguration holds all kubevirt configurations",
      "type": "object",
      "properties": {
+       "additionalGuestOverheadRatio": {
Please be more specific with the name. I think this only calculates the memory, so additionalGuestMemoryOverheadRatio could be an option.
@vladikr and I discussed the name in comments above.
At least to me, it feels a bit misleading to mention "guest memory", as the guest memory itself is not affected in any way. @vladikr claimed that this is the naming we chose for many other fields, so I guess it's a topic for further refactoring.
I suggested the name additionalVirtInfraOverhead above, as it specifically refers to the virt infra overhead, not to the guest memory.
Please share your thoughts :)
I'd definitely include "memory", if it is memory-specific.
Changed to additionalGuestMemoryOverheadRatio.
AdditionalGuestMemoryOverheadRatio can be used to set a ratio between VM's memory guest and the memory allocated to the compute container. A higher ratio means that the VMs would be less compromised by node pressures, but would mean that fewer VMs could be scheduled to a node. If not set, the default is 1. Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
This is better both in terms of performance and in terms of safety. Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Force-pushed 950213e to 38743f1
Thanks! /approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vladikr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@iholder101: The following tests failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/test pull-kubevirt-manifests
/lgtm
/cherrypick release-0.59
@acardace: once the present PR merges, I will cherry-pick it on top of release-0.59 in a new PR and assign it to you.
/retest-required
@acardace: #9322 failed to apply on top of branch "release-0.59":
How do we validate the parameter? I modified the CR and set additionalGuestMemoryOverheadRatio: "1.3", but it still encounters OOM. Can you provide an example YAML file?
Perhaps 1.3 is not enough for your case? Some guiding questions:
You can always try setting a higher value and see if it solves it for your case, but I'd recommend answering the questions above to ensure you get the right values. In addition, keep in mind that the multiplier (e.g.
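For illustration, setting the ratio cluster-wide might look like the snippet below. This is a sketch, not verified against your cluster: the placement under `spec.configuration` and the quoted string value are assumptions based on how other cluster-level tunables are set in the KubeVirt CR, so please double-check against the API reference for your KubeVirt version.

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    # Multiplies the calculated infra overhead (not the guest memory).
    additionalGuestMemoryOverheadRatio: "1.3"
```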
vm.yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: vm01
namespace: default
spec:
running: true
template:
metadata:
labels:
kubevirt.io/size: small
kubevirt.io/domain: vm01
annotations:
ovn.kubernetes.io/allow_live_migration: "true"
ovn.kubernetes.io/logical_switch: provider
ovn.kubernetes.io/ip_address: 192.168.1.2
spec:
nodeSelector:
kubernetes.io/hostname: k8s-01
dnsConfig:
nameservers:
- 192.168.1.250
dnsPolicy: "None"
architecture: amd64
domain:
cpu:
cores: 4
model: host-passthrough
#memory:
# guest: 8000Mi
devices:
disks:
- name: root-disk
disk:
bus: virtio
cache: writeback
interfaces:
- name: default
bridge: {}
model: virtio
rng: {}
machine:
type: q35
resources:
limits:
cpu: 4
memory: 8Gi
requests:
cpu: 4
memory: 8Gi
networks:
- name: default
pod: {}
volumes:
- name: root-disk
persistentVolumeClaim:
claimName: pvc-win10-bootdisk
@tghfly I see you set memory limits == requests. Can you explain why you set memory limits? Memory limits are dangerous, and I wouldn't define them unless there's a good reason to do so: if you exceed the limit, the Pod is going to be OOM killed. BTW, another tip: it's recommended to use After you remove the limits, I'd try to see if the Hope this helps. Good luck!
@iholder101 Thanks, we use "limits == requests" to ensure Guaranteed QoS. And in the NUMA binding and CPU Pin
Fair enough. You're right, a Guaranteed QoS is necessary sometimes. You can either set a different
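As an illustration of leaving headroom while keeping Guaranteed QoS: keep limits == requests, but give the guest slightly less than the container gets, so the virt infra components have room before the Pod hits its limit. The values below are illustrative, and the use of `spec.domain.memory.guest` to set the guest-visible memory is an assumption to verify against the KubeVirt API reference.

```yaml
spec:
  template:
    spec:
      domain:
        memory:
          guest: 7680Mi        # guest sees less memory than the container request
        resources:
          requests:
            memory: 8Gi        # limits == requests keeps Guaranteed QoS
          limits:
            memory: 8Gi
```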
Yes, we are currently planning to do so. Thank you again for your patient explanation.
Sure thing! Good luck!
@tghfly I think it would be better if you use
What this PR does / why we need it:
KubeVirt computes [1] an estimation of the total memory overhead needed by the infra components inside virt-launcher that are responsible for running the guest (e.g. libvirtd, qemu, etc.).
Beyond the fact that this overhead currently suffers from known issues and inaccurate calculations that need to be fixed, the calculation is in essence an educated guess rather than an exact computation. Even careful profiling (a very difficult task, since the environments we would profile on bias the results) leaves many components we cannot control, e.g. kernel drivers, kernel configuration, inner QEMU buffer allocations, etc.
To solve this problem, we need to keep improving the overhead estimations, but also provide a way for the cluster admin to explicitly add overhead. This is useful when a special configuration is used, or when the cluster admin chooses to reduce the risk of workloads being OOM killed during node pressures in exchange for workloads consuming more memory (therefore fewer VMIs can be scheduled on a node).
[1] https://github.com/kubevirt/kubevirt/blob/v0.59.0-alpha.2/pkg/virt-controller/services/renderresources.go#L272
Which issue(s) this PR fixes:
Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=2165618
https://bugzilla.redhat.com/show_bug.cgi?id=2164593
Special notes for your reviewer:
For a reference, you can look at the temporary solution implemented in HCO: kubevirt/hyperconverged-cluster-operator#2206
Release note: