
Max hotplugged memory should not exceed actual max host memory #4516

Closed
skaegi opened this issue Jun 22, 2022 · 8 comments · Fixed by #4540

@skaegi
Contributor

skaegi commented Jun 22, 2022

We're running into a problem when memory "limits" are used in our Kubernetes pods: they work with runc but fail when run with Kata. Kata looks at the memory "limit" and tries to hotplug the sum of all container memory limits into the guest. This works fine unless that sum exceeds the host's actual maximum memory, in which case we get an error like this...

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Created    11s (x2 over 17s)  kubelet            Created container memory-boom
  Normal   Pulled     11s                kubelet            Successfully pulled image "polinux/stress" in 380.789628ms
  Warning  Failed     17s (x2 over 23s)  kubelet            Error: failed to create containerd task: failed to create shim: Unable to hotplug 1024000 MiB memory, the SB has 4096 MiB and the maximum amount is 31753 MiB: unknow
  Warning  BackOff    4s (x2 over 5s)    kubelet            Back-off restarting failed container

Here's an example Pod to illustrate (this Pod works in standard runc) ...

apiVersion: v1
kind: Pod
metadata:
  name: memory-boom-untrusted
spec:
  runtimeClassName: kata
  containers:
  - name: memory-boom
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
        cpu: "50m"
      limits:
        memory: "1000Gi"
        cpu: "500"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]

The above Pod is contrived; however, we are hitting this problem in our hosted solution.

For CPU limits, the hotplugged vCPUs do not exceed the host's. Can we have the same support for memory, so that we do not try to hotplug more memory than the host has?

Less critical, but it would be great if this support could be generalized into something like default_maxmemory, analogous to the current default_maxvcpus support in our runtime configuration.
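
A minimal sketch of what that could look like on the configuration side, assuming a hypothetical Go struct; field names and the default_maxmemory key are illustrative, not the actual Kata runtime code:

// Illustrative only: a default_maxmemory knob sitting next to the existing
// default_maxvcpus one. Only default_vcpus/default_maxvcpus/default_memory
// exist today; everything else here is an assumption.
package config

type HypervisorConfig struct {
	// These already exist today.
	DefaultVCPUs    uint32 `toml:"default_vcpus"`    // initial vCPUs
	DefaultMaxVCPUs uint32 `toml:"default_maxvcpus"` // hotplug ceiling for vCPUs

	// default_memory (MiB) already exists; default_maxmemory is the
	// proposed knob, with 0 meaning "cap at the host's physical RAM".
	DefaultMemory    uint64 `toml:"default_memory"`
	DefaultMaxMemory uint64 `toml:"default_maxmemory"`
}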

@skaegi skaegi added enhancement Improvement to an existing feature needs-review Needs to be assessed by the team. labels Jun 22, 2022
@skaegi
Contributor Author

skaegi commented Jun 22, 2022

I didn't label this issue as a "bug", but perhaps it should be one, since these high-memory-limit pods work with runc.

@liubin
Member

liubin commented Jun 23, 2022

This is an interesting feature.
/cc @kata-containers/architecture-committee

@fidencio
Member

@skaegi, thinking out loud here: what should the behaviour be in that case?

I'm afraid hot-plugging all the memory we have (even if less than requested) wouldn't actually fly, as it could lead to resource starvation.

Let's say we use the suggested default_maxmemory: would this prevent the pod from starting, or would it hotplug the default_maxmemory amount?

That's indeed an interesting bug to solve.

@skaegi
Contributor Author

skaegi commented Jun 23, 2022

I definitely do not want a pod to be prevented from starting because of a limit. As a user, I don't want to play a game where a container has to guess how to set its limits based on how much memory the host has.

Kata is hot-plugging the memory space, not reserving it -- i.e. we are already, and very much on purpose, running over-committed, which is how Kubernetes is always run, so this is not a resource-starvation concern. If a user is very conservative and wants to minimize the chance of the OOM killer coming into play as the host runs low on memory, they can make sure the resource 'request' is equal to the 'limit'. We do that today in some testing circumstances and it works well.

Off the top of my head the behavior I think makes the most sense is...

  1. If a non-zero default_maxmemory is present, we should hotplug up to that amount, but not higher than the host's memory.
  2. If default_maxmemory is zero, we should hotplug up to the host's memory space.

This is identical to how CPU hotplugging works, so it's not novel. In either case we should not prevent the pod from starting because of a limit.
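
A minimal sketch of that clamping behaviour, assuming illustrative names (requestedMiB, defaultMaxMemoryMiB, hostMemoryMiB) rather than the actual Kata runtime code:

// Sketch of the proposed memory clamping, mirroring what is already done for vCPUs.
package main

import "fmt"

// maxHotpluggableMiB returns the ceiling for guest memory: the configured
// default_maxmemory when non-zero, otherwise the host's physical RAM, and
// never more than the host's physical RAM in either case.
func maxHotpluggableMiB(defaultMaxMemoryMiB, hostMemoryMiB uint64) uint64 {
	if defaultMaxMemoryMiB == 0 || defaultMaxMemoryMiB > hostMemoryMiB {
		return hostMemoryMiB
	}
	return defaultMaxMemoryMiB
}

// clampHotplug caps the amount requested via pod limits instead of failing the pod.
func clampHotplug(requestedMiB, defaultMaxMemoryMiB, hostMemoryMiB uint64) uint64 {
	if limit := maxHotpluggableMiB(defaultMaxMemoryMiB, hostMemoryMiB); requestedMiB > limit {
		return limit
	}
	return requestedMiB
}

func main() {
	// The case from the issue: a 1000Gi limit (1024000 MiB) on a host with
	// ~31753 MiB of RAM gets clamped instead of rejected.
	fmt.Println(clampHotplug(1024000, 0, 31753)) // prints 31753
}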

@skaegi
Contributor Author

skaegi commented Jun 23, 2022

I'm on board to provide tests for this feature, and if you think it's simple, I could try coding it too, with some guidance.

@fidencio
Member

I will cook up something for that no later than next week.

@fidencio fidencio self-assigned this Jun 24, 2022
fidencio added a commit to fidencio/kata-containers that referenced this issue Jun 27, 2022
Let's add a `default_maxmemory` configuration, which allows the admins
to set the maximum amount of memory to be used by a VM, considering the
initial amount + whatever ends up being hotplugged via the pod limits.

By default this value is 0 (zero), and it means that the whole physical
RAM is the limit.

While implementing this, a change of behaviour (or a bug fix, depending
on how you see it) has been introduced: if a pod requests more memory
than the amount available on the host, instead of failing to start the
pod, we simply hotplug the maximum amount of memory available, better
mimicking the runc behaviour.

Fixes: kata-containers#4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
fidencio added a commit to fidencio/kata-containers that referenced this issue Jun 27, 2022
Let's adapt Cloud Hypervisor's and QEMU's code to behave properly with
the newly added `default_maxmemory` config.

While implementing this, a change of behaviour (or a bug fix, depending
on how you see it) has been introduced: if a pod requests more memory
than the amount available on the host, instead of failing to start the
pod, we simply hotplug the maximum amount of memory available, better
mimicking the runc behaviour.

Fixes: kata-containers#4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
fidencio added a commit to fidencio/kata-containers that referenced this issue Jun 27, 2022
Expose the newly added `default_maxmemory` to the project's Makefile and
to the configuration files.

Fixes: kata-containers#4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
@katacontainersbot katacontainersbot moved this from To do to In progress in Issue backlog Jun 27, 2022
@fidencio fidencio linked a pull request Jun 28, 2022 that will close this issue
@egernst
Member

egernst commented Jun 29, 2022

How would you set 'maxmemory' in your case, @skaegi? I'm wondering how you'd manage this in a cluster that has a mix of node configurations.

Issue backlog automation moved this from In progress to Done Jun 30, 2022
@katacontainersbot katacontainersbot moved this from Done to In progress in Issue backlog Jul 6, 2022
@skaegi
Contributor Author

skaegi commented Jul 14, 2022

@egernst I'm using this to create a set of runtime classes with pre-configured sizes like 1 (CPU) x 4 (GB memory), 2x8, 4x16, etc. Each runtime class in turn uses a different base micro-VM config, with resource sizes where the default and max CPU/memory are set to the same value.

We've found our users are just terrible at setting individual resource requirements but are decent at picking a box size. The net result is that the bin packing is definitely not as good, but the user experience is simpler and I have something much easier to use as a charging basis.

egernst pushed a commit to egernst/kata-containers that referenced this issue Jul 21, 2023