update: Return memory to host on memory update. #793
Conversation
Create container with:
+ docker run -tdi --name kata-1 -m 6G --rm --runtime kata-runtime stress-ng bash
55d15db690fc059f66edc9f286add9856e9eaf29f033b5a0c0ea3f4feb462d4d
+ docker exec -ti kata-1 free -h
total used free shared buff/cache available
Mem: 5.9G 24M 5.8G 8.2M 18M 5.8G
Swap: 0B 0B 0B
# qemu memory consumption (smem)
PID User Command Swap USS PSS RSS
28465 root /opt/kata/bin/qemu-system-x 0 286.4M 286.4M 286.4M
+ docker exec -ti kata-1 stress-ng --vm 1 --vm-bytes 75% --vm-method all --verify -t 20s -v
stress-ng: debug: [12] 1 processor online, 1 processor configured
stress-ng: debug: [12] main: can't set oom_score_adj
stress-ng: info: [12] dispatching hogs: 1 vm
stress-ng: debug: [12] cache allocate: default cache size: 16384K
stress-ng: debug: [12] starting stressors
stress-ng: debug: [12] 1 stressor spawned
stress-ng: debug: [16] stress-ng-vm: can't set oom_score_adj
stress-ng: debug: [16] stress-ng-vm: started [16] (instance 0)
stress-ng: debug: [16] stress-ng-vm using method 'all'
stress-ng: debug: [16] stress-ng-vm: exited [16] (instance 0)
stress-ng: debug: [12] process [16] terminated
stress-ng: info: [12] successful run completed in 20.16s
# qemu memory consumption (smem)
PID User Command Swap USS PSS RSS
28465 root /opt/kata/bin/qemu-system-x 0 4.6G 4.6G 4.6G

Update memory to reduce it. With balloon:
+ docker update --memory 1G kata-1
kata-1
+ sleep 5s
+ docker exec -ti kata-1 free -h
total used free shared buff/cache available
Mem: 898M 40M 839M 8.2M 18M 795M
Swap: 0B 0B 0B
# qemu memory consumption (smem)
PID User Command Swap USS PSS RSS
28465 root /opt/kata/bin/qemu-system-x 0 310.8M 310.8M 310.8M
Without balloon:
+ docker update --memory 1G kata-1
kata-1
+ sleep 5s
+ docker exec -ti kata-1 free -h
total used free shared buff/cache available
Mem: 5.9G 24M 5.8G 8.2M 18M 5.8G
Swap: 0B 0B 0B
# qemu memory consumption (smem)
PID User Command Swap USS PSS RSS
30038 root /opt/kata/bin/qemu-system-x 0 4.6G 4.6G 4.6G
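For context on the comparison above: in the "with balloon" run the runtime asks virtio-balloon to reclaim guest pages and QEMU returns them to the host, which is why both `free` inside the guest and QEMU's RSS drop; without the balloon, DIMMs can only ever be hotplugged in, never taken back. Under the hood this is a QMP `balloon` command. Below is a minimal hand-rolled sketch of that command in Go; the socket path is a placeholder, and the real runtime goes through govmm (see kata-containers/govmm#55) rather than raw JSON.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	// Placeholder QMP socket path; Kata keeps the monitor socket under
	// the sandbox's run directory in reality.
	conn, err := net.Dial("unix", "/tmp/qmp.sock")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	r := bufio.NewReader(conn)

	// QMP handshake: QEMU sends a greeting, then we negotiate capabilities.
	if _, err := r.ReadString('\n'); err != nil {
		panic(err)
	}
	fmt.Fprintln(conn, `{"execute":"qmp_capabilities"}`)
	if _, err := r.ReadString('\n'); err != nil {
		panic(err)
	}

	// Ask virtio-balloon to shrink the guest to 1 GiB ("value" is in
	// bytes). QEMU releases the reclaimed pages back to the host, which
	// is why RSS drops in the "with balloon" run above. On a real
	// monitor the reply may be preceded by a BALLOON_CHANGE event.
	fmt.Fprintln(conn, `{"execute":"balloon","arguments":{"value":1073741824}}`)
	resp, err := r.ReadString('\n')
	if err != nil {
		panic(err)
	}
	fmt.Println(resp)
}
```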
So I am getting strange behavior when hotplugging memory, just by adding…
Update: I just tried in a VM with the Kata kernel and the same QEMU binary, and it worked. I'll debug in the Kata guest OS.
Found the issue, will send the fix.
Force-pushed 61a8e9c to d25cee5
Depends on kata-containers/govmm#55.
@jcvenegas @sboeuf I think this is ready for a new review. I think this is almost the final version; I am just waiting for kata-containers/govmm#55 to merge, and I will create a PR in the tests repo to test this.
@jcvenegas Ok, I'll take a look later today!
Force-pushed d25cee5 to 157e713
/test
Codecov Report
@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
+ Coverage   51.67%   53.38%   +1.70%
==========================================
  Files         107      110       +3
  Lines       14615    18759    +4144
==========================================
+ Hits         7552    10014    +2462
- Misses       6152     7592    +1440
- Partials      911     1153     +242
Force-pushed 418bfc4 to 31fcea0
virtcontainers/container.go
Outdated
addMemDevice := &memoryDevice{
	sizeMB: 0,
}
data, err := c.sandbox.hypervisor.hotplugAddDevice(addMemDevice, memoryDev)
Why are we trying to hotplug more memory here? We're in the case where oldMemMB == newMemMB, and as mentioned by the debug log, the current memory will not be modified.
Discussed offline with @jcvenegas. This needs some changes to avoid introducing balloon knowledge at the container.go level.
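The layering being asked for can be sketched as a toy interface boundary: container.go programs against an abstract hypervisor, and the balloon stays an implementation detail of the QEMU backend. All names below are illustrative, not the real virtcontainers API.

```go
package main

import "fmt"

// hypervisor is the boundary container.go would program against.
type hypervisor interface {
	resizeMemory(newMemMB int) (int, error)
}

// qemuBackend hides how a resize is satisfied: growth via DIMM hotplug,
// shrink via the balloon.
type qemuBackend struct {
	memMB     int // hotplugged guest RAM
	balloonMB int // what the guest is currently allowed to use
}

func (q *qemuBackend) resizeMemory(newMemMB int) (int, error) {
	if newMemMB > q.memMB {
		q.memMB = newMemMB // hotplug a DIMM in the real implementation
	}
	q.balloonMB = newMemMB // the balloon caps the guest either way
	return q.balloonMB, nil
}

func main() {
	var h hypervisor = &qemuBackend{memMB: 6144, balloonMB: 6144}
	// The caller never mentions a balloon; shrinking is just a resize.
	got, err := h.resizeMemory(1024)
	fmt.Println(got, err) // 1024 <nil>
}
```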
virtcontainers/qemu.go
Outdated
return 0, fmt.Errorf("Unable to hotplug %d MiB memory, the SB has %d MiB and the maximum amount is %d MiB",
	memDev.sizeMB, currentMemory, q.config.MemorySize)
if memDev.sizeMB == 0 {
	// handle case when not memory added (because not needed)
// handle the case where no memory is added (because not needed)
Well, actually I would remove the comment since there's nothing to do in this case.
Just print a log and return.
virtcontainers/qemu.go
Outdated
"hotplug": "memory", | ||
}, | ||
).Debug("Not needed to hotplug, updating balloon") | ||
return 0, q.updateMemoryBalloon(currentMemory) |
Why do we need to update the balloon? This case should be a simple no-op, right?
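To make the control flow under discussion concrete, here is a self-contained toy version of the memDev.sizeMB == 0 path. The types and receiver are stand-ins for qemu.go's real ones; only the shape of the logic follows the PR.

```go
package main

import (
	"fmt"
	"log"
)

type memoryDevice struct {
	sizeMB int
}

type qemuToy struct {
	currentMemMB int
	maxMemMB     int
	balloonMB    int
}

// updateMemoryBalloon models resizing virtio-balloon so the guest cannot
// use more than targetMB MiB, returning the reclaimed pages to the host.
func (q *qemuToy) updateMemoryBalloon(targetMB int) error {
	if targetMB > q.maxMemMB {
		return fmt.Errorf("balloon target %d MiB exceeds maximum %d MiB", targetMB, q.maxMemMB)
	}
	q.balloonMB = targetMB
	return nil
}

func (q *qemuToy) hotplugMemory(dev *memoryDevice) (int, error) {
	if q.currentMemMB+dev.sizeMB > q.maxMemMB {
		return 0, fmt.Errorf("unable to hotplug %d MiB memory, the SB has %d MiB and the maximum amount is %d MiB",
			dev.sizeMB, q.currentMemMB, q.maxMemMB)
	}
	if dev.sizeMB == 0 {
		// Nothing to hotplug; the balloon is still resized so that a
		// shrink request needing no new DIMM still caps the guest at
		// its current size, per the thread above.
		log.Println("not needed to hotplug, updating balloon")
		return 0, q.updateMemoryBalloon(q.currentMemMB)
	}
	q.currentMemMB += dev.sizeMB
	return dev.sizeMB, q.updateMemoryBalloon(q.currentMemMB)
}

func main() {
	q := &qemuToy{currentMemMB: 2048, maxMemMB: 6144, balloonMB: 2048}
	added, err := q.hotplugMemory(&memoryDevice{sizeMB: 0})
	fmt.Printf("added=%d balloon=%d err=%v\n", added, q.balloonMB, err)
}
```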
/test
Force-pushed 759640e to 80b21f9
/test
@sboeuf take a look, I think the balloon usage is more explicit now. @linzichang @clarecch please take a look; this PR plans to move part of the logic for how we manage memory. The last two commits do that.
Hi @jcvenegas, most of this PR looks nice, but I have two doubts:
virtcontainers/sandbox.go
Outdated
// Memory is not updated if memory limit not set
if newResources.MemMB != 0 {
	c.config.Resources.MemMB = newResources.MemMB
If only part of the memory is hotplugged successfully, the container's memory limit cgroup should not be set to newResources.MemMB, which is the amount expected to be hotplugged.
@clarecch I need to test this, but from my perspective we still need to update the cgroup:
- The container should update its cgroup.
- The sandbox manages the memory from the guest.

I expect that the cgroup limit does not depend on the guest's physical memory, so we can set the cgroup with more memory than is available in the VM. A valid example:
- Have a sandbox/pod/VM with (almost) all of the host's memory added: create the sandbox, then create a container whose memory is all the host memory.
- Then we add another container to the sandbox/pod/VM without any memory cgroup.
- Afterwards we update the cgroup of the new container to increase its memory: the sandbox won't increase the guest memory, but the container will share the memory with the other containers and the cgroup will be limited. (A toy sketch of this split follows below.)
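The split described above can be made concrete with a toy model; all names here are illustrative, not virtcontainers' actual types.

```go
package main

import "fmt"

type sandbox struct {
	guestMemMB int // physical memory the VM actually has
}

type container struct {
	memLimitMB int // memory cgroup limit recorded for the container
}

// updateContainerMemory keeps the two concerns separate: the container's
// cgroup limit always follows the requested value (it may exceed guest
// memory, in which case containers share what the VM has), while the
// sandbox only grows by what was actually hotplugged.
func updateContainerMemory(s *sandbox, c *container, newLimitMB, hotpluggedMB int) {
	if newLimitMB == 0 {
		return // memory limit not set, nothing to update
	}
	c.memLimitMB = newLimitMB
	s.guestMemMB += hotpluggedMB // may be less than requested
}

func main() {
	s := &sandbox{guestMemMB: 6144}
	c := &container{}
	// Request an 8 GiB limit, but suppose nothing could be hotplugged:
	// the cgroup limit still moves while the guest stays at 6 GiB.
	updateContainerMemory(s, c, 8192, 0)
	fmt.Println(c.memLimitMB, s.guestMemMB) // 8192 6144
}
```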
Thanks for the explanation.
@jcvenegas ping, any updates?
@jcvenegas ping, any updates? Thx.
Branch is now conflicted (again). @jcvenegas - can you give us an update on your plan with this at least - are you actively pursuing and still hoping to land?
@jcvenegas any updates? Thx!
ping @jcvenegas
@jcvenegas ping.
ping @jcvenegas
ping @jcvenegas Shall we close it?
Force-pushed f01caba to a006319
/test
Force-pushed ab8ac47 to fc0f210
Request to return memory back using the balloon.

Fixes: kata-containers#790

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Now that we are on qemu 4.x, there is no need to handle the balloon in a special way.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Force-pushed fc0f210 to 131d5bc
/test
thanks @jcvenegas
@@ -1128,6 +1134,9 @@ func (q *qemu) hotplugVFIODevice(device *config.VFIODev, op operation) (err error) {
	devID := device.ID

	if op == addDevice {

		// When HasVFIODevice is set balloon size is set to maximal memory
And where is HasVFIODevice set?
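For background on that comment: VFIO passthrough pins guest memory for DMA, so the balloon must never reclaim it; setting the balloon to maximal memory effectively disables it. A toy guard along those lines follows (illustrative names, not the PR's actual code).

```go
package main

import "fmt"

type vmToy struct {
	maxMemMB      int
	balloonMB     int
	hasVFIODevice bool
}

// hotplugVFIO marks the VM as having a passthrough device and sets the
// balloon to maximal memory so no guest page can be reclaimed while DMA
// mappings may reference it.
func (v *vmToy) hotplugVFIO() {
	v.hasVFIODevice = true
	v.balloonMB = v.maxMemMB
}

// resizeBalloon refuses to shrink the guest while a VFIO device is attached.
func (v *vmToy) resizeBalloon(targetMB int) error {
	if v.hasVFIODevice && targetMB < v.maxMemMB {
		return fmt.Errorf("cannot balloon below %d MiB with a VFIO device attached", v.maxMemMB)
	}
	v.balloonMB = targetMB
	return nil
}

func main() {
	v := &vmToy{maxMemMB: 6144, balloonMB: 1024}
	v.hotplugVFIO()
	fmt.Println(v.resizeBalloon(1024)) // error: guest memory must stay pinned
}
```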
@jcvenegas any update on this PR?
@raravena80 thanks for the ping on this. I was waiting for feedback from @egernst and @bergwolf on whether we still want to enable this; the PR is functional, but it does not have a lot of priority today.
@jcvenegas any updates? Your weekly Kata herder.
Hmm, this is a pain - the only failing CI is the nemu one. But we no longer support that so we could land this. But GitHub won't let us. Thoughts @jcvenegas? @chavafg - btw, as nemu is dead, do we still need the nemu CI jobs?
A re-submit as a new PR might be the easiest thing to do.
@jodh-intel we already removed the nemu CI for the master branch. The nemu result above was from an old execution; we can ignore it, or, as @grahamwhaley comments, the PR can be re-pushed.
@jodh-intel it has been a long time for this PR; I think it makes sense to ask @kata-containers/architecture-committee whether we still want it or whether I should close it.
@chavafg @jcvenegas any updates on this PR? Thx
If a memory update reduces memory, we want to return memory
to the host and restrict the guest from using more memory.

Fixes: #790

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>