New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime-rs: support Memory hotplug #6876
Conversation
This PR will depend on the framework of #6289, and the memory hotplug functionality of |
@Tim-0731-Hzt, please, when this PR becomes ready to be merged, re-enable this test here: https://github.com/kata-containers/kata-containers/blob/main/tests/integration/kubernetes/k8s-cpu-ns.bats We'd like to have it tested as part of our CI. |
02c0558
to
f436059
Compare
f436059
to
0ef31dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @Tim-0731-Hzt for your good job.
some comments for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Tim-0731-Hzt ! Some initial comments:
1c3c24b
to
ec053cb
Compare
ec053cb
to
6320eb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Tim-0731-Hzt , i still have some questions:
6320eb6
to
001e6a5
Compare
53f825c
to
4ddb4f7
Compare
d10e8f0
to
275842a
Compare
2078804
to
2b63b58
Compare
/test |
2b63b58
to
561e726
Compare
#[derive(Default, Debug, Clone)] | ||
pub struct MemResource { | ||
/// Current memory | ||
pub(crate) current_mem: Arc<RwLock<u32>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this Arc<RwLock> field which is only used to judge whether to clear balloon size or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As runtime is a multi-thread application, there might be a situation that we have one thread that needs to update the current_mem, while other threads are reading it to make decisions based on the current state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not focus on the Arc Rwlock, I mean this field is never used again after you refactor balloon logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -338,13 +348,115 @@ impl DragonballInner { | |||
Ok((old_vcpus, new_vcpus)) | |||
} | |||
|
|||
pub(crate) fn resize_memory( | |||
&mut self, | |||
old_mem_mb: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cloud simply save a balloon size in self, or just force to clear balloon size. It seems that resize_memory() only need a target memory size is enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
use std::{collections::HashSet, fs::create_dir_all}; | ||
|
||
const DRAGONBALL_KERNEL: &str = "vmlinux"; | ||
const DRAGONBALL_ROOT_FS: &str = "rootfs"; | ||
|
||
const BALLOON0: &str = "balloon0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about BALLOON_DEVICE_ID? and why mem device don't use a const name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -17,6 +17,8 @@ pub enum CapabilityBits { | |||
MultiQueueSupport, | |||
/// hypervisor supports filesystem share | |||
FsSharingSupport, | |||
/// hypervisor supports memory hotplug probe interface | |||
GuestMemoryHotplugProbe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GuestMemoryProbe may be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to know if the hypervisor supports Hotplug, maybe keep the keyword Hotplug is better here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path you read in agent is /sys/devices/system/memory/probe
, leave only probe here is more precise. And you can also choose leaving only hotplug here, which is also make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// set memory hotplug probe | ||
if guest_details.support_mem_hotplug_probe { | ||
self.hypervisor | ||
.set_capabilities(CapabilityBits::GuestMemoryHotplugProbe) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this checking? set cap and check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because we need to know if the hypervisor support memory hotplug. we get support_mem_hotplug_probe
from agent response, and it is to determine the capability of memory hotplug, if it is, we set the capability to hypervisor.
.await; | ||
|
||
// set memory hotplug probe | ||
if guest_details.support_mem_hotplug_probe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the probe interface in guest kernel a necessary condition for hypervisor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be unnecessary for Dragonball, but it is used for qemu in Kata 2.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when kernel does not support this feature, will dragonball still works? If so, you may need to set this capability as default, and consider if this need to be update to agent response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, set this capability when we initialize dragonball instance
) -> Result<(u32, MemoryConfig)> { | ||
// check the invalid request memory | ||
if new_mem_mb > self.hypervisor_config().memory_info.default_maxmemory | ||
&& self.hypervisor_config().memory_info.default_maxmemory > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about set it to intmax or host total memory when it is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this new_mem_mb
variable cannot be 0, because it is calculated by adding it from the hypervisor's default memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does default_maxmemory
can be 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
327ec15
to
abbf70e
Compare
LGTM! |
/test |
/retest |
get memory block size and guest mem hotplug probe Fixes:kata-containers#6356 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Fixes: kata-containers#6875 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Fixes:kata-containers#6875 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
add default_maxmemory in config file Fixes:kata-containers#6875 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
abbf70e
to
8ddec91
Compare
/test |
check the update memory size greater than default max memory size Fixes:kata-containers#6875 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
8ddec91
to
9a37e77
Compare
/test |
support memory hotplug for runtime-rs