This repository has been archived by the owner on May 12, 2021. It is now read-only.

Memory hotplugging issues on x86 kernel with CONFIG_ARCH_MEMORY_PROBE=y #712

Closed
rhafer opened this issue Jan 15, 2020 · 4 comments · Fixed by #713
Labels
bug Incorrect behaviour needs-review Needs to be assessed by the team.

Comments

@rhafer
Contributor

rhafer commented Jan 15, 2020

Description of problem

While testing kata-containers with an (openSUSE) distribution kernel I noticed that it fails to launch containers when using a custom memory limit (e.g. docker run -m 2G). Further debugging showed that this is related to the fact that the kernel is built with CONFIG_ARCH_MEMORY_PROBE=y, which causes /sys/devices/system/memory/probe to be present on the system. On x86, however, ACPI is usually responsible for notifying the system of hotplugged memory. So even when /sys/devices/system/memory/probe is present, there should be no need to use it in general (AFAIK it's mostly there for debugging/testing on x86). The kata-agent, however, uses the interface unconditionally, which causes issues when ACPI is already taking care of the added memory: writes to /sys/devices/system/memory/probe for already added ranges fail with EEXIST, which causes the hotplugging code in the agent to error out and the guest VM to be deleted again.

Expected result

docker run -m 2G successfully launches a container, even when the guest kernel is built with CONFIG_ARCH_MEMORY_PROBE=y on x86.

Actual result

The above command fails with:

rpc error: code = Unknown desc = write /sys/devices/system/memory/probe: file exists: unknown

@rhafer rhafer added bug Incorrect behaviour needs-review Needs to be assessed by the team. labels Jan 15, 2020
@rhafer
Contributor Author

rhafer commented Jan 15, 2020

I am currently working on a patch to address the issue.

@grahamwhaley
Contributor

Thanks @rhafer. /cc @jcvenegas, who worked on the memory hotplug stuff iirc, and @devimc, who has done a bunch around the ACPI hotplug (mostly pci/devices I think).

@rhafer
Contributor Author

rhafer commented Jan 15, 2020

It seems the /sys/devices/system/memory/probe code in the agent was introduced for memory hotplugging support on ARM (see #442). However, on that architecture it even requires a patched kernel.

A possible fix that works on all archs seems to be to check /sys/firmware/acpi/hotplug/memory/enabled: if that file is present and contains 1, the writes to the probe file can be skipped.

@grahamwhaley
Contributor

/cc @Pennyzct for the ARM side input :-)

rhafer added a commit to rhafer/agent that referenced this issue Jan 15, 2020
Don't use the /sys/devices/system/memory/probe interface on architectures
where the firmware (ACPI) is notifying the system of hotplugged memory.
This fixes an issue with the agent erroring out when the guest-kernel is
compiled with CONFIG_ARCH_MEMORY_PROBE=y.

Fixes: kata-containers#712
Signed-off-by: Ralf Haferkamp <rhafer@suse.com>