Avoiding node rebooting for machine configurations #34

Closed
hershpa opened this issue Apr 27, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@hershpa
Contributor

hershpa commented Apr 27, 2023

Summary:

Node rebooting presents several challenges. Certain machine configurations require rebooting the node(s) of an OpenShift cluster. Typically, machine configuration (MachineConfig) updates or changes on an OpenShift cluster are facilitated by the Machine Config Operator (MCO). From a cluster administrator or end user perspective, reboots may be undesirable in a production environment for a variety of reasons.

Process:

A reboot of a node typically starts with cordoning the node, which prevents the scheduler from placing new pods onto it. The node is then drained, meaning all running pods are evicted from it. When possible, the scheduler reschedules the pods evicted from node A onto another node B. This can prove challenging, for example when no other node has spare capacity for the evicted pods. During the reboot, the node reports NotReady; it becomes Ready again if the reboot completes successfully. Finally, the node is uncordoned (marked as schedulable), meaning new pods can be scheduled on it. If a MachineConfig targets multiple nodes, the nodes are typically rebooted sequentially.
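
As a rough illustration of the sequence above, the following shell sketch prints the manual equivalent of the steps using oc. It is not how the MCO implements its rollout. The helper names run and reboot_node, the DRY_RUN variable, and the 15 minute timeout are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Manual equivalent of the cordon / drain / reboot / uncordon sequence.
# Assumptions: `oc` is logged in with cluster-admin rights.
# DRY_RUN defaults to 1, so the commands are only printed, not executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

reboot_node() {
  local node="$1"
  # 1. Cordon: stop the scheduler from placing new pods on the node.
  run oc adm cordon "$node" &&
  # 2. Drain: evict running pods (DaemonSet pods are left in place).
  run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data &&
  # 3. Reboot the host from a debug pod.
  run oc debug "node/$node" -- chroot /host systemctl reboot &&
  # 4. Wait for the node to report Ready. Caveat: the node may still report
  #    Ready for a short while after the reboot is issued, so a real script
  #    would first wait until it stops reporting Ready.
  run oc wait --for=condition=Ready "node/$node" --timeout=15m &&
  # 5. Uncordon: mark the node schedulable again.
  run oc adm uncordon "$node"
}
```

Calling reboot_node worker-0 with the default DRY_RUN=1 prints the five commands in order, which makes the disruption per node easy to see: every pod on the node is evicted at step 2.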

Examples:

  • Since the default firmware directory /lib/firmware is read-only on OCP cluster nodes, a MachineConfig is used to set an alternative firmware path via the kernel argument firmware_class.path=/var/lib/firmware so that out-of-tree (OOT) firmware can be loaded on a RHCOS node. The Kernel Module Management (KMM) Operator copies the firmware from the driver container to the alternative firmware path after the driver container is deployed. This approach is used to load OOT GPU firmware and provision the Intel GPU card on OpenShift.

  • Similarly, for QAT, the kernel parameter intel_iommu is turned on via the MCO. Each of these MCO operations triggers a one-time reboot per node to reach the desired configuration.
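
For reference, both examples come down to a MachineConfig that carries a kernel argument. A minimal sketch follows, assuming a hypothetical helper render_kernel_arg_mc and hypothetical object names. The manifests actually used for the GPU and QAT setups may differ.

```shell
#!/usr/bin/env bash
# Render a MachineConfig that adds one kernel argument for a node role.
# Arguments: <object name> <role> <kernel argument>
# The function name and the object names used below are hypothetical.

render_kernel_arg_mc() {
  local name="$1" role="$2" karg="$3"
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: ${name}
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  kernelArguments:
    - ${karg}
EOF
}

# Usage (applying either manifest makes the MCO reboot each matching node):
#   render_kernel_arg_mc 99-worker-fw-path worker firmware_class.path=/var/lib/firmware | oc apply -f -
#   render_kernel_arg_mc 99-worker-iommu worker intel_iommu=on | oc apply -f -
```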

Goal:

When possible, the goal is to perform the configuration operations at runtime to avoid disruption to the cluster and workloads.

Possible Solutions to Certain Scenarios:

In certain scenarios, it may be possible to facilitate a configuration change at runtime.

  • For the alternative firmware path, it may be possible to have KMM configure the lookup path at runtime before loading any module.
    The lookup path is configured on the node with the following command: echo /var/lib/firmware > /sys/module/firmware_class/parameters/path
    For more details, see the Linux kernel documentation on firmware search paths.

  • Another option is to deploy a privileged DaemonSet that configures the lookup path at runtime and then sleeps forever.
    Note that if the node is rebooted, the lookup path has to be configured again. With either of the two options above, the lookup path must always be configured before any module is loaded. The design should guarantee this ordering.

  • Here is a successful example of facilitating a node configuration change at runtime: KMM 1.1 facilitates removal of an in-tree module prior to loading the OOT module at runtime.
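
The runtime approach from the first two bullets can be sketched as a small shell function, for instance as the entrypoint of the privileged DaemonSet before it sleeps. This is only an illustration under assumptions: set_firmware_path and its optional second argument are hypothetical, and the container is assumed to see the host /sys with write access.

```shell
#!/usr/bin/env bash
# Point the kernel firmware loader at an alternative directory at runtime.
# Arguments: <firmware dir> [parameter file]
# The parameter file defaults to the sysfs path named in this issue; the
# override exists only so the function can be exercised without root.

set_firmware_path() {
  local fw_dir="$1"
  local param_file="${2:-/sys/module/firmware_class/parameters/path}"

  # Skip the write if the value is already in place (safe on pod restarts).
  if [ "$(cat "$param_file" 2>/dev/null)" = "$fw_dir" ]; then
    return 0
  fi

  # Same effect as: echo /var/lib/firmware > /sys/module/firmware_class/parameters/path
  printf '%s\n' "$fw_dir" > "$param_file" || return 1

  # Read the value back to confirm it was accepted.
  [ "$(cat "$param_file")" = "$fw_dir" ]
}
```

In a DaemonSet container this could be invoked as set_firmware_path /var/lib/firmware && exec sleep infinity. Because the setting does not survive a reboot, the pod restarting after the node comes back is what re-applies it, and any module load has to be ordered after this call.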

@hershpa hershpa added the enhancement New feature or request label Apr 27, 2023
@hershpa
Contributor Author

hershpa commented May 1, 2023

Hi @qbarrand and @ybettan, we would love to have your perspective and insight, especially on the idea of KMM configuring the alternative firmware lookup path on the fly. Thanks in advance.

@qbarrand

qbarrand commented May 2, 2023

The current idea for KMM 2.0 is to run only one DaemonSet; on each node, one pod would download module images, extract them and load kmods. This should facilitate other operations, such as unloading in-tree modules or specifying dependencies. We could also make that pod configure the search path by writing the lookup path to /sys/module/firmware_class/parameters/path as soon as it starts.
@yevgeny-shnaidman WDYT?

@yevgeny-shnaidman

This will require mounting the host /sys FS into the DaemonSet with RW permissions. Currently we get it for free by using the "privileged" SCC, but it is mounted RO.

@hershpa
Contributor Author

hershpa commented May 2, 2023

Thanks for the input @qbarrand and @yevgeny-shnaidman. Would mounting /sys host FS be a viable option?

@hershpa
Contributor Author

hershpa commented Sep 29, 2023

PR in KMM upstream to set the alternative FW path on the fly: kubernetes-sigs/kernel-module-management#586. Targeted for KMM 2.0.

@hershpa hershpa closed this as completed Mar 8, 2024