Users need a way to update the Kubelet configuration and Kubelet tunables. They can do this through the MCO today, but they need to know the correct options to use. We can help customers by documenting a CRD that exposes the Kubelet tunables and having the MCO use those tunables when rendering the kubelet.conf and kubelet.service files to Ignition/disk.
The container runtime is also configured through the Kubelet. CRI-O is the default and recommended runtime. However, some customers have paid for Docker-specific tools or monitoring products and may need to keep using Docker.
The MCO contains the logic to write the kubelet.conf configuration file and the kubelet.service systemd unit file to Ignition. When Ignition starts on a machine, it writes these two files to configure the kubelet. When these resources change within the MachineConfig, the new configs are rewritten and the MachineConfigDaemon is instructed to reboot the machine to pick them up.
Customers should be able to achieve the following goals without knowing which command-line or configuration options to use within the Kubelet:
- changing out-of-resource handling (https://docs.openshift.com/container-platform/3.11/admin_guide/out_of_resource_handling.html#out-of-resource-create-config)
- setting a feature gate (https://docs.openshift.com/container-platform/3.11/install_config/configuring_ephemeral.html#ephemeral-storage-enabling-ephemeral-storage)
- setting CPU manager to static (https://docs.openshift.com/container-platform/3.11/scaling_performance/using_cpu_manager.html)
- setting max pods per node (https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#admin-guide-max-pods-per-node)
- configuring garbage collection (https://docs.openshift.com/container-platform/3.11/admin_guide/garbage_collection.html)
- configuring node resources (https://docs.openshift.com/container-platform/3.11/admin_guide/manage_nodes.html#configuring-node-resources)
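As a rough illustration of how these goals could map onto the KubeletConfig custom resource proposed in this document, the sketch below sets one upstream KubeletConfiguration field per goal. The resource name, the pool label, and every value are hypothetical and chosen only for illustration. The field names come from the upstream kubelet configuration API and should be checked against the Kubernetes version in use.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: example-kubelet-tuning          # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: example-tuning    # hypothetical label; must also be set on the target pool
  kubeletConfig:
    # Out-of-resource handling: hard eviction thresholds (illustrative values)
    evictionHard:
      memory.available: "500Mi"
      nodefs.available: "10%"
    # Feature gate: ephemeral storage isolation
    featureGates:
      LocalStorageCapacityIsolation: true
    # CPU manager: the static policy needs a non-zero CPU reservation,
    # which systemReserved.cpu below provides
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    # Max pods per node
    maxPods: 250
    # Image garbage collection thresholds (percent of disk usage)
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80
    # Node resources reserved for system daemons
    systemReserved:
      cpu: 500m
      memory: 1Gi
```

In practice it is probably safer to change one setting at a time, since an invalid kubelet value can leave the nodes in a pool unusable.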
Extend the Machine Config Operator to include a KubeletConfig CRD and a KubeletConfigController. Using a KubeletConfig CRD provides an implicit allowlist of user-controllable options. When the KubeletConfig instance is deleted, the default config is restored.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: kubeletconfigs.machineconfiguration.openshift.io
spec:
  group: machineconfiguration.openshift.io
  versions:
    - name: v1
      served: true
      storage: true
  scope: Cluster
  names:
    plural: kubeletconfigs
    singular: kubeletconfig
    kind: KubeletConfig
- MachineConfigPoolSelector *metav1.LabelSelector
- Runtime:
  - Name string
  - Endpoint string
- KubeletConfig: [KubeletConfigurationSpec](https://github.com/kubernetes/kubernetes/blob/release-1.11/pkg/kubelet/apis/kubeletconfig/v1beta1/types.go#L45)
Note that, because the fields of the kubelet configuration are taken directly from upstream, their values are validated by the kubelet itself. Refer to the upstream documentation for the relevant Kubernetes version for the valid values of these fields. Invalid values in the kubelet configuration may render cluster nodes unusable.
This is what an example KubeletConfig CR looks like. Note: you must add a label under matchLabels in the KubeletConfig CR:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    maxPods: 100
Save your KubeletConfig locally, for example as maxpods.yaml.
The label in the above example corresponds to the worker MachineConfigPool. In OCP 4.6 and later, the master and worker MachineConfigPools carry the labels pools.operator.machineconfiguration.openshift.io/master: "" and pools.operator.machineconfiguration.openshift.io/worker: "" by default. If you have a custom pool, or an earlier OCP version, you can instead create a label yourself as follows:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: small-pods
  kubeletConfig:
    maxPods: 100
To roll out the pod limit change to all worker nodes (use master instead to target the master nodes), add the label that you created (here custom-kubelet: small-pods) under labels in the MachineConfigPool config:
oc edit machineconfigpool worker
Snippet of the machineConfigPool config with the matching label added:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: 2019-04-10T16:39:39Z
  generation: 1
  labels:
    custom-kubelet: small-pods
  name: worker
...
Now apply the KubeletConfig that you created:
$ oc apply -f maxpods.yaml
kubeletconfig.machineconfiguration.openshift.io/set-max-pods created
Double check that it was created:
$ oc get kubeletconfig
NAME AGE
set-max-pods 6s
Check that a new kubelet MachineConfig for the worker pool (99-worker-kubelet-managed in the output below) and a new rendered-worker MachineConfig have been created:
$ oc get machineconfigs
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
...
99-worker-kubelet-managed fc45f8b73b2fc61e567f2111181d3e802f2565d7 3.1.0 7s
...
rendered-worker-45678XYZ fc45f8b73b2fc61e567f2111181d3e802f2565d7 3.1.0 2s
...
The changes should now be rolled out to each node in the worker pool via that new rendered-worker machine config. You can verify by checking that the latest rendered-worker machine-config has been rolled out to the pools successfully:
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
...
worker rendered-worker-45678XYZ True False False 3 3 3 0 5m
...
The KubeletConfigController would perform the following steps:
- Validate the user-defined KubeletConfig
- Render the current MachineConfig (storage.files.contents[kubelet.conf]) into the KubeletConfiguration structure
- Load the KubeletConfig from the passed-in Spec.KubeletConfig
- Use mergo to merge the two structures
- Serialize the KubeletConfig to JSON
- Create or update an Ignition /etc/kubernetes/kubelet.conf file within a 99-[role]-kubelet-managed MachineConfig
After deletion of the KubeletConfig instance the config will be reverted to the original kubelet config.
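To make the end result of those steps more concrete, here is a hedged sketch of roughly what the generated MachineConfig for the worker role might look like. The role label, file mode, overwrite flag, and data-URL encoding are assumptions for illustration, and the file contents are shown only as a placeholder. The object actually produced by the controller may differ.

```yaml
# Illustrative only: approximate shape of the generated MachineConfig.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-kubelet-managed       # 99-[role]-kubelet-managed with role = worker
  labels:
    machineconfiguration.openshift.io/role: worker   # assumed role label
spec:
  config:
    ignition:
      version: 3.1.0                    # same version as in the oc get machineconfigs listing
    storage:
      files:
        - path: /etc/kubernetes/kubelet.conf
          mode: 420                     # assumed; decimal form of octal 0644
          overwrite: true               # assumed
          contents:
            # Placeholder: the merged KubeletConfiguration, serialized to JSON
            # and embedded as a data URL (encoding is an assumption).
            source: "data:text/plain;charset=utf-8;base64,<merged-kubelet-config-json>"
```

Because the file lives in its own MachineConfig, deleting the KubeletConfig only requires removing this one object for the pool to fall back to the original kubelet.conf.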
First-time installs always use CRI-O. After the initial install, the user can switch from CRI-O to Docker, or from Docker back to CRI-O.
Setting the runtime options through the KubeletConfig CRD causes each affected node to be drained and rebooted. The Machine Config Daemon rolls the change out node by node, and after the reboot each machine is configured to use the selected runtime (for example, Docker).
Note: Docker is only available in Bring-Your-Own-RHEL (User Provisioned Infrastructure) configurations.
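A runtime switch might be expressed along the following lines. This is only a sketch: the design lists a Runtime structure with Name and Endpoint fields but does not fix their serialized key names, so the lower-camel-case keys below are an assumption, as are the resource name and the pool label.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: use-docker-runtime              # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: docker-runtime    # hypothetical label; must also be set on the target pool
  # Assumption: the Runtime struct (Name, Endpoint) is exposed under these keys.
  runtime:
    name: docker
    # endpoint is intentionally left unset: this document does not specify
    # which value each runtime requires.
```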
TODO: Network Daemon?
TODO: crictl configuration?
The following partitioning schemes are supported:
- Single Root Partition (/): Single partitioning scheme where everything is on a root partition.
- Control Partition (/var): There is a root (/) partition for the filesystem, in addition to a separate /var disk or partition for runtime-specific data. This scheme is useful when the customer wants a small root partition or wants to run the runtime from another disk (e.g. an SSD).
- Control and Log Partition (/var and /var/log): There is a root (/) partition for the filesystem, in addition to separate /var and /var/log disks or partitions. This scheme is useful when the customer wants to prevent log files from filling up the primary image or runtime disk.