MCO fails to apply rendered configuration to infra nodes #1270
Comments
As per https://github.com/openshift/machine-config-operator/blob/master/docs/custom-pools.md, you are not changing the default pool but adding your own custom pool. Custom pools inherit from worker, so a custom pool will have both the worker label and the infra label (or whatever the custom name is). Your infra pool only has an infra role in step 5, which will cause MCs not to roll out to it.
machineConfigSelector renders all machine configs that satisfy the matchExpressions condition into a single rendered configuration, and nodeSelector applies that rendered configuration to all nodes that match the label criteria. As infra nodes are matched by nodeSelector, the MCP should affect them. Indeed, the custom MCP works well if it is created after the labeled nodes exist, but not if a new node is created afterwards. It seems that the exception is not correctly caught, causing the pod to restart forever and disrupting normal operator behavior. There is a null pointer exception in the error trace that makes the controller stop working.
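To make the two selectors described above concrete, here is a minimal Go sketch of the matching rules only. It is not MCO code: the helper names and the MachineConfig names (00-worker, 50-infra, 00-master) are hypothetical, and it assumes the label keys machineconfiguration.openshift.io/role and node-role.kubernetes.io/infra used in the custom-pools documentation.

```go
package main

import "fmt"

// roleKey is the label MachineConfigs carry to indicate their role
// (assumed here, following the custom-pools documentation).
const roleKey = "machineconfiguration.openshift.io/role"

// matchesIn mimics one matchExpressions entry with operator "In":
// the label must exist and its value must be one of values.
func matchesIn(labels map[string]string, key string, values []string) bool {
	got, ok := labels[key]
	if !ok {
		return false
	}
	for _, v := range values {
		if got == v {
			return true
		}
	}
	return false
}

// matchesLabels mimics matchLabels: every wanted key/value pair
// must be present on the object.
func matchesLabels(labels, want map[string]string) bool {
	for k, v := range want {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical MachineConfigs, keyed by name, with their labels.
	configs := map[string]map[string]string{
		"00-worker": {roleKey: "worker"},
		"50-infra":  {roleKey: "infra"},
		"00-master": {roleKey: "master"},
	}
	// machineConfigSelector: which configs get rendered into the infra pool.
	for name, labels := range configs {
		selected := matchesIn(labels, roleKey, []string{"worker", "infra"})
		fmt.Printf("%s rendered into infra pool: %v\n", name, selected)
	}
	// nodeSelector: which nodes receive the rendered configuration.
	node := map[string]string{"node-role.kubernetes.io/infra": ""}
	want := map[string]string{"node-role.kubernetes.io/infra": ""}
	fmt.Println("node selected by infra pool:", matchesLabels(node, want))
}
```

The sketch only covers selection; it says nothing about how the controller reacts when a node's labels change after the pool exists, which is where the crash reported here occurs.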
Please add the infra label in addition to the worker label as shown in the documentation, and report back if you hit the same issue, thanks. The MCO dictates that custom pools inherit from the worker pool. Your step 4 above shows that your node is labelled incorrectly. The error logs (above) show that you are hitting a problem when the controller gets the pools for the node, as it expects a custom pool and a worker pool.
Note: will add a nil check that errors out, to avoid pushing nil and to ensure that nodes with custom pools have correct labelling (custom pool & worker).
Could you clarify -- what labels should infrastructure-only nodes have, and how should clusters be configured so that workloads aren't scheduled on them? I'm on UPI. I first used the following to ensure that workloads are only scheduled on nodes with the worker label:
To create the infra nodes I started by booting machines with the worker ignition config. After signing the initial bootstrap CSR, I used "oc edit node" to replace the "worker" label with "infra", and then signed the machine certificate. I then moved the infra components as described in https://docs.openshift.com/container-platform/4.1/machine_management/creating-infrastructure-machinesets.html. This worked well on 4.1 -- normal workloads ended up on normal workers, and infra workloads ran on the infra nodes. (We don't want to mix the two, as infra-only nodes don't count toward licensing.) If infra nodes are expected to have both "infra" and "worker" labels under 4.2, then I suppose we'd have to apply a "yesthisreallyisaworker" label to "real" workers, and use that as the default node selector for scheduling.
You can use a nodeSelector (in: worker / not in: infra), or you could taint the nodes:
I am also facing a similar error when reproducing the steps.
The infra/worker tag is correct, and I believe the official documentation will be updated w/r/t taints to land workloads correctly. For now, the official docs do show that infra/worker is the correct setup: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/machine_management/creating-infrastructure-machinesets#moving-resources-to-infrastructure-machinesets
It has been fixed in recent OpenShift versions; see BZ-1772490 and BZ-1772680.
Description
When using a custom machine config pool for infra nodes, the MCO fails to manage the rolling update when a new worker node is labeled to infra.
Steps to reproduce the issue
worker label to infra to apply custom MCP:
Describe the results you received
The MCO cannot handle the label change because the machine-config-controller is continually restarted due to a memory violation error.
Describe the results you expected:
The MCO should apply MachineConfigPool configuration to the new infra node as expected.
Additional environment details (platform, options, etc.):