-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-19537: OCB should fail if node is not coreos based #4442
Conversation
@cheesesashimi @inesqyx PTAL. I am not completely sure if I am on the right track, this is where it seems we check whether OCL is enabled via the feature gate. |
Here are some ideas to consider: The Each $ oc get node ip-10-0-27-18.ec2.internal -o yaml | yq '.status.nodeInfo'
architecture: amd64
bootID: 63e74981-e918-46fc-b9c4-99374a5a620e
containerRuntimeVersion: cri-o://1.29.4-6.rhaos4.16.git0e93ae2.el9
kernelVersion: 5.14.0-427.20.1.el9_4.x86_64
kubeProxyVersion: v1.30.1+9798e19
kubeletVersion: v1.30.1+9798e19
machineID: ec28d02b7ad61b67b62f3681741924b9
operatingSystem: linux
osImage: Red Hat Enterprise Linux CoreOS 417.94.202406050030-0
systemUUID: ec28d02b-7ad6-1b67-b62f-3681741924b9 Then, within the MCO, whenever we get a new MachineOSConfig, we can get a list of all of the nodes for the targeted MachineConfigPool, pass them into our modified |
73a925b
to
1ac72d7
Compare
Thanks @cheesesashimi, I have updated the PR based on your review to grab the osImage info from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off to a great start so far! The only improvement I'm specifically requesting is the addition of the node name and the name(s) of the MachineConfigPool(s) it's a member of to the error message. My other ideas are purely at your discretion 😄
@sergiordlr can you please help verify we error out when adding a non-coreos based node. |
@@ -164,3 +179,16 @@ func ListPools(node *corev1.Node, mcpLister v1.MachineConfigPoolLister) (*mcfgv1 | |||
|
|||
return master, worker, custom, nil | |||
} | |||
|
|||
// IsCoreOSNode checks whether the pretty name of a node matches any of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought (non-blocking): I was originally going to suggest for this to be moved to pkg/daemon/osrelease
before I noticed the IsWindows()
function above it, which made me realize that this is a fine home for this function. However, seeing the IsWindows()
function brings up a question about what we should do about OCL for Windows nodes since they can't be updated that way. At the very least, we may want to do a small spike to better understand that relationship.
(To be clear: I'm not asking you to make any changes; I'm just putting my thoughts here for posterity 😄)
@@ -39,11 +34,11 @@ func TestOSDetection(t *testing.T) { | |||
assert.False(t, nodeOSRelease.OS.IsLikeTraditionalRHEL7(), "expected IsLikeTraditionalRHEL7() to be false: %s", nodeOSRelease.EtcContent) | |||
|
|||
switch { | |||
case strings.Contains(nodeOSRelease.EtcContent, rhcos): | |||
case strings.Contains(nodeOSRelease.EtcContent, osrelease.RHCOS): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Nice work here! Thanks for making this change.
@umohnani8: This pull request references Jira Issue OCPBUGS-19537, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@umohnani8: This pull request references Jira Issue OCPBUGS-19537, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
If a non-coreos based node, such as RHEL8, is added to a cluster and the user tries to enable on cluster builds, throw an error as that is not supported on non-coreos based nodes. Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
/retest |
This is ready, @sergiordlr PTAL |
Pre-merge verified -
Added RHEL nodes using ocp4-scale-runner flexy job
Enabled and applied OCB on custom MCP rhel node
We are able to see error logs message in operator instead of RHEL node logs and MCP is not in degraded state
adding qe-approved label /label qe-approved Thankyou! |
@umohnani8: This pull request references Jira Issue OCPBUGS-19537, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (ptalgulk@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@cheesesashimi PTAL at the test results above. This is ready to be merged! |
Nice work! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheesesashimi, umohnani8 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@yuqi-zhang @cheesesashimi any idea how to get the label needed to merge this - it wants the |
We can apply this label if we want this bug to merge this week, anyone should have the ability. The label was added as a requirement, since we wanted to be more stable for shift week. The requirement should drop off by next week and this can merge normally as well. |
@umohnani8: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@umohnani8: Jira Issue OCPBUGS-19537: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-19537 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator |
If a non-coreos based node, such as RHEL8, is added to a cluster and the user tries to enable on cluster builds, throw an error as that is not supported on non-coreos based nodes.
Fixes https://issues.redhat.com/browse/OCPBUGS-19537
- What I did
Added a check for when OCB is enabled to verify which OS the nodes have and to fail if they are non-coreos based.
- How to verify it
Start a cluster, add a RHEL node to it and then try enabling OCB
- Description for the changelog
Throw an error if cluster has non-coreos node when trying to enable OCB