OCPBUGS-29290: AWS: Always persist the existing node name on 4.14 #4215
@JoelSpeed: This pull request references Jira Issue OCPBUGS-29290, which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
9055072 to 02ac771
/payload-job periodic-ci-openshift-release-master-nightly-4.14-upgrade-from-stable-4.13-e2e-aws-sdn-upgrade
@JoelSpeed: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/003408b0-d589-11ee-8c09-fd4f3aba27c1-0
02ac771 to e4ccde8
@JoelSpeed: This pull request references Jira Issue OCPBUGS-29290, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/jira refresh
@JoelSpeed: This pull request references Jira Issue OCPBUGS-29290, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/label qe-approved
/retest
Bash syntax and reasoning
/lgtm
Is there an automated test in CI that exercises this scenario or did you test this manually somehow?
Thanks
# We must ensure that it persists across this upgrade boundary by writing the current node name out, no matter what we expected it to be.
if [ -e "${CURRENT_CLIENT_CERT}" ]; then
  HOSTNAME=$(openssl x509 -noout -subject -in "${CURRENT_CLIENT_CERT}" | sed 's/.*CN = //' | sed 's/\"//g' | sed 's/system:node://')
  if [[ ! -z "${HOSTNAME}" ]]; then
possible silly Q / nit
Why are we checking that the hostname is non-empty here? Is it possible that the extraction above could fail? Do we want explicit error handling in that case, or will skipping the following code via `if [[ ! -z "${HOSTNAME}" ]]; then` be sufficient?
I expected that the file might have been created but be empty. In that case I just wanted to fall back to the original logic, which I believe is what will happen with the check as it currently stands.
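For readers following the thread, here is a minimal sketch of the extraction and the empty-value fallback being discussed. It is not the code from this PR: the function names `extract_node_name` and `choose_node_name`, the fallback value, and the sample subject strings are hypothetical, and it parses a literal string instead of calling `openssl` so that it can be run anywhere. The `sed` steps mirror the pipeline in the diff.

```shell
#!/bin/sh
# Sketch only: function names and sample subject strings are hypothetical.

# Mirror the sed pipeline from the diff: drop everything up to "CN = ",
# drop any double quotes, then drop the "system:node:" prefix.
extract_node_name() {
  printf '%s\n' "$1" | sed 's/.*CN = //' | sed 's/"//g' | sed 's/system:node://'
}

# Prefer the name taken from the certificate subject; if the extraction
# produced nothing (for example, an empty file), use the fallback instead.
choose_node_name() {
  cnn_name="$(extract_node_name "$1")"
  if [ -n "${cnn_name}" ]; then
    printf '%s\n' "${cnn_name}"
  else
    printf '%s\n' "$2"
  fi
}

# A subject line in the "CN = ..." format the sed expression expects.
choose_node_name 'subject=O = system:nodes, CN = system:node:ip-10-0-0-1.ec2.internal' 'fallback-name'
# Empty input: the fallback is used.
choose_node_name '' 'fallback-name'
```

Note that the non-empty check only guards against empty output. A subject line without a `CN = ` field would pass through the first `sed` unchanged and still count as non-empty.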
one question, but other than that /lgtm
/label backport-risk-assessed
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: damdo, JoelSpeed, sinnykumari, theobarberbany The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label cherry-pick-approved
@JoelSpeed: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
8f8dbba into openshift:release-4.14
@JoelSpeed: Jira Issue OCPBUGS-29290: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-29290 has been moved to the MODIFIED state. In response to this:
[ART PR BUILD NOTIFIER] This PR has been included in build ose-machine-config-operator-container-v4.14.0-202402291242.p0.g8f8dbba.assembly.stream.el8 for distgit ose-machine-config-operator.
Fix included in accepted release 4.14.0-0.nightly-2024-02-29-134959
On AWS, we persist the hostname into a file on disk. This value is passed to kubelet as `--hostname-override` so that the node registers with the cluster under the overridden hostname. In 4.13 and lower, the hostname override is ignored even when it is set, because the AWS cloud provider integration takes precedence.
To ensure the upgrade from 4.13 to 4.14 (where the in-tree logic is removed) is smooth, we must persist the hostname that the node originally registered with. If we do not, the hostname may change between versions, which breaks the upgrade and requires manual intervention to recover the nodes: approving new certificates and removing the old node objects.
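The persistence step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script changed in this PR: `persist_node_name`, `kubelet_hostname_args`, the temporary file path, and the sample names are all hypothetical. Only the `--hostname-override` kubelet flag comes from the description.

```shell
#!/bin/sh
# Sketch only: function names, file path, and sample node names are hypothetical.

# Write the name the node is already registered with, if known;
# otherwise write the name we would have computed.
persist_node_name() {
  # $1 = current (registered) name, $2 = expected name, $3 = output file
  if [ -n "$1" ]; then
    printf '%s\n' "$1" > "$3"
  else
    printf '%s\n' "$2" > "$3"
  fi
}

# Build the kubelet argument from the persisted file.
kubelet_hostname_args() {
  printf '%s=%s\n' '--hostname-override' "$(cat "$1")"
}

tmp_dir="$(mktemp -d)"
persist_node_name 'ip-10-0-0-1.ec2.internal' 'ip-10-0-0-1' "${tmp_dir}/node-name"
kubelet_hostname_args "${tmp_dir}/node-name"
```

The point of the design is that the already-registered name wins over whatever name would be computed after the upgrade, so the value handed to kubelet stays the same across the 4.13 to 4.14 boundary.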