templates: use Afterburn for setting GCP hostnames #2217
|
/hold I'm NOT sure I like this much better than the other code... it's a hack, albeit a lesser hack. I would need to be convinced by others that this isn't horrible. My objection to this code is that it does not fix the problem in a better place; it still uses a hack script to accomplish the same goal. The reason I submitted this code is that it gets us out of using the NetworkManager dispatcher script, which has been buggy. Using Afterburn to write the hostname to an ephemeral location and running it on every boot is using Afterburn in unintended ways; this alone makes it a hack. In terms of carrying cost to the MCO, it is much better than the existing code. So if we're carrying hacks, this narrowly scopes the hack to GCP, we stop fighting NetworkManager, we stop checking magic strings, and it runs once at boot. |
|
With RHCOS, Afterburn is enabled for GCP hostnames in 4.7. This patch would only be applicable for upgrading clusters and would be a bridge between the NetworkManager fix landing in RH/RHCOS and now. A nice side effect is that it fixes OKD. |
3ae201b to 1fa53eb
|
/retest |
|
Thank you for putting this up @darkmuggle. Question: The node-valid-hostname.service that calls set-valid-hostname.sh still exists in this repo, even after this PR. Is the goal to get rid of that too eventually and handle it elsewhere (in NM/Afterburn)? |
With this change, the code shifts from checking magic strings to ensuring each GCP node gets a hostname. Looking at this, I have half a mind to push this unit into FCOS and thereby RHCOS. When we put this into the MCO, we assumed that there was no place where we could fix the hostnames in the OS. We would still need to leave the The argument for NOT including this in FCOS/RHCOS is that the OS does not require a hostname, while the cluster does. |
1fa53eb to 578f98d
The only platform where over-long hostnames have been encountered is GCP. The code has proven buggy and racy, and has caused a bunch of BZs.
To unwind this mess, on GCP, the new behavior:
- leaves disabling NetworkManager on GCP
- on each boot, runs Afterburn to fetch the hostname and writes it to an ephemeral location
- uses the existing checks to truncate the length
Finally, this DROPS the NetworkManager dispatcher. FCOS/RHCOS is pursuing a more permanent solution.
Signed-off-by: Ben Howard <ben.howard@redhat.com>
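For readers unfamiliar with the truncation step mentioned in the commit message, here is a minimal sketch of the idea in Python. It is not the script in this repo: the function name `truncate_hostname`, the 63-character limit (the DNS label limit from RFC 1035), and the rule of falling back to the first DNS label are all assumptions made for illustration.

```python
# Illustrative sketch only; not the MCO's set-valid-hostname.sh.
# Assumption: the cluster needs a hostname of at most 63 characters,
# which is the DNS label limit from RFC 1035.
MAX_LABEL_LEN = 63


def truncate_hostname(name: str, max_len: int = MAX_LABEL_LEN) -> str:
    """Return `name` unchanged if it fits, otherwise a shortened form.

    Hypothetical rule: keep only the first DNS label (the text before
    the first dot); if that is still too long, cut it to `max_len`
    characters and drop any trailing hyphen left by the cut.
    """
    name = name.strip()
    if len(name) <= max_len:
        return name
    short = name.split(".", 1)[0]
    return short[:max_len].rstrip("-")


if __name__ == "__main__":
    # Hypothetical over-long GCP-style name, used only as sample input.
    fqdn = "worker-" + "a" * 70 + ".c.example-project.internal"
    print(truncate_hostname(fqdn))
```

In this sketch a name that already fits is passed through untouched, so only the over-long names seen on GCP would be altered.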
578f98d to 631db89
|
/test okd-e2e-gcp-op |
|
/cherry-pick release-4.6 |
|
@LorbusChris: once the present PR merges, I will cherry-pick it on top of release-4.6 in a new PR and assign it to you. |
|
/test okd-e2e-upgrade |
|
Logs on a GCP cluster look good: |
|
👍 from my side |
|
To unblock OKD, this "hack" is better, and @lucab indicated in Slack that this is acceptable. Afterburn will write an ephemeral hostname, and no files are modified, so removal is straightforward. We get out of checking magic strings and ensure that GCP has a good hostname. /unhold |
|
/test e2e-aws-serial |
Given the above context, I am tentatively in favour of this approach. We should do some upgrade tests etc. on GCP as well.
|
/retest |
|
@darkmuggle: The following tests failed, say
|
/test e2e-aws-serial |
Workaround for OKD until openshift/machine-config-operator#2217 merges
|
/test okd-e2e-aws |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ashcrow, darkmuggle, sinnykumari, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
cluster failed to be provisioned |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest |
|
/test e2e-aws |
|
/test okd-e2e-gcp-op |
|
/retest The failures in CI are all unrelated to the change... sigh. |
|
/retest |
|
@LorbusChris: new pull request created: #2286 |