OCPBUGS-11997: Prevent NM from unsetting the hostname #3794
@mkowalski: An error was encountered searching for bug OCPBUGS-11997 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details. Full error message.
You do not have the permission to see the specified issue.: request failed. Please analyze the request body for more details. Status code: 401:
Please contact an administrator to resolve this issue, then request a bug refresh with In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Skipping CI for Draft Pull Request.
/jira refresh
@mkowalski: This pull request references Jira Issue OCPBUGS-11997, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/jira refresh
@mkowalski: This pull request references Jira Issue OCPBUGS-11997, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/test all
Manual test for IPv4-only cluster succeeded.
Manual test for IPv6-only cluster succeeded.
It has been discovered that in environments that use revDNS instead of DHCP Option 12 to set the hostname, a node can intermittently lose its hostname and try to register as `localhost.localdomain`. This happens when we obtain the hostname from the DNS server (the PTR record for the node's IP address), later lose DNS for any reason, and reload the NM configuration while DNS is unavailable. By design, if revDNS is no longer available, NetworkManager does not persist the previous name; it unsets it completely, leaving the node visible as `localhost.localdomain`.

To prevent this behaviour, we are modifying `node-valid-hostname.service` so that instead of only waiting for a non-localhost name to appear once (without detecting whether it can intermittently disappear later), it also sets that name using `hostnamectl set-hostname`, so that NetworkManager will not try to unset it even if no valid source for it is available anymore.

The change is valid as a platform-agnostic change because we never allow the hostname to change during the lifetime of the node. Thus, a scenario where we would like NetworkManager to update the hostname based on a changed revDNS record is not a valid one. The change is valid for both DHCP Option 12 and revDNS-based hostnames because changing the hostname by updating the Option 12 field is not allowed either.

Fixes: OCPBUGS-11997
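For readers who want the gist of the wait-then-pin behaviour described above, here is a minimal Python sketch. It is not the actual `node-valid-hostname.service` script. The function names, the polling interval, the attempt limit, and the exact set of placeholder names are assumptions made for illustration; only the `hostnamectl set-hostname` call comes from the description.

```python
import subprocess
import time
from typing import Callable

# Assumption: these placeholder names count as "no valid hostname yet".
PLACEHOLDER_NAMES = {"", "localhost", "localhost.localdomain"}


def is_valid_hostname(name: str) -> bool:
    """Return True when the name is not empty and not a localhost placeholder."""
    return name.strip().lower() not in PLACEHOLDER_NAMES


def pin_with_hostnamectl(name: str) -> None:
    """Set the hostname explicitly so a later NM reload does not unset it."""
    subprocess.run(["hostnamectl", "set-hostname", name], check=True)


def wait_and_pin_hostname(
    get_hostname: Callable[[], str],
    set_hostname: Callable[[str], None],
    poll_seconds: float = 1.0,   # hypothetical polling interval
    max_attempts: int = 300,     # hypothetical upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a non-localhost hostname appears, then pin it once."""
    for _ in range(max_attempts):
        name = get_hostname().strip()
        if is_valid_hostname(name):
            # Old behaviour stopped here; the new behaviour also pins the name.
            set_hostname(name)
            return name
        sleep(poll_seconds)
    raise TimeoutError("no valid hostname appeared")
```

On a node this could be called as `wait_and_pin_hostname(socket.gethostname, pin_with_hostnamectl)` after `import socket`; passing the getter and setter as parameters keeps the loop testable without touching the real hostname.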
/retitle OCPBUGS-11997, OCPBUGS-14692, OCPBUGS-14918: Prevent NM from unsetting the hostname
@mkowalski: This pull request references Jira Issue OCPBUGS-14918, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/jira refresh
@mkowalski: This pull request references Jira Issue OCPBUGS-14918, which is valid. 3 validation(s) were run on this bug
In response to this:
/retitle OCPBUGS-11997: Prevent NM from unsetting the hostname
@mkowalski: This pull request references Jira Issue OCPBUGS-11997, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/test e2e-metal-ipi-ovn-dualstack
This makes sense; hopefully CI will clear up.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cybertron, jkyros, mkowalski The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@mkowalski: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Merged commit d0c0670 into openshift:master
@mkowalski: Jira Issue OCPBUGS-11997: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged: These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-11997 has not been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.13
@mkowalski: new pull request created: #3805 In response to this:
I know I'm late but I still think we should strongly encourage customers that were implicitly relying on rDNS to stop doing so. Instead there are two possibilities that just make everything more reliable:
Hmm... in this wait loop, what is it that will end up causing the rDNS lookup to succeed? IOW, when will NetworkManager query rDNS in this case? Is it only when the DHCP lease expires?
In environments with IPv6 (both single and dual stack) we explicitly set the hostname to the FQDN. However, the recent change to make the current hostname static in the node-valid-hostname service [0] has created a conflict with the resolv-prepender script, where we force the FQDN. If node-valid-hostname runs before the "up" event for the interface that provides the hostname, it can statically set a short name, which prevents the prepender script from later changing it to an FQDN. This can happen because the up event is asynchronous: the network is marked up before the event is dispatched, so we currently have a race between the event and the node-valid-hostname service. To ensure proper ordering, this change adds a new dependency to node-valid-hostname: a service that simply waits for the br-ex up event. Once we've received that, we should have run the prepender script and set the FQDN if necessary. [0]: openshift#3794
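The race described in the referencing commit can be illustrated with a small toy model in Python. This is only a sketch of the ordering problem, not the real services: `ToyNode`, its method names, the timeout value, and the example hostnames are all hypothetical, and the model simply assumes (as the commit message states) that a statically set hostname is not changed by the prepender afterwards.

```python
import threading


class ToyNode:
    """Toy model of a node; all names here are hypothetical."""

    def __init__(self, short_name: str) -> None:
        self.hostname = short_name
        self.is_static = False
        self.br_ex_up = threading.Event()

    def on_br_ex_up(self, fqdn: str) -> None:
        """Stand-in for the prepender running on the br-ex "up" event."""
        if not self.is_static:
            # The FQDN can only be forced while the hostname is not static.
            self.hostname = fqdn
        self.br_ex_up.set()

    def node_valid_hostname(self, wait_for_up: bool, timeout: float = 5.0) -> str:
        """Stand-in for node-valid-hostname pinning the current name."""
        if wait_for_up and not self.br_ex_up.wait(timeout):
            raise TimeoutError("br-ex up event was not received")
        self.is_static = True
        return self.hostname


# Without the ordering dependency the short name can be pinned first.
racy = ToyNode("worker-0")
racy.node_valid_hostname(wait_for_up=False)
racy.on_br_ex_up("worker-0.example.com")

# With the dependency, pinning waits until the up event has been handled.
ordered = ToyNode("worker-0")
timer = threading.Timer(0.05, ordered.on_br_ex_up, args=("worker-0.example.com",))
timer.start()
ordered.node_valid_hostname(wait_for_up=True)
timer.join()
```

In the first run the node stays pinned to the short name; in the second the FQDN is in place before it is made static, which is the ordering the new dependency is meant to guarantee.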