New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1841260: Make resolv.conf prepend operation atomic #1763
Bug 1841260: Make resolv.conf prepend operation atomic #1763
Conversation
@cybertron: This pull request references Bugzilla bug 1841260, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @celebdor @stbenjam @mandre @rgolangh @patrickdillon |
@@ -25,7 +25,8 @@ contents: | |||
logger -s "NM resolv-prepender: Prepending 'nameserver $NAMESERVER_IP' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf)" | |||
sed -e "/^search/d" \ | |||
-e "/Generated by/c# Generated by KNI resolv prepender NM dispatcher script\nsearch $DOMAIN\nnameserver $NAMESERVER_IP" \ | |||
/var/run/NetworkManager/resolv.conf > /etc/resolv.conf | |||
/var/run/NetworkManager/resolv.conf > /etc/resolv.tmp | |||
mv -f /etc/resolv.tmp /etc/resolv.conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a sed below at well that limits us to the top 3 nameservers. Should we do it all (including that sed) in a temp file and then copy the file result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we should. That fits well with cgwalters suggestion below as well.
@cgwalters reccomended using Maybe we could also run the script through shellcheck just for good measure? |
@@ -25,7 +25,8 @@ contents: | |||
logger -s "NM resolv-prepender: Prepending 'nameserver $NAMESERVER_IP' to /etc/resolv.conf (other nameservers from /var/run/NetworkManager/resolv.conf)" | |||
sed -e "/^search/d" \ | |||
-e "/Generated by/c# Generated by KNI resolv prepender NM dispatcher script\nsearch $DOMAIN\nnameserver $NAMESERVER_IP" \ | |||
/var/run/NetworkManager/resolv.conf > /etc/resolv.conf | |||
/var/run/NetworkManager/resolv.conf > /etc/resolv.tmp | |||
mv -f /etc/resolv.tmp /etc/resolv.conf | |||
else | |||
logger -s "NM resolv-prepender: Couldn't find a Virtual IP, just updating resolv.conf" | |||
cp /var/run/NetworkManager/resolv.conf /etc/resolv.conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cp
is also not atomic - this is a really big trap that a whole lot of people have fallen into. You could probably unify these by having the second path do
cp /var/run/NetworkManager/resolv.conf /etc/resolv.tmp
and then running
mv -f /etc/resolv.tmp /etc/resolv.conf
after the if
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Will do.
Hmm, just noticed there is a set +e in here. I'm unsure why we would have done that. I don't see anything that would be expected to fail after that. :-/ |
Redirecting the output of sed directly to resolv.conf is not an atomic operation. This is dangerous because resolv.conf will be truncated before sed completes, which means if something happens in the meantime resolv.conf will be empty. Additionally, the cp command is not atomic either, so we want to do that in a similar fashion. This change moves the sed and cp operations to work on a temporary file, then once everything is completed the temp file is moved to /etc/resolv.conf, which is an atomic operation.
Since we dropped the DNS VIP, there is no longer any meaningful difference between these two files.
If a command fails, we don't want to blindly continue with potentially garbage data. Consider if the podman call to get the correct IP fails. We might prepend nonsense to resolv.conf and it certainly won't do what we want. set -eo pipefail will cause the script to stop if anything fails. We won't then mess with resolv.conf if we don't have the appropriate data first. Since we're now operating on a temp file, I also removed the set +e from before the sed commands. I'm not entirely sure what it was there in the first place, but it shouldn't be necessary now and I don't think we want to continue if one of these commands fails.
6429135
to
5be02dc
Compare
/approve |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, cybertron The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-gcp-upgrade |
/retest Please review the full test history for this PR and help us cut down flakes. |
@cybertron: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Please review the full test history for this PR and help us cut down flakes. |
@cybertron: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1763. Bugzilla bug 1841260 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This aligned OpenStack and vSphere platforms with with the BM one after openshift#1763 and openshift#1765.
This aligned OpenStack and vSphere platforms with with the BM one after openshift#1763 and openshift#1765.
This PR includes two changes:
It makes resolv.conf prepending atomic. This prevents a potential problem
in case the sed is interrupted, which could leave an empty resolv.conf.
It moves the baremetal dispatcher script to common. With the removal of the
DNS VIP, there is no longer any difference between the master and worker
versions of this. This can't be done for the other platforms yet because they
still rely on the DNS VIP.
- Description for the changelog
Make resolv.conf prepend operation atomic to prevent potential race conditions.