Bug 2020003: Node upgrade stuck due to not writing through dangling symlink '/etc/machine-config-daemon/orig/etc/issue.mcdorig' #2681
@jkyros: This pull request references Bugzilla bug 1970959, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@jkyros: This pull request references Bugzilla bug 1970959, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
A question
@@ -1605,11 +1610,11 @@ func (dn *Daemon) writeFiles(files []ign3types.File) error {
 }

 func origParentDir() string {
-	return filepath.Join("/etc", "machine-config-daemon", "orig")
+	return origParentDirPath
I think you did this for testing (?), but is there a reason you can't just mock the return value of this func, instead of having both the var and the func exist at the same time?
Yes, I am testing to make sure createOrigFile() is working properly, and I need a way to get it to write its files to a temporary directory during the test, but the file locations are essentially hard-coded because of those function calls.
Maybe I shouldn't worry about the test (or maybe it should be an e2e test because it does file i/o?). I don't know where you want to draw the line.
Anyway, my options as I saw them were:
- make origParentDir/noOrigParentDir function pointers and then mock/override their functions for the test:
var origParentDir = func() string { return filepath.Join("/etc", "machine-config-daemon", "orig") }
- make package variables for the strings and have createOrigFile use those strings directly instead of the function calls (I figured that would be "too much change" for no benefit)
- have origParentDir/noOrigParentDir/createOrigFile take a path/prefix argument so I could redirect where they put their files (again, I felt like that was too much change for no benefit)
- make package variables for the path strings and just have the existing functions return them
I went with option 4, but I can do option 1 if that's preferable?
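For readers comparing the options, here is a minimal Go sketch of option 4 next to option 1. Only `origParentDir`, `origParentDirPath` and the joined path come from this PR; `origParentDirFn` and the `withOrigParentDir` helper are hypothetical names used purely for illustration, and the PR's actual test may redirect the path differently.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Option 4: the path lives in a package variable and the existing
// accessor simply returns it, so callers do not change.
var origParentDirPath = filepath.Join("/etc", "machine-config-daemon", "orig")

func origParentDir() string {
	return origParentDirPath
}

// Option 1, for comparison: the accessor itself becomes a variable
// holding a function, which a test could replace wholesale.
// (origParentDirFn is a hypothetical name, not from the PR.)
var origParentDirFn = func() string {
	return filepath.Join("/etc", "machine-config-daemon", "orig")
}

// withOrigParentDir is a hypothetical test helper: it points the
// option 4 variable at dir, runs fn, then restores the old value.
func withOrigParentDir(dir string, fn func()) {
	saved := origParentDirPath
	origParentDirPath = dir
	defer func() { origParentDirPath = saved }()
	fn()
}

func main() {
	withOrigParentDir(filepath.Join("tmp", "mcd-test", "orig"), func() {
		fmt.Println("inside test:", origParentDir())
	})
	fmt.Println("after test: ", origParentDir())
	fmt.Println("option 1 fn:", origParentDirFn())
}
```

Either way the production call sites stay the same; the difference is only whether a test swaps a string or a function.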
Sorry for the late review, I think this generally LGTM, thanks for the tests!
I think the current variable paths method is fine, but will defer to others for a review as well.
One question below:
err = createOrigFile(relativeSymlink, relativeSymlink)
assert.Nil(t, err)

// Finally, make sure we can restore the relative symlink if we rollback
Just to make sure: even if it is a dangling symlink, we still restore it correctly now, right?
I see we do a cp -a --reflink=auto, so it seems that it should restore the (still dangling) symlink.
Yes, we still restore it correctly. I was worried about that too! :)
As in the case of /etc/issue, the symlink wasn't "wrong"; it was only dangling because of where we put the backup copy. Once restored to its original location, it is no longer dangling: it is just a relative (and correct) symlink.
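To make the "dangling only because of where we put it" point concrete, here is a small Go sketch under assumed names. It builds a throwaway tree in a temp directory, copies a relative symlink into a backup location by preserving its link text (similar in spirit to what `cp -a` does for symlinks), and then copies it back. The `copyLink`, `resolves` and `demo` helpers and the directory layout are illustrative assumptions, not the MCO's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// copyLink recreates the symlink at src as dst, keeping the link text verbatim.
func copyLink(src, dst string) error {
	target, err := os.Readlink(src)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
		return err
	}
	return os.Symlink(target, dst)
}

// resolves reports whether following path reaches an existing file.
func resolves(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// demo returns whether the backup copy and the restored copy resolve.
func demo() (backupResolves, restoredResolves bool, err error) {
	root, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(root)

	// Layout: root/usr/lib/issue is a real file,
	// root/etc/issue -> ../usr/lib/issue is a relative symlink.
	if err = os.MkdirAll(filepath.Join(root, "usr", "lib"), 0o755); err != nil {
		return
	}
	if err = os.WriteFile(filepath.Join(root, "usr", "lib", "issue"), []byte("hello\n"), 0o644); err != nil {
		return
	}
	if err = os.MkdirAll(filepath.Join(root, "etc"), 0o755); err != nil {
		return
	}
	link := filepath.Join(root, "etc", "issue")
	if err = os.Symlink(filepath.Join("..", "usr", "lib", "issue"), link); err != nil {
		return
	}

	// In the backup directory the same link text points at a path that does not exist.
	backup := filepath.Join(root, "backup", "orig", "etc", "issue.mcdorig")
	if err = copyLink(link, backup); err != nil {
		return
	}
	backupResolves = resolves(backup)

	// Copied back next to the original, the link text is valid again.
	restored := filepath.Join(root, "etc", "issue.restored")
	if err = copyLink(backup, restored); err != nil {
		return
	}
	restoredResolves = resolves(restored)
	return
}

func main() {
	b, r, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("backup resolves:", b, "restored resolves:", r)
}
```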
@@ -58,6 +58,11 @@ const (
 	postConfigChangeActionReloadCrio = "reload crio"
 )

+var (
These variables are not supposed to be updated, so they could be made const.
- I would have made them const if they weren't paths being assembled with filepath.Join (which I assume was done as good practice for cross-platform separators, but I don't know how likely it is that this will need to be cross platform)
- If I make them constants, I can't assign to them in the test to redirect the files for testing
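As a short illustration of the first point: Go constants must be compile-time constant expressions, so the result of a function call such as `filepath.Join` cannot be assigned to a `const`. The names `origParentDirConst`, `origParentDirVar` and `samePath` below are hypothetical and exist only for this sketch.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// A const can only be built from literals and other constants.
const origParentDirConst = "/etc/machine-config-daemon/orig"

// The following would not compile, because a function call is not a
// constant expression:
//   const bad = filepath.Join("/etc", "machine-config-daemon", "orig")

// A package variable can be initialised from a function call, and it
// can also be reassigned later, for example by a test.
var origParentDirVar = filepath.Join("/etc", "machine-config-daemon", "orig")

// samePath reports whether the joined path equals the literal once
// separators are normalised to forward slashes.
func samePath() bool {
	return filepath.ToSlash(origParentDirVar) == origParentDirConst
}

func main() {
	fmt.Println(origParentDirVar, samePath())
}
```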
I am not sure if this is the right way to fix the bug. I looked at the bug, and it seems the user is trying to update the content of /etc/issue by applying a MachineConfig. On a 4.8 cluster RHCOS node, I see:
/etc/issue is a symlink to the /usr/lib/issue file. On an RHCOS system, /usr is a read-only filesystem, so there is no way the MCO could edit the content of that file. Ideally, I think we should not even try to create a backup of the original file or write updated content for read-only content. MCO should either return an error that @cgwalters thoughts?
I see what you're saying. writeFileAtomically's behavior is to replace the symlink with a file, not to write the file contents to the symlink's target
because under the hood it's a "rename" operation not a direct write.
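A minimal Go sketch of why a rename-based write replaces the symlink itself: the new content goes into a temporary file, which is then renamed over the destination path, and a rename onto a path that is a symlink replaces the link rather than following it. `writeViaRename` and `demo` are simplified, hypothetical stand-ins and not the MCO's `writeFileAtomically`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeViaRename writes data to a temp file beside path, then renames
// the temp file over path.
func writeViaRename(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless if the rename already moved it
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes through a symlink path and reports whether the path is
// still a symlink afterwards, plus the content of the old target.
func demo() (linkSurvived bool, targetContent string, err error) {
	dir, err := os.MkdirTemp("", "rename-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "target")
	link := filepath.Join(dir, "link")
	if err = os.WriteFile(target, []byte("original"), 0o644); err != nil {
		return
	}
	if err = os.Symlink("target", link); err != nil {
		return
	}
	if err = writeViaRename(link, []byte("updated")); err != nil {
		return
	}

	info, err := os.Lstat(link)
	if err != nil {
		return
	}
	linkSurvived = info.Mode()&os.ModeSymlink != 0

	data, err := os.ReadFile(target)
	if err != nil {
		return
	}
	return linkSurvived, string(data), nil
}

func main() {
	survived, content, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("still a symlink: %v, old target content: %q\n", survived, content)
}
```

In this sketch the path that used to be a link ends up as a regular file, and the old target keeps its original content.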
I was assuming this was "desired behavior" 😄 Also...the compliance operator seems to encourage modifying /etc/issue on RHCOS as part of its audits -- at least one of the users that hit this was using the compliance operator -- so while the current behavior is maybe non-intuitive/incorrect, if we do change this to act on the target file vs the symlink itself, there will be some consequence. |
while this is being sorted out |
I think there's 2 issues here:
Which are somewhat tangential. We don't really do a great job of guarding against 2, so it's on a per-error basis. I agree with Sinny in that in the particular case of the BZ the user should have probably just written to As for the compliance operator, if we overwrite Would that be an ok path? (merge this for 1, move BZ for 2) |
@jkyros I agree that the MCO is not really doing a good job of updating symlinked files. @yuqi-zhang sure, we can go ahead with this PR as a general improvement and move this bug to the compliance team to see if they can update their instructions to write to /etc/issue.d/ instead of the /etc/issue file. Writing directly into /etc/issue by turning the symlink into a file ideally shouldn't be suggested, since that file comes as part of the OS update and replacing it may cause issues (the RHCOS team would have a better idea). In addition to the above two points, I think we should also create a story for the MCO to discuss the right way of handling symlinked files in both cases, editable and read-only. Based on the above discussion, this PR would need updated messaging, and perhaps a new BZ if we want to backport to older releases.
/bugzilla refresh The requirements for Bugzilla bugs have changed, recalculating validity. |
@openshift-merge-robot: This pull request references Bugzilla bug 1970959, which is invalid:
Comment In response to this:
/bugzilla refresh |
@jkyros: This pull request references Bugzilla bug 1970959, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request. In response to this:
Add symlink check before existence check for original file backup, so the mcd regards dangling symlinks as present and doesn't try to overwrite them on later updates. This will resolve a bug where the mcd would back up a relative symlink and then try (and fail) to overwrite it next time config got applied, causing the mcd to degrade. This does NOT address the problem of a user undesirably overwriting symlinks with files, it only prevents the MCO from degrading in the event that such a write were to happen. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1970959
Thanks John for making all the changes. LGTM
/retest
/retest-required Please review the full test history for this PR and help us cut down flakes.
@jkyros: The following tests failed, say
@jkyros: All pull requests linked via external trackers have merged: Bugzilla bug 2020003 has been moved to the MODIFIED state. In response to this:
This PR fixes a condition where:
This PR adds a symlink check before the file existence check, to make sure we don't try to back up over the dangling symlink.
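A rough Go sketch of the idea, under assumed names: `os.Stat` follows symlinks, so a dangling link looks like a missing file, while `os.Lstat` inspects the directory entry itself and reports the link as present. `existsFollowingLinks`, `existsIncludingDanglingLinks` and `demo` are hypothetical helpers for illustration; the actual check in the MCO may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// existsFollowingLinks is the naive check: os.Stat follows symlinks,
// so a dangling link is reported as absent.
func existsFollowingLinks(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// existsIncludingDanglingLinks uses os.Lstat, which looks at the entry
// itself, so a symlink counts as present even if its target is missing.
func existsIncludingDanglingLinks(path string) (bool, error) {
	_, err := os.Lstat(path)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	return false, err
}

// demo creates a dangling symlink in a temp dir and runs both checks on it.
func demo() (naive, linkAware bool, err error) {
	dir, err := os.MkdirTemp("", "lstat-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)

	link := filepath.Join(dir, "issue.mcdorig")
	if err = os.Symlink(filepath.Join("..", "does", "not", "exist"), link); err != nil {
		return
	}
	naive = existsFollowingLinks(link)
	linkAware, err = existsIncludingDanglingLinks(link)
	return
}

func main() {
	naive, linkAware, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Stat-based check:", naive, "Lstat-based check:", linkAware)
}
```

With the Lstat-based check running first, a previously backed-up dangling symlink is treated as an existing backup, so no second write over it is attempted.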
This also adds a unit test for original file backups that tests the symlink case as well as the "normal" case. I broke the backup file paths out into package variables so I could override them for the test.
This PR does NOT address the problem of a user undesirably using machine config to overwrite symlinks with files, it only prevents the MCO from degrading in the event that such a configuration were to occur.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2020003