Change SSH taint to annotation instead #359
Just add the basic skeleton.
This way nothing lives in the critical path of SSH to the host, which admins will turn to (ideally only) when something is badly wrong. It's also just less code and keeps the functionality in one place.
We still have to port the tests.
Remove ssh taint file writing, as the file was used by sshd before and is no longer needed in the logind workflow. Also write errors to exitCh instead of logging. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
I think moving to this PR makes the most sense. I'd recommend the other two to close in favor of this one.
Changes look good. Waiting for e2e and a second review before 👍
/retest
		annos:           annos,
		responseChannel: respChan,
	}
	return <-respChan
nit maybe: if this takes too long, we may want to pass down `exitCh` (or `stopCh`, still reading the code) for cancelation since this calls the API over the network
Sure, I don't think I've seen this hang before, but in either case this is only called in a separate goroutine for SSH monitoring, so it should be fine either way.
That said, it's good to investigate. All the other annotation writing uses the same mechanism, so it's probably better to do that in a separate PR.
Well, context and cancelation are generally good practice whenever goroutines are involved (especially for rest client and stuff like this). And ideally I'd love all this to have a context. But yeah, let's defer for another PR.
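For whoever picks up that follow-up PR, here is a minimal sketch of what a cancelable version of the request/response pattern above could look like. It is not the MCD code: `annotationRequest`, `setAnnotations`, `runWriter`, and `writeWithTimeout` are hypothetical names, and the writer goroutine only simulates a slow API call with a sleep.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// annotationRequest is a hypothetical stand-in for the request struct in the
// reviewed snippet: the annotations to set plus a channel for the result.
type annotationRequest struct {
	annos           map[string]string
	responseChannel chan error
}

// setAnnotations hands a request to the writer goroutine and waits for the
// result, but gives up as soon as ctx is cancelled or its deadline passes.
func setAnnotations(ctx context.Context, writer chan<- annotationRequest, annos map[string]string) error {
	// Buffered so the writer never blocks on a caller that already gave up.
	respChan := make(chan error, 1)
	req := annotationRequest{annos: annos, responseChannel: respChan}

	select {
	case writer <- req:
	case <-ctx.Done():
		return ctx.Err()
	}

	select {
	case err := <-respChan:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runWriter simulates the goroutine that talks to the API server; the delay
// models a slow network round trip.
func runWriter(requests <-chan annotationRequest, delay time.Duration) {
	for req := range requests {
		time.Sleep(delay)
		req.responseChannel <- nil
	}
}

// writeWithTimeout wires the pieces together; both arguments are milliseconds.
func writeWithTimeout(delayMs, timeoutMs int) error {
	requests := make(chan annotationRequest)
	go runWriter(requests, time.Duration(delayMs)*time.Millisecond)
	defer close(requests)

	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	return setAnnotations(ctx, requests, map[string]string{"example.invalid/ssh": "accessed"})
}

func main() {
	fmt.Println("fast writer:", writeWithTimeout(0, 1000))
	err := writeWithTimeout(500, 10)
	fmt.Println("slow writer timed out:", errors.Is(err, context.DeadlineExceeded))
}
```

The buffered response channel is the important detail: without it, a caller that gave up would leave the writer goroutine blocked forever on the send.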
docs/MachineConfigDaemon.md
Outdated
@@ -14,6 +14,8 @@

MachineConfigDaemon is scheduled on the machines in a cluster as a DaemonSet. This daemon is responsible for performing machine updates in OpenShift 4. The update will include tasks related to the systemd units, files on disk, operating system upgrades etc. The MachineConfigDaemon updates a machine to configuration defined by MachineConfig as instructed by the MachineConfigController.

The MachineConfigDaemon is also responsible for annotating a node with `machineconfiguration.openshift.io/rhcosSSH=accessed` when it detects an SSH access to the machine.
I know this is a bikeshed but "rhcosSSH" looks awkward. How about just `machineconfiguration.openshift.io/ssh=accessed`?
Either works from my POV.
Sure, will change
docs/MachineConfigDaemon.md
Outdated
## Annotating on SSH access

RHCOS nodes in Openshift are not meant to be manually accessed via SSH. MCD uses logind to watch for login sessions, which, upon detection, warns the user and annotates the node with `machineconfiguration.openshift.io/rhcosSSH=accessed`. This in turn will be used to warn cluster admins.
At some point I think we're going to need to link to a more canonical doc on this as we're...somewhat inconsistent in our "is SSH allowed/encouraged" phrasing. But this is OK for now.
Agreed
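To make the annotation flow from the doc change concrete, here is a rough sketch of setting the marker and of the check a consumer (for example the console) could run before warning an admin. The key and value use the shorter naming suggested in this review, which is an assumption about the final strings. `MachineConfigDaemonSSHAccessAnnotationValue`, `markSSHAccessed`, and `wasSSHAccessed` are hypothetical names, and a plain map stands in for the node's annotations.

```go
package main

import "fmt"

// Key and value follow the naming suggested in this review; the merged code
// may use different strings, so treat both as assumptions.
const (
	MachineConfigDaemonSSHAccessAnnotationKey   = "machineconfiguration.openshift.io/ssh"
	MachineConfigDaemonSSHAccessAnnotationValue = "accessed"
)

// markSSHAccessed returns a copy of the node annotations with the SSH marker
// set. Copying avoids mutating a map the caller may still be reading.
func markSSHAccessed(annotations map[string]string) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[MachineConfigDaemonSSHAccessAnnotationKey] = MachineConfigDaemonSSHAccessAnnotationValue
	return out
}

// wasSSHAccessed is the check a consumer could use to decide whether to warn.
func wasSSHAccessed(annotations map[string]string) bool {
	return annotations[MachineConfigDaemonSSHAccessAnnotationKey] == MachineConfigDaemonSSHAccessAnnotationValue
}

func main() {
	node := map[string]string{"example.invalid/other": "value"}
	updated := markSSHAccessed(node)
	fmt.Println("before:", wasSSHAccessed(node))
	fmt.Println("after:", wasSSHAccessed(updated))
}
```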
Note that this doesn't actually pick up e.g. |
Remove the package since we use logind to detect ssh access, and thus no longer need to watch files. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
(We could also just set up a persistent journald query instead of watching dbus, but eh)
Change behaviour from tainting a node after detecting an SSH access to instead annotating it. This annotation should be used to warn admins through e.g. the console that an SSH access occurred. Remove taint functionality from nodewriter now that it's unused. Also add docs for this behaviour. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
4176673 to e330723
/retest
Retry in ~20 minutes
True, we could use that check for SSH access since last reboot when MCD starts up. However, if an admin monitoring for the annotation deletes it after handling alerts, an MCD restart (e.g. after an MCD image update) could wrongly flag an SSH access. In either case, since we don't really have a finalized workflow for how this annotation is going to be used, I think it's probably better to modify as needed later.
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ashcrow, cgwalters, yuqi-zhang
/retest
Please review the full test history for this PR and help us cut down flakes.
glog.Infof("Detected a new login session: %v", msg)
glog.Infof("Login access is discouraged! Applying annotation: %v", MachineConfigDaemonSSHAccessAnnotationKey)
if err := dn.nodeWriter.SetSSHAccessed(dn.kubeClient.CoreV1().Nodes(), dn.name); err != nil {
	exitCh <- fmt.Errorf("Error: cannot apply annotation for SSH access due to: %v", err)
Note this will degrade the node if we can't set the annotation. Do we really want that? (Not rhetorical, actually asking. :)). I'm leaning more towards not making this a hard error and just logging it for now.
Good point, we can switch to a warning, but as far as I can tell all other failed nodewriter invocations also degrade the node. If we cannot write an annotation there is likely an issue with the node, and future updates, etc. would fail.
I don't think there is a specific case where an inability to write an annotation would come only from this call?
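For comparison, the two options in this thread could be sketched side by side as below. This is only an illustration under assumptions: `handleLogin` and `errAPIUnavailable` are made-up names, the `annotate` callback stands in for the node writer call, and the standard `log` package is used in place of glog to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errAPIUnavailable is a made-up failure used only for this demo.
var errAPIUnavailable = errors.New("API server unavailable")

// handleLogin applies the annotation via the supplied function (a stand-in
// for the node writer call). With strict set, a failure is sent to exitCh,
// which is the node-degrading behaviour in this PR; otherwise it is only
// logged as a warning. It reports whether the annotation was applied.
func handleLogin(annotate func() error, strict bool, exitCh chan<- error) bool {
	err := annotate()
	if err == nil {
		return true
	}
	wrapped := fmt.Errorf("cannot apply annotation for SSH access: %w", err)
	if strict {
		exitCh <- wrapped
	} else {
		log.Printf("warning: %v", wrapped)
	}
	return false
}

func main() {
	exitCh := make(chan error, 1) // buffered so the demo does not block
	failing := func() error { return errAPIUnavailable }

	handleLogin(failing, false, exitCh)
	fmt.Println("errors queued after lenient run:", len(exitCh))

	handleLogin(failing, true, exitCh)
	fmt.Println("errors queued after strict run:", len(exitCh))
}
```

Keeping the choice behind a single flag would make it cheap to switch once the workflow for consuming the annotation is settled.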
After some discussion we noted that tainting nodes on SSH access breaks upstream tests and disrupts many existing workflows without adding much security. Thus we will instead use an annotation to mark a node as tainted, and warn the admin through e.g. the console when it is applied.
This builds on top of #291 (rebased directly with fixes). Alternatively, we could merge that first and I can rebase.
This also supersedes #290.
cc/ @cgwalters @ashcrow