Change SSH taint to annotation instead #359

Merged


yuqi-zhang
Contributor

After some discussion, we noted that tainting nodes on SSH access breaks upstream tests and disrupts many existing workflows without adding much security. We will therefore move to an annotation to mark the node instead, and warn the admin (e.g. through the console) when it is applied.

This builds on top of #291 (rebased directly, with fixes). Alternatively, we could merge #291 first and I can rebase this PR on top of it.

This also supersedes #290.

cc @cgwalters @ashcrow

cgwalters and others added 4 commits January 31, 2019 16:26
Just add the basic skeleton.
This way nothing lives in the critical path of SSH to the host, which
admins will turn to (ideally only) when something is badly wrong.
It's also just less code and keeps the functionality in one place.
We still have to port the tests.
Remove ssh taint file writing, as the file was used by sshd
before and is no longer needed in the logind workflow. Also
write errors to exitCh instead of logging.

Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 31, 2019
@ashcrow
Member

ashcrow commented Jan 31, 2019

I think moving to this PR makes the most sense. I'd recommend closing the other two in favor of this one.

Member

@ashcrow ashcrow left a comment

Changes look good. Waiting for e2e and a second review before 👍

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 31, 2019
@yuqi-zhang
Contributor Author

/retest

```go
// Excerpt from the change under review (preceding lines not shown).
		annos:           annos,
		responseChannel: respChan,
	}
	return <-respChan
```
Member

@runcom runcom Feb 1, 2019

nit maybe: if this takes too long, we may want to pass down exitCh (or stopCh, still reading the code) for cancelation since this calls the API over the network

Contributor Author

Sure. I don't think I've seen this hang before, and in any case this is only called from a separate goroutine for SSH monitoring, so it should be fine either way.

That said, it's worth investigating. All the other annotation writes use the same mechanism, so it's probably better to do that in a separate PR.

Member

Well, context and cancelation are generally good practice whenever goroutines are involved (especially for the REST client and the like). Ideally I'd love all of this to take a context. But yeah, let's defer it to another PR.

@@ -14,6 +14,8 @@

MachineConfigDaemon is scheduled on the machines in a cluster as a DaemonSet. This daemon is responsible for performing machine updates in OpenShift 4. Updates include tasks related to systemd units, files on disk, operating system upgrades, etc. The MachineConfigDaemon updates a machine to the configuration defined by MachineConfig, as instructed by the MachineConfigController.

The MachineConfigDaemon is also responsible for annotating a node with `machineconfiguration.openshift.io/rhcosSSH=accessed` when it detects SSH access to the machine.
Member

I know this is a bikeshed, but `rhcosSSH` looks awkward. How about just `machineconfiguration.openshift.io/ssh=accessed`?

Member

Either works from my POV.

Contributor Author

Sure, will change
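The annotation discussed in this thread is a single key/value pair under the Node's `metadata.annotations`. One common way to set one annotation without touching the rest of the object is a JSON merge patch; the sketch below only builds that patch body with the standard library. The PR does not show whether `SetSSHAccessed` patches or does a get-and-update, so `sshAccessedPatch` is a hypothetical helper, and the key is a parameter since the thread agrees to rename `rhcosSSH`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sshAccessedPatch builds a JSON merge patch that sets one annotation on a
// Node to "accessed"; a merge patch leaves all other annotations untouched.
// Hypothetical helper for illustration, not the MCD's actual code.
func sshAccessedPatch(key string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				key: "accessed",
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	b, err := sshAccessedPatch("machineconfiguration.openshift.io/ssh")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// {"metadata":{"annotations":{"machineconfiguration.openshift.io/ssh":"accessed"}}}
}
```

The resulting bytes would be sent as a merge-patch request against the Node object; the exact client call depends on the client-go version in use.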


## Annotating on SSH access

RHCOS nodes in OpenShift are not meant to be accessed manually via SSH. The MCD uses logind to watch for login sessions; when it detects one, it warns the user and annotates the node with `machineconfiguration.openshift.io/rhcosSSH=accessed`. This annotation is in turn meant to be used to warn cluster admins.
Member

At some point I think we're going to need to link to a more canonical doc on this as we're...somewhat inconsistent in our "is SSH allowed/encouraged" phrasing. But this is OK for now.

Contributor Author

Agreed

@yuqi-zhang
Contributor Author

Note that this doesn't pick up e.g. `oc rsh` access, nor can it detect SSH access that happened before the MCD started running on the node.

Remove the package since we use logind to detect ssh access,
and thus no longer need to watch files.

Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
@cgwalters
Member

> nor is it able to detect SSH access before MCD is running on the node.

`journalctl MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66` will give us any earlier logins.

@cgwalters
Member

(We could also just set up a persistent journald query instead of watching dbus, but eh)

Change behaviour from tainting a node after detecting an SSH access
to instead annotating it. This annotation should be used to warn
admins through e.g. the console that an SSH access occurred. Remove
taint functionality from nodewriter now that it's unused. Also add
docs for this behaviour.

Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
@yuqi-zhang
Contributor Author

/retest

@ashcrow
Member

ashcrow commented Feb 1, 2019

test "release-latest" failed: pod release-latest was already deleted

Retry in ~20 minutes.

@yuqi-zhang
Contributor Author

yuqi-zhang commented Feb 1, 2019

> `journalctl MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66`

True, we could use that to check for SSH access since the last reboot when the MCD starts up. However, if an admin monitors for the annotation and deletes it after handling the alert, a later MCD restart (e.g. because the MCD image was updated) would wrongly flag the same, already-handled SSH access again.

In any case, since we don't have a finalized workflow for how this annotation will be used, I think it's better to adjust this as needed later.

@ashcrow
Member

ashcrow commented Feb 1, 2019

/retest

@cgwalters
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 1, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

```go
// Excerpt under review: runs when logind reports a new login session.
glog.Infof("Detected a new login session: %v", msg)
glog.Infof("Login access is discouraged! Applying annotation: %v", MachineConfigDaemonSSHAccessAnnotationKey)
if err := dn.nodeWriter.SetSSHAccessed(dn.kubeClient.CoreV1().Nodes(), dn.name); err != nil {
	// Sending on exitCh degrades the node.
	exitCh <- fmt.Errorf("Error: cannot apply annotation for SSH access due to: %v", err)
}
```
Member

Note that this will degrade the node if we can't set the annotation. Do we really want that? (Not rhetorical, actually asking :)) I'm leaning towards not making this a hard error and just logging it for now.

Contributor Author

Good point, we can switch to a warning. As far as I can tell, though, all other failed nodewriter invocations also degrade the node. If we cannot write an annotation, there is likely an issue with the node, and future updates etc. would fail too.

I don't think there is a case where an inability to write an annotation would be specific to this call?

@openshift-merge-robot openshift-merge-robot merged commit 6c3e3e6 into openshift:master Feb 1, 2019
@yuqi-zhang yuqi-zhang deleted the ssh-annotates-node branch August 26, 2019 15:08

8 participants