Change SSH taint to annotation instead #359

yuqi-zhang · 2019-01-31T21:35:01Z

After some discussion we noted that tainting nodes on SSH access breaks upstream tests and disrupts many existing workflows without adding much security. We will therefore use an annotation instead of a taint to mark a node as accessed, and warn the admin (e.g. through the console) when it is applied.

This builds on top of #291 (rebased directly, with fixes). Alternatively, we could merge that one first and I can rebase this PR on top of it.

This also supersedes #290.

cc/ @cgwalters @ashcrow

ashcrow · 2019-01-31T21:46:04Z

I think moving to this PR makes the most sense. I'd recommend the other two to close in favor of this one.

ashcrow

Changes look good. Waiting for e2e and a second review before 👍

yuqi-zhang · 2019-01-31T22:33:53Z

/retest

runcom · 2019-02-01T12:30:28Z

pkg/daemon/writer.go

+                annos:           annos,
+                responseChannel: respChan,
+        }
+        return <-respChan


nit maybe: if this takes too long, we may want to pass down exitCh (or stopCh, still reading the code) for cancelation since this calls the API over the network

Sure, I don't think I've seen this hang before, but in either case this is only called in a separate goroutine for SSH monitoring, so it should be fine.

That said, it's good to investigate. All the other annotation writing uses the same mechanism, so it's probably better to do that in a separate PR.

Well, context and cancelation are generally good practice whenever goroutines are involved (especially for rest client and stuff like this). And ideally I'd love all this to have a context. But yeah, let's defer for another PR.

cgwalters · 2019-02-01T13:32:49Z

docs/MachineConfigDaemon.md

@@ -14,6 +14,8 @@

 MachineConfigDaemon is scheduled on the machines in a cluster as a DaemonSet. This daemon is responsible for performing machine updates in OpenShift 4. The update will include tasks related to the systemd units, files on disk, operating system upgrades etc. The MachineConfigDaemon updates a machine to configuration defined by MachineConfig as instructed by the MachineConfigController.

+The MachineConfigDaemon is also responsible for annotating a node with `machineconfiguration.openshift.io/rhcosSSH=accessed` when it detects an SSH access to the machine.


I know this is a bikeshed but "rhcosSSH" looks awkward. How about just machineconfiguration.openshift.io/ssh=accessed ?

Either works from my POV.

Sure, will change

cgwalters · 2019-02-01T13:34:01Z

docs/MachineConfigDaemon.md

+
+## Annotating on SSH access
+
+RHCOS nodes in Openshift are not meant to be manually accessed via SSH. MCD uses logind to watch for login sessions, which, upon detection, warns the user and annotates the node with `machineconfiguration.openshift.io/rhcosSSH=accessed`. This in turn will be used to warn cluster admins.


At some point I think we're going to need to link to a more canonical doc on this as we're...somewhat inconsistent in our "is SSH allowed/encouraged" phrasing. But this is OK for now.
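For readers unfamiliar with how a node annotation of this kind is expressed, the sketch below builds a JSON merge patch body that sets a single annotation. It uses the `machineconfiguration.openshift.io/ssh` key agreed on in the review above; the `annotationPatch` helper is hypothetical, and the PR's own nodewriter mechanism is not shown here and may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Key and value as discussed in the review thread (renamed from "rhcosSSH").
const (
	sshAccessAnnotationKey   = "machineconfiguration.openshift.io/ssh"
	sshAccessAnnotationValue = "accessed"
)

// annotationPatch (hypothetical helper) builds a JSON merge patch body
// that sets one annotation and leaves all other fields untouched.
func annotationPatch(key, value string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{key: value},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := annotationPatch(sshAccessAnnotationKey, sshAccessAnnotationValue)
	if err != nil {
		panic(err)
	}
	// prints {"metadata":{"annotations":{"machineconfiguration.openshift.io/ssh":"accessed"}}}
	fmt.Println(string(body))
}
```

A body like this could be sent to the API server as a merge patch against the Node object; a merge patch only touches the listed key, so existing annotations are preserved.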

yuqi-zhang · 2019-02-01T14:44:43Z

Note that this doesn't actually pick up e.g. oc rsh access, nor is it able to detect SSH access before MCD is running on the node.

cgwalters · 2019-02-01T14:49:06Z

> nor is it able to detect SSH access before MCD is running on the node.

`journalctl MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66` will give us any logins from before that.

cgwalters · 2019-02-01T14:49:44Z

(We could also just set up a persistent journald query instead of watching dbus, but eh)

Change behaviour from tainting a node after detecting an SSH access to instead annotating it. This annotation should be used to warn admins through e.g. the console that an SSH access occurred. Remove taint functionality from nodewriter now that it's unused. Also add docs for this behaviour. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>

yuqi-zhang · 2019-02-01T14:59:55Z

/retest

ashcrow · 2019-02-01T15:03:53Z

test "release-latest" failed: pod release-latest was already deleted

Retry in ~20 minutes

yuqi-zhang · 2019-02-01T15:14:14Z

> journalctl MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66

True, we could use that to check for SSH access since the last reboot when MCD starts up. However, if an admin was monitoring for the annotation and deleted it after handling the alert, this could wrongly flag an SSH access again if e.g. the MCD image was updated, causing MCD to restart.

In either case since we don't really have a finalized workflow for how this annotation is going to be used, I think it's probably better to modify as needed later.

ashcrow · 2019-02-01T15:32:38Z

/retest

cgwalters · 2019-02-01T16:37:07Z

/lgtm

openshift-ci-robot · 2019-02-01T16:37:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ashcrow,cgwalters]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2019-02-01T17:16:37Z

/retest

Please review the full test history for this PR and help us cut down flakes.

jlebon · 2019-02-01T17:20:34Z

pkg/daemon/daemon.go

+			glog.Infof("Detected a new login session: %v", msg)
+			glog.Infof("Login access is discouraged! Applying annotation: %v", MachineConfigDaemonSSHAccessAnnotationKey)
+			if err := dn.nodeWriter.SetSSHAccessed(dn.kubeClient.CoreV1().Nodes(), dn.name); err != nil {
+				exitCh <- fmt.Errorf("Error: cannot apply annotation for SSH access due to: %v", err)


Note this will degrade the node if we can't set the annotation. Do we really want that? (Not rhetorical, actually asking. :)). I'm leaning more towards not making this a hard error and just logging it for now.

Good point, we can switch to a warning, but as far as I can tell all other failed nodewriter invocations also degrade the node. If we cannot write an annotation, there is likely an issue with the node, and future updates etc. would fail.

I don't think there is a case where the inability to write an annotation would be specific to this call?

cgwalters and others added 4 commits January 31, 2019 16:26

daemon: Add watcher for logind sessions

edb5b5b

Just add the basic skeleton.

daemon: Rework taints to use logind directly

188d4e9

This way nothing lives in the critical path of ssh to host; which admins will turn to (ideally only) when something is badly wrong. It's also just less code and keeps the functionality in one place.

daemon: Don't taint node, just print a warning

faeeb05

We still have to port the tests.

daemon: fix logind issues

dcf4004

Remove ssh taint file writing, as the file was used by sshd before and is no longer needed in the logind workflow. Also write errors to exitCh instead of logging. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>

openshift-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) Jan 31, 2019

openshift-ci-robot requested review from crawford and jlebon January 31, 2019 21:35

ashcrow approved these changes Jan 31, 2019

openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jan 31, 2019

runcom reviewed Feb 1, 2019

cgwalters mentioned this pull request Feb 1, 2019

daemon: Rework taints to use logind directly #291

Closed

cgwalters reviewed Feb 1, 2019

dep: remove unused fsnotify

6e13c72

Remove the package since we use logind to detect ssh access, and thus no longer need to watch files. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>

yuqi-zhang force-pushed the ssh-annotates-node branch from 4176673 to e330723 Compare February 1, 2019 14:50

openshift-ci-robot assigned cgwalters Feb 1, 2019

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Feb 1, 2019

jlebon reviewed Feb 1, 2019

openshift-merge-robot merged commit 6c3e3e6 into openshift:master Feb 1, 2019

yuqi-zhang mentioned this pull request Feb 1, 2019

daemon: fix taint issues #290

Closed

runcom mentioned this pull request Feb 3, 2019

ssh: accessed annotation when creating a new MC #372

Closed

cgwalters mentioned this pull request Feb 5, 2019

Early ssh access detection #379

Closed

yuqi-zhang deleted the ssh-annotates-node branch August 26, 2019 15:08
