New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Rework taints to use logind directly #291

Open
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@cgwalters
Copy link
Contributor

cgwalters commented Jan 11, 2019

This way nothing lives in the critical path of ssh to host; which
admins will turn to (ideally only) when something is badly wrong.
It's also just less code and keeps the functionality in one place.

cgwalters added some commits Jan 11, 2019

daemon: Add watcher for logind sessions
Just add the basic skeleton.
daemon: Rework taints to use logind directly
This way nothing lives in the critical path of ssh to host; which
admins will turn to (ideally only) when something is badly wrong.
It's also just less code and keeps the functionality in one place.
@cgwalters

This comment has been minimized.

Copy link
Contributor

cgwalters commented Jan 11, 2019

I tested login sessions but not tainting yet.

/hold

@yuqi-zhang

This comment has been minimized.

Copy link
Contributor

yuqi-zhang commented Jan 11, 2019

I agree that this is likely a better path. However we still cannot apply taints at the moment as it would break e2e. I suggest for the time being, have this to write to motd to warn the user? (There is the problem of race condition between motd updates and the user seeing it upon login, though)

@cgwalters

This comment has been minimized.

Copy link
Contributor

cgwalters commented Jan 11, 2019

(One minor point of trivia: this approach will also work if someone logs in on a "local" console; which you'd have to set a password via a MachineConfig to do. Think bare metal physical console, and most of the cloud providers offer a connection to a virtual console)

@cgwalters

This comment has been minimized.

Copy link
Contributor

cgwalters commented Jan 11, 2019

However we still cannot apply taints at the moment as it would break e2e.

Ah. So no matter what this code has to change. EDIT: Done ⬇️

(There is the problem of race condition between motd updates and the user seeing it upon login, though)

Yeah. My vote is we change the default /etc/motd to say something like:

WARNING: Direct ssh access to machines is not recommended.
In a future version of Red Hat CoreOS, this will cause nodes to become
tainted.
daemon: Don't taint node, just print a warning
We still have to port the tests.

@openshift-ci-robot openshift-ci-robot added size/L and removed size/M labels Jan 11, 2019

@cgwalters

This comment has been minimized.

Copy link
Contributor

cgwalters commented Jan 11, 2019

Tested this patch stack and as expected it just results in some logs:

I0111 22:06:28.752824   69224 daemon.go:284] Detected a new logged in session: &{:1.2 /org/freedesktop/login1 org.freedesktop.login1.Manager.SessionNew [1 /org/freedesktop/login1/session/_31]}
I0111 22:06:28.752871   69224 daemon.go:285] Login session taints node
I0111 22:06:28.752994   69224 daemon.go:453] NOTE: In a future version of RHCOS this will taint the node
@yuqi-zhang

This comment has been minimized.

Copy link
Contributor

yuqi-zhang commented Jan 11, 2019

/retest

@cgwalters

This comment has been minimized.

Copy link
Contributor

cgwalters commented Jan 11, 2019

/hold cancel

Although we still aren't decided if this is the way to go.

@yuqi-zhang

This comment has been minimized.

Copy link
Contributor

yuqi-zhang commented Jan 11, 2019

/retest

@yuqi-zhang

This comment has been minimized.

Copy link
Contributor

yuqi-zhang commented Jan 11, 2019

/retest
Appears to be network issues

@jlebon
Copy link
Member

jlebon left a comment

Some minor suggestions, but looks sane to me overall. Will let others take a look.
/approve

}
if err := dn.applySSHTaint(); err != nil {
glog.Errorf("Error during taint: cannot apply taint due to: %v", err)
}

This comment has been minimized.

@jlebon

jlebon Jan 11, 2019

Member

We'll need to remember to write any errors here to the exitCh once we enable this. Might be worth adding a comment about that somewhere in there.

return
case msg := <-sessionNewCh:
glog.Infof("Detected a new logged in session: %v", msg)
glog.Infof("Login session taints node")

This comment has been minimized.

@jlebon

jlebon Jan 11, 2019

Member

Minor: do we need this line given the logging in applySSHTaint already?

@openshift-ci-robot

This comment has been minimized.

Copy link

openshift-ci-robot commented Jan 11, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuqi-zhang

This comment has been minimized.

Copy link
Contributor

yuqi-zhang commented Jan 11, 2019

The one behaviour that could be an issue is, if the user ssh's in before the logind (or mcd for that matter) is ready, he may see an error message in the motd but the taint won't actually be applied.

The original case of the file allows for us to check the pre-existence of a write once the mcd is ready. That's potentially an edge case for this route.

@ashcrow

This comment has been minimized.

Copy link
Member

ashcrow commented Jan 12, 2019

/test e2e-aws

@ashcrow

This comment has been minimized.

Copy link
Member

ashcrow commented Jan 14, 2019

/test e2e-aws

Failures look unrelated to this PR.

@ashcrow

This comment has been minimized.

Copy link
Member

ashcrow commented Jan 16, 2019

/retest

@ashcrow

This comment has been minimized.

Copy link
Member

ashcrow commented Jan 16, 2019

Terraform failure flakes

@openshift-ci-robot

This comment has been minimized.

Copy link

openshift-ci-robot commented Jan 16, 2019

@cgwalters: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws e7fc982 link /test e2e-aws
ci/prow/e2e-aws-op e7fc982 link /test e2e-aws-op

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment