
cmd/openshift-install/create: Drop addRouterCAToClusterCA #1541

Closed
wants to merge 1 commit

Conversation

wking
Member

@wking wking commented Apr 5, 2019

This was added in 4033577 (Append router CA to cluster CA in
kubeconfig, 2019-02-12, #1242), where the motivation was [1]:

  With users created with an identity-provider, the OAuth flow goes
  through the router, and the router CA is not trusted. This
  prohibits the user from using oc login from the command line
  without manually appending the router CA to the cluster CA. The
  openshift-ingress-operator creates the router CA, and it is not
  available until the very end of an install.

  This PR adds the router CA from the configmap router-ca -n
  openshift-config-managed to the kubeconfig
  certificate-authority-data. This allows an identity-provider user
  to oc login from the terminal.

But the admin kubeconfig is already authenticated, so folks shouldn't
be using 'oc login' with that kubeconfig.  For example, see dee6929
(Modify kubeadmin usage message, admins should not use kubeadmin via
CLI, 2019-04-01, #1513).  That leaves us with no use case for this
modification, and making the finish code watch-only sets us up to
rename away from the user-provided-infrastructure subcommand.

CC @abhinavdahiya, @sallyom

[1]: #1242 (comment)
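
For reference, the modification this PR drops amounts to roughly the
following Go sketch (not the installer's actual code: the
appendRouterCA name and the ca-bundle.crt ConfigMap key are
assumptions here):

package routerca

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// appendRouterCA fetches the router CA published by the ingress
// operator and appends it to every cluster entry in the on-disk
// kubeconfig.
func appendRouterCA(kubeconfigPath string, client kubernetes.Interface) error {
	cm, err := client.CoreV1().ConfigMaps("openshift-config-managed").Get(
		context.TODO(), "router-ca", metav1.GetOptions{})
	if err != nil {
		return err
	}
	caPEM, ok := cm.Data["ca-bundle.crt"] // key name is an assumption
	if !ok {
		return fmt.Errorf("router-ca ConfigMap has no CA bundle")
	}
	cfg, err := clientcmd.LoadFromFile(kubeconfigPath)
	if err != nil {
		return err
	}
	for _, cluster := range cfg.Clusters {
		// Append, so the installer-generated API-server CA stays first.
		cluster.CertificateAuthorityData = append(
			cluster.CertificateAuthorityData, []byte(caPEM)...)
	}
	return clientcmd.WriteToFile(*cfg, kubeconfigPath)
}

With a helper like that gone, the finish code only watches cluster
state, which is the watch-only behavior the commit message describes.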
@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 5, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2019
@abhinavdahiya
Contributor

If this passes CI, I'll :lgtm: :)

@wking
Member Author

wking commented Apr 5, 2019

e2e-aws:

Apr  5 19:04:36.350: INFO: Error running &{/usr/bin/oc [oc login --config=/tmp/configfile589282405 --namespace=e2e-test-bootstrap-login-lbzlz -u kubeadmin] []   error: x509: certificate signed by unknown authority
...
Flaky tests:

[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (ext3)] volumes should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

The bootstrap user should successfully login with password decoded from kubeadmin secret [Suite:openshift/conformance/parallel]
[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

So we'll need to tweak the test suite to drop that check (and perhaps add a replacement check?). I'll take a look at the test suite (originally from openshift/origin#21733) and see if we can save it.

@wking
Member Author

wking commented Apr 5, 2019

Talking this over with @enj and @ericavonb, there is concern that users will continue to attempt:

$ export KUBECONFIG=auth/kubeconfig
$ oc login ...

despite there being no point (you're already an admin) and us removing the docs around that in #1513. Ideally, somebody (the admin, an ACME request from the cluster, etc.) would provision a router CA that is trusted by the cluster's expected user base (because that helps with web-console access, other users running oc login ... $URL without an existing kubeconfig, any other routes you're hosting, etc.), but getting that running is still a ways out. As a short-term fix, we're looking into wiring the OAuth listener up to the Kubernetes API domain (possibly on a new port). That way the admin kubeconfig, with its installer-generated Kubernetes API-server CA bundle, will be able to connect without unknown-authority errors.

@enj
Contributor

enj commented Apr 5, 2019

As a short-term fix, we're looking into wiring the OAuth listener up to the Kubernetes API domain (possibly on a new port).

The real reason for this is to get around some crazy http2 errors: https://bugzilla.redhat.com/show_bug.cgi?id=1686476#c16

@sdodson
Member

sdodson commented Apr 8, 2019

I find the motivation to add the router CA to the kubeconfig dubious. It's an admin-only kubeconfig that already has a certificate granting system:admin rights; there's no need to oc login for anyone who has access to this file. Additionally, this doesn't actually provide any benefit to those who would need to go through oc login flows: you're not going to manually extract the additional CA from the admin's kubeconfig and provide it to those wishing to log in with oc login, are you?

My suggestion would be that we clean up the wait-for commands to be pure watches for a condition, and when wait-for cluster-ready completes, we advise users to run a subsequent command which renders a CA bundle and anything else we deem necessary, like a stub kubeconfig that can be used for oc login.
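
Hypothetically, such a stub kubeconfig could be rendered along these
lines (a sketch only; writeStubKubeconfig and its parameters are
illustrative, not an existing installer asset):

package stubkubeconfig

import (
	"k8s.io/client-go/tools/clientcmd"
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// writeStubKubeconfig renders a kubeconfig that carries only the API
// server URL and the CA bundle, leaving authentication to 'oc login'.
func writeStubKubeconfig(path, server string, caBundle []byte) error {
	cfg := clientcmdapi.NewConfig()
	cfg.Clusters["cluster"] = &clientcmdapi.Cluster{
		Server:                   server,
		CertificateAuthorityData: caBundle, // would include the router CA
	}
	cfg.Contexts["default"] = &clientcmdapi.Context{Cluster: "cluster"}
	cfg.CurrentContext = "default"
	// Deliberately no AuthInfos entry: users authenticate themselves.
	return clientcmd.WriteToFile(*cfg, path)
}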

@sdodson
Member

sdodson commented Apr 8, 2019

After discussing this with @enj, he's clarified that the router CA needs to be added to the kubeconfig only to make oc login work. However, this shouldn't be necessary; it's an artifact of the installer telling users to run oc login rather than making it clear that the generated kubeconfig has embedded certificates granting admin access without requiring login. There's also some concern that users will assume oc login is necessary even if we don't tell them to do so.

@derekwaynecarr can I get your thoughts on what we should be doing here as we try to clean up these commands so that they're purely waiting for a condition? Do we drop adding the routerCA to the admin kubeconfig or do we sacrifice the purity of the command and just silently add it?

If we continue adding it silently, I think we should make sure that it's also added to assets/tls/ca.crt, as this is a file I would reasonably expect an admin to provide to end users needing to log into the cluster.

@enj
Contributor

enj commented Apr 8, 2019

This is primarily a matter of UX. Users understand usernames and passwords. Users understand the concept of a root user. kube:admin is a user that works in all contexts, is there by default, has cluster-admin rights, and requires a straightforward login flow to use. At the end of an install, I would expect a user with no experience with OpenShift to run oc login -u kubeadmin, as that is the obvious thing to do.

wking added a commit to wking/openshift-installer that referenced this pull request Apr 8, 2019
There is no hard line between installer- and user-provided
infrastructure.  Rename these commands to focus on what they'll do
instead of the work-flow into which we expect them to fit.

We're still working out how we can drop the router-CA injection to
avoid 'wait-for cluster-ready' surprising users by editing their
on-disk kubeconfig [1].  But that's mitigated somewhat by the fact
that addRouterCAToClusterCA is idempotent, because AppendCertsFromPEM
wraps AddCert [2] and AddCert checks to avoid duplicate certificates
[3].

[1]: openshift#1541
[2]: https://github.com/golang/go/blob/go1.12/src/crypto/x509/cert_pool.go#L144
[3]: https://github.com/golang/go/blob/go1.12/src/crypto/x509/cert_pool.go#L106-L109
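
That duplicate-skipping behavior is easy to demonstrate with
crypto/x509 directly; here's a small runnable sketch (the generated
self-signed certificate is just a stand-in for the router CA):

package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

func main() {
	// Generate a throwaway self-signed cert to stand in for the router CA.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-router-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	pemBytes := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})

	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(pemBytes)
	pool.AppendCertsFromPEM(pemBytes) // duplicate append is a no-op
	fmt.Println(len(pool.Subjects())) // prints 1, not 2
}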
@wking
Member Author

wking commented Apr 8, 2019

At the end of an install, I would expect a user with no experience with OpenShift to run oc login -u kubeadmin as that is the obvious thing to do.

And also to export KUBECONFIG=auth/kubeconfig? Can we add a warning to the line where we log the KUBECONFIG suggestion? Something like:

INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/path/to/installer/auth/kubeconfig'. There is no need to 'oc login' with this kubeconfig.

wking added a commit to wking/openshift-installer that referenced this pull request Apr 8, 2019
@wking
Member Author

wking commented Apr 9, 2019

/hold

So we're now ok waiting on more easily configurable router CAs and having wait-for cluster-ready (from #1552) continue to touch the admin kubeconfig to inject the router CA in the meantime. I'll leave this PR open with the hold, and will resuscitate it once the router no longer relies on a self-signed cert that users are unlikely to trust out of the box.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 9, 2019
@openshift-ci-robot
Contributor

@wking: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2019
@openshift-ci-robot
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name                   Commit   Rerun command
ci/prow/e2e-aws-upgrade     cf174c4  /test e2e-aws-upgrade
ci/prow/verify-vendor       cf174c4  /test verify-vendor
ci/prow/e2e-aws             cf174c4  /test e2e-aws
ci/prow/e2e-aws-disruptive  cf174c4  /test e2e-aws-disruptive

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya
Contributor

/close

Closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and build consensus on the fix, or, if you think this is still important in its current state, feel free to reopen.

@openshift-ci-robot
Contributor

@abhinavdahiya: Closed this PR.

In response to this:

/close

Closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and build consensus on the fix, or, if you think this is still important in its current state, feel free to reopen.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wking added a commit to wking/openshift-installer that referenced this pull request Oct 27, 2021
The console may become optional [1], so teach the installer to handle
its absence gracefully.

We've waited on the console since way back in ff53523 (add logs at
end of install for kubeadmin, consoleURL, 2018-12-06, openshift#806).  Back
then, install-complete timing was much less organized, and since
e17ba3c (cmd: wait for the cluster to be initialized, 2019-01-25, openshift#1132)
we've blocked on ClusterVersion going Available=True. So the current
dependency chain is:

1. Console route admission blocks console operator from going
   Available=True in its ClusterOperator.
2. Console ClusterOperator blocks cluster-version operator from
   going Available=True in ClusterVersion.
3. ClusterVersion blocks installer's waitForInitializedCluster.

So we no longer need to wait for the route to show up, and can fail
fast if we get a clear IsNotFound.  I'm keeping a bit of polling so we
don't fail an install on a temporary network hiccup.

We don't want to drop the console check entirely, because when it is
found, we want:

* To continue to log that access pathway on install-complete.
* To continue to append the router CA to the kubeconfig.

That latter point has been done since 4033577 (append router CA to
cluster CA in kubeconfig, 2019-02-12, openshift#1242).  The motivation in that
commit message is not explicit, but the idea is to support folks who
naively run 'oc login' with the kubeadmin kubeconfig [2] (despite that
kubeconfig already having cluster-root access) when the console
route's cert's CA happens to be something that the user's local trust
store doesn't include by default.

[1]: openshift/enhancements#922
[2]: openshift#1541 (comment)
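
A minimal sketch of that fail-fast-but-tolerate-hiccups polling
(illustrative only; the function name, intervals, and use of the
openshift/client-go route client are assumptions, not the installer's
actual code):

package consoleurl

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"

	routeclient "github.com/openshift/client-go/route/clientset/versioned"
)

// getConsoleRouteHost polls briefly for the console route, failing
// fast on a clear IsNotFound (console absent) while riding out
// transient network errors.
func getConsoleRouteHost(client routeclient.Interface) (string, error) {
	var host string
	err := wait.PollImmediate(2*time.Second, 30*time.Second, func() (bool, error) {
		route, err := client.RouteV1().Routes("openshift-console").Get(
			context.TODO(), "console", metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return false, err // definitive: the console is not installed
		}
		if err != nil {
			return false, nil // transient hiccup: keep polling
		}
		host = route.Spec.Host
		return true, nil
	})
	return host, err
}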
wking added a commit to wking/openshift-installer that referenced this pull request Oct 27, 2021
wking added a commit to wking/openshift-installer that referenced this pull request Aug 1, 2022
TrilokGeer pushed a commit to TrilokGeer/installer that referenced this pull request Sep 14, 2022