
cmd/openshift-install/create: Drop addRouterCAToClusterCA #1541

Closed
wants to merge 1 commit

Conversation

wking
Member

@wking wking commented Apr 5, 2019

This was added in 4033577 (Append router CA to cluster CA in
kubeconfig, 2019-02-12, #1242), where the motivation was [1]:

  With users created with an identity-provider, the OAuth flow goes
  through the router, and the router CA is not trusted. This
  prohibits the user from using oc login from the command line
  without manually appending the router CA to the cluster CA. The
  openshift-ingress-operator creates the router CA, and it is not
  available until the very end of an install.

  This PR adds the router CA from the configmap router-ca -n
  openshift-config-managed to the kubeconfig
  certificate-authority-data. This allows an identity-provider user
  to oc login from the terminal.

But the admin kubeconfig is already authenticated, so folks shouldn't
be using 'oc login' with that kubeconfig.  For example, see dee6929
(Modify kubeadmin usage message, admins should not use kubeadmin via
CLI, 2019-04-01, #1513).  That leaves us with no use case for this
modification, and making the finish code watch-only sets us up to
rename away from the user-provided-infrastructure subcommand.

CC @abhinavdahiya, @sallyom

[1]: #1242 (comment)
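
For reference, the modification this PR drops amounts to roughly the
following Go sketch (not the installer's actual code: the
appendRouterCA name and the ca-bundle.crt ConfigMap key are
assumptions here):

package routerca

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// appendRouterCA fetches the router CA published by the ingress
// operator and appends it to every cluster entry in the on-disk
// kubeconfig.
func appendRouterCA(kubeconfigPath string, client kubernetes.Interface) error {
	cm, err := client.CoreV1().ConfigMaps("openshift-config-managed").Get(
		context.TODO(), "router-ca", metav1.GetOptions{})
	if err != nil {
		return err
	}
	caPEM, ok := cm.Data["ca-bundle.crt"] // key name is an assumption
	if !ok {
		return fmt.Errorf("router-ca ConfigMap has no CA bundle")
	}
	cfg, err := clientcmd.LoadFromFile(kubeconfigPath)
	if err != nil {
		return err
	}
	for _, cluster := range cfg.Clusters {
		// Append, so the installer-generated API-server CA stays first.
		cluster.CertificateAuthorityData = append(
			cluster.CertificateAuthorityData, []byte(caPEM)...)
	}
	return clientcmd.WriteToFile(*cfg, kubeconfigPath)
}

With a helper like that gone, the finish code only watches cluster
state, which is the watch-only behavior the commit message describes.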
@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 5, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2019
@abhinavdahiya
Contributor

If this passes CI, I'll :lgtm: :)

@wking
Member Author

wking commented Apr 5, 2019

e2e-aws:

Apr  5 19:04:36.350: INFO: Error running &{/usr/bin/oc [oc login --config=/tmp/configfile589282405 --namespace=e2e-test-bootstrap-login-lbzlz -u kubeadmin] []   error: x509: certificate signed by unknown authority
...
Flaky tests:

[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (ext3)] volumes should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

The bootstrap user should successfully login with password decoded from kubeadmin secret [Suite:openshift/conformance/parallel]
[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

So we'll need to tweak the test suite to drop that check (and perhaps add a replacement check?). I'll take a look at the test suite (originally from openshift/origin#21733) and see if we can save it.

@wking
Member Author

wking commented Apr 5, 2019

Talking this over with @enj and @ericavonb, there is concern that users will continue to attempt:

$ export KUBECONFIG=auth/kubeconfig
$ oc login ...

despite there being no point (you're already an admin) and us removing the docs around that in #1513. Ideally, somebody (the admin, an ACME request from the cluster, etc.) would provision a router CA that is trusted by the cluster's expected user base (because that helps with web-console access, other users running oc login ... $URL without an existing kubeconfig, any other routes you're hosting, etc.), but getting that running is still a ways out. As a short-term fix, we're looking into wiring the OAuth listener up to the Kubernetes API domain (possibly on a new port). That way the admin kubeconfig, with its installer-generated Kubernetes API-server CA bundle, will be able to connect without unknown-authority errors.

@enj
Contributor

enj commented Apr 5, 2019

As a short-term fix, we're looking into wiring the OAuth listener up to the Kubernetes API domain (possibly on a new port).

The real reason for this is to get around some crazy http2 errors: https://bugzilla.redhat.com/show_bug.cgi?id=1686476#c16

@sdodson
Member

sdodson commented Apr 8, 2019

I find the motivation to add the router CA to the kubeconfig dubious. It's an admin-only kubeconfig that already has a certificate granting system:admin rights; there's no need to oc login for anyone who has access to this file. Additionally, this doesn't actually provide any benefit to those who would need to go through oc login flows: you're not going to manually extract the additional CA from the admin's kubeconfig and provide it to those wishing to log in with oc login, are you?

My suggestion would be that we clean up the wait-for commands to be pure watches for a condition, and when wait-for cluster-ready completes, we advise users to run a subsequent command which renders a CA bundle and anything else we deem necessary, like a stub kubeconfig that can be used for oc login.
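
Hypothetically, such a stub kubeconfig could be rendered along these
lines (a sketch only; writeStubKubeconfig and its parameters are
illustrative, not an existing installer asset):

package stubkubeconfig

import (
	"k8s.io/client-go/tools/clientcmd"
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// writeStubKubeconfig renders a kubeconfig that carries only the API
// server URL and the CA bundle, leaving authentication to 'oc login'.
func writeStubKubeconfig(path, server string, caBundle []byte) error {
	cfg := clientcmdapi.NewConfig()
	cfg.Clusters["cluster"] = &clientcmdapi.Cluster{
		Server:                   server,
		CertificateAuthorityData: caBundle, // would include the router CA
	}
	cfg.Contexts["default"] = &clientcmdapi.Context{Cluster: "cluster"}
	cfg.CurrentContext = "default"
	// Deliberately no AuthInfos entry: users authenticate themselves.
	return clientcmd.WriteToFile(*cfg, path)
}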

@sdodson
Member

sdodson commented Apr 8, 2019

After discussing this with @enj, he's clarified that the router CA needs to be added to the kubeconfig only to make oc login work. However, this shouldn't be necessary; it's an artifact of the installer telling users to run oc login rather than making it clear that the generated kubeconfig has embedded certificates granting admin access without requiring login. There's also some concern that users will assume oc login is necessary even if we don't tell them to do so.

@derekwaynecarr can I get your thoughts on what we should be doing here as we try to clean up these commands so that they're purely waiting for a condition? Do we drop adding the routerCA to the admin kubeconfig or do we sacrifice the purity of the command and just silently add it?

If we continue adding it silently, I think we should make sure that it's also added to assets/tls/ca.crt, as this is a file I would reasonably expect an admin to provide to end users needing to log into the cluster.

@enj
Contributor

enj commented Apr 8, 2019

This is primarily a matter of UX. Users understand usernames and passwords. Users understand the concept of a root user. kube:admin is a user that works in all contexts, is there by default, has cluster-admin rights, and requires a straightforward login flow to use. At the end of an install, I would expect a user with no experience with OpenShift to run oc login -u kubeadmin, as that is the obvious thing to do.

wking added a commit to wking/openshift-installer that referenced this pull request Apr 8, 2019
There is no hard line between installer- and user-provided
infrastructure.  Rename these commands to focus on what they'll do
instead of the work-flow into which we expect them to fit.

We're still working out how we can drop the router-CA injection to
avoid 'wait-for cluster-ready' surprising users by editing their
on-disk kubeconfig [1].  But that's mitigated somewhat by the fact
that addRouterCAToClusterCA is idempotent, because AppendCertsFromPEM
wraps AddCert [2] and AddCert checks to avoid duplicate certificates
[3].

[1]: openshift#1541
[2]: https://github.com/golang/go/blob/go1.12/src/crypto/x509/cert_pool.go#L144
[3]: https://github.com/golang/go/blob/go1.12/src/crypto/x509/cert_pool.go#L106-L109
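
That duplicate-skipping behavior is easy to demonstrate with
crypto/x509 directly; here's a small runnable sketch (the generated
self-signed certificate is just a stand-in for the router CA):

package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

func main() {
	// Generate a throwaway self-signed cert to stand in for the router CA.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-router-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	pemBytes := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})

	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(pemBytes)
	pool.AppendCertsFromPEM(pemBytes) // duplicate append is a no-op
	fmt.Println(len(pool.Subjects())) // prints 1, not 2
}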
@wking
Member Author

wking commented Apr 8, 2019

At the end of an install, I would expect a user with no experience with OpenShift to run oc login -u kubeadmin as that is the obvious thing to do.

And also to export KUBECONFIG=auth/kubeconfig? Can we add a warning to the line where we log the KUBECONFIG suggestion? Something like:

INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/path/to/installer/auth/kubeconfig'. There is no need to 'oc login' with this kubeconfig.

wking added a commit to wking/openshift-installer that referenced this pull request Apr 8, 2019
@wking
Member Author

wking commented Apr 9, 2019

/hold

So we're now ok waiting on more easily configurable router CAs and having wait-for cluster-ready (from #1552) continue to touch the admin kubeconfig to inject the router CA in the meantime. I'll leave this PR open with the hold, and will resuscitate it once the router no longer relies on a self-signed cert that users are unlikely to trust out of the box.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 9, 2019
@openshift-ci-robot
Contributor

@wking: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2019
@openshift-ci-robot
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name                   Commit   Rerun command
ci/prow/e2e-aws-upgrade     cf174c4  /test e2e-aws-upgrade
ci/prow/verify-vendor       cf174c4  /test verify-vendor
ci/prow/e2e-aws             cf174c4  /test e2e-aws
ci/prow/e2e-aws-disruptive  cf174c4  /test e2e-aws-disruptive

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya
Contributor

/close

Closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and build consensus on the fix, or, if you think this is still important in its current state, feel free to reopen.

@openshift-ci-robot
Contributor

@abhinavdahiya: Closed this PR.

In response to this:

/close

Closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and build consensus on the fix, or, if you think this is still important in its current state, feel free to reopen.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wking added a commit to wking/openshift-installer that referenced this pull request Oct 27, 2021
The console may become optional [1], so teach the installer to handle
its absence gracefully.

We've waited on the console since way back in ff53523 (add logs at
end of install for kubeadmin, consoleURL, 2018-12-06, openshift#806).  Back
then, install-complete timing was much less organized, and since
e17ba3c (cmd: wait for the cluster to be initialized, 2019-01-25, openshift#1132)
we've blocked on ClusterVersion going Available=True. So the current
dependency chain is:

1. Console route admission blocks console operator from going
   Available=True in its ClusterOperator.
2. Console ClusterOperator blocks cluster-version operator from
   going Available=True in ClusterVersion.
3. ClusterVersion blocks installer's waitForInitializedCluster.

So we no longer need to wait for the route to show up, and can fail
fast if we get a clear IsNotFound.  I'm keeping a bit of polling so we
don't fail an install on a temporary network hiccup.

We don't want to drop the console check entirely, because when it is
found, we want:

* To continue to log that access pathway on install-complete.
* To continue to append the router CA to the kubeconfig.

That latter point has been done since 4033577 (append router CA to
cluster CA in kubeconfig, 2019-02-12, openshift#1242).  The motivation in that
commit message is not explicit, but the idea is to support folks who
naively run 'oc login' with the kubeadmin kubeconfig [2] (despite that
kubeconfig already having cluster-root access) when the console
route's cert's CA happens to be something that the user's local trust
store doesn't include by default.

[1]: openshift/enhancements#922
[2]: openshift#1541 (comment)
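
A minimal sketch of that fail-fast-but-tolerate-hiccups polling
(illustrative only; the function name, intervals, and use of the
openshift/client-go route client are assumptions, not the installer's
actual code):

package consoleurl

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"

	routeclient "github.com/openshift/client-go/route/clientset/versioned"
)

// getConsoleRouteHost polls briefly for the console route, failing
// fast on a clear IsNotFound (console absent) while riding out
// transient network errors.
func getConsoleRouteHost(client routeclient.Interface) (string, error) {
	var host string
	err := wait.PollImmediate(2*time.Second, 30*time.Second, func() (bool, error) {
		route, err := client.RouteV1().Routes("openshift-console").Get(
			context.TODO(), "console", metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return false, err // definitive: the console is not installed
		}
		if err != nil {
			return false, nil // transient hiccup: keep polling
		}
		host = route.Spec.Host
		return true, nil
	})
	return host, err
}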
wking added a commit to wking/openshift-installer that referenced this pull request Oct 27, 2021
wking added a commit to wking/openshift-installer that referenced this pull request Aug 1, 2022
TrilokGeer pushed a commit to TrilokGeer/installer that referenced this pull request Sep 14, 2022