Add status handling for RouteHealth#334
Add status handling for RouteHealth#334benjaminapetersen wants to merge 5 commits intoopenshift:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: benjaminapetersen The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d06d691 to
16d894e
Compare
|
Revised per @spadgett comment about ordering. Def a good point, we want to be able to check health even if other resources |
|
/retest Variety of flakes |
jhadvig
left a comment
There was a problem hiding this comment.
Couple of comments, otherwise the changes looks good 👍
|
Interestingly, test failure: |
|
/retest |
fdf5aa2 to
0466b2b
Compare
|
/retest I want that status :) |
|
/retest
|
0466b2b to
6e26a31
Compare
- GET configmap router-ca if possible and load CA for testing console route health - load serviceaccount ca as fallback for testing route health - report various potential route health errors Add CA to solve x509 errors when checking route health
6e26a31 to
dcf0bf4
Compare
| } | ||
| client := clientWithCA(caPool) | ||
|
|
||
| // TODO: deal with an environment with a MitM proxy? |
There was a problem hiding this comment.
@stlaz got the CA cert from router-ca, and a fallback to use the local serviceaccount/ca from the operator, if the router-ca is unavailable. You mentioned a possible proxy MitM issue I should also take into account. Care to elaborate? Thanks!
| routerCA, rcaErr := co.configMapClient.ConfigMaps(api.OpenShiftConsoleNamespace).Get(api.RouterCAConfigMapName, metav1.GetOptions{}) | ||
|
|
||
| if rcaErr != nil && apierrors.IsNotFound(rcaErr) { | ||
| // using CA ca-bundle.crt from configmap router-ca from openshift-config-managed |
There was a problem hiding this comment.
On error, this does not log any cert data, simply logs path (at high enough -v) of attempt to read cert (also, if we fail, the cert was not read, anyway). We want to know exactly what went wrong, if it goes wrong, I think, without leaking anything.
|
No longer getting this issue with the CA wired up: Flow now may report: then: until finally resolved when the endpoint responds: |
| } | ||
|
|
||
| func (co *consoleOperator) getCA() (*x509.CertPool, error) { | ||
| caCertPool := x509.NewCertPool() |
There was a problem hiding this comment.
Possibly i should update to:
// start with system, fallback to new?
rootCAs, _ := x509.SystemCertPool()
if rootCAs == nil {
rootCAs = x509.NewCertPool()
}| func clientWithCA(caPool *x509.CertPool) *http.Client { | ||
| return &http.Client{ | ||
| Timeout: 5 * time.Second, | ||
| Transport: &http.Transport{ |
There was a problem hiding this comment.
Possibly should use:
transport := http.DefaultTransport()There was a problem hiding this comment.
You're using your own roots whereas the above transport only uses the default system trust bundle, so you will want to use your own transport.
|
Also needs #334 to eliminate the |
|
@benjaminapetersen: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
| } | ||
| } | ||
|
|
||
| func appendProxyCA(caCertPool *x509.CertPool, systemCABundle []byte) *x509.CertPool { |
There was a problem hiding this comment.
If proxy CA is configured, it should already appear in the system CA bundle - the location you're copying it to is where golang looks when you call x509.SystemCertPool()
There was a problem hiding this comment.
ah, nvm, it was just the name of the function that confused me, I see what you did there
| } | ||
|
|
||
| func (co *consoleOperator) getCA() (*x509.CertPool, error) { | ||
| // TODO: should I update to this? start with the SystemCertPool? |
There was a problem hiding this comment.
yes, this looks better IMO, and SystemCertPool() should be fresh since you're doing --terminate-on-files when starting your operator
|
|
||
| // fallback to our local ca in from our serviceaccount | ||
| // if we hit this path, are we likely to get self signed certs errors? | ||
| serviceAccountCAbytes, err := ioutil.ReadFile(api.OAuthEndpointCAFilePath) |
There was a problem hiding this comment.
This CA bundle actually does contain the router-ca, so you might just stick with it instead of getting the CA from the config map above.
| // that are missing from the system trust roots | ||
| rootCAs.AppendCertsFromPEM(config.CAData) | ||
|
|
||
| // TODO: could get router-ca ConfigMap we sync'd from openshift-config-managed |
There was a problem hiding this comment.
Not really, you're using a passthrough route to access the /metrics endpoint from an external endpoint, but the cert that's used for serving directly at the pod which terminates the TLS connection is not valid for the hostname of the route (and is actually signed by service-ca AFAIK).
So even if you had a valid CA that signed the server cert here, the cert verification would fail because of the hostname. Yet you need passthrough route to be able to use client cert auth. Another solution would be to grab a token for a service account that's allowed to access this endpoint, in which case the passthrough aspect of the route would not be necessary.
|
CLosing in favor of #399 |
Most of the "Console isn't happy" questions we get stem from the console route being unhealthy at this point. This adds a
RouteHealthDegradedcondition, with several checks to see if the route is in a good place.It is similar to the check done by the authentication operator, which seems to have made debugging much easier:
https://github.com/openshift/cluster-authentication-operator/blob/e7d5e3d45f5188e9c0631e93552e5b7ac48a06e4/pkg/operator2/operator.go#L445
/assign @spadgett @jhadvig