
Update console for monitoring changes #909

Merged

Conversation

spadgett (Member) commented Dec 6, 2018

  • Proxy to port 9092, which has the tenancy proxy in front of it
  • Remove the CAN_LIST_NS check since users can now see metrics in their own namespaces

https://jira.coreos.com/browse/CONSOLE-1035

@kyoto @brancz

/hold

spadgett (Member, Author) commented Dec 6, 2018

Currently blocked because /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt is not there, and the console uses that CA to proxy to the Prometheus service.

sh-4.2$ cd /var/run/secrets/kubernetes.io/serviceaccount/
sh-4.2$ ls
ca.crt  namespace  token

brancz commented Dec 6, 2018

The serving certs controller works a bit differently now. The service-ca.crt is no longer mounted into the serviceaccount secrets. Instead, you need to create a configmap with a special annotation; the serving certs controller will then populate that configmap with the ca.crt automatically.
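For reference, a minimal sketch of that flow, assuming the inject-cabundle annotation used by the serving certs controller; the ConfigMap name, namespace, and annotation key here are illustrative and should be checked against the controller you are actually running:

# Create an empty ConfigMap and annotate it so the serving certs controller
# injects the service CA bundle (annotation key assumed; verify it).
kubectl -n openshift-console create configmap service-ca
kubectl -n openshift-console annotate configmap service-ca \
  service.alpha.openshift.io/inject-cabundle=true
# After the controller reconciles, the CA should show up under the
# service-ca.crt data key:
kubectl -n openshift-console get configmap service-ca \
  -o jsonpath='{.data.service-ca\.crt}' | head -n 3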

spadgett (Member, Author) commented Dec 10, 2018

@brancz I'm having trouble connecting to 9092 from inside the pod. Any ideas? 9091 is fine. This is a 4.0 install.

sh-4.2$ curl -k https://prometheus-k8s.openshift-monitoring.svc:9092
curl: (7) Failed connect to prometheus-k8s.openshift-monitoring.svc:9092; No route to host

s-urbaniak (Contributor)

@spadgett the error No route to host implies that there is a problem with DNS in your pod. Can you please verify that you can reach other services from inside the pod?
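For example, a quick check from inside the console pod along these lines (the comparison targets are just examples):

# Resolve the service name, then try the known-good port and another service:
getent hosts prometheus-k8s.openshift-monitoring.svc
curl -sk -o /dev/null -w '%{http_code}\n' https://prometheus-k8s.openshift-monitoring.svc:9091/
curl -sk -o /dev/null -w '%{http_code}\n' https://kubernetes.default.svc/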

spadgett (Member, Author)

@s-urbaniak I thought so, too, but I can reach the same service on port 9091. Just not 9092. (The service does have a port 9092 defined.)

s-urbaniak (Contributor)

@spadgett you might be right 🤔 When I look at the service I see it has a tenancy target port: https://github.com/openshift/cluster-monitoring-operator/blob/1322e56e961511994a4a1a5ef55152d3b389575c/assets/prometheus-k8s/service.yaml#L17

But that port is not declared in the container: https://github.com/openshift/cluster-monitoring-operator/blob/1322e56e961511994a4a1a5ef55152d3b389575c/assets/prometheus-k8s/prometheus.yaml#L67-L77

cc @metalmatze for verification
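One way to compare the two against a live cluster (the pod label selector below is assumed from prometheus-operator conventions):

# Service port name -> targetPort mapping:
kubectl -n openshift-monitoring get svc prometheus-k8s \
  -o jsonpath='{range .spec.ports[*]}{.name}{" -> "}{.targetPort}{"\n"}{end}'
# Named container ports actually declared in the pod:
kubectl -n openshift-monitoring get pod -l app=prometheus,prometheus=k8s \
  -o jsonpath='{range .items[0].spec.containers[*]}{.name}{": "}{.ports[*].name}{"\n"}{end}'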

s-urbaniak (Contributor)

@spadgett good catch. We verified it's the missing port name; we're submitting a fix to the cluster monitoring operator as we speak.

s-urbaniak (Contributor)

@spadgett once openshift/cluster-monitoring-operator#183 is merged and the images are rebuilt, you can retry :-)

spadgett (Member, Author)

@s-urbaniak great, thank you!

spadgett (Member, Author)

I'm still struggling to get port 9092 working. 9091 seems to work. This is from inside the console pod.

sh-4.2$ curl -k -H 'Authorization: Bearer <redacted>' 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=count(increase(kube_pod_container_status_restarts_total%7Bnamespace%3D%22openshift-console%22%7D%5B1h%5D)%20%3E%205%20)'
{"status":"success","data":{"resultType":"vector","result":[]}}
sh-4.2$ curl -k -H 'Authorization: Bearer <redacted>' 'https://prometheus-k8s.openshift-monitoring.svc:9092/api/v1/query?query=count(increase(kube_pod_container_status_restarts_total%7Bnamespace%3D%22openshift-console%22%7D%5B1h%5D)%20%3E%205%20)'
Bad Request. The request or configuration is malformed.

The decoded query here is

count(increase(kube_pod_container_status_restarts_total{namespace="openshift-console"}[1h]) > 5 )

spadgett (Member, Author)

I pulled out the service-ca.crt changes into a separate PR, #960. It will require console operator changes as well to pass the service-ca file path on startup.

metalmatze (Contributor)

It seems like the request still can't get through the kube-rbac-proxy:

https://github.com/brancz/kube-rbac-proxy/blob/7a8722d50ffc5928ca0d21091040c6758244dd6c/pkg/proxy/proxy.go#L79

I would like to help you debug this. Is there any easy way to work on this? Is a normal OpenShift 4 cluster enough? Do I need a patched version?

spadgett (Member, Author)

Sorry, I meant to get back to you on this.

This is a normal OpenShift 4.0 cluster installed from the 0.7.0 installer. I'm logged in as the kube:admin user. Let me know if there's anything I can do to help debug.

s-urbaniak (Contributor)

@spadgett @metalmatze I spent a debug session on this and found that the only missing piece was the namespace URL parameter.

Given the kube-rbac-proxy configuration in OpenShift (check with kubectl -n openshift-monitoring get secret kube-rbac-proxy -o jsonpath='{.data.config\.yaml}' | base64 -d; echo), it expects a namespace URL query parameter, according to https://github.com/brancz/kube-rbac-proxy/blob/69cfb74e7e3b373602d2295d6175bcccd48da85c/pkg/proxy/proxy.go#L141.

If I take your request from above and just modify it slightly, adding a namespace=openshift-console URL query parameter, it now works as expected:

curl -k -H "Authorization: Bearer <REDACTED>" "https://prometheus-k8s.openshift-monitoring.svc:9092/api/v1/query?namespace=openshift-console&query=count(increase(kube_pod_container_status_restarts_total%7Bnamespace%3D%22openshift-console%22%7D%5B1h%5D)%20%3E%205%20)"; echo
{"status":"success","data":{"resultType":"vector","result":[]}}

metalmatze (Contributor)

Thank you for looking into it!
I guess that the console backend needs to add the namespace query parameter to its requests. It should be enforced there and not in the frontend.

spadgett (Member, Author) commented Jan 3, 2019

@s-urbaniak Thanks, I see now. I didn't realize namespace was a separate parameter even when it's also specified in the query. I'll make the updates.

metalmatze (Contributor)

No worries. That separate namespace parameter is what enforces the namespace inside the PromQL query that will be executed against Prometheus. The label proxy overrides every {namespace="foobar"} selector with the namespace given in that query parameter, which is why it has to be enforced in the backend. ☺️
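A rough illustration of that behavior (token redacted, result depends on the cluster): even though the PromQL selector below names a different namespace, the proxy should rewrite it to the enforced one, so only openshift-console data can come back.

curl -k -H "Authorization: Bearer <redacted>" \
  "https://prometheus-k8s.openshift-monitoring.svc:9092/api/v1/query?namespace=openshift-console&query=up%7Bnamespace%3D%22default%22%7D"; echo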

spadgett (Member, Author) commented Jan 3, 2019

For queries that should NOT have a namespace, do we need to use service port 9091 instead? For example:

https://github.com/openshift/console/blob/master/frontend/public/components/cluster-overview.jsx#L49

brancz commented Jan 7, 2019

Yes, for cluster-wide metrics the existing port should continue to be used.
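In other words, something like this split (token redacted, example queries chosen arbitrarily): cluster-wide queries keep using 9091, per-namespace queries go to 9092 with the namespace parameter.

# Cluster-wide query (no tenancy enforcement; needs broader RBAC):
curl -k -H "Authorization: Bearer <redacted>" \
  "https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=sum(up)"
# Namespace-scoped query through the tenancy proxy:
curl -k -H "Authorization: Bearer <redacted>" \
  "https://prometheus-k8s.openshift-monitoring.svc:9092/api/v1/query?namespace=openshift-console&query=up"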

spadgett (Member, Author) commented Jan 7, 2019

I'm seeing certificate errors connecting to port 9092 using service-ca.crt. It works OK for 9091.

2019/01/7 18:26:58 http: proxy error: x509: certificate is not valid for any names, but wanted to match prometheus-k8s.openshift-monitoring.svc

sh-4.2$ curl --cacert /var/service-ca/service-ca.crt https://prometheus-k8s.openshift-monitoring:9092/
curl: (60) Certificate type not approved for application.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

spadgett (Member, Author) commented Jan 7, 2019

Sorry, wrong hostname above, but I still see the error using the correct host. (No error using port 9091.)

sh-4.2$ curl --cacert /var/service-ca/service-ca.crt https://prometheus-k8s.openshift-monitoring.svc:9092/
curl: (60) Certificate type not approved for application.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

spadgett (Member, Author) commented Jan 7, 2019

Yeah, I don't see the kube-rbac-proxy container doing anything with the service serving certificate. I opened issue MON-511.
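One way to confirm that from inside the pod is to compare the certificates the two ports present (assuming openssl is available in the image):

echo | openssl s_client -connect prometheus-k8s.openshift-monitoring.svc:9091 2>/dev/null \
  | openssl x509 -noout -subject -issuer
echo | openssl s_client -connect prometheus-k8s.openshift-monitoring.svc:9092 2>/dev/null \
  | openssl x509 -noout -subject -issuer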

brancz commented Jan 8, 2019

Yes, you're right. We'll try to fix that as soon as we can! Sorry for the inconvenience.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 16, 2019
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 16, 2019
spadgett (Member, Author)

The proxy is working for me now 👍

spadgett (Member, Author)

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2019
spadgett (Member, Author)

jenkins rebuild

1 similar comment
spadgett (Member, Author)

jenkins rebuild

@spadgett spadgett changed the title [WIP] Update console for monitoring changes Update console for monitoring changes Jan 25, 2019
spadgett (Member, Author)

/assign @kyoto
/hold cancel

@openshift-ci-robot openshift-ci-robot removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jan 25, 2019
spadgett (Member, Author)

Stale element flake

/retest

1) Interacting with the etcd OCS : displays metadata about the created `EtcdBackup` in its "Overview" section
   StaleElementReferenceError: stale element reference: element is not attached to the page document

spadgett (Member, Author)

OLM test flake

/retest

1) Interacting with the etcd OCS : creates etcd Operator `Deployment`
   Error: Timeout - Async callback was not invoked within timeout specified by jasmine.DEFAULT_TIMEOUT_INTERVAL.

spadgett (Member, Author)

Tests are green, and I've been able to validate that metrics work for a normal user by patching kubeapiserveroperatorconfig (thanks @kyoto for the tip).

{
name: 'Used',
<Line title="Memory Usage" namespace={ns.metadata.name} query={[
{ name: 'Used',
Member (review comment on the diff above)

Nit: Was this newline accidentally removed?

spadgett (Member, Author) replied

Thanks, fixed

// In OpenShift, the user must be able to list namespaces to query Prometheus.
return canListNS;
};
const canAccessPrometheus = (prometheusFlag) => prometheusFlag && !!prometheusBasePath && !!prometheusTenancyBasePath;
Member (review comment on the diff above)

Just to confirm, this will prevent all requirePrometheus() wrapped components from rendering unless both prometheusBasePath and prometheusTenancyBasePath are set. Is that what we want?

spadgett (Member, Author) replied Jan 27, 2019

Yeah, I was trying to keep the client logic simple. Currently, the console server sets both values together, so if one is set the other will always be set, too. We assume the RBAC proxy will always be there in OpenShift 4.0. Let me know if that's not the case.

kyoto (Member) commented Jan 27, 2019

@spadgett LGTM apart from one nit and one question.

brancz commented Jan 28, 2019

The RBAC proxy is always there in 4.0. It's part of the Prometheus pod, which also answers the non-tenancy requests, so this sounds good to me.

kyoto (Member) commented Jan 28, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 28, 2019
@openshift-merge-robot openshift-merge-robot merged commit 21d0a9d into openshift:master Jan 28, 2019
@spadgett spadgett deleted the update-monitoring-proxy branch January 28, 2019 12:14