NE-76: Integrate operator metrics with Prometheus #122
@Miciah this got lost a long time ago. I'm inclined to tag it; any reason I shouldn't?
/retest
@Miciah if you still feel good about this one, please remove the hold when you're ready.
/lgtm
/hold cancel
@Miciah bindata might need to be updated here
/retest
Please review the full test history for this PR and help us cut down flakes.
/hold
Force-pushed from 70065a9 to 4bc4cd8.
/hold cancel
Force-pushed from 4bc4cd8 to e552a26.
image: quay.io/openshift/origin-kube-rbac-proxy:latest
args:
- --logtostderr
- -v=8
Will this log stuff on every scrape interval? If so is that ok?
It will. I'll delete it.
- --secure-listen-address=:9393
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
- --upstream=http://127.0.0.1:60000/
- --tls-cert-file=/etc/tls/private/tls.crt
Will this image handle automatically reloading a rotated certificate?
Yes, kube-rbac-proxy watches for updates (brancz/kube-rbac-proxy@ef4527d), and I verified manually using the following command:
% oc -n openshift-dns-operator delete secrets/metrics-tls
secret "metrics-tls" deleted
Kube-rbac-proxy logs the following:
I1118 17:33:29.709147 1 reloader.go:98] reloading key /etc/tls/private/tls.key certificate /etc/tls/private/tls.crt
The operator exposes metrics, but they are not collected. Configure Prometheus to collect these metrics.

This commit resolves NE-76.

https://jira.coreos.com/browse/NE-76

* manifests/0000_70_dns-operator_00-cluster-role.yaml: Allow kube-rbac-proxy to create subjectaccessreviews and tokenreviews.
* manifests/0000_70_dns-operator_00-namespace.yaml: Add a label for openshift-monitoring.
* manifests/0000_70_dns-operator_01-service.yaml: New file. Define a metrics service.
* manifests/0000_70_dns-operator_02-deployment.yaml: Use kube-rbac-proxy to expose the metrics port.
* manifests/0000_90_dns-operator_00_prometheusrole.yaml: New file. Define a role that grants access to services, endpoints, and pods.
* manifests/0000_90_dns-operator_01_prometheusrolebinding.yaml: New file. Bind the prometheus-k8s service account to the new role.
* manifests/0000_90_dns-operator_02_servicemonitor.yaml: New file. Define a service monitor for the new service.
* manifests/image-references: Add kube-rbac-proxy.
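For orientation, the namespace label and the kube-rbac-proxy permissions listed in the commit message might look roughly like the sketch below. This is not copied from the PR's manifests: the ClusterRole name `dns-operator` and the label key `openshift.io/cluster-monitoring` are assumptions made for illustration. Only the namespace name, the two resources, and the `create` verb come from the thread.

```yaml
# Sketch only: the ClusterRole name and the label key are assumptions,
# not taken from the PR's actual manifests.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-dns-operator
  labels:
    # Assumed label key that opts the namespace in to cluster monitoring.
    openshift.io/cluster-monitoring: "true"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dns-operator  # hypothetical name
rules:
# kube-rbac-proxy validates the scraper's bearer token with a TokenReview ...
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
# ... and then checks that identity's permissions with a SubjectAccessReview.
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
```

Both review resources are cluster-scoped, which is presumably why the rules sit in the cluster role rather than in a namespaced role.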
Force-pushed from e552a26 to ae1b9e6.
Latest push drops
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ironcladlou, Miciah
This looks like it broke upgrades from 4.2 last week:
@@ -36,6 +36,30 @@ spec:
resources:
requests:
cpu: 10m
- name: kube-rbac-proxy
This is super heavyweight vs just importing the library-go filters. Are you not based on library-go?
In general, no one should be using rbac proxy for metrics if you are Go code we own.
We are based on controller-runtime. Not sure about what library-go offers here, but seems too late anyway. Is there a reason we shouldn't ship what we have?
In general, no one should be using rbac proxy for metrics if you are Go code we own.
cc @Miciah
Clearly there's some communications breakdown on this subject. Do we need to revert this or not?
To be clear, I don't know of any functional bugs with what we have
Based on discussions with the operator-sdk team, I understood kube-rbac-proxy to be the correct approach. I'm unfamiliar with what library-go provides in this area.
The operator exposes metrics, but they are not collected. Configure Prometheus to collect these metrics.

* manifests/0000_70_dns-operator_00-namespace.yaml: Add a label for openshift-monitoring.
* manifests/0000_70_dns-operator_02-deployment.yaml: Add the metrics port.
* manifests/0000_70_dns-operator_04-service.yaml: New file. Define a metrics service.
* manifests/0000_90_dns-operator_00_prometheusrole.yaml: New file. Define a role that grants access to services, endpoints, and pods.
* manifests/0000_90_dns-operator_01_prometheusrolebinding.yaml: New file. Bind the prometheus-k8s service account to the new role.
* manifests/0000_90_dns-operator_02_servicemonitor.yaml: New file. Define a service monitor for the new service.
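As a rough illustration of the last three items, a role, role binding, and service monitor along these lines would let Prometheus discover and scrape the metrics service. This is a sketch under assumptions rather than the PR's actual files: the object names, the `get`/`list`/`watch` verbs, the `openshift-monitoring` namespace for the service account, the `metrics` port name, the `name: dns-operator` selector label, and the 30s interval are all hypothetical. TLS and bearer-token settings for scraping through kube-rbac-proxy are omitted.

```yaml
# Sketch only: names, verbs, labels, and the interval are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s  # hypothetical name
  namespace: openshift-dns-operator
rules:
# Read-only access so Prometheus can discover scrape targets.
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s  # hypothetical name
  namespace: openshift-dns-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring  # assumed home of the service account
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dns-operator  # hypothetical name
  namespace: openshift-dns-operator
spec:
  endpoints:
  - port: metrics   # must match a named port on the metrics service (assumed name)
    interval: 30s   # assumed scrape interval
  namespaceSelector:
    matchNames:
    - openshift-dns-operator
  selector:
    matchLabels:
      name: dns-operator  # assumed label on the metrics service
```

The `selector` and `port` fields refer to the Service, not the pod, so the metrics service needs both a matching label and a named port for the service monitor to pick it up.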