cmd/bridge: configure Thanos service #3000
Conversation
This lgtm 👍
cmd/bridge/main.go
Outdated
// Well-known location of Prometheus service for OpenShift. This is only accessible in-cluster.
openshiftPrometheusHost = "prometheus-k8s.openshift-monitoring.svc:9091"
// Well-known location of Thanos service for OpenShift. This is only accessible in-cluster.
openshiftPrometheusHost = "thanos-querier.openshift-monitoring.svc:9091"
ok, we just found an edge case with @kyoto with respect to alerts in the OpenShift console. It turns out that console is hitting the /rules endpoint from Prometheus to render alerts.
For these requests we still have to hit the cluster Prometheus endpoint. For this code this means that, instead of rewriting the target service URL, we need to create a new variable pointing to Thanos and:
- execute all /query requests against the Thanos service
- execute all /rules requests against the cluster prometheus-k8s service
This implies that only cluster alerts are visible in the OpenShift console, not user workload alerts, which I believe is fine for tech preview.
Reason: the Thanos querier does not implement the /rules endpoint.
/rules can't really be implemented by Thanos: it refers to which rules are loaded by a particular Prometheus instance. I think the right thing to do is to request the /rules endpoint of the cluster-monitoring Prometheus in the admin console. Do we already show alerting in the dev console? If not, then this might need a feature in prom-label-proxy, as we'd want to filter the alerts by their namespace label and request them from the user workload monitoring instance.
In that case let's have it hit the cluster-monitoring one only until we introduce the alerting section in the dev console.
sounds good 👍
We are not currently hitting the /rules endpoint for the developer perspective, only for the admin perspective.
In the admin perspective, we use /rules for both the dashboards and for the "Monitoring" -> "Alerting" page.
This enables console to pass requests to Thanos rather than Prometheus and enables user workload monitoring, if enabled.
@spadgett PTAL, I don't have enough knowledge about the e2e failures; it "works locally" on my machine ;-) Can you help if these are not simply flakes?
/approve
/lgtm
We can update oc-environment.sh in a separate PR
@@ -69,6 +76,7 @@ func main() {
	fK8sModeOffClusterEndpoint := fs.String("k8s-mode-off-cluster-endpoint", "", "URL of the Kubernetes API server.")
	fK8sModeOffClusterSkipVerifyTLS := fs.Bool("k8s-mode-off-cluster-skip-verify-tls", false, "DEV ONLY. When true, skip verification of certs presented by k8s API server.")
	fK8sModeOffClusterPrometheus := fs.String("k8s-mode-off-cluster-prometheus", "", "DEV ONLY. URL of the cluster's Prometheus server.")
	fK8sModeOffClusterThanos := fs.String("k8s-mode-off-cluster-thanos", "", "DEV ONLY. URL of the cluster's Thanos server.")
We should update the oc-environment.sh script to set this, see https://github.com/openshift/console/blob/master/contrib/oc-environment.sh#L20
Actually... it looks like there's no route for Thanos. Is that correct? We use the Prometheus route today for our development environment where we run off cluster
there is a new route for the thanos querier which will always be available, and we have to point the Prometheus UI link there now; see https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/thanos-querier/route.yaml
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: s-urbaniak, spadgett. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Looks like a test flake /retest
/cc @kyoto @paulfantom @lilic