cmd/bridge: configure Thanos service #3000
Conversation
This lgtm 👍
cmd/bridge/main.go
Outdated
// Well-known location of Prometheus service for OpenShift. This is only accessible in-cluster.
openshiftPrometheusHost = "prometheus-k8s.openshift-monitoring.svc:9091"
// Well-known location of Thanos service for OpenShift. This is only accessible in-cluster.
openshiftPrometheusHost = "thanos-querier.openshift-monitoring.svc:9091"
ok, we just found an edge case with @kyoto with respect to alerts in the OpenShift console. It turns out that console is hitting the /rules endpoint from Prometheus to render alerts.
For these requests we still have to hit the cluster Prometheus endpoint. For this code this means that, instead of rewriting the target service URL, we need to create a new variable pointing to Thanos and:
- execute all /query requests against the Thanos service
- execute all /rules requests against the cluster prometheus-k8s service
This implies that only cluster alerts are visible in the OpenShift console, not user workload alerts, which I believe is fine for tech preview.
Reason: the Thanos querier does not implement the /rules endpoint.
/rules can't really be implemented by Thanos: it refers to which rules are loaded by a particular Prometheus instance. I think the right thing to do is to request the /rules endpoint of the cluster-monitoring Prometheus in the admin console. Do we already show alerting in the dev console? If not, then this might need a feature in prom-label-proxy, as we'd want to filter the alerts by their namespace label and request them from the user workload monitoring instance.
In that case let's have it hit the cluster-monitoring one only until we introduce the alerting section in the dev console.
sounds good 👍
We are not currently hitting the /rules endpoint for the developer perspective, only for the admin perspective.
In the admin perspective, we use /rules for both the dashboards and for the "Monitoring" -> "Alerting" page.
This enables console to pass requests to Thanos rather than Prometheus and enables user workload monitoring, if enabled.
@spadgett PTAL, I don't have enough knowledge about the e2e failures; it "works locally" on my machine ;-) Can you help if these are not simply flakes?
/approve
/lgtm
We can update oc-environment.sh in a separate PR
@@ -69,6 +76,7 @@ func main() {
	fK8sModeOffClusterEndpoint := fs.String("k8s-mode-off-cluster-endpoint", "", "URL of the Kubernetes API server.")
	fK8sModeOffClusterSkipVerifyTLS := fs.Bool("k8s-mode-off-cluster-skip-verify-tls", false, "DEV ONLY. When true, skip verification of certs presented by k8s API server.")
	fK8sModeOffClusterPrometheus := fs.String("k8s-mode-off-cluster-prometheus", "", "DEV ONLY. URL of the cluster's Prometheus server.")
	fK8sModeOffClusterThanos := fs.String("k8s-mode-off-cluster-thanos", "", "DEV ONLY. URL of the cluster's Thanos server.")
We should update the oc-environment.sh script to set this, see https://github.com/openshift/console/blob/master/contrib/oc-environment.sh#L20
Actually... it looks like there's no route for Thanos. Is that correct? We use the Prometheus route today for our development environment where we run off cluster
there is a new route for the thanos querier which will always be available, and we have to point the Prometheus UI link there now; see https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/thanos-querier/route.yaml
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: s-urbaniak, spadgett. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Looks like a test flake /retest
/cc @kyoto @paulfantom @lilic