Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/bridge: configure Thanos service #3000

Merged
merged 1 commit into from
Oct 18, 2019

Conversation

s-urbaniak
Copy link
Contributor

This enables console to pass requests to Thanos rather than Prometheus
and enables user workload monitoring, if enabled.

/cc @kyoto @paulfantom @lilic

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 17, 2019
Copy link

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm 👍

// Well-known location of Prometheus service for OpenShift. This is only accessible in-cluster.
openshiftPrometheusHost = "prometheus-k8s.openshift-monitoring.svc:9091"
// Well-known location of Thanos service for OpenShift. This is only accessible in-cluster.
openshiftPrometheusHost = "thanos-querier.openshift-monitoring.svc:9091"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we just found an edge case with @kyoto with respect to alerts in the openshift console. It turns out that console is hitting the /rules endpoint from Prometheus to render alerts.

For these requests we have to still hit the cluster Prometheus endpoint. For this code this means, instead of rewriting the target service URL, we need to create a new variable pointing to thanos and:

  1. execute all /query requests against the thanos service
  2. execute all /rules requests agains the cluster prometheus-k8s service

This implies that only cluster alerts are visible in the openshift console, not user workload alerts, which i believe is fine for tech preview.

/cc @sichvoge @brancz @kyoto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason: thanos querier does not implement the /rules endpoint.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/rules can't really be implemented by Thanos. It refers to which rules are loaded by the Prometheus. I think the right thing to do is request the /rules endpoint of the cluster-monitoring Prometheus on the admin console. Do we already show alerting in the dev console? If not then this might need a feature in prom-label-proxy as we'd want to filter the alerts by their namespace label and request them from the user workload monitoring one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brancz afaik, alerts are only displayed on the cluster overview, but @kyoto might have more insight.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case let's have it hit the cluster-monitoring one only until we introduce the alerting section in the dev console.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not currently hitting the /rules endpoint for the developer perspective, only for the admin perspective.

In the admin perspective, we use /rules for both the dashboards and for the "Monitoring" -> "Alerting" page.

@openshift-ci-robot openshift-ci-robot added component/backend Related to backend size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 17, 2019
This enables console to pass requests to Thanos rather than Prometheus
and enables user workload monitoring, if enabled.
@s-urbaniak
Copy link
Contributor Author

@spadgett PTAL, I don't have enough knowledge about the e2e failures, it "works locally" on machine ;-) Can you help if these are not simply flakes?

Copy link
Member

@spadgett spadgett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

We can update oc-environment.sh in a separate PR

@@ -69,6 +76,7 @@ func main() {
fK8sModeOffClusterEndpoint := fs.String("k8s-mode-off-cluster-endpoint", "", "URL of the Kubernetes API server.")
fK8sModeOffClusterSkipVerifyTLS := fs.Bool("k8s-mode-off-cluster-skip-verify-tls", false, "DEV ONLY. When true, skip verification of certs presented by k8s API server.")
fK8sModeOffClusterPrometheus := fs.String("k8s-mode-off-cluster-prometheus", "", "DEV ONLY. URL of the cluster's Prometheus server.")
fK8sModeOffClusterThanos := fs.String("k8s-mode-off-cluster-thanos", "", "DEV ONLY. URL of the cluster's Prometheus server.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the oc-environment.sh script to set this, see https://github.com/openshift/console/blob/master/contrib/oc-environment.sh#L20

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... it looks like there's no route for Thanos. Is that correct? We use the Prometheus route today for our development environment where we run off cluster

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a new route for thanos querier which will be always available and we have to point the Prometheus UI link there now, see https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/thanos-querier/route.yaml

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2019
@spadgett spadgett added this to the v4.3 milestone Oct 17, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: s-urbaniak, spadgett

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 17, 2019
@spadgett
Copy link
Member

Looks like a test flake

/retest

@s-urbaniak s-urbaniak changed the title WIP: cmd/bridge: configure Thanos service cmd/bridge: configure Thanos service Oct 18, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 18, 2019
@openshift-merge-robot openshift-merge-robot merged commit d0e0cf9 into openshift:master Oct 18, 2019
@s-urbaniak s-urbaniak deleted the thanos branch October 18, 2019 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/backend Related to backend lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants