
[Monitoring V1, RKE1, RKE2] k8s 1.21 clusters do not display any metrics on Grafana #33465

Closed
aiyengar2 opened this issue Jul 1, 2021 · 11 comments

Labels: area/monitoring, kind/bug-qa (Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement), priority/0, QA/XS, release-note (Note this issue in the milestone's release notes)
Milestone: v2.6

Comments

@aiyengar2
Contributor

aiyengar2 commented Jul 1, 2021

What kind of request is this (question/bug/enhancement/feature request):
bug

Steps to reproduce (least amount of steps as possible):
  1. Spin up an RKE2 cluster on the latest master-head in DO. I chose 1 node with all roles and 1 node as worker.
  2. Deploy Monitoring V1 0.2.3 (in dev-v2.6).
  3. Navigate to Grafana and check out any dashboard.

Result:

All dashboards show no data

Expected Result:

Grafana should show data on the dashboards

Other details that may be helpful:

The following logs are shown under prometheus-auth (which Grafana proxies through), so the fix for this issue is probably bumping the client-go package used by prometheus-auth.

INFO[2021-07-01T22:10:12Z] listening on 10.42.89.12:9090, proxying to http://127.0.0.1:9090 with ignoring 'remote reader' labels [prometheus,prometheus_replica], only allow maximum 512 connections with 5m0s read timeout .
Thu, Jul 1 2021 3:10:12 pm | INFO[2021-07-01T22:10:12Z] Start listening for connections on 10.42.89.12:9090
Thu, Jul 1 2021 3:12:05 pm | WARN[2021-07-01T22:12:05Z] failed to query Namespaces github.com/rancher/prometheus-auth/pkg/kube/namespaces.go:57: unknown namespace of token
(the same WARN line repeats continuously from 3:12:05 pm through 3:13:11 pm)

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): master-head 671a4b1a026f5cf2706c9d6717ef7ee0a6674e73
  • Installation option (single install/HA): single install

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): DO
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): 8GB, 4CPU, 160GB
  • Kubernetes version (use kubectl version):
(paste the output here)
  • Docker version (use docker version):
(paste the output here)
@aiyengar2 aiyengar2 added this to the v2.6 milestone Jul 1, 2021
@aiyengar2 aiyengar2 self-assigned this Jul 1, 2021
@aiyengar2
Contributor Author

aiyengar2 commented Jul 1, 2021

Note: This issue does not impact Monitoring V1 deployed in RKE1 clusters.

@aiyengar2
Contributor Author

aiyengar2 commented Jul 6, 2021

Note: This issue does not impact Monitoring V1 deployed in imported RKE2 clusters.

It only affects Monitoring V1 deployed in RKE2 clusters provisioned by Rancher.

@aiyengar2 aiyengar2 added kind/bug-qa, area/monitoring, [zube]: Working and removed [zube]: Next Up labels Jul 7, 2021
@aiyengar2
Contributor Author

> Note: This issue does not impact Monitoring V1 deployed in imported RKE2 clusters.
>
> It only affects Monitoring V1 deployed in RKE2 clusters provisioned by Rancher.

Today, when I tested this, I noticed that it did affect an imported RKE2 cluster.

However, this is likely due to the fact that the original imported cluster I created when testing this issue was at version v1.20.8+rke2r1, not at v1.21.2+rke2r1 (since this was just modified in the RKE2 install scripts two days ago: https://github.com/rancher/rke2/blame/b9a8f347b24be6107f9679b5c41a375ef18c64be/channels.yaml#L3).

Since Prometheus still seems to contain the data that Grafana is trying to access but the Grafana queries are returning empty lists, I suspect that the fix for this issue will involve updating https://github.com/rancher/prometheus-auth to support k8s 1.21 clusters.

@aiyengar2 aiyengar2 changed the title [Monitoring V1] RKE2 clusters do not display any metrics on Grafana [Monitoring V1] RKE2 k8s 1.21 clusters do not display any metrics on Grafana Jul 12, 2021
@aiyengar2 aiyengar2 changed the title [Monitoring V1] RKE2 k8s 1.21 clusters do not display any metrics on Grafana [Monitoring V1] k8s 1.21 RKE2 cluster does not display any metrics on Grafana Jul 12, 2021
@aiyengar2 aiyengar2 changed the title [Monitoring V1] k8s 1.21 RKE2 cluster does not display any metrics on Grafana [Monitoring V1] k8s 1.21 clusters do not display any metrics on Grafana Jul 13, 2021
@aiyengar2
Contributor Author

Reproduced consistently in RKE2 and RKE1 k8s 1.21 clusters, not observable in k8s 1.20.

@aiyengar2
Contributor Author

In the prometheus-cluster-monitoring-0 Pod created by Monitoring V1, the following related issue is observed in the logs of the prometheus-agent container:

WARN[2021-07-14T00:46:18Z] failed to query Namespaces github.com/rancher/prometheus-auth/pkg/kube/namespaces.go:57: unknown namespace of token
Tue, Jul 13 2021 5:46:33 pm | WARN[2021-07-14T00:46:33Z] failed to query Namespaces github.com/rancher/prometheus-auth/pkg/kube/namespaces.go:117: denied token
(the same "denied token" WARN line repeats continuously through 5:46:34 pm)

This is related because the token that the authorization plugin sitting on top of Prometheus (rancher/prometheus-auth) submits for the SubjectAccessReview is being denied.

I tried uplifting the client-go package and all of the related dependency versions in rancher/prometheus-auth#10, but the resulting image does not seem to resolve this issue.

Therefore, we might have to re-evaluate the logic in https://github.com/rancher/prometheus-auth/blob/master/pkg/kube/namespaces.go to see why the namespace lookup only fails in k8s 1.21 clusters.
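For context, the difference in token format may explain the failing lookup: legacy service account tokens embed the namespace in the kubernetes.io/serviceaccount/namespace claim and are backed by a Secret, while bound tokens (the default with BoundServiceAccountTokenVolume in k8s 1.21) nest the namespace under a kubernetes.io claim and have no backing Secret, so a lookup keyed on token Secrets would plausibly report "unknown namespace of token". The following is a minimal, hypothetical Go sketch that only illustrates the two claim layouts; it is not prometheus-auth's actual code:

```go
// Hypothetical illustration only; prometheus-auth's real lookup lives in
// pkg/kube/namespaces.go and resolves namespaces through its own caches.
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// namespaceFromClaims decodes a service account JWT (without verifying it) and
// returns the namespace, trying the legacy claim first and then the bound-token claim.
func namespaceFromClaims(token string) (string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("not a JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return "", err
	}
	var claims struct {
		// Legacy tokens (pre-BoundServiceAccountTokenVolume).
		LegacyNamespace string `json:"kubernetes.io/serviceaccount/namespace"`
		// Bound tokens (default in k8s 1.21+) nest the namespace under "kubernetes.io".
		Kubernetes struct {
			Namespace string `json:"namespace"`
		} `json:"kubernetes.io"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return "", err
	}
	switch {
	case claims.LegacyNamespace != "":
		return claims.LegacyNamespace, nil
	case claims.Kubernetes.Namespace != "":
		return claims.Kubernetes.Namespace, nil
	default:
		return "", fmt.Errorf("unknown namespace of token")
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: tokenclaims <service-account-jwt>")
		os.Exit(1)
	}
	ns, err := namespaceFromClaims(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(ns)
}
```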

@jiaqiluo
Member

jiaqiluo commented Sep 20, 2021

The resource tokenreviews.authentication.k8s.io/v1 is available in k8s 1.16 and later. We can use it to validate whether a bearer token is authenticated.

Also, k8s has a built-in cluster role, system:auth-delegator, which contains the required permissions for using the TokenReview API.
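As a rough sketch of that suggestion (assuming client-go, not a confirmed prometheus-auth change), validating an incoming bearer token with the TokenReview API could look like the following; the caller's ServiceAccount would need a ClusterRoleBinding to system:auth-delegator for the create call to be permitted:

```go
package main

import (
	"context"
	"fmt"

	authv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// reviewToken asks the API server whether a bearer token is authenticated.
// This works for both legacy and bound (k8s 1.21+) service account tokens.
func reviewToken(ctx context.Context, client kubernetes.Interface, token string) (string, error) {
	review := &authv1.TokenReview{
		Spec: authv1.TokenReviewSpec{Token: token},
	}
	result, err := client.AuthenticationV1().TokenReviews().Create(ctx, review, metav1.CreateOptions{})
	if err != nil {
		return "", err
	}
	if !result.Status.Authenticated {
		return "", fmt.Errorf("denied token")
	}
	// For ServiceAccounts the username has the form "system:serviceaccount:<namespace>:<name>",
	// so the namespace can be recovered from it without consulting token Secrets.
	return result.Status.User.Username, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	username, err := reviewToken(context.Background(), client, "<bearer token to check>")
	fmt.Println(username, err)
}
```

Because TokenReview is handled by the API server itself, this approach avoids relying on a Secret-backed namespace lookup.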

@jiaqiluo
Member

Root cause

k8s 1.21 enables BoundServiceAccountTokenVolume by default, which breaks monitoring v1's logic for manipulating queries based on the token.

What was fixed, or what changes have occurred

We did not fix the issue. Instead, we will document this as a known issue for monitoring v1 in k8s 1.21 and above.

The workaround is to disable the BoundServiceAccountTokenVolume feature gate in k8s 1.21 and above.

Areas or cases that should be tested

We need to test that the issue disappears if we disable BoundServiceAccountTokenVolume in a k8s 1.21 cluster.

What areas could experience regressions?

The basic functionality of monitoring

Are the repro steps accurate/minimal?

n/a

@Auston-Ivison-Suse

Auston-Ivison-Suse commented Nov 16, 2021

Further Testing

Setup: RKE2 cluster
- 3 nodes, each with 1 role (etcd, control plane, worker).

Referencing these design docs, you can see that the BoundServiceAccountTokenVolume feature can be disabled.

If you look up the feature gate docs, you can see that you should be able to disable the feature by using the following argument when provisioning a cluster:

--feature-gates="BoundServiceAccountTokenVolume=false"

You can add those arguments to the configuration by going to the advanced options on the create page of an RKE2 cluster.
Screenshot for clarity: Screen Shot 2021-11-12 at 4.53.18 PM.png

This should allow you to enter --feature-gates="BoundServiceAccountTokenVolume=false" into the API server and controller manager args shown in the screenshot.

Those appear to be the locations where the feature gate is applied, per the mentioned design docs.

However, when you do this, the provisioning of the cluster gets stuck waiting for a response from the cluster agent on the control plane node.

@Auston-Ivison-Suse

Setup For Reproduction and Validation
Rancher version: 2.6-head (78b25c3)
Cluster Info:

  • Provider: RKE2
  • K8s Version: v1.21.6
  • ec2
  • 3 nodes (1 worker, 1 etcd, 1 control plane)
  • Legacy feature flag active.

Steps For Reproduction

  1. Go to Cluster Tools --> Monitoring (Legacy).
  2. Enable it.
  3. Now navigate to Grafana: hamburger menu --> Legacy --> Project --> from the namespace selector, choose "Project: System" --> select the left "/index.html" link under "cluster-monitoring".
  4. Click the magnifying glass on the left and select "cluster".
  5. Witness that there are no metrics provided within Grafana.

Screenshot of issue:
Screen Shot 2021-11-17 at 11.51.14 AM.png

Steps For Validation

  1. Go to the edit config of the RKE2 cluster.
  2. Within "Cluster Configuration", select "Advanced" on the left.
  3. Add the following into "Additional Controller Manager Args" & "Additional API Server Args":
feature-gates=BoundServiceAccountTokenVolume=false
  4. Save the changes and make sure Monitoring V1 is enabled from the repro steps.
  5. Navigate to the Grafana page from the repro steps.
  6. You should see metrics displayed now.

Screenshot of metrics showing:
Screen Shot 2021-11-17 at 12.16.58 PM.png

Note
This workaround will be deprecated come k8s version 1.22.

@gothka

gothka commented Aug 19, 2022

We recently ran into this issue on EKS, which doesn't allow setting feature flags on the control plane (aws/containers-roadmap#512). A quick workaround is to update the Prometheus access URL for Grafana in the Helm chart to http://access-prometheus.cattle-prometheus.svc
