
[kube-prometheus-stack] Prometheus Operator pod cannot come up when admission hook is disabled #1438

Closed
AndrewSav opened this issue Oct 18, 2021 · 27 comments
Labels
bug, lifecycle/stale

Comments

@AndrewSav

Describe the bug

The Prometheus Operator pod cannot come up: it fails with a missing admission hook secret error when the admission hook is disabled.

What's your helm version?

v3.7.0

What's your kubectl version?

v1.22.2

Which chart?

kube-prometheus-stack

What's the chart version?

19.0.2

What happened?

The operator pod cannot come up, with the following error message: MountVolume.SetUp failed for volume "tls-secret" : secret "prometheus-kube-prometheus-admission" not found. This message is displayed because the admission hook is disabled, so the secret is never created.

What you expected to happen?

I expect the operator to come up.

How to reproduce it?

Install the chart with the values below

Enter the changed values of values.yaml?

prometheusOperator:
  admissionWebhooks:
    enabled: false

Enter the command that you execute and failing/misfunctioning.

helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring -f prometheus-values.yaml

Anything else we need to know?

No response

@AndrewSav added the bug label Oct 18, 2021
@jaanhio

jaanhio commented Oct 26, 2021

I am also encountering this when trying to deploy prometheus-operator with admissionWebhooks disabled.

This is due to prometheus-operator's deployment referencing the secret

secretName: {{ template "kube-prometheus-stack.fullname" . }}-admission

but the secret will only be created by the admission-create job (https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml#L1).

A workaround (or maybe intended behaviour?) would be to set

prometheusOperator:
  tls:
    enabled: false

This will prevent Helm from generating the volume and volumeMount blocks, which are guarded by {{- if .Values.prometheusOperator.tls.enabled }}.
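
For reference, the relevant part of the operator deployment template looks roughly like this (a simplified sketch, not the chart's verbatim source; mount paths and other details vary between chart versions):

# Simplified sketch: the tls-secret volume is only templated when
# prometheusOperator.tls.enabled is true, but the secret it points at is only
# created by the admission webhook job / cert-manager resources.
{{- if .Values.prometheusOperator.tls.enabled }}
volumes:
  - name: tls-secret
    secret:
      secretName: {{ template "kube-prometheus-stack.fullname" . }}-admission
{{- end }}

So with admissionWebhooks.enabled: false but tls.enabled left at its default of true, the deployment still mounts a secret that nothing creates, which is exactly the failure reported above.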

However, this revealed another set of issues.

  1. Missing role & rolebindings for Prometheus-operator
level=error ts=2021-10-26T14:55:40.266312816Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Probe: failed to list *v1.Probe: probes.monitoring.coreos.com is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-operator\" cannot list resource \"probes\" in API group \"monitoring.coreos.com\" in the namespace \"test-tenant\""
level=error ts=2021-10-26T14:55:42.581554026Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.PrometheusRule: failed to list *v1.PrometheusRule: prometheusrules.monitoring.coreos.com is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-operator\" cannot list resource \"prometheusrules\" in API group \"monitoring.coreos.com\" in the namespace \"test-tenant\""

The workaround is to create a Role and RoleBinding with permissions matching what's stated here: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/clusterrole.yaml (see the sketch after this list).

  2. Missing role & rolebindings for Prometheus
level=error ts=2021-10-26T14:56:42.254Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-prometheus\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"test-tenant\""
level=error ts=2021-10-26T14:56:54.706Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-prometheus\" cannot list resource \"services\" in API group \"\" in the namespace \"test-tenant\""
level=error ts=2021-10-26T14:56:58.193Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-prometheus\" cannot list resource \"pods\" in API group \"\" in the namespace \"test-tenant\""

Similar to the above, the workaround is to create a Role & RoleBinding separately, using this as a reference: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus/clusterrole.yaml. A minimal sketch follows below.
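
A minimal sketch of such a namespaced Role/RoleBinding for the operator service account (the names and namespace below are taken from the log messages above; the rule list is illustrative and should be copied from the chart's clusterrole templates linked above rather than from here):

# Illustrative sketch only: copy the actual rules from the chart's
# clusterrole.yaml templates; this just shows the namespaced Role/RoleBinding shape.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-foo-operator
  namespace: test-tenant
rules:
  - apiGroups: ["monitoring.coreos.com"]
    resources: ["probes", "prometheusrules", "servicemonitors", "podmonitors"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-foo-operator
  namespace: test-tenant
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tenant-foo-operator
subjects:
  - kind: ServiceAccount
    name: tenant-foo-operator
    namespace: test-tenant

The Prometheus service account (tenant-foo-prometheus in the logs above) needs an analogous Role/RoleBinding for the core resources it lists (pods, services, endpoints), again with the exact rules taken from prometheus/clusterrole.yaml.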

Questions to maintainers:

  1. Should we be setting tls.enabled: false if we are not intending to use admissionWebhooks?
  2. Is there any issue with creating Role/RoleBindings when ClusterRole/ClusterRoleBindings are not needed or not applicable (e.g. a multi-tenant environment)?

I can help create a PR to fix this.

@stale

stale bot commented Nov 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@AndrewSav
Author

+1

@stale stale bot removed the lifecycle/stale label Nov 26, 2021
@stale

stale bot commented Dec 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@AndrewSav
Author

+1

@stale stale bot removed the lifecycle/stale label Dec 28, 2021
@monotek
Member

monotek commented Dec 28, 2021

Works for me.

@AndrewSav
Author

@monotek do you have the prometheus-kube-prometheus-admission secret?

@stale

stale bot commented Jan 28, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@gw0
Contributor

gw0 commented Feb 5, 2022

Same issue with kube-prometheus-stack 31.0.0 on a fresh cluster. I disabled the admission webhooks, because I do not configure Prometheus in this way and there is no need for it to be running.

prometheusOperator:
  enabled: true
  admissionWebhooks:
    enabled: false

While looking at the code, it seems conceptually wrong that prometheus-operator uses the same TLS certificates that are intended for the admission webhooks. It should generate its own certificates if needed, or there should be instructions on how to set this up.

Workaround 1: Disable TLS (traffic to operator is now unencrypted?):

  tls:
    enabled: false

Workaround 2: Enable the generation of the admission webhook certificates with cert-manager even though the webhooks themselves are disabled (the certificate is generated by certmanager.yaml#L42):

  admissionWebhooks:
    enabled: false
    certManager:
      enabled: true

Workaround 3: Manually create the needed TLS secrets/certificates.
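
For workaround 3, something along these lines should work. The secret name matches what the operator deployment expects for a release named "prometheus" in the monitoring namespace (see the error above); the key names (ca, cert, key) are assumptions and should be checked against whatever the deployment's volumeMount actually reads:

# Hypothetical manifest for workaround 3; key names are assumptions, adjust to
# match the keys the operator deployment actually consumes from this secret.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-kube-prometheus-admission
  namespace: monitoring
type: Opaque
stringData:
  ca: |
    <PEM-encoded CA certificate>
  cert: |
    <PEM-encoded TLS server certificate>
  key: |
    <PEM-encoded TLS private key>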

@obvionaoe

+1

@danmanners

@gw0 thanks for recommending option 2; that seems to work for 33.1.0.

@stale stale bot removed the lifecycle/stale label Mar 2, 2022
@stale

stale bot commented Apr 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Apr 2, 2022
@AndrewSav
Author

recent activity

@stale stale bot removed the lifecycle/stale label Apr 2, 2022
@stale

stale bot commented May 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label May 3, 2022
@danmanners

More recent activity

@stale stale bot removed the lifecycle/stale label May 3, 2022
@stale

stale bot commented Jun 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Jun 4, 2022
@AndrewSav
Author

Even more recent activity

@stale stale bot removed the lifecycle/stale label Jun 4, 2022
@obvionaoe

@monotek can you or anyone take a look?

@stale

stale bot commented Jul 13, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@AndrewSav
Author

Hello there

@stale stale bot removed the lifecycle/stale label Jul 13, 2022
@jkleinkauff

I'm facing the same issue.

@stale

stale bot commented Oct 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@AndrewSav
Author

remove stale

@stale

stale bot commented Nov 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@AndrewSav
Author

/remove-lifecycle stale

@stale

stale bot commented Nov 27, 2022

This issue is being automatically closed due to inactivity.

@AndrewSav
Author

Re-opened as #2742 since it was closed by the bot.
