
ScrapeConfig not applied when deployed via CRD #6363

Open
EugenMayer opened this issue Mar 3, 2024 · 6 comments · Fixed by #6390

EugenMayer commented Mar 3, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Description

I am deploying a scrape config via a CRD into a stack deployed with the kube-prometheus-stack chart:

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: opnsense
  namespace: kube-prometheus-stack
spec:
  staticConfigs:
    - targets:
        - 10.10.10.2:9100

while using the typical configuration for the operator:

prometheus:
  prometheusSpec:
    scrapeConfigNamespaceSelector: {}
    scrapeConfigSelector: {}
    scrapeConfigSelectorNilUsesHelmValues: false
    retain: '10d'
    retentionSize: ''

prometheusOperator:
  logLevel: debug
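
(For reference, one way to double-check what the chart actually rendered onto the Prometheus object is to read the selectors back from the CR; this is only a sketch and assumes the chart's default CR name kube-prometheus-stack-prometheus:)

# CR name assumed from the chart defaults - adjust to your release
kubectl -n kube-prometheus-stack get prometheus kube-prometheus-stack-prometheus \
  -o jsonpath='{.spec.scrapeConfigSelector}{"\n"}{.spec.scrapeConfigNamespaceSelector}{"\n"}'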

The scrape config never shows up in Prometheus under /targets.

Following the similar issue #6189, I deployed the operator with debug logs, and in fact I can see that the ScrapeConfig CRD is picked up:

level=debug ts=2024-03-03T15:44:58.818656387Z caller=resource_selector.go:768 component=prometheus-controller msg="selected ScrapeConfigs" scrapeConfig=kube-prometheus-stack/opnsense namespace=kube-prometheus-stack prometheus=kube-prometheus-stack-prometheus
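
(A sketch of how to check whether the selected ScrapeConfig actually made it into the generated Prometheus configuration; the secret name prometheus-kube-prometheus-stack-prometheus and the prometheus.yaml.gz key are assumed from the operator's default naming:)

# Secret name assumed from the operator's default naming convention
kubectl -n kube-prometheus-stack get secret prometheus-kube-prometheus-stack-prometheus \
  -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip | grep -A 3 opnsense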

Expected Result

The target/scrape config should be added to Prometheus and show up under /targets.

Actual Result

The scrape config never shows up under /targets.

Prometheus Operator Version

kubectl -n kube-prometheus-stack describe deployment/kube-prometheus-stack-operator
Name:                   kube-prometheus-stack-operator
Namespace:              kube-prometheus-stack
CreationTimestamp:      Sun, 03 Mar 2024 12:30:35 +0100
Labels:                 app=kube-prometheus-stack-operator
                        app.kubernetes.io/component=prometheus-operator
                        app.kubernetes.io/instance=kube-prometheus-stack
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=kube-prometheus-stack-prometheus-operator
                        app.kubernetes.io/part-of=kube-prometheus-stack
                        app.kubernetes.io/version=56.19.0
                        chart=kube-prometheus-stack-56.19.0
                        heritage=Helm
                        release=kube-prometheus-stack
Annotations:            deployment.kubernetes.io/revision: 2
                        meta.helm.sh/release-name: kube-prometheus-stack
                        meta.helm.sh/release-namespace: kube-prometheus-stack
Selector:               app=kube-prometheus-stack-operator,release=kube-prometheus-stack
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=kube-prometheus-stack-operator
                    app.kubernetes.io/component=prometheus-operator
                    app.kubernetes.io/instance=kube-prometheus-stack
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kube-prometheus-stack-prometheus-operator
                    app.kubernetes.io/part-of=kube-prometheus-stack
                    app.kubernetes.io/version=56.19.0
                    chart=kube-prometheus-stack-56.19.0
                    heritage=Helm
                    release=kube-prometheus-stack
  Service Account:  kube-prometheus-stack-operator
  Containers:
   kube-prometheus-stack:
    Image:      quay.io/prometheus-operator/prometheus-operator:v0.71.2

Kubernetes Version

1.29.0

Kubernetes Cluster Type

k3s

How did you deploy Prometheus-Operator?

helm chart: prometheus-community/kube-prometheus-stack

Manifests

prometheus:
  prometheusSpec:
    scrapeConfigNamespaceSelector: {}
    scrapeConfigSelector: {}
    scrapeConfigSelectorNilUsesHelmValues: false
    retain: '10d'
    retentionSize: ''

prometheusOperator:
  logLevel: debug

prometheus-operator log output

level=debug ts=2024-03-03T15:44:58.818656387Z caller=resource_selector.go:768 component=prometheus-controller msg="selected ScrapeConfigs" scrapeConfig=kube-prometheus-stack/opnsense namespace=kube-prometheus-stack prometheus=kube-prometheus-stack-prometheus

Anything else?

No response

EugenMayer added kind/bug and needs-triage labels Mar 3, 2024
slashpai (Contributor) commented Mar 4, 2024

I'm not familiar with how the Helm values are set; did you look at the Prometheus logs as well?

slashpai added kind/support and removed kind/bug, needs-triage labels Mar 4, 2024
EugenMayer (Author) commented

@slashpai I'm not sure how to understand your question.

level=debug ts=2024-03-03T15:44:58.818656387Z caller=resource_selector.go:768 component=prometheus-controller msg="selected ScrapeConfigs" scrapeConfig=kube-prometheus-stack/opnsense namespace=kube-prometheus-stack prometheus=kube-prometheus-stack-prometheus

is from the prometheus-operator logs, so we know that:

  1. the CRD has been placed
  2. the CRD has been picked up

This is not the first time this has been flaky; several people have reported it, and it 'magically' started 'working' all of a sudden, e.g. #6189.

slashpai (Contributor) commented Mar 5, 2024

I was asking whether there are any issues reported in the Prometheus pod's logs.

EugenMayer (Author) commented

I cannot see any errors or warnings:

ts=2024-03-05T22:36:39.030Z caller=kubernetes.go:331 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/kube-prometheus-stack/kube-prometheus-stack-kube-proxy/0 msg="Using pod service account via in-cluster config"
ts=2024-03-05T22:36:39.031Z caller=kubernetes.go:331 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/kube-prometheus-stack/kube-prometheus-stack-apiserver/0 msg="Using pod service account via in-cluster config"
ts=2024-03-05T22:36:39.031Z caller=kubernetes.go:331 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/kube-prometheus-stack/kube-prometheus-stack-kube-state-metrics/0 msg="Using pod service account via in-cluster config"
ts=2024-03-05T22:36:39.032Z caller=kubernetes.go:331 level=info component="discovery manager notify" discovery=kubernetes config=config-0 msg="Using pod service account via in-cluster config"

I just wiped the entire namespace/deployment and installed it again, and the scrape config is now part of the targets. I had deployed that CRD together with the entire Helm chart, so it was there from the very beginning.

As expected, when adding another CRD after the Helm deployment (the same thing, just a different name/IP), it does not show up.

Could it be that those CRDs are later transformed into ConfigMaps and the actual deployment struggles, since ConfigMap changes do not trigger re-deployments of the pods? Maybe the typical shasum "workaround" like https://github.com/EugenMayer/helm-charts/blob/main/charts/rundeck/templates/nginx-deployment.yaml#L30 is needed to ensure those CRDs are deployed properly.

I see that those static scrape configs end up in /etc/prometheus/config_out/prometheus.env.yaml, which itself is deployed via the config secret.
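
(A sketch of how that rendered file can be inspected directly in the pod, assuming the chart's default pod and container names:)

# Pod/container names assumed from the chart defaults
kubectl -n kube-prometheus-stack exec prometheus-kube-prometheus-stack-prometheus-0 \
  -c prometheus -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep -A 3 opnsense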

UPDATE: After waiting for some time (minutes), the new scrape config showed up (without me doing anything) - so there might be a time delay or something else at play?

EugenMayer (Author) commented Mar 5, 2024

I tried removing a CRD, and it took about 2 minutes for it to be removed as a target. This is by no means an issue in itself - maybe it just needs to be added to the docs so people know about it.

After the removal, I added it again and it took about 2 minutes to show up.

So it seems to work reliably, just delayed (which is not an issue).
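
(To see when the reload actually happens, one could tail the config-reloader sidecar in the Prometheus pod; a sketch, assuming the default pod and sidecar container names:)

# Sidecar container name assumed ("config-reloader" in recent operator versions)
kubectl -n kube-prometheus-stack logs prometheus-kube-prometheus-stack-prometheus-0 \
  -c config-reloader -f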

simonpasquier (Contributor) commented

@mviswanathsai has found the root cause and offers to submit a PR 🥳
