prometheus-operator repeatedly deletes prometheus StatefulSet once pods reach ContainerCreating #2950

Closed
scruplelesswizard opened this issue Jan 11, 2020 · 18 comments · Fixed by #2987

@scruplelesswizard
Contributor

What happened?
On upgrading to v0.34.0 the prometheus-operator started deleting the prometheus-k8s StatefulSet once the pods reached the ContainerCreating status.

When the operator is scaled to 0 (terminated) after the StatefulSet is created, but before the pods enter the ContainerCreating status, the prometheus-k8s pods are able to start and run successfully.

Did you expect to see something different?
Yes, I expected that the StatefulSet would be created and not be repeatedly deleted by the operator.

How to reproduce it (as minimally and precisely as possible):
Unknown. The issue recurs in this environment but has not been seen in other environments.

Environment

  • Prometheus Operator version:

    v0.34.0

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-10T03:03:57Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Kubeadm

  • Manifests:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - k8s
          namespaces:
          - monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 1
  resources:
    requests:
      memory: 400Mi
  retention: 30d
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard
  version: v2.11.0

  • Prometheus Operator Logs:
level=debug ts=2020-01-10T15:31:26.377784425Z caller=operator.go:1151 component=prometheusoperator msg="updating current Prometheus statefulset"
level=debug ts=2020-01-10T15:31:26.381955209Z caller=operator.go:1157 component=prometheusoperator msg="resolving illegal update of Prometheus StatefulSet"
level=debug ts=2020-01-10T15:31:26.676005869Z caller=operator.go:1016 component=prometheusoperator msg="StatefulSet delete"

This recurs continuously within the sync loop.

Anything else we need to know?:

@brancz
Contributor

brancz commented Jan 13, 2020

That’s indeed odd and shouldn’t happen. It seems like some illegal update is continuously attempted. It would be good if we added what the illegal action was, as I believe the StatefulSet API does return this information.

@brancz
Contributor

brancz commented Jan 13, 2020

cc @pgier @s-urbaniak I think this could potentially have to do with the controller generation tooling changes?

@scruplelesswizard
Contributor Author

I should also note that this occurred after I updated kube-prometheus to master as there were several K8s API Alpha/Beta Group removals in 1.17.

@metalmatze
Member

It would be great if you could start a kind cluster with 1.17 and check if it happens there too when deploying kube-prometheus master.
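For anyone trying to reproduce this, a kind cluster pinned to Kubernetes 1.17 can be described with a small config like the sketch below (the apiVersion shown matches recent kind releases and may need adjusting; the node image tag is the one used later in this thread), then created with kind create cluster --config <file> before applying the kube-prometheus master manifests:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.17.0  # node image for a v1.17.0 control plane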

@scruplelesswizard
Contributor Author

I have tested using image kindest/node:v1.17.0, and when deploying my manifests generated by kube-prometheus I see the same behaviour, with the StatefulSet being repeatedly deleted once it reaches the Pending phase.

It appears that this behaviour may be related to the changes in Kubernetes 1.17.

@juzhao

juzhao commented Jan 21, 2020

checked, no issue with OCP 4.4
# kubectl -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | more
level=info ts=2020-01-21T08:25:39.355Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=rhaos-4.4-rhel-7, revision=26eb549e3fbb176c890dffd3d2ac6b4ebed2ae44)"

# oc -n openshift-monitoring logs prometheus-operator-54c74f6797-zjvlt
ts=2020-01-21T02:33:18.399798794Z caller=main.go:199 msg="Starting Prometheus Operator version '0.34.0'."
ts=2020-01-21T02:33:18.409227432Z caller=main.go:96 msg="Staring insecure server on :8080"
level=info ts=2020-01-21T02:33:18.415414693Z caller=operator.go:441 component=prometheusoperator msg="connection established" cluster-version=v1.17.0
level=info ts=2020-01-21T02:33:18.415426254Z caller=operator.go:219 component=alertmanageroperator msg="connection established" cluster-version=v1.17.0

# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0-4-g38212b5", GitCommit:"e17af88a81bab82a47aa161fb0db99f6e9424661", GitTreeState:"clean", BuildDate:"2020-01-17T09:20:01Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.0", GitCommit:"12de527", GitTreeState:"clean", BuildDate:"2020-01-20T12:51:02Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

@pgier
Contributor

pgier commented Jan 27, 2020

I don't think this is related to the CRD and build changes, since those weren't released until v0.35.0 and this is reported against v0.34.0. I haven't been able to reproduce this yet, unfortunately. @chaosaffe what do your storageclass/standard and the associated PVs look like? It could also be helpful to turn on debug logging in prometheus-operator (--log-level=debug).
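For reference, turning on debug logging generally means adding that flag to the operator container's args; the relevant fragment of a kube-prometheus style prometheus-operator Deployment would look roughly like this (names and namespace are assumptions, adjust to your install):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-operator  # assumed name, adjust to your install
  namespace: monitoring
spec:
  template:
    spec:
      containers:
      - name: prometheus-operator
        args:
        # ...keep the existing args and add:
        - --log-level=debug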

@pgier
Contributor

pgier commented Jan 27, 2020

I was able to reproduce this and found this error from the api server:

updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

It appears that this validation doesn't occur in kube v1.16. The issue only occurs when apiVersion or kind are set on the volumeClaimTemplate; it appears these two fields are removed by the kube api server. The issue can be seen by trying to apply the same statefulset twice with one or both of these fields set (kubernetes/kubernetes#87583).

As a workaround for now, you can remove these two fields from the Prometheus config. I'm working on a longer term fix for prometheus-operator.
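Concretely, for the Prometheus manifest posted above, the storage section with those two fields dropped would look like this (everything else unchanged):

  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard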

@s-urbaniak
Contributor

Great finding @pgier! I believe this can then also bite us in OpenShift 4.4, as we are on k8s 1.17, right?

@brancz
Contributor

brancz commented Jan 28, 2020

We have some logic that attempts to "resolve" this by deleting the statefulset completely and re-creating it. But it seems to me that in this case we're ending up in some sort of loop of this behavior.

@pgier
Contributor

pgier commented Jan 28, 2020

@s-urbaniak yes, I believe this will affect us in OpenShift as well if either of those fields is set.

@brancz Right, the prometheus operator keeps trying to resolve the difference between the statefulset generated from the Prometheus CRD config and the statefulset running in kube. It fails to update because of the validation error, so prometheus-operator deletes the running sts and tries to create it again, which starts the process over. It's partly due to #2801, which included the running statefulset when generating the hash, but I'm not sure yet whether reverting that would completely fix the issue. I'm kind of waiting to see what the kube api team have to say about the issue I filed, because it seems like a significant bug that you can't apply the same config twice.

pgier added a commit to pgier/prometheus-operator that referenced this issue Jan 29, 2020
Add a new hash annotation that tracks the state of the
statefulset spec separately from the inputs (Prometheus, Config,
ConfigMaps).  This hash annotation is added immediately
after the statefulset is created, and is checked for changes
during updates to detect if there were manual updates to the
statefulset spec.

This prevents an issue (prometheus-operator#2950) where the statefulset is
continuously deleted and then recreated in kubernetes v1.17
due to a mismatch between the hash annotation and the state
of the statefulset spec.  The issue only occurs in kubernetes
v1.17 because the api is more strict about what parts of a
statefulset can be updated.
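For illustration, the approach described in this commit amounts to stamping the generated StatefulSet with an annotation that holds a hash of its spec, roughly like the sketch below (the annotation key and value are placeholders, not necessarily what the operator actually writes):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-k8s
  namespace: monitoring
  annotations:
    # Placeholder key: a hash of the generated StatefulSet spec is stored here
    # right after creation and compared on each sync to detect manual edits.
    example.monitoring.coreos.com/statefulset-spec-hash: <hash of generated spec>
spec:
  # ...generated StatefulSet spec as before...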
pgier added a commit to pgier/prometheus-operator that referenced this issue Jan 29, 2020
Changes in kubernetes v1.17 cause an endless update loop due
to validation errors caused by setting the 'apiVersion' and
'kind' fields in the StatefulSet spec.  These two fields are
set to empty strings by the Kube API server, and v1.17 added
validation that does not allow these fields to be modified.

See prometheus-operator#2950
@jrcjoro

jrcjoro commented Feb 6, 2020

I'm still having this issue with kube 1.17.1; prometheus-k8s-0 and prometheus-k8s-1 keep terminating. I just tested with kube-prometheus "version": "release-0.35". Any info on how to fix this?

@pgier
Contributor

pgier commented Feb 6, 2020

@jrcjoro Can you try using prometheus-operator v0.35.1? The other workaround is to remove apiVersion and kind from your Prometheus resource:

    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim

@sebastiansirch

sebastiansirch commented Feb 13, 2020

Unfortunately, neither 0.35.1 nor 0.36.0 resolves this bug in our setup (K8s v1.17.2). We are using the Helm chart to set up the Prometheus instance, i.e. we don't have the apiVersion/kind fields in our generated Prometheus manifest:

  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: storage-ssd

@superbiche

@pgier I had this issue; removing apiVersion and kind from the generated manifests made the pods start. Kubernetes 1.17.2 on Scaleway Kapsule (managed Kubernetes).

@unfor19

unfor19 commented May 5, 2021

I'm on AWS EKS v1.19, Rancher v2.3.4 and Prometheus-Operator 0.47.1

I had to disable Rancher Monitoring for this issue to stop. Would be great if you could add it as a note to the docs, something like

"Rancher users, please make sure you disable Rancher Monitoring, prior to installing Prometheus-Operator. Otherwise, the Prometheus pods will constantly restart"

@paulfantom
Member

paulfantom commented May 5, 2021

I had to disable Rancher Monitoring for this issue to stop. Would be great if you could add it as a note to the docs, something like

That seems like a good doc entry for Rancher, @unfor19 maybe consider pinging the rancher team about it? We as maintainers of prometheus-operator cannot list all quirks of every platform.

@unfor19

unfor19 commented May 5, 2021

I had to disable Rancher Monitoring for this issue to stop. Would be great if you could add it as a note to the docs, something like

That seems like a good doc entry for Rancher, @unfor19 maybe consider pinging the rancher team about it? We as maintainers of prometheus-operator cannot list all quirks of every platform.

@paulfantom I guess you're right, I'll do that
