prometheus-operator repeatedly deletes prometheus StatefulSet once pods reach ContainerCreating #2950

Closed
scruplelesswizard opened this issue Jan 11, 2020 · 18 comments · Fixed by #2987

@scruplelesswizard
Contributor

What happened?
On upgrading to v0.34.0 the prometheus-operator started deleting the prometheus-k8s StatefulSet once the pods reached the ContainerCreating status.

When the operator is scaled to 0 (terminated) after the StatefulSet is created, but before the pods enter the ContainerCreating status, the prometheus-k8s pods are able to start and run successfully.

Did you expect to see something different?
Yes, I expected that the StatefulSet would be created and not be repeatedly deleted by the operator.

How to reproduce it (as minimally and precisely as possible):
Unknown. The issue recurs in this environment but has not been seen in other environments.

Environment

  • Prometheus Operator version:

    v0.34.0

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-10T03:03:57Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

Kubeadm

  • Manifests:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - k8s
          namespaces:
          - monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  replicas: 1
  resources:
    requests:
      memory: 400Mi
  retention: 30d
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard
  version: v2.11.0

  • Prometheus Operator Logs:
level=debug ts=2020-01-10T15:31:26.377784425Z caller=operator.go:1151 component=prometheusoperator msg="updating current Prometheus statefulset"
level=debug ts=2020-01-10T15:31:26.381955209Z caller=operator.go:1157 component=prometheusoperator msg="resolving illegal update of Prometheus StatefulSet"
level=debug ts=2020-01-10T15:31:26.676005869Z caller=operator.go:1016 component=prometheusoperator msg="StatefulSet delete"

This recurs continuously within the sync loop.

Anything else we need to know?:

@brancz
Contributor

brancz commented Jan 13, 2020

That’s indeed odd and shouldn’t happen. It seems like some illegal update is continuously attempted. It would be good if we added what the illegal action was, as I believe the StatefulSet API does return this information.

@brancz
Contributor

brancz commented Jan 13, 2020

cc @pgier @s-urbaniak I think this could potentially have to do with the controller generation tooling changes?

@scruplelesswizard
Contributor Author

I should also note that this occurred after I updated kube-prometheus to master as there were several K8s API Alpha/Beta Group removals in 1.17.

@metalmatze
Member

It would be great if you could start a kind cluster with 1.17 and check if it happens there too when deploying kube-prometheus master.
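For anyone trying to reproduce this, a kind cluster pinned to Kubernetes 1.17 can be described with a small config like the sketch below (the apiVersion shown matches recent kind releases and may need adjusting; the node image tag is the one used later in this thread), then created with kind create cluster --config <file> before applying the kube-prometheus master manifests:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.17.0  # node image for a v1.17.0 control plane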

@scruplelesswizard
Contributor Author

I have tested using image kindest/node:v1.17.0, and when deploying my manifests generated by kube-prometheus I see the same behaviour, with the StatefulSet being repeatedly deleted once it reaches the Pending phase.

It appears that this behaviour may be related to the changes in Kubernetes 1.17.

@juzhao

juzhao commented Jan 21, 2020

checked, no issue with OCP 4.4
# kubectl -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | more
level=info ts=2020-01-21T08:25:39.355Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=rhaos-4.4-rhel-7, revision=26eb549e3fbb176c890dffd3d2ac6b4ebed2ae44)"

# oc -n openshift-monitoring logs prometheus-operator-54c74f6797-zjvlt
ts=2020-01-21T02:33:18.399798794Z caller=main.go:199 msg="Starting Prometheus Operator version '0.34.0'."
ts=2020-01-21T02:33:18.409227432Z caller=main.go:96 msg="Staring insecure server on :8080"
level=info ts=2020-01-21T02:33:18.415414693Z caller=operator.go:441 component=prometheusoperator msg="connection established" cluster-version=v1.17.0
level=info ts=2020-01-21T02:33:18.415426254Z caller=operator.go:219 component=alertmanageroperator msg="connection established" cluster-version=v1.17.0

# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0-4-g38212b5", GitCommit:"e17af88a81bab82a47aa161fb0db99f6e9424661", GitTreeState:"clean", BuildDate:"2020-01-17T09:20:01Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.0", GitCommit:"12de527", GitTreeState:"clean", BuildDate:"2020-01-20T12:51:02Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

@pgier
Contributor

pgier commented Jan 27, 2020

I don't think this is related to the CRD and build changes, since those weren't released until v0.35.0 and this is reported against v0.34.0. I haven't been able to reproduce this yet, unfortunately. @chaosaffe what do your storageclass/standard and the associated PVs look like? It could also be helpful to turn on debug logging in prometheus-operator (--log-level=debug).
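For reference, turning on debug logging generally means adding that flag to the operator container's args; the relevant fragment of a kube-prometheus style prometheus-operator Deployment would look roughly like this (names and namespace are assumptions, adjust to your install):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-operator  # assumed name, adjust to your install
  namespace: monitoring
spec:
  template:
    spec:
      containers:
      - name: prometheus-operator
        args:
        # ...keep the existing args and add:
        - --log-level=debug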

@pgier
Contributor

pgier commented Jan 27, 2020

I was able to reproduce this and found this error from the api server:

updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

It appears that this validation doesn't occur in kube v1.16. The issue only occurs when apiVersion or kind are set on the volumeClaimTemplate; it appears these two fields are removed by the kube api server. The issue can be seen by trying to apply the same statefulset twice with one or both of these fields set (kubernetes/kubernetes#87583).

As a workaround for now, you can remove these two fields from the Prometheus config. I'm working on a longer term fix for prometheus-operator.
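Concretely, for the Prometheus manifest posted above, the storage section with those two fields dropped would look like this (everything else unchanged):

  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard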

@s-urbaniak
Contributor

Great finding @pgier! I believe this can then also bite us in OpenShift 4.4, as we are on k8s 1.17, right?

@brancz
Contributor

brancz commented Jan 28, 2020

We have some logic that attempts to "resolve" this by deleting the statefulset completely and re-creating it. But it seems to me that in this case we're ending up in some sort of loop of this behavior.

@pgier
Contributor

pgier commented Jan 28, 2020

@s-urbaniak yes, I believe this will affect us in OpenShift as well if either of those fields is set.

@brancz Right, the prometheus operator keeps trying to resolve the difference between the statefulset generated from the Prometheus CRD config and the statefulset running in kube. It fails to update because of the validation error, so prometheus-operator deletes the running sts and tries to create it again, which starts the process over. It's partly due to #2801, which included the running statefulset when generating the hash, but I'm not sure yet whether reverting that would completely fix the issue. I'm kind of waiting to see what the kube api team have to say about the issue I filed, because it seems like a significant bug that you can't apply the same config twice.

pgier added a commit to pgier/prometheus-operator that referenced this issue Jan 29, 2020
Add a new hash annotation that tracks the state of the
statefulset spec separately from the inputs (Prometheus, Config,
ConfigMaps).  This hash annotation is added immediately
after the statefulset is created, and is checked for changes
during updates to detect if there were manual updates to the
statefulset spec.

This prevents an issue (prometheus-operator#2950) where the statefulset is
continuously deleted and then recreated in kubernetes v1.17
due to a mismatch between the hash annotation and the state
of the statefulset spec.  The issue only occurs in kubernetes
v1.17 because the api is more strict about what parts of a
statefulset can be updated.
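For illustration, the approach described in this commit amounts to stamping the generated StatefulSet with an annotation that holds a hash of its spec, roughly like the sketch below (the annotation key and value are placeholders, not necessarily what the operator actually writes):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-k8s
  namespace: monitoring
  annotations:
    # Placeholder key: a hash of the generated StatefulSet spec is stored here
    # right after creation and compared on each sync to detect manual edits.
    example.monitoring.coreos.com/statefulset-spec-hash: <hash of generated spec>
spec:
  # ...generated StatefulSet spec as before...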
pgier added a commit to pgier/prometheus-operator that referenced this issue Jan 29, 2020
Changes in kubernetes v1.17 cause an endless update loop due
to validation errors caused by setting the 'apiVersion' and
'kind' fields in the StatefulSet spec.  These two fields are
set to empty strings by the Kube API server, and v1.17 added
validation that does not allow these fields to be modified.

See prometheus-operator#2950
@jrcjoro

jrcjoro commented Feb 6, 2020

I'm still having this issue with kube 1.17.1; prometheus-k8s-0 and prometheus-k8s-1 keep terminating. I just tested with kube-prometheus "version": "release-0.35". Any info on how to fix this?

@pgier
Contributor

pgier commented Feb 6, 2020

@jrcjoro Can you try using prometheus-operator v0.35.1? The other workaround is to remove apiVersion and kind from your Prometheus resource:

    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim

@sebastiansirch

sebastiansirch commented Feb 13, 2020

Unfortunately, neither 0.35.1 nor 0.36.0 resolves this bug in our setup (K8s v1.17.2). We are using the Helm chart to set up the Prometheus instance, i.e. we don't have the apiVersion/kind fields in our generated Prometheus manifest:

  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: storage-ssd

@superbiche

@pgier I had this issue; removing apiVersion and kind from the generated manifests made the pods start. Kubernetes 1.17.2 on Scaleway Kapsule (managed Kubernetes).

@unfor19

unfor19 commented May 5, 2021

I'm on AWS EKS v1.19, Rancher v2.3.4 and Prometheus-Operator 0.47.1

I had to disable Rancher Monitoring for this issue to stop. Would be great if you could add it as a note to the docs, something like

"Rancher users, please make sure you disable Rancher Monitoring, prior to installing Prometheus-Operator. Otherwise, the Prometheus pods will constantly restart"

@paulfantom
Member

paulfantom commented May 5, 2021

I had to disable Rancher Monitoring for this issue to stop. Would be great if you could add it as a note to the docs, something like

That seems like a good doc entry for Rancher, @unfor19 maybe consider pinging the rancher team about it? We as maintainers of prometheus-operator cannot list all quirks of every platform.

@unfor19

unfor19 commented May 5, 2021

I had to disable Rancher Monitoring for this issue to stop. Would be great if you could add it as a note to the docs, something like

That seems like a good doc entry for Rancher, @unfor19 maybe consider pinging the rancher team about it? We as maintainers of prometheus-operator cannot list all quirks of every platform.

@paulfantom I guess you're right, I'll do that
