Facing error err="parsing YAML file /etc/prometheus/config_out/prometheus.env.yaml: empty duration string" #5197

Closed
pankdhnd opened this issue Dec 1, 2022 · 12 comments


@pankdhnd

pankdhnd commented Dec 1, 2022

What happened?
We upgraded to Prometheus Operator version 0.61.1, and after the upgrade we found that the Prometheus pods were failing with the following error:

level=error msg="Error loading config (--config.file=/etc/prometheus/config_out/prometheus.env.yaml)" file=/etc/prometheus/config_out/prometheus.env.yaml err="parsing YAML file /etc/prometheus/config_out/prometheus.env.yaml: empty duration string"

We load the configuration from a secret; the config-reloader parses it and writes it into the /etc/prometheus/config_out/prometheus.env.yaml file.

We found that global.scrape_interval is not being populated in the output file and ends up as an empty string, which causes Prometheus to keep crashing. Below is a snippet of the prometheus.env.yaml file:

global:
  evaluation_interval: 40s
  scrape_interval: ""
  external_labels:
    prometheus: monitoring-prometheus
    prometheus_replica: monitoring-prometheus-0

If we downgrade the operator to the previous version (0.60.1), everything works fine.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy the Prometheus stack with version 0.60.1 of the Prometheus operator
  2. Edit the operator deployment and change the operator image tag to 0.61.1 (a sketch of this step is shown below)
  3. Let the operator and other pods restart.
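
For reference, step 2 can be done with kubectl set image. This is only a sketch: it assumes the operator runs as a Deployment named prometheus-operator in the monitoring namespace, with a container of the same name, so adjust the names to your install:

kubectl -n monitoring set image deployment/prometheus-operator \
  prometheus-operator=quay.io/prometheus-operator/prometheus-operator:v0.61.1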

Environment
Any. We checked on OCP, AWS and Azure

  • Prometheus Operator version: v0.61.1

  • Kubernetes version information: 1.23.5

  • Prometheus Operator Logs:

level=warn ts=2022-12-01T11:20:58.497128297Z caller=operator.go:2018 component=prometheusoperator msg="skipping servicemonitor" error="invalid scrapeInterval \"\": empty duration string" servicemonitor=monitoring-grafana namespace=namespace prometheus=monitoring-prometheus
@simonpasquier
Contributor

Have you upgraded the prometheus-operator CRDs to the same v0.61.1 version?
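
One way to check which CRD version is actually installed: the CRDs shipped in the operator bundle carry an operator.prometheus.io/version annotation (at least in recent releases; verify against your own manifests), so something like the following shows what the cluster currently has:

kubectl get crd prometheuses.monitoring.coreos.com -o yaml | grep operator.prometheus.io/version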

@slashpai
Contributor

slashpai commented Dec 2, 2022

We had removed some of the default values and validations from the operator code in v0.60.1, since they are already covered by the OpenAPI schema. So, as Simon mentioned, you would need to update the CRDs to 0.61.0 to make it work.
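
In other words, the defaulting now lives in the CRD's OpenAPI schema rather than in the operator. In an up-to-date prometheuses.monitoring.coreos.com CRD the relevant fields look roughly like this (an illustrative excerpt, not copied verbatim from a specific release):

scrapeInterval:
  default: 30s
  type: string
evaluationInterval:
  default: 30s
  type: string

With an older CRD that lacks these defaults, an unset field stays empty and the operator renders scrape_interval: "" into prometheus.env.yaml.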

@bdbrink

bdbrink commented Dec 13, 2022

Seeing the same issue after updating the CRDs + prometheus-operator to v0.61.1.

EDIT: Fixed after updating the CRDs and prometheus-operator; I had to restart the operator after applying the CRDs, then restart Prometheus to get it working.

@billiford

I was still having this problem with the kube-prometheus-stack, so I wanted to share how I debugged and fixed it.

I had gotten the stack running in one cluster but not another, so I compared the config.file contents, which according to kubectl -n monitoring describe po prometheus-kube-prometheus-stack-prometheus-0 are held in the volume named config, backed by the secret prometheus-kube-prometheus-stack-prometheus.

I dumped each secret's prometheus.yaml.gz data to a file.

$ echo '<SECRET_CONTENT>' | base64 -d | gunzip > /tmp/config-<CLUSTER_NAME>.yaml
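
If you'd rather not copy the base64 content by hand, the same data can be pulled straight from the cluster (secret name taken from the describe output above; the backslash-escaped dots are standard kubectl jsonpath syntax for keys containing dots):

$ kubectl -n monitoring get secret prometheus-kube-prometheus-stack-prometheus \
    -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip > /tmp/config-<CLUSTER_NAME>.yaml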

Then I performed a diff:

$ diff /tmp/config-<WORKING_CLUSTER>.yaml /tmp/config-<BROKEN_CLUSTER>.yaml
2,3c2,3
<   evaluation_interval: 30s
<   scrape_interval: 30s
---
>   evaluation_interval: ""
>   scrape_interval: ""

Sure enough, the evaluation and scrape intervals were not being set on lines 2 and 3!

To fix it, I set them explicitly in my Helm values, redeployed, and bounced the Prometheus pod (see the commands after the snippet below).

prometheus:
  prometheusSpec:
    scrapeInterval: 30s
    evaluationInterval: 30s
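
For reference, applying that and bouncing the pod looks roughly like this; the release name kube-prometheus-stack and the prometheus-community repo alias are assumptions based on a default install, so adjust to yours:

$ helm upgrade -n monitoring kube-prometheus-stack prometheus-community/kube-prometheus-stack -f values.yaml
$ kubectl -n monitoring delete pod prometheus-kube-prometheus-stack-prometheus-0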

@simonpasquier
Contributor

@billiford it's very likely that you have a mismatch between the CRD version and the operator version (e.g. operator version newer than CRD version).

@billiford

I deployed all the CRDs that I found here to both clusters.

It would be nice to know which CRD specifically is the root of this problem and why it is causing these intervals to not be set.

@simonpasquier
Contributor

It is the Prometheus CRD. You need to check that spec.scrapeInterval and spec.evaluationInterval come back with their default values when left unset.
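
A quick way to check this against a live cluster (the Prometheus resource name here is an assumption based on a default kube-prometheus-stack install): if the CRD defaults are in place, both fields should print 30s even when they are not set in your manifest.

kubectl -n monitoring get prometheus kube-prometheus-stack-prometheus \
  -o jsonpath='{.spec.scrapeInterval}{"\n"}{.spec.evaluationInterval}{"\n"}'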

@darkpwny

darkpwny commented Feb 9, 2023

I think the bug is that scrapeInterval and evaluationInterval are defined in values.yaml as empty strings:

line 2624 : scrapeInterval: ""
line 2632 : evaluationInterval: ""

These either need to be commented out so the defaults get applied, or set explicitly to the default value "30s".

I edited my kube-prometheus-stack/values.yaml so that the values were:

line 2624 : scrapeInterval: "30s"
line 2632 : evaluationInterval: "30s"

and then installed via helm:

helm install promstack ./kube-prometheus-stack --namespace monitoring --create-namespace -f kube-prometheus-stack/values.yaml

@bvanelli

As others stated, the CRDs were incompatible and I was also getting this error message. Upgrading the CRDs solved the problem for me. I was installing chart version 45.X.X, so the following CRDs were applicable:

kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.63.0/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml

After installing them, the problem went away. See more in the documentation: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#upgrading-an-existing-release-to-a-new-major-version
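
If the new config still isn't picked up after applying the CRDs, restarting the operator and then Prometheus (as bdbrink noted above) usually does it. A sketch, assuming default kube-prometheus-stack resource names:

kubectl -n monitoring rollout restart deployment kube-prometheus-stack-operator
kubectl -n monitoring delete pod prometheus-kube-prometheus-stack-prometheus-0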

@pankdhnd
Author

The issue was resolved after the CRD update. Thanks everyone for the help :-)

@sfxworks

It's still in the default values.yaml: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L2815

@dauberson

dauberson commented Oct 19, 2023

I had the same issue, and the problem was that the spec.scrapeInterval and spec.evaluationInterval values were empty.
I am using Terraform to provision the resources (EKS, charts, etc.). I changed the spec values manually, but somehow they rolled back to the old values. To fix it, I copied this file, set the values there, and referenced it in the Terraform helm_release (see the values file sketch after the resource block):

resource "helm_release" "chart_prometheus" {
  name       = "kube-prometheus-stack"
  chart      = "kube-prometheus-stack"
  version    = "51.9.4"
  repository = "https://prometheus-community.github.io/helm-charts"
  namespace  = "monitoring"


  values = compact(distinct(concat([
    file("${path.module}/configs/prometheus-values.yaml"),
  ])))
}
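
The referenced prometheus-values.yaml would then pin the intervals explicitly, along the lines of billiford's snippet above (30s matches the usual chart defaults; adjust as needed):

prometheus:
  prometheusSpec:
    scrapeInterval: 30s
    evaluationInterval: 30s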
