Prometheus pod cannot find prometheus.env.yaml on start
#3061
Comments
|
I am just beginning to learn. Please give me your advice. |
|
I am seeing a similar issue with an installation of |
This issue has not been resolved, but it does not affect normal use of the service. This may be a configuration bug, and I hope it can be fixed later. |
|
This is a race between the sidecar provisioning the file on disk and Prometheus starting. However, Prometheus just tries again, and once the file is there it starts just fine. This is primarily a cosmetic blemish; things still work as expected. If someone wants to work on removing this, though, we'd be more than happy to review! :) |
|
Thanks @brancz ! Definitely wasn't reading the pod state correctly and the service did start properly after the sidecar generated the configuration file. First thought off the top of my head is to exponentially back off for up to 30 seconds if the file doesn't exist, possibly with an additional command line option. Would this be an acceptable solution or is there a better alternative? Willing to try to code it up. |
That's a good idea! |
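For illustration, the backoff idea could look roughly like the following in a busybox-compatible shell. This is only a sketch under assumptions, not anything the operator or Prometheus ships: `wait_for_file` is a hypothetical helper name, the 30-second budget is taken from the suggestion above, and the delay simply doubles after each check.

```shell
#!/bin/sh
# Hypothetical helper (not part of Prometheus or the operator).
# Usage: wait_for_file PATH [MAX_TOTAL_SECONDS]
# Returns 0 as soon as PATH exists, or 1 if it is still missing once the
# total time budget (default: 30 seconds) has been used up.
wait_for_file() {
    path=$1
    budget=${2:-30}
    delay=1    # first sleep is 1 second, then 2, 4, 8, ...
    waited=0   # total seconds slept so far

    while [ ! -f "$path" ]; do
        # Give up once the whole budget has been spent.
        if [ "$waited" -ge "$budget" ]; then
            return 1
        fi
        # Never sleep past the remaining budget.
        remaining=$((budget - waited))
        if [ "$delay" -gt "$remaining" ]; then
            delay=$remaining
        fi
        sleep "$delay"
        waited=$((waited + delay))
        delay=$((delay * 2))
    done
    return 0
}
```

A wrapper entry point could call `wait_for_file /etc/prometheus/config_out/prometheus.env.yaml 30`, exit non-zero if it fails, and otherwise hand off to the real process with `exec`. With the default budget the sleeps are 1, 2, 4, 8 and 15 seconds, so a file that appears within the first second or two costs almost nothing.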
|
I feel the best option would be something minimal that just checks that the file is there and then starts the Prometheus process, provided we can keep it reasonably small. |
|
What would happen if the file isn't present? If we change the entry point to a script, then something like this might work on busybox:
    while [ ! -f /etc/prometheus/config_out/prometheus.env.yaml ]; do
        sleep 1
    done
    exec operator "$@"
Is modifying the entry point a stability concern? |
|
Another approach would be to have an init container that generates the file the first time, so that the file is guaranteed to be there when the operator starts. |
|
I think that is a great solution! I believe the |
|
Modifying the config reloader to be a one-off instead of long running for the init container sounds like a fantastic idea! |
Yeah, before leaving my previous comment I went looking for that source code, but my Go is weak and I wouldn't know how to make that happen without directions.
|
The reloader is here: https://github.com/coreos/prometheus-operator/blob/master/cmd/prometheus-config-reloader/main.go And I would suggest we add a new flag, something along the lines of |
|
This issue has been automatically marked as stale because it has not had any activity in last 60d. Thank you for your contributions. |
Please help me to check this error, thank you!
Below is the description of my pod
[root@k8s-master-1 alertmanager]# kubectl describe pod prometheus-prometheus-operator-158278-prometheus-0
Name: prometheus-prometheus-operator-158278-prometheus-0
Namespace: default
Priority: 0
Node: k8s-node-2/172.29.1.102
Start Time: Mon, 02 Mar 2020 16:30:42 +0800
Labels: app=prometheus
controller-revision-hash=prometheus-prometheus-operator-158278-prometheus-674c57cc8c
prometheus=prometheus-operator-158278-prometheus
statefulset.kubernetes.io/pod-name=prometheus-prometheus-operator-158278-prometheus-0
Annotations: cni.projectcalico.org/podIP: 10.100.140.127/32
Status: Running
IP: 10.100.140.127
IPs:
IP: 10.100.140.127
Controlled By: StatefulSet/prometheus-prometheus-operator-158278-prometheus
Containers:
prometheus:
Container ID: docker://bbec5226b9c3610a265a5cee046f76044e5ff26149ad5079f774f2a49f46f1e1
Image: quay.io/prometheus/prometheus:v2.15.2
Image ID: docker-pullable://quay.io/prometheus/prometheus@sha256:914525123cf76a15a6aaeac069fcb445ce8fb125113d1bc5b15854bc1e8b6353
Port: 9090/TCP
Host Port: 0/TCP
Args:
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--storage.tsdb.path=/prometheus
--storage.tsdb.retention.time=10d
--web.enable-lifecycle
--storage.tsdb.no-lockfile
--web.external-url=http://prometheus-operator-158278-prometheus.default:9090
--web.route-prefix=/
State: Running
Started: Mon, 02 Mar 2020 16:30:44 +0800
Last State: Terminated
Reason: Error
Message: caller=main.go:648 msg="Starting TSDB ..."
level=info ts=2020-03-02T08:30:43.925Z caller=web.go:506 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-03-02T08:30:43.928Z caller=head.go:584 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-03-02T08:30:43.929Z caller=head.go:632 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:663 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:664 msg="TSDB started"
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:734 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:517 msg="Stopping scrape discovery manager..."
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:531 msg="Stopping notify discovery manager..."
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:553 msg="Stopping scrape manager..."
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:527 msg="Notify discovery manager stopped"
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:513 msg="Scrape discovery manager stopped"
level=info ts=2020-03-02T08:30:43.930Z caller=manager.go:814 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-03-02T08:30:43.930Z caller=manager.go:820 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-03-02T08:30:43.930Z caller=main.go:547 msg="Scrape manager stopped"
level=info ts=2020-03-02T08:30:43.933Z caller=notifier.go:598 component=notifier msg="Stopping notification manager..."
level=info ts=2020-03-02T08:30:43.933Z caller=main.go:718 msg="Notifier manager stopped"
level=error ts=2020-03-02T08:30:43.933Z caller=main.go:727 err="error loading config from "/etc/prometheus/config_out/prometheus.env.yaml": couldn't load configuration (--config.file="/etc/prometheus/config_out/prometheus.env.yaml"): open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"
prometheus-config-reloader:
Container ID: docker://b1b8e07d22b7bb07864eaad0f45707bf0426e879c31c14f821032a287871cbcf
Image: quay.io/coreos/prometheus-config-reloader:v0.35.0
Image ID: docker-pullable://quay.io/coreos/prometheus-config-reloader@sha256:b75b9b60e6bc7a256b37c66ffef8db074983e800e0f710336d48484f55d51659
Port:
Host Port:
Command:
/bin/prometheus-config-reloader
Args:
--log-format=logfmt
--reload-url=http://127.0.0.1:9090/-/reload
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
State: Running
Started: Mon, 02 Mar 2020 16:30:44 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 25Mi
Requests:
cpu: 100m
memory: 25Mi
Environment:
POD_NAME: prometheus-prometheus-operator-158278-prometheus-0 (v1:metadata.name)
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-operator-158278-prometheus-token-57q88 (ro)
rules-configmap-reloader:
Container ID: docker://351374ae790979c0dfc6758f8b6ca321df47f6143dce250fbb3590606867ee85
Image: quay.io/coreos/configmap-reload:v0.0.1
Image ID: docker-pullable://quay.io/coreos/configmap-reload@sha256:e2fd60ff0ae4500a75b80ebaa30e0e7deba9ad107833e8ca53f0047c42c5a057
Port:
Host Port:
Args:
--webhook-url=http://127.0.0.1:9090/-/reload
--volume-dir=/etc/prometheus/rules/prometheus-prometheus-operator-158278-prometheus-rulefiles-0
State: Running
Started: Mon, 02 Mar 2020 16:30:44 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 25Mi
Requests:
cpu: 100m
memory: 25Mi
Environment:
Mounts:
/etc/prometheus/rules/prometheus-prometheus-operator-158278-prometheus-rulefiles-0 from prometheus-prometheus-operator-158278-prometheus-rulefiles-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-operator-158278-prometheus-token-57q88 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-prometheus-operator-158278-prometheus
Optional: false
tls-assets:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-prometheus-operator-158278-prometheus-tls-assets
Optional: false
config-out:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
prometheus-prometheus-operator-158278-prometheus-rulefiles-0:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-prometheus-operator-158278-prometheus-rulefiles-0
Optional: false
secret-etcd-certs:
Type: Secret (a volume populated by a Secret)
SecretName: etcd-certs
Optional: false
prometheus-prometheus-operator-158278-prometheus-db:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
prometheus-operator-158278-prometheus-token-57q88:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-operator-158278-prometheus-token-57q88
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: