K8s SD: Targets are not updated #4124
Comments
This issue is seen in production at SoundCloud. I'm still working on reproducing it with a minimal setup. In particular, the issue cannot be reproduced with the setup from #4117 (comment), where @krasi-georgiev wrote: "using the configs below seems to work as expected with minikube."
This was referenced Apr 30, 2018
Here is the job config that creates the problem. Note that the Prometheus server is deployed on bare metal outside of the K8s cluster. The K8s API server is v1.10.1. The very same config works on Prometheus 1.8.2 (and IIRC on 2.0.0, although I might need to double-check that).

- job_name: myjob
  kubernetes_sd_configs:
    - api_server: https://api.mycluster.k8s.example.org
      role: pod
      tls_config:
        ca_file: "/srv/prometheus/k8s-certificates/mycluster/ca.crt"
        cert_file: "/srv/prometheus/k8s-certificates/mycluster/client.crt"
        key_file: "/srv/prometheus/k8s-certificates/mycluster/client.key"
  relabel_configs:
    - action: keep
      source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_pod_annotation_prometheus_io_port
      regex: ".+;(?:[0-9]+,?)+|;"
    - action: keep
      source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_pod_label_system
      regex: ".+;myjob|;"
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
      target_label: __scheme__
      regex: "(https?)"
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
      target_label: __metrics_path__
      regex: "(.+)"
    - source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
      target_label: __address__
      regex: "(.+?)(?::\\d+)?;(\\d+)"
      replacement: "$1:$2"
    - target_label: cluster
      replacement: mycluster
    - source_labels:
        - __meta_kubernetes_pod_label_system
      target_label: system
    - source_labels:
        - __meta_kubernetes_pod_label_env
      target_label: env
    - source_labels:
        - __meta_kubernetes_pod_label_component
      target_label: component
    - source_labels:
        - __meta_kubernetes_pod_label_version
      target_label: version
    - source_labels:
        - __meta_kubernetes_pod_label_track
      target_label: track
    - source_labels:
        - __meta_kubernetes_pod_name
      target_label: instance
    - source_labels:
        - __meta_kubernetes_namespace
      target_label: namespace
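As an illustration of the __address__ rewrite in the config above, the regex/replacement pair behaves like the following Go snippet; the IP and port values are made up for the example:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Joined source labels: __address__ and the prometheus.io/port annotation,
	// separated by ";" (the default relabel separator).
	src := "10.2.3.4:8080;9102"
	re := regexp.MustCompile(`(.+?)(?::\d+)?;(\d+)`)
	// Drops any existing port from __address__ and appends the annotated one.
	fmt.Println(re.ReplaceAllString(src, "$1:$2")) // prints 10.2.3.4:9102
}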
beorn7 added the kind/bug and component/service discovery labels on Apr 30, 2018
My general thoughts:
My Prometheus is running on the local machine (bare metal) and the k8s minikube runs in a VM, so it is the same as your setup.
Here is the minimal deployment used with this config; it still works as expected on my setup, so I will keep digging.
Just to update the current state: this only seems to happen with large K8s clusters. The initial full load works (i.e. after starting Prometheus or after a SIGHUP, all is good), but later changes are (partially?) lost. We couldn't recreate this in a smaller setup. It would be interesting to know if other users of large clusters (~10k pods or more) see similar problems. I will be off work for the summer, so I cannot work on this from inside SoundCloud.
beorn7 added the priority/P1 label on May 24, 2018
This is a total blocker for anything involving K8s SD for us. I mark it "only" as P1 assuming that only large-scale K8s users are affected (although those might be the most important users ;-). On the other hand, as long as nobody else sees the same, this might be some weird corner case we have accidentally created at SoundCloud. FTR: Prometheus 2.0.0 works just fine in this regard.
@beorn7 is there anything interesting if you look at the kubernetes_sd metrics?
@simonpasquier hmm, I sometimes see the missing targets appearing, and then I see "add" events, but only after the pods have been up for quite a while. Interestingly, we removed some jobs from the server yesterday, and now even the job that was most problematic in terms of SD seems fine. But I'll wait a bit longer before making a call. It just adds to the idea that the SD starts to become flaky beyond a certain update load.
Can it be related to broken watch calls in the kube client? kubernetes/kubernetes#50102
Update (sorry for still being very high level here, no time to do a proper analysis): Dimensions that changed a week ago (before/after):
In different news: I'll be off work for the next three months. While I plan some Prometheus project work, I will focus on the parts that are not so closely related to my day job. So I definitely won't debug production issues at SoundCloud. @grobie is probably the best SoundClouder to take it from here, but he's probably even busier than I have been.
sta-szek commented Jun 7, 2018 (edited)
Hi, I have a similar issue: when I create or delete pods, Prometheus still keeps the old ones. I have:
and it takes about 3h 25m for Prometheus to update the targets. I have Prometheus deployed to a k8s instance using the Helm chart (from the official repository). This is misleading, as we can see that we have 0 healthy pods in Grafana. Any ideas what I can check?
@sta-szek is it the same as @beorn7 reported, i.e. does reducing the total number of pods fix the problem? At the moment, without a way to reproduce this, the only real clue is this bug
sta-szek commented Jun 7, 2018
Unfortunately I cannot reduce the number of pods, but this:
solves the issue. A slightly strange workaround.
Reloading touches many different components, so I expect it to have some side effects. You can try it as a temporary workaround, but we want to find the real culprit. Can you think of any way to help us replicate this?
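If reloading is used as a stop-gap, here is a minimal sketch of automating it, assuming Prometheus listens on localhost:9090 and was started with --web.enable-lifecycle (sending SIGHUP to the process, as noted in the original report, works as well); the 5-minute interval is arbitrary:

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// Periodically force a configuration reload, which re-runs service
	// discovery from scratch. Adjust the URL for your deployment.
	for range time.Tick(5 * time.Minute) {
		resp, err := http.Post("http://localhost:9090/-/reload", "", nil)
		if err != nil {
			log.Printf("reload failed: %v", err)
			continue
		}
		resp.Body.Close()
		log.Printf("reload returned %s", resp.Status)
	}
}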
@sta-szek "203 pods across 16 nodes" is a scale that the k8s SD should handle. Can you share your Prometheus configuration file? Which version do you run? It would be helpful also if you could share the same metrics as in #4124 (comment):
|
sta-szek commented Jun 11, 2018
Prometheus info
What I can add is that we perform deployments using the Helm tool (2.8.2). BTW, reloading works for us for now, with no side effects.
@sta-szek thanks for the info! I made a typo and meant
You may want to try the latest version of Prometheus (v2.3.0), because it includes an update of the k8s client that could help with your issue (although it didn't work for Björn).
sta-szek commented Jun 11, 2018
Thanks. I'll create an issue for the update and let you know whether it helped us or not.
krasi-georgiev referenced this issue on Jun 21, 2018: Failures to refresh targets list using ec2_sd_configs #3664 (Closed)
cwolfinger commented Jul 18, 2018
Does anyone think there is an issue in just issuing a reload every 30 seconds or so? I have noticed numerous times that the K8s SD discovery misses pods.
This would be very expensive on big k8s clusters (tens of thousands of pods).
After updating another two dozen of our servers to v2.3.2 (from v2.0.0), we're experiencing this issue every day, on some servers with every deploy. There is a high correlation between the number of kubernetes_sd_config jobs per server and the likelihood of the discovery being stuck. We haven't seen the issue on any server with a single kubernetes_sd_config job so far; those work without issues. On the other hand, on one of our servers with 14 kubernetes_sd_config jobs we can reliably reproduce the issue every time we change a single pod.
To narrow things down a bit: if you were to start 14 Prometheus servers with one SD each, how does the problem manifest (if at all)?
@grobie are the 14 jobs very similar, or do they select different namespaces?
All 14 jobs use identical
sta-szek commented Jul 19, 2018
just an update,
Interesting. I've got #3912 to avoid creating multiple SD mechanisms when they are identical. Maybe it can help there?
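For background, a rough sketch of the coalescing idea; the types and names here (SDConfig, Discoverer, fingerprint) are illustrative, not the actual Prometheus implementation. The point is to key each SD configuration by a fingerprint and let identical scrape jobs share a single discoverer:

package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// SDConfig stands in for a kubernetes_sd_configs entry; fields are illustrative.
type SDConfig struct {
	APIServer string `json:"api_server"`
	Role      string `json:"role"`
}

// Discoverer is a placeholder for a running SD provider.
type Discoverer struct{ cfg SDConfig }

// fingerprint derives a stable key from the configuration contents.
func fingerprint(c SDConfig) string {
	b, _ := json.Marshal(c)
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

func main() {
	jobs := []SDConfig{
		{APIServer: "https://api.mycluster.k8s.example.org", Role: "pod"},
		{APIServer: "https://api.mycluster.k8s.example.org", Role: "pod"}, // identical to the first
		{APIServer: "https://api.other.example.org", Role: "pod"},
	}

	providers := map[string]*Discoverer{}
	for _, c := range jobs {
		fp := fingerprint(c)
		if _, ok := providers[fp]; !ok {
			providers[fp] = &Discoverer{cfg: c} // start one shared discoverer per unique config
		}
	}
	fmt.Printf("%d jobs share %d discoverers\n", len(jobs), len(providers))
}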
It sounds like there's a deeper bug here, though that may mitigate.
I'll be gone for the next few days, but will try to find some time to investigate this further mid/end next week.
Another thought: if you run two such Prometheus servers, started at different times, do they both break at once? That'd differentiate between a server and client issue.
simonpasquier referenced this issue on Aug 7, 2018: discovery.Manager.runUpdater() only needs to run once #4470 (Closed)
grobie referenced this issue on Aug 9, 2018: discovery: coalesce identical SD configurations #3912 (Merged)
krasi-georgiev referenced this issue on Aug 12, 2018: Update the Prom config to have 50 jobs #132 (Closed)
I think I may have finally been able to reproduce this to some degree. I created 200 Kubernetes SD entries in my config, each discovering one service in a Kubernetes cluster, and scaled the monitored service up and down a couple of times from 25 to 12 to 35 pods. I can see large amounts of timeouts, "no route to host", and similar errors. I have yet to debug this further, but here are the manifests I used to do this. At first sight this could be something about lock contention, but that's just intuition; we should investigate.
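For anyone else trying to reproduce this, a hedged sketch of generating a similar stress configuration; the number of jobs, the stress-N job names, and the app=my-service pod selector are placeholders and not the manifests referenced above:

package main

import (
	"fmt"
	"os"
)

func main() {
	// Emit N near-identical scrape jobs, each with its own kubernetes_sd_configs
	// entry, so every job runs its own pod discovery against the same API server.
	const jobs = 200
	fmt.Fprintln(os.Stdout, "scrape_configs:")
	for i := 0; i < jobs; i++ {
		fmt.Fprintf(os.Stdout, `  - job_name: stress-%d
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_pod_label_app]
        regex: my-service
`, i)
	}
}

Redirect the output into the scrape_configs section of a test prometheus.yml, then scale the selected deployment up and down while watching the targets page.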
Great, I will deploy and troubleshoot.
I deployed @brancz's configuration to my local machine (equivalent of
@brancz in my case it gets stuck at some point when I change the replicas, but 2-3 minutes later it picks up again. Can you please confirm that in your case it stays stuck even when you leave it for 5-10 minutes? The SD manager in Prometheus is OK and the delay comes either from the k8s provider or the k8s API server, so I will keep digging.
I think the back-pressure could just as easily come from the SD infrastructure itself; only profiling will tell. Yes, I did see it catch up eventually, but only when I actually stopped scaling up and down. The report from @grobie was about a production system that simply doesn't stop, so I can see how the lag would just become larger and larger and it would seem like it never changes.
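If it comes down to profiling, here is a minimal sketch for grabbing a CPU profile from a running server, assuming it listens on localhost:9090 and serves the standard Go pprof endpoints under /debug/pprof; the output file name is arbitrary:

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Collect a 30-second CPU profile to look for lock contention or a
	// blocked discovery loop.
	resp, err := http.Get("http://localhost:9090/debug/pprof/profile?seconds=30")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("prometheus-cpu.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("profile written; inspect it with `go tool pprof prometheus-cpu.pb.gz`")
}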
After a long weekend of troubleshooting I finally figured it out. When scaling down or removing targets, the receiving channel is blocked until all old scrape loops are stopped (lines 197 to 208 in 8fbe1b5). This also blocks the receiving channel in prometheus/discovery/manager.go (lines 165 to 171 in 8fbe1b5), and since we keep the (prometheus/discovery/manager.go, lines 143 to 152 in 8fbe1b5).

The second problem is that when we process the changes in all targetSets/jobs, we read all targets for all jobs and then try to process each one. The problem is that if we get new updates in the middle of this processing, we are already processing old entries, so Prometheus needs to run a new loop to stop/start the new scrape loops.

I have managed to solve the first problem, but I am still trying a few things for the second one. I will also look at the old implementation before the refactoring and why it didn't suffer from this problem.
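To make the first problem concrete, here is a self-contained toy sketch (not the actual Prometheus code) of how a sender blocks on an unbuffered update channel while the consumer is still busy stopping old scrape loops:

package main

import (
	"fmt"
	"time"
)

func main() {
	updates := make(chan []string) // unbuffered: each send waits for a receive

	// Consumer: stands in for the scrape manager, which in the buggy scenario
	// spends a long time stopping old scrape loops before it receives again.
	go func() {
		for targets := range updates {
			fmt.Println("applying", len(targets), "targets")
			time.Sleep(2 * time.Second) // stopping old scrape loops
		}
	}()

	// Producer: stands in for the k8s discoverer emitting updates.
	for i := 0; i < 3; i++ {
		start := time.Now()
		updates <- []string{fmt.Sprintf("pod-%d", i)}
		fmt.Printf("update %d delivered after %v\n", i, time.Since(start).Round(time.Millisecond))
	}
	close(updates)
	time.Sleep(3 * time.Second)
}

The first update is delivered almost immediately, but every subsequent one waits the full two seconds; with a discoverer that never stops producing, the backlog only grows, which matches the "lag becomes larger and larger" observation above.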
This was referenced Aug 20, 2018
krasi-georgiev closed this in #4526 on Sep 26, 2018
simonpasquier referenced this issue on Oct 1, 2018: Using k8s SD for alertmanager: doesn't cache address when API server goes away #4019 (Open)
lock bot commented Mar 25, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
beorn7 commented Apr 30, 2018 (edited)
Bug Report
What did you do?
Scaled new K8s pods up, scaled old ones down.
What did you expect to see?
Old pods disappear from targets, new ones show up.
What did you see instead? Under which circumstances?
Prometheus doesn't see the new pods, tries to scrape the old ones.
Note that a proper target update can be triggered by sending a SIGHUP.
Environment
System information:
Linux 4.4.10+soundcloud x86_64
Prometheus version:
The most recently investigated version is revision 2be543e (in the release-2.2 branch). However, the same happened in all 2.2 releases, most notably also before the grand K8s SD refactoring of #4117.
Prometheus configuration file:
See the various comments below.