Kubernetes discovery misses some changes #1316

matthiasr commented Jan 14, 2016

When bringing up a local cluster for integration tests, Prometheus (using the example Kubernetes configuration) sometimes does not pick up a service that requests to be scraped via the prometheus.io/scrape annotation. In this situation, sending a SIGHUP to the Prometheus process seems to "fix" it:

Of note may be that during the bring-up of this cluster, several components are deployed pretty much at once, in parallel, and in no particular order. Prometheus and the to-be-scraped HAProxy exporter may come up in either order or at the same time. About one in five times it gets into the situation above, which may or may not be a race condition …

There's also a bunch of errors about being unable to query any masters, but they also happen when everything works out just fine.

@jimmidyson any idea what's up with that?
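For reference, a service opts in to scraping via that annotation roughly like this (a minimal sketch; the names and port here are hypothetical, not our actual manifests):

```yaml
# Sketch of a Service carrying the scrape opt-in annotation.
# All names and the port number are illustrative only.
apiVersion: v1
kind: Service
metadata:
  name: haproxy-exporter
  annotations:
    prometheus.io/scrape: "true"
spec:
  selector:
    app: haproxy-exporter
  ports:
    - name: metrics
      port: 9101
```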
brian-brazil added the bug label Jan 14, 2016
PS: I'm now serializing pushing components into the cluster; I'll report back if that makes things better.
It is possible there are races (however careful I was to try to avoid them) & I'll look again at that (other people's eyes are also welcome to look, if anyone has time!).
BTW, order should of course not matter in an environment like this. If you find that serializing deploys works, then we will have to look for races even more closely.
Ah, I see you're using the 0.16.1 tag - there have been a number of fixes to master since then (including the error logging I mentioned above!). Can you try with master, please? If you're using Docker (which I assume you are, deploying onto a Kubernetes cluster) then you can use the master-tagged image.
Because of the order of the tests, I know that it did successfully scrape the apiserver and node despite the failure to find this service.
Hmm okay, let me try master.
Update: unparallelizing the component deploy did not fix this. (However, since all components are deployed as Deployments or DaemonSets in our setup, the unparallelization only meant that the Kubernetes API objects were created serially, not that the pods started serially.)
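To illustrate why: creating a Deployment object returns as soon as the API server accepts it, and the controller then starts the pods asynchronously, so serialized object creation still leaves pod startup unordered. A hypothetical minimal manifest, in the API version of that era:

```yaml
# The kubectl create / API call completes once this object is stored;
# the deployment controller schedules and starts the pod afterwards,
# on its own schedule. Names and image are illustrative only.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: haproxy-exporter
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: haproxy-exporter
    spec:
      containers:
        - name: haproxy-exporter
          image: prom/haproxy-exporter
```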
Here's an error message from :master:

At the time of this message, the cluster was in a state where some components were up, but the DNS pods specifically were still Pending, so I presume what times out is the DNS lookup.
… and I give up for today, because now it can't resolve the cluster-relative names. I'll update here when I have more information.
DNS is only required to resolve the Kubernetes API server if you've specified it in the config using the cluster-relative name. Not much we can do about DNS resolution not working, as this relies on glibc for resolution. Failures shouldn't be cached, AFAIK. We have to somehow separate out what are cluster setup problems and what are Prometheus problems, I think.
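For comparison, here is a sketch of pointing the discovery at the API server either by cluster-relative name (which needs cluster DNS) or by address (no DNS lookup). Field names are assumed from the 0.16-era example config; the addresses are placeholders:

```yaml
scrape_configs:
  - job_name: kubernetes
    kubernetes_sd_configs:
      # Cluster-relative name: resolving this depends on cluster DNS
      # (the kube-dns pods) being up.
      - api_servers:
          - https://kubernetes.default.svc
        in_cluster: true

  - job_name: kubernetes-by-address
    kubernetes_sd_configs:
      # Direct address: discovery itself needs no DNS lookup.
      - api_servers:
          - https://10.0.0.1:443  # placeholder API server address
        in_cluster: true
```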
Yes, that's what I mean to do. I'll update once I have a clearer picture.
Thanks!
tl;dr: I've found a time when this could happen & will look into a fix for it. Long story: In …
jimmidyson referenced this issue Jan 15, 2016: Kubernetes SD: Refactor to handle missing Kubernetes events #1318 (merged)
fabxc closed this in #1318 Jan 20, 2016
lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.