KubeApi servers are in "state unknown" mostly #2543
Comments
Logs would help here. Do you see anything out of place there?
The only thing I can get with my limited knowledge is:
Is there somewhere else I need to check for logging?
Hmm, this could happen if the storage is throttled and stops ingesting metrics, but the logs would clearly mention this; these logs are the normal ones. Can you pastebin all the logs since the beginning? I am not sure about their usefulness though, as the checkpoint time is too low for any significant amount of metrics. What is the current RAM usage of the server, and did you put an upper limit on the RAM?
I have created a namespace in Kubernetes and limited it to 2GB of RAM. Are these metrics on the kubernetes-apiserver enabled by default? Do you have any idea?
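For context, capping a namespace at 2GB of memory is typically done with a ResourceQuota; a minimal sketch, where the namespace name and manifest are assumptions rather than values from this thread:

```yaml
# Hypothetical example: cap total memory limits in a namespace at 2Gi.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-quota
  namespace: monitoring   # assumed namespace name, not from this thread
spec:
  hard:
    limits.memory: 2Gi
```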
Your logs actually look ok. I'm wondering, since you said it's an HA 3-master cluster, shouldn't there be 3 targets? Can you show us the output of `kubectl get endpoints/kubernetes -oyaml` and `kubectl -n kube-system get pods`?
Yes, the metrics are exposed by default.
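Written out as separate commands (the `--watch` variant is only a suggestion for catching the endpoints flapping, not something asked for above):

```sh
# Dump the kubernetes endpoints object and the kube-system pods.
kubectl get endpoints/kubernetes -o yaml
kubectl -n kube-system get pods

# Optional: keep watching the endpoints object to see whether the apiserver IPs flap.
kubectl get endpoints/kubernetes -o yaml --watch
```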
The IP section is changing across multiple requests. This is pretty interesting for me too; I was expecting 3 IPs at the same time. Here is the relevant pods part:
I had just added some parameters to the kube-apiserver about cronjobs. That is why they have just started.
I just learned that this is a known issue, where the apiservers are racing against each other and keep replacing the IP.
Do you think that updating the k8s cluster to the latest 1.5.x could help?
This issue seems to arise when the
So my issue is actually not related to the constantly updated API endpoints. It seems I have another issue.
Can you elaborate on what you are seeing?
Sorry for not being clear. I thought that if kubernetes/kubernetes#22609 is still open, many clusters must suffer from the same problem. But given the fact that there is not much noise about this issue, not scraping the kube API servers must be related to something else. Do you think that my issue is related to kubernetes/kubernetes#22609?
Gentle ping @brancz
Can you provide the flags you use to start the apiserver? Then we should be able to find out quickly.
@alexsomesan @s-urbaniak can either of you comment on whether what we are seeing could be related to the
From my local box I can successfully see metrics: https://kubernetes.io/docs/concepts/cluster-administration/access-cluster/
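For anyone following along, a rough way to do that check based on the linked access-cluster docs (a sketch, not necessarily the exact commands used here):

```sh
# Proxy the apiserver to localhost, then hit its /metrics endpoint directly.
kubectl proxy --port=8001 &
curl -s http://localhost:8001/metrics | head
```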
Tried latest
@brancz I think that this is a bug on the Prometheus side. :( Can I debug more verbosely?
I added
I have also added
@brancz Don't you think that this is a bug in Prometheus?
cemo changed the title from KubeApi servers are in state unknown mostly to KubeApi servers are in "state unknown" mostly on Apr 4, 2017
As Prometheus takes it from the
Great explanation. Do you think that a workaround can be provided? Is it possible to skip discovery and statically scrape the API servers like this?
I changed to this, but it is not working either. I don't know where the configuration mistake is right now.
Removed
This configuration is working right now. I can successfully see
s-urbaniak commented on Apr 4, 2017
Hey, we just tried it locally and can reproduce the issue.
Without setting
When having the
Having said that
[1] kubernetes/kubernetes#22609
Thanks @s-urbaniak
Static scraping is totally an option, if you know they won't change. I wouldn't recommend removing the relabeling rules if you are basing your configuration on the example config and are not too familiar with Prometheus.
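For illustration, a minimal static job could look like the sketch below, assuming in-cluster service-account credentials; the job name and master IPs are placeholders, not values from this thread:

```yaml
scrape_configs:
  - job_name: kubernetes-apiservers-static
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    static_configs:
      - targets:          # placeholder master addresses; replace with your own
          - 10.0.0.10:443
          - 10.0.0.11:443
          - 10.0.0.12:443
```

Because this job uses no kubernetes_sd_configs, the example config's relabeling rules simply don't apply to it.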
gianrubio commented on May 24, 2017
Today I have the same issue. I have 2 replicas (v1.6.2) running on different nodes. I was reading this thread and accidentally restarted the apiserver; after that the stuck Prometheus started scraping again. I guess this is not only related to
PS: I just have 1 apiserver replica. Logs from when the instance was stuck:
brian-brazil added the component/service discovery label on Jul 7, 2017
@brian-brazil, @brancz reading the discussion, this seems to be a problem on the k8s side.
We haven't had any recent reports, so that sounds sane.
brian-brazil closed this on Feb 17, 2018
For anyone wondering, the correct way to solve this is to enable the "lease" Endpoints reconciler on your Kubernetes API server. Then everything will work in Prometheus as expected.
The lease Endpoints reconciler is available in Kubernetes 1.9 as alpha, meaning it must be explicitly enabled.
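Concretely, enabling it means passing the endpoint reconciler flag to every kube-apiserver instance; a minimal sketch with all other flags omitted:

```sh
# Add the flag to each apiserver's invocation; everything else stays as it is.
kube-apiserver --endpoint-reconciler-type=lease   # ...plus your existing flags
```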
kinghrothgar commented on Mar 6, 2018
I am on GKE and thus cannot use features until they are beta. As far as I can tell, the lease endpoints fix won't be beta until 1.11, which is quite a ways away from being released or hitting GKE. Are there no solutions for this except this alpha feature?
Unless you have a single master, this problem will unfortunately persist for those users until it's available as beta in GKE. I know that the requirements for beta are being worked on, but they are not landing in 1.10; they're targeted for 1.11.
lock bot commented on Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
cemo commented Mar 28, 2017
What did you do?
I am trying to get used to Prometheus with k8s and I have successfully run some exporters. However, I have some problems with the API server. In the last scrape column I see never for the API servers almost every time. I also saw, very rarely, some successful scrapes too:
What did you expect to see?
always UP
What did you see instead? Under which circumstances?
UNKNOWN
Environment
k8s 1.5.2
HA, 3 masters
CoreOS
Prometheus version:
1.5.2
Prometheus configuration file:
Any reason why?