Non-obvious but serious issue about CPU: nginx-ingress http://IP:10254/healthz fails constantly #4735
Comments
More logs from the crash:
They might be useful.
This is not a crash. The kubelet is instructing the pod to terminate.
From the screenshot, it seems the kubelet cannot reach the ingress controller pod.
When it happens, ALL ingresses stop working, so users can't connect to ANY http(s) service in k8s. We can call it a bug then :)
Any tips on how to debug this further? What can I do about it?
To confirm this is not an issue with the ingress controller itself, please remove the probes in the deployment. The ingress controller should not restart anymore.
SSH to the node where the pod is running and try to execute `curl http://<pod-ip>:10254/healthz`
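For example (a sketch assuming the stable/nginx-ingress chart's default labels and namespace; adjust the selector and namespace to your install):

```sh
# Find the controller pod's IP and hit its health endpoint from the node.
# The label selector is an assumption based on the stable/nginx-ingress chart.
POD_IP=$(kubectl -n default get pod \
  -l app=nginx-ingress,component=controller \
  -o jsonpath='{.items[0].status.podIP}')
curl -v "http://${POD_IP}:10254/healthz"
```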
Do you mean
I don't see such an option: https://github.com/helm/charts/blob/master/stable/nginx-ingress/values.yaml#L179 How can I do it?
SSH to the Kubernetes node where the ingress controller pod is running.
Editing the deployment (kubectl edit deployment) without using helm?
For people who have similar issues:
Edit the deployment to remove the livenessProbe. In my case I also had to turn off sync in GitOps, because otherwise it would detect the difference and overwrite the probes.
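For example, a one-off JSON patch (a sketch; the namespace and deployment name are assumptions, adjust them to your release):

```sh
# Remove the livenessProbe from the first container of the controller deployment.
# Note: a helm upgrade or GitOps sync will re-add the probe, as mentioned above.
kubectl -n default patch deployment nginx-ingress-controller \
  --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'
```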
Ha, I found out when it happens!
After a while there is no connection to the nginx-ingress controller.
I have one URL which runs a process that takes about 20-30 minutes. Before it finishes, it always times out in the web browser, but that is not a problem for the user; this link just manually triggers a synchronisation. So I think the nginx-ingress controller stops working right after somebody opens this link. That is what I discovered by experimenting manually with when it happens.
So to sum up:
How can I debug it further? How can I fix it? Hmm, alternatively, can it be because
I can confirm it stopped happening after changing the CPU settings. I think there should be a bold warning about this in the helm nginx-ingress installation docs and everywhere else: if processes take 100% of the CPU, the nginx-ingress-controller will go down, which means nobody will be able to connect to your http(s) ingresses and all your systems. The default installation doesn't set this. While it is not a bug, it is a serious issue. It took me about half a year to figure it out and nobody really knew why it was happening. This is not my first issue reported about it.
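For reference, a minimal sketch of what setting a CPU request could look like in the stable/nginx-ingress chart's values.yaml (the numbers are illustrative assumptions, not recommendations):

```yaml
# Reserving CPU/memory moves the controller out of the BestEffort QoS class,
# so it keeps its CPU share even when the node is fully loaded.
controller:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
```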
@kwladyka what are you expecting exactly? From your previous comment, this seems to be an issue with missing limits in your cluster; there is nothing related to the ingress controller itself.
This is on purpose. You should set limits only when you have enough information about the traffic you need to handle, not before.
The same thing happens to any other pod running without limits or using the BestEffort QoS class. Also, maybe your nodes are too small for the things you are running?
So while the nginx-ingress-controller is failing because of this, other pods fail too, because they can't connect to foo.bar.pl, so all services start to fail and restart in a loop, and it turns into chaos really fast.
It is very non-obvious, because all processes work like they should. Only the nginx-ingress-controller fails, at least in my case. So it is not happening with other processes in k8s.
The issue is not about the nginx-ingress-controller itself. When the CPU on a node is 100% consumed, the nginx-ingress-controller immediately fails, because it doesn't have any CPU reserved. So for me it sounds like everybody should set a CPU request for it.
Note: maybe using a LimitRange it would be possible to define the minimum requirements and avoid the 100% CPU problem: https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-constraint-namespace/#create-a-limitrange-and-a-pod
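A minimal sketch following that doc (the namespace and values are illustrative assumptions):

```yaml
# Containers in this namespace get a default CPU request if they set none,
# and may not request less than the stated minimum.
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-min
  namespace: ingress-nginx
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
    min:
      cpu: 50m
```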
TL;DR:
jump to #4735 (comment)
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG
Kubernetes version (use `kubectl version`):
Environment:
GCP Kubernetes Engine (GKE)
What happened:
Nginx restarts in a loop. It always fixes itself after 20-30 minutes.
What you expected to happen:
No restarts / crashes.
How to reproduce it (as minimally and precisely as possible):
I don't know. I can only guess: maybe it is about https://argoproj.github.io/argo-cd/, since both are in the same cluster while nginx-ingress crashes constantly? A blind guess. But probably it is the configuration in the `values.yaml` file. Hard to say. I need it to use `nginx.ingress.kubernetes.io/whitelist-source-range`.
It always happens when I downgrade / upgrade nodes, so the cluster has to re-run everything on new machines. And from time to time it happens randomly.
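For context, a sketch of how that annotation is attached to an Ingress (the API version, names, host, and CIDR are illustrative assumptions):

```yaml
# Only clients from the whitelisted CIDR can reach this host through the
# nginx ingress controller; everyone else gets a 403.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  annotations:
    nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/24"
spec:
  rules:
  - host: foo.bar.pl
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example
            port:
              number: 80
```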
My helm values.yaml
I use Google Kubernetes Engine.