Gateway liveness and readiness probes failing [AKS] #1110

Closed
goncalo-oliveira opened this issue Jun 6, 2023 · 5 comments

Comments

@goncalo-oliveira

This issue is occurring on an "old" cluster that was upgraded recently (on Azure) to v1.26, but the same happens on a clean install.

The gateway keeps crashing and never starts. Describing the pod or looking at the logs, it seems like the liveness and readiness probes are failing. All other components seem to be up and running.

$ kubectl -n openfaas describe pod gateway-65f6b959c5-8st67

kubelet            Started container gateway
kubelet            Back-off restarting failed container operator in pod gateway-65f6b959c5-8st67_openfaas(62b6f3e8-7d0b-423d-a38b-f4d90e2efb0a)
kubelet            Readiness probe failed: HTTP probe failed with statuscode: 502
kubelet            Back-off restarting failed container gateway in pod gateway-65f6b959c5-8st67_openfaas(62b6f3e8-7d0b-423d-a38b-f4d90e2efb0a)

$ kubectl -n openfaas logs gateway-65f6b959c5-8st67
12:45:30 Forwarded [GET] to /healthz - [502] - 0.0002s
12:45:30 error with upstream request to: /healthz, Get "http://127.0.0.1:8081/healthz": dial tcp 127.0.0.1:8081: connect: connection refused

Expected Behaviour

The gateway pods should be up and running.

Who is this for?

Fonix Telematics, yes, listed in the adopters list.

Steps to Reproduce (for bugs)

  1. Provision a new AKS cluster
  2. Install Nginx Ingress (don't think it's relevant)
  3. Install Cert-Manager (also don't think it's relevant)
  4. Install OpenFaaS

OpenFaaS was installed with arkade, using the following command:

$ arkade install openfaas --clusterrole --operator --ingress-operator

$ helm -n openfaas list
NAME    	NAMESPACE	REVISION	UPDATED                              	STATUS  	CHART          	APP VERSION
openfaas	openfaas 	1       	2023-06-06 13:25:56.603242 +0100 WEST	deployed	openfaas-13.0.0	  

Your Environment

  • AKS Kubernetes 1.26.3

  • OpenFaaS Helm Chart 13.0.0

Other stuff

There's a published issue on AKS 1.22 and 1.23 here. Another related and possibly relevant issue is [here]. These might have nothing to do with it.

Both clusters are still live, if there's any other information that I can provide, just ask.

@alexellis
Member

Hey @goncalo-oliveira

You need to run through the troubleshooting guide to find out what's wrong with the faasnetes container.

https://docs.openfaas.com/deployment/troubleshooting/

I also have a good post on my blog about this:

https://blog.alexellis.io/troubleshooting-on-kubernetes/

Alex

@goncalo-oliveira
Author

Hi @alexellis,

I've gone through your guides, which mostly cover what I had already checked (thanks for that), but I still can't say why it's failing, other than that the probes are failing. Here are the gateway logs:

$ kubectl -n openfaas logs deploy/gateway 
Found 2 pods, using pod/gateway-7684989679-qkmfl
Defaulted container "gateway" out of: gateway, operator
OpenFaaS Gateway - Community Edition (CE)

Version: 0.26.3 Commit: a128df471f406690b1021a32317340b29689c315
Timeouts: read=1m5s	write=1m5s	upstream=1m0s
Function provider: http://127.0.0.1:8081/

2023/06/06 13:19:29 Async enabled: Using NATS Streaming
2023/06/06 13:19:29 Deprecation Notice: NATS Streaming is no longer maintained and won't receive updates from June 2023
2023/06/06 13:19:29 Opening connection to nats://nats.openfaas.svc.cluster.local:4222
2023/06/06 13:19:29 Connect: nats://nats.openfaas.svc.cluster.local:4222
2023/06/06 13:19:34 Get "http://127.0.0.1:8081/system/namespaces": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:34 Get "http://127.0.0.1:8081/system/functions?namespace=openfaas-fn": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:39 Get "http://127.0.0.1:8081/system/namespaces": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:39 Get "http://127.0.0.1:8081/system/functions?namespace=openfaas-fn": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:44 Get "http://127.0.0.1:8081/system/namespaces": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:44 Get "http://127.0.0.1:8081/system/functions?namespace=openfaas-fn": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:49 Get "http://127.0.0.1:8081/system/namespaces": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:49 Get "http://127.0.0.1:8081/system/functions?namespace=openfaas-fn": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:54 Get "http://127.0.0.1:8081/system/namespaces": dial tcp 127.0.0.1:8081: connect: connection refused
2023/06/06 13:19:54 Get "http://127.0.0.1:8081/system/functions?namespace=openfaas-fn": dial tcp 127.0.0.1:8081: connect: connection refused

The events tell me the same... liveness and readiness probes failing.

openfaas      13m         Warning   Unhealthy            pod/gateway-65f6b959c5-5jpwg                                Readiness probe failed: HTTP probe failed with statuscode: 502
openfaas      13m         Warning   Unhealthy            pod/gateway-65f6b959c5-5jpwg                                Liveness probe failed: HTTP probe failed with statuscode: 502
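
The repeated `connection refused` lines above all name the same upstream, 127.0.0.1:8081, which the gateway reports as its function provider, so the failing probes are a symptom of the provider container not listening. A minimal sketch for summarizing such failures from gateway log output, assuming the log format shown in this thread; `refused_upstreams` is a hypothetical helper name, not part of OpenFaaS or kubectl.

```shell
# Hypothetical helper: read gateway logs on stdin and print each upstream
# address that refused a connection, followed by how often it occurred.
refused_upstreams() {
  grep -o 'dial tcp [^ ]*: connect: connection refused' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}
```

It could be fed from the cluster with something like `kubectl -n openfaas logs deploy/gateway -c gateway | refused_upstreams`. A single address with a steadily growing count suggests looking at whichever container is supposed to serve that port, rather than at the gateway itself.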

@alexellis
Member

I'm so sorry for the confusion.

I asked you for the logs from faas-netes - not the gateway.

The troubleshooting guide gives you the exact commands; when you run them, you'll get more info.

Perhaps it was also unclear, but my troubleshooting guide tells you to get the events for the namespace.

Please can you do that too.

As a rule, we do not troubleshoot or debug the clusters of community edition (free) users. I'm doing this as an exception.

Alex
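
The two requests above (logs from the provider container and events for the namespace) can be sketched as follows. The container name `operator` comes from the `Defaulted container "gateway" out of: gateway, operator` line earlier in this thread; that the container is named `faas-netes` on installs without the operator is an assumption here. The `KUBECTL` and `NAMESPACE` variables and both function names are hypothetical conveniences, not part of any tool.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-openfaas}"

# Print events for the namespace, oldest first.
namespace_events() {
  "$KUBECTL" get events --namespace "$NAMESPACE" \
    --sort-by=.metadata.creationTimestamp
}

# Print logs from one container of the gateway deployment.
# Defaults to "operator"; pass another container name as the first argument.
provider_logs() {
  "$KUBECTL" logs --namespace "$NAMESPACE" deploy/gateway -c "${1:-operator}"
}
```

Calling `provider_logs` and then `namespace_events` gives the provider's own startup output alongside the kubelet's view of the pod, which is usually enough to tell which of the two containers is actually crashing.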

@goncalo-oliveira
Author

Hi again @alexellis,

The confusion was mine, I misread your advice.

$ kubectl -n openfaas logs deploy/gateway -c operator  
Found 2 pods, using pod/gateway-7684989679-qkmfl
2023/06/06 14:02:52 A license is required via --license or --license-file

Now this was unexpected; maybe it was a change that I missed in the release notes. Is a license file required for CE?

@alexellis
Member

alexellis commented Jun 6, 2023

I'll get this issue closed as there is no problem that I can see above.

No additional license is required for CE, however you should consult the docs for what's available for free users vs paid:

  1. https://docs.openfaas.com/openfaas-pro/introduction/#comparison
  2. https://www.openfaas.com/pricing/

You should convert that installation to use the controller mode; no custom resource is available for the community edition.

Or you can revert to the version that you were using previously - but beware that the base image and some libraries used contain high severity CVEs.

Of course, you know that we are always here, if you want to talk about options for OpenFaaS Standard for your use within your business.

Alex
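
One way the switch to controller mode suggested above might look with Helm is sketched below. It assumes the release name and namespace from the `helm list` output in this thread, that a chart repository has been added locally under the alias `openfaas`, and that the chart exposes an `operator.create` value that turns the operator on or off; check the chart's README before relying on any of this. The `HELM` variable and the function name are hypothetical.

```shell
# HELM can be overridden (for example HELM=echo) to preview the command.
HELM="${HELM:-helm}"

# Re-apply the existing release with the operator disabled, keeping all
# other previously set values.
# Assumptions: release "openfaas" in namespace "openfaas", chart repo alias
# "openfaas", and a chart value named operator.create.
use_controller_mode() {
  "$HELM" upgrade openfaas openfaas/openfaas \
    --namespace openfaas \
    --reuse-values \
    --set operator.create=false
}
```

Since the original install used `arkade install openfaas --clusterrole --operator --ingress-operator`, re-running arkade without the `--operator` flag may be an equivalent route. Note that `helm upgrade` without `--version` picks up the newest chart in the repo, so pinning the version is worth considering.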

@openfaas openfaas locked as resolved and limited conversation to collaborators Jun 6, 2023