This repository has been archived by the owner on Aug 26, 2021. It is now read-only.

Can't seem to get the GCE LoadBalancer to work (502) #18

Closed

niclashedam opened this issue Aug 21, 2016 · 11 comments

Comments

@niclashedam

I only see error 502, no matter what I try.

My ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
  - hosts:
    - redacted.io
    - www.redacted.io
    secretName: tls
  - hosts:
    - staging.redacted.io
    secretName: tls-staging
  rules:
  - host: staging.redacted.io
    http:
      paths:
      - path: /*
        backend:
          serviceName: web-http-staging
          servicePort: 80
  - host: redacted.io
    http:
      paths:
      - path: /*
        backend:
          serviceName: web-http
          servicePort: 80
  - host: www.redacted.io
    http:
      paths:
      - path: /*
        backend:
          serviceName: web-http
          servicePort: 80

And one of my services (the other looks exactly the same):

apiVersion: v1
kind: Service
metadata:
  name: web-http-staging
  labels:
    app: web
    tier: frontend
spec:
  type: NodePort
  ports:
    # the port that this service should serve on
  - name: http
    port: 80
    protocol: TCP
  - name: https
    port: 443
    protocol: TCP
  selector:
    app: web-http-staging
    tier: frontend

And lastly, my pod listens on 80 and 443. If I curl the internal service IP from a node, I get the correct response (200), so the failure must be in the load balancer.

The .well-known path is present in the GCE load balancer, and there are four backends: two of them have health status 0/4, while the other two have 4/4. I have no idea why they are reported as unhealthy, since none of my Pods have health checks and they return 200 if you connect directly to the service.

Help is greatly appreciated. Thank you.

@simonswine
Contributor

I think the problem is your app has to respond to GET / with a 200. That's the default behaviour for GCE's health checking. If you want another health check endpoint (e.g. /health), you can specify a ReadinessProbe for the pods behind the services web-http/web-http-staging.
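
For reference, a minimal readinessProbe along those lines could look like the sketch below (the /health path, port 80 and container name are assumptions to adapt to your pods); the GCE ingress controller picks up the readinessProbe path of the backing pods and uses it as the load balancer's health-check path:

spec:
  template:
    spec:
      containers:
      - name: web                  # assumed container name
        readinessProbe:
          httpGet:
            path: /health          # assumed endpoint; must return HTTP 200
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10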

Can you verify which exact services are failing the health check? (I assume web-http/web-http-staging are failing.)

@niclashedam
Author

Hi,

You are completely right. / returns an HTTP 302 to the HTTPS version, which for some reason doesn't work. The load balancer refuses connections to HTTPS. The certs are renewed correctly by lego, though. I'm going to investigate further and I'll be back.

@niclashedam
Author

OK, some of it works now.
Google wanted this:

spec:
  tls:
  - secretName: tls-staging
    hosts:
    - staging.redacted.io  

instead of

spec:
  tls:
  - hosts:
    - staging.redacted.io
    secretName: tls-staging

Now I'm finding myself in a situation where the HTTP backend is assigned one IP and the HTTPS backend is assigned another. That makes no sense to me.

@simonswine
Contributor

I don't think the YAML line swap changes anything. It's always a good idea to remove the Ingress resource from your cluster, make sure that all related LB objects in GCE are removed, and then start again...

@niclashedam
Author

The swap fixed it, though. The Ingress also works now after recreating it.

@jamesthompson

Can confirm this works for me too! Massive thanks to you @niclashedam. I had so much trouble getting the HTTPS load balancing running because my index's session-auth redirect sent / to a login page. Once I'd changed that / path to a handler returning 200, all was well.

@niclashedam
Author

No problem! Glad I could help :-)

@mcwienczek

mcwienczek commented May 22, 2017

@simonswine this is a very good point: your service ALWAYS needs to respond with 200 at / for the GCE health check. I think that's something that should be stated very clearly in the documentation.

@mbaragiola

mbaragiola commented Sep 8, 2017

Hello, I seem to be having the same problem.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: django
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
  - secretName: django-tls
    hosts:
      - api.example.com
      - example.com
      - www.example.com
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: django
          servicePort: 80
  - host: example.com
    http:
      paths:
      - path: /.well-known/acme-challenge/*
        backend:
          serviceName: django
          servicePort: 80
  - host: www.example.com
    http:
      paths:
      - path: /.well-known/acme-challenge/*
        backend:
          serviceName: django
          servicePort: 80

GET / returns HTTP 200 OK, but I have still defined my own livenessProbe and readinessProbe pointing at /healthz:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: django
spec:
  replicas: 1
  progressDeadlineSeconds: 600
  minReadySeconds: 15
  revisionHistoryLimit: 5
  template:
    metadata:
      labels:
        app: django
        tier: midend
    spec:
      securityContext:
        runAsUser: 999
        fsGroup: 999
      restartPolicy: Always
      containers:
      - name: django
        image: gcr.io/project-id/django:v1beta3
        imagePullPolicy: "IfNotPresent"
        command: ["gunicorn", "config.wsgi:application", "-b", "0.0.0.0:5000", "-w", "4", "--chdir=/app"]
        livenessProbe:
          httpGet:
            path: /healthz
            port: 5000
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 5000
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 60

In the Google Cloud Console I can see 3 load balancer backends: 2 of them show healthy 2/2 and 1 shows 0/2.

I have deleted and recreated the setup as instructed above, but it still isn't ready for further testing.

EDIT: Changing both /healthz paths to / and deleting/recreating the Ingress fixed it.
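
For anyone else hitting this: both probes end up pointing at /, e.g. the readinessProbe becomes the sketch below (same port 5000 as above, just the path changed as described in the EDIT):

readinessProbe:
  httpGet:
    path: /                 # matches the path the GCE health check probes by default
    port: 5000
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 60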

@Winterflower

Hi @niclashedam and @jamesthompson,
Were you able to find out why changing the YAML around makes kube-lego work?
I am hitting similar issues with my deployment, and for some reason the YAML parsing flips the tls block from

  tls:
  - secretName: something
    hosts:
    - REDACTED

to this (as seen via kubectl edit ingress myingress):

  tls:
  - hosts:
    - REDACTED
    secretName: somename

When I try to edit this back to the original configuration and save, I get an "edit cancelled" message from kubectl.

@dennypenta

I have the same problem.
So I'm thinking: what if I removed those default health checks and added my own in Kubernetes?
Is that a good idea?

ElvinEfendi added a commit to Shopify/kube-lego that referenced this issue Feb 3, 2018