Ingress-GCE has a nil pointer exception #471

rramkumar1 · 2018-09-11T16:54:24Z

We are aware of a nil pointer dereference in v1.3.2. The bug was fixed in #434, but the fix did not make it into the 1.3 release branch. Because the panic crashes the controller, the only symptom users see is Ingresses not being synced.

The current workaround is to delete the Ingress that is not being synced and recreate it. A fix is coming in the next 1.3 patch release (v1.3.3).

rramkumar1 · 2018-09-11T17:17:37Z

/kind bug

abstrctn · 2018-09-12T18:04:07Z

Is there a way to verify whether this is occurring in a GKE cluster? Since I don't think we have access to the logs, we can't check for the exception listed in #434, but we'd like to make sure this is the issue before recreating Ingresses.

rramkumar1 · 2018-09-12T18:11:41Z

Unfortunately, no. If you are able to update your Ingress and see the changes reflected in GCP, then you should be fine. Otherwise, you are most likely hitting this issue. Also note that this is only happening in GKE clusters above version 1.10.6.

poor-bob · 2018-09-13T19:01:52Z

I'm pulling my hair out trying to figure out why our ingresses suddenly stopped being fulfilled by the ingress controllers. Normally I've found a very reasonable explanation (Quotas, etc.), but this time I'm relatively sure we're running into this bug.

kubernetes master version: 1.10.6-gke.2

We've tried deleting every ingress and recreating them, to no avail. Is there a time period I should wait before recreating the ingresses? I waited roughly 5 minutes this first time.

rramkumar1 · 2018-09-13T19:21:18Z

@poor-bob Email me your project name, cluster name and location of the cluster and I'll take a look. If you deleted and recreated every ingress I would think that you would not be running into this specific issue.

addisonbair · 2018-09-13T19:51:25Z

@rramkumar1
I'm experiencing the same issue with 1.10.6-gke.2

I have disabled the default GKE loadbalancer-controller and installed this version JUST to see logs. Indeed I am experiencing this issue:

E0913 19:46:08.868155       1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)

If I delete and recreate each and every ingress from my project, can I expect to get past this nil pointer dereference issue?
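
For anyone who, as above, runs their own controller instance and can read its logs, here is a minimal sketch for checking captured log text for this panic. The helper name `has_nil_panic` is hypothetical, and the match string is simply the message from the log line quoted above; how you obtain the logs depends on your own setup.

```shell
#!/usr/bin/env bash
# has_nil_panic (hypothetical helper): reads controller log text on stdin and
# exits 0 if it contains the nil pointer panic message quoted in this thread.
has_nil_panic() {
  # -F matches the message as a fixed string; -q prints nothing and only
  # sets the exit status.
  grep -qF 'Observed a panic: "invalid memory address or nil pointer dereference"'
}
```

Assuming you have a pod you can read logs from, usage might look like `kubectl logs -n <namespace> <controller-pod> | has_nil_panic && echo "nil pointer panic found"`, where the namespace and pod name are placeholders.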

rramkumar1 · 2018-09-13T19:56:49Z

@addisonbair Theoretically yes. Since you installed another instance to get logs, you should be able to find out if that indeed works for you.

addisonbair · 2018-09-13T20:17:02Z

Not related to the nil pointer issue, but my default backend disappears (both the service and deployment) without a trace:

ingress-gce/deploy/glbc on  master [!] at ☸️ gke_remesh-stage_us-east1-b_stage
➜ kubectl describe svc default-http-backend -n kube-system
Name:                     default-http-backend
Namespace:                kube-system
Labels:                   addonmanager.kubernetes.io/mode=Reconcile
                          k8s-app=glbc
                          kubernetes.io/cluster-service=true
                          kubernetes.io/name=GLBCDefaultBackend
Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"glbc","kubernetes.i...
Selector:                 k8s-app=glbc
Type:                     NodePort
IP:                       10.47.247.24
Port:                     http  80/TCP
TargetPort:               8080/TCP
NodePort:                 http  30668/TCP
Endpoints:                10.44.21.65:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

ingress-gce/deploy/glbc on  master [!] at ☸️ gke_remesh-stage_us-east1-b_stage
➜ kubectl describe svc default-http-backend -n kube-system
Error from server (NotFound): services "default-http-backend" not found

Is there any way I can debug this?

rramkumar1 · 2018-09-13T20:23:25Z

@addisonbair Can you file a separate issue for that and explain how exactly you are using the script in deploy/glbc?

addisonbair · 2018-09-13T20:35:13Z

Will do. 👍

I believe I have a fix and in the process uncovered a possible bug with the yaml manifests.

Since I don't have access to the masters (GKE) I can't be completely sure, but it appears there is a conflict between the Addon-manager running on the master and the labels on the objects within deploy/glbc/yaml/default-http-backend.yaml. After changing addonmanager.kubernetes.io/mode: Reconcile to addonmanager.kubernetes.io/mode: EnsureExists, the Addon-manager no longer deletes these objects.
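
The change described above can be sketched as a one-line rewrite of the manifest. This is only an illustration: the helper name `switch_addon_mode` is hypothetical, and it assumes the manifest contains the text `addonmanager.kubernetes.io/mode: Reconcile` literally (unquoted, with a single space after the colon).

```shell
#!/usr/bin/env bash
# switch_addon_mode (hypothetical helper): rewrites the addon-manager mode in
# a manifest from Reconcile to EnsureExists, as described in this thread.
switch_addon_mode() {
  local manifest="$1"
  # -i.bak edits the file in place and keeps the original as <file>.bak;
  # this spelling is accepted by both GNU and BSD sed.
  sed -i.bak \
    's|addonmanager\.kubernetes\.io/mode: Reconcile|addonmanager.kubernetes.io/mode: EnsureExists|' \
    "$manifest"
}
```

With the path mentioned above, this would be run as `switch_addon_mode deploy/glbc/yaml/default-http-backend.yaml` before applying the manifest. Whether the Addon-manager then leaves the objects alone is the behavior reported here, not something this sketch verifies.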

addisonbair · 2018-09-13T21:02:50Z

@rramkumar1

After deleting all my Ingresses, I am unfortunately still seeing the NPE.

Is there a known working pre-release image that I can use?

Thank you!!

rramkumar1 · 2018-09-13T21:09:11Z

@addisonbair We are in the process of building a patch with the fix and pushing it out. This will enable you to start testing the fix. Keep in mind that this does not mean it is released in GKE. You will still have to wait for an official GKE rollout and upgrade your cluster to get the fix.

Will let you know when the patch is ready to pull down.

addisonbair · 2018-09-13T21:11:16Z

Awesome. Thank you!

addisonbair · 2018-09-13T21:42:32Z

Just a quick update:

I managed to build an image from master and have successfully deployed to GKE (1.10.6-gke.2) without seeing the dreaded NPE. All ingresses are back up and operational.

I'm happy to test a more formal image when it is ready. Thanks so much for the help!

rramkumar1 · 2018-09-13T21:48:17Z

@addisonbair Thanks, that's great to hear! We just pushed v1.3.3 so please let us know if that works as well.

This would be the version that would officially be rolled out as part of a new GKE version.

addisonbair · 2018-09-13T22:03:31Z

@rramkumar1

Built, pushed and deployed v1.3.3 on my 1.10.6-gke.2 cluster and it works perfectly! No more NPE.

Thanks so much!

rramkumar1 · 2018-09-17T18:45:38Z

Quick update for all tracking this issue. Hopefully the GKE rollout for the fix ends this week. I will ping this thread with the GKE version everyone should upgrade to once rollout is complete.

cerealcable · 2018-09-19T05:06:48Z

I figured I would comment to clarify this for anyone else who is new to k8s, since I was and it wasn't clear to me: you need to delete the LB as well as the services associated with it. Once I did that and recreated the services and ingress, I was able to work around this bug. Definitely not a great long-term solution, but it works until the patch is ready.

Thanks @rramkumar1 for your assistance and confirmation of my issue!

laupow · 2018-09-24T16:23:44Z

The GKE team released a new version, 1.10.7-gke.2, which fixes the issue of stuck Ingress resources.

rramkumar1 · 2018-09-24T16:24:54Z

@laupow Thanks for the update. The fix should be rolled out as part of 1.10.7-gke.2 and 1.11.2-gke.4.

/close
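
A rough way to check a version string against the two fixed versions named above. The function name `has_ingress_fix` is hypothetical, and the sketch makes two assumptions that the thread does not state outright: that any later patch within the same minor line also carries the fix, and that `sort -V` (version sort, as in GNU coreutils) is available. Minor lines other than 1.10 and 1.11 are reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# has_ingress_fix (hypothetical helper): exit 0 if the given GKE version is at
# or above the fixed version for its minor line, 1 if it is below it, and 2 if
# this thread names no fixed version for that minor line.
has_ingress_fix() {
  local version="$1" floor
  case "$version" in
    1.10.*) floor="1.10.7-gke.2" ;;
    1.11.*) floor="1.11.2-gke.4" ;;
    *) return 2 ;;
  esac
  # Version-sort the two strings; if the floor comes out first (or the two are
  # equal), the given version is at or above the floor.
  [ "$(printf '%s\n%s\n' "$floor" "$version" | sort -V | head -n 1)" = "$floor" ]
}
```

For example, `has_ingress_fix 1.10.6-gke.2` fails while `has_ingress_fix 1.10.7-gke.2` succeeds. The master version itself can likely be read with something like `gcloud container clusters describe <cluster> --zone <zone> --format='value(currentMasterVersion)'`, where the cluster and zone are placeholders.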

k8s-ci-robot · 2018-09-24T16:24:55Z

@rramkumar1: Closing this issue.

In response to this:

@laupow Thanks for the update. The fix should be rolled out as part of 1.10.7-gke.2 and 1.11.2-gke.4.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ericuldall · 2018-09-24T19:33:42Z

@rramkumar1 Are you sure this fix is live with 1.10.7-gke.2? I got this response from GCP Support last week:

This is to provide you an update that I checked with the product team on whether GKE version (1.10.7-gke.2) have the ingress fix and I came to know that it won't.

rramkumar1 · 2018-09-24T19:45:58Z

@ericuldall I'm not sure why GCP support told you that. They may have gotten confused about something else. Do you not see 1.10.7-gke.2 as a viable version?

ericuldall · 2018-09-24T20:01:34Z

I see it available, just unclear if the fix is actually deployed to that version or not.

rramkumar1 · 2018-09-24T20:02:13Z

Yes, the fix is available in that version.

ericuldall · 2018-09-24T20:02:37Z

Yes, I deployed it and my ingress was updated :D thanks for confirming!

bschwartz757 · 2018-09-26T16:56:58Z

@rramkumar1 I have a cluster running on 1.10.6-gke.2 and I replaced one of the ingresses, then it got stuck in 'creating ingress'. I just found this thread this morning, and accordingly, deleted and then re-created the ingress but it's still showing up as 'creating ingress' in the GCP dashboard. Any ideas?

rramkumar1 · 2018-09-26T16:58:44Z

@bschwartz757 You can upgrade to 1.10.7-gke.2. See the discussion above.

bschwartz757 · 2018-09-26T17:15:45Z

@rramkumar1 ok..... anything that doesn't involve upgrading?

rramkumar1 · 2018-09-26T17:22:12Z

@bschwartz757 Upgrading is the only supported way to get these kinds of fixes. If you don't want to upgrade, you can also run the script we have in deploy/. Note that this script is somewhat dangerous to run in production (and as a result, we don't officially support it), but it does allow you to modify the version of your ingress-gce controller without having to depend on GKE for upgrades.

Arconapalus · 2019-01-11T23:27:26Z

@rramkumar1 Just letting you know that I am also running into this issue on 1.11.5-gke.5. The Ingress is stuck on "Creating ingress" even after I deleted and recreated it. Should I delete the nodes and cluster and recreate them at 1.10.7-gke.2?

rramkumar1 · 2019-01-11T23:34:13Z

@Arconapalus This issue is already fixed for that version so you might be running into a separate issue.

Can you please file a separate issue for this?

Arconapalus · 2019-01-11T23:35:52Z

@rramkumar1 Yes I can.
#605

kdeng · 2019-01-16T03:24:35Z

@rramkumar1 I am also experiencing this issue on 1.11.6-gke.2. When I look at the events in the Ingress details, there are no messages at all.

My Ingress file is pretty simple, as shown below.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: basic-ingress
  namespace: build
spec:
  rules:
  - http:
      paths:
      - path: /jenkins
        backend:
          serviceName: jenkins-ui
          servicePort: 8080
      - path: /nexus
        backend:
          serviceName: nexus-ui
          servicePort: 8081

rramkumar1 · 2019-01-16T05:28:27Z

@kdeng Did you take a look at #605?

surykatka · 2019-01-21T12:46:03Z

@rramkumar1 I'm having a similar issue to the one you have described in #605. I have sent you an e-mail with my setup but I'm also happy to continue the conversation online.

rramkumar1 mentioned this issue Sep 11, 2018

GCE ingress stucks on "Creating ingress" status, existing ingresses don't update #470

Closed

k8s-ci-robot added the kind/bug label Sep 11, 2018

rramkumar1 mentioned this issue Sep 12, 2018

Cherrypick of #434 on release 1.3 #472

Merged

aviresonai mentioned this issue Sep 16, 2018

GKE ingress with https load balancer and IAP/security policy enabled #469

Closed

rramkumar1 mentioned this issue Sep 17, 2018

Changes to ingress resource doesn't update forwarding rules most of the time in 1.10.6-gke.2 #477

Closed

apognu mentioned this issue Sep 19, 2018

Certificates are not added to GCP load-balancer cert-manager/cert-manager#916

Closed

k8s-ci-robot closed this as completed Sep 24, 2018
