NullPointerException on Kubernetes 1.6.1 #1219

Closed
lkysow opened this Issue Apr 17, 2017 · 9 comments

@lkysow

lkysow commented Apr 17, 2017

Linkerd Versions: 0.9.0 and 0.9.1 have same error
Kubernetes Versions: 1.6.1, 1.6.0

Stack trace on linkerd 0.9.0

E 0417 21:47:41.726 UTC THREAD21: k8s failed to list endpoints
java.lang.NullPointerException
	at io.buoyant.k8s.EndpointsNamer$.io$buoyant$k8s$EndpointsNamer$$getAddrs(EndpointsNamer.scala:116)
	at io.buoyant.k8s.EndpointsNamer$.io$buoyant$k8s$EndpointsNamer$$mkPorts(EndpointsNamer.scala:145)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc$1.apply(EndpointsNamer.scala:246)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc$1.apply(EndpointsNamer.scala:245)
	at scala.Option.map(Option.scala:146)
	at io.buoyant.k8s.EndpointsNamer$NsCache.io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc(EndpointsNamer.scala:245)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$10.apply(EndpointsNamer.scala:226)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$10.apply(EndpointsNamer.scala:225)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
	at scala.collection.immutable.List.flatMap(List.scala:344)
	at io.buoyant.k8s.EndpointsNamer$NsCache.initialize(EndpointsNamer.scala:225)
	at io.buoyant.k8s.EndpointsNamer$NsCache.initialize(EndpointsNamer.scala:213)
	at io.buoyant.k8s.Ns$$anonfun$io$buoyant$k8s$Ns$$watch$1$$anonfun$apply$1.apply(Ns.scala:77)
	at io.buoyant.k8s.Ns$$anonfun$io$buoyant$k8s$Ns$$watch$1$$anonfun$apply$1.apply(Ns.scala:76)
	at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$3.apply(Future.scala:1145)
	at com.twitter.util.Try$.apply(Try.scala:15)

Stack trace on 0.9.1

I 0417 21:57:49.256 UTC THREAD1: linkerd 0.9.1
...
E 0417 21:57:52.349 UTC THREAD18: k8s failed to list endpoints
java.lang.NullPointerException
        at io.buoyant.k8s.EndpointsNamer$.io$buoyant$k8s$EndpointsNamer$$getEndpoints(EndpointsNamer.scala:126)
        at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc$1.apply(EndpointsNamer.scala:221)
        at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc$1.apply(EndpointsNamer.scala:219)
        at scala.Option.map(Option.scala:146)
        at io.buoyant.k8s.EndpointsNamer$NsCache.io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc(EndpointsNamer.scala:219)
        at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$13.apply(EndpointsNamer.scala:200)
        at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$13.apply(EndpointsNamer.scala:199)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at io.buoyant.k8s.EndpointsNamer$NsCache.initialize(EndpointsNamer.scala:199)
        at io.buoyant.k8s.EndpointsNamer$NsCache.initialize(EndpointsNamer.scala:187)
        at io.buoyant.k8s.Ns$$anonfun$io$buoyant$k8s$Ns$$watch$1$$anonfun$apply$1.apply(Ns.scala:77)
        at io.buoyant.k8s.Ns$$anonfun$io$buoyant$k8s$Ns$$watch$1$$anonfun$apply$1.apply(Ns.scala:76)
        at com.twitter.util.Future$$anonfun$map$1$$anonfun$apply$3.apply(Future.scala:1145)
        at com.twitter.util.Try$.apply(Try.scala:15)
        at com.twitter.util.Future$.apply(Future.scala:163)
        at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1145)
        at com.twitter.util.Future$$anonfun$map$1.apply(Future.scala:1144)
        at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:107)
        at com.twitter.util.Promise$Transformer.k(Promise.scala:107)
        at com.twitter.util.Promise$Transformer.apply(Promise.scala:117)
        at com.twitter.util.Promise$Transformer.apply(Promise.scala:98)
        at com.twitter.util.Promise$$anon$1.run(Promise.scala:421)
        at com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:200)
        at com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:158)

Full Stack Traces: https://gist.github.com/lkysow/d82d9d59bb6a7776418917560a0c95dd

With these errors, Linkerd can't perform any lookups and therefore can't route any traffic, i.e. it is essentially dead.

@olix0r

Member

olix0r commented Apr 17, 2017

@siggy

Member

siggy commented Apr 17, 2017

confirmed repro with gke 1.6.1:

gcloud alpha container clusters create alpha --cluster-version 1.6.1 --zone us-central1-b --enable-kubernetes-alpha

also confirmed this works correctly on 1.6.0.

@siggy siggy removed the needs repro label Apr 17, 2017

@siggy siggy self-assigned this Apr 17, 2017

@lkysow

lkysow commented Apr 18, 2017

Also failing for us after downgrading to 1.6.0:

kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:40:50Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
➜  load-test-endpoint
➜  load-test-endpoint k get ds
NAME         DESIRED   CURRENT   READY     NODE-SELECTOR                   AGE
flannel-ds   5         5         4         beta.kubernetes.io/arch=amd64   2d
linkerd-ds   5         5         4         <none>                          2d
➜  load-test-endpoint k get pods |grep linkerd
linkerd-ds-7vzzv                        2/2       Running   0          12m
linkerd-ds-7xvkk                        2/2       Running   1          12m
linkerd-ds-90svd                        2/2       Running   0          12m
linkerd-ds-m1h7b                        2/2       Running   5          4m
linkerd-ds-rnlkv                        2/2       Running   0          1m
linkerd-ds-v45pf                        2/2       Running   0          1m
➜  load-test-endpoint k logs linkerd-ds-90svd linkerd --since=1m
I 0417 23:58:44.656 UTC THREAD10: k8s initializing kube-system
E 0417 23:58:44.667 UTC THREAD26: k8s failed to list endpoints
java.lang.NullPointerException
	at io.buoyant.k8s.EndpointsNamer$.io$buoyant$k8s$EndpointsNamer$$getEndpoints(EndpointsNamer.scala:126)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc$1.apply(EndpointsNamer.scala:221)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc$1.apply(EndpointsNamer.scala:219)
	at scala.Option.map(Option.scala:146)
	at io.buoyant.k8s.EndpointsNamer$NsCache.io$buoyant$k8s$EndpointsNamer$NsCache$$mkSvc(EndpointsNamer.scala:219)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$13.apply(EndpointsNamer.scala:200)
	at io.buoyant.k8s.EndpointsNamer$NsCache$$anonfun$13.apply(EndpointsNamer.scala:199)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
@olix0r

Member

olix0r commented Apr 18, 2017

We'll probably need to compare the raw responses for the Endpoints API between 1.5.x and 1.6.x

@siggy

Member

siggy commented Apr 18, 2017

looks like this is not just an api difference between 1.6.0 and 1.6.1, but a difference between /api/v1/namespaces/{namespace}/endpoints and /api/v1/namespaces/{namespace}/endpoints/{name} in 1.6.1

1.6.0

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl -n=kube-system get ep kube-controller-manager -o jsonpath='{.subsets}'
[]

$ kubectl -n=kube-system get ep -o jsonpath='{.items[?(@.metadata.name=="kube-controller-manager")].subsets}'
[]

1.6.1

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl -n=kube-system get ep kube-controller-manager -o jsonpath='{.subsets}'
[]

$ kubectl -n=kube-system get ep -o jsonpath='{.items[?(@.metadata.name=="kube-controller-manager")].subsets}'
<nil> # <---- BOOM
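
The two queries above suggest that a client reading the list response has to expect `subsets` to be null for some items. As a rough illustration only, the Scala sketch below uses hypothetical, simplified types (`Subset`, `Endpoints`, `naiveAddrs`, and `nullSubsetNames` are made up here and are not Linkerd's actual classes) to show how a traversal that assumes a list fails, and how the affected items could be spotted. It is written in script style, so it is meant for the REPL or a Scala script.

```scala
import scala.util.Try

// Hypothetical, simplified stand-ins for the Kubernetes Endpoints objects;
// these are NOT Linkerd's actual model classes.
case class Subset(addresses: Seq[String])
case class Endpoints(name: String, subsets: Seq[Subset])

// Naive traversal: assumes `subsets` is always a (possibly empty) list.
// Calling flatMap on a null reference throws a NullPointerException.
def naiveAddrs(ep: Endpoints): Seq[String] =
  ep.subsets.flatMap(_.addresses)

// Diagnostic: names of the items whose `subsets` came back as null.
def nullSubsetNames(items: Seq[Endpoints]): Seq[String] =
  items.filter(_.subsets == null).map(_.name)

val items = Seq(
  // Shape seen in the list response above: subsets is null.
  Endpoints("kube-controller-manager", null),
  // Made-up item and address, only for contrast.
  Endpoints("example-svc", Seq(Subset(Seq("10.0.0.1"))))
)

println(nullSubsetNames(items))
println(Try(naiveAddrs(items.head)).isFailure)
```

Wrapping the call in `Try` is only there to keep the demonstration from aborting; it is not a suggested fix.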

@siggy siggy added the in progress label Apr 18, 2017

@anubhavmishra

anubhavmishra commented Apr 18, 2017

@siggy I see this behaviour for 1.6.0 as well:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T23:37:30Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl -n=kube-system get ep kube-controller-manager -o jsonpath='{.subsets}'
[]

$ kubectl -n=kube-system get ep -o jsonpath='{.items[?(@.metadata.name=="kube-controller-manager")].subsets}'
<nil>

siggy added a commit that referenced this issue Apr 18, 2017

Fix k8s namer to handle null ep subsets (#1219)
Problem: User reported NullPointerException on Kubernetes 1.6.1.
Confirmed EndpointsList API differs in 1.6.1, where the subsets field can be
null rather than an empty list. In 1.6.0 it was an empty list.

Solution: Modify the k8s namer to handle null Endpoint subsets.

Validation: ./sbt test and confirmed on k8s 1.6.1 cluster

@siggy siggy added reviewable and removed in progress labels Apr 18, 2017

@siggy

Member

siggy commented Apr 18, 2017

Fix is up for review.

Also reported this at kubernetes/kubernetes#44593

siggy added a commit that referenced this issue Apr 18, 2017

Fix k8s namer to handle null ep subsets (#1219)
In Kubernetes 1.6.1, the EndpointsList API changed so that the `subsets`
field may be null.

To resolve this, subsets have been made optional.

A test case has been added to validate this.

Fixes #1219
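
As a hedged illustration of what "made optional" can look like, here is a minimal Scala sketch. The types and the helpers `mkEndpoints` and `addrs` are hypothetical and simplified; this is not the code from the actual fix. The idea is that wrapping a possibly-null value in `Option(...)` turns null into `None`, so callers never dereference null.

```scala
// Hypothetical, simplified model (NOT Linkerd's actual classes):
// `subsets` is optional, so a null or missing field becomes None.
case class Subset(addresses: Seq[String])
case class Endpoints(name: String, subsets: Option[Seq[Subset]])

// Build the model from a possibly-null raw value; Option(null) is None.
def mkEndpoints(name: String, rawSubsets: Seq[Subset]): Endpoints =
  Endpoints(name, Option(rawSubsets))

// Treat None and Some(Nil) the same way: no addresses.
def addrs(ep: Endpoints): Seq[String] =
  ep.subsets.getOrElse(Nil).flatMap(_.addresses)
```

With this shape, `addrs(mkEndpoints("kube-controller-manager", null))` yields an empty sequence instead of throwing, which matches how an empty `subsets` list would be handled.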

@siggy siggy closed this in #1223 Apr 19, 2017

siggy added a commit that referenced this issue Apr 19, 2017

Fix k8s namer to handle null ep subsets (#1219) (#1223)
In Kubernetes 1.6.1, the EndpointsList API changed so that the `subsets`
field may be null.

To resolve this, subsets have been made optional.

A test case has been added to validate this.

Fixes #1219
@ahmetb

ahmetb commented Oct 31, 2017

FWIW adding some context, this incompatibility has apparently caused an outage at a bank’s infrastructure. https://community.monzo.com/t/current-account-payments-may-fail-major-outage/26296/95

@wmorgan

Member

wmorgan commented Oct 31, 2017

Hi @ahmetb! Thanks for the note. To be clear, the root cause of that incident was a Kubernetes bug. This Linkerd bug compounded the problem, which is bad, but "caused an outage" is an overstatement.

(I'll also add that this bug has been fixed since Linkerd 1.0, released over 6 months ago.)
