
'helm install' gets stuck in error loop #981

Closed
maratoid opened this issue Jul 22, 2016 · 3 comments

@maratoid

I have an Ansible playbook that:

  1. runs 'helm init'
  2. runs 'helm repo add'
  3. runs a few 'helm install' commands

From time to time, the helm install commands get stuck in an error loop of some sort:

E0722 21:46:21.216925       1 portforward.go:327] an error occurred forwarding 43394 -> 44134: error forwarding port 44134 to pod tiller-rc-sf2a4_default, uid : pod not found ("tiller-rc-sf2a4_default")
2016/07/22 21:46:21 transport: http2Client.notifyError got notified that the client transport was broken EOF.
E0722 21:46:21.223846       1 portforward.go:327] an error occurred forwarding 43394 -> 44134: error forwarding port 44134 to pod tiller-rc-sf2a4_default, uid : pod not found ("tiller-rc-sf2a4_default")
2016/07/22 21:46:21 transport: http2Client.notifyError got notified that the client transport was broken EOF.
E0722 21:46:21.233468       1 portforward.go:327] an error occurred forwarding 43394 -> 44134: error forwarding port 44134 to pod tiller-rc-sf2a4_default, uid : pod not found ("tiller-rc-sf2a4_default")
2016/07/22 21:46:21 transport: http2Client.notifyError got notified that the client transport was broken EOF.
E0722 21:46:51.234500       1 portforward.go:267] error creating error stream for port 43394 -> 44134: Timeout occured
2016/07/22 21:46:51 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:42940->127.0.0.1:43394: read: connection reset by peer.
E0722 21:47:21.238377       1 portforward.go:289] error creating forwarding stream for port 43394 -> 44134: Timeout occured
2016/07/22 21:47:21 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:42941->127.0.0.1:43394: read: connection reset by peer.
E0722 21:47:51.245922       1 portforward.go:289] error creating forwarding stream for port 43394 -> 44134: Timeout occured
2016/07/22 21:47:51 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:42942->127.0.0.1:43394: read: connection reset by peer.
E0722 21:48:21.247027       1 portforward.go:267] error creating error stream for port 43394 -> 44134: Timeout occured
2016/07/22 21:48:21 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:42943->127.0.0.1:43394: read: connection reset by peer.
E0722 21:48:51.255121       1 portforward.go:289] error creating forwarding stream for port 43394 -> 44134: Timeout occured
2016/07/22 21:48:51 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:42944->127.0.0.1:43394: read: connection reset by peer.

Checking 'kubectl get pods' shows that the tiller-rc-sf2a4 pod is actually present. Could this be a race of some sort between 'helm init' fully initializing the Tiller RC and 'helm install'?
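
For reference, a minimal sketch of the kind of readiness wait that could sit between 'helm init' and the first 'helm install'. It is written against the current client-go API; the "app=helm,name=tiller" label selector, the kube-system namespace, and the kubeconfig path are assumptions for illustration, not something this issue confirms.

// Minimal sketch: poll until a Tiller pod reports Ready before running the
// first 'helm install'. Label selector, namespace, and kubeconfig path are
// assumptions for illustration.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

// waitForTiller polls until at least one Tiller pod reports the Ready
// condition, or the timeout expires.
func waitForTiller(client kubernetes.Interface, namespace string, timeout time.Duration) error {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        pods, err := client.CoreV1().Pods(namespace).List(context.TODO(), metav1.ListOptions{
            LabelSelector: "app=helm,name=tiller", // assumed Tiller labels
        })
        if err == nil {
            for _, p := range pods.Items {
                for _, c := range p.Status.Conditions {
                    if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
                        return nil // safe to port-forward now
                    }
                }
            }
        }
        time.Sleep(2 * time.Second)
    }
    return fmt.Errorf("no ready Tiller pod in %q after %s", namespace, timeout)
}

func main() {
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder path
    if err != nil {
        log.Fatal(err)
    }
    client, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }
    if err := waitForTiller(client, "kube-system", 2*time.Minute); err != nil {
        log.Fatal(err)
    }
    fmt.Println("tiller is ready")
}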

@technosophos
Member

Can you tell us about the environment that Tiller is running in? Local, remote, GKE, etc? Oh, and the Kubernetes version?

I've had that happen once or twice due to k8s API server issues (running k8s 1.3 inside of VirtualBox using scripts/local-cluster.sh). But my guess is that you may have hit a different bug.

@adamreese
Member

This could be due to Helm not checking whether the pod is ready before connecting. I'll check on it.
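
A minimal sketch of what that check could look like, assuming client-go types; isPodReady is a hypothetical helper, not existing Helm code.

package tunnel

import corev1 "k8s.io/api/core/v1"

// isPodReady reports whether a pod is Running and has the Ready condition
// set to True. Hypothetical helper: the client-side tunnel code could
// refuse to port-forward to Tiller until this returns true.
func isPodReady(pod *corev1.Pod) bool {
    if pod.Status.Phase != corev1.PodRunning {
        return false
    }
    for _, cond := range pod.Status.Conditions {
        if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
            return true
        }
    }
    return false
}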

@maratoid
Author

FYI, this happened again. We have a cluster with some sort of networking problem: pods seem to be able to talk to each other but cannot reach outside the flannel network. Trying to install a release on this cluster results in Helm getting stuck in a loop:

KUBECONFIG=/Users/marat/.kraken/maratoidTNG/admin.kubeconfig HELM_HOME=/Users/marat/.kraken/maratoidTNG/.helm helm install atlas/kubedns-0.1.0 --name kubedns --values /Users/marat/.kraken/maratoidTNG/atlas-kubedns.helmvalues
Fetched atlas/kubedns-0.1.0 to /Users/marat/dev/kraken/kubedns-0.1.0.tgz
2016/09/20 15:49:32 transport: http2Client.notifyError got notified that the client transport was broken EOF.
E0920 15:49:33.217943   61537 portforward.go:327] an error occurred forwarding 61453 -> 44134: error forwarding port 44134 to pod tiller-deploy-1979772362-1kqd9_kube-system, uid : exit status 1: 2016/09/20 22:49:33 socat[21989] E connect(5, AF=2 127.0.0.1:44134, 16): Connection refused
2016/09/20 15:49:33 transport: http2Client.notifyError got notified that the client transport was broken EOF.
E0920 15:49:33.439489   61537 portforward.go:327] an error occurred forwarding 61453 -> 44134: error forwarding port 44134 to pod tiller-deploy-1979772362-1kqd9_kube-system, uid : exit status 1: 2016/09/20 22:49:33 socat[21990] E connect(5, AF=2 127.0.0.1:44134, 16): Connection refused
2016/09/20 15:49:33 transport: http2Client.notifyError got notified that the client transport was broken EOF.
E0920 15:49:33.695460   61537 portforward.go:327] an error occurred forwarding 61453 -> 44134: error forwarding port 44134 to pod tiller-deploy-1979772362-1kqd9_kube-system, uid : exit status 1: 2016/09/20 22:49:33 socat[21991] E connect(5, AF=2 127.0.0.1:44134, 16): Connection refused
2016/09/20 15:49:33 transport: http2Client.notifyError got notified that the client transport was broken EOF.
E0920 15:49:33.922762   61537 portforward.go:327] an error occurred forwarding 61453 -> 44134: error forwarding port 44134 to pod tiller-deploy-1979772362-1kqd9_kube-system, uid : exit status 1: 2016/09/20 22:49:33 socat[22003] E connect(5, AF=2 127.0.0.1:44134, 16): Connection refused
2016/09/20 15:49:33 transport: http2Client.notifyError got notified that the client transport was broken EOF.

and the Tiller pod crashing:

kubectl --kubeconfig=/Users/marat/.kraken/maratoidTNG/admin.kubeconfig logs tiller-deploy-1979772362-1kqd9 --namespace=kube-system --follow
Tiller is running on :44134
Tiller probes server is running on :44135
Storage driver is ConfigMap
Cannot initialize Kubernetes connection: Get https://10.32.0.1:443/api: dial tcp 10.32.0.1:443: connect: network is unreachable
2016-09-20 22:52:33.535956 I | Getting release "kubedns" from storage
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6d2d69]

goroutine 41 [running]:
panic(0x1473dc0, 0xc42000a030)
        /usr/local/Cellar/go/1.7/libexec/src/runtime/panic.go:500 +0x1a1
k8s.io/helm/vendor/k8s.io/kubernetes/pkg/client/unversioned.(*ConfigMaps).Get(0xc420074b00, 0xc420461bd9, 0x7, 0x30, 0x1ec4fe0, 0x27)
        /Users/adamreese/p/go/src/k8s.io/helm/vendor/k8s.io/kubernetes/pkg/client/unversioned/configmap.go:58 +0x79
k8s.io/helm/pkg/storage/driver.(*ConfigMaps).Get(0xc4204611d0, 0xc420461bd9, 0x7, 0x1, 0x1, 0x0)
        /Users/adamreese/p/go/src/k8s.io/helm/pkg/storage/driver/cfgmaps.go:69 +0x62
k8s.io/helm/pkg/storage.(*Storage).Get(0xc4204611e0, 0xc420461bd9, 0x7, 0x493dd, 0x1fed4460200cb850, 0xcf73b4b1)
        /Users/adamreese/p/go/src/k8s.io/helm/pkg/storage/storage.go:36 +0xdd
main.(*releaseServer).uniqName(0xc420020030, 0xc420461bd9, 0x7, 0xc420075c00, 0x1, 0xc4203e7400, 0x0, 0x7f17f671b000)
        /Users/adamreese/p/go/src/k8s.io/helm/cmd/tiller/release_server.go:337 +0x376
main.(*releaseServer).prepareRelease(0xc420020030, 0xc420457400, 0xc42042d440, 0xc4200761c0, 0xc4200cba98)
        /Users/adamreese/p/go/src/k8s.io/helm/cmd/tiller/release_server.go:398 +0x6e
main.(*releaseServer).InstallRelease(0xc420020030, 0x7f17f66cfd08, 0xc4203d4810, 0xc420457400, 0xc42003bb18, 0x417cc8, 0x40)
        /Users/adamreese/p/go/src/k8s.io/helm/cmd/tiller/release_server.go:379 +0x3c
k8s.io/helm/pkg/proto/hapi/services._ReleaseService_InstallRelease_Handler(0x1592440, 0xc420020030, 0x7f17f66cfd08, 0xc4203d4810, 0xc420288040, 0x0, 0x0, 0xc4203e7400, 0x4)
        /Users/adamreese/p/go/src/k8s.io/helm/pkg/proto/hapi/services/tiller.pb.go:586 +0xdd
k8s.io/helm/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc42044a000, 0x1e971c0, 0xc42042d440, 0xc4200761c0, 0xc42033f140, 0x1eb8108, 0xc4203d46c0, 0x0, 0x0)
        /Users/adamreese/p/go/src/k8s.io/helm/vendor/google.golang.org/grpc/server.go:497 +0xa0b
k8s.io/helm/vendor/google.golang.org/grpc.(*Server).handleStream(0xc42044a000, 0x1e971c0, 0xc42042d440, 0xc4200761c0, 0xc4203d46c0)
        /Users/adamreese/p/go/src/k8s.io/helm/vendor/google.golang.org/grpc/server.go:646 +0x6ad
k8s.io/helm/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc4204618a0, 0xc42044a000, 0x1e971c0, 0xc42042d440, 0xc4200761c0)
        /Users/adamreese/p/go/src/k8s.io/helm/vendor/google.golang.org/grpc/server.go:323 +0xab
created by k8s.io/helm/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
        /Users/adamreese/p/go/src/k8s.io/helm/vendor/google.golang.org/grpc/server.go:324 +0xa3

Now, of course, this is not a working cluster, but I figured Tiller shouldn't just panic, and Helm should still fail gracefully.
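
For illustration, a minimal sketch of the kind of startup guard that would avoid the panic: surface the connection error and exit (or return a gRPC error) instead of carrying a nil client into the ConfigMap storage driver, which is where the nil pointer dereference above ends up. The names are illustrative, not the actual Tiller code.

package main

import (
    "fmt"
    "log"
    "os"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

// newKubeClient returns an error if the in-cluster connection cannot be
// established, instead of letting the server continue with a nil client.
// Illustrative sketch, not the actual Tiller startup code.
func newKubeClient() (kubernetes.Interface, error) {
    config, err := rest.InClusterConfig()
    if err != nil {
        return nil, fmt.Errorf("cannot load in-cluster config: %v", err)
    }
    client, err := kubernetes.NewForConfig(config)
    if err != nil {
        return nil, fmt.Errorf("cannot initialize Kubernetes connection: %v", err)
    }
    return client, nil
}

func main() {
    client, err := newKubeClient()
    if err != nil {
        // Fail fast instead of serving install requests with a broken backend.
        log.Printf("tiller: %v", err)
        os.Exit(1)
    }
    _ = client // hand off to the ConfigMap storage driver and the gRPC server
}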
