helm install --wait does not wait for deployment pod readiness properly #3173

Closed
jgiles opened this Issue Nov 20, 2017 · 22 comments

jgiles commented Nov 20, 2017

I recently discovered and started using --wait:

--wait                   if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment are in a ready state before marking the release as successful. It will wait for as long as --timeout

It's great!

However, when I run helm install --wait, the command exits successfully before the pods in my deployments pass their configured readiness probe.

I did a little spelunking to confirm that my expectations were correct, and found this suspicious-looking code attempting to do the deployment readiness check:

func (c *Client) deploymentsReady(deployments []deployment) bool {
	for _, v := range deployments {
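		// The deployment counts as ready once the ready-replica count
		// reaches spec.replicas minus the rollout's maxUnavailable budget.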
		if !(v.replicaSets.Status.ReadyReplicas >= *v.deployment.Spec.Replicas-deploymentutil.MaxUnavailable(*v.deployment)) {
			c.Log("Deployment is not ready: %s/%s", v.deployment.GetNamespace(), v.deployment.GetName())
			return false
		}
	}
	return true
}

permalink: https://github.com/kubernetes/helm/blob/d790f7d843182f1c126cfa64989ffdd4e9583747/pkg/kube/wait.go#L174

I'm happy to send a PR to fix the if-check, but might need some advice on testing the fix.

cueBall123 commented Jan 2, 2018

I just ran into this issue. Is there a workaround for this until the fix is in place?

paolomainardi (Contributor) commented Jan 24, 2018

A reproducible test case can be found here: https://github.com/paolomainardi/helm-wait-test
A recording of the terminal session is here: https://asciinema.org/a/RdnLJpALOik8MLlzlFSUqCxOc

Helm version:

 ❯ helm version --tiller-namespace test                                                                                                                                
Client: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}

The same behaviour is also present with helm < 2.8.0.
If you need more info, please don't hesitate to ask.

paolomainardi (Contributor) commented Jan 25, 2018

ping @jgiles @cueBall123

paolomainardi (Contributor) commented Jan 25, 2018

I tried several things:

  • setting maxUnavailable: 0
  • setting replicaCount: 4

The result is the same:

test git:master ❯ helm upgrade --timeout 600 --install test --wait --tiller-namespace test .                                                                                           
Release "test" does not exist. Installing it now.
NAME:   test
LAST DEPLOYED: Thu Jan 25 15:18:30 2018
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME  TYPE       CLUSTER-IP   EXTERNAL-IP  PORT(S)  AGE
test  ClusterIP  10.3.242.21  <none>       80/TCP   2s

==> v1beta2/Deployment
NAME  DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
test  4        4        4           0          2s


NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app=test,release=test" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl port-forward $POD_NAME 8080:80

test git:master ❯ kubectl get pod -ntest
NAME                            READY     STATUS    RESTARTS   AGE
test-64d4979f9f-7mrxg           0/1       Running   0          5s
test-64d4979f9f-fk4b9           0/1       Running   0          5s
test-64d4979f9f-vxkkk           0/1       Running   0          5s
test-64d4979f9f-zxpns           0/1       Running   0          5s
tiller-deploy-566ff67bd-c2nv4   1/1       Running   0          43s

paolomainardi (Contributor) commented Jan 26, 2018

Just an update on this issue: it seems to be related to the apiVersion used in the deployment manifest.

Testing on:

kubectl version                                                                                                                                                     

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.5-gke.0", GitCommit:"2c2a807131fa8708abc92f3513fe167126c8cce5", GitTreeState:"clean", BuildDate:"2017-12-19T20:05:45Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}

Helm:

test git:master ❯ helm version --tiller-namespace test                                                                                                                                
Client: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"dirty"}
Server: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}

Test cases:

--wait works only when extensions/v1beta1 is used; all the other scenarios are broken.

thomastaylor312 (Collaborator) commented Jan 26, 2018

I think apps/v1 should work too, although I haven't tested it

paolomainardi (Contributor) commented Jan 26, 2018

@thomastaylor312 I can't test it, because (I guess) that apiVersion is not available before 1.9.0.

ljnelson commented Jan 26, 2018

@thomastaylor312 My contention would be that at least in the case of Deployments, apps/v1 would work, yes, because the two case blocks are exact copies of each other (maybe Go doesn't allow fallthrough?).

bacongobbler (Member) commented Jan 27, 2018

Yes, only extensions/v1beta1 and apps/v1 will work given the existing code. If someone wants to write a patch that adds apps/v1beta1 and apps/v1beta2, that will solve the issue. It should be a very easy fix to implement.
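
For reference, a minimal sketch of the kind of patch this describes (the helper name and the surrounding wiring are assumptions; the real change would go into the type switch in pkg/kube/wait.go):

import (
	appsv1 "k8s.io/api/apps/v1"
	appsv1beta1 "k8s.io/api/apps/v1beta1"
	appsv1beta2 "k8s.io/api/apps/v1beta2"
	extensionsv1beta1 "k8s.io/api/extensions/v1beta1"
	"k8s.io/apimachinery/pkg/runtime"
)

// isDeployment is a hypothetical helper: it matches Deployments from
// every API group/version that serves them, instead of only the
// extensions/v1beta1 and apps/v1 cases handled today.
func isDeployment(obj runtime.Object) bool {
	switch obj.(type) {
	case *extensionsv1beta1.Deployment,
		*appsv1beta1.Deployment,
		*appsv1beta2.Deployment,
		*appsv1.Deployment:
		return true
	default:
		return false
	}
}

Listing several types in one case clause keeps the readiness logic in a single place, so a new apps version only needs one more entry here.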

unguiculus (Member) commented Jan 27, 2018

#3372 is similar in that only certain API versions were considered. I fixed it by adding the missing ones. I wonder if we should try and find a more generic solution using reflection. Right now the code would break as soon as new API versions are added.
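
A sketch of what a more generic check could look like; not reflection exactly, but generic field access through unstructured maps (everything here is an assumption rather than Helm code, and it omits the maxUnavailable adjustment for brevity):

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
)

// deploymentReadyGeneric reads spec.replicas and status.readyReplicas
// out of the object's map form, so the same check works for any
// Deployment apiVersion, including ones added in the future.
func deploymentReadyGeneric(obj runtime.Object) (bool, error) {
	u, err := runtime.DefaultUnstructuredConverter.ToUnstructured(obj)
	if err != nil {
		return false, err
	}
	want, found, err := unstructured.NestedInt64(u, "spec", "replicas")
	if err != nil || !found {
		return false, err
	}
	ready, _, err := unstructured.NestedInt64(u, "status", "readyReplicas")
	if err != nil {
		return false, err
	}
	return ready >= want, nil
}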

paolomainardi (Contributor) commented Jan 27, 2018

The simplest fix: #3407

mattfarina added a commit that referenced this issue Jan 31, 2018

Merge pull request #3407 from paolomainardi/feature/3173_fix_wait_for_deployments

refs #3173: add appsv1beta1 and appsv1beta2 apiVersion

mattfarina (Collaborator) commented Jan 31, 2018

With #3407 being merged I believe this is fixed. Please re-open if the issue persists. It was tagged for the next patch release so it should be in that release.

davidchua commented Mar 9, 2018

I'm actually having this issue on helm v2.7.2, but with extensions/v1beta1:

Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
$ helm upgrade --install k8s-readiness-test ./repo --debug --namespace=k8s-readiness-test --values repo/values-dev.yaml --set imageTag=k8s-readiness-test-0d984aaa7732b603fcca5f3c3338e3f83e61c718 --wait --timeout 600

NAMESPACE: k8s-readiness-test
STATUS: DEPLOYED
<< ... truncated ... >>
==> v1beta1/Deployment
NAME                DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
somecool-service  1        1        1           0          1m

bacongobbler (Member) commented Mar 9, 2018

@davidchua this was fixed in 2.8.1

davidchua commented Mar 9, 2018

@bacongobbler I'm going through the code, but the only major changes I see between 2.7.2 and 2.8.1 in pkg/kube/wait.go for extensions/v1beta1 that might be relevant to this issue are the moves from kcs.Core().Pods to kcs.CoreV1().Pods and from kcs.Extensions().Deployments to kcs.ExtensionsV1beta1().Deployments.

Why would this affect the way --wait deals with this scenario?

Please pardon my ignorance. I'm also trying to avoid upgrading helm, at least for the moment.

davidchua commented Mar 9, 2018

I have upgraded helm to 2.8.1 and see the same issue: the pod is crashing, but helm upgrade --install --wait --timeout 600 still passes.

samchenraising commented Apr 25, 2018

#3296 seems like the same issue. I'm using extensions/v1beta1 with Helm v2.8.2, tested against k8s v1.7.12 and 1.8.0.

thomastaylor312 (Collaborator) commented May 4, 2018

@samchenraising Yes, it is. Let's use that as the issue to follow.

sta-szek commented May 8, 2018

Hi, I have the same issue with helm 2.8.2.
I always deploy 2 pods using RollingUpdate 25%/25%, always with a different docker image tag.

Helm seems to be waiting as described:
--wait if set, will wait until all Pods, PVCs, Services, and minimum number of Pods of a Deployment are in a ready state before marking the release as successful. It will wait for as long as --timeout

but after the first pod is ready, it exits; it does not wait for the second pod!

I think the problem is somewhere in this part: "will wait until ... minimum number of Pods of a Deployment are in a ready state ...".

I have two pods ready, but one is from the old release and one is from the new release.
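
To make the arithmetic concrete (numbers assumed from the scenario above): with spec.replicas = 2 and a maxUnavailable that resolves to 1, the check quoted at the top of this issue only requires ReadyReplicas >= 2 - 1 = 1. If the ready count being compared also includes the still-running pod from the old release, the condition is satisfied immediately and --wait returns before the second new pod is ready.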

mbelang commented May 18, 2018

Has anybody found a combination of Helm version + k8s version + k8s API version that actually works with helm --wait?

thomastaylor312 (Collaborator) commented May 18, 2018

@mbelang It works for me with apps/v1 on k8s 1.10 and Helm 2.9

mbelang commented May 18, 2018

@thomastaylor312 That's good news. Now we need to figure out whether it also works with apps/v1beta2. According to #3407, it should.
