
Check if Image exists #69

Closed
sirupsen opened this issue Mar 18, 2017 · 3 comments

sirupsen commented Mar 18, 2017

If you deploy a SHA that doesn't exist, kubernetes-deploy will proceed anyway, and the container will sit in ImagePullBackOff until the image is available.

Should we consider blocking in kubernetes-deploy or failing the deploy early, instead of resorting to a timeout?

How can we check this with the appropriate credentials?

It's worth noting that at Shopify we currently do this check in our deployment wrapper around kubernetes-deploy (Capistrano).
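For context, registries that implement the Docker Registry HTTP API v2 let you test for an image with a single manifest request. A minimal sketch of that kind of pre-deploy check, in Python for illustration — the registry host, repo name, and token handling here are assumptions, not part of kubernetes-deploy:

```python
import requests

def image_exists(registry, repo, reference, token=None):
    """Return True if the manifest for `reference` (a tag or digest) exists."""
    headers = {
        # Ask for a v2 manifest; registries return 404 for unknown references.
        "Accept": "application/vnd.docker.distribution.manifest.v2+json",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    # HEAD /v2/<name>/manifests/<reference> -> 200 if present, 404 if not.
    resp = requests.head(
        f"https://{registry}/v2/{repo}/manifests/{reference}",
        headers=headers,
        timeout=10,
    )
    return resp.status_code == 200

# Example: abort early instead of letting pods sit in ImagePullBackOff.
if not image_exists("registry.example.com", "myapp/web", "abc123def"):
    raise SystemExit("image not found in registry; refusing to deploy")
```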

What do you think @KnVerey @kirs?

kirs (Contributor) commented Mar 18, 2017

I agree that it would be great to have this kind of check, but it could be problematic: for security reasons, the host that runs kubernetes-deploy might not have access to the container registry, and registry auth may be project-specific (VPN or token).
Maybe we should leave this up to users?

KnVerey (Contributor) commented Mar 19, 2017

I agree with Kir that we shouldn't assume the deploy host has access to the registry. Conversely, it would be possible (if stranger) for the deploy host to have access but the production cluster not to. In any case, we should definitely improve how we handle this situation.

Did you see warnings logged about the ImagePullBackOff? This code is supposed to log them, giving you the chance to realize the deploy is unrecoverable and abort. At minimum, we could improve that message to suggest an action.

#54 proposes aborting the deploy at some point based on observing this condition. We'd have to think carefully about when we make the "it's doomed" call, though... Last week we saw a case where issues with the production environment's link to the registry caused a huge deployment to roll out very slowly, with many (but not all) containers in ImagePullBackOff. Ideally that case wouldn't fail the deploy. Maybe we could flag this condition to the parent deployment object, which could make the call when all its children are persistently in that state... still a timeout, but hopefully a smarter one?

On the other hand, maybe we should keep the log-warning approach and expect deployments to specify progressDeadlineSeconds to actually fail the deploy on the Kubernetes side, not just from our gem's perspective. AFAICT, that field seems to be the Kubernetes answer to permanent failure conditions, including this one.
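For illustration, here's what that field looks like on a Deployment (a hypothetical manifest, not from this repo). With progressDeadlineSeconds set, Kubernetes flips the Deployment's Progressing condition to False with reason ProgressDeadlineExceeded once the rollout stalls past the deadline:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  # Mark the rollout as failed if it makes no progress for 5 minutes,
  # e.g. because the image can't be pulled.
  progressDeadlineSeconds: 300
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/myapp/web:abc123def
```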

KnVerey (Contributor) commented Jul 12, 2017

IMO #116 resolved this. We now inspect all pods in the new ReplicaSet, and if they are all failing for a recognized reason (including because of failure to pull image), the deploy will fail immediately. Please reopen if you disagree.
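The heuristic described here roughly amounts to the following — a sketch in Python for illustration only (the actual gem is Ruby, and the exact set of recognized reasons beyond ImagePullBackOff is an assumption):

```python
# Waiting reasons treated as unrecoverable for a new rollout (assumed set).
FATAL_WAITING_REASONS = {"ImagePullBackOff", "ErrImagePull", "CrashLoopBackOff"}

def deploy_doomed(new_replica_set_pods):
    """Fail fast only when *every* pod in the new ReplicaSet is stuck for a
    recognized fatal reason; partial flakiness (some pods pulling fine)
    falls through to the normal timeout instead of failing the deploy."""
    if not new_replica_set_pods:
        return False
    return all(
        pod.get("waiting_reason") in FATAL_WAITING_REASONS
        for pod in new_replica_set_pods
    )
```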

KnVerey closed this as completed Jul 12, 2017