Unversioned deployment with multiple pods does not update all pods #110
Comments
It seems that Kubernetes doesn't manage to destroy the existing replicas in time. Could you try scaling down to 1 replica and then trying an update? If that solves the issue, Keel could do it for you. I imagine the workflow could be: …
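The suggested workflow itself is elided above; as a purely hypothetical sketch of the scale-down idea (not Keel's code), assuming a recent client-go, it might look roughly like the following. The function name, timings, and clientset wiring are made up for illustration:

```go
package keelsketch

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceUpdateByScalingDown is a hypothetical reading of the workflow suggested
// above: scale the deployment down to a single replica, swap the image so the
// remaining pod is recreated, then restore the original replica count.
func forceUpdateByScalingDown(ctx context.Context, cs kubernetes.Interface, ns, name, image string) error {
	deployments := cs.AppsV1().Deployments(ns)

	dep, err := deployments.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Remember the original replica count (nil means the default of 1).
	original := int32(1)
	if dep.Spec.Replicas != nil {
		original = *dep.Spec.Replicas
	}

	// Step 1: scale down to one replica.
	one := int32(1)
	dep.Spec.Replicas = &one
	if _, err := deployments.Update(ctx, dep, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// Give the scheduler time to terminate the surplus pods; a real
	// implementation would watch pod status instead of sleeping.
	time.Sleep(30 * time.Second)

	// Step 2: set the new image on every container and restore the replica count.
	dep, err = deployments.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for i := range dep.Spec.Template.Spec.Containers {
		dep.Spec.Template.Spec.Containers[i].Image = image
	}
	dep.Spec.Replicas = &original
	_, err = deployments.Update(ctx, dep, metav1.UpdateOptions{})
	return err
}
```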
That would work for non-production workloads, which is what working with the latest tag implies anyway. Another option would be to set spec.strategy.type to 'Recreate' in the deployment, which also results in some downtime but wouldn't require changes in Keel. I'm currently trying out a very rough patch where no reset is performed; instead, an env variable is set on each container, resulting in a new replica set each time. What is your opinion on this? I remember seeing some discussion earlier on the force-policy feature ticket.
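A minimal sketch of that env-variable idea, assuming a recent client-go; the function name, the KEEL_FORCE_UPDATE variable name, and the clientset wiring are placeholders for illustration, not the actual patch:

```go
package keelsketch

import (
	"context"
	"strconv"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// bumpForceUpdateEnv sketches the env-variable approach: changing any field of
// the pod template (here an env var on every container) makes the deployment
// roll out a new replica set without resetting the image first.
func bumpForceUpdateEnv(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	deployments := cs.AppsV1().Deployments(ns)

	dep, err := deployments.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Use a timestamp so every call produces a different pod template hash.
	stamp := strconv.FormatInt(time.Now().UnixNano(), 10)
	for i := range dep.Spec.Template.Spec.Containers {
		c := &dep.Spec.Template.Spec.Containers[i]
		updated := false
		for j := range c.Env {
			if c.Env[j].Name == "KEEL_FORCE_UPDATE" {
				c.Env[j].Value = stamp
				updated = true
			}
		}
		if !updated {
			c.Env = append(c.Env, corev1.EnvVar{Name: "KEEL_FORCE_UPDATE", Value: stamp})
		}
	}

	_, err = deployments.Update(ctx, dep, metav1.UpdateOptions{})
	return err
}
```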
Could work. Another option is to terminate pods; it could be done "slowly" so it's almost like a rolling update. Regarding non-production workloads, I guess it's reasonable to expect that production workloads would be versioned. Regarding that patch: feel free to open a work-in-progress PR :)
I tested Keel 0.4.7 with GKE server version 1.8, and "force update" does not work for me. Here is the sequence of events that happened: …
The notification I got says that the image was updated successfully, and yet the pod is not updated at all. (There is only 1 pod.) Instead of pulling tag …
Seems like the k8s scheduler behaviour changed. I think force update should be reimplemented with your suggestion; it seems like a clean approach. @taylorchu do you have to wait a little bit when you set replicas to 0, or does it terminate pods immediately?
No, I do not set the replica count to 0 myself.
@taylorchu started looking at this issue. One problem with setting replicas to 0 is that the autoscaler would stop working (it has to be unset). What about terminating all pods? That would result in k8s recreating them. If it was done with some pauses in the process, it could even mean no downtime.
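A minimal sketch of that pod-termination idea, assuming a recent client-go; the function name, the label-selector handling, and the fixed pause are assumptions for illustration rather than Keel's implementation:

```go
package keelsketch

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// restartPodsSlowly deletes the deployment's pods one at a time with a pause
// between deletions, so the deployment controller recreates each pod (pulling
// the new image) before the next one is removed, roughly emulating a rolling
// update. The selector is assumed to match the deployment's pod labels.
func restartPodsSlowly(ctx context.Context, cs kubernetes.Interface, ns, selector string, pause time.Duration) error {
	pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}

	for _, pod := range pods.Items {
		if err := cs.CoreV1().Pods(ns).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
		// Wait before deleting the next pod so the replacement has time to start.
		// A real implementation would watch for the new pod to become Ready.
		time.Sleep(pause)
	}
	return nil
}
```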
We're just starting to use Keel (on GCP K8s 1.8.7) and are hit with this problem on 0.6.1. As my 5c, I think emulating the rolling update would be the cleanest way to go. Also, we're quite happy running (a carefully selected set of) …
Hi, thanks. Will get this sorted ASAP. Do you think my suggested strategy of terminating pods would do the job? A terminated pod will always pull the new version, as I understand it.
Well, AFAIK you'd need to set …
Awesome, I am a bit swamped by work these days but will try to add and test this strategy either this evening or on the weekend :) |
That would be awesome! We'll be more than happy to help you test the changes if you like. |
Hi @The-Loeki, just pushed …. It would be nice if you could do more testing, as it should also solve that other issue, #153 (even added a unit test for that specific Docker registry :)). Migrated client-go (which is now split into multiple repos) to ….
Hi @rusenask, thanks for your hard work :) Today we've done the first round of testing on the alpha tag.
The Good: …
The Bad: …
The Ugly: …
I'd venture from the logs that it tries to auth against Quay with an empty/nonexistent secret or something, but that's just a guess.
Hi @The-Loeki thanks for trying it out :) Great regarding the good part. As for the bad, maybe it's angry about empty credentials (try sending empty basic auth). Not sure what changed though. I will dig into it. |
Did you see my updated comments? I'm hacking up a PR with fixed RBAC perms, but I'm not sure if you need to be able to delete replicasets & controllers too? |
No, I didn't see it. Yeah, totally forgot that perms are required for deletion. Only pod deletion permissions are required, thanks! Regarding Quay: after a simple unit test that pretty much does the same thing as for the Zalando registry, Quay returns an error (every registry wants to be unique). Will get it fixed.
We'll be deploying Harbor as our own registry service soon, so you might want to get more coffee ;) |
at least it's open source :) |
Apparently that error was just a log of a failed ping; the manifest was retrieved successfully. I have removed the Ping function from the registry client, as I can see that the public index.docker.io doesn't have that endpoint anymore either. New …. Merging into ….
Looks much better indeed
Fixed, available from …
I have a deployment with 5 replicas following a :latest tag. From the logs, I can see that Keel resets the image to 0.0.0 and after 5 seconds applies :latest.
The deployment seems to revert back to the previous version of the replica set, and the rollout does not continue for the remaining pods. The single pod that was recreated has the same replica set version, but the latest image was pulled via imagePullPolicy. During the reset, I can see 2 pods in the ErrImagePull state.
I'm running Keel 0.5.0-rc.1 with native webhooks, with keel.sh/policy: force, in Kubernetes 1.7.x.
The events from the deployment are: …
and in the end, the pods are: …
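For reference, here is a sketch of the kind of deployment the report above describes (5 replicas, a :latest image, and the keel.sh/policy: force annotation), built with current apps/v1 client-go types. The names "example" and "example/app" and the Always pull policy are assumptions; the elided event and pod listings above are not reconstructed here.

```go
package keelsketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleDeployment builds a deployment shaped like the one described in the
// report: 5 replicas, a :latest image, and the keel.sh/policy: force annotation.
// Names and the explicit Always pull policy are placeholders/assumptions.
func exampleDeployment() *appsv1.Deployment {
	replicas := int32(5)
	labels := map[string]string{"app": "example"}

	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:        "example",
			Annotations: map[string]string{"keel.sh/policy": "force"},
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "app",
						Image: "example/app:latest",
						// :latest images default to an Always pull policy in
						// Kubernetes; stated explicitly here for clarity.
						ImagePullPolicy: corev1.PullAlways,
					}},
				},
			},
		},
	}
}
```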