replace FAILED deployments with `helm upgrade --install --force`
#3597
Conversation
Forward note: I'm a little iffy on marking the previous release as SUPERSEDED.
Can this be the default behaviour if there is only one prior release and it failed? That seems to be the most common case so far. Beyond that, I agree with it not being the default behaviour.
One small comment for discussion; otherwise, I tested this and it is good to go.
@@ -37,6 +38,10 @@ func (s *ReleaseServer) UpdateRelease(c ctx.Context, req *services.UpdateRelease
     s.Log("preparing update for %s", req.Name)
     currentRelease, updatedRelease, err := s.prepareUpdate(req)
     if err != nil {
+        if req.Force {
+            // Use the --force, Luke.
💯
pkg/tiller/release_update.go
Outdated
    }
}

oldRelease.Info.Status.Code = release.Status_SUPERSEDED
I think this may deserve a new status. If I were troubleshooting and saw "superseded", I wouldn't know it was force-updated. Maybe `REPLACED`?
Hmm, good point. I also think it might be good to just keep this in the FAILED state so others cannot roll back to this release, which SUPERSEDED or REPLACED would allow. The more I think about this, the more I would prefer to retain the existing FAILED state.
Related comment: #3597 (comment)
@adamreese any opinions on this?
I think FAILED would also be a good idea
pkg/tiller/release_update_test.go
Outdated
Chart: &chart.Chart{
    Metadata: &chart.Metadata{Name: "hello"},
    Templates: []*chart.Template{
        {Name: "templates/something", Data: []byte("hello: world")},
I'm kind of sad this text isn't from Star Wars 😢
That can be arranged...
pkg/tiller/release_update_test.go
Outdated
compareStoredAndReturnedRelease(t, *rs, *res)

edesc := "Upgrade complete"
Nit: I think `expectedDescription` would be clearer here. Not a necessary change by any means, though.
Clearer variable names are always great.
That is good feedback, thanks @helgi. I'd be a little concerned about the behaviour being inconsistent for users, though. In this case, a user can expect the same behaviour every time they pass the flag. Perhaps we can decide at a later date if we should do that based on others' feedback in 2.8.2. How does that sound?
Yeah, I considered the inconsistency and cringed as I wrote that message. The scenario I run into a little too often (for comfort) is engineers throwing together a new Helm chart and not running it in minikube, but rather putting it directly into the dev CI system (copy pasta, basically), which leads to a failed first deployment a lot of the time. The user generally has no idea why it failed or why their future pushes do not fix the issue, and even when they have done the fix before (`helm delete --purge`), it is forgotten a lot of the time. Basically, first-time-deploy clumsiness UX issues. I'm happy with deferring the decision, but I did want to bring up the use case.
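For concreteness, a sketch of the failure loop described in that scenario; `myapp` and `./mychart` are placeholder names:

```sh
# First deploy of an untested chart fails, leaving the release FAILED:
helm upgrade --install myapp ./mychart

# Every subsequent push hits the same wall, since Helm cannot upgrade
# from the unknown state left behind by the failed first release:
helm upgrade --install myapp ./mychart

# The known fix is manual, destructive, and easy to forget:
helm delete --purge myapp
```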
pkg/tiller/release_update.go
Outdated
    return res, err
}

// pre-upgrade hooks
Should this be `pre-install`?
yes
pkg/tiller/release_update.go
Outdated
    return res, err
}

// post-upgrade hooks
Should this be `post-install`?
k, addressed all comments and pushed new changes since the last round of reviews. Should be good for another round.
This looks good, and your Star Wars reference is golden
When using `helm upgrade --install`, if the first release fails, Helm will respond with an error saying that it cannot upgrade from an unknown state. With this feature, `helm upgrade --install --force` automates the same process as `helm delete && helm install --replace`. It will mark the previous release as DELETED, delete any existing resources inside Kubernetes, then replace it as if it was a fresh install. It will then mark the FAILED release as SUPERSEDED.
Am I right in reading this as: if I have a chart deployed, break it, and then upgrade it, it will be fixed, not recreated?
@bacongobbler I'm also confused regarding this change.
Is the new behaviour limited to the case where the first release failed?
No, it kicks in any time a release fails to upgrade.
Is case 2 accurate?
We're still on Helm v2.7.0 because the current upgrade-over-failure behavior appears to be safer for our use case than deleting a deployment. Our releases usually fail due to hooks, but our hooks are idempotent Jobs, so it's usually safe and desirable to upgrade right over them with the pre-existing Kubernetes resources. If I'm understanding the new behavior correctly, I believe it would be possible for a Deployment to be deleted and for no Pods to be available to serve traffic mid-release if a hook failed on the previous release.
To answer case 2, …
👍 Yep, I'm curious.
It seems from #3208 (comment) that this might be the case when using …
replace FAILED deployments with `helm upgrade --install --force`
Failed Helm deployments cannot be upgraded without the --force flag. See: helm/helm#3597
When using `helm upgrade --install`, if the first release fails, Helm will respond with an error saying that it cannot upgrade from an unknown state.

With this feature, `helm upgrade --install --force` automates the same process as `helm delete && helm install --replace`. It will mark the previously FAILED release as DELETED, delete any existing resources inside Kubernetes, then replace it as if it was a fresh install. I did not want to make this the default behaviour of `helm upgrade --install` because this is a destructive operation that deletes resources in Kubernetes, and the operator should opt into and accept this behaviour.

closes #3353
refs discussion in #3437
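As a usage sketch of the equivalence described above (release and chart names are placeholders):

```sh
# Manual process the new flag automates:
helm delete myapp
helm install --replace --name myapp ./mychart

# Opt-in one-step equivalent introduced by this PR:
helm upgrade --install --force myapp ./mychart
```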