
Do something better for 409 errors #2073

Open
joeduffy opened this issue Oct 17, 2018 · 13 comments
Labels
area/cli (UX of using the CLI: args, output, logs), kind/enhancement (Improvements or new features)


@joeduffy
Member

joeduffy commented Oct 17, 2018

If you attempt a concurrent update while another is in progress, you'll see:

error: [409] Conflict: Another update is currently in progress.

This error is unfortunate for both of the possible reasons you'd end up in this state:

  1. If it's simply that another job is running, you probably want to wait for it to complete (retry), and you most definitely don't want to run pulumi cancel.
  2. If it's an orphaned update, you will need to run pulumi cancel, but you'll want to know for sure that it has been orphaned -- which the error message does not make clear.

It would seem we should be able to detect this somehow, especially since we already have the notion of a "keepalive" to renew tokens. We should put some more thought into this experience.

When we know it's not an orphaned situation, we probably want to retry by default -- especially in CI scenarios -- and give users control over the maximum retry period (e.g., perhaps 10 minutes is the maximum, since CI jobs are generally expected to be short). This will mean that we don't need to implement our "clever" CI job scheduling system across N different CI systems, some of which may not even support this capability.
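Until something like that exists, the workaround is a retry wrapper in the CI script itself. A minimal sketch (assuming a bash step; the ten-minute cap, the 30-second poll, and the grep on the error text are arbitrary illustrative choices, not behavior the CLI provides today):

```sh
#!/usr/bin/env bash
# Retry `pulumi up` while another update holds the stack, for at most ~10 minutes.
set -u

deadline=$((SECONDS + 600))
while true; do
  if output=$(pulumi up --yes 2>&1); then
    echo "$output"
    exit 0
  fi
  echo "$output"
  if ! grep -q "\[409\] Conflict" <<<"$output"; then
    exit 1                 # some other failure; retrying won't help
  fi
  if (( SECONDS >= deadline )); then
    echo "gave up waiting for the other update to finish" >&2
    exit 1
  fi
  sleep 30                 # wait before trying again
done
```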

@joeduffy added the area/cli and kind/enhancement labels Oct 17, 2018
@joeduffy added this to the 0.19 milestone Oct 17, 2018
@geekflyer

geekflyer commented Oct 17, 2018

Just to share one of my own experiences: a couple of days ago Pulumi crashed while attempting to deploy Istio. Subsequently I also ran into the error: [409] Conflict: Another update is currently in progress. issue. I didn't know how to tell Pulumi that the previous update had crashed (I was searching the docs for "unlock" etc., which matches the Terraform terminology) and didn't find out about the pulumi cancel command. After waiting for some time (10-15 minutes or so) I could continue, but it would've been handy to know about pulumi cancel :). I think the error message should explicitly contain a reference to some docs about pulumi cancel.
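For anyone who lands here in the same state: once you're sure the previous update really is dead (the process or CI job that started it is gone), the recovery is the command mentioned above. A minimal sketch:

```sh
# Only run this when nothing is actually still applying changes to the stack;
# cancelling a live update will leave the stack in an inconsistent state.
pulumi cancel          # cancels the in-progress update on the currently selected stack
pulumi cancel dev      # or name the stack explicitly ("dev" is a hypothetical stack name)
```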

@joeduffy
Member Author

This just came up when talking to @chrsmith about various CI integrations. @praneetloke @ellismg @pgavlin , let's make sure to tackle this during this sprint (ideally on the "early" side since if people try to operationalize a lot of our ongoing CI work, they are likely to hit this issue pretty quickly).

@praneetloke modified the milestones: 0.19, 0.21 Nov 21, 2018
@praneetloke modified the milestones: 0.21, 0.22 Mar 7, 2019
@lukehoban
Member

@praneetloke let's try to summarize a plan for what we could do here.

@lukehoban modified the milestones: 0.22, 0.23 Apr 20, 2019
@lukehoban assigned chrsmith and unassigned praneetloke Apr 20, 2019
@chrsmith modified the milestones: 0.23, 0.24 May 20, 2019
@joeduffy assigned ellismg and unassigned chrsmith Jun 9, 2019
@joeduffy
Member Author

joeduffy commented Jun 9, 2019

I just hit this one again, and it reminded me we could do better here. Assigning to @ellismg for triage, since this really is an SDK thing, not a service thing.

@joeduffy
Member Author

joeduffy commented Jun 9, 2019

Also one other interesting thing to note: This occurs after confirming the update, not during the preview. That makes it extra scary. I wonder, why can't we tell the user this during the preview?

@pgavlin
Member

pgavlin commented Jun 10, 2019

I wonder, why can't we tell the user this during the preview?

With a little work, we could tell the user during preview that another update was in progress. That would probably improve the UX in the common case. It is important to note, though, that we cannot fully eliminate the original issue: for proper concurrency control, we always need to check when attempting to start an update that no other update is in progress.

@ellismg modified the milestones: 0.24, 0.25 Jun 10, 2019
@lukehoban modified the milestones: 0.25, 0.26 Jul 19, 2019
@ellismg modified the milestones: 0.26, 0.27 Aug 5, 2019
@lblackstone
Member

It would be nice if we displayed a shortlink that resolved to https://www.pulumi.com/docs/reference/troubleshooting/#conflict here. Right now, all you get is the following message:

error: [409] Conflict: Another update is currently in progress.

@pgavlin assigned mikhailshilkov and unassigned ellismg Sep 6, 2019
@mikhailshilkov
Member

@pgavlin So, what is the plan for this sprint? Just improve the message to point to the docs, do so during the preview, or do something more to match the original description?

@pgavlin
Member

pgavlin commented Sep 9, 2019

I think we should just point to the docs. We should open a separate issue to track retrying, which we might even want to put under a separate flag.

@mikhailshilkov
Member

The point-to-docs PR is merged, so I'll remove the current milestone until we have a concrete plan for the more advanced scenarios.

@mikhailshilkov removed this from the 0.27 milestone Sep 10, 2019
@alexeyzimarev

I'm also experiencing multiple issues when trying to use Pulumi in a GitLab CI pipeline. GitLab can cancel the deployment if another pipeline starts, since the deployment job expires. Either I get the "another update is in progress" error or an inconsistent stack state, which has to be recovered manually.

Another issue I hit just now is this wicked error, which I cannot find documented anywhere:

updating failed [diff: ~spec]; error: post-step event returned an error: failed to save snapshot: [409] Conflict: The Update is not in progress.

Now I just want to kill the stack, but I get the following error when trying to destroy it:

error: the current deployment has 1 resource(s) with pending operations:

  • urn:pulumi:dev::gl-test::kubernetes:apps/v1:Deployment::pulumi-test, interrupted while updating
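
For what it's worth, the manual recovery I'd expect for that last error is to edit the pending operations out of the exported state and re-import it. A sketch, assuming the exported checkpoint stores them under deployment.pending_operations and that the interrupted operation is definitely no longer running anywhere:

```sh
# Export the stack state, drop the recorded pending operations, re-import, then reconcile.
pulumi stack export --file state.json
jq 'del(.deployment.pending_operations)' state.json > state.fixed.json
pulumi stack import --file state.fixed.json
pulumi refresh   # reconcile the state with the resources that actually exist
```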

@barakcoh

@joeduffy we're trying to implement a stack-per-PR strategy in our CI (CodeBuild) and came across this issue as well. Has there been any progress on this issue, or a suggested workaround?

@johnkors

Yeah, this is starting to hit our team (which uses GitHub Actions) pretty quickly. Looking into using GitHub Actions' built-in support for concurrency limits: https://docs.github.com/en/actions/using-jobs/using-concurrency#examples-using-concurrency-and-the-default-behavior

We could probably use the stack name as the concurrency group here? Something like the sketch below.
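
A sketch of the kind of workflow-level guard being described here, with the stack name baked into the concurrency group (the group name and the "dev" stack name are illustrative assumptions, not anything Pulumi-specific):

```yaml
# Serialize workflow runs that touch the same Pulumi stack.
concurrency:
  group: pulumi-dev          # "dev" is a hypothetical stack name
  cancel-in-progress: false  # don't cancel a run that may already be mid-update
```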
