Do something better for 409 errors #2073
Just to share one of my own experiences: a couple of days ago, Pulumi crashed once while attempting to deploy Istio. Subsequently, I also ran into the
This just came up when talking to @chrsmith about various CI integrations. @praneetloke @ellismg @pgavlin, let's make sure to tackle this during this sprint (ideally on the "early" side, since if people try to operationalize a lot of our ongoing CI work, they are likely to hit this issue pretty quickly).
@praneetloke let's try to summarize a plan for what we could do here.
I just hit this one again, and it reminded me we could do better here. Assigning to @ellismg for triage, since this really is an SDK thing, not a service thing.
Also one other interesting thing to note: This occurs after confirming the update, not during the preview. That makes it extra scary. I wonder, why can't we tell the user this during the preview? |
With a little work, we could tell the user during preview that another update was in progress. That would probably improve the UX in the common case. It is important to note, though, that we cannot fully eliminate the original issue: for proper concurrency control, we always need to check when attempting to start an update that no other update is in progress. |
It would be nice if we displayed a shortlink that resolved to https://www.pulumi.com/docs/reference/troubleshooting/#conflict here. Right now, all you get is the following message:
@pgavlin So, what is the plan for this sprint? Just improve the message to point to the docs, do so during the preview, or do something more to match the original description?
I think we should just point to the docs. We should open a separate issue to track retrying, which we might even want to put under a separate flag. |
The point-to-docs PR is merged, so I'll remove the current milestone until we have an exact plan for more advanced scenarios.
I also experience multiple issues when trying to use Pulumi in a GitLab CI pipeline. GitLab can cancel the deployment job when another pipeline starts, or when the job times out. Either I get the "another update is in progress" error or an inconsistent stack state, which has to be recovered manually. Another issue I experienced just now is this wicked error, which I cannot find documented anywhere:
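On the GitLab side, one way to avoid two deployments racing in the first place (not mentioned above, offered as a possible mitigation) is GitLab's `resource_group` keyword, which serializes all jobs that share a group so only one runs at a time. The stack name `mystack` below is illustrative:

```yaml
deploy:
  stage: deploy
  script:
    - pulumi up --yes --stack mystack
  # Jobs sharing a resource_group run one at a time; later pipelines queue
  # instead of racing the in-flight deployment.
  resource_group: pulumi-mystack
```

This doesn't help with jobs that are killed mid-update (the orphaned-lock case), but it prevents the genuinely concurrent case.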
Now I just want to kill the stack but I get the following error when trying to destroy it:
@joeduffy we're trying to implement a stack-per-PR strategy in our CI (CodeBuild) and came across this issue as well. Has there been any progress on this issue, or is there a suggested workaround?
Yeah, this is starting to hit our team using GitHub Actions pretty quickly. Looking into using GitHub Actions' built-in support for concurrency limits: https://docs.github.com/en/actions/using-jobs/using-concurrency#examples-using-concurrency-and-the-default-behavior We could probably use the stack name here?
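A sketch of what that workflow-level setting might look like; the `pulumi-` group prefix and `mystack` stack name are illustrative, not anything Pulumi prescribes:

```yaml
# Workflow-level concurrency group: runs that share a group are serialized.
concurrency:
  group: pulumi-mystack
  # false = queue the newer run instead of cancelling the in-flight one,
  # which is what you want for an update that is already mutating the stack.
  cancel-in-progress: false
```

Note that GitHub only keeps the latest pending run in the queue, so intermediate queued runs may be skipped.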
If you attempt to do a concurrent update while another is in progress, you'll see:

This error is unfortunate for both possible reasons you'd end up in this state: either another update really is in progress, or a previous update was orphaned (for example, because the CLI crashed or a CI job was killed mid-update). In the orphaned case, the fix is to run `pulumi cancel`, but you'll want to know for sure that the update has been orphaned, which the error message does not make clear. It would seem we should be able to detect this somehow, especially since we already have the notion of a "keepalive" to renew tokens. We should put some more thought into this experience.
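The keepalive-based detection idea could be sketched roughly as follows. Everything here is hypothetical: the names, the renewal cadence, and the grace multiplier are illustrative assumptions, not the Pulumi service's actual data model.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed cadence at which an active update renews its lease token.
KEEPALIVE_INTERVAL = timedelta(seconds=60)
# Tolerate a few missed renewals before declaring the update orphaned.
GRACE_MULTIPLIER = 3

def is_orphaned(last_renewed_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the update lease looks abandoned rather than active."""
    now = now or datetime.now(timezone.utc)
    return now - last_renewed_at > KEEPALIVE_INTERVAL * GRACE_MULTIPLIER
```

With something like this, the 409 error could distinguish "another update is genuinely running" from "a stale lease is holding the lock; `pulumi cancel` is safe."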
When we know it's not an orphaned situation, we probably want to retry by default -- especially in CI scenarios -- and give users control over the maximum retry period (e.g., perhaps 10 minutes is the maximum, since CI jobs are generally expected to be short). This will mean that we don't need to implement our "clever" CI job scheduling system across N different CI systems, some of which may not even support this capability.
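A minimal sketch of that retry-by-default behavior, shelling out to the CLI from a CI job. The 10-minute cap comes from the suggestion above; matching the conflict via a "[409]" marker in stderr is an assumption about the error text rather than a documented contract, and the stack name in the usage note is hypothetical.

```python
import subprocess
import time

# Assumed substring identifying the concurrency-conflict error in stderr.
CONFLICT_MARKER = "[409]"

def retry_update(cmd, max_wait=600, interval=15,
                 runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying while it fails with the 409 conflict, for up to max_wait seconds."""
    elapsed = 0
    while True:
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        # Only retry the concurrency conflict; fail fast on any other error.
        if CONFLICT_MARKER not in result.stderr or elapsed >= max_wait:
            raise RuntimeError(f"update failed: {result.stderr.strip()}")
        sleep(interval)
        elapsed += interval

# Usage in CI (hypothetical stack name):
#   retry_update(["pulumi", "up", "--yes", "--stack", "mystack"])
```

Injecting `runner` and `sleep` keeps the wrapper testable without a real Pulumi backend; in a CI script, the defaults would be used.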