Pulumi gets stuck waiting on a deployment to be updated while it has updated #1502
From the #kubernetes community Slack: Pulumi CLI version 2.22.0.
Saw the issue in CI/CD and we were able to repro by running the same scripts locally.
We are doing that. Does not help.
--matteo
…On Tue, Mar 23, 2021 at 5:01 PM Tushar Shah wrote:

> Saw issue in CircleCI and we were able to repro by running the same scripts locally.
> We use --parallel 1 to limit the memory usage.
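For context on the flag quoted above: `--parallel` caps how many resource operations the Pulumi CLI runs concurrently. A minimal invocation, with the cap of 1 the reporter used to bound memory usage:

```sh
# Run at most one resource operation at a time to limit memory usage.
pulumi up --parallel 1
```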
Workaround for the issue, if you do the following:
There is a line (an annotation) that shows up: …
Removing this line and then doing a … unblocks the update.
It looks like this annotation is caused by running …
I haven't been able to reproduce this locally. Here's what I've tried:
I've also tried a few combinations of pulumi refresh and updating the Deployment metadata rather than the spec, but everything seems to be proceeding as expected. @monchier Are you still hitting this issue, and if so, do you have any more ideas on what I could do to repro?
I got some more context on this issue from the user. Initially, they were running a very large number of Deployments (5000+) in a single Namespace. Most of these Deployments were managed externally (not by Pulumi), but a small number were managed with Pulumi. These managed Deployments were the ones getting stuck. Moving the small number of managed Deployments to a separate Namespace resolved the issue. (It appears that the …)

This leads me to believe that the issue is related to the scale rather than a bug in the await logic itself. I have two initial theories:

1. The Deployment never actually reached a state that satisfied the await logic's readiness criteria.
2. The await logic's Event processing could not keep up with the volume of Events produced by that many Deployments.
Based on the Deployment status given in the initial issue report, I think we can eliminate option 1. For option 2, Event processing occurs serially and has not been optimized for performance. It seems likely that this would be a bottleneck, since it would be processing Events from every Deployment in the Namespace, not only the ones managed by Pulumi. I'll need to do some more testing to confirm this theory, but in the meantime, it sounds like moving the Pulumi-managed Deployments into a separate Namespace solved the problem for the user.
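For anyone hitting the same symptom, here is a minimal sketch of the workaround described above using `@pulumi/kubernetes` (the namespace name, labels, and image are placeholder assumptions):

```typescript
import * as k8s from "@pulumi/kubernetes";

// Dedicated Namespace for the handful of Pulumi-managed Deployments,
// isolating them from the thousands of externally managed ones.
const managedNs = new k8s.core.v1.Namespace("pulumi-managed", {
    metadata: { name: "pulumi-managed" },
});

const appLabels = { app: "my-service" }; // placeholder labels

// The Deployment lives in the quieter Namespace, so the await logic
// only has to process Events generated within that Namespace.
const deployment = new k8s.apps.v1.Deployment("my-service", {
    metadata: { namespace: managedNs.metadata.name },
    spec: {
        replicas: 1,
        selector: { matchLabels: appLabels },
        template: {
            metadata: { labels: appLabels },
            spec: {
                containers: [{ name: "my-service", image: "nginx:1.19" }],
            },
        },
    },
});
```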
I reviewed the relevant code and also wasn't able to reproduce locally, so this appears to be a performance issue caused by a large number of Deployments in the same Namespace. The await logic serially processes every incoming Event for the target Namespace, which likely can't keep up in extreme cases like this one. It may be worth optimizing the await logic in the future, but I'm going to close this issue for now, since we have a viable workaround and the problem seems to be a scaling issue rather than a correctness issue.
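Another option, if the await logic itself is the bottleneck, is the provider's documented `pulumi.com/skipAwait` annotation, which makes Pulumi consider the resource ready as soon as the API server accepts it. A sketch (note the trade-off: Pulumi will no longer fail the update if the rollout itself fails):

```typescript
import * as k8s from "@pulumi/kubernetes";

const labels = { app: "my-service" }; // placeholder labels

// With skipAwait, the provider skips its readiness checks entirely,
// so it never has to watch the Namespace's Event stream.
const deployment = new k8s.apps.v1.Deployment("my-service", {
    metadata: {
        annotations: { "pulumi.com/skipAwait": "true" },
    },
    spec: {
        replicas: 1,
        selector: { matchLabels: labels },
        template: {
            metadata: { labels },
            spec: {
                containers: [{ name: "my-service", image: "nginx:1.19" }],
            },
        },
    },
});
```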
There is a performance problem in our code here, due to the way we have implemented the await logic. The scenario which led to this is perfectly valid and should work with our provider, just as I assume …
There are a few intersecting issues here:
I am planning to target #1596 to fix this specific issue and prioritize the other linked issues subsequently.
Fixed by #1598
We are running Pulumi on CircleCI to release to our Kubernetes cluster. It is quite a standard setup, but we have many deployments (approximately 5000) in the same namespace. We only manage some of them with Pulumi, since the other deployments are application-level (the app creates and removes them dynamically).
Expected behavior
When the deployment has updated in Kubernetes, Pulumi should progress. This is the status of the deployment when Pulumi gets stuck: …
Current behavior
Pulumi is stuck indefinitely as …
As a workaround, we interrupt Pulumi and restart it. On the second run, Pulumi will progress just fine. Since we have multiple deployments, this happens for every deployment.
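A sketch of that restart workaround as a command sequence (assuming an interrupted run can leave the stack with a pending update, which `pulumi cancel` clears; whether that step is needed depends on the backend):

```sh
pulumi up            # hangs waiting on the Deployment; interrupt with Ctrl+C
pulumi cancel        # clear the in-progress update left by the interrupt, if any
pulumi up            # second run progresses past the stuck Deployment
```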