Regression: Resources can get into a bad state if resource client init fails #364
Comments
I am confused about how this could happen. Do we have a repro or a timeline/set of logs to indicate how this could have happened? Until then I'm super reticent to believe that it's actually less likely because of the client logic cleanup. Marking as Q3 -- can always remove later if we decide it's not super high priority.
@hausdorff IIRC, the problem was that the resource create request was successful, but the watch client used by the await logic failed. We were not returning a partial failure in this case, so Pulumi didn't have a record of that resource being created. This can cause two error cases:
@lblackstone Do you have a repro for this?
I don't have a repro, and the original cause was buggy resource client logic. I think we should still audit the error path here, but I don't think it's likely to occur in practice at this point.
This commit will fix an almost-theoretical object leak in the code that awaits resource creation, update, or read. The issue in each case is that the final line of each of these `await` functions is a call to `Get`, which will fail if the cluster is unreachable, returning `nil` instead of an object. This causes Pulumi to believe the object was not successfully created. This commit will return an old version of the live object instead. Fixes #364.
I'm still skeptical that this is "really a bug" but I've put up #664, which fixes a (theoretical?) object leak that could have caused something like this.
Following #348, I ran into a bug where the provider failed to get clients with the following errors:
After this failure, I tried running `pulumi up` again, but got an error because the resource had actually been created successfully without that status being registered with the engine:
Also, a `pulumi refresh` did not correct the state mismatch.