Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[engine] Clear pending operations with refresh. #8435

Merged
merged 10 commits into from Mar 25, 2022

Conversation

pgavlin
Copy link
Member

@pgavlin pgavlin commented Nov 16, 2021

Just what it says on the tin. This is implemented by moving the check
for pending operations in the last statefile into the deployment
executor and making it conditional on whether or not a refresh is being
performed (either via pulumi refresh or pulumi up -r). Because
pending operations are not carried over from the base statefile, this
has the effect of clearing pending operations if a refresh is performed.

Part of #4265.

@pgavlin
Copy link
Member Author

pgavlin commented Nov 16, 2021

pending operations are not carried over from the base statefile

FWIW, this aspect of the implementation makes me a little uncomfortable. Another approach that I think is worth considering is something like this:

  1. if a refresh is not being performed, retain the current behavior
  2. otherwise:
    a. scan the pending operations for creates. remove these operations and issue warnings re: potentially orphaned resources
    b. retain all other pending operations and begin refreshing the stack
    c. when a resource R is refreshed, clear any pending operation associated with R

That approach feels a little more well-founded, as it only drops information re: the pending operation once we've explicitly taken action to resolve the operation's results. That said, taking this PR does not prevent us from adopting the "stronger" approach in the future.

@t0yv0
Copy link
Member

t0yv0 commented Nov 17, 2021

Before diving in implementation options, can I get a bit more context. Sounds like the user is running into interrupted updates and using our IUR procedure I was just looking up:

** Interrupted Update Recovery

We have a documented procedure for interrupted update recovery:

https://www.pulumi.com/docs/troubleshooting/#interrupted-update-recovery

***  pulumi cancel

What does it to precisely?

Note that this operation is very dangerous, and may leave the stack in
an inconsistent state if a resource operation was pending when the
update was canceled.

*** `pulumi stack export | pulumi stack import`

This clears your state’s stack of all pending operations.

Example:

  warning: removing pending operation 'creating' on '...' from snapshot
  import successful.

*** Manual edit of cloud state

For every warning that this (export|import) command prints out, you
should verify with your cloud provider whether or not this operation
was successful. If the operation was successful, and a resource was
created, you should delete that resource using your cloud provider’s
console, CLI, or SDK

*** `pulumi refresh`

I understand this reconciles by copying cloud state to stack state
when they diverge.

Is the idea that after the change the IUR becomes simply pulumi refresh? Will this automate the manual edit of cloud state? Sounds like from the above description the case causing problems is when a resource was created but the pending create operation is recorded, so we it exists in the cloud but pending operation is the only record on Pulumi side about the resource being managed by Pulumi.

@t0yv0
Copy link
Member

t0yv0 commented Nov 17, 2021

I'm reading the #4265 and it's helpful. Luke's comments constrain the scope of the fix. I need to do some more work here and take it for a spin to understand the changes.

I like the "Another approach that I think is worth considering is something like this" as it's spelled out what's happening and I feel like I understand it.

I feel like I need to understand and step through the approach in the PR. Open questions (notes to self to verify):

  1. when pending operations are dropped because pulumi refresh or pulumi up -r, are there still warnings for orphan resources

  2. are normal (non-refresh) updates still blocked on pending operations or they continue rolling forward with a warning (as in Luke's comment in 4625)

  3. double check the IUR procedure after this change

  4. should we consider backlogging the more invasive work of fully automated recovery (including automatic removal of orphan resources), or has it been backlogged? It appears possible theoretically

@pgavlin
Copy link
Member Author

pgavlin commented Nov 17, 2021

when pending operations are dropped because pulumi refresh or pulumi up -r, are there still warnings for orphan resources

Not as written. This would probably be easy to do.

are normal (non-refresh) updates still blocked on pending operations or they continue rolling forward with a warning (as in Luke's comment in 4625)

Yes, they are still blocked on pending operations.

double check the IUR procedure after this change

This is the thrust of this PR. With these changes, the IUR procedure becomes "run pulumi refresh, or use pulumi up -r in all cases to avoid the possibility of errors due to pending operations".

Just what it says on the tin. This is implemented by moving the check
for pending operations in the last statefile into the deployment
executor and making it conditional on whether or not a refresh is being
performed (either via `pulumi refresh` or `pulumi up -r`). Because
pending operations are not carried over from the base statefile, this
has the effect of clearing pending operations if a refresh is performed.

Fixes #4265.
@pgavlin pgavlin force-pushed the pgavlin/refreshPendingOperations branch from a3060a1 to 88f9ca4 Compare November 18, 2021 22:07
Copy link
Member

@t0yv0 t0yv0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes and the test looking good!

I just really think we should send warnings when --refresh ignores maybe orphaned resources.

I tried adding this but printPendingOperationsError seems tied to Result which seems tied to early termination (in PR or quick follow up). Which API can we use to send warnings into the console for the user? Thanks.

@lukehoban
Copy link
Member

are normal (non-refresh) updates still blocked on pending operations or they continue rolling forward with a warning (as in Luke's comment in 4625)

Yes, they are still blocked on pending operations.

I do not think I would consider #4265 resolved until we make this change. That is the key point of extreme friction in the current model.

If we want to land this refresh change first, that sounds reasonable, but I think we should leave #4265 open to track addressing the core user experience pain here.

@Pavel910
Copy link

Pavel910 commented Dec 17, 2021

I agree with @lukehoban, Pulumi should handle the pending_operations internally on updates. It's a big pain for all of our users. The entire Webiny CMS is relying on Pulumi, and we keep getting questions about this exact issue. Solving this would be a huge step forward! 🍻 ❤️

@Zaid-Ajaj
Copy link
Contributor

As discussed with @Frassle I've made a change to this PR where pending CREATE operations are not cleared upon a refresh and these will require user intervention to resolve potentially orphaned resources.

@Zaid-Ajaj Zaid-Ajaj merged commit 36c6533 into master Mar 25, 2022
@pulumi-bot pulumi-bot deleted the pgavlin/refreshPendingOperations branch March 25, 2022 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants