Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of delete failures for targeted destroys #14735

Merged
merged 1 commit into from Dec 5, 2023

Conversation

justinvp
Copy link
Member

@justinvp justinvp commented Dec 4, 2023

When doing a targeted destroy, if a delete operation for a resource fails, the resource was still being removed from the state. Instead, if the delete operation for a resource fails, it should remain in the state.

Fixes #8164
Fixes #14416

When doing a targeted destroy, if a delete operation for a resource fails, the resource was still being removed from the state. Instead, if the delete operation for a resource fails, it should remain in the state.
@pulumi-bot
Copy link
Contributor

Changelog

[uncommitted] (2023-12-04)

Bug Fixes

  • [engine] Fix handling of delete failures for targeted destroys
    #14735

Comment on lines -391 to -400
// After executing targeted deletes, we may now have resources that depend on the resource that
// were deleted. Go through and clean things up accordingly for them.
if targetsOpt.IsConstrained() {
resourceToStep := make(map[*resource.State]Step)
for _, step := range deleteSteps {
resourceToStep[ex.deployment.olds[step.URN()]] = step
}

ex.rebuildBaseState(resourceToStep, false /*refresh*/)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the problematic code. It's assuming all the delete steps will be successful, removing the associated resources from the state, but that may not be the case if a delete fails.

Originally, I was thinking we'd have to rework this to only do it for successful delete operations, but thinking about it more, I think we don't have to call ex.rebuildBaseState at all for targeted deletes because if we're doing a targeted delete, we're already forced to specify dependents (or specify TargetDependents), so there shouldn't be a need to clean things up. Does this sound right, or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Frassle, @pgavlin, can you sanity check me on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're already forced to specify dependents

Are we? I think we check that only targetted resources or their dependents can be deleted, but it looks like the planner would allow deleting a parent and not a child? It probably shouldn't but do we have a test to prove that either way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds a test for this: https://github.com/pulumi/pulumi/pull/14735/files#diff-aa36a122dd8ff659d8ba8df709cf3fac67dce2b07d5ad8c51240eb76b08d7022R1664-R1666

If I try to target a parent and not its child, it errors.

@justinvp justinvp marked this pull request as ready for review December 4, 2023 16:29
@justinvp justinvp requested a review from a team December 4, 2023 16:36
}),
},
}, false, p.BackendClient, nil)
assert.Error(t, err) // Expect error because we didn't specify the dependency as a target or TargetDependents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use assert.Error but EqualError or at least ErrorContains to check this is the error we expect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add it, but it's not actually very specific in this case: we get a generic error that says "step generator errored".

What's happening is we write out error messages to the user and then set sawError = true:

if !targetsOpt.Contains(urn) && !sg.opts.TargetDependents {
d := diag.GetResourceWillBeDestroyedButWasNotSpecifiedInTargetList(urn)
// Targets were specified, but didn't include this resource to create. Report all the
// problematic targets so the user doesn't have to keep adding them one at a time and
// re-running the operation.
//
// Mark that step generation entered an error state so that the entire app run fails.
sg.deployment.Diag().Errorf(d, urn)
sg.sawError = true
deletingUnspecifiedTarget = true
}

func GetResourceWillBeDestroyedButWasNotSpecifiedInTargetList(urn resource.URN) *Diag {
return newError(urn, 2014, `Resource '%v' will be destroyed but was not specified in --target list.
Either include resource in --target list or pass --target-dependents to proceed.`)
}

The actual step generator errored error we get here from Run is from this:

// Figure out if execution failed and why. Step generation and execution errors trump cancellation.
if err != nil || stepExecutorError != nil || ex.stepGen.Errored() {
// TODO(cyrusn): We seem to be losing any information about the original 'res's errors. Should
// we be doing a merge here?
ex.reportExecResult("failed", preview)
if err != nil {
return nil, result.BailError(err)
}
if stepExecutorError != nil {
return nil, result.BailErrorf("step executor errored: %w", stepExecutorError)
}
return nil, result.BailErrorf("step generator errored")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok, I might have a look at making that error better then. It would be good to be able to test in places like this which bit of the engine err'd even if we don't bubble that error value specifically back to the user. Was a big point of the bail rework we did. But no need to be done in this PR, can tidy that later.

},
}, false, p.BackendClient, nil)
assert.Error(t, err)
validateSnap(snap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also assert that the resource is tagged with delete=true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that only happen if the resource is pending deletion due to replacement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeh just checked, it is only for replacements we do that.

@justinvp justinvp added this pull request to the merge queue Dec 5, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 5, 2023
@justinvp justinvp added this pull request to the merge queue Dec 5, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 5, 2023
@justinvp justinvp added this pull request to the merge queue Dec 5, 2023
Merged via the queue into master with commit 71e1ee7 Dec 5, 2023
45 checks passed
@justinvp justinvp deleted the justin/targetfaileddelete branch December 5, 2023 19:12
github-merge-queue bot pushed a commit that referenced this pull request Dec 7, 2023
### Features

- [cli] Add `--import-file` to `pulumi preview` to generate a
placeholder import file for every resource that needs to Create.
  [#14548](#14548)

- [sdk/nodejs] Add TypeScript definitions for the grpc and protobuf
generated code.
  [#14415](#14415)


### Bug Fixes

- [auto] Don't swallow error if EditDir is not found in ProgramTest.
  [#14695](#14695)

- [cli/display] Fix a panic in diff display when parsing YAML strings
  [#14710](#14710)

- [auto/python] Ensures that the project_settings has a main directory
for inline programs in python
  [#14709](#14709)

- [engine] Error if a resource's parent is a skipped create.
  [#14672](#14672)

- [engine] Warn if SDKs are trying to use old RegisterResource style
StackReferences.
  [#14678](#14678)

- [engine] Send resource inputs as inputs and state for Reads.
  [#14683](#14683)

- [engine] Engine now prefers stable plugin versions to pre-releases
when no explict version is given.
  [#14700](#14700)

- [engine] Fix handling of delete failures for targeted destroys
  [#14735](#14735)

- [sdkgen] Return all bind diagnostics in sdk-gen rather than just the
first.
  [#14661](#14661)

- [sdkgen/go] Fix compiling plain element type with plain maps
  [#14704](#14704)

- [sdkgen/go] Fix generating input collection types for enums when used
from an array of map of enums
  [#14744](#14744)

- [backend/service] Service backend now validates snapshots are valid on
load, same as the self managed backend. This can be disabled with
--disable-integrity-checking.
  [#14046](#14046)
@justinvp justinvp mentioned this pull request Dec 7, 2023
github-merge-queue bot pushed a commit that referenced this pull request Dec 7, 2023
### Features

- [cli] Add `--import-file` to `pulumi preview` to generate a
placeholder import file for every resource that needs to Create.
  [#14548](#14548)

- [sdk/nodejs] Add TypeScript definitions for the grpc and protobuf
generated code.
  [#14415](#14415)


### Bug Fixes

- [auto] Don't swallow error if EditDir is not found in ProgramTest.
  [#14695](#14695)

- [cli/display] Fix a panic in diff display when parsing YAML strings
  [#14710](#14710)

- [auto/python] Ensures that the project_settings has a main directory
for inline programs in python
  [#14709](#14709)

- [engine] Error if a resource's parent is a skipped create.
  [#14672](#14672)

- [engine] Warn if SDKs are trying to use old RegisterResource style
StackReferences.
  [#14678](#14678)

- [engine] Send resource inputs as inputs and state for Reads.
  [#14683](#14683)

- [engine] Engine now prefers stable plugin versions to pre-releases
when no explict version is given.
  [#14700](#14700)

- [engine] Fix handling of delete failures for targeted destroys
  [#14735](#14735)

- [sdkgen] Return all bind diagnostics in sdk-gen rather than just the
first.
  [#14661](#14661)

- [sdkgen/go] Fix compiling plain element type with plain maps
  [#14704](#14704)

- [sdkgen/go] Fix generating input collection types for enums when used
from an array of map of enums
  [#14744](#14744)

- [backend/service] Service backend now validates snapshots are valid on
load, same as the self managed backend. This can be disabled with
--disable-integrity-checking.
  [#14046](#14046)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants