New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of delete failures for targeted destroys #14735
Conversation
When doing a targeted destroy, if a delete operation for a resource fails, the resource was still being removed from the state. Instead, if the delete operation for a resource fails, it should remain in the state.
Changelog[uncommitted] (2023-12-04)Bug Fixes
|
// After executing targeted deletes, we may now have resources that depend on the resource that | ||
// were deleted. Go through and clean things up accordingly for them. | ||
if targetsOpt.IsConstrained() { | ||
resourceToStep := make(map[*resource.State]Step) | ||
for _, step := range deleteSteps { | ||
resourceToStep[ex.deployment.olds[step.URN()]] = step | ||
} | ||
|
||
ex.rebuildBaseState(resourceToStep, false /*refresh*/) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the problematic code. It's assuming all the delete steps will be successful, removing the associated resources from the state, but that may not be the case if a delete fails.
Originally, I was thinking we'd have to rework this to only do it for successful delete operations, but thinking about it more, I think we don't have to call ex.rebuildBaseState
at all for targeted deletes because if we're doing a targeted delete, we're already forced to specify dependents (or specify TargetDependents), so there shouldn't be a need to clean things up. Does this sound right, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're already forced to specify dependents
Are we? I think we check that only targetted resources or their dependents can be deleted, but it looks like the planner would allow deleting a parent and not a child? It probably shouldn't but do we have a test to prove that either way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR adds a test for this: https://github.com/pulumi/pulumi/pull/14735/files#diff-aa36a122dd8ff659d8ba8df709cf3fac67dce2b07d5ad8c51240eb76b08d7022R1664-R1666
If I try to target a parent and not its child, it errors.
}), | ||
}, | ||
}, false, p.BackendClient, nil) | ||
assert.Error(t, err) // Expect error because we didn't specify the dependency as a target or TargetDependents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use assert.Error
but EqualError
or at least ErrorContains
to check this is the error we expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add it, but it's not actually very specific in this case: we get a generic error that says "step generator errored".
What's happening is we write out error messages to the user and then set sawError = true
:
pulumi/pkg/resource/deploy/step_generator.go
Lines 1278 to 1290 in 3d37369
if !targetsOpt.Contains(urn) && !sg.opts.TargetDependents { | |
d := diag.GetResourceWillBeDestroyedButWasNotSpecifiedInTargetList(urn) | |
// Targets were specified, but didn't include this resource to create. Report all the | |
// problematic targets so the user doesn't have to keep adding them one at a time and | |
// re-running the operation. | |
// | |
// Mark that step generation entered an error state so that the entire app run fails. | |
sg.deployment.Diag().Errorf(d, urn) | |
sg.sawError = true | |
deletingUnspecifiedTarget = true | |
} |
pulumi/sdk/go/common/diag/errors.go
Lines 79 to 82 in 3d37369
func GetResourceWillBeDestroyedButWasNotSpecifiedInTargetList(urn resource.URN) *Diag { | |
return newError(urn, 2014, `Resource '%v' will be destroyed but was not specified in --target list. | |
Either include resource in --target list or pass --target-dependents to proceed.`) | |
} |
The actual step generator errored
error we get here from Run
is from this:
pulumi/pkg/resource/deploy/deployment_executor.go
Lines 326 to 337 in 3d37369
// Figure out if execution failed and why. Step generation and execution errors trump cancellation. | |
if err != nil || stepExecutorError != nil || ex.stepGen.Errored() { | |
// TODO(cyrusn): We seem to be losing any information about the original 'res's errors. Should | |
// we be doing a merge here? | |
ex.reportExecResult("failed", preview) | |
if err != nil { | |
return nil, result.BailError(err) | |
} | |
if stepExecutorError != nil { | |
return nil, result.BailErrorf("step executor errored: %w", stepExecutorError) | |
} | |
return nil, result.BailErrorf("step generator errored") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ok, I might have a look at making that error better then. It would be good to be able to test in places like this which bit of the engine err'd even if we don't bubble that error value specifically back to the user. Was a big point of the bail rework we did. But no need to be done in this PR, can tidy that later.
}, | ||
}, false, p.BackendClient, nil) | ||
assert.Error(t, err) | ||
validateSnap(snap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also assert that the resource is tagged with delete=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that only happen if the resource is pending deletion due to replacement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeh just checked, it is only for replacements we do that.
### Features - [cli] Add `--import-file` to `pulumi preview` to generate a placeholder import file for every resource that needs to Create. [#14548](#14548) - [sdk/nodejs] Add TypeScript definitions for the grpc and protobuf generated code. [#14415](#14415) ### Bug Fixes - [auto] Don't swallow error if EditDir is not found in ProgramTest. [#14695](#14695) - [cli/display] Fix a panic in diff display when parsing YAML strings [#14710](#14710) - [auto/python] Ensures that the project_settings has a main directory for inline programs in python [#14709](#14709) - [engine] Error if a resource's parent is a skipped create. [#14672](#14672) - [engine] Warn if SDKs are trying to use old RegisterResource style StackReferences. [#14678](#14678) - [engine] Send resource inputs as inputs and state for Reads. [#14683](#14683) - [engine] Engine now prefers stable plugin versions to pre-releases when no explict version is given. [#14700](#14700) - [engine] Fix handling of delete failures for targeted destroys [#14735](#14735) - [sdkgen] Return all bind diagnostics in sdk-gen rather than just the first. [#14661](#14661) - [sdkgen/go] Fix compiling plain element type with plain maps [#14704](#14704) - [sdkgen/go] Fix generating input collection types for enums when used from an array of map of enums [#14744](#14744) - [backend/service] Service backend now validates snapshots are valid on load, same as the self managed backend. This can be disabled with --disable-integrity-checking. [#14046](#14046)
### Features - [cli] Add `--import-file` to `pulumi preview` to generate a placeholder import file for every resource that needs to Create. [#14548](#14548) - [sdk/nodejs] Add TypeScript definitions for the grpc and protobuf generated code. [#14415](#14415) ### Bug Fixes - [auto] Don't swallow error if EditDir is not found in ProgramTest. [#14695](#14695) - [cli/display] Fix a panic in diff display when parsing YAML strings [#14710](#14710) - [auto/python] Ensures that the project_settings has a main directory for inline programs in python [#14709](#14709) - [engine] Error if a resource's parent is a skipped create. [#14672](#14672) - [engine] Warn if SDKs are trying to use old RegisterResource style StackReferences. [#14678](#14678) - [engine] Send resource inputs as inputs and state for Reads. [#14683](#14683) - [engine] Engine now prefers stable plugin versions to pre-releases when no explict version is given. [#14700](#14700) - [engine] Fix handling of delete failures for targeted destroys [#14735](#14735) - [sdkgen] Return all bind diagnostics in sdk-gen rather than just the first. [#14661](#14661) - [sdkgen/go] Fix compiling plain element type with plain maps [#14704](#14704) - [sdkgen/go] Fix generating input collection types for enums when used from an array of map of enums [#14744](#14744) - [backend/service] Service backend now validates snapshots are valid on load, same as the self managed backend. This can be disabled with --disable-integrity-checking. [#14046](#14046)
When doing a targeted destroy, if a delete operation for a resource fails, the resource was still being removed from the state. Instead, if the delete operation for a resource fails, it should remain in the state.
Fixes #8164
Fixes #14416