Check for idempotent replacements #9903

Frassle · 2022-06-21T13:27:30Z

Description

Some provider Create calls are idempotent: when we ask for some state to be created and it already exists, the provider just returns OK. This confuses the engine, which thinks a new resource has been created, so when it goes to delete the old resource being replaced it actually deletes the cloud resource that is also backing the new logical resource.

This changes the engine to error in these cases (where it can see that the old and new IDs are the same). An alternative would be to accept this and just logically delete the old resource from state without issuing the provider Delete call for it (i.e. set delete=true and retainOnDelete=true). I think that might be more user-friendly, but there might be some more obscure corner cases it could trip up on that I haven't thought of.
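The failure mode described above can be reproduced with a small, self-contained Go simulation. Everything here is hypothetical: fakeCloud stands in for a cloud API whose Create is idempotent (keyed by resource name), and naiveReplace models a create-before-delete replacement that does not compare IDs. It is a sketch of the problem, not the engine's code.

```go
package main

import "fmt"

// fakeCloud is a hypothetical cloud API whose Create is idempotent: creating a
// resource whose name already exists succeeds and returns the existing ID.
type fakeCloud struct {
	byName map[string]string // resource name -> physical ID
	live   map[string]bool   // physical IDs that currently exist
	nextID int
}

func newFakeCloud() *fakeCloud {
	return &fakeCloud{byName: map[string]string{}, live: map[string]bool{}}
}

// Create returns the existing ID if the name is taken, otherwise a fresh one.
func (c *fakeCloud) Create(name string) string {
	if id, ok := c.byName[name]; ok {
		return id
	}
	c.nextID++
	id := fmt.Sprintf("id-%d", c.nextID)
	c.byName[name] = id
	c.live[id] = true
	return id
}

// Delete removes the physical resource with the given ID.
func (c *fakeCloud) Delete(id string) {
	delete(c.live, id)
	for name, existing := range c.byName {
		if existing == id {
			delete(c.byName, name)
		}
	}
}

// naiveReplace models create-before-delete replacement without an ID check.
func naiveReplace(c *fakeCloud, name, oldID string) string {
	newID := c.Create(name)
	c.Delete(oldID)
	return newID
}

func main() {
	cloud := newFakeCloud()
	oldID := cloud.Create("my-bucket")
	newID := naiveReplace(cloud, "my-bucket", oldID)
	// State now records newID, but nothing in the cloud backs it any more.
	fmt.Println(newID == oldID, cloud.live[newID]) // prints "true false"
}
```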

Part of fixing pulumi/pulumi-aws#2009

Checklist

I have added tests that prove my fix is effective or that my feature works

I have updated the CHANGELOG-PENDING file with my change

Yes, there are changes in this PR that warrant bumping the Pulumi Service API version

AaronFriel · 2022-06-21T16:15:48Z

This changes the engine to error in these cases (where it can see the old and new ID are the same), an alternative would be to accept this and just logically delete the old resource from state but not issue the provider delete call for it (i.e. set delete=true and retainOnDelete=true). I think that might be more user friendly but there might be some more obscure corner cases it could trip up on that I haven't thought of.

If we never call Delete, it's fine for idempotent creates, but for non-idempotent creates it seems likely to result in a resource that's orphaned and outside of Pulumi's management, and the create will fail.

I think erroring on overlapping IDs is OK, but I wonder if introducing the error here is a breaking change for any consumers whose code is working fine somehow but relies on these overlapping IDs.

Hypothetically, and not necessarily a blocker for this: Is there a way to make this a warning and add a no-op engine event that contains the resource type that ends up being relayed to the service when users are using the SaaS backend? Could we then query our engine events to determine how often these warnings occur and on what resource types?

Frassle · 2022-06-21T16:40:03Z

If we never call Delete, it's fine for idempotent creates, but for non-idempotent creates it seems likely to result in a resource that's orphaned and outside of Pulumi's management, and the create will fail.
I think erroring on overlapping IDs is OK, but I wonder if introducing the error here is a breaking change for any consumers whose code is working fine somehow but relies on these overlapping IDs.

Yes, both of these depend on whether every instance of returning the same ID is due to idempotent creates, or whether there are other resources where the ID can be the same but refers to different resource objects. The latter seems incredibly unlikely to me, and would probably be the source of a multitude of bugs (how would import work for such a resource, for example?).

Hypothetically, and not necessarily a blocker for this: Is there a way to make this a warning and add a no-op engine event that contains the resource type that ends up being relayed to the service when users are using the SaaS backend? Could we then query our engine events to determine how often these warnings occur and on what resource types?

Maybe, but it would take some engine re-jigging to allow one step to issue a new step.

t0yv0 · 2022-06-21T20:04:53Z

I'm a bit uneasy with throwing an error. Can this break working programs, or were those already broken by actually deleting the cloud resources while believing them to be intact? Is there a way to make the error more actionable to the user? Can we explain why we're recreating the resource and what the user can do to either avoid recreating it or otherwise make progress?

an alternative would be to accept this and just logically delete the old resource from state but not issue the provider delete call for it

This feels more right to me. In pseudo-code, the higher-level logical operation is replace, and it would be right not to call provider.Delete but to accept the in-place version as the "replaced" one, e.g.:

     def recreate(provider, oldId):
       newId = provider.Create(...)
       if oldId != newId:
         provider.Delete(oldId)
       return newId

I appreciate this might need deeper surgery, though, so I'm also wary of obscure corner cases it could trip up on.
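For comparison, the recreate pseudo-code can be written as a small Go sketch. The Provider interface and idempotentProvider type are hypothetical stand-ins, not the engine's real provider interface, and Create is simplified to take a plain map of inputs.

```go
package main

import "fmt"

// Provider is a hypothetical, minimal stand-in for a resource provider.
type Provider interface {
	Create(inputs map[string]string) (string, error)
	Delete(id string) error
}

// recreate creates the replacement and only deletes the old resource if the
// provider returned a different ID, mirroring the pseudo-code above.
func recreate(p Provider, oldID string, inputs map[string]string) (string, error) {
	newID, err := p.Create(inputs)
	if err != nil {
		return "", err
	}
	if newID != oldID {
		if err := p.Delete(oldID); err != nil {
			return newID, err
		}
	}
	return newID, nil
}

// idempotentProvider derives the ID from the "name" input, so creating the
// same name twice yields the same ID. It records every Delete call.
type idempotentProvider struct {
	deleted []string
}

func (p *idempotentProvider) Create(inputs map[string]string) (string, error) {
	return "id-" + inputs["name"], nil
}

func (p *idempotentProvider) Delete(id string) error {
	p.deleted = append(p.deleted, id)
	return nil
}

func main() {
	p := &idempotentProvider{}
	id, err := recreate(p, "id-my-bucket", map[string]string{"name": "my-bucket"})
	fmt.Println(id, err, len(p.deleted)) // prints "id-my-bucket <nil> 0"
}
```

When the replacement gets a new name (and so a new ID), recreate still deletes the old resource as usual.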

but for non-idempotent creates it seems likely to result in a resource that's orphaned and outside of Pulumi's management, and the create will fail.

I'd like to unpack this to understand it better, as I don't see how this happens.

t0yv0 · 2022-06-21T20:07:42Z

pkg/resource/deploy/step.go

@@ -252,6 +252,11 @@ func (s *CreateStep) Apply(preview bool) (resource.Status, StepCompleteFunc, err
 		s.new.Outputs = outs
 	}

+	// If we've created a new resource as a replacement for an old one check that the ID is a new ID.
+	if s.replacing && s.new.Custom && !preview && s.new.ID == s.old.ID {


Curious: does it need !preview?

Also curious about the s.replacing && s.pendingDelete check below. Will it not try to delete unless s.old.Delete = true runs? Then maybe we should restrict the error further to s.pendingDelete as well as s.replacing?

Frassle · 2022-06-21

Curious does it need !preview.

Probably not; we won't have an ID at preview time.

Will it not try to delete unless s.old.Delete = true runs

Correct. This ensures that the resource doesn't get deleted when the ID comes back the same, because we error out of the engine first.

Then maybe we should restrict the error further to s.pendingDelete as well as s.replacing?

Ah yes, we probably should. If this is a CreateReplacement but "old" has already been deleted, then s.pendingDelete will be false.
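The refined guard discussed here can be sketched as a standalone predicate. The structs are hypothetical simplifications whose field names mirror the diff (replacing, pendingDelete, Custom, ID); they are not the engine's actual CreateStep or resource state types.

```go
package main

import "fmt"

// resourceState is a hypothetical, trimmed-down view of a resource in state.
type resourceState struct {
	Custom bool   // custom resources are the ones backed by a provider
	ID     string // physical ID assigned by the provider
}

// createStep is a hypothetical simplification of a create step.
type createStep struct {
	replacing     bool // this create is replacing an existing resource
	pendingDelete bool // the old resource still has to be deleted afterwards
	old, new      *resourceState
}

// reusedOldID reports whether the provider handed back the old resource's ID
// for a replacement whose old resource has not been deleted yet. Deleting
// "old" in that case would also delete the new resource.
func reusedOldID(s createStep) bool {
	return s.replacing && s.pendingDelete && s.new.Custom &&
		s.old != nil && s.new.ID == s.old.ID
}

func main() {
	prev := &resourceState{Custom: true, ID: "bucket-1"}
	next := &resourceState{Custom: true, ID: "bucket-1"}
	// Create-before-delete: the old resource is still pending deletion.
	fmt.Println(reusedOldID(createStep{replacing: true, pendingDelete: true, old: prev, new: next}))
	// Delete-before-replace: the old resource is already gone.
	fmt.Println(reusedOldID(createStep{replacing: true, pendingDelete: false, old: prev, new: next}))
}
```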

Frassle · 2022-06-21T20:25:14Z

Can this break working programs, or those were broken already by actually deleting the cloud resources but believing them to be intact?

I believe this to be the case. If anyone was hitting this before, they were probably deleting things by accident.

Is there a way to make the error more actionable to the user?

We could tell them to make sure DeleteBeforeReplace is set. Not sure if there's a CLI option for that as well, e.g. for --target-replace.

Can we explain why we're recreating the resource and what the user can do to either avoid recreating or make progress otherwise?

By the time we get here we don't know why we're doing a replace. But, as above, I think we can tell the user about DeleteBeforeReplace.

Frassle · 2022-06-21T20:26:51Z

This feels right-er to me, i.e. in pseudo-code the higher level logical operation is replace, that'd be right to not call provider.Delete but accept the in-place version as the "replaced" one e.g.:
I appreciate this might be a deeper surgery here though so wary of obscure corner cases it could trip up also.

I think we can just use retainOnDelete here to delete the state for the old resource without issuing an actual provider Delete command for it. But yeah, I need to think about whether there's anything this could trip up on.
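The state-level alternative (delete=true plus retainOnDelete=true) can be sketched as a pass over pending deletes. The stateEntry type and processPendingDeletes function are hypothetical and only model the idea: retained entries are dropped from state without a provider Delete call.

```go
package main

import "fmt"

// stateEntry is a hypothetical record in the deployment state.
type stateEntry struct {
	ID             string
	Delete         bool // old half of a replacement, pending deletion
	RetainOnDelete bool // drop from state without calling the provider
}

// processPendingDeletes returns the state with every Delete entry removed.
// Entries that also have RetainOnDelete are only forgotten; everything else
// marked Delete is passed to the provider's delete function first.
func processPendingDeletes(state []stateEntry, del func(id string) error) ([]stateEntry, error) {
	kept := make([]stateEntry, 0, len(state))
	for _, e := range state {
		switch {
		case !e.Delete:
			kept = append(kept, e)
		case e.RetainOnDelete:
			// Logical delete only: the cloud resource is left alone.
		default:
			if err := del(e.ID); err != nil {
				return nil, err
			}
		}
	}
	return kept, nil
}

func main() {
	// The provider returned the same ID for the replacement, so the old entry
	// is marked Delete and RetainOnDelete rather than being deleted for real.
	state := []stateEntry{
		{ID: "bucket-1", Delete: true, RetainOnDelete: true}, // old
		{ID: "bucket-1"}, // new
	}
	var calls []string
	state, err := processPendingDeletes(state, func(id string) error {
		calls = append(calls, id)
		return nil
	})
	fmt.Println(len(state), len(calls), err) // prints "1 0 <nil>"
}
```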

t0yv0 · 2022-06-21T20:29:14Z

Given we think this does not reduce the set of working programs (they were broken already), I'm OK with taking this change for now, in lieu of the more invasive approach.

mikhailshilkov · 2022-06-22T10:24:07Z

pkg/resource/deploy/step.go

+	// If we've created a new resource as a replacement for an old one before we've deleted the old one check
+	// that the ID is a new ID.
+	if s.replacing && s.pendingDelete && s.new.Custom && s.new.ID == s.old.ID {
+		return resourceStatus, nil, fmt.Errorf("provider returned existing ID (%s) for replacement resource", s.new.ID)


I'm very uncomfortable with this error. From a user's point of view, I can't read it any other way than "Pulumi got really confused about what it's doing and decided to blow up". I suspect we need to improve the workflow in some other way that would be net-beneficial for users and not just bail.

I think the choices are:

Improve the error to suggest setting DeleteBeforeReplace.

Try the idea above about accepting the create and eliding the provider delete for the "old" resource.

Frassle · 2022-06-24T08:58:01Z

After talking to Mikhail, we don't think this is safe either. There may be resources with non-unique IDs that use other properties to distinguish deletes (nothing in our provider contracts seems to exclude this). There's also the problem that by the time we hit this error, the cloud will already differ from the Pulumi state, so we'd need a refresh to fix things. So I'm going to close this off as well. We'll need to come up with a new plan to fix pulumi/pulumi-aws#2009.
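The non-unique-ID concern can be illustrated with a hypothetical resource type. Here attachment is invented for the example: its ID is its parent's ID, so two different attachments share an ID, and its Delete would use Policy as well. An ID-only check flags a replacement that is actually safe.

```go
package main

import "fmt"

// attachment is a hypothetical resource whose ID is its parent's ID, so two
// different attachments on the same parent share an ID. Its Delete would use
// Policy as well as ID to decide what to remove.
type attachment struct {
	ID     string
	Policy string
}

// idsMatch is the check from this PR: compare physical IDs only.
func idsMatch(prev, next attachment) bool { return prev.ID == next.ID }

// sameObject is what identifies an attachment: ID and Policy together.
func sameObject(prev, next attachment) bool {
	return prev.ID == next.ID && prev.Policy == next.Policy
}

func main() {
	prev := attachment{ID: "role-1", Policy: "read-only"}
	next := attachment{ID: "role-1", Policy: "read-write"}
	// The ID check would reject this replacement, even though deleting prev
	// would not touch next.
	fmt.Println(idsMatch(prev, next), sameObject(prev, next)) // prints "true false"
}
```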

Check for idempotent replacements

d65ed35

Frassle added the area/core label Jun 21, 2022

Frassle requested review from t0yv0 and stack72 June 21, 2022 13:27

Frassle added 3 commits June 21, 2022 15:55

lint

65938f2

Add to CHANGELOG

2cdd7fe

Fix tests

2f77414

t0yv0 reviewed Jun 21, 2022

Frassle added 3 commits June 22, 2022 09:57

Fix delete before replace

e637c8c

Merge remote-tracking branch 'origin/master' into idempotentReplace

a978b9d

lint

d62791a

mikhailshilkov requested changes Jun 22, 2022

This was referenced Jun 22, 2022

Resource Create methods may be idempotent #9925

Open

Pulumi fails to recreate Hashicorp Vault resources after provider change pulumi/pulumi-vault#176

Open

Set DeleteBeforeReplace for forced replaces #9909

Closed

Frassle added 2 commits June 23, 2022 20:41

Improve error message

8ac9347

Merge remote-tracking branch 'origin/master' into idempotentReplace

1162247

Frassle closed this Jun 24, 2022

Frassle deleted the fraser/idempotentReplace branch June 24, 2022 08:58

AaronFriel mentioned this pull request Jul 14, 2022

SQS QueuePolicy update strategy is incorrect pulumi/pulumi-aws#1923

Open

EronWright mentioned this pull request May 15, 2024

Don't delete a physical id if it was just created #15982

Open
