-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for idempotent replacements #9903
Conversation
If we never call Delete, it's fine for idempotent creates, but for non-idempotent creates it seems likely to result in a resource that's orphaned and outside of Pulumi's management, and the create will fail. I think erroring on overlapping IDs is OK, but I wonder if introducing the error here is a breaking change for any consumers whose code is working fine somehow but relies on these overlapping IDs. Hypothetically, and not necessarily a blocker for this: Is there a way to make this a warning and add a no-op engine event that contains the resource type that ends up being relayed to the service when users are using the SaaS backend? Could we then query our engine events to determine how often these warnings occur and on what resource types? |
Yes both of these depend on if every instance of returning the same ID is because of idempotent creates or if there are other resources where the ID can be the same but they refer to different resource objects. The latter seems incredibly unlikely to me, and would probably be the source of a multitude of bugs (how would import work for such a resource for example).
Maybe, but it would take some engine re-jigging to allow one step to issue a new step. |
I'm a bit uneasy with throwing an error. Can this break working programs, or those were broken already by actually deleting the cloud resources but believing them to be intact? Is there a way to make the error more actionable to the user? Can we explain why we're recreating the resource and what the user can do to either avoid recreating or make progress otherwise?
This feels right-er to me, i.e. in pseudo-code the higher level logical operation is replace, that'd be right to not call provider.Delete but accept the in-place version as the "replaced" one e.g.:
I appreciate this might be a deeper surgery here though so wary of
I'd like to unroll this to understand better as I don't see how this happens. |
pkg/resource/deploy/step.go
Outdated
@@ -252,6 +252,11 @@ func (s *CreateStep) Apply(preview bool) (resource.Status, StepCompleteFunc, err | |||
s.new.Outputs = outs | |||
} | |||
|
|||
// If we've created a new resource as a replacement for an old one check that the ID is a new ID. | |||
if s.replacing && s.new.Custom && !preview && s.new.ID == s.old.ID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious does it need !preview
.
Also curious below s.replacing && s.pendingDelete
check. Will it not try to delete unless s.old.Delete = true
runs? Then maybe we should restrict the error further to s.pendingDelete
as well as s.replacing
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious does it need !preview.
Probably not we won't have an ID at preview time.
Will it not try to delete unless s.old.Delete = true runs
Correct. This ensure that the resource doesn't get deleted in the case where ID comes back the same, we error out of the engine first.
Then maybe we should restrict the error further to s.pendingDelete as well as s.replacing?
Ah yes we probably should. If this is a CreateReplecement but "old" has already been deleted than s.pendingDelete will be false.
I believe this to be the case. If anyone was hitting this before they were probably deleting things be accident.
We could tell them to make sure DeleteBeforeCreate is set. Not sure if there's a CLI option for that as well for --targeted-replace.
By the time we get here we don't know why we're doing a replace. But see above I think we can tell the user about DeleteBeforeCreate. |
I think we can just use retainOnDelete here to delete the state for the old resource, but not issue an actual provider Delete command for it. But yeh I need to think if there's anything this could trip up on. |
Given we think this does not reduce the set of working programs (they were broken already), I'm OK with taking this change for now, in lieu of the more invasive approach. |
pkg/resource/deploy/step.go
Outdated
// If we've created a new resource as a replacement for an old one before we've deleted the old one check | ||
// that the ID is a new ID. | ||
if s.replacing && s.pendingDelete && s.new.Custom && s.new.ID == s.old.ID { | ||
return resourceStatus, nil, fmt.Errorf("provider returned existing ID (%s) for replacement resource", s.new.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very uncomfortable with this error. From a user's point of view, I can't read it any other way than "Pulumi got really confused about what it's doing and decided to blow up". I suspect we need to improve the workflow in some other way that would be net-beneficial for users and not just bail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the choices are:
- Improve the error to suggest setting DeleteBeforeReplace.
- Try the idea above about accepting the create and eliding the provider delete for the "old" resource.
Talking to Mikhail we don't think this is safe either. There may be resources with non-unique IDs which uses other properties to distinguish deletes (nothing in our provider contracts seems to exclude doing this). There's also the problem that by the time we hit this error the cloud will differ from pulumi state so we'd need a refresh to fix things. So I'm going to close this off as well. We'll need to come up with a new plan to fix pulumi/pulumi-aws#2009. |
Description
Some provider Create calls are idempotent, so when we ask for some state to be created if it already exists the provider just returns OK. This confuses the engine as it thinks a new resource has been created, so when it goes to delete the old resource being replaced it actually deletes the cloud resource that was also backing the new logical resource.
This changes the engine to error in these cases (where it can see the old and new ID are the same), an alternative would be to accept this and just logically delete the old resource from state but not issue the provider delete call for it (i.e. set delete=true and retainOnDelete=true). I think that might be more user friendly but there might be some more obscure corner cases it could trip up on that I haven't thought of.
Part of fixing pulumi/pulumi-aws#2009
Checklist