syncer: Use UID pre-condition for delete to downstream operations#628
syncer: Use UID pre-condition for delete to downstream operations#628astefanutti wants to merge 2 commits intokcp-dev:mainfrom astefanutti:pr-10
Conversation
There was a problem hiding this comment.
My inline comments assume what you're proposing makes sense, but after reviewing I'm not clear that is the case. I would appreciate it if you would update the PR description with a crisp explanation of the problem you are trying to solve and how exactly the proposed solution addresses that problem. Please emphasise the 'why' rather than the 'how'.
Assuming this PR evolves into something we want to merge, I would also expect test changes to cover the new behavior.
| func (c *Controller) deleteFromDownstream(ctx context.Context, gvr schema.GroupVersionResource, upstreamNamespace, name string, downstreamUID string) error { | ||
| if err := c.toClient.Resource(gvr).Namespace(upstreamNamespace).Delete(ctx, name, metav1.DeleteOptions{ | ||
| Preconditions: metav1.NewUIDPreconditions(downstreamUID), | ||
| }); err != nil && !k8serrors.IsNotFound(err) && !k8serrors.IsConflict(err) { |
There was a problem hiding this comment.
What would k8serrors.IsConflict being true indicate, and why is it desirable to silently ignore that?
There was a problem hiding this comment.
In that particular case, that is when the UID set in the pre-condition differs from the UID of the downstream object that actually exists, a conflict implies the object that was meant to be deleted no longer exists, and has been replaced with a new object, hence the operation that is being performed has already been fulfilled, and the error can be ignored, similarly to the 404 case.
| Patch(ctx, downstreamObj.GetName(), types.ApplyPatchType, data, metav1.PatchOptions{ | ||
| FieldManager: syncerApplyManager, | ||
| Force: pointer.Bool(true), | ||
| }, "status"); err != nil { |
There was a problem hiding this comment.
nit: Maybe add a blank line to make up for the lack of visual indication between the conditional and its block? Either that or maybe this form of error handling (if err...; err != nil) should only apply to single-line conditionals...?
There was a problem hiding this comment.
I agree, I'd be inclined to go with the "apply to single-line conditionals only". I'll do it if we decide to pursue the work on that PR.
| resyncPeriod = 10 * time.Hour | ||
| SyncerNamespaceKey = "SYNCER_NAMESPACE" | ||
| syncerApplyManager = "syncer" | ||
| targetUIDAnnotation = "workload.kcp.dev/target.uid" |
There was a problem hiding this comment.
(No action required) Maybe dashes are used instead of periods in the non-domain part of the key (e.g. target-uid)?
There was a problem hiding this comment.
I agree, I'll do it if we decide to pursue the work on that PR.
Thanks for the review. I've updated the description with the use case that I faced, and that explains why I'm proposing this change.
Agreed. |
|
That's helpful detail, thank you! Unless I'm missing something, what you're seeing is a symptom of a syncer that is not responding to pcluster events (also reported by #604). I think it might be preferable to first ensure reconciliation is bidirectional and that there exists sufficient test coverage to understand whether additional coordination is required. |
Yes, I think that would greatly reduce the likelihood of the race in that case, but I don't think that would eliminate it nonetheless. |
|
@astefanutti would you mind filing an issue for this? |
|
@ncdc sure, I've created kcp-dev/contrib-tmc#138. I'd be inclined to close that PR, and continue the discussion there, if that's alright with you. |
This fixes possible races when upstream objects are deleted and re-created.
An example of such races I've encountered, is:
In that particular example, it could be possible to prevent the race by simply ignoring 404 errors, but it seems more robust to use a concurrency control mechanism, to avoid unanticipated races for delete operations, as for update ones.
The UID of the downstream objects, used in the delete pre-conditions for concurrency control, are stored in an annotation. That seemed the easiest way to transport the downstream UID, but there may be better options I'm not aware of.