Fix race in informer transformers #124344
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Apr 17 07:56:14 UTC 2024. |
}

// queueReplaceActionLocked appends to the delta list for the object.
// Called must lock first.
-// Called must lock first.
+// Caller must lock first.
done
7bf49d5 to 57c1f49
// queueReplaceActionLocked appends to the delta list for the object.
// Caller must lock first.
func (f *DeltaFIFO) queueReplaceActionLocked(actionType DeltaType, isReplace bool, obj interface{}) error {
it's confusing to have queueReplaceAction take an isReplace bool param... if I understand correctly, we need this so Replace can indicate the internal action type even when emitDeltaTypeReplaced is false?
if so, would something like this be clearer?
diff --git a/staging/src/k8s.io/client-go/tools/cache/delta_fifo.go b/staging/src/k8s.io/client-go/tools/cache/delta_fifo.go
index 7160bb1ee72..5ab888d7c4d 100644
--- a/staging/src/k8s.io/client-go/tools/cache/delta_fifo.go
+++ b/staging/src/k8s.io/client-go/tools/cache/delta_fifo.go
@@ -440,22 +440,37 @@ func isDeletionDup(a, b *Delta) *Delta {
 // queueActionLocked appends to the delta list for the object.
 // Caller must lock first.
 func (f *DeltaFIFO) queueActionLocked(actionType DeltaType, obj interface{}) error {
+	return f.queueInternalActionLocked(actionType, actionType, obj)
+}
+
+// queueInternalActionLocked appends to the delta list for the object.
+// The actionType is emitted, and must honor emitDeltaTypeReplaced.
+// The internalActionType is only used within this function, and must ignore emitDeltaTypeReplaced.
+// Caller must lock first.
+func (f *DeltaFIFO) queueInternalActionLocked(actionType, internalActionType DeltaType, obj interface{}) error {
 	id, err := f.KeyOf(obj)
 	if err != nil {
 		return KeyError{obj, err}
 	}
...
 	if f.transformer != nil {
-		var err error
-		obj, err = f.transformer(obj)
-		if err != nil {
-			return err
+		_, isTombstone := obj.(DeletedFinalStateUnknown)
+		isSyncAction := internalActionType == Sync
...
 		}
 	}
@@ -638,7 +653,7 @@ func (f *DeltaFIFO) Replace(list []interface{}, _ string) error {
 			return KeyError{item, err}
 		}
 		keys.Insert(key)
-		if err := f.queueActionLocked(action, item); err != nil {
+		if err := f.queueInternalActionLocked(action, Replaced, item); err != nil {
 			return fmt.Errorf("couldn't enqueue object: %v", err)
 		}
 	}
agree it's cleaner - changed
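For readers following the suggestion, here is a minimal, self-contained sketch of the emitted-versus-internal action type split. It is not the client-go implementation: miniFIFO, tombstone, and delta are hypothetical stand-ins, there is no locking or key handling, and the exact skip condition is an assumption, since the suggested diff elides that part of the function body.

```go
package main

// DeltaType mirrors the names used in the suggested diff. Everything else
// in this sketch is a hypothetical stand-in, not the real DeltaFIFO.
type DeltaType string

const (
	Added    DeltaType = "Added"
	Deleted  DeltaType = "Deleted"
	Sync     DeltaType = "Sync"
	Replaced DeltaType = "Replaced"
)

// tombstone stands in for DeletedFinalStateUnknown.
type tombstone struct {
	Key string
	Obj interface{}
}

// delta records the emitted action type and the (possibly transformed) object.
type delta struct {
	Type   DeltaType
	Object interface{}
}

// miniFIFO is a stripped-down queue: no locking, no keys, no deduplication.
type miniFIFO struct {
	transformer func(interface{}) (interface{}, error)
	deltas      []delta
}

// queueAction covers the common case where the emitted and internal
// action types are the same.
func (f *miniFIFO) queueAction(actionType DeltaType, obj interface{}) error {
	return f.queueInternalAction(actionType, actionType, obj)
}

// queueInternalAction emits actionType, but decides whether to run the
// transformer from internalActionType and from whether obj is a tombstone.
// Assumption: tombstones and Sync actions carry objects that were already
// transformed, so the transformer is skipped for them.
func (f *miniFIFO) queueInternalAction(actionType, internalActionType DeltaType, obj interface{}) error {
	if f.transformer != nil {
		_, isTombstone := obj.(tombstone)
		isSyncAction := internalActionType == Sync
		if !isTombstone && !isSyncAction {
			var err error
			obj, err = f.transformer(obj)
			if err != nil {
				return err
			}
		}
	}
	f.deltas = append(f.deltas, delta{Type: actionType, Object: obj})
	return nil
}
```

In this sketch a Replace-style caller can emit Sync while passing Replaced as the internal type, so the object is still transformed even though the emitted delta is labelled Sync.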
57c1f49 to df6af03
Cluster failed to create master
/retest
// It's recommended for the TransformFunc to be idempotent.
// It MUST be idempotent if objects already present in the cache are passed to
// the Replace() to avoid re-mutating them. Default informers do not pass
// existing objects to Replace though.
I'm a little wary of this constraint. Consumers of the TransformFunc will not be aware of nor have control over how or where the object they receive came from. Nor should they care - that's an implementation detail of the DeltaFIFO. Also the DeltaFIFO itself does not have control over the origin of the objects from the upstream informers. I think it would be ideal for the DeltaFIFO to provide protection for the TransformFunc from potential data races. If an object already exists in the knownObjects cache and, more specifically, points to the same instance, then make a copy of it before passing to the TransformFunc. The TransformFunc would then truly be free to mutate the object w/o any additional idempotency constraint. In fact we could go a step further and elide invoking the TransformFunc altogether in this case since we know the object came from the cache and was already transformed. Then we wouldn't really even need to special-case the tombstone or Sync action.
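To make the proposal concrete, here is a rough sketch of the "copy if it is the same instance as the cached one" idea. It is only an illustration of the suggestion, not code from this PR or from client-go: object, deepCopy, and guardedTransform are hypothetical names, and a plain map stands in for the knownObjects cache.

```go
package main

// object is a hypothetical stand-in for a cached API object.
type object struct {
	Name   string
	Labels map[string]string
}

// deepCopy returns an independent copy, including the Labels map.
func (o *object) deepCopy() *object {
	c := &object{Name: o.Name, Labels: make(map[string]string, len(o.Labels))}
	for k, v := range o.Labels {
		c.Labels[k] = v
	}
	return c
}

// guardedTransform copies incoming before transforming it when incoming is
// the very same instance (pointer-equal) as the cached object for key.
// Objects that are not in the cache are handed to transform as-is, so the
// transform may mutate them in place.
func guardedTransform(cached map[string]*object, key string, incoming *object, transform func(*object) *object) *object {
	if existing, ok := cached[key]; ok && existing == incoming {
		incoming = incoming.deepCopy()
	}
	return transform(incoming)
}
```

The guard costs a cache lookup and pointer comparison per object, plus a deep copy whenever the instance matches.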
I agree that preventing more races is desired, however these for now these don't expose default usage of it. We should protect it further, but preventing existing issues seems more important to start with.
> however these for now these don't expose default usage of it.

I'm not clear on what you mean by this.
I think the text above would be a bit vague and confusing to users of the API. It would be for me if I hadn't studied the code and understood the issue and what it's trying to say. With the changes in this PR, the potential race issue with passing cached objects to the DeltaFIFO could only occur with a custom informer, and it's probably unlikely anyone is doing that, so I'd omit this text. In fact, this should really be documented as part of the DeltaFIFO API.
> however these for now these don't expose default usage of it.
> I'm not clear on what you mean by this.

After the fix in this PR to omit Sync events and DeletedFinalStateUnknown objects, none of the uses of DeltaFIFO that client-go sets up (default informers / clients, etc) feed objects already present in knownObjects back into the transformer.
That means the only way a DeltaFIFO user would encounter that is if they wired up a non-default usage of DeltaFIFO to pass the same objects into DeltaFIFO multiple times as if they were new objects.
Someone doing that is presumably the same person wiring up the TransformFunc, and they need to understand the implications of what they are doing. Doing this effectively already breaks the "TransformFunc sees the object before any other actor, and it is now safe to mutate the object in place instead of making a copy" statement / guarantee, so the TransformFunc author must be aware of the implications of mutating objects it gets when they are being provided by something with different behavior than default client-go clients / informers.

> If an object already exists in the knownObjects cache and, more specifically, points to the same instance then make a copy of it before passing to the TransformFunc

The tracking / comparison cost to safeguard TransformFunc in that case is too high, and even if we did, TransformFunc would have to be aware of and protect against disrupting a non-default object provider.
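As an illustration of what the idempotency constraint means for a transform author, the sketch below contrasts a transform that is safe to apply twice with one that is not. The function signature matches client-go's TransformFunc, func(interface{}) (interface{}, error); the pod type and both functions are hypothetical and exist only for this example.

```go
package main

// pod is a hypothetical stand-in for an API object seen by a TransformFunc.
type pod struct {
	Name          string
	ManagedFields []string
}

// stripManagedFields is idempotent: a second call finds nothing left to
// strip, so re-applying it to an already transformed object changes nothing.
func stripManagedFields(obj interface{}) (interface{}, error) {
	if p, ok := obj.(*pod); ok {
		p.ManagedFields = nil
	}
	return obj, nil
}

// appendSuffix is NOT idempotent: every call changes the name again, so an
// object that passes through the transformer twice ends up different from
// one that passes through once.
func appendSuffix(obj interface{}) (interface{}, error) {
	if p, ok := obj.(*pod); ok {
		p.Name += "-seen"
	}
	return obj, nil
}
```

A transform shaped like stripManagedFields ends in the same state however many times it runs on an object. That makes it tolerant of a producer that feeds the same object in more than once, although a second write to an object readers already hold still needs coordination.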
I understand the reasoning. My point is that the TransformFunc is part of a higher-level API and is much more widely used and visible than the DeltaFIFO. The New*Informer functions are really the external-facing aspect of the cache module and I'm sure most users of these aren't even aware of DeltaFIFO and its semantics (I wasn't until I investigated this issue). As you mentioned, the author of a non-default usage of DeltaFIFO, if there are any, is "presumably the same person wiring up the TransformFunc", therefore document the constraints and implications re: caching and the TransformFunc in the DeltaFIFO API rather than here. This makes sense b/c the constraint that the TransformFunc be idempotent only applies if a non-default DeltaFIFO producer is used. As I said, I think the paragraph above will only confuse most people. Anyway, that's my 2 cents...
// through the transform object separately (when it was added / updated prior
// to the delete), so the TransformFunc can likely safely ignore such objects
// (i.e., just return the input object).
// TransformFunc (similarly to ResourceEventHandler functions).
can delete this sentence fragment, the DeletedFinalStateUnknown bit no longer applies
done
// place to call the transform func.
//
// If obj is a DeletedFinalStateUnknown tombstone or the action is a Sync,
// then the object have already done through the transformer.
-// then the object have already done through the transformer.
+// then the object have already gone through the transformer.
done
/triage accepted
df6af03 to e9f7459
@liggitt - PTAL
/lgtm
LGTM label has been added. Git tree hash: 5e221e38115bde0dd943e48a3b1848a3289453be
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, wojtek-t
Fix #124337
/kind bug
/priority critical-urgent
/sig api-machinery
/assign @liggitt
/cc @linxiulei