Skip updating Endpoints if no relevant fields change #108078
@tnqn: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Does the end points controller really copy the pod resource version into subsets? |
Interesting, PR #50934 was supposed to do that. It does copy it: kubernetes/pkg/controller/endpoint/endpoints_controller.go, lines 254 to 264 in 1f041cc |
I'd be very interested to know why… uid I could see mattering… resourceVersion… not so much |
this was changed here |
I was looking at the wrong place; that reflect.DeepEqual has been there forever... I think it is correct to ignore ResourceVersion, but is it better to keep a stale number or to nil it out completely? |
EndpointSlices populate the Pod resourceVersion too
it seems this was the origin daed0af |
It looks like ResourceVersion has been populated since the very beginning (https://github.com/kubernetes/kubernetes/blob/release-1.16/pkg/controller/endpointslice/utils.go#L86). I'd agree with @liggitt that this was probably a mistake and we should likely just not set it altogether. |
@aojea @liggitt @robscott thanks for your comments.
#50934 skipped updating Endpoints by not triggering I also agree not setting resourceVersion altogether might make more sense, was just not sure if there was any reason to have it, though its value may have been stale since PR #50934. I could update the PR to remove resourceVersion if @thockin could confirm it's not needed anywhere. |
68732d0
to
e858bb5
@@ -277,6 +277,64 @@ func (sl portsInOrder) Less(i, j int) bool {
return h1 < h2
}

// EndpointSubsetsEqualIgnoreResourceVersion returns true if EndpointSubsets
// have equal attributes but excludes ResourceVersion of Pod.
func EndpointSubsetsEqualIgnoreResourceVersion(subsets1, subsets2 []v1.EndpointSubset) bool {
I think we should cover this function with a unit test
Thanks for the suggestion. Unit test added.
What will catch this function being incorrect if any new fields are added to EndpointSubset or any subtypes?
Nothing would catch it if the function is written this way. Copying the subsets and then resetting the resourceVersion fields would, but it costs more. Do you suggest the latter?
Results from the benchmark @aojea used in #108078 (comment):
BenchmarkEndpointSubsetEquality-40 71574 17538 ns/op 1673 B/op 69 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion-40 434190 2910 ns/op 192 B/op 14 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion2-40 53878 22584 ns/op 2938 B/op 79 allocs/op
func EndpointSubsetsEqualIgnoreResourceVersion2(subsets1, subsets2 []v1.EndpointSubset) bool {
if len(subsets1) != len(subsets2) {
return false
}
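resetResourceVersions := func(addresses []v1.EndpointAddress) {
for _, address := range addresses {
// TargetRef is an optional pointer, so guard against nil before writing through it.
if address.TargetRef != nil {
address.TargetRef.ResourceVersion = ""
}
}
}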
for i := 0; i < len(subsets1); i++ {
s1 := subsets1[i].DeepCopy()
s2 := subsets2[i].DeepCopy()
resetResourceVersions(s1.Addresses)
resetResourceVersions(s1.NotReadyAddresses)
resetResourceVersions(s2.Addresses)
resetResourceVersions(s2.NotReadyAddresses)
if !apiequality.Semantic.DeepEqual(s1, s2) {
return false
}
}
return true
}
Copying the subsets then resetting resourceVersions fields could but costs more, do you suggest the latter?
not really... that sounds expensive, since you'd have to deep copy first. another possibility:
Use reflection to inspect every field in the structs/substructs, modify each field one at a time in two otherwise equal objects. Make sure the equality check ignores specific fields we want to ignore (like resourceVersion), and catches all other differences
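The reflection-based check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ObjectReference` and `Address` are simplified stand-ins for `v1.ObjectReference` and `v1.EndpointAddress`, and `equalIgnoreRV`, `leafPaths`, and `undetectedFields` are hypothetical helpers that do not exist in the Kubernetes tree. The sketch handles only string leaf fields in nested, pointer-free structs.

```go
package main

import (
	"fmt"
	"reflect"
)

// ObjectReference and Address are simplified stand-ins (assumptions),
// not the real v1 API types.
type ObjectReference struct {
	Kind, Namespace, Name, UID, ResourceVersion string
}

type Address struct {
	IP        string
	Hostname  string
	TargetRef ObjectReference
}

// equalIgnoreRV is the hypothetical equality function under test.
func equalIgnoreRV(a, b Address) bool {
	a.TargetRef.ResourceVersion = ""
	b.TargetRef.ResourceVersion = ""
	return a == b
}

// leafPaths walks nested structs and returns the index path and dotted
// name of every non-struct field.
func leafPaths(t reflect.Type, prefix []int, name string) (paths [][]int, names []string) {
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		p := append(append([]int{}, prefix...), i)
		n := name + "." + f.Name
		if f.Type.Kind() == reflect.Struct {
			subPaths, subNames := leafPaths(f.Type, p, n)
			paths = append(paths, subPaths...)
			names = append(names, subNames...)
			continue
		}
		paths = append(paths, p)
		names = append(names, n)
	}
	return paths, names
}

// undetectedFields changes one leaf field at a time in a copy of base and
// returns the names of the fields whose change equal() did not notice.
func undetectedFields(base Address, equal func(a, b Address) bool) []string {
	var missed []string
	paths, names := leafPaths(reflect.TypeOf(base), nil, "Address")
	for i, p := range paths {
		mutated := base // plain value copy; the stand-ins hold no pointers
		field := reflect.ValueOf(&mutated).Elem().FieldByIndex(p)
		if field.Kind() != reflect.String {
			panic("sketch only handles string leaf fields: " + names[i])
		}
		field.SetString(field.String() + "-changed")
		if equal(base, mutated) {
			missed = append(missed, names[i])
		}
	}
	return missed
}

func main() {
	base := Address{IP: "10.0.0.1", Hostname: "a", TargetRef: ObjectReference{
		Kind: "Pod", Namespace: "ns", Name: "p", UID: "u", ResourceVersion: "1"}}
	// Only the deliberately ignored field should be reported.
	fmt.Println(undetectedFields(base, equalIgnoreRV))
}
```

A unit test built this way would compare the returned list against the set of fields that are meant to be ignored, so a newly added field that a hand-written comparison forgets shows up as an unexpected entry.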
Thanks for the suggestion. While I was implementing the idea, I found that Equalities supports adding a custom compare function for a specific type. If we specify a function for ObjectReference, no extra overhead is introduced. It actually reduces some:
BenchmarkEndpointSubsetEquality-40 64736 16305 ns/op 1674 B/op 69 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion-40 421993 2920 ns/op 192 B/op 14 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion2-40 88705 14132 ns/op 1709 B/op 36 allocs/op
var SemanticIgnoreResourceVersion = conversion.EqualitiesOrDie(
func(a, b v1.ObjectReference) bool {
a.ResourceVersion = ""
b.ResourceVersion = ""
return a == b
},
)
func EndpointSubsetsEqualIgnoreResourceVersion2(subsets1, subsets2 []v1.EndpointSubset) bool {
return SemanticIgnoreResourceVersion.DeepEqual(subsets1, subsets2)
}
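To show why a per-type compare function keeps working when new fields are added, here is a minimal stdlib-only sketch of the idea. It does not use or reproduce the apimachinery `Equalities` implementation: the `equalities` map, `deepEqual`, `subsetsEqual`, and `subset` are hypothetical names, and the three structs are simplified stand-ins for the v1 types. The walker handles only pointers, slices, and structs with exported fields, and treats nil and empty slices alike.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins (assumptions) for the v1 API types.
type ObjectReference struct {
	Kind, Namespace, Name, UID, ResourceVersion string
}

type EndpointAddress struct {
	IP        string
	TargetRef *ObjectReference
}

type EndpointSubset struct {
	Addresses []EndpointAddress
}

// equalities maps a type to a custom comparison used instead of the
// generic field-by-field walk.
type equalities map[reflect.Type]func(a, b reflect.Value) bool

// deepEqual walks both values in lockstep and defers to a registered
// custom function whenever it reaches a value of a registered type.
func (e equalities) deepEqual(a, b reflect.Value) bool {
	if a.Type() != b.Type() {
		return false
	}
	if fn, ok := e[a.Type()]; ok {
		return fn(a, b)
	}
	switch a.Kind() {
	case reflect.Ptr:
		if a.IsNil() || b.IsNil() {
			return a.IsNil() == b.IsNil()
		}
		return e.deepEqual(a.Elem(), b.Elem())
	case reflect.Slice:
		if a.Len() != b.Len() {
			return false
		}
		for i := 0; i < a.Len(); i++ {
			if !e.deepEqual(a.Index(i), b.Index(i)) {
				return false
			}
		}
		return true
	case reflect.Struct:
		for i := 0; i < a.NumField(); i++ {
			if !e.deepEqual(a.Field(i), b.Field(i)) {
				return false
			}
		}
		return true
	default:
		return reflect.DeepEqual(a.Interface(), b.Interface())
	}
}

// ignoreResourceVersion compares ObjectReference values with
// ResourceVersion blanked out on local copies.
var ignoreResourceVersion = equalities{
	reflect.TypeOf(ObjectReference{}): func(a, b reflect.Value) bool {
		x := a.Interface().(ObjectReference)
		y := b.Interface().(ObjectReference)
		x.ResourceVersion, y.ResourceVersion = "", ""
		return x == y
	},
}

func subsetsEqual(s1, s2 []EndpointSubset) bool {
	return ignoreResourceVersion.deepEqual(reflect.ValueOf(s1), reflect.ValueOf(s2))
}

// subset builds a one-address subset for the examples below.
func subset(ip, rv string) []EndpointSubset {
	return []EndpointSubset{{Addresses: []EndpointAddress{{
		IP: ip, TargetRef: &ObjectReference{Kind: "Pod", Name: "p", ResourceVersion: rv}}}}}
}

func main() {
	fmt.Println(subsetsEqual(subset("10.0.0.1", "1"), subset("10.0.0.1", "2"))) // true
	fmt.Println(subsetsEqual(subset("10.0.0.1", "1"), subset("10.0.0.2", "1"))) // false
}
```

Because everything except `ObjectReference` goes through the generic walk, a field added later to any of the other types is compared automatically, and the custom function only has to know which field to ignore.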
I think that changing the function that compares the subsets is the safest option; it also improves the performance #108078 (comment) ... and we can always revisit the resourceVersion field later ... but since we are going to depend on the new function I think we should add unit tests #108078 (comment) |
75eb51b
to
fd33db4
/retest |
even if we stopped setting resourceVersion (which I think we should), we'd want to avoid updating on only a resourceVersion change as well, in order to avoid a thundering herd update of every endpoint in the system on the first controller-manager restart after dropping the resourceVersion value |
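The restart scenario described above can be illustrated with a small simulation. This is a sketch under assumptions, not controller code: `TargetRef` and `Endpoint` are simplified stand-ins, and `afterRestart` and `countUpdates` are hypothetical helpers. Stored objects still carry a resourceVersion, while a newer controller that no longer sets it computes the desired state without one.

```go
package main

import "fmt"

// Simplified stand-ins (assumptions) for the API types.
type TargetRef struct{ Name, UID, ResourceVersion string }

type Endpoint struct {
	IP  string
	Ref TargetRef
}

// strictEqual compares every field, including ResourceVersion.
func strictEqual(a, b Endpoint) bool { return a == b }

// equalIgnoreRV compares every field except ResourceVersion.
func equalIgnoreRV(a, b Endpoint) bool {
	a.Ref.ResourceVersion, b.Ref.ResourceVersion = "", ""
	return a == b
}

// afterRestart returns n stored endpoints that still carry a resourceVersion
// and the matching desired endpoints computed without one.
func afterRestart(n int) (stored, desired []Endpoint) {
	for i := 0; i < n; i++ {
		ref := TargetRef{Name: fmt.Sprintf("pod-%d", i), UID: fmt.Sprintf("uid-%d", i)}
		ip := fmt.Sprintf("10.0.0.%d", i)
		desired = append(desired, Endpoint{IP: ip, Ref: ref})
		ref.ResourceVersion = fmt.Sprintf("%d", 1000+i)
		stored = append(stored, Endpoint{IP: ip, Ref: ref})
	}
	return stored, desired
}

// countUpdates returns how many stored endpoints would be rewritten.
func countUpdates(stored, desired []Endpoint, equal func(a, b Endpoint) bool) int {
	n := 0
	for i := range stored {
		if !equal(stored[i], desired[i]) {
			n++
		}
	}
	return n
}

func main() {
	stored, desired := afterRestart(100)
	fmt.Println(countUpdates(stored, desired, strictEqual))   // 100: every object is rewritten
	fmt.Println(countUpdates(stored, desired, equalIgnoreRV)) // 0: nothing relevant changed
}
```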
The resource version logic looks good to me, but I think we should merge the change to ignore resource version first, then separately merge the change to stop setting resource version. This makes it easier to backport just the "ignore resource version" change if we want to, and also makes sure tests still pass when the controller is encountering objects with resource version set but is ignoring it for "should I update" purposes (which is what the controller will encounter on upgrades of existing clusters) |
da2d3cd
to
3896645
@liggitt Sure, I have removed the patch that stops setting resource version and updated PR description. I will follow up with another PR after this is merged. |
lgtm, would like a second from tim |
Why does this include Conditions checks?
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: thockin, tnqn The full list of commands accepted by this bot can be found here. The pull request process is described here
@thockin Thanks for the review.
If it doesn't check Serving/Terminating Conditions, their changes would not trigger an EndpointSlice update. It was not an issue previously because Pod resourceVersion also changed when any Condition changed, so /retest |
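A small sketch of the point about Conditions, under assumptions: the types are simplified stand-ins (the conditions are plain booleans here), `equalReadyOnly` is a hypothetical comparison that looks at Ready but not Serving or Terminating, and `terminatingPodStopsServing` is a made-up scenario helper. Once resourceVersion is ignored, such a comparison no longer notices a change that only touches Serving.

```go
package main

import "fmt"

// Simplified stand-ins (assumptions) for the EndpointSlice types.
type TargetRef struct{ Name, UID, ResourceVersion string }

type Conditions struct{ Ready, Serving, Terminating bool }

type Endpoint struct {
	Address    string
	Conditions Conditions
	TargetRef  TargetRef
}

// equalReadyOnly ignores ResourceVersion and checks only the Ready condition.
func equalReadyOnly(a, b Endpoint) bool {
	a.TargetRef.ResourceVersion, b.TargetRef.ResourceVersion = "", ""
	return a.Address == b.Address && a.TargetRef == b.TargetRef &&
		a.Conditions.Ready == b.Conditions.Ready
}

// equalAllConditions ignores ResourceVersion but compares every condition.
func equalAllConditions(a, b Endpoint) bool {
	a.TargetRef.ResourceVersion, b.TargetRef.ResourceVersion = "", ""
	return a == b
}

// terminatingPodStopsServing returns an endpoint for a terminating Pod before
// and after it stops serving; the Pod resourceVersion changes as well.
func terminatingPodStopsServing() (before, after Endpoint) {
	before = Endpoint{
		Address:    "10.0.0.1",
		Conditions: Conditions{Ready: false, Serving: true, Terminating: true},
		TargetRef:  TargetRef{Name: "p", UID: "u", ResourceVersion: "7"},
	}
	after = before
	after.Conditions.Serving = false
	after.TargetRef.ResourceVersion = "8"
	return before, after
}

func main() {
	before, after := terminatingPodStopsServing()
	fmt.Println(equalReadyOnly(before, after))     // true: the Serving change is missed
	fmt.Println(equalAllConditions(before, after)) // false: an update is triggered
}
```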
/retest |
/test pull-kubernetes-integration |
…8078-upstream-release-1.21 Automated cherry pick of #108078: Skip updating Endpoints and EndpointSlice if no relevant
…8078-upstream-release-1.23 Automated cherry pick of #108078: Skip updating Endpoints and EndpointSlice if no relevant
…8078-upstream-release-1.22 Automated cherry pick of #108078: Skip updating Endpoints and EndpointSlice if no relevant
What type of PR is this?
/kind bug
What this PR does / why we need it:
When comparing EndpointSubsets and Endpoints, we ignore differences
in the ResourceVersion of Pods to avoid unnecessary updates caused by Pod
updates that we don't care about, e.g. an annotation update.
Otherwise, a periodic Service resync would intensively update Endpoints or
EndpointSlices whose Pods have irrelevant changes between two resyncs,
leading to delays in processing newly created Services. In a large-scale
cluster with thousands of such Endpoints, we observed 2 minutes of
delay when the resync happened.
Which issue(s) this PR fixes:
Fixes #108077
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: