Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip updating Endpoints if no relevant fields change #108078

Merged
merged 1 commit into from
Mar 2, 2022

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Feb 11, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

When comparing EndpointSubsets and Endpoints, we ignore the difference
in ResourceVersion of Pod to avoid unnecessary updates caused by Pod
updates that we don't care, e.g. annotation update.

Otherwise periodic Service resync would intensively update Endpoints or
EndpointSlice whose Pods have irrelevant change between two resyncs,
leading to delay in processing newly created Services. In a scale
cluster with thousands of such Endpoints, we observed 2 minutes of
delay when the resync happens.

Which issue(s) this PR fixes:

Fixes #108077

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Prevent unnecessary Endpoints and EndpointSlice updates caused by Pod ResourceVersion change

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 11, 2022
@k8s-ci-robot
Copy link
Contributor

@tnqn: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 11, 2022
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 11, 2022
@aojea
Copy link
Member

aojea commented Feb 11, 2022

but ... this way you don't publish the new resource version of the pods on the endpoints, I mean, the endpoints are not showing the latest information, is this ok? @robscott @liggitt

@liggitt
Copy link
Member

liggitt commented Feb 11, 2022

Does the end points controller really copy the pod resource version into subsets?

@aojea
Copy link
Member

aojea commented Feb 11, 2022

interesting, this PR was supposed to do that #50934

it does copy

return &v1.EndpointAddress{
IP: endpointIP,
NodeName: &pod.Spec.NodeName,
TargetRef: &v1.ObjectReference{
Kind: "Pod",
Namespace: pod.ObjectMeta.Namespace,
Name: pod.ObjectMeta.Name,
UID: pod.ObjectMeta.UID,
ResourceVersion: pod.ObjectMeta.ResourceVersion,
},
}, nil

subsets:
- addresses:
  - ip: 10.244.0.3
    nodeName: k123-control-plane
    targetRef:
      kind: Pod
      name: coredns-64897985d-czfg9
      namespace: kube-system
      resourceVersion: "53608"
      uid: 80ecac56-62bc-44d5-9f65-b0e376bf16c8
  - ip: 10.244.0.4
    nodeName: k123-control-plane
    targetRef:
      kind: Pod
      name: coredns-64897985d-lpxt2
      namespace: kube-system
      resourceVersion: "53648"
      uid: 6123a1cc-8fc9-4d3e-86f1-8fc5a12af84b
  ports:
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: dns
    port: 53
    protocol: UDP

@liggitt
Copy link
Member

liggitt commented Feb 11, 2022

it does copy

I'd be very interested to know why… uid I could see mattering… resourceVersion… not so much

@aojea
Copy link
Member

aojea commented Feb 11, 2022

this was changed here
#91399

@aojea
Copy link
Member

aojea commented Feb 11, 2022

I'm looking at the wrong place, that reflect.DeepEqual was there since forever ... I think that is correct to ignore ResourceVersion , but is better to have an stale number or completely nil it out?

@aojea
Copy link
Member

aojea commented Feb 11, 2022

endpointslices populates the podresurceversion too

    nodeName: k123-control-plane
    targetRef:
      kind: Pod
      name: coredns-64897985d-lpxt2
      namespace: kube-system
      resourceVersion: "53648"
      uid: 6123a1cc-8fc9-4d3e-86f1-8fc5a12af84b
  kind: EndpointSlice

it seems this was the origin daed0af
but I couldn't find a specific reason on the comments, @thockin do you have some memories about the resourceVersion fielld? is it needed for something?

@robscott
Copy link
Member

It looks like ResourceVersion has been populated since the very beginning (https://github.com/kubernetes/kubernetes/blob/release-1.16/pkg/controller/endpointslice/utils.go#L86). I'd agree with @liggitt that this was probably a mistake and we should likely just not set it altogether.

@tnqn
Copy link
Member Author

tnqn commented Feb 12, 2022

@aojea @liggitt @robscott thanks for your comments.

interesting, this PR was supposed to do that #50934

#50934 skipped updating Endpoints by not triggering syncService when the received Pod event doesn't contain relevant change. However, syncService can also be triggered by periodic service resync and regular service update.

I also agree not setting resourceVersion altogether might make more sense, was just not sure if there was any reason to have it, though its value may have been stale since PR #50934. I could update the PR to remove resourceVersion if @thockin could confirm it's not needed anywhere.

@@ -277,6 +277,64 @@ func (sl portsInOrder) Less(i, j int) bool {
return h1 < h2
}

// EndpointSubsetsEqualIgnoreResourceVersion returns true if EndpointSubsets
// have equal attributes but excludes ResourceVersion of Pod.
func EndpointSubsetsEqualIgnoreResourceVersion(subsets1, subsets2 []v1.EndpointSubset) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should cover this function with an unit test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Unit test added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will catch this function being incorrect if any new fields are added to EndpointSubset or any subtypes?

Copy link
Member Author

@tnqn tnqn Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It couldn't catch it if written in this way. Copying the subsets then resetting resourceVersions fields could but costs more, do you suggest the latter?

The performance got from the bench @aojea used in #108078 (comment):

BenchmarkEndpointSubsetEquality-40                         71574             17538 ns/op            1673 B/op         69 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion-40         434190              2910 ns/op             192 B/op         14 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion2-40         53878             22584 ns/op            2938 B/op         79 allocs/op
func EndpointSubsetsEqualIgnoreResourceVersion2(subsets1, subsets2 []v1.EndpointSubset) bool {
	if len(subsets1) != len(subsets2) {
		return false
	}

	resetResourceVersions := func(addresses []v1.EndpointAddress) {
		for _, address := range addresses {
			address.TargetRef.ResourceVersion = ""
		}
	}
	for i := 0; i < len(subsets1); i++ {
		s1 := subsets1[i].DeepCopy()
		s2 := subsets2[i].DeepCopy()

		resetResourceVersions(s1.Addresses)
		resetResourceVersions(s1.NotReadyAddresses)
		resetResourceVersions(s2.Addresses)
		resetResourceVersions(s2.NotReadyAddresses)

		if !apiequality.Semantic.DeepEqual(s1, s2) {
			return false
		}
	}

	return true
}

Copy link
Member

@liggitt liggitt Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying the subsets then resetting resourceVersions fields could but costs more, do you suggest the latter?

not really... that sounds expensive, since you'd have to deep copy first. another possibility:

Use reflection to inspect every field in the structs/substructs, modify each field one at a time in two otherwise equal objects. Make sure the equality check ignores specific fields we want to ignore (like resourceVersion), and catches all other differences

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. While I was implementating the idea, I found Equalities supports adding custom compare function for specific type. If we specify a function for ObjectReference, no extra overhead will be introduced. Actually it will reduce some:

BenchmarkEndpointSubsetEquality-40                         64736             16305 ns/op            1674 B/op         69 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion-40         421993              2920 ns/op             192 B/op         14 allocs/op
BenchmarkEndpointSubsetEqualityResourceVersion2-40         88705             14132 ns/op            1709 B/op         36 allocs/op
var SemanticIgnoreResourceVersion = conversion.EqualitiesOrDie(
	func(a, b v1.ObjectReference) bool {
		a.ResourceVersion = ""
		b.ResourceVersion = ""
		return a == b
	},
)

func EndpointSubsetsEqualIgnoreResourceVersion2(subsets1, subsets2 []v1.EndpointSubset) bool {
	return SemanticIgnoreResourceVersion.DeepEqual(subsets1, subsets2)
}

@aojea
Copy link
Member

aojea commented Feb 14, 2022

I think that changing the function that compare the subsets is the safest options, it also improves the performance #108078 (comment) ... and we can always revisit later the resourceVersion field ... but since we are going to depend on the new function I think we should add unit tests #108078 (comment)

@tnqn tnqn force-pushed the endpoints-resync branch 2 times, most recently from 75eb51b to fd33db4 Compare February 14, 2022 11:27
@tnqn
Copy link
Member Author

tnqn commented Feb 14, 2022

/retest

@liggitt
Copy link
Member

liggitt commented Feb 14, 2022

even if we stopped setting resourceVersion (which I think we should), we'd want to avoid updating on only a resourceVersion change as well, in order to avoid a thundering herd update of every endpoint in the system on the first controller-manager restart after dropping the resourceVersion value

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 25, 2022
@liggitt
Copy link
Member

liggitt commented Feb 28, 2022

The resource version logic looks good to me, but I think we should merge the change to ignore resource version first, then separately merge the change to stop setting resource version. This makes it easier to backport just the "ignore resource version" change if we want to, and also makes sure tests still pass when the controller is encountering objects with resource version set but is ignoring it for "should I update" purposes (which is what the controller will encounter on upgrades of existing clusters)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 28, 2022
@tnqn
Copy link
Member Author

tnqn commented Feb 28, 2022

The resource version logic looks good to me, but I think we should merge the change to ignore resource version first, then separately merge the change to stop setting resource version. This makes it easier to backport just the "ignore resource version" change if we want to, and also makes sure tests still pass when the controller is encountering objects with resource version set but is ignoring it for "should I update" purposes (which is what the controller will encounter on upgrades of existing clusters)

@liggitt Sure, I have removed the patch that stops setting resource version and updated PR description. I will follow up with another PR after this is merged.

@liggitt
Copy link
Member

liggitt commented Feb 28, 2022

lgtm, would like a second from tim

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this include Conditions checks?

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin, tnqn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2022
@tnqn
Copy link
Member Author

tnqn commented Mar 2, 2022

@thockin Thanks for the review.

Why does this include Conditions checks?

If it doesn't check Serving/Terminting Conditions, their changes would not trigger EndpointSlice update. It was not an issue previsouly because Pod resourceVersion also changed when any Condition changed, so EndpointsEqualBeyondHash would return false as long as it checks resourceVersion regardless of the other checks. Since we remove resourceVersion check in this PR, the function should check all fields that matter.

/retest

@tnqn
Copy link
Member Author

tnqn commented Mar 2, 2022

/retest

@tnqn
Copy link
Member Author

tnqn commented Mar 2, 2022

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit 0dc83fe into kubernetes:master Mar 2, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Mar 2, 2022
@tnqn tnqn deleted the endpoints-resync branch March 2, 2022 13:34
k8s-ci-robot added a commit that referenced this pull request Jun 12, 2022
…8078-upstream-release-1.21

Automated cherry pick of #108078: Skip updating Endpoints and EndpointSlice if no relevant
k8s-ci-robot added a commit that referenced this pull request Jun 12, 2022
…8078-upstream-release-1.23

Automated cherry pick of #108078: Skip updating Endpoints and EndpointSlice if no relevant
k8s-ci-robot added a commit that referenced this pull request Jun 12, 2022
…8078-upstream-release-1.22

Automated cherry pick of #108078: Skip updating Endpoints and EndpointSlice if no relevant
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Long delay in creating Endpoints when periodical Service resync happens
6 participants