New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start exporting the in-cluster network programming latency metric. #71999

Merged
merged 1 commit into from Feb 12, 2019

Conversation

@mm4tt
Copy link
Contributor

mm4tt commented Dec 12, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:
This is the final step of implementing the first version of in-cluster network programming latency that was proposed here - https://github.com/kubernetes/community/blob/master/sig-scalability/slos/network_programming_latency.md
The computation of the latency is based on the EndpointsLastChangeTriggerTime annotation, which implementation can be found in #71998

Does this PR introduce a user-facing change?:
NONE

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Dec 12, 2018

@mateuszmatejczyk: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Dec 12, 2018

Hi @mateuszmatejczyk. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mm4tt

This comment has been minimized.

Copy link
Contributor Author

mm4tt commented Dec 12, 2018

/assign @wojtek-t

@k8s-ci-robot k8s-ci-robot requested review from freehan and lavalamp Dec 12, 2018

@k8s-ci-robot k8s-ci-robot added sig/network and removed needs-sig labels Dec 12, 2018

@mm4tt

This comment has been minimized.

Copy link
Contributor Author

mm4tt commented Dec 12, 2018

/assign @wojtek-t

@mm4tt

This comment has been minimized.

Copy link
Contributor Author

mm4tt commented Dec 12, 2018

/uncc @lavalamp
/uncc @freehan

Unassigning @lavalamp and @freehan until @wojtek-t takes a look.

@k8s-ci-robot k8s-ci-robot removed request for lavalamp and freehan Dec 12, 2018

@wojtek-t

This comment has been minimized.

Copy link
Member

wojtek-t commented Dec 12, 2018

I will do the first pass later this week and then will add more reviewers from networking team.

@wojtek-t
Copy link
Member

wojtek-t left a comment

This looks reasonable to me. Hover, this doesn't make sense to submit until #71998 is at least ready for merging.
So holding for now, but assigning someone else to also take a look.

/assign @freehan

/hold
/ok-to-test

}
return len(ect.items) > 0
}

func getLastChangeTriggerTime(endpoints *v1.Endpoints) time.Time {
val, _ := time.Parse(time.RFC3339Nano, endpoints.Annotations[v1.EndpointsLastChangeTriggerTime])

This comment has been minimized.

@wojtek-t

wojtek-t Dec 13, 2018

Member

Don't silently ignore errors - if not more, at least log an error.

This comment has been minimized.

@mm4tt

mm4tt Dec 13, 2018

Author Contributor

Done. Added log statement and a comment explaining why we can ignore the error.

@@ -30,6 +30,7 @@ import (
"k8s.io/client-go/tools/record"
utilproxy "k8s.io/kubernetes/pkg/proxy/util"
utilnet "k8s.io/kubernetes/pkg/util/net"
"time"

This comment has been minimized.

@freehan

freehan Dec 20, 2018

Member

move this up to the upper group

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

Done.

@@ -26,6 +26,8 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/sets"
"time"
"sort"

This comment has been minimized.

@freehan

freehan Dec 20, 2018

Member

move to the group on the top

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

Done.

@@ -92,6 +93,9 @@ type EndpointChangeTracker struct {
// isIPv6Mode indicates if change tracker is under IPv6/IPv4 mode. Nil means not applicable.
isIPv6Mode *bool
recorder record.EventRecorder
// Map from the Endpoints namespaced-name to the time of the trigger that caused the endpoints
// object to change. Used to calculate the network-programming-latency.
lastChangeTriggerTimes map[types.NamespacedName]time.Time

This comment has been minimized.

@freehan

freehan Dec 20, 2018

Member

Add a Warning about potential time drift between endpoint controller and kube-proxy. IIUC, if the clock on master and nodes are drifted. The measurement may not be accurate right?

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

Good point.
Added in the documentation of the metric that exports the Network Programming Latency.

Show resolved Hide resolved pkg/proxy/endpoints.go
}
change.current = ect.endpointsToEndpointsMap(current)
// if change.previous equal to change.current, it means no change
if reflect.DeepEqual(change.previous, change.current) {
delete(ect.items, namespacedName)
delete(ect.lastChangeTriggerTimes, namespacedName)

This comment has been minimized.

@freehan

freehan Dec 20, 2018

Member

What happen if there are 2 consecutive events, one event has different endpoints, the other event has the same endpoints? Will the 2nd one erase the timestamp of the first one? Causing the 1st event was not measured? Will this happen?

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

IIUC, the situation you described is something like this:

T0: proxier.Sync()
T1: proxier observes Endpoints E1 change, E1.EndpointsLastChangeTriggerTime = t0
T2: proxier observes Endpoints E1 change, E1.EndpointsLastChangeTriggerTime = t1 (t1>=t0)
T3: proxier.Sync()

In such case the implementation will ignore the second timestamp and use t0 (which is guaranteed to be <= t1) to measure the latency.

Let me know if it makes sense.

This comment has been minimized.

@wojtek-t

wojtek-t Feb 6, 2019

Member

Ohh - I think I may misunderstood it initially.
I think what Minhan is suggesting is the case:

T0 proxier Sync() (endpoint is in E0 state)
t1: proxier observes E1
t2: proxier observes E2, but this E2 resolves to the same endpointMap as E0
T3: sync

In this case, the sync may be no-op, but in fact those were two changes.

I think that instead of reporting just the first change since last Sync, I'm leaning towards saying that we may want to report all of them (i.e. it translates to keeping []time.Time instead of single time.Time for a given namespace). I don'tthink it's that extremely expensive (it's just stored in memory).

@freehan @mm4tt - thoughts?

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

I agree with reporting multiple triggerTimes for the same endpoint per sync. It makes sense and should be relatively cheap. I updated the code to do that.

Please note that regarding opposite events, that combined reduce to no-change, I left the logic as was, i.e. I'm not exporting the metric for such events.
The definition of Network Programming Latency SLI is (more or less): duration between a time of a change and a time when the network was programmed to incorporate that change. If there are events A and B that together result in no-change (between syncs) there won't be any network programming for neither of them and according the the SLI definition there is nothing to export. To give you more sophisticated example (E0,E1,E2... - endpoints states; A,B,C -> events) in case of:
(Sync) E0 -> A -> E1 -> B -> E2 -> C -> E0 -> D -> E3 -> E -> E4 (Sync)
the latency will be exported only for the trigger times of D and E.

Please correct me if you don't agree with my understanding / reasoning.

This comment has been minimized.

@wojtek-t

wojtek-t Feb 11, 2019

Member

TBH, I don't have very strong opinion which one is actually correct :-)
I'm fine with what you're proposing (i.e. skipping it in that case), but please add an explicit comment that this is a conscious decision.

This comment has been minimized.

@mm4tt

mm4tt Feb 11, 2019

Author Contributor

Done, added a comment.

This comment has been minimized.

@freehan

freehan Feb 12, 2019

Member

Okay. Noop sounds good.

@wojtek-t

This comment has been minimized.

Copy link
Member

wojtek-t commented Feb 6, 2019

/hold cancel

With endpoint controller changes already being merged, we are ready to resurrect this change.
@mm4tt - can you please rebase?

@mm4tt mm4tt force-pushed the mm4tt:kube-proxy branch from c116d4f to bfd4ee1 Feb 6, 2019

@mm4tt
Copy link
Contributor Author

mm4tt left a comment

Thanks, PTAL

@@ -30,6 +30,7 @@ import (
"k8s.io/client-go/tools/record"
utilproxy "k8s.io/kubernetes/pkg/proxy/util"
utilnet "k8s.io/kubernetes/pkg/util/net"
"time"

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

Done.

Show resolved Hide resolved pkg/proxy/endpoints.go
@@ -26,6 +26,8 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/sets"
"time"
"sort"

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

Done.

@@ -92,6 +93,9 @@ type EndpointChangeTracker struct {
// isIPv6Mode indicates if change tracker is under IPv6/IPv4 mode. Nil means not applicable.
isIPv6Mode *bool
recorder record.EventRecorder
// Map from the Endpoints namespaced-name to the time of the trigger that caused the endpoints
// object to change. Used to calculate the network-programming-latency.
lastChangeTriggerTimes map[types.NamespacedName]time.Time

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

Good point.
Added in the documentation of the metric that exports the Network Programming Latency.

}
change.current = ect.endpointsToEndpointsMap(current)
// if change.previous equal to change.current, it means no change
if reflect.DeepEqual(change.previous, change.current) {
delete(ect.items, namespacedName)
delete(ect.lastChangeTriggerTimes, namespacedName)

This comment has been minimized.

@mm4tt

mm4tt Feb 6, 2019

Author Contributor

IIUC, the situation you described is something like this:

T0: proxier.Sync()
T1: proxier observes Endpoints E1 change, E1.EndpointsLastChangeTriggerTime = t0
T2: proxier observes Endpoints E1 change, E1.EndpointsLastChangeTriggerTime = t1 (t1>=t0)
T3: proxier.Sync()

In such case the implementation will ignore the second timestamp and use t0 (which is guaranteed to be <= t1) to measure the latency.

Let me know if it makes sense.

@mm4tt mm4tt force-pushed the mm4tt:kube-proxy branch from bfd4ee1 to 0c74ca6 Feb 6, 2019

Name: "network_programming_latency_seconds",
Help: "In Cluster Network Programming Latency in seconds",
// The last bucket will be [0.001s*2^20 ~= 17min, +inf)
Buckets: prometheus.ExponentialBuckets(0.001, 2, 20),

This comment has been minimized.

@wojtek-t

wojtek-t Feb 11, 2019

Member

I'm not convinced that the buckets are correct.
I'm fine with leaving it as is for now, but please leave a TODO to reevaluate it before 1.14 release.

This comment has been minimized.

@mm4tt

mm4tt Feb 11, 2019

Author Contributor

Done.

@wojtek-t
Copy link
Member

wojtek-t left a comment

Two more comments - other than that lgtm.

@mm4tt mm4tt force-pushed the mm4tt:kube-proxy branch from 0c74ca6 to c0d750a Feb 11, 2019

@wojtek-t

This comment has been minimized.

Copy link
Member

wojtek-t commented Feb 11, 2019

That LGTM.

@freehan - can you please take another look?

@freehan
Copy link
Member

freehan left a comment

just a nit, LGTM overall

// Reset the lastChangeTriggerTimes for the Endpoints object. Given that the network programming
// SLI is defined as the duration between a time of an event and a time when the network was
// programmed to incorporate that event, if there are events that happened between two
// consecutive syncs syncs and that canceled each other out, e.g. pod A added -> pod A deleted,

This comment has been minimized.

@freehan

freehan Feb 12, 2019

Member

There are 2 syncs

This comment has been minimized.

@mm4tt

mm4tt Feb 12, 2019

Author Contributor

Good catch, done.

}
change.current = ect.endpointsToEndpointsMap(current)
// if change.previous equal to change.current, it means no change
if reflect.DeepEqual(change.previous, change.current) {
delete(ect.items, namespacedName)
delete(ect.lastChangeTriggerTimes, namespacedName)

This comment has been minimized.

@freehan

freehan Feb 12, 2019

Member

Okay. Noop sounds good.

@mm4tt mm4tt force-pushed the mm4tt:kube-proxy branch from c0d750a to 7141ece Feb 12, 2019

@wojtek-t

This comment has been minimized.

Copy link
Member

wojtek-t commented Feb 12, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 12, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 12, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mm4tt, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mm4tt

This comment has been minimized.

Copy link
Contributor Author

mm4tt commented Feb 12, 2019

/retest

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 12, 2019

@mm4tt: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kops-aws c116d4f link /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@krzysied

This comment has been minimized.

Copy link
Contributor

krzysied commented Feb 12, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 41d2445 into kubernetes:master Feb 12, 2019

17 checks passed

cla/linuxfoundation mm4tt authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-godeps Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped
tide In merge pool.
Details

@mm4tt mm4tt deleted the mm4tt:kube-proxy branch Feb 12, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment