Add APF's priorityLevel to httplog.go #104359

mborsz · 2021-08-13T12:14:22Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

It adds priorityLevel=X verb to httplog to increase visibility of what prioritylevel has been used for a given request.

Alternative implementations considered:

using httplog's addedInfo for this purpose -- it seems to be less readable than this approach
adding a new field to RequestInfo to avoid creating a new filter -- I'm not sure if RequestInfo is a right place for this kind of data

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

WIP, I will add more tests once we agree that this is a right approach to take.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/cc @wojtek-t @lavalamp

mborsz · 2021-08-13T13:54:42Z

Sample log entry (from presubmit https://storage.googleapis.com/sig-scalability-logs/pull-kubernetes-e2e-gce-100-performance/1426160239798915072/e2e-104359-95a39-master/kube-apiserver.log):

I0813 13:07:25.135758      11 httplog.go:104] "HTTP" verb="PUT" URI="/apis/discovery.k8s.io/v1/namespaces/test-nfdrhk-1/endpointslices/small-service-50-nw6c4" latency="3.451567ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/06ddc9e/system:serviceaccount:kube-system:endpointslice-controller" audit-ID="974e0927-e4c5-4ccb-8917-6c1d139f166b" srcIP="[::1]:50590" resp=200 priorityLevel="workload-high"
I0813 13:07:25.135967      11 httplog.go:104] "HTTP" verb="PUT" URI="/apis/apps/v1/namespaces/test-nfdrhk-1/replicasets/small-deployment-50-7bdcf7d969/status" latency="4.046861ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/06ddc9e/system:serviceaccount:kube-system:replicaset-controller" audit-ID="b2a264b2-55ce-4a68-b5a2-a55af1b2cacf" srcIP="[::1]:50590" resp=200 priorityLevel="workload-high"

lavalamp · 2021-08-13T16:08:04Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/apf_classification.go

+	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		ctx := r.Context()
+
+		r = r.WithContext(request.WithPriorityAndFairnessClassification(ctx, &classification.PriorityAndFairnessClassification{}))


Is there a reason not to build this into the main APF handler?

Correct me if I'm wrong, but I think that if we want to extract this from context in httplog handler, then this field must be added to context before we enter httplog's handler (as r.WithContext returns only a copy of request so the original request is not changed). If we set this only in the main APF handler, then the the httplog handler (which wraps APF handler) will never see the context change.

Similar approach is used for audit-id -- it is set before other audit handlers only to make it visible in WithLogging.

An alternative approach would be to use httplog's "addedInfo" feature and in APF's handler extract logging from context and add priorityLevel to addedInfo. In the current form "addedInfo" is simply a string you can append to, but technically we can make it a key=value map and finally have similar readability as we have in this approach.

WDYT?

it seems we are using this filter only to initialize a PriorityAndFairnessClassification into the request context, can we do the initialization inside the main p&f filter?

oh yes, WithLogging won't have access to the copied request context

An alternative approach would be to use httplog's "addedInfo" feature and in APF's handler extract logging from context and add priorityLevel to addedInfo

I think we already use addedInfo - https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/filterlatency/filterlatency.go#L66-L68

It seems to be a good alternative.

The addedInfo in the current form is quite hard to read it prints as addedInfo="\nslowFilter=200ms\npriorityLevel=system\nflowSchema=abcd" (where slowFilter=200ms is from filterlatency.go linked above), but we may want to solve readability issue anyway (by e.g. making it key=value).

I prefer sticking to the current approach as it's already done this way and I'm not sure I see any real value from migrating to that approach right now, but if we find a good reason to change approach I can do that as well.

if we are using httplog.AddInfof, do we still need the filter WithPriorityAndFairnessClassification?

it seems to be a runtime optimization if we can do without a new association in the request context. on the other hand, I am OK having WithPriorityAndFairnessClassification.

tkashem · 2021-08-16T12:59:52Z

staging/src/k8s.io/apiserver/pkg/endpoints/filters/apf_classification.go

+	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		ctx := r.Context()
+
+		r = r.WithContext(request.WithPriorityAndFairnessClassification(ctx, &classification.PriorityAndFairnessClassification{}))


it seems we are using this filter only to initialize a PriorityAndFairnessClassification into the request context, can we do the initialization inside the main p&f filter?

tkashem · 2021-08-16T13:07:43Z

staging/src/k8s.io/apiserver/pkg/server/httplog/httplog.go

@@ -218,6 +218,10 @@ func (rl *respLogger) Log() {
 		}
 	}

+	if plName := request.GetPriorityLevelNameTruncated(rl.req.Context()); plName != "" {
+		keysAndValues = append(keysAndValues, "priorityLevel", plName)


Multiple FlowSchemas can be associated with a single priorityLevel - so it may not be possible to get to the FlowSchema name from a given priority-level.

One option is to add both flowSchema and priorityLevel into httplog, thoughts?

Yes, the FlowSchema implies the priority level and not the other way around (for a given configuration), so if you want to log only one then it should be the FlowSchema.

I was thinking on priorityLevel as this is actually what matters when we debug high latency calls.
Adding only flowSchema will make such debugging harder, but I'm happy to add flowSchema as well.

so if you want to log only one then it should be the FlowSchema.

I guess in most cases we would want to know the priorityLevel - if we add only flowSchema then the operator will need to run kubectl command and find the related priorityLevel, which may be cumbersome. I am +1 for adding both.

FlowSchema matters, for at least two reasons:

Helping explain why the particular priority level applied, and

the choice of flow distinguisher matters.

I am not enthused about making the httplog message longer. Another consideration might be how much information can be gotten by an investigator who does not have access to the cluster config (either at all, or because it is no longer the current config). Perhaps the logic that digests the config should be sure to write it at the same V level as the httplog message (which is 3, if I recall correctly)?

I think this is worth making the message longer. I agree it's ideal to see something like APF(FS=<name> PL=<name> D=<value>) to keep it compact. Log search is much easier if we include the PL.

@lavalamp , what do you mean D=<value> to refer to? As long as I am wondering, I suppose I wonder about the chosen width and extra latency.

I am not sure introducing a layer of syntax is a good idea. I hope we can agree on short enough identifiers that we can stay with a flat set of fields.

+1 for keeping this as part of httplog. It simply makes debugging easier.

httplog doesn't support anything more than keys and values, but as suggested by @MikeSpreitzer we can use shorter identifiers like 'pl' for priorityLevel and 'fs' for flowSchema.

By D I meant the discriminator. If you don't wrap the whole thing with something APF specific it makes it a bit longer, e.g. APF_FS, APF_PL, APF_D etc. No one will figure out what "D" means without the APF context clue, but the others maybe would be ok.

I added FlowDistinguisher and added APF_ prefix. PTAL

MikeSpreitzer · 2021-08-16T14:07:04Z

@MikeSpreitzer

MikeSpreitzer · 2021-08-16T14:08:20Z

staging/src/k8s.io/apiserver/pkg/endpoints/request/context.go

+
+	// priorityAndFairnessClassificationClassificationKey is the context key for
+	// APF classification.
+	priorityAndFairnessClassificationClassificationKey


What happened to the indentation here?

I'm not sure what you mean. Could you elaborate?

I mean that the various entries in this ( ... ) are at different levels of indentation. How about making them all the same?

I'm pretty sure it's just visual "bug" on github. I've downloaded patch and all lines had the same indent.

It renders properly on my side and I manually checked that all lines in (...) have one tab indentation.

fedebongio · 2021-08-17T20:09:14Z

/assign @lavalamp
/triage accepted

lavalamp · 2021-08-19T16:42:25Z

/approve
/lgtm

MikeSpreitzer · 2021-08-19T17:25:58Z

staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness_test.go

 	queueNoteFn fq.QueueNoteFn,
 	execFn func(),
 ) {
 	if t.mockDecision == decisionSkipFilter {
 		panic("Handle should not be invoked")
 	}
-	noteFn(bootstrap.SuggestedFlowSchemaGlobalDefault, bootstrap.SuggestedPriorityLevelConfigurationGlobalDefault)
+	noteFn(bootstrap.SuggestedFlowSchemaGlobalDefault, bootstrap.SuggestedPriorityLevelConfigurationGlobalDefault, string(bootstrap.SuggestedFlowSchemaGlobalDefault.Spec.DistinguisherMethod.Type))


I know this is only a test, so it does not really matter what the string is here, but note that in the regular code the distinguisher is not a DistinguisherMethod.Type but rather a string computed by the identified method.

MikeSpreitzer · 2021-08-19T17:27:56Z

staging/src/k8s.io/apiserver/pkg/server/httplog/httplog.go

+		keysAndValues = append(keysAndValues,
+			"apf_pl", truncateLogField(pfc.PriorityLevelName),
+			"apf_fs", truncateLogField(pfc.FlowSchemaName),
+			"apf_d", pfc.FlowDistinguisher,


I do not think these are intrinsically limited to 64 characters. A username could be longer, right?

lavalamp · 2021-08-19T17:29:51Z

/lgtm cancel

Mike has some good points

MikeSpreitzer · 2021-08-19T17:31:47Z

staging/src/k8s.io/apiserver/pkg/util/flowcontrol/controller_test.go

@@ -460,7 +460,7 @@ func checkNewFS(cts *ctlrTestState, rng *rand.Rand, trialName string, ftr *fsTes
 				startWG.Add(1)
 				go func(matches, isResource bool, rdu RequestDigest) {
 					expectedMatch := matches && ftr.wellFormed && (fsPrecedes(fs, catchAlls[isResource]) || fs.Name == catchAlls[isResource].Name)
-					ctlr.Handle(ctx, rdu, func(matchFS *flowcontrol.FlowSchema, matchPL *flowcontrol.PriorityLevelConfiguration) {
+					ctlr.Handle(ctx, rdu, func(matchFS *flowcontrol.FlowSchema, matchPL *flowcontrol.PriorityLevelConfiguration, _ string) {


The flowDistinguisher could be checked. Does not have to be added in this PR.

tkashem · 2021-08-19T21:18:58Z

structured key/value pair for httplog - work in progress #104465

@mborsz this is how we would use it, let me know if the httplog PR is useful

httplog.AddKeyValue(ctx, "APF_FS", classification.FlowSchemaName)
httplog.AddKeyValue(ctx, "APF_PL", classification.PriorityLevelName)

I0819 17:05:34.670038 2699979 httplog.go:110] "HTTP" verb="GET" URI="/apis/autoscaling/v2beta2?timeout=32s" latency="6.24406ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/7413f44/system:serviceaccount:kube-system:generic-garbage-collector" 
audit-ID="556c4e68-bc99-4c26-8c10-8b4a37b983cf" srcIP="[::1]:54748" 
APF_FS="kube-system-service-accounts" APF_PL="workload-high" resp=200

tkashem · 2021-08-19T23:05:42Z

/retest

MikeSpreitzer · 2021-08-20T03:20:05Z

@mborsz , @tkashem : I like the direction of PR #104465 very much. In fact, it enables an even better solution than suggested above to something that is bothering me about the present PR. This PR builds knowledge of the APF classification logging into httplog (which I think should be treated as a lower level thing). PR #104465 enables us to remove that detail from httplog, undo the added filter introduced by this PR, and make the regular APF filter add the classification information.

I am willing to approve the present PR as an interim step, if we agree on the further direction.

mborsz · 2021-08-20T08:11:08Z

@tkashem Thanks a lot for #104465. I think it helps a lot!

@tkashem @MikeSpreitzer I think in one of comments above I mentioned that I'm fine with both the current approach and using httplog's addedInfo (if we only manage to make it more readable). I was preferring to stick to the current approach as it's close to be submittable, but given that #104465 looks nearly ready to go, it seems to be reasonable to me to block this PR on #104465 and rewrite this using httplog.AddKeyValue.

Anyway, after thinking on this again I agree that httplog.AddKeyValue-based solution seems to be better overall as:

As @MikeSpreitzer mentioned it decouples httplog and APF controllers logic
Is easier to reuse for other filters (although I'm not sure how much more data we can fit in "one-liner" httplogs)
Will be easier to potentially cherry-pick to older version (it will consist of two local changes rather than a distributed change)

So yes, let's wait for #104465 (assuming it will be merged in O(days)) and rebase this PR.

mborsz · 2021-08-20T08:38:34Z

structured key/value pair for httplog - work in progress #104465

Maciej Borsz this is how we would use it, let me know if the httplog PR is useful

httplog.AddKeyValue(ctx, "APF_FS", classification.FlowSchemaName)
httplog.AddKeyValue(ctx, "APF_PL", classification.PriorityLevelName)

I0819 17:05:34.670038 2699979 httplog.go:110] "HTTP" verb="GET" URI="/apis/autoscaling/v2beta2?timeout=32s" latency="6.24406ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/7413f44/system:serviceaccount:kube-system:generic-garbage-collector" 
audit-ID="556c4e68-bc99-4c26-8c10-8b4a37b983cf" srcIP="[::1]:54748" 
APF_FS="kube-system-service-accounts" APF_PL="workload-high" resp=200

Yes, the format looks good. The idea of implementing this way seems to be cleaner to me. Thanks a lot for your help here!

tkashem · 2021-08-20T15:15:17Z

@mborsz @MikeSpreitzer #104465 is ready for review.

wojtek-t · 2021-08-23T06:48:59Z

/assign

MikeSpreitzer · 2021-08-24T02:51:58Z

@mborsz : #104465 has merged, so this PR is ready for rebase and update.

wojtek-t

It looks much easier in the current version.

wojtek-t · 2021-08-24T09:56:57Z

staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go

+
+			httplog.AddKeyValue(ctx, "apf_pl", truncateLogField(pl.Name))
+			httplog.AddKeyValue(ctx, "apf_fs", truncateLogField(fs.Name))
+			httplog.AddKeyValue(ctx, "apf_d", truncateLogField(flowDistinguisher))


Do we need to log flowDistinguisher too?

There was a discussion above and I think the conclusion was that it may be useful. @lavalamp @MikeSpreitzer

fs and pl names seem to be a good starting point for now, maybe we can add flowDistinguisher in the future if we need to?

I frankly am not sure how much value there is in including the flow distinguisher. It is definitely relevant. My concern is whether it is difficult enough to discover from other evidence that it is worth putting in this log line that is printed for every request. I will not object either way.

If we do include it, I suggest a slightly longer name: apf_fd (for "flow distinguisher", a logical companion to "flow schema").

mborsz · 2021-08-24T10:53:48Z

Sample log entry from presubmit (this run):

I0824 09:39:10.382141      12 httplog.go:105] "HTTP" verb="PUT" URI="/apis/apps/v1/namespaces/test-9odqar-1/replicasets/small-deployment-187-7996444466/status" latency="3.4923ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/32bb50c/system:serviceaccount:kube-system:replicaset-controller" audit-ID="2042a61f-51ae-459b-bd71-9a738e7c5c1d" srcIP="[::1]:33686" apf_pl="workload-high" apf_fs="kube-system-service-accounts" apf_d="test-9odqar-1" resp=200

@tkashem I think it's slightly less readable when "resp=200" is the last part of httplog. It would be easier to read when it would be close to place it used to be and httplog.AddKeyValue logs would be simply in the end. WDYT?

tkashem · 2021-08-24T12:47:00Z

@tkashem I think it's slightly less readable when "resp=200" is the last part of httplog. It would be easier to read when it would be close to place it used to be and httplog.AddKeyValue logs would be simply in the end. WDYT?

@mborsz sure, sounds good, I can restore the order of the key value pairs as part of the optimization PR I am working on.

MikeSpreitzer · 2021-08-24T15:46:41Z

staging/src/k8s.io/apiserver/pkg/server/filters/priority-and-fairness.go

+	const maxFieldLogLength = 64
+
+	if len(s) > maxFieldLogLength {
+		s = s[0:maxFieldLogLength]


The 0 here is not incorrect, but is unnecessary and is traditionally omitted.

MikeSpreitzer

I left a few minor remarks in independent comments. They can be addressed in follow-up.

/LGTM

k8s-ci-robot · 2021-08-24T15:55:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, mborsz, MikeSpreitzer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~staging/src/k8s.io/apiserver/OWNERS~~ [lavalamp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from lavalamp and wojtek-t August 13, 2021 12:14

mborsz force-pushed the apflog branch from b540d67 to 670f960 Compare August 13, 2021 12:34

lavalamp reviewed Aug 13, 2021

View reviewed changes

tkashem reviewed Aug 16, 2021

View reviewed changes

MikeSpreitzer reviewed Aug 16, 2021

View reviewed changes

k8s-ci-robot assigned lavalamp Aug 17, 2021

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 17, 2021

mborsz force-pushed the apflog branch 2 times, most recently from be72b53 to 6dc5e26 Compare August 19, 2021 13:16

k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 19, 2021

MikeSpreitzer reviewed Aug 19, 2021

View reviewed changes

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2021

MikeSpreitzer reviewed Aug 19, 2021

View reviewed changes

k8s-ci-robot assigned wojtek-t Aug 23, 2021

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2021

Add APF's priorityLevel to httplog.go

6db6386

mborsz force-pushed the apflog branch from 6dc5e26 to 6db6386 Compare August 24, 2021 09:04

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 24, 2021

wojtek-t reviewed Aug 24, 2021

View reviewed changes

MikeSpreitzer reviewed Aug 24, 2021

View reviewed changes

MikeSpreitzer approved these changes Aug 24, 2021

View reviewed changes

k8s-ci-robot assigned MikeSpreitzer Aug 24, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2021

k8s-ci-robot merged commit ef75433 into kubernetes:master Aug 24, 2021

k8s-ci-robot added this to the v1.23 milestone Aug 24, 2021

jupblb mentioned this pull request Mar 10, 2022

Remove apf_fd from httplog #108631

Merged

Add APF's priorityLevel to httplog.go #104359

Add APF's priorityLevel to httplog.go #104359

Conversation

mborsz commented Aug 13, 2021

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

mborsz commented Aug 13, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mborsz Aug 16, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MikeSpreitzer Aug 18, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MikeSpreitzer commented Aug 16, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marseel Aug 18, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fedebongio commented Aug 17, 2021

lavalamp commented Aug 19, 2021

MikeSpreitzer Aug 19, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lavalamp commented Aug 19, 2021

Choose a reason for hiding this comment

tkashem commented Aug 19, 2021 • edited

tkashem commented Aug 19, 2021

MikeSpreitzer commented Aug 20, 2021 • edited

mborsz commented Aug 20, 2021

mborsz commented Aug 20, 2021

tkashem commented Aug 20, 2021

wojtek-t commented Aug 23, 2021

MikeSpreitzer commented Aug 24, 2021

wojtek-t left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mborsz commented Aug 24, 2021

tkashem commented Aug 24, 2021

Choose a reason for hiding this comment

MikeSpreitzer left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Aug 24, 2021

mborsz Aug 16, 2021 •

edited

MikeSpreitzer Aug 18, 2021 •

edited

marseel Aug 18, 2021 •

edited

MikeSpreitzer Aug 19, 2021 •

edited

tkashem commented Aug 19, 2021 •

edited

MikeSpreitzer commented Aug 20, 2021 •

edited