Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add APF's priorityLevel to httplog.go #104359

Merged
merged 1 commit into from
Aug 24, 2021
Merged

Conversation

mborsz
Copy link
Member

@mborsz mborsz commented Aug 13, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

It adds priorityLevel=X verb to httplog to increase visibility of what prioritylevel has been used for a given request.

Alternative implementations considered:

  • using httplog's addedInfo for this purpose -- it seems to be less readable than this approach
  • adding a new field to RequestInfo to avoid creating a new filter -- I'm not sure if RequestInfo is a right place for this kind of data

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

WIP, I will add more tests once we agree that this is a right approach to take.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc @wojtek-t @lavalamp

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 13, 2021
@mborsz
Copy link
Member Author

mborsz commented Aug 13, 2021

Sample log entry (from presubmit https://storage.googleapis.com/sig-scalability-logs/pull-kubernetes-e2e-gce-100-performance/1426160239798915072/e2e-104359-95a39-master/kube-apiserver.log):

I0813 13:07:25.135758      11 httplog.go:104] "HTTP" verb="PUT" URI="/apis/discovery.k8s.io/v1/namespaces/test-nfdrhk-1/endpointslices/small-service-50-nw6c4" latency="3.451567ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/06ddc9e/system:serviceaccount:kube-system:endpointslice-controller" audit-ID="974e0927-e4c5-4ccb-8917-6c1d139f166b" srcIP="[::1]:50590" resp=200 priorityLevel="workload-high"
I0813 13:07:25.135967      11 httplog.go:104] "HTTP" verb="PUT" URI="/apis/apps/v1/namespaces/test-nfdrhk-1/replicasets/small-deployment-50-7bdcf7d969/status" latency="4.046861ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/06ddc9e/system:serviceaccount:kube-system:replicaset-controller" audit-ID="b2a264b2-55ce-4a68-b5a2-a55af1b2cacf" srcIP="[::1]:50590" resp=200 priorityLevel="workload-high"

return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()

r = r.WithContext(request.WithPriorityAndFairnessClassification(ctx, &classification.PriorityAndFairnessClassification{}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to build this into the main APF handler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but I think that if we want to extract this from context in httplog handler, then this field must be added to context before we enter httplog's handler (as r.WithContext returns only a copy of request so the original request is not changed). If we set this only in the main APF handler, then the the httplog handler (which wraps APF handler) will never see the context change.

Similar approach is used for audit-id -- it is set before other audit handlers only to make it visible in WithLogging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative approach would be to use httplog's "addedInfo" feature and in APF's handler extract logging from context and add priorityLevel to addedInfo. In the current form "addedInfo" is simply a string you can append to, but technically we can make it a key=value map and finally have similar readability as we have in this approach.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems we are using this filter only to initialize a PriorityAndFairnessClassification into the request context, can we do the initialization inside the main p&f filter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes, WithLogging won't have access to the copied request context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative approach would be to use httplog's "addedInfo" feature and in APF's handler extract logging from context and add priorityLevel to addedInfo

I think we already use addedInfo - https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/filterlatency/filterlatency.go#L66-L68

It seems to be a good alternative.

Copy link
Member Author

@mborsz mborsz Aug 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addedInfo in the current form is quite hard to read it prints as addedInfo="\nslowFilter=200ms\npriorityLevel=system\nflowSchema=abcd" (where slowFilter=200ms is from filterlatency.go linked above), but we may want to solve readability issue anyway (by e.g. making it key=value).

I prefer sticking to the current approach as it's already done this way and I'm not sure I see any real value from migrating to that approach right now, but if we find a good reason to change approach I can do that as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are using httplog.AddInfof, do we still need the filter WithPriorityAndFairnessClassification?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to be a runtime optimization if we can do without a new association in the request context. on the other hand, I am OK having WithPriorityAndFairnessClassification.

return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()

r = r.WithContext(request.WithPriorityAndFairnessClassification(ctx, &classification.PriorityAndFairnessClassification{}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems we are using this filter only to initialize a PriorityAndFairnessClassification into the request context, can we do the initialization inside the main p&f filter?

@@ -218,6 +218,10 @@ func (rl *respLogger) Log() {
}
}

if plName := request.GetPriorityLevelNameTruncated(rl.req.Context()); plName != "" {
keysAndValues = append(keysAndValues, "priorityLevel", plName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple FlowSchemas can be associated with a single priorityLevel - so it may not be possible to get to the FlowSchema name from a given priority-level.

One option is to add both flowSchema and priorityLevel into httplog, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the FlowSchema implies the priority level and not the other way around (for a given configuration), so if you want to log only one then it should be the FlowSchema.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking on priorityLevel as this is actually what matters when we debug high latency calls.
Adding only flowSchema will make such debugging harder, but I'm happy to add flowSchema as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if you want to log only one then it should be the FlowSchema.

I guess in most cases we would want to know the priorityLevel - if we add only flowSchema then the operator will need to run kubectl command and find the related priorityLevel, which may be cumbersome. I am +1 for adding both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FlowSchema matters, for at least two reasons:

  1. Helping explain why the particular priority level applied, and
  2. the choice of flow distinguisher matters.

I am not enthused about making the httplog message longer. Another consideration might be how much information can be gotten by an investigator who does not have access to the cluster config (either at all, or because it is no longer the current config). Perhaps the logic that digests the config should be sure to write it at the same V level as the httplog message (which is 3, if I recall correctly)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is worth making the message longer. I agree it's ideal to see something like APF(FS=<name> PL=<name> D=<value>) to keep it compact. Log search is much easier if we include the PL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lavalamp , what do you mean D=<value> to refer to? As long as I am wondering, I suppose I wonder about the chosen width and extra latency.

I am not sure introducing a layer of syntax is a good idea. I hope we can agree on short enough identifiers that we can stay with a flat set of fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for keeping this as part of httplog. It simply makes debugging easier.

httplog doesn't support anything more than keys and values, but as suggested by @MikeSpreitzer we can use shorter identifiers like 'pl' for priorityLevel and 'fs' for flowSchema.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By D I meant the discriminator. If you don't wrap the whole thing with something APF specific it makes it a bit longer, e.g. APF_FS, APF_PL, APF_D etc. No one will figure out what "D" means without the APF context clue, but the others maybe would be ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added FlowDistinguisher and added APF_ prefix. PTAL

@MikeSpreitzer
Copy link
Member

@MikeSpreitzer


// priorityAndFairnessClassificationClassificationKey is the context key for
// APF classification.
priorityAndFairnessClassificationClassificationKey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to the indentation here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. Could you elaborate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that the various entries in this ( ... ) are at different levels of indentation. How about making them all the same?

Copy link
Member

@marseel marseel Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it's just visual "bug" on github. I've downloaded patch and all lines had the same indent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It renders properly on my side and I manually checked that all lines in (...) have one tab indentation.

@fedebongio
Copy link
Contributor

/assign @lavalamp
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 17, 2021
@mborsz mborsz force-pushed the apflog branch 2 times, most recently from be72b53 to 6dc5e26 Compare August 19, 2021 13:16
@lavalamp
Copy link
Member

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 19, 2021
queueNoteFn fq.QueueNoteFn,
execFn func(),
) {
if t.mockDecision == decisionSkipFilter {
panic("Handle should not be invoked")
}
noteFn(bootstrap.SuggestedFlowSchemaGlobalDefault, bootstrap.SuggestedPriorityLevelConfigurationGlobalDefault)
noteFn(bootstrap.SuggestedFlowSchemaGlobalDefault, bootstrap.SuggestedPriorityLevelConfigurationGlobalDefault, string(bootstrap.SuggestedFlowSchemaGlobalDefault.Spec.DistinguisherMethod.Type))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is only a test, so it does not really matter what the string is here, but note that in the regular code the distinguisher is not a DistinguisherMethod.Type but rather a string computed by the identified method.

keysAndValues = append(keysAndValues,
"apf_pl", truncateLogField(pfc.PriorityLevelName),
"apf_fs", truncateLogField(pfc.FlowSchemaName),
"apf_d", pfc.FlowDistinguisher,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think these are intrinsically limited to 64 characters. A username could be longer, right?

@lavalamp
Copy link
Member

/lgtm cancel

Mike has some good points

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2021
@@ -460,7 +460,7 @@ func checkNewFS(cts *ctlrTestState, rng *rand.Rand, trialName string, ftr *fsTes
startWG.Add(1)
go func(matches, isResource bool, rdu RequestDigest) {
expectedMatch := matches && ftr.wellFormed && (fsPrecedes(fs, catchAlls[isResource]) || fs.Name == catchAlls[isResource].Name)
ctlr.Handle(ctx, rdu, func(matchFS *flowcontrol.FlowSchema, matchPL *flowcontrol.PriorityLevelConfiguration) {
ctlr.Handle(ctx, rdu, func(matchFS *flowcontrol.FlowSchema, matchPL *flowcontrol.PriorityLevelConfiguration, _ string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flowDistinguisher could be checked. Does not have to be added in this PR.

@tkashem
Copy link
Contributor

tkashem commented Aug 19, 2021

structured key/value pair for httplog - work in progress #104465

@mborsz this is how we would use it, let me know if the httplog PR is useful

httplog.AddKeyValue(ctx, "APF_FS", classification.FlowSchemaName)
httplog.AddKeyValue(ctx, "APF_PL", classification.PriorityLevelName)
I0819 17:05:34.670038 2699979 httplog.go:110] "HTTP" verb="GET" URI="/apis/autoscaling/v2beta2?timeout=32s" latency="6.24406ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/7413f44/system:serviceaccount:kube-system:generic-garbage-collector" 
audit-ID="556c4e68-bc99-4c26-8c10-8b4a37b983cf" srcIP="[::1]:54748" 
APF_FS="kube-system-service-accounts" APF_PL="workload-high" resp=200

@tkashem
Copy link
Contributor

tkashem commented Aug 19, 2021

/retest

@MikeSpreitzer
Copy link
Member

MikeSpreitzer commented Aug 20, 2021

@mborsz , @tkashem : I like the direction of PR #104465 very much. In fact, it enables an even better solution than suggested above to something that is bothering me about the present PR. This PR builds knowledge of the APF classification logging into httplog (which I think should be treated as a lower level thing). PR #104465 enables us to remove that detail from httplog, undo the added filter introduced by this PR, and make the regular APF filter add the classification information.

I am willing to approve the present PR as an interim step, if we agree on the further direction.

@mborsz
Copy link
Member Author

mborsz commented Aug 20, 2021

@tkashem Thanks a lot for #104465. I think it helps a lot!

@tkashem @MikeSpreitzer I think in one of comments above I mentioned that I'm fine with both the current approach and using httplog's addedInfo (if we only manage to make it more readable). I was preferring to stick to the current approach as it's close to be submittable, but given that #104465 looks nearly ready to go, it seems to be reasonable to me to block this PR on #104465 and rewrite this using httplog.AddKeyValue.

Anyway, after thinking on this again I agree that httplog.AddKeyValue-based solution seems to be better overall as:

  • As @MikeSpreitzer mentioned it decouples httplog and APF controllers logic
  • Is easier to reuse for other filters (although I'm not sure how much more data we can fit in "one-liner" httplogs)
  • Will be easier to potentially cherry-pick to older version (it will consist of two local changes rather than a distributed change)

So yes, let's wait for #104465 (assuming it will be merged in O(days)) and rebase this PR.

@mborsz
Copy link
Member Author

mborsz commented Aug 20, 2021

structured key/value pair for httplog - work in progress #104465

Maciej Borsz this is how we would use it, let me know if the httplog PR is useful

httplog.AddKeyValue(ctx, "APF_FS", classification.FlowSchemaName)
httplog.AddKeyValue(ctx, "APF_PL", classification.PriorityLevelName)
I0819 17:05:34.670038 2699979 httplog.go:110] "HTTP" verb="GET" URI="/apis/autoscaling/v2beta2?timeout=32s" latency="6.24406ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/7413f44/system:serviceaccount:kube-system:generic-garbage-collector" 
audit-ID="556c4e68-bc99-4c26-8c10-8b4a37b983cf" srcIP="[::1]:54748" 
APF_FS="kube-system-service-accounts" APF_PL="workload-high" resp=200

Yes, the format looks good. The idea of implementing this way seems to be cleaner to me. Thanks a lot for your help here!

@tkashem
Copy link
Contributor

tkashem commented Aug 20, 2021

@mborsz @MikeSpreitzer #104465 is ready for review.

@wojtek-t
Copy link
Member

/assign

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2021
@MikeSpreitzer
Copy link
Member

@mborsz : #104465 has merged, so this PR is ready for rebase and update.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 24, 2021
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks much easier in the current version.


httplog.AddKeyValue(ctx, "apf_pl", truncateLogField(pl.Name))
httplog.AddKeyValue(ctx, "apf_fs", truncateLogField(fs.Name))
httplog.AddKeyValue(ctx, "apf_d", truncateLogField(flowDistinguisher))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to log flowDistinguisher too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a discussion above and I think the conclusion was that it may be useful. @lavalamp @MikeSpreitzer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs and pl names seem to be a good starting point for now, maybe we can add flowDistinguisher in the future if we need to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I frankly am not sure how much value there is in including the flow distinguisher. It is definitely relevant. My concern is whether it is difficult enough to discover from other evidence that it is worth putting in this log line that is printed for every request. I will not object either way.

If we do include it, I suggest a slightly longer name: apf_fd (for "flow distinguisher", a logical companion to "flow schema").

@mborsz
Copy link
Member Author

mborsz commented Aug 24, 2021

Sample log entry from presubmit (this run):

I0824 09:39:10.382141      12 httplog.go:105] "HTTP" verb="PUT" URI="/apis/apps/v1/namespaces/test-9odqar-1/replicasets/small-deployment-187-7996444466/status" latency="3.4923ms" userAgent="kube-controller-manager/v1.23.0 (linux/amd64) kubernetes/32bb50c/system:serviceaccount:kube-system:replicaset-controller" audit-ID="2042a61f-51ae-459b-bd71-9a738e7c5c1d" srcIP="[::1]:33686" apf_pl="workload-high" apf_fs="kube-system-service-accounts" apf_d="test-9odqar-1" resp=200

@tkashem I think it's slightly less readable when "resp=200" is the last part of httplog. It would be easier to read when it would be close to place it used to be and httplog.AddKeyValue logs would be simply in the end. WDYT?

@tkashem
Copy link
Contributor

tkashem commented Aug 24, 2021

@tkashem I think it's slightly less readable when "resp=200" is the last part of httplog. It would be easier to read when it would be close to place it used to be and httplog.AddKeyValue logs would be simply in the end. WDYT?

@mborsz sure, sounds good, I can restore the order of the key value pairs as part of the optimization PR I am working on.

const maxFieldLogLength = 64

if len(s) > maxFieldLogLength {
s = s[0:maxFieldLogLength]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 0 here is not incorrect, but is unnecessary and is traditionally omitted.

Copy link
Member

@MikeSpreitzer MikeSpreitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few minor remarks in independent comments. They can be addressed in follow-up.

/LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, mborsz, MikeSpreitzer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants