Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added tracing for long running requests. #69756

Merged
merged 1 commit into from
Oct 29, 2018

Conversation

cheftako
Copy link
Member

What this PR does / why we need it:
Adds tracing for use apiserver requests (get etc) where etcd is not the cause of the long running
requests.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #69748

Special notes for your reviewer:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 13, 2018
@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 13, 2018
@neolit123
Copy link
Member

do we need a release note?
/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 13, 2018
@cheftako
Copy link
Member Author

@neolit123 Historically trace PRs such as #44667 have not needed release notes. This change should only be adding logging to help debug long running requests. Is there a particular part of the change you feel we need to warn customers about?

@neolit123
Copy link
Member

@cheftako ok, sounds good.

@cheftako
Copy link
Member Author

/test pull-kubernetes-integration

@fedebongio
Copy link
Contributor

/assign @wojtek-t

transformResponseObject(ctx, scope, req, w, status, result)
}
}

// DeleteCollection returns a function that will handle a collection deletion
func DeleteCollection(r rest.CollectionDeleter, checkBody bool, scope RequestScope, admit admission.Interface) http.HandlerFunc {
return func(w http.ResponseWriter, req *http.Request) {
trace := utiltrace.New("Delete " + req.URL.Path)
defer trace.LogIfLong(1000 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with all other requests *create/update", maybe 500ms (we use 500ms there)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question. On the other collection operations we do 500ms but we start them later in the operation. https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/handlers/get.go#L260 for example skips any watch latency.
I can certainly change to 500ms but I am wondering if that may trigger to often on collection requests which are expected to take longer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - get is special, so let's ignore it.

But for both create:
https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/handlers/create.go#L47
and update:
https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/handlers/update.go#L49

we start at exactly the same place and use 500ms.

So I think we should be consistent.

mediaType, _, err := negotiation.NegotiateOutputMediaType(req, scope.Serializer, &scope)
if err != nil {
status := responsewriters.ErrorToAPIStatus(err)
trace.Step("Writing raw JSON response")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one won't provide much value - negotiation is really fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted but I suspect that some of the time the problem is actually writing the response to the requester. As such I would like a tight bracket around the "responsewriters.WriteRawJSON(int(status.Code), status, w)" call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - writing response is many times dominating factor - I completely agree that.

I completely wasn't clear what I wanted to say - sorry for that.
My point was that - the time between "negotiating output media type" and this step should be negligible.

So given that your point is completely correct, I think what I wanted to say is to:

  • leave this one (to see whether we spent time here)
  • but remove the "negotiating output media type" step (since different between this one and that should be negligible).

Does it make sense now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

_, info, err := negotiation.NegotiateOutputMediaType(req, metainternalversion.Codecs, &scope)
if err != nil {
scope.err(err, w, req)
return
}
encoder := metainternalversion.Codecs.EncoderForVersion(info.Serializer, metav1beta1.SchemeGroupVersion)
trace.Step(fmt.Sprintf("Serializing response as type %s", info.MediaType))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 99,9% sure that all the time will spent after this.

So TBH, I would skip the previous steps (as tracing itself has some overhead) and leave just this one...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again trying to get a tight bracket around the "responsewriters.SerializeObject(info.MediaType, encoder, w, req, statusCode, partial)" call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point here (sorry for being completely unclear on that).

I completely agree that this Step has value.
I just suggest removing those previous two ones (that should be negligible):

  • trace.Step("Getting object kind, setting GVK")
  • trace.Step("Renegotiating internal version")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -95,12 +102,14 @@ func transformResponseObject(ctx context.Context, scope RequestScope, req *http.
}

// renegotiate under the internal version
trace.Step("Renegotiating internal version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to drop this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now inline with the previous comments - so please drop this and the one mentioned below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -150,25 +162,30 @@ func transformResponseObject(ctx context.Context, scope RequestScope, req *http.
}

// renegotiate under the internal version
trace.Step("Renegotiating internal version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

responsewriters.SerializeObject(info.MediaType, encoder, w, req, statusCode, table)
return

default:
// this block should only be hit if scope AllowsConversion is incorrect
trace.Step("Renegotiating internal version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@logicalhan
Copy link
Member

/lgtm

@k8s-ci-robot
Copy link
Contributor

@logicalhan: changing LGTM is restricted to assignees, and only kubernetes/kubernetes repo collaborators may be assigned issues.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being extremely unclear with what I meant - I hope now it's more clear what i wanted to write.

transformResponseObject(ctx, scope, req, w, status, result)
}
}

// DeleteCollection returns a function that will handle a collection deletion
func DeleteCollection(r rest.CollectionDeleter, checkBody bool, scope RequestScope, admit admission.Interface) http.HandlerFunc {
return func(w http.ResponseWriter, req *http.Request) {
trace := utiltrace.New("Delete " + req.URL.Path)
defer trace.LogIfLong(1000 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - get is special, so let's ignore it.

But for both create:
https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/handlers/create.go#L47
and update:
https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/handlers/update.go#L49

we start at exactly the same place and use 500ms.

So I think we should be consistent.

mediaType, _, err := negotiation.NegotiateOutputMediaType(req, scope.Serializer, &scope)
if err != nil {
status := responsewriters.ErrorToAPIStatus(err)
trace.Step("Writing raw JSON response")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - writing response is many times dominating factor - I completely agree that.

I completely wasn't clear what I wanted to say - sorry for that.
My point was that - the time between "negotiating output media type" and this step should be negligible.

So given that your point is completely correct, I think what I wanted to say is to:

  • leave this one (to see whether we spent time here)
  • but remove the "negotiating output media type" step (since different between this one and that should be negligible).

Does it make sense now?

_, info, err := negotiation.NegotiateOutputMediaType(req, metainternalversion.Codecs, &scope)
if err != nil {
scope.err(err, w, req)
return
}
encoder := metainternalversion.Codecs.EncoderForVersion(info.Serializer, metav1beta1.SchemeGroupVersion)
trace.Step(fmt.Sprintf("Serializing response as type %s", info.MediaType))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point here (sorry for being completely unclear on that).

I completely agree that this Step has value.
I just suggest removing those previous two ones (that should be negligible):

  • trace.Step("Getting object kind, setting GVK")
  • trace.Step("Renegotiating internal version")

@@ -95,12 +102,14 @@ func transformResponseObject(ctx context.Context, scope RequestScope, req *http.
}

// renegotiate under the internal version
trace.Step("Renegotiating internal version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now inline with the previous comments - so please drop this and the one mentioned below.

Added tracing for use cases where etcd is not the cause of long running
requests.
Fixed spelling.
Factored in Wojtek-t feedback.
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 29, 2018
@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2018
@k8s-ci-robot k8s-ci-robot merged commit 833dcc2 into kubernetes:master Oct 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Insufficient trace on long running GET request
6 participants