Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trace from k/k repo #65

Merged
merged 14 commits into from
Jan 24, 2019
Merged

Conversation

dims
Copy link
Member

@dims dims commented Jan 23, 2019

per discussion in kubernetes/kubernetes#73209

sttts and others added 12 commits January 29, 2017 21:41
Automatic merge from submit-queue (batch tested with PRs 41597, 42185, 42075, 42178, 41705)

stop spamming logs on restart of api server

**What this PR does / why we need it**:
This PR reduces an excessive amount of log spam on startup of any API server with larger amounts of content.  We have experienced server restart times ~5 minutes that is a consequence of us tracing each directory in etcd during our initial decoding.  This spams the journald process, and results in much of those messages getting rate limited and dropped, and making actual problems much harder to debug.  This stops us logging discrete keys in favor of just knowing that an operation was slow for the particular object type (which is enough information!)

this PR also makes it so we know what trace operation was ended in our log messages!

sample log now:

```
Trace "decodeNodeList *[]api.PolicyBinding" (started 2017-02-20 22:20:33.98771112 +0000 UTC)
Decoded 10718 nodes
decodeNodeList *[]api.PolicyBinding [500ms] END
```
Automatic merge from submit-queue (batch tested with PRs 47484, 47904, 48034)

prioritize messages for long steps

This pull prioritizes the trace messages, so steps that are unusually large come out at the info level and all details come out a v(4) level.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
…or-klog-take-2

Update cadvisor and other repos for klog
fix golint problem of volume cephfs/iscsi/nfs
fluentd-elasticsearch - update fluentd to 1.3.3 & added filter_parser for json fields in es
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 23, 2019
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 23, 2019
@dims
Copy link
Member Author

dims commented Jan 23, 2019

/override cla/linuxfoundation

Change-Id: I8d2ad6e80131048441b73c90d5cf9b1f40c69f4e
Change-Id: I02c0fb44ad09876a33b84d8ec325d97c9894232a
@mcrute
Copy link
Contributor

mcrute commented Jan 23, 2019

It would be nice to have some tests for this but since it's just an extraction...

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, mcrute

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mcrute
Copy link
Contributor

mcrute commented Jan 23, 2019

Looks like only an admin can override. Ping @thockin @liggitt

@dims
Copy link
Member Author

dims commented Jan 23, 2019

/override cla/linuxfoundation

@k8s-ci-robot
Copy link
Contributor

@dims: Overrode contexts on behalf of dims: cla/linuxfoundation

In response to this:

/override cla/linuxfoundation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 23, 2019
@k8s-ci-robot k8s-ci-robot removed the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 23, 2019
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 23, 2019
@wojtek-t
Copy link
Member

Thanks @dims !

@krzysied - FYI

@dims dims merged commit ca2bd90 into kubernetes:master Jan 24, 2019
@dims dims mentioned this pull request Jan 24, 2019
for _, step := range t.steps {
stepDuration := step.stepTime.Sub(lastStepTime)
if stepThreshold == 0 || stepDuration > stepThreshold || klog.V(4) {
buffer.WriteString(fmt.Sprintf("Trace[%d]: [%v] [%v] %v\n", tracenum, step.stepTime.Sub(t.startTime), stepDuration, step.msg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DirectXMan12 This is an "interesting" use of logs - how do we think we would convert this to logr-style?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not entirely clear to me why it's written to a buffer first, except perhaps performance. If we remove that constraint, it looks fairly straightforward:

log.Info("trace", "trace id", tracenum, "since start", step.stepTime.Sub(t.startTime), "step duration", stepDuration, "message", step.msg)

or better yet

log := log.WithValues("trace id", tracenum)
log.Info("trace", "since start", step.stepTime.Sub(t.startTime), "step duration", stepDuration, "message", step.msg)

Individual traces can easily be reconstituted by filtering on "id" == ${TRACE_ID} or somesuch.

In the long term? This is probably a good candidate for actual tracing ;-).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's at least mildly unclear what the motivation for this package is though, except to reduce logging throughput though -- it seems like a decent amount of this could be replaced with "normal" logging plus structured filtering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DirectXMan12 this is extremely useful especially in the performance scaling tests. A variation of this helped me find the nil panic propagation problem ( see https://github.com/kubernetes/kubernetes/pull/72820/files )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amount of logs is one example. If something is happening 100 millions of times but it's only 100 times it's relatively slow, I don't want to see 100 millions of logs - I want to have 100.

Yes - once we have proper tracing, we probably should at least consider replacing those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.