Add trace from k/k repo #65
Conversation
Automatic merge from submit-queue (batch tested with PRs 41597, 42185, 42075, 42178, 41705)

stop spamming logs on restart of api server

**What this PR does / why we need it**: This PR reduces an excessive amount of log spam on startup of any API server with a larger amount of content. We have seen server restart times of ~5 minutes as a consequence of tracing each directory in etcd during initial decoding. This spams the journald process, which causes many of those messages to be rate limited and dropped, making actual problems much harder to debug. This PR stops logging discrete keys in favor of just recording that an operation was slow for the particular object type (which is enough information!). It also makes the log messages say which trace operation ended.

Sample log now:

```
Trace "decodeNodeList *[]api.PolicyBinding" (started 2017-02-20 22:20:33.98771112 +0000 UTC) Decoded 10718 nodes decodeNodeList *[]api.PolicyBinding [500ms] END
```
Automatic merge from submit-queue (batch tested with PRs 47484, 47904, 48034)

prioritize messages for long steps

This pull prioritizes the trace messages, so that steps that are unusually long are emitted at the info level, while all details are emitted at v(4) level.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has explicit InitFlags(), so we add them as necessary
- Update the other repositories that we vendor that made a similar change from glog to klog:
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by calling InitFlags explicitly in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
…or-klog-take-2 Update cadvisor and other repos for klog
fix golint problem of volume cephfs/iscsi/nfs
fluentd-elasticsearch - update fluentd to 1.3.3 & added filter_parser for json fields in es
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/override cla/linuxfoundation
Change-Id: I8d2ad6e80131048441b73c90d5cf9b1f40c69f4e
Change-Id: I02c0fb44ad09876a33b84d8ec325d97c9894232a
It would be nice to have some tests for this, but since it's just an extraction... /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dims, mcrute. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/override cla/linuxfoundation
@dims: Overrode contexts on behalf of dims: cla/linuxfoundation. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
```go
for _, step := range t.steps {
	stepDuration := step.stepTime.Sub(lastStepTime)
	if stepThreshold == 0 || stepDuration > stepThreshold || klog.V(4) {
		buffer.WriteString(fmt.Sprintf("Trace[%d]: [%v] [%v] %v\n",
			tracenum, step.stepTime.Sub(t.startTime), stepDuration, step.msg))
	}
}
```
@DirectXMan12 This is an "interesting" use of logs - how do we think we would convert this to logr-style?
it's not entirely clear to me why it's written to a buffer first, except perhaps for performance. If we remove that constraint, it looks fairly straightforward:

```go
log.Info("trace", "trace id", tracenum, "since start", step.stepTime.Sub(t.startTime), "step duration", stepDuration, "message", step.msg)
```

or better yet:

```go
log := log.WithValues("trace id", tracenum)
log.Info("trace", "since start", step.stepTime.Sub(t.startTime), "step duration", stepDuration, "message", step.msg)
```
Individual traces can easily be reconstituted by filtering on `"id" == ${TRACE_ID}` or somesuch.
In the long term? This is probably a good candidate for actual tracing ;-).
It's at least mildly unclear what the motivation for this package is, except to reduce logging throughput; it seems like a decent amount of this could be replaced with "normal" logging plus structured filtering.
@DirectXMan12 this is extremely useful, especially in the performance scaling tests. A variation of this helped me find the nil panic propagation problem (see https://github.com/kubernetes/kubernetes/pull/72820/files).
cc @wojtek-t
The amount of logs is one example. If something happens 100 million times but is slow only 100 of those times, I don't want to see 100 million log lines; I want 100.
Yes - once we have proper tracing, we probably should at least consider replacing those.
per discussion in kubernetes/kubernetes#73209