Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use logrus for structured logging #416

Merged
merged 4 commits into from
Mar 1, 2018
Merged

Use logrus for structured logging #416

merged 4 commits into from
Mar 1, 2018

Conversation

ankushagarwal
Copy link

@ankushagarwal ankushagarwal commented Feb 27, 2018

  • Move from glog to sirupsen/logrus for logging
  • Add a new flag json_log_format
  • Refactor all log statements to use logrus
  • Use Info level for log level V(1) and below
  • Use Debug level for log level V(2) and above
  • Tested locally

Addresses #24

Sample logs

{"filename":"app/server.go:54","level":"info","msg":"EnvKubeflowNamespace not set, use default namespace","time":"2018-02-27T18:25:18-08:00"}
{"filename":"app/server.go:59","level":"info","msg":"[Version: 0.3.0+git Git SHA: Not provided. Go Version: go1.9.3 Go OS/Arch: darwin/amd64]","time":"2018-02-27T18:25:18-08:00"}
{"filename":"app/server.go:145","level":"info","msg":"No controller_config_file provided; using empty config.","time":"2018-02-27T18:25:18-08:00"}
{"filename":"controller/controller.go:110","level":"info","msg":"Setting up event handlers","time":"2018-02-27T18:25:18-08:00"}

This change is Reviewable

@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2018

nit: Could you include some sample log output in the PR description?

@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2018

Does the log output include line number or just filename?

@ankushagarwal
Copy link
Author

Added sample log output.

The log output includes line numbers as well.

@gaocegege
Copy link
Member

gaocegege commented Feb 28, 2018

Thanks for your contribution 🎉

There is two concerns:

@gaocegege
Copy link
Member

/retest

@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2018

/assign jlewi
/assign @gaocegege

@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2018

@gaocegege That's a good point about vendor and generated code. I don't think we should try to update those packages manually; that seems like a burden to maintain long term.

I think it should be fine to intersperse json and not json log entries. Hopefully logging backends are sophisticated enough to deal with this.

The motivation for using structured logging is so that we can add fields to the log from our code so that we can easily search for logs pertaining to a particular job, replica, index, etc... I think this will really help us to debug issues.

A lot of the bugs we get are of the form I expected X to happen for job Y but it did not. So we'd like to know what the operator was doing for job Y. But if a lot of jobs are running finding the relevant logs for a particular job can be diffcult.

@ankushagarwal Can you open up a follow on issue to add such labels to the operator.

@ankushagarwal
Copy link
Author

Created #424

* Move from glog to sirupsen/logrus for logging
* Add a new flag json_log_format
* Refactor all log statements to use logrus
* Use Info level for log level V(1) and below
* Use Debug level for log level V(2) and above

Addresses #24
@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2018

I'd suggest resyncing to see if that fixes the tests.
/lgtm
/hold
Lets wait for @gaocegege to review

@ankushagarwal
Copy link
Author

/retest

@gaocegege
Copy link
Member

I will have a try and submit my review today.

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankushagarwal

There are somethings:

  • Code in pkg/clientset is changed in the PR, and I think we should not do it manually, so please revert the changes by ./hack/update-codegen.sh
  • tf-operator -version returns some useless logs
./tf-operator -version         
{"filename":"app/server.go:54","level":"info","msg":"EnvKubeflowNamespace not set, use default namespace","time":"2018-03-01T09:57:27+08:00"}
{"filename":"app/server.go:59","level":"info","msg":"[Version: 0.3.0+git Git SHA: Not provided. Go Version: go1.8.3 Go OS/Arch: linux/amd64]","time":"2018-03-01T09:57:27+08:00"}
Version: 0.3.0+git
Git SHA: Not provided.
Go Version: go1.8.3
Go OS/Arch: linux/amd64

It is ok to ignore the second one since it is not in high priority.

@gaocegege
Copy link
Member

gaocegege commented Mar 1, 2018

I have observation to share with you @ankushagarwal @jlewi , I am not sure how to deal with it.

Now we use flag package to support command line flags, and glog also uses it by default. Then you can see our binary have more flags than we thing although we use logrus instead of glog:

➜  tf-operator git:(416) ✗ ./tf-operator -h               
Usage of ./tf-operator:
  -alsologtostderr
    	log to standard error as well as files
  -chaos-level int
    	DO NOT USE IN PRODUCTION - level of chaos injected into the TFJob created by the operator. (default -1)
  -controller-config-file string
    	Path to file containing the controller config.
  -gc-interval duration
    	GC interval (default 10m0s)
  -json-log-format
    	Set true to use json style log format. Set false to use plaintext style log format (default true)
  -log_backtrace_at value
    	when logging hits line file:N, emit a stack trace
  -log_dir string
    	If non-empty, write log files in this directory
  -logtostderr
    	log to standard error instead of files
  -stderrthreshold value
    	logs at or above this threshold go to stderr
  -v value
    	log level for V logs
  -version
    	Show version and quit
  -vmodule value
    	comma-separated list of pattern=N settings for file-filtered logging

There are some pros and cons:

  • We can support vendor and client's glog, since we have glog's flags
  • But the users may be confused since the tf-operator outputs logs regardless of the flag -logtostderr

We could discuss it in the issue, not in the PR. After the client is regenerated the PR LGTM

@ankushagarwal
Copy link
Author

@gaocegege I have changed pkg/client/clientset/versioned/clientset.go which does not seem to have auto generated. Also looks like ./hack/update-codegen.sh is broken

I have fixed the -version logging issue.

@gaocegege
Copy link
Member

/approve

@ankushagarwal

pkg/client is all auto-generated by kubernetes/code-generator.

update-codegen.sh works for me, it seems that the script uses the wrong code-generator in kubernetes repo.

But no worry about it, I can open a PR to fix the clientset.

Thanks for your awesome contribution and patience! The log problem has been bothering us for long time.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege, jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gaocegege
Copy link
Member

gaocegege commented Mar 1, 2018

/retest

I am not sure why travis ci forgot to test the PR, maybe it is tired.

screenshot from 2018-03-01 10-48-19

Interesting....

@gaocegege
Copy link
Member

@ankushagarwal

Sorry, the CI failed because we use "github.com/sirupsen/logrus" but the vendor's logrus is "github.com/onrik/logrus"

Could you fix it?

@ankushagarwal
Copy link
Author

@gaocegege : Can you link me to the failure?

We use both sirupsen/logrus and onrik/logrus

@ankushagarwal
Copy link
Author

Added sirupsen/logrus as a glide dependency - hopefully build should pass now

@ankushagarwal
Copy link
Author

/retest

@gaocegege gaocegege merged commit 5d0d126 into kubeflow:master Mar 1, 2018
@ankushagarwal ankushagarwal deleted the logrus branch March 1, 2018 05:07
jimexist pushed a commit to jimexist/tf-operator that referenced this pull request Mar 7, 2018
* Move from glog to sirupsen/logrus for logging
* Add a new flag json_log_format
* Refactor all log statements to use logrus
* Use Info level for log level V(1) and below
* Use Debug level for log level V(2) and above
* Tested locally

Addresses kubeflow#24

Sample logs
```
{"filename":"app/server.go:54","level":"info","msg":"EnvKubeflowNamespace not set, use default namespace","time":"2018-02-27T18:25:18-08:00"}
{"filename":"app/server.go:59","level":"info","msg":"[Version: 0.3.0+git Git SHA: Not provided. Go Version: go1.9.3 Go OS/Arch: darwin/amd64]","time":"2018-02-27T18:25:18-08:00"}
{"filename":"app/server.go:145","level":"info","msg":"No controller_config_file provided; using empty config.","time":"2018-02-27T18:25:18-08:00"}
{"filename":"controller/controller.go:110","level":"info","msg":"Setting up event handlers","time":"2018-02-27T18:25:18-08:00"}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants