Revise log levels #295

tiabc · 2017-09-04T11:41:17Z

We have lots of logs and I sometimes see improper logs usage and have some questions about logging while reading tests output, for example.

My suggestion is reduce our log levels down to only a few characterised by their purpose perfectly described by their name:

WakeMeUpInTheMiddleOfTheNight: status-go ERROR, go-ethereum ERROR.
ToInvestigateTomorrow: status-go WARN, go-ethereum WARN.
InTests: status-go DEBUG, go-ethereum INFO.
DebugGeth: status-go DEBUG go-ethereum DEBUG. Sometimes something weird is happening with geth and you don't get the behavior you expect. In this case it's useful to see what's happening under the hood of geth. This log level is only available for viewing logs, not for writing them (so log package doesn't expose DebugGeth method).

About other levels in status-go:

CRIT level should be avoided because it causes uncontrollable application termination. It's also not log's responsibility to terminate the application.
INFO logs should be revised and moved to one of the three levels defined above. Most of the time they're actually DEBUG but sometimes they're WARN.
TRACE and DEBUG are actually the same thing in our case. We don't need such granularity.

PS. Inspired by https://divan.github.io/posts/wakemeupinthemiddleofthenight/ and painful log-digging experience.

This is a proposal rather than a defined specification to work on, let's discuss it.

The text was updated successfully, but these errors were encountered:

influx6 · 2017-09-04T11:53:03Z

I fully agree that logs should never terminate a program. More so, we should have way to direct log output to a file and not just stdout

tiabc · 2017-09-04T12:08:52Z

@influx6 #269

adambabik · 2017-09-04T13:32:08Z

I think we also must ask ourselves a few questions:

What do we need logs for? Is it for devs to make sure the program works correctly during development? Is it for debugging? Is it for metrics?
Why do we want to limit the number of logs? Is it problematic because of the environment our app runs in (devices with limited disk space)? A problem during development where there is too much noise?
How will we compensate reduced number of logs? For instance, if we used them to debug issues, how can we do that after we remove substantial part of them?

tiabc · 2017-09-04T16:03:19Z

Logs are needed for:

To help us debug the application during development.
To help us fix issues with deep integration with ethereum.
To give us background when status-react team faces weird behavior and asks us what's going wrong.
To give us more background what's going wrong when we run tests both manually and on CI.
To help us figure out what's going wrong when somebody sends us a bug report (which should also contain status-go logs).

So, currently, these are just logs: for debugging, testing, integrating and problem solving from production devices.
This proposal only addresses logging, metrics are out of scope for now.

We need it because:

We need to understand importance of every log entry. Should we stop everything and fix it or it can wait.
Our logs are noisy in tests and we want to make them more readable. By the way, I see a shortcoming on usage pattern here. For example, I don't want to see boot nodes addition every time I run tests. It needs investigation.

We're not going to compensate just because we haven't reduced anything. We're going to clearly specify the goal of each log level and use them appropriately. We're also going to coordinate status-go and go-ethereum log levels to set log output to the desired state.

PS. I think this should be documented in case we decide to implement it. This way or something: https://github.com/joelparkerhenderson/architecture_decision_record

adambabik · 2017-09-04T17:09:00Z

Now we have all requirements for the logs, thanks!

The only concern I have is what log-level will be used by status-react? Because you don't want to collect DEBUG logs in production. However, following the proposal from the first comment, we won't have INFO logs in status-go and it's gonna be pretty hard to satisfy

To help us fix issues with deep integration with ethereum.

To give us background when status-react team faces weird behavior and asks us what's going wrong.

and

To help us figure out what's going wrong when somebody sends us a bug report (which should also contain status-go logs).

Also, I don't get this line ragarding DebugGeth:

This log level is only available for viewing logs, not for writing them (so log package doesn't expose DebugGeth method).

So, we won't be able to use log.Debug?

divan · 2017-09-04T23:27:00Z

I totally agree that this problem should be attacked from the perspective of asking proper questions - how do we use logs, who is consumer of the logs, in what situations?

Generally I see logs as a "program asking for human attention". When program prints error message, it means:

Hey, human, something weird here, please intervene.

So here are my thoughts on important aspects of logging for us:

Log levels

Error log level

Something, like, "disk is out of space" (human need to increase disk space) or "failed to serialize data" (probably some error in the code). Requires human attention ASAP.

Info log Level

Info log level is something program prints just to make nervous developer calm and ensure that program works as expected - like"Node boot up", "Connected to the remote service succesfully". But this is expected behaviour. It should not require attention from human all the time. As @tiabc mentioned, seeing node addition all the time is useless, even harmful due to polluting test output.

So, yes, Info logs should be optional, as they're almost often falls into the Debug bucket. One exception could be added is initialization and finish of program - they could be nice as a timestamp marks of service start/stop.

Debug log level

This is more complicated. Again, in many cases you don't need a lot of debug output, because during debug session in Go it's often faster to add Println() output in the desired place, run test and remove. Thanks to fast compilation time it's really possible. But in our case we often need to go through change-crosscompile-rebuild-mobile-app-run loop which is not that fast as native go edit-compile-run loop (~5 minutes vs ~5 seconds), so we do need debug output.

Now, debug output is usually really noisy, so we may want to introduce different "labels" or "filters" - what to debug exactly. For example, if problem related to nodes communication - enable only logging related to this part. If it's related JS execution - enable only these logs, etc. And we might want to even change these in a runtime (via signals or other mechanism).

But I think we just need to classify real problems we encounter that need log analysis and work on logs to help with those specific cases. Also we need to provide convinient and easy way to enable/disable debug mode (via env variables, or whatever works better with mobile emulators or real devices).

Warning log level

This one could be safely removed. Warning is something between Error and Info, it's when program trying to say - "Hey, human, I have no idea whether this behaviour is ok or it's an error, so you decide". I think this is confusing for everyone. When I see warning like "WARN: File X not found" I have no idea if it's a problem or not. You tell me :)
I think we don't need Warn log level at all.

The same goes for Crit. Crit is just Error.

Tests

Tests should not produce any log output by default. Ideally tests should give only fail/pass output, perhaps coverage. If particular test is used for debugging purposes we need to use debug enabling mechanism (env variables for example). But default should be "no output".

Log format

The next question is a log format. Logs may be read by humans or by machines (elasticsearch or other solutions). In our case log consumers are humans, so we need to adapt logs for them (us). Structured logging is a bad choice as it hurts readability.

As we're going to consume logs via terminal, one of the consideration is the length of the message. Now we have a lot of long lines, like:

DEBUG[09-05|00:37:43] Removing p2p peer                        id=482484b9198530ee conn=staticdial duration=742.114ms peers=0  req=false err="client quitting"

In a terminal those messages got wrapped, making logs unusable without using file and opening them in editor. To be honest, I have ultrawide 29" monitor and I can see those lines perfectly when I stretch terminal to the full width, but that's too demanding :) I'd love to have logs usable in a typical terminal

When and how to write logs

This is also important and related to error handling. There are two approaches with logging errors:

log messages as much lower down the stack at the point where error or logged event occured
log in a wrapper handler somewhere up the stack
Both approaches, obviously, have their cons and pros.

Logging down the stack:
✅ helps readability and locality, you know exactly that this error/event will be logged, just by looking and code.
✅ allows utilizing runtime.Caller method to "inject" file:line number into log message (more on that later).
🚫 could be too verbose, especially where many repetitive calls

Logging up the stack:
✅ decreases number of lines for logging
✅ changing log/output requires change in less places
🚫 could be hard to adapt for specific errors (often ends up writing switch cases for different errors)
🚫 need additional instrumenting of errors (storing stacktrace, file:num info, etc - +performance overhead), using external packages like https://github.com/pkg/errors

I'd prefer first approach, but it really depends on the codebase.

Stacktraces

Stacktraces could be useful, but I'd prefer avoid them where we can. For example, error log message:
2017/09/05 00:54:16 [ERROR] geth/jail/jail.go:45 Failed to create cell: chatID validation failed
is much more efficient for us developers than huge stack trace in which we interested in one line anyway.

If code is not too complicated and design is nice and clear, it's often enough just file:number. For huge codebases with many levels of abstractions it may be more useful to print stacktraces though.

Simplicity of usage

Finally, there is a question of simplicity of log usage. Logging is very basic and ubiquitous operation and should have as little overhead in usage as possible.

I like logs being encapsulated as a dependencies of structs/objects, but having node.Logger().Debug() everywhere is too verbose, plus need to deal with proper log initialization and configuration in tests/constructors is often complcated. So, from this perspective, using global logger or common package log is more than justified; not without its own drawbacks, though.

One of the approaches I love to use in relatively small projects - is just using the stdlib log package.
It has zero entry barier, you don't have to learn new API (remembering if it's Error or Errorf, for example) and, actually, quite flexible and easy to work with.

For example, to add "[ERROR]" log level to be printed in red color along with file:linenumber info, you can create custom log writer for stdlib log:

$ cat log.go
package main

import (
	"fmt"
	"io"
	"os"
	"path"
	"runtime"
	"strings"
)

var (
	reset = string([]byte{27, 91, 48, 109})
	red   = string([]byte{27, 91, 57, 55, 59, 52, 49, 109})
)

type LogWriter struct {
	path string // exec path, used for stripping absolute file path in log messages
	out  io.Writer
}

func NewLogWriter() *LogWriter {
	dir, err := os.Executable()
	if err == nil {
		dir = path.Dir(dir) + "/"
	}
	return &LogWriter{
		path: dir,
		out:  os.Stderr,
	}
}

func (w *LogWriter) Write(data []byte) (int, error) {
	if len(data) > 7 && string(data[:7]) == "[ERROR]" {
		_, file, line, _ := runtime.Caller(3)
		str := fmt.Sprintf("%s[ERROR] %s:%d%s:%s", red, strings.TrimPrefix(file, w.path), line, reset, data[7:])
		return w.out.Write([]byte(str))
	}

	return w.out.Write(data)
}

at the program initialization you can do the following:

log.SetOutput(NewLogWriter())

and then using as simple as:

import "log"
...
log.Println("[ERROR] Failed to connect to service", err)

which will produce folllowing:

which is quite nice and readable and enough in many cases.

Again, I'm not rooting for this approach, but showing that it's possible to have nice easy to use levelled logging without using bloated logging libraries.

--

And as a final note to this unexpectedly large comment, here is the real log levels distributions for status-go :)

$ grep -rI "log\." geth/ cmd/ helpers/ | cut -d':' -f2 | tr -d '\t'  | grep -e '^log\.' | cut -d'(' -f1 | sort | uniq -c | sort -r
  83 log.Info
   9 log.Error
   5 log.Warn

adambabik · 2017-09-05T07:07:59Z

Regarding the log format, we can have different formats depending where the program is run. The current format is great for terminals but might not be the best in production, especially, if we would like to export logs to other systems in order to analyze them.

When and how to write logs

I think this topic is quite important, to have some guidelines and make the codebase more consistent.

One more technique I used when writing node.js programs was log categories. When running in a debug mode, I could use env like DEBUG=db/* to prints logs only from a given domain, in this case related to the database.

influx6 · 2017-09-07T14:57:52Z

Am not sure how everyone feels about structured logging (see https://medium.com/@tjholowaychuk/apex-log-e8d9627f4a9a).

What if we can introduce a structured logger but then having specific sinks where we rendered specific errors for human reading, more so, we can completed ignore certain logs that may not be suitable for use but needed for metrics or consumption over some external service.

Just a suggestion though :)

divan · 2017-10-31T00:53:03Z

Doing some investigation task, which involved understanding of a code and running app, I used today trace log entries a lot, but the experience was quite different from how it's assumed to work. Usual understanding of the tracing/logging workflow is:

enable TRACE loglevel via -log TRACE flag
run debug/test session and analyze the output

The obvious problem here is that TRACE loglevel produces extremely high-traffic log output, almost immediately overflowing terminal buffer, leaving the only option — log to file (via -logfile) and analyze/grep in retrospective. While it may be a good option for some cases, it's still a cumbersome process — open a new terminal tab to open a log, searching through vast amount of logs into the single line of interest, figuring out how to grep it and so on.

In order to search exact line of interest, you should go to the code and take a look at that tracing line anyway. So what I was doing instead:

I set loglevel to ERROR
replaced log.Trace() to log.Error() in a places that I was interested in
rebuilt the binary
launched the session and analyzed the output which contained exactly log lines what I was interested in

And it was much faster and cheaper than struggle through thousands of high-traffic irrelevant tracing log lines.
In a few cases I was just adding new lines in the places where I need it, running sessions, analyzing the output and removing them.

I was also thinking in using some of the debuggers, but that was not an attempt to "understand a piece of code", rather to see some events happening in real world situation and see the bigger picture (like start/stop peer handling functions, etc).

Considering this, I see no point of having Trace log level at all (at least in our code). We don't use it for now, and loglevels are mostly for compatibility with geth logger, but in context of this discussion I guess it proves the point of getting rid of useless log levels.

pedropombeiro · 2017-12-28T11:30:30Z

One way to reduce noise or find interesting things in logs would be to use a capable log viewer that affords filtering and highlighting of entries on your live log automatically. E.g.:

As long as the device doesn't get bogged down with writing the log entries, this seems like a good compromise in not requiring code changes, but still allowing to easily surface information of interest. This particular log viewer even allows sharing useful highlight rules within a team, so that others can benefit from a fine-tuned collective configuration (it's Windows-only unfortunately, but there should be Linux alternatives, or redirecting console output to a shared file).
It also helps with post-mortem log analysis, as mentioned by @adambabik.

One question by the way: is there an automated way to generate an archive of all the app logs from the app UI, or on the event of a crash (status-react, status-go and go-ethereum log files), so that the team can receive bug reports from the field in a consistent fashion?

divan · 2018-01-31T23:38:45Z

@azer thanks, but our log usage patterns are different.

Basically, we use logs in two ways:

Running statusd as a node, usually locally on the laptop, to run checks, verify ideas, etc. Those sessions are short-lived and usually, log is used to track the chain synchronization, debug/trace output from geth sometimes.
Running status-go as a part of a mobile app with log enabled for debug sessions. It's very rarely used to debug some status-react/status-go bugs/issues, and, again, those are short-lived. That's where we use log output to file because it's hard to get raw stdout output from mobile emulators/devices.

That said, most of our log usage cases are short-lived debug sessions. As we're using Go, for debug sessions, we can instrument part of the code that is being debugged with temporary log print calls, so no need to have advanced leveling or filtering for the log. That approach works best for long running servers and retrospective bugs investigation, but not in this case.

In the context of "revising log levels", as I wrote before, I would get rid of "WARN" levels, and kept only "INFO" (just normal print) and "ERROR" levels.

status-github-bot · 2018-09-24T15:04:44Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

status-github-bot · 2018-12-24T07:48:57Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

status-github-bot · 2018-12-31T07:49:39Z

This issue has been automatically closed. Please re-open if this issue is important to you.

tiabc added intermediate labels Sep 4, 2017

tiabc added medium-priority and removed low-priority labels Sep 4, 2017

tiabc added this to the Beta milestone 3 of 6 milestone Sep 4, 2017

influx6 mentioned this issue Sep 22, 2017

Make it possible to analyse which messages were lost or resent in the test environment #325

Closed

divan mentioned this issue Oct 17, 2017

Refactoring: improve error handling #287

Closed

influx6 mentioned this issue Oct 19, 2017

Handle Message Delivery logging and processing #392

Closed

3 tasks

tiabc added proposal and removed intermediate labels Oct 25, 2017

tiabc removed this from the 0.9.12 milestone Oct 25, 2017

divan mentioned this issue Mar 2, 2018

Use go-ethereum/log in status-go #717

Closed

status-github-bot bot added stale and removed stale labels Sep 24, 2018

status-github-bot bot added the stale label Dec 24, 2018

status-github-bot bot closed this as completed Dec 31, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revise log levels #295

Revise log levels #295

tiabc commented Sep 4, 2017

influx6 commented Sep 4, 2017 •

edited

Loading

tiabc commented Sep 4, 2017

adambabik commented Sep 4, 2017

tiabc commented Sep 4, 2017

adambabik commented Sep 4, 2017 •

edited

Loading

divan commented Sep 4, 2017

adambabik commented Sep 5, 2017

influx6 commented Sep 7, 2017

divan commented Oct 31, 2017 •

edited

Loading

pedropombeiro commented Dec 28, 2017

divan commented Jan 31, 2018

status-github-bot bot commented Sep 24, 2018

status-github-bot bot commented Dec 24, 2018

status-github-bot bot commented Dec 31, 2018

Revise log levels #295

Revise log levels #295

Comments

tiabc commented Sep 4, 2017

influx6 commented Sep 4, 2017 • edited Loading

tiabc commented Sep 4, 2017

adambabik commented Sep 4, 2017

tiabc commented Sep 4, 2017

adambabik commented Sep 4, 2017 • edited Loading

divan commented Sep 4, 2017

Log levels

Error log level

Info log Level

Debug log level

Warning log level

Tests

Log format

When and how to write logs

Stacktraces

Simplicity of usage

adambabik commented Sep 5, 2017

influx6 commented Sep 7, 2017

divan commented Oct 31, 2017 • edited Loading

pedropombeiro commented Dec 28, 2017

divan commented Jan 31, 2018

status-github-bot bot commented Sep 24, 2018

status-github-bot bot commented Dec 24, 2018

status-github-bot bot commented Dec 31, 2018

influx6 commented Sep 4, 2017 •

edited

Loading

adambabik commented Sep 4, 2017 •

edited

Loading

divan commented Oct 31, 2017 •

edited

Loading