ensure logging is consistent policy- and mechanism-wise #35

phritz · 2020-04-15T00:29:13Z

Background:

- syslog-derived log levels are a well-established historical norm, e.g. https://reflectoring.io/logging-levels/
- however, golang folks make a compelling argument for a simpler approach: https://dave.cheney.net/2015/11/05/lets-talk-about-logging
- the replicache client and diff-server use the golang 'log' package, with a small utility to initialize it: https://github.com/rocicorp/diff-server/blob/a99baedd298425ab933e2c0cb62c0b393b371161/util/log/log.go#L30
- repm accepts a custom logger that the app can use to direct logs to a specific place: https://github.com/rocicorp/replicache-client/blob/348828de6b411421ab377a1eed224d19d0a76600/repm/repm.go#L41
- the diff-server stores the body of /pull requests to noms and logs the hash: https://github.com/rocicorp/diff-server/blob/a99baedd298425ab933e2c0cb62c0b393b371161/serve/pull.go#L184
- noms uses 'log' directly and also has a verbose option, also built on 'log', that can be enabled programmatically: https://github.com/attic-labs/noms/blob/master/go/util/verbose/verbose.go (the diff-server enables it: https://github.com/rocicorp/diff-server/blob/a99baedd298425ab933e2c0cb62c0b393b371161/serve/service.go#L73)
- zeit has intentionally limited support for logging: https://zeit.co/blog/refined-logging. Logs are only persisted for requests that fail (500, etc.). The only way to see other logs is to actively tail the instance's log, via now logs -f, the API (https://zeit.co/docs/api#endpoints/logs/stream-serverless-function-logs), or the web interface, and the web interface shows only the latest 4 KB. More info: https://zeit.co/docs/v2/platform/deployments#runtime
- to meet users' logging needs, zeit pushes heavily for users to enable an integration with one of several cloud logging services: https://zeit.co/integrations?category=logging
- lots of interesting things happen even during a 200 from the diff-server, so we need to enable one of these integrations in order to debug our deployment environments

Problem:

- When running, especially in production, we want a clear signal that something truly unexpected has gone wrong and a developer should look at the problem. By convention this signal is often an ERROR log line. We don't have a consistent policy or mechanism for emitting this signal. (Note that what we do with an ERROR can be context dependent: a production server might log it and have an alert notify us, while a client in development might panic so that the issue can't be ignored.)
- Similarly, since we haven't needed logging conventions so far, there is no way to tune how spammy logging is or what kind of log information is desired beyond the noms verbose setting, or even to know what is or isn't appropriate to log. Some conventions would be helpful here, as well as mechanisms. For example, we may want to log verbosely in the client during development but only log errors in production so we don't fill up the user's phone. Or we might want logs to normally be quiet, to save money, but be able to turn on a spew of debugging information while we're tracking down a problem.
- noms, as a library, takes an opinion about how logging should happen: it says logging should use golang's log package, and callers can't change that. If a consumer of noms wants to log differently, say with a more featureful logger or a structured logger, or by adding extra context to log lines, it can't, because there is no mechanism. Concretely, if replicache wanted to annotate log lines with a request ID, it couldn't do that for noms' lines; they're inaccessible to the caller. (Libraries often solve this problem by defining the log interface they expect and letting callers pass in a thingy that matches.)

I think it's also important to look down the road at problems that are likely coming soon on the server side:

When using a cloud logging service, as we must with zeit:
- these services often work better if our server emits structured logs (e.g. JSON). This makes searching the logs easier because the logging service doesn't have to be taught how to parse log lines, and it can infer data types. For example, a structured JSON field emitted as an integer is typically available for range queries immediately, without having to tell the log service to parse out /lastMutationId: (\d+)/ and treat $1 as an integer. (Note that having structured logging doesn't mean we would HAVE to use structured logs everywhere, e.g. when developing locally; it's easy enough to change log formats depending on the environment.)
- we are likely to want to include meta information in log lines that lets us aggregate across requests (e.g. the account ID or the requested path) or correlate across process boundaries (e.g. a request ID that lets us find the server's log lines for a particular client request). Including this information manually in each log line is annoying and error-prone, so a context logger is often created for the duration of a request that automatically annotates log lines with it.
- we will likely want to start tracking the timing of operations and suboperations. This is often easiest to roll into something like a context logger, though there are many strategies.

High-level Proposal

Let's create a logging policy and just a little more structure to give us a place to add/tweak logging behavior that we know is coming. Let's do this in a way that:

- is as simple as possible and easy to use
- is easily satisfiable by plain vanilla golang 'log'
- introduces only small changes to the code
- gives us a place to put logging logic should we need it

Sketch of a Proposal

Establish three log levels:
- ERROR: something truly unexpected has happened and a developer should go look. Example: https://github.com/rocicorp/diff-server/blob/a99baedd298425ab933e2c0cb62c0b393b371161/serve/service.go#L80
- INFO: an important change of state or information that is immediately useful to the developer. Example: https://github.com/rocicorp/diff-server/blob/a99baedd298425ab933e2c0cb62c0b393b371161/cmd/diffs/main.go#L127
- DEBUG: verbose information about what's happening that might be useful to a developer. Example: https://github.com/rocicorp/diff-server/blob/a99baedd298425ab933e2c0cb62c0b393b371161/serve/pull.go#L174

Having the ERROR level makes it easy to do something with this important signal. Having INFO makes it easy to know what to show by default. Having DEBUG makes it possible to spew lots of information when you need it. (We should probably log at DEBUG level by default until things are stable.)

There are obviously some grey areas but I'm confident we can work them out.

Expand https://github.com/rocicorp/diff-server/blob/master/util/log/log.go to have three new public functions that our code uses to log. These functions have the signature of Printf and are thus easily satisfiable by golang's log.Printf:

```go
func Error(format string, v ...interface{}) {...}
func Info(format string, v ...interface{}) {...}
func Debug(format string, v ...interface{}) {...}
```

For now, all they do is write to golang's log.Printf with an "ERROR", "INFO", or "DEBUG" prefix.

- Convert the replicache client and diff-server to use rlog.Error/Info/Debug consistent with the policy above.
- To get consistent log output (e.g. with the appropriate annotations), I think it is probably also worthwhile to convert noms to use these interfaces (and maybe for rlog to live there). If we don't want to change noms' output for some reason, we can always do this transparently, e.g. by configuring noms' rlog to write all of Error/Info/Debug directly to log.Printf without a prefix. Consumers of noms should be able to easily pass in their own logger if they want to control what it does (e.g. if they use a more featureful logging library, it should be easy to wrap it and pass it in for noms to use).

Additional Info

- at some point we might want a way to get logs from clients in the field
- this issue is somewhat related to Followup replicache#34

- phritz mentioned this issue Apr 16, 2020: full launch scratch pad big list #40 (Closed)
- aboodman mentioned this issue Apr 16, 2020: Do something about spamminess of log output #41 (Closed)
- phritz added this to the Public Announcement 1 milestone Apr 20, 2020
- aboodman self-assigned this May 13, 2020
- aboodman referenced this issue in rocicorp/replicache-sdk-flutter May 14, 2020: commit 376a636, "Add basic leveled logging. Addresses rocicorp/replicache#35 for the flutter repo."
- aboodman mentioned this issue May 14, 2020: Add basic leveled logging rocicorp/replicache-sdk-flutter#53 (Merged)
- aboodman referenced this issue in rocicorp/replicache-sdk-flutter May 14, 2020: commit 09856b0, "Add basic leveled logging. Addresses rocicorp/replicache#35 for the flutter repo."
- aboodman closed this as completed May 14, 2020
- aboodman referenced this issue in rocicorp/replicache-sdk-flutter May 14, 2020: commit 1da33e5, "Add basic leveled logging. Addresses rocicorp/replicache#35 for the flutter repo."
- aboodman mentioned this issue Jul 14, 2020: Logging infrastructure rocicorp/repc#49 (Closed)
- arv mentioned this issue Mar 31, 2021: Introduce logLevel to ReplicacheOptions rocicorp/replicache#347 (Merged)