A human readable approach to make parameters from an actual log message machine-readable.
Disclaimer: sllm is not a logging library. Its a concept to make log messages human- and machine-readable at the same time. There are some examples in the Go reference docs. For a Go logging lib that uses sllm see qblog ≥v0.13.0.
Usage:
import "git.fractalqb.de/fractalqb/sllm/v3"
Logging is subject to two conflicting requirements. Log entries should be understandable and easy to read. On the other hand, they should be able to be reliably processed automatically, i.e. to extract relevant data from the entry. Current technologies force a decision for one of the two requirements.
To decide for either human readability or machine processing means that significant cutbacks are made to the other requirement. sllm picks up the idea of "markup", which is typically intended to bring human readability and machine processing together. At the same time, sllm remains simple and unobtrusive—unlike XML or even JSON.
Let's take the example of a standard log message that states some business relevant event and might be generated from the following pseudocode:
…
// code that knows something about transaction:
DC.put("tx.id", tx.getId());
…
// down the stack: an actual business related message:
log.Info("added {} x {} to shopping cart by {}", 7, "Hat", "John Doe");
…
// finally(!) when climbing up the stack:
DC.clear();
…
Besides the message the log entry contains some standard fields
time
, thread
, level
and module
and some Diagnostic Context
information to give a more complete picture.
How would the output of such a scenario look like? – Either one gets a visually pleasant message that is rather good to read, or you get output that is easy to be consumed by computers. Currently, one has to make a choice. But, can't we have both in one?
Note that sllm focuses on the log message. We do not propose how to
handle other pieces of information! So lets have a look what sllm would do
with some decent log formats. We use ↩/↪
only to avoid very long lines on this
page. In a real log, everything would be on a single line!
2018-07-02 20:52:39 [main] INFO sllm.Example - added 7 x Hat to ↩
↪ shopping cart by John Doe - tx.id=4711
This message is nice to read for humans. Especially the message part is easy to understand because humans are used to gain some understanding from natural language. However, the relevant parameters— i.e. the number of items, which type of item and the respective user—is not easy to extract from the text by machines. Even if you do, simple changes of the text template can easily break the mechanism to extract those parameters.
With sllm message:
2018-07-02 20:52:39 [main] INFO sllm.Example - added `count:7` x ↩
↪ `item:Hat` to shopping cart by `user:John Doe` - tx.id=4711
The sllm'ed format is still quite readable but lets one reliably identify the business relevant values.
time=2018-07-02T20:52:39 thread=main level=INFO module=sllm.Example ↩
↪ ix.id=4711 number=7 item=Hat user=John_Doe tag=fill_shopping_cart
The logftm page itself states that the human readability of logftm is far from perfect and encourages the approach to include a human-readable message with every log line:
time=2018-07-02T20:52:39 thread=main level=INFO module=sllm.Example ↩
↪ ix.id=4711 msg="added 7 x Hat to shopping cart by John Doe" ↩
↪ number=7 item=Hat user=John_Doe tag=fill_shopping_cart
Once you find the message by skimming the log entry its meaning is not subject of personal interpretation of technical key/value pairs any more. That's fine! But there is still a significant amount of “visual noise”. However, the result is quite acceptable. But one still may ask if the redundancy in the log entry is necessary. With a “slim” message you don't need that redundancy.
With sllm message:
time=2018-07-02T20:52:39 thread=main level=INFO module=sllm.Example ↩
↪ ix.id=4711 msg=added `count:7` x `item:Hat` to shopping cart by ↩
↪ `user:John Doe`
{"time":"2018-07-02T20:52:39","thread":"main","level":"INFO", ↩
↪ "module":"sllm.Example","ix.id"="4711","number":"7","item":"Hat", ↩
↪ "user":"John Doe","tag":"fill_shopping_cart"}
Obviously, JSON is the least readable format. However, JSON has an outstanding advantage: JSON can display structured data. Structured data is deliberately avoided with sllm. Taking this path would inevitably lead to something with the complexity of XML.
However, similar to the logfmt example, you can use sllm to insert a machine-readable message into the entry.
With sllm message:
{"time":"2018-07-02T20:52:39","thread":"main","level":"INFO", ↩
↪ "module":"sllm.Example","ix.id"="4711","msg":"added `count:7` x ↩
↪ `item:Hat` to shopping cart by `user:John Doe`"}
The markup is simple and sticks to the following requirements:
-
No support for multi-line messages
Spreading a single log entry over multiple lines is considered a bad practice. However, there may be use cases, e.g. logging a stack trace, that justify the multi-line approach. But in any case the message of a log entry shall not exceed a single line!
-
Message arguments are unstructured character sequences
sllm works on the text level. There is no type system implied by sllm. As such the arguments of a sllm message are simply sub-strings of the message string.
-
Arguments are identified by a parameter name
Within a message each argument is identified by its parameter name. A parameter name also is a sub-strings of the message string.
-
Reliable and robust recognition of parameters and arguments
The argument and the parameter can be uniquely recognized within a message. Changes of a message that do not affect neither the parameters nor the arguments do not break the recognition.
-
Be transparent, simple and unobtrusive
A message shall be human-readable so that the meaning of the message is easy to get. The system must be transparent in the sense that even the human reader can easily recognize the parameters with their arguments.
Note that the readability of a message also depends to a certain extent on its author.
With these requirements, why was the backtick '`' chosen for markup? – The backtick is a rarely used character from the ASCII characters set, i.e. it is also compatible with UTF-8. The fact that it is rarely used implies that we don't have to escape it often. This affects backticks in the message template and the arguments. In parameter names backticks are simply not allowed.
And last but not least: Simpler markup rules make simpler software implementations (as long as it is not too simple). Besides many advantages this gives room for efficient implementations. Part of this repository is a Go implementation with quite decent performance.
Benchmarks from https://github.com/fractalqb/go-sllm-benchmark – go there for details.
Ran on: cpu: AMD Ryzen 7 5800X 8-Core Processor
BenchmarkSllmAppend-16 10492954 116.0 ns/op 0 B/op 0 allocs/op
BenchmarkSllmByteBuffer-16 8785130 119.2 ns/op 0 B/op 0 allocs/op
BenchmarkSllmStringBuilder-16 3096673 410.0 ns/op 360 B/op 6 allocs/op
BenchmarkGoJSONDynamic-16 659040 1644 ns/op 760 B/op 15 allocs/op
BenchmarkGoJSONStatic-16 2714055 437.3 ns/op 224 B/op 2 allocs/op
BenchmarkJSONiterDynamic-16 712359 1687 ns/op 950 B/op 14 allocs/op
BenchmarkJSONiterStatic-16 2494083 488.5 ns/op 224 B/op 2 allocs/op
BenchmarkSllmParseDynamic-16 1621568 644.9 ns/op 693 B/op 10 allocs/op
BenchmarkGoJSONparseDynamic-16 545088 2977 ns/op 560 B/op 28 allocs/op
BenchmarkGoJSONparseStatic-16 657501 1935 ns/op 296 B/op 9 allocs/op
BenchmarkGoJSONiterParseDynamic-16 916117 1262 ns/op 344 B/op 24 allocs/op
BenchmarkGoJSONiterParseStatic-16 2122716 551.0 ns/op 112 B/op 9 allocs/op