Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

structured logging: support log messages with line breaks #268

Closed
wants to merge 3 commits into from

Conversation

pohly
Copy link

@pohly pohly commented Nov 9, 2021

What this PR does / why we need it:

The initial structured logging output (almost) always produced a single line
per log message, with quoting of strings to represent line breaks as \n. This
made the output hard to read (see
kubernetes/kubernetes#104868).

It was still possible to get line breaks when formatting a value with %+v and
that ended up emitting line breaks; this was probably not intended.

Now the message is never quoted, as in non-structured output. String values are
quoted if they contain no line break. If they do, start/end markers delimit the
text which appears on its own lines.

All additional lines of a structure log message get indented by one space. This
makes it obvious where a new log message starts, which is an improvement
compared to the traditional format.

Which issue(s) this PR fixes:

Fixes kubernetes/kubernetes#106262, kubernetes/kubernetes#106428

Special notes for your reviewer:

Traditional output:

I1112 14:06:35.783354  328441 structured_logging.go:42] someData printed using InfoF: {hello world 0}
I1112 14:06:35.783472  328441 structured_logging.go:43] longData printed using InfoF: {long Multiple
lines
with quite a bit
of text. 0}
I1112 14:06:35.783483  328441 structured_logging.go:44] stringData printed using InfoF,
with the message across multiple lines:
long: Multiple
lines
with quite a bit
of text.
I1112 14:06:35.898176  142908 structured_logging.go:54] logData printed using InfoF:
{log output from some program I0000 12:00:00.000000  123456 main.go:42] Starting
E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
 0}

Old InfoS output before this commit:

I1112 14:06:35.783512  328441 structured_logging.go:50] "using InfoS" someData={Name:hello Data:world internal:0}
I1112 14:06:35.783529  328441 structured_logging.go:51] "using InfoS" longData={Name:long Data:Multiple
lines
with quite a bit
of text. internal:0}
I1112 14:06:35.783549  328441 structured_logging.go:52] "using InfoS with\nthe message across multiple lines" nt=1 stringData="long: Multiple\nlines\nwith quite a bit\nof text." str="another value"
I1112 14:06:35.898278  142908 structured_logging.go:65] "using InfoS" logData={Name:log output from some program Data:I0000 12:00:00.000000  123456 main.go:42] Starting
E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
 internal:0}
I1112 14:06:35.783565  328441 structured_logging.go:61] "Did something" item="foobar"
I1112 14:06:35.783576  328441 structured_logging.go:63] "This is a full sentence." item="foobar"

New InfoS output:

I1119 13:18:00.210970   78836 structured_logging.go:58] using InfoS | someData={Name:hello Data:world internal:0}
I1119 13:18:00.210988   78836 structured_logging.go:59] using InfoS | longData>>>
 {Name:long Data:Multiple
 lines
 with quite a bit
 of text. internal:0}
 <<<
I1119 13:18:00.211009   78836 structured_logging.go:60] using InfoS with
 the message across multiple lines | int=1 stringData>>>
 long: Multiple
 lines
 with quite a bit
 of text.
 <<< str="another value"
I1119 13:18:00.211032   78836 structured_logging.go:65] using InfoS | logData>>>
 {Name:log output from some program Data:I0000 12:00:00.000000  123456 main.go:42] Starting
 E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
  internal:0}
 <<<
I1119 13:18:00.211068   78836 structured_logging.go:71] Did something | item="foobar"
I1119 13:18:00.211079   78836 structured_logging.go:73] This is a full sentence. item="foobar"

Release note:

In structured output (= InfoS, ErrorS), values with line breaks will be printed across multiple lines to make the output more readable. The message is also not quoted anymore and may stretch across multiple lines. Every new line gets indented by one space.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 9, 2021
@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
To complete the pull request process, please assign justinsb after the PR has been reviewed.
You can assign the PR to them by writing /assign @justinsb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pohly pohly changed the title structured logging: avoid quoting of strings with line breaks RFC: structured logging: avoid quoting of strings with line breaks Nov 9, 2021
@pohly
Copy link
Author

pohly commented Nov 9, 2021

I'm open for suggestions regarding the markup for string start and end...

@pohly
Copy link
Author

pohly commented Nov 9, 2021

Another alternative is with the key name repeated inside the start/end tags instead of the normal printing as variable name:

 <previous key/value> vvvvv multiLine vvvvv
 ....
 ^^^^^ multiLine ^^^^^
 <next key/value>

"multiLine" here is the key for the key/value pair, not some special hard-coded delimiter as "string" is in the current PR.

@pohly
Copy link
Author

pohly commented Nov 9, 2021

The message string for InfoS has the same problem. I also find the quotation marks around the text distracting. On the other hand, we may want to provide some hint on where the message text ends and key/value pairs start.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 12, 2021
@pohly pohly changed the title RFC: structured logging: avoid quoting of strings with line breaks structured logging: support log messages with line breaks Nov 12, 2021
@pohly
Copy link
Author

pohly commented Nov 12, 2021

I ended up revamping the entire formatting. See updated description and the new example for a comparison.

@pohly
Copy link
Author

pohly commented Nov 12, 2021

This is ready for review of the concept. I might still do some performance improvements.

/cc @shivanshu1333 @serathius
/wg structured-logging
/sig instrumentation
/priority important-soon
/milestone 1.23
/area logging
/kind feature

@k8s-ci-robot k8s-ci-robot added wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 12, 2021
@k8s-ci-robot
Copy link

@pohly: The label(s) area/logging cannot be applied, because the repository doesn't have them.

In response to this:

This is ready for review of the concept. I might still do some performance improvements.

/cc @shivanshu1333 @serathius
/wg structured-logging
/sig instrumentation
/priority important-soon
/milestone 1.23
/area logging
/kind feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 12, 2021
b.WriteString(v)
return
}
// Complex multi-line string, show as-is with indention.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any benefit of adding built in markers for logs. They

  • don't make logs more readable as users can klog header to already distinguish start vs end of multi line log
  • adding standard markers makes parsing logs harder, existing kubernetes log parsers already should handle multi line log.
  • Increases log volume

K8s logs already have multi line logs that provide their own markers, thus I think we should leave it up user to provide
them themselves.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a misunderstanding. The markers are not delimiters for messages. They are delimiters for values.

With the current InfoS format, the quotation marks provides those delimiters. But they result in output that is basically unreadable for a human, which defeats the purpose of the text output.

Copy link
Author

@pohly pohly Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increases log volume

That's a valid point. We could make them shorter. For example:

I1112 15:01:48.696677  413871 structured_logging.go:52] using InfoS with
 the message across multiple lines: int=1 stringData>>>
 long: Multiple
 lines
 with quite a bit
 of text.
 <<< str="another value"

That's four bytes (>>> plus newline) instead of two (=") and three bytes (<<<) instead of one (") at the end.

Line breaks are the same with new line plus space instead of \n.

If the text contains many quotation marks, this format might actually be shorter because those don't need to be escaped.

}

func writeString(b *bytes.Buffer, s string) {
for i, line := range strings.Split(s, "\n") {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does strings.Split here needlessly copies the string into slices?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll add benchmarking and make the code more efficient before merging. Right now I am only looking for feedback on the format.

/hold

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#273 has detailed benchmark data.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2021
b.WriteString(fmt.Sprintf("%q", msg))
// The message is never quoted. New lines get indented.
writeString(b, msg)
// Enhance readability by inserting : between message and key/value

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with this change, we decided on quoting the string as this gives us predictable and somewhat parsable output (not everyone will imminently switch to JSON). Idea of quoting was related to message being treated as structured metadata. This is based on idea that JSON uses "msg" key to store message, so we though that message should be written as other string values with one difference of msg= being skipped.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with this change,

This PR or the line you commented on, the : separator? I can drop that separator, it's not important.

But quoting is a major problem. I get what was meant to be achieved with it, but IMHO the effect on readability was not considered. it now prevents conversion to structured logging because developers will complain or even go as far as filing bugs and reverting the conversion.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Marek, JSON should not become a dependency for log parsing, the raw log output should be parsable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current text format is not machine-readable. If that was the goal, then we have bugs in its implementation around handling of structs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as you are both against this proposal, how do you suggest we should deal with cases where developers want multi-line output in a readable format (kubernetes/kubernetes#104868)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should separate how we consider multi line for different part of logs. In my opinion multiline:

  • doesn't make sense for log messages. They should be a short description of what event is happening in system. Their format should not differ much or at all between different log formats. For example as I user that switched between log formats I want to be able to find log I have seen in old format in the new format just by grepping the message.
  • doesn't make sense for log keys. They should strive to uniquely identify types of metadata attached to log. Strive comes from idea that over time a schema will develop that will have a predictable way to find values under specific keys. Their names should follow a strict convention to avoid diverging.
  • can make sense for complex values when debugging. I assumed handling this case is needed Scheduler logged config prints without line breaks kubernetes#104868. There are multiple cases in k8s logs that multiline logs are used to improve readability. We should allow users to customize that some values (for example dumping whole k8s object yaml) are written in multiline format.

Copy link
Author

@pohly pohly Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can make sense for complex values when debugging.

Agreed.

We should allow users to customize that some values (for example dumping whole k8s object yaml) are written in multiline format.

And how to you propose to do that?

We need an API for it and a serialization format that log parsers then will understand.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion multiline doesn't make sense for log messages.

I can buy that argument. So let me propose a solution just for multi-line values in #273

I'm going to close this PR here in favor of that one.

/close

klog.go Outdated
// pairs, but only when the message does not already end with
// punctation.
if len(msg) > 0 && !unicode.IsPunct(rune(msg[len(msg)-1])) {
b.WriteString(":")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we think about using some other separator here? I've seen many logs use something like klog.InfoS("Warning: some text", key, value)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps |, with a space before and after it?

Structured logging with InfoS is formatted differently than traditional output,
so it's worthwhile to have an example that demostrates that.

The new example didn't compile initially because an older version of klog was
used. This gets avoids by always using the source from the current directory.
The initial structured logging output (almost) always produced a single line
per log message, with quoting of strings to represent line breaks as \n.  This
made the output hard to read (see
kubernetes/kubernetes#104868).

It was still possible to get line breaks when formatting a value with `%+v` and
that ended up emitting line breaks; this was probably not intended.

Now the message is never quoted, as in non-structured output. String values are
quoted if they contain no line break. If they do, start/end markers delimit the
text which appears on its own lines.

All additional lines of a structure log message get indented by one space. This
makes it obvious where a new log message starts, which is an improvement
compared to the traditional format.

Traditional output:

 I1112 14:06:35.783354  328441 structured_logging.go:42] someData printed using InfoF: {hello world 0}
 I1112 14:06:35.783472  328441 structured_logging.go:43] longData printed using InfoF: {long Multiple
 lines
 with quite a bit
 of text. 0}
 I1112 14:06:35.783483  328441 structured_logging.go:44] stringData printed using InfoF,
 with the message across multiple lines:
 long: Multiple
 lines
 with quite a bit
 of text.
 I1112 14:06:35.898176  142908 structured_logging.go:54] logData printed using InfoF:
 {log output from some program I0000 12:00:00.000000  123456 main.go:42] Starting
 E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
  0}

Old InfoS output before this commit:

 I1112 14:06:35.783512  328441 structured_logging.go:50] "using InfoS" someData={Name:hello Data:world internal:0}
 I1112 14:06:35.783529  328441 structured_logging.go:51] "using InfoS" longData={Name:long Data:Multiple
 lines
 with quite a bit
 of text. internal:0}
 I1112 14:06:35.783549  328441 structured_logging.go:52] "using InfoS with\nthe message across multiple lines" int=1 stringData="long: Multiple\nlines\nwith quite a bit\nof text." str="another value"
 I1112 14:06:35.783565  328441 structured_logging.go:61] "Did something" item="foobar"
 I1112 14:06:35.783576  328441 structured_logging.go:63] "This is a full sentence." item="foobar"
 I1112 14:06:35.898278  142908 structured_logging.go:65] "using InfoS" logData={Name:log output from some program Data:I0000 12:00:00.000000  123456 main.go:42] Starting
 E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
  internal:0}

New InfoS output:

 I1112 15:01:48.696617  413871 structured_logging.go:50] using InfoS: someData={Name:hello Data:world internal:0}
 I1112 15:01:48.696646  413871 structured_logging.go:51] using InfoS: ===start of longData===
  {Name:long Data:Multiple
  lines
  with quite a bit
  of text. internal:0}
  ===end of longData===
 I1112 15:01:48.696677  413871 structured_logging.go:52] using InfoS with
  the message across multiple lines: int=1 ===start of stringData===
  long: Multiple
  lines
  with quite a bit
  of text.
  ===end of stringData=== str="another value"
 I1112 15:01:48.788207  143146 structured_logging.go:65] using InfoS: ===start of logData===
  {Name:log output from some program Data:I0000 12:00:00.000000  123456 main.go:42] Starting
  E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
   internal:0}
  ===end of logData===
 I1112 15:01:48.696704  413871 structured_logging.go:61] Did something: item="foobar"
 I1112 15:01:48.696719  413871 structured_logging.go:63] This is a full sentence. item="foobar"
Multi-line strings are now shown as
   fieldName>>>
   <<<
and "| " is used as separator between message and values.

This is shorter:

I1119 13:18:00.210970   78836 structured_logging.go:58] using InfoS | someData={Name:hello Data:world internal:0}
I1119 13:18:00.210988   78836 structured_logging.go:59] using InfoS | longData>>>
 {Name:long Data:Multiple
 lines
 with quite a bit
 of text. internal:0}
 <<<
I1119 13:18:00.211009   78836 structured_logging.go:60] using InfoS with
 the message across multiple lines | int=1 stringData>>>
 long: Multiple
 lines
 with quite a bit
 of text.
 <<< str="another value"
I1119 13:18:00.211032   78836 structured_logging.go:65] using InfoS | logData>>>
 {Name:log output from some program Data:I0000 12:00:00.000000  123456 main.go:42] Starting
 E0000 12:00:01.000000  123456 main.go:43] Failed for some reason
  internal:0}
 <<<
I1119 13:18:00.211055   78836 structured_logging.go:66] using InfoS | boolean=true int=1 float=0.1
I1119 13:18:00.211068   78836 structured_logging.go:71] Did something | item="foobar"
I1119 13:18:00.211079   78836 structured_logging.go:73] This is a full sentence. item="foobar"
@pohly
Copy link
Author

pohly commented Nov 22, 2021

When comparing this change with kubernetes/kubernetes#106594 (comment) shows that the new format is slightly smaller: 61450443 bytes vs. 60583121 bytes for a kubelet log with 268551 messages and 310746 key/value pairs in those messages.

@k8s-ci-robot
Copy link

@pohly: Closed this PR.

In response to this:

In my opinion multiline doesn't make sense for log messages.

I can buy that argument. So let me propose a solution just for multi-line values in #273

I'm going to close this PR here in favor of that one.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

structured logging: better support for multi-line strings
4 participants