Skip to content

Add logtag field to crio logs#876

Closed
chlunde wants to merge 1 commit intofluent:masterfrom
chlunde:crio-logtag
Closed

Add logtag field to crio logs#876
chlunde wants to merge 1 commit intofluent:masterfrom
chlunde:crio-logtag

Conversation

@chlunde
Copy link
Copy Markdown
Contributor

@chlunde chlunde commented Oct 31, 2018

Since 1.9.0 in november 2017 cri-o has written a log tag field before the
log message:

https://github.com/kubernetes-sigs/cri-o/blob/f58419d6cf462070a0c3727ad2dc554ef151e832/conmon/conmon.c#L499-L509

If a single character is detected, consider this the log tag for the line.
This is a part of the multiline handling for cri-o logs.

Signed-off-by: Carl Henrik Lunde chlunde@ifi.uio.no

Since 1.9.0 in november 2017 cri-o has written a log tag field before the
log message:

https://github.com/kubernetes-sigs/cri-o/blob/f58419d6cf462070a0c3727ad2dc554ef151e832/conmon/conmon.c#L499-L509

If a single character is detected, consider this the log tag for the line.
This is a part of the multiline handling for cri-o logs.

Signed-off-by: Carl Henrik Lunde <chlunde@ifi.uio.no>
@chlunde
Copy link
Copy Markdown
Contributor Author

chlunde commented Oct 31, 2018

@StevenACoffman I see you mention this in your original PR. Do you remember why you did not move the tag to a dedicated field?

@StevenACoffman
Copy link
Copy Markdown
Contributor

StevenACoffman commented Nov 1, 2018

In #667 I modeled it after the kubernetes/kubernetes#54777 fluentd change that was done in October 2017.

It appears that the original example logs looked like this:

2016-02-17T00:04:05.931087621Z stdout [info:2016-02-16T16:04:05.930-08:00] Some log text here

Since that time, they have been revised (note the F tag difference):

2016-02-17T00:04:05.931087621Z stdout F [info:2016-02-16T16:04:05.930-08:00] Some log text here

I am looking at this file: fluentd-es-configmap.yaml

If you look at the regex they use here Line 128 it is currently:

/^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/

Where I notice that yours is:

/^(?<time>.+)\b(?<stream>stdout|stderr)( (?<logtag>[A-Z]))?\b(?<log>.*)$/

@StevenACoffman
Copy link
Copy Markdown
Contributor

The current crio parser uses this regex:

/^(?<time>.+)\b(?<stream>stdout|stderr)\b(?<log>.*)$/

@chlunde You have proposed in #876 this regex proposal:

/^(?<time>.+)\b(?<stream>stdout|stderr)( (?<logtag>[A-Z]))?\b(?<log>.*)$/

@kskewes has this proposed this regex in #873 :

^(?<time>.+) (?<stream>stdout|stderr) (?<partialflag>F|P) (?<message>.*)$

If you look at the regex fluentd uses here Line 128 it is currently:

/^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/

Can you speak to which would best meet your understanding, use case, and experience?

@StevenACoffman
Copy link
Copy Markdown
Contributor

StevenACoffman commented Nov 1, 2018

The CRI reached a conclusion in 1.9 to continue using the text format, with a new TAG field to address the existing issues, and provide some level of extensibility.

Per kubernetes/kubernetes#44976 (comment)

The updated log format:
kubernetes/pkg/kubelet/apis/cri/v1alpha1/runtime/constants.go

Lines 39 to 55 in 3a1c944

// LogTag is the tag of a log line in CRI container log.
// Currently defined log tags:
// * First tag: Partial/End - P/E.
// The field in the container log format can be extended to include multiple
// tags by using a delimiter, but changes should be rare. If it becomes clear
// that better extensibility is desired, a more extensible format (e.g., json)
// should be adopted as a replacement and/or addition.
type LogTag string

const (
// LogTagPartial means the line is part of multiple lines.
LogTagPartial LogTag = "P"
// LogTagFull means the line is a single full line or the end of multiple lines.
LogTagFull LogTag = "F"
// LogTagDelimiter is the delimiter for different log tags.
LogTagDelimiter = ":"
)

New code location is https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/constants.go#L39-L55

@StevenACoffman
Copy link
Copy Markdown
Contributor

I am not sure whether log messages can also look like this:

2016-10-06T00:17:09.669794202Z stdout P:TAG1:TAG2 log content 1
2016-10-06T00:17:09.669794203Z stdout P:TAG1:TAG2 log content 2

I have asked for clarification in the old dead issue referenced above.

@StevenACoffman
Copy link
Copy Markdown
Contributor

StevenACoffman commented Nov 2, 2018

@chlunde How about:

^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<message>.*)$

Or

^(?<time>.+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
  • Most parsers now use message instead of log.
  • If we are not using the partialFlag and tags to stitch together multilines, then probably discard it with [^ ]* from fluentd
  • \b is incorrect as it finds commas and other non-word character boundaries
  • This is more consistent with fluentd elasticsearch regex.

@chlunde
Copy link
Copy Markdown
Contributor Author

chlunde commented Nov 2, 2018

Thanks for taking this upstream @StevenACoffman!

Changing log to message would be a breaking change, but since it's not yet 1.0 I guess it's a good time to do it now.

I think we should keep logtag so we can add crio-support to log line merging in the new filter_join_partial module.

Also, I realize we have an issue with greedy regex and malicious users injecting data which could confuse the parsing, so we should use [^ ] for time too: http://rubular.com/r/PPTskxP9m0

So maybe ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$? http://rubular.com/r/tjUt3Awgg4

I assume the log tag can never contain spaces.

StevenACoffman added a commit to StevenACoffman/fluent-bit that referenced this pull request Nov 2, 2018
StevenACoffman added a commit to StevenACoffman/fluent-bit that referenced this pull request Nov 2, 2018
See fluent#876 and fluent#873

Signed-off-by: Steve Coffman <steve.coffman@ithaka.org>
@StevenACoffman
Copy link
Copy Markdown
Contributor

StevenACoffman commented Nov 2, 2018

See #873 (comment)

I made a new PR #881 that unified this discussion and the feedback from @kskewes
Thank you so much! There's still more work to support multi-line with the new filter_join_partial module. Please see #852 for the json logger work.

A new filter for appending log events. Solves issue #821
Currently only implemented for Docker json-log format, but can support CRI-O logs with little work.
This plugin is supposed to be used early in the filter-chain, before kubernetes-filter, e.g. right after tail-input.

@chlunde chlunde closed this Nov 2, 2018
edsiper pushed a commit that referenced this pull request Nov 6, 2018
* Support CRI-O and containerd

See #876 and #873

Signed-off-by: Steve Coffman <steve.coffman@ithaka.org>

* Fix based on @kskewes from #881

Signed-off-by: Steve Coffman <steve.coffman@ithaka.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants