log driver should support multiline #22920

Closed
wjimenez5271 opened this Issue May 23, 2016 · 14 comments

wjimenez5271 commented May 23, 2016

A significant use case for Docker logs with Java applications is the stack trace, which is by nature multiline. Right now each line becomes a new event, and while these can theoretically be pieced back together by later stream processing, it is a much more complex problem to solve there than at the source of the serialized stdout or stderr output (which is where the Docker log driver sits).
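For illustration, a two-line stack trace written to stdout ends up as two separate records in the json-file driver's log, roughly like this (the trace and timestamps are made up):

```
{"log":"java.lang.NullPointerException\n","stream":"stdout","time":"2016-05-23T20:00:00.1Z"}
{"log":"\tat com.example.Foo.bar(Foo.java:42)\n","stream":"stdout","time":"2016-05-23T20:00:00.2Z"}
```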

cpuguy83 (Contributor) commented May 23, 2016

How would you propose we handle this case?
Seems like it's either some multi-line parsing, or single-line only but not both.

phemmer (Contributor) commented May 23, 2016

I think this should be solved within the application doing the logging, not within Docker.
It's near impossible to properly detect when multiple lines should be merged together into a single message. There's always going to be some edge case. For stuff like this it's almost always better to have the application write directly to the logging facility (syslog, logstash, splunk, whatever). Then there's no re-assembly required, as it was never split up in the first place.

Also I think stuff like this is asking Docker to do too much. If you want advanced log collection, it would be better to use a tool designed to do just that. There's a reason tools like splunk have dozens of config params regarding their log ingestion :-)
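For example, writing straight to syslog from a Go service only takes the standard library. A minimal sketch (the collector address, priority, and tag are placeholders, adjust to your setup):

```go
package main

import (
	"log"
	"log/syslog"
)

func main() {
	// Connect directly to the syslog collector, bypassing stdout/stderr entirely,
	// so the message is never split per line by the container runtime.
	w, err := syslog.Dial("udp", "logs.example.com:514", syslog.LOG_ERR|syslog.LOG_DAEMON, "myapp")
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	// A multiline payload is handed over as a single message.
	w.Err("request failed:\njava.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)")
}
```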

vdemeester (Member) commented May 24, 2016

I completely agree with @phemmer; I feel it's not Docker's responsibility to handle single vs. multiple lines in logs.

thaJeztah (Member) commented May 24, 2016

Ok, I'll go ahead and close this; I agree that it adds too much complication, so it's not something we should do.

Thanks for suggesting it though, @wjimenez5271, I hope you understand.

thaJeztah closed this May 24, 2016

wjimenez5271 commented May 24, 2016

I don't think it's that difficult to accomplish, but perhaps I am missing some piece of the equation. Multiline would require a configurable regex that describes the beginning of a log line for a given container, and then the driver would need to concatenate lines that don't match that regex onto the previous line until it sees the regex match again. It would be important to have some limit on this string buffer to protect against memory usage.
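Something along these lines, as a rough, untested sketch (the start-of-message pattern and the buffer limit are made up for illustration):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

func main() {
	// A line matching this pattern starts a new log message; anything else is
	// treated as a continuation of the previous one.
	startOfMessage := regexp.MustCompile(`^\d{4}-\d{2}-\d{2}`)
	const maxBuffer = 16 * 1024 // cap the per-message buffer to bound memory usage

	var buf strings.Builder
	flush := func() {
		if buf.Len() > 0 {
			fmt.Println(buf.String()) // emit one aggregated event
			buf.Reset()
		}
	}

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		// Start a new message on a pattern match, or when the buffer limit is hit.
		if startOfMessage.MatchString(line) || buf.Len()+len(line) > maxBuffer {
			flush()
		}
		if buf.Len() > 0 {
			buf.WriteByte('\n')
		}
		buf.WriteString(line)
	}
	flush()
}
```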

Agreed it can be done downstream by other tools, but keep in mind it's much more complex to do this work once you have multiplexed streams.

I can also understand the responsibility argument, but keep in mind the message you're sending to customers. To this end, my team recently suggested that maybe Docker's logging solution wasn't the right choice for the problem, and that instead we should build log shippers into our container runtime to send to the collection tool over the network. This means we now have to manage log tooling that works with a variety of application stacks, and also manage configuration of where to make network connections from inside the container, versus the elegance of a well-defined serial channel that almost every language knows how to log to (namely stdout and stderr). So by keeping Docker's logging features decidedly limited, you've weakened the promise that Docker makes the complexity of running many different applications at scale easier to manage.

Just some food for thought, I respect you have your own set of priorities to manage :)

thaJeztah (Member) commented May 24, 2016

Our goal is to add support for pluggable logging drivers, and having this option as an opt-in (through a logging driver) is a viable option (see #18604). Adding too many features to logging is often non-trivial, as processing logs can become a real bottleneck in high-volume logging setups - we want to avoid such bottlenecks.

andreysaksonov commented Aug 2, 2016

why not just --log-delimiter="<%regex%>" ?

spanktar commented Dec 1, 2016

Closed. Don't care.

michaelkrog commented Mar 31, 2017

It would REALLY be nice if the container could collect multiline log output as one event.

Without it, the central logging system has to put a lot of effort into figuring out which events belong together. And when it comes to the ELK stack, it is currently not supported at all, as the GELF input cannot use the multiline codec (logstash-plugins/logstash-input-gelf#37).

replicant0wnz commented Apr 28, 2017

Using the Docker GELF driver is pretty much useless if you have multi-line logs.

https://community.graylog.org/t/consuming-multiline-docker-logs-via-gelf-udp/

michaelkrog commented Apr 29, 2017

@replicant0wnz

Agreed. We decided to make all our services log in JSON format, and it works like a charm.
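The idea, as a minimal Go sketch (our services aren't Go, but the principle is the same; the field names are just illustrative): each service emits one JSON object per event, so a stack trace stays inside a single stdout line with the newlines escaped.

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// logEvent is one log record; the field names here are illustrative only.
type logEvent struct {
	Time    time.Time `json:"time"`
	Level   string    `json:"level"`
	Message string    `json:"message"`
	Stack   string    `json:"stack,omitempty"`
}

func main() {
	enc := json.NewEncoder(os.Stdout) // writes exactly one line per Encode call
	enc.Encode(logEvent{
		Time:    time.Now().UTC(),
		Level:   "error",
		Message: "request failed",
		// Embedded newlines are escaped to \n inside the JSON string,
		// so the whole trace survives as a single log line.
		Stack: "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)",
	})
}
```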

niall-byrne commented Sep 1, 2017

This worked for me, so I'm leaving a comment here for anyone else trying to make their splunk logs readable:

If you have control of the log content before it gets to stdout:
try replacing your '\n' line endings with '\r'

Docker stops breaking the lines up, and the output is parsable.
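For example (just a sketch in Go; the stack trace string is made up), rewriting the interior newlines before printing:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	stack := "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)"
	// Carriage returns don't trigger a new log event, so the trace reaches
	// the log driver as a single line.
	fmt.Println(strings.ReplaceAll(stack, "\n", "\r"))
}
```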

iwilltry42 commented May 7, 2018

We're currently using Logspout in production to collect logs from all the containers running in our cluster.
With this adapter (still an open PR), multiline logs are aggregated easily: gliderlabs/logspout#370
We merged it into our copy of Logspout, and it has been working fine for a couple of weeks now.
We then use the Logstash adapter to feed it to ELK.

portante commented Jun 25, 2018

Until we are able to annotate every byte written to a pipe with the pid and thread id it came from, isn't it the case that handling multiple log lines via a Docker log driver will always be error prone, only able to solve the "single process with a single thread" container case?

In the meantime, it might be worth considering the addition of a log driver which just performs a raw byte-capture of stdout & stderr, removing the byte-level interpretation altogether. We could then defer the interpretation of the byte stream to the consumer, allowing the writer to operate with as little overhead as possible (avoiding bad containers that write only newlines, or small numbers of bytes between newlines). The writer would only have to record each payload read from a pipe with the timestamp, the pipe it came from, and the bytes returned by the read() system call. The reader of the byte stream would then be tasked with reassembling the stream according to whatever interpretation they see fit to use.

If a reader wants to do multi-line reassembly, that reader would have all the information available to know what byte came from which pipe, but will still have the multi-process, multi-thread issues.

See: kubernetes-sigs/cri-o#1605
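To make the shape of such a record concrete, a sketch (the struct, field names, and JSON framing are my own invention, not an existing format):

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// rawChunk stores one read() from the container's stdout or stderr pipe,
// verbatim, with no splitting on newlines; reassembly is left to the reader.
type rawChunk struct {
	Time   time.Time `json:"time"`   // when the read() returned
	Stream string    `json:"stream"` // "stdout" or "stderr"
	Bytes  []byte    `json:"bytes"`  // exact payload of the read() (base64 in JSON)
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	enc.Encode(rawChunk{
		Time:   time.Now().UTC(),
		Stream: "stdout",
		Bytes:  []byte("a partial line with no trailing newline yet"),
	})
}
```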
