log driver should support multiline #22920

Closed
wjimenez5271 opened this issue May 23, 2016 · 14 comments

@wjimenez5271 commented May 23, 2016

A significant use case of docker logs with Java applications is the stack trace, which is by nature multiline. Right now each line becomes a new event, and while this can theoretically be pieced together by later stream processing, it is a much more complex problem to solve there than at the source of the serialized output of stdout or stderr (which is where the docker log driver is located).
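For illustration (the trace itself is made up; the point is the per-line framing the json-file driver produces), a single exception currently ends up as one event per line:

```
{"log":"java.lang.NullPointerException\n","stream":"stderr","time":"2016-05-23T17:00:00.000000001Z"}
{"log":"\tat com.example.Foo.bar(Foo.java:42)\n","stream":"stderr","time":"2016-05-23T17:00:00.000000002Z"}
{"log":"\tat com.example.Foo.main(Foo.java:10)\n","stream":"stderr","time":"2016-05-23T17:00:00.000000003Z"}
```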

@cpuguy83 (Contributor) commented May 23, 2016

How would you propose we handle this case?
Seems like it's either some multi-line parsing, or single-line only but not both.

@phemmer (Contributor) commented May 23, 2016

I think this should be solved within the application doing the logging, not within docker.
It's near impossible to properly detect when multiple lines should be merged together into a single message. There's always going to be some edge case. For stuff like this, it's almost always better to have the application write directly to the logging facility (syslog, logstash, splunk, whatever). Then there's no re-assembly required, as it was never split up in the first place.

Also I think stuff like this is asking Docker to do too much. If you want advanced log collection, it would be better to use a tool designed to do just that. There's a reason tools like splunk have dozens of config params regarding their log ingestion :-)

@vdemeester (Member) commented May 24, 2016

I completely agree with @phemmer; I feel it's not Docker's responsibility to handle single vs. multiple lines in logs.

@thaJeztah (Member) commented May 24, 2016

Ok, I'll go ahead and close this; I agree that it adds too much complication, so it's not something we should do.

Thanks for suggesting though @wjimenez5271, I hope you understand

@thaJeztah closed this May 24, 2016

@wjimenez5271 (Author) commented May 24, 2016

I don't think it's that difficult to accomplish, but perhaps I am missing some piece of the equation. Multiline support would require a configurable regex that describes the beginning of a log line for a given container; the driver would then concatenate lines that don't match that regex onto the previous line until it sees a match again. It would be important to put a limit on this string buffer to bound memory usage.
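A rough sketch of that idea (not Docker's actual log-driver API; the start-of-record regex, the `maxBuffer` cap, and reading from stdin are all illustrative assumptions):

```go
package main

import (
	"bufio"
	"os"
	"regexp"
	"strings"
)

// startOfRecord is a hypothetical, user-supplied pattern marking the first
// line of a log record; here, lines beginning with an ISO-style date.
var startOfRecord = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}`)

// maxBuffer caps the assembled record to bound memory use, as suggested above.
const maxBuffer = 16 * 1024

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	var record strings.Builder

	flush := func() {
		if record.Len() > 0 {
			os.Stdout.WriteString(record.String() + "\n") // emit one merged event
			record.Reset()
		}
	}

	for scanner.Scan() {
		line := scanner.Text()
		// A line matching the pattern starts a new record; anything else
		// (e.g. a stack-trace continuation) is appended to the current one.
		if startOfRecord.MatchString(line) || record.Len()+len(line) > maxBuffer {
			flush()
		}
		if record.Len() > 0 {
			record.WriteString("\\n") // keep the merged event on a single line
		}
		record.WriteString(line)
	}
	flush()
}
```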

Agreed it can be done downstream by other tools, but keep in mind it's much more complex to do this work once you have multiplexed streams.

I can also understand the responsibility argument, but keep in mind the message you're sending to customers. To this end, my team recently suggested that maybe Docker's logging solution wasn't the right choice for the problem, and that we should instead build log shippers into our container runtime to send to the collection tool over the network. That means we now have to manage log tooling that works with a variety of application stacks, and manage the configuration of where to make network connections from inside the container, versus the elegance of a well-defined serial channel that almost every language knows how to log to (namely stdout and stderr). So by making the logging features of Docker decidedly limited, you've weakened the promise that Docker makes the complexities of many different applications at scale easier.

Just some food for thought, I respect you have your own set of priorities to manage :)

@thaJeztah (Member) commented May 24, 2016

Our goal is to add support for pluggable logging drivers, and offering this as an opt-in (through such a logging driver) is a viable option (see #18604). Adding too many features to logging is often non-trivial, as processing logs can become a real bottleneck in high-volume logging setups - we want to avoid such bottlenecks.

@andreysaksonov commented Aug 2, 2016

why not just --log-delimiter="<%regex%>" ?

@spanktar commented Dec 1, 2016

Closed. Don't care.

@michaelkrog commented Mar 31, 2017

It would REALLY be nice if the container could collect multiline log output as one event.

Without it, the central logging system has to put a lot of effort into figuring out which events belong together. And when it comes to the ELK stack, it is currently not supported at all, as the GELF input cannot use the multiline codec (logstash-plugins/logstash-input-gelf#37).

@replicant0wnz commented Apr 28, 2017

Using the Docker GELF driver is pretty much useless if you have multi-line logs.

https://community.graylog.org/t/consuming-multiline-docker-logs-via-gelf-udp/

@michaelkrog commented Apr 29, 2017

@replicant0wnz

Agreed. We decided to make all our services log in JSON format and it works like a charm.
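A minimal illustration of why that works (hypothetical event shape, not our actual services): once an event is serialized as a single JSON object, the newlines inside a stack trace are escaped, so the whole event reaches the log driver as one line.

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// logEvent is an illustrative event shape; the newline-containing stack trace
// is escaped by the JSON encoder, so each event occupies exactly one line.
type logEvent struct {
	Time    time.Time `json:"time"`
	Level   string    `json:"level"`
	Message string    `json:"message"`
	Stack   string    `json:"stack,omitempty"`
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	_ = enc.Encode(logEvent{
		Time:    time.Now(),
		Level:   "ERROR",
		Message: "request failed",
		Stack:   "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)",
	})
}
```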

@niall-byrne commented Sep 1, 2017

This worked for me, so I'm leaving a comment here for anyone else trying to make their Splunk logs readable:

If you have control of the log content before it gets to stdout:
try replacing your '\n' line endings with '\r'

Docker stops breaking the lines up, and the output is parsable.
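A minimal sketch of that workaround, assuming you can wrap the point where your application writes each record (the function name here is illustrative):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// writeRecord swaps '\n' for '\r' inside one multi-line record before it
// reaches stdout, so the log driver sees the record as a single line.
func writeRecord(record string) {
	fmt.Fprintln(os.Stdout, strings.ReplaceAll(record, "\n", "\r"))
}

func main() {
	writeRecord("java.lang.RuntimeException: boom\n\tat com.example.App.main(App.java:10)")
}
```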

@iwilltry42 commented May 7, 2018

We're currently using Logspout in production to collect logs from all the containers running in our cluster.
With this adapter (still an open PR), multiline logs are aggregated easily: gliderlabs/logspout#370
We merged it into our copy of Logspout and it's been working perfectly fine for a couple of weeks now.
We then use the Logstash adapter to feed it to ELK.

@portante commented Jun 25, 2018

Until we are able to annotate every byte written to a pipe with the pid and thread id it came from, isn't it the case that handling multiple log lines via a docker log driver will always be error prone, only able to solve the "single process with a single thread" container case?

In the meantime, it might be worth considering the addition of a log driver which just performs a raw byte-capture of stdout & stderr, removing the byte-level interpretation altogether. We could then defer the interpretation of the byte stream to the consumer, allowing the writer to operate with as little overhead as possible (avoiding bad containers that write only newlines, or small numbers of bytes between newlines). The writer would only have to record each payload read from a pipe with the timestamp, the pipe it came from, and the bytes returned by the read() system call. The reader of the byte stream would then be tasked with reassembling the stream according to whatever interpretation they see fit to use.
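A rough sketch of what such a raw-capture record and read loop could look like (hypothetical types, not an existing driver):

```go
// Package rawcapture sketches the idea above: record each read() from a
// container pipe verbatim, tagged only with a timestamp and the pipe it came
// from, and defer all interpretation (line splitting, merging) to the reader.
package rawcapture

import (
	"io"
	"time"
)

// Entry is one captured read: no splitting on newlines, no reassembly.
type Entry struct {
	Time   time.Time // when the read returned
	Stream string    // "stdout" or "stderr"
	Data   []byte    // exact bytes returned by the read, possibly mid-line
}

// Capture copies reads from one pipe into entries until EOF.
func Capture(stream string, pipe io.Reader, out chan<- Entry) error {
	buf := make([]byte, 32*1024)
	for {
		n, err := pipe.Read(buf)
		if n > 0 {
			data := make([]byte, n)
			copy(data, buf[:n])
			out <- Entry{Time: time.Now(), Stream: stream, Data: data}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```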

If a reader wants to do multi-line reassembly, that reader would have all the information available to know what byte came from which pipe, but will still have the multi-process, multi-thread issues.

See: cri-o/cri-o#1605
