
log driver should support multiline #22920

Closed
wjimenez5271 opened this issue May 23, 2016 · 14 comments
Labels
area/logging  kind/feature

Comments

@wjimenez5271

A significant use case of docker logs with Java applications is the stack trace, which is by nature multiline. Right now each line becomes a new event; while these can theoretically be pieced back together by later stream processing, that is a much more complex problem to solve downstream than at the source of the serialized stdout or stderr output (which is where the docker log driver sits).

@cpuguy83
Member

How would you propose we handle this case?
It seems like it would have to be either multi-line parsing or single-line only, but not both.

@thaJeztah added the area/logging and kind/feature labels May 23, 2016
@phemmer
Contributor

phemmer commented May 23, 2016

I think this should be solved within the application doing the logging, not within docker.
It's near impossible to properly detect when multiple lines should be merged into a single message; there's always going to be some edge case. For cases like this it is almost always better to have the application write directly to the logging facility (syslog, logstash, splunk, whatever). Then there's no re-assembly required, as the message was never split up in the first place.
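For illustration, a minimal sketch of that "write directly to the facility" approach, using Go's standard log/syslog package (the "myapp" tag and the Java-style trace are made-up examples; how the daemon renders embedded newlines depends on its own configuration):

```go
package main

import (
	"log"
	"log/syslog"
)

func main() {
	// Write straight to the local syslog daemon instead of stdout; each
	// call below is delivered as one syslog message, so there is nothing
	// for a downstream collector to reassemble.
	w, err := syslog.New(syslog.LOG_ERR|syslog.LOG_DAEMON, "myapp")
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	trace := "java.lang.NullPointerException\n" +
		"\tat com.example.Foo.bar(Foo.java:42)\n" +
		"\tat com.example.Main.main(Main.java:7)"
	w.Err("request failed:\n" + trace) // one message, not three events
}
```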

Also I think stuff like this is asking Docker to do too much. If you want advanced log collection, it would be better to use a tool designed to do just that. There's a reason tools like splunk have dozens of config params regarding their log ingestion :-)

@vdemeester
Member

I completely agree with @phemmer; I feel it's not Docker's responsibility to handle single- vs. multi-line events in logs.

@thaJeztah
Member

Ok, I'll go ahead and close this; I agree that it adds too much complication, so it's not something we should do.

Thanks for suggesting it though, @wjimenez5271; I hope you understand.

@wjimenez5271
Author

I don't think it's that difficult to accomplish, but perhaps I am missing some piece of the equation. Multiline support would require a configurable regex describing the beginning of a log line for a given container; the driver would then concatenate lines that don't match that regex onto the previous line until it sees the regex match again. It would be important to put some limit on this string buffer to protect against unbounded memory usage.
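To make that concrete, here's a rough sketch of the merging logic described above, written in Go (the timestamp pattern, the 16 KiB cap, and reading from stdin are placeholder assumptions, not anything Docker actually provides):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"regexp"
)

// maxBuffer caps the size of a merged event so a runaway trace
// cannot grow memory without bound.
const maxBuffer = 16 * 1024

func main() {
	// Hypothetical "start of a new log line" pattern, e.g. an ISO timestamp.
	start := regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}`)

	var buf bytes.Buffer
	flush := func() {
		if buf.Len() > 0 {
			fmt.Println("EVENT:", buf.String())
			buf.Reset()
		}
	}

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		// A line matching the start pattern begins a new event; anything
		// else (continuation line, stack frame) is appended to the current
		// event until the buffer limit is reached.
		if start.MatchString(line) || buf.Len()+len(line) > maxBuffer {
			flush()
		}
		if buf.Len() > 0 {
			buf.WriteString("\n")
		}
		buf.WriteString(line)
	}
	flush()
}
```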

Agreed it can be done downstream by other tools, but keep in mind it's much more complex to do this work once you have multiplexed streams.

I can also understand the responsibility argument, but keep in mind the message you're sending to customers. To that end, my team recently suggested that maybe Docker's logging solution wasn't the right choice for the problem, and that instead we should build log shippers into our container runtime to send logs to the collection tool over the network. That means we now have to manage log tooling that works with a variety of application stacks, and manage configuration of which network endpoints to connect to from inside the container, instead of the elegance of a well-defined serial channel that almost every language knows how to log to (namely stdout and stderr). So by making the logging features of Docker decidedly limited, you've weakened the promise that Docker makes the complexities of running many different applications at scale easier.

Just some food for thought, I respect you have your own set of priorities to manage :)

@thaJeztah
Member

Our goal is to add support for pluggable logging drivers, and offering this as an opt-in (through such a driver) is a viable option (see #18604). Adding too many features to logging itself is non-trivial, as processing logs can become a real bottleneck in high-volume logging setups, and we want to avoid such bottlenecks.

@andreysaksonov

andreysaksonov commented Aug 2, 2016

why not just --log-delimiter="<%regex%>" ?

@spanktar

spanktar commented Dec 1, 2016

Closed. Don't care.

@michaelkrog

It would REALLY be nice if the container could collect multiline log output as one event.

Without it, the central logging system has to put a lot of effort into figuring out which events belong together. And when it comes to the ELK stack, it is currently not supported at all, as the Gelf input cannot use the multiline codec (logstash-plugins/logstash-input-gelf#37).

@replicant0wnz

The Docker GELF driver is pretty much useless if you have multi-line logs.

https://community.graylog.org/t/consuming-multiline-docker-logs-via-gelf-udp/

@michaelkrog

@replicant0wnz

Agreed. We decided to make all our services log in JSON format and it works like a charm.
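For anyone wondering what that looks like in practice, here's a minimal sketch of single-line JSON logging using Go's standard log/slog JSON handler (the field names are just examples; the same idea applies to any language's structured-logging library). The whole stack trace rides inside one JSON line, so the log driver never splits it:

```go
package main

import (
	"errors"
	"log/slog"
	"os"
	"runtime/debug"
)

func main() {
	// The JSON handler emits exactly one line per log record on stdout,
	// so a stack trace carried as a field never spans multiple events.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	err := errors.New("connection refused")
	logger.Error("request failed",
		"error", err,
		// The whole trace travels inside a single JSON string value.
		"stack", string(debug.Stack()),
	)
}
```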

@niall-byrne

niall-byrne commented Sep 1, 2017

This worked for me, so I'm leaving a comment here for anyone else trying to make their splunk logs readable:

If you have control of the log content before it gets to stdout, try replacing your '\n' line endings with '\r'.

Docker stops breaking the lines up, and the output is parsable.
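A tiny illustration of that trick (the Java-style trace string is just an example):

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	trace := "java.lang.NullPointerException\n" +
		"\tat com.example.Foo.bar(Foo.java:42)\n" +
		"\tat com.example.Main.main(Main.java:7)"

	// Swap interior newlines for carriage returns; the log driver only
	// splits records on '\n', so the trailing Println newline is the only
	// event boundary and the whole trace lands in a single log entry.
	// (Viewers that honour '\r' may still render it across several lines.)
	fmt.Println(strings.ReplaceAll(trace, "\n", "\r"))
}
```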

@iwilltry42

We're currently using Logspout in production to collect logs from all the containers running in our cluster.
With this adapter (still an open PR), multiline logs are aggregated easily: gliderlabs/logspout#370
We merged it into our copy of Logspout and it has been working perfectly fine for a couple of weeks now.
We then use the Logstash adapter to feed the result into ELK.

@portante

portante commented Jun 25, 2018

Until we are able to annotate every byte written to a pipe with the pid and thread id from which it came, isn't it the case that handling multi-line log messages via a docker log driver will always be error prone, only able to solve the "single process with a single thread" container case?

In the meantime, it might be worth considering the addition of a log driver which just performs a raw byte-capture of stdout & stderr, removing the byte-level interpretation altogether. We could then defer the interpretation of the byte stream to the consumer, allowing the writer to operate with as little overhead as possible (avoiding bad containers that write only newlines, or small numbers of bytes between newlines). The writer would only have to record each payload read from a pipe with the timestamp, the pipe it came from, and the bytes returned by the read() system call. The reader of the byte stream would then be tasked with reassembling the stream according to whatever interpretation it sees fit to use.

If a reader wants to do multi-line reassembly, that reader would have all the information needed to know which byte came from which pipe, but will still face the multi-process, multi-thread issues.
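A rough sketch of what one such raw record could look like (the JSON encoding and field names here are purely illustrative assumptions; a real driver would presumably use a more compact framing):

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// record is one raw read() from a container pipe: when it was read,
// which pipe it came from, and the bytes exactly as read, with no
// newline interpretation applied by the writer.
type record struct {
	Time    time.Time `json:"time"`
	Stream  string    `json:"stream"`  // "stdout" or "stderr"
	Payload []byte    `json:"payload"` // encoded as base64 by encoding/json
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	// The writer just persists records; any multi-line (or multi-process)
	// reassembly is deferred to whatever consumer reads them back.
	_ = enc.Encode(record{
		Time:    time.Now().UTC(),
		Stream:  "stderr",
		Payload: []byte("java.lang.NullPointerException\n\tat Foo.bar(Foo.java:42)"),
	})
}
```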

See: cri-o/cri-o#1605
