Make the log lines splitting configurable #34855
-1. We should just make sure each log driver handles this properly. Also, why are log messages greater than 16K?
Another way to fix this issue is to implement reconstruction of split entries in the json-file logging driver (a rough sketch of what that could look like follows this comment). Addressing @portante's comment from the similar issue:

I don't argue that this change was needed; it's the way it was rolled out that troubles me. It wasn't made sure that people were prepared for this change and had time to adapt their tools, or even knew about the change. Additionally, there's a difference between an edge case, which is known to happen only when something is wrong, and the usual operation of the logging system. If applications running in the cluster are known to produce JSON lines no longer than 1MB, why increase the complexity of the collection system?

Yes, but hey, isn't that something any big system should do in case of a breaking change?

That sounds great in theory, but it's a leaky abstraction: the interface doesn't define it explicitly, and it breaks the end-to-end experience. If you had introduced this change in a non-breaking way and waited until tools had some time to support it, it would probably be fine, but in reality it breaks people here and there who don't expect their contract (JSON/XML per line/log entry) to be implicitly transformed in the middle of the pipeline.
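For illustration, here is a rough sketch of what such reconstruction could look like on the reading side. It assumes the json-file entry format `{"log":...,"stream":...,"time":...}` and the convention that a chunk split at the 16K limit lacks a trailing newline in its log field; it is not Docker's actual implementation.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// jsonLogEntry mirrors the json-file format: {"log":"...","stream":"...","time":"..."}.
type jsonLogEntry struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long file lines

	var buf strings.Builder
	for scanner.Scan() {
		var e jsonLogEntry
		if err := json.Unmarshal(scanner.Bytes(), &e); err != nil {
			fmt.Fprintln(os.Stderr, "skipping malformed line:", err)
			continue
		}
		buf.WriteString(e.Log)
		// A chunk whose log field does not end in '\n' was split by the 16K
		// limit; keep buffering until the terminating chunk arrives.
		if !strings.HasSuffix(e.Log, "\n") {
			continue
		}
		fmt.Printf("[%s %s] %s", e.Time, e.Stream, buf.String())
		buf.Reset()
	}
}
```

Fed a container's `*-json.log` file on stdin, this prints one line per original log entry instead of one per 16K chunk.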
@cpuguy83 Note that I don't mind the change itself, it's super useful! I mind the way it was released.

Does the paragraph in the earlier message about edge cases vs. usual operation make sense?

The problem is that not only the log driver has to know how to handle this, but the log collector as well.

For example, large JSON objects which include a big context.
By the way, do you realize that this is not solving the original issue, but passing the responsibility to the entity that will be reconstructing these log lines? That entity also has to implement exactly the same splitting functionality for edge cases, but it has to be configurable or use a different hardcoded constant for a concrete setup.
I've read the original issue and I believe that some arguments are missing from the whole discussion. AFAIU everyone agrees that a limit is needed and that the backend at some point will have to reconstruct messages. This goes without saying. What is missing, however, is that for many logging solutions there might be additional limits in between that are not compatible with the arbitrary 16k used in Docker. Imagine a situation with a logging service that uses agents to collect logs and send them to a backend. Such a backend will obviously also have its own limits for request size etc. Let's consider two scenarios:

@portante @cpuguy83 I know you had a strong opinion on this topic. Can you please share your thoughts in the context of the example I described above?
OK, I can see the value in making this configurable to make sure existing solutions continue to work (although containers will have to be re-created with the updated max line size). As a side note:
@cpuguy83 would you accept a PR from us adding such a configuration option?
@fgrzadkowski Yes.
This is the same as #34887, though it results in unintended splitting instead of rejoining, because nothing is currently doing rejoining.
This is the approach taken in #34889, but I think this just adds additional complication and duplication of effort.
I think it makes sense to make it user-configurable when a log driver can send to multiple different implementations of log collectors (like the syslog driver), but makes more sense to have it be driver-configured when there's a known single implementation of a log collector (awslogs, gcplogs, etc).
I think to some extent that's beside the point; Docker is the entity treating stdout/stderr as "logs", but people are containerizing applications that have existed long before Docker and may additionally have different opinions as to what a log should be. I know that AWS customers using CloudWatch Logs frequently configure the CloudWatch Logs Agent (separate from Docker) to combine things like a Java stack trace into a single log entry, and support for combining lines was added to Docker for the same reason.
@cpuguy83 Friendly ping. Since you closed the PR, can we propagate this setting to the json-file logging driver?
@crassirostris No, I do not think json-file should have this option. The json format looks like it's missing the …

We can achieve what's needed here without such a hacky setting.
By introducing a hacky pipeline. I understand that you don't want to maintain code that's technically possible to replace with an external hack, but there's no clear migration path: e.g. no ready-to-use fluentd/logstash/etc. plugins.
Sorry, how is supporting a partial message in the storage format hacky?
That's not hacky, that's a valuable change we really anticipate :) Do you have someone working on this? Is there an issue? I can help with that.

What's hacky is introducing reconstruction and a second splitting on the agent when it's known that all log entries cannot be larger than X, where X != 16KB.
Fortunately for this case (but generally unfortunately, IMO), to the client logs are just a stream of bytes; the actual formatting is done in the daemon.
For a living being who is typing docker logs, yes, but for a logging agent (e.g. fluentd), reading all logs through Docker has drawbacks: performance issues, no way to memorize the last read position, the need to distinguish between stdout and stderr, and so on. Using the fluentd log driver is also not an option: in a Kubernetes setup it requires changing a lot of the logging mechanics (e.g. how metadata is passed around).

In many Kubernetes setups the logging agent currently reads directly from the json log file; that's why this seemingly unimportant implementation detail is actually important.
I see your point... this is very broken behavior for Kubernetes to rely on anything within docker's data root. Will have to think about this.
I can second this. I ran into a nasty issue today while collecting Docker container logs using the splunk logging driver. The application outputs log messages as JSON payloads to stdout. Everything is fine as long as the length of a message stays below the limit. Long messages are split, and the data that ends up in Splunk is garbage, as the parts are not valid JSON and the initial log message cannot be reconstructed.
It's part of the Docker ContainerInspect API. Additionally, as mentioned earlier, declaring it broken doesn't give a clear migration path for existing applications whose logs are collected by a logging agent like fluentd.
@crassirostris This only provides the location of the current log file; what if there is rotation involved? What about ongoing writes to the log file?

I'll have to look at how cri-containerd is managing container logs for Kubernetes. There are logging plugins available, and they are simpler to write than getting a change into Docker, which is a much harder/longer process for handling special cases.
Looks like it's not handling partials either: https://github.com/kubernetes-incubator/cri-containerd/blob/master/pkg/server/io/logger.go#L73
@crassirostris hah, I see you have an open issue on the cri-containerd repo :)
It's handled by fluentd out of the box
That might be an option together with #34888. Could you please point to more docs about this approach?
Yup, it's not used widely in production yet, so it's not a high priority, though it will certainly be at some point.
Well, a log scraper can't handle file rotation if it is not running. =) Isn't it true that currently, when a log file is rotated, unless one configures the entity reading from the container's stdout/stderr and writing to a log file to keep a number of additional versions of that log file around, the previous log file is deleted (unlinked) and a new log file is created? And that at the time it is unlinked, if a log scraper is not running, it won't have an open handle on that file, and so the data will be lost? It seems the hand-off to log scrapers is problematic in some cases if not configured properly.
Log exporting is hard, there are no right answers, and all (generic) solutions suck. We find log scrapers generally more reliable and better performing than the alternatives.

That depends on the setup. By default, the log file is not rotated, right? So in some setups known to me, logrotate is configured to truncate the log file instead of creating a new one. And where log rotation is implemented with move-and-create semantics, usually several rotations are kept, at least one.

Yes, and this is not the only way to lose logs. For example, if several rotations completed while the log scraper was down, some logs would also be irreversibly lost.

Yes, true.
Experiencing the same w/ filebeat consuming from Docker...
Same issue here with filebeat and no workaround |
@cpuguy83 we had finally found a decent and performant solution for siphoning logs from Docker, while maintaining the possibility to use docker logs. And then you folks BREAK it, and philosophize about how extending the new arbitrary limit of 16k is somehow hacky.
@towolf You can always write your own logging plugin to do what you need it to do. The reason we have the limit is to prevent containers from being able to trigger an OOM on the host. The daemon behavior hasn't changed in this regard for ~2 years.
@cpuguy83 we upgraded from Docker Engine 1.11 to 17.03. Logging worked splendidly before, and now developers tell me that their stack traces do not work as they used to. And it should be up to me how I configure this for my own hardware, no? Now we have a broken version and no fix in sight.

BTW, the systemd developers made a similar argument, and their super reasonable limit was 2048 characters. They eventually made it configurable and defaulted it to 48k, but we had to deal with 2k for two years! Now we have the same problem with Docker.
@crassirostris We've worked around this issue by utilizing this fluentd plugin: https://github.com/fluent-plugins-nursery/fluent-plugin-concat

@cpuguy83 We may be good now! This commit (merged into master very recently) looks like it may address some of the concerns here by adding some metadata for partial log lines: 0b4b0a7
Hi @JohnStarich, I have a log message (size: 59k) that is being truncated into four separate log entries. How did you configure fluentd to append the split logs? Once in a while an oversized log gets generated by the container. I'm not sure how to configure the partial message or newline detection. How does it detect that a log has been split as a partial message?

Here is my conf file.

Thanks in advance.
@dcvtruong use the fluentd log driver.
@dcvtruong They have some pretty good examples in their README. You'll want to place the concat plugin immediately after the tail plugin. (Also, you probably need to make sure your output plugin supports sending lines that long.)
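For reference, a minimal sketch of such a concat filter, based on the options documented in the fluent-plugin-concat README. The tag pattern is an assumption to adapt to your pipeline, and option names should be verified against the plugin version you run.

```
<filter docker.**>
  @type concat
  # "log" is the message field produced by Docker's json-file format
  key log
  # a complete record ends with a newline; chunks split at the 16K limit do not
  multiline_end_regexp /\n$/
  # chunks are byte-exact, so join them without a separator
  separator ""
</filter>
```

Placed between the in_tail source and the output plugin, this rejoins the split chunks before they are sent downstream.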
@alexei38 The split log does not contain a newline, and it looks like combine_partial is available only for filebeat. I'm using fluentd. Thanks for responding.
https://github.com/moby/moby/blob/master/daemon/logger/jsonfilelog/jsonfilelog.go#L135-L137

https://github.com/elastic/beats/blob/master/libbeat/reader/readjson/docker_json.go#L172

The right solution would be to add fields to json-file as in fluentd and journald.
Take a look here, I think it covers quite well what you're looking for:
While working with the logging operator, I just bumped into this:
I still don't get it. In our case (Docker version 18.09.9, build 039a7df9ba) Docker produces a newline character at the end of every split line! So none of the proposed solutions work for us. More than four years later, it is so insane.
Really, no container log handling facility should be interpreting the byte stream. That means what the container writes in the stream directs the behavior of the collector. What operating system kernel would ever implement an …

We really need to have the log stream collected as a raw set of bytes, and then apply the newline interpretation when someone reads the stream. Then we can slice and dice to our hearts' content. See cri-o/cri-o#1605.
Follow-up of #34620
As mentioned in the original bug, the logging change introduced in 1.13 is not compatible with existing tools for collecting logs. There are several scenarios in which this change severely breaks log collection. Example: if a message is a JSON object longer than 16KB and the log collector is not prepared for the split, parsing will be broken.
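To make that example concrete, here is a small standalone demonstration (not Docker code): a JSON log line larger than 16KB, cut into 16KB chunks the way an unprepared collector receives it, yields fragments that no longer parse as JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const maxChunk = 16 * 1024 // the 16KB limit discussed in this issue

func main() {
	// A single JSON log line well over 16KB, e.g. a stack trace or big context.
	line, err := json.Marshal(map[string]string{
		"msg":     "something went wrong",
		"payload": strings.Repeat("x", 40*1024),
	})
	if err != nil {
		panic(err)
	}

	// Split it into fixed-size chunks and try to parse each one on its own.
	for off := 0; off < len(line); off += maxChunk {
		end := off + maxChunk
		if end > len(line) {
			end = len(line)
		}
		var v map[string]string
		parseErr := json.Unmarshal(line[off:end], &v)
		fmt.Printf("chunk %6d..%6d valid JSON: %v\n", off, end, parseErr == nil)
	}
}
```

Every chunk fails to unmarshal, so a collector that is unaware of the split either drops the data or stores garbage.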
This feature is a breaking change that should have been made configurable in the first place. Right now 16KB is a constant in the code, picked for better artificial benchmark results. Moreover, this breaking change was described in the release notes as
Improve logging of long log lines
which tells nothing about the change. This functionality has to be configurable: it should be possible to change the limit and/or turn the feature off completely. Please also backport this to all affected releases.
/cc @piosz @fgrzadkowski @thockin @igorpeshansky