filebeats truncates log messages after 16k #6605

Closed

bitsofinfo opened this issue Mar 20, 2018 · 23 comments

Comments

@bitsofinfo

# filebeat 6.2.2
filebeat.prospectors:
-
  paths: [ "/var/lib/docker/containers/*/*-json.log" ]
  harvester_buffer_size: 65536
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
  fields_under_root: true
  processors:
    - add_docker_metadata: ~


output.file:
  path: "/tmp/filebeat"
  filename: filebeat
  • The docker container outputs a single line in valid json (about 18k)
  • The value of the "log" message key is also a single line in valid json
  • The message seems to be cut off at about 16k or a bit above (depending on whether you count the backslashes used for escaping)
  • A second message gets created with the remaining part of the message, including full decoration (Docker metadata, additional fields, etc.)
  • It looks like Filebeat splits the message into two separate ones
  • harvester_buffer_size has no effect
  • removing the json.* options has no effect

How to reproduce:

use my config, start a container and run the following:

#!/bin/bash
# Print a single-line JSON object of roughly 18k characters to stdout
key="somerandomkey_"
value="somerandomvalue_"
echo -n '{'
for i in $(seq 420); do
  echo -n "\"${key}${i}\":\"${value}${i}\","
done
echo '"lastkey":"end"}'

This produces a valid json output. For me it cuts off at "somerand" and a new message continues with "omkey_396".

We also tried the following and it does not fix the issue (both with and without the harvester_buffer_size option):

filebeat.prospectors:
- type: docker
  containers.ids:
    - "*"
  processors:
    - add_docker_metadata: ~

@paltryeffort https://discuss.elastic.co/t/filebeat-splits-message-after-16k/123718/4

@bmudda

bmudda commented Mar 21, 2018

+1

@nfedyk

nfedyk commented Mar 21, 2018

We encounter this with some messages as well. Can it be fixed?

@ruflin
Member

ruflin commented Mar 25, 2018

I initially tried to reproduce this without Docker to see if it is something in Filebeat itself or more likely related to Docker. As I didn't manage to reproduce it with local files, I started to investigate a bit deeper into Docker. It seems this breaking change was introduced with 1.13 and is kind of a known issue: moby/moby#34855. With moby/moby#34888 it was made possible for log drivers to make the buffer size configurable, but if no buffer size is set it falls back to the 16k we see above.

As Filebeat reads the files from disk, I assume the lines are already split up in the log file which Docker created. I need to investigate further whether the above is correct and whether there is a config option in the json-file driver to change the buffer size.
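
For illustration, a minimal standalone Go sketch (not moby code) of the mechanism described above: anything longer than a fixed 16 KiB read buffer comes out as more than one message, and the json-file driver then writes each chunk as its own line.

package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	// Simulate a single ~20 KiB log line coming from a container's stdout.
	line := strings.Repeat("x", 20*1024) + "\n"
	r := strings.NewReader(line)

	buf := make([]byte, 16*1024) // fixed 16 KiB buffer
	for {
		n, err := r.Read(buf)
		if n > 0 {
			// Each Read becomes a separate log entry; only the last chunk
			// still ends with the newline of the original line.
			fmt.Printf("chunk: %d bytes, ends with newline: %v\n", n, buf[n-1] == '\n')
		}
		if err == io.EOF {
			break
		}
	}
}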

@ruflin
Member

ruflin commented Mar 26, 2018

I did a test with the json-file logger and indeed, in the file itself the log message is split up across two lines, which means Filebeat cannot do much else than read the file line by line. I looked at the BufSize interface and the only logger which makes use of it is the cloudwatchlogs logger: https://github.com/moby/moby/search?utf8=%E2%9C%93&q=BufSize&type= Based on this I assume this option does not exist for the json-file logger. There seem to be some other logging options like max-size, but they only seem to influence the overall rollover size of the file: https://docs.docker.com/config/containers/logging/json-file/
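
For reference, the json-file options documented on that page (set for example in /etc/docker/daemon.json) only control file rotation, not the per-line buffer:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}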

@exekias
Contributor

exekias commented Mar 26, 2018

I've noticed docker format doesn't add a \r\n to messages when it splits them, so we could potentially detect that and do the join. Example:

{"log":"\"1\":\"1\",\"2\":\"2\",\"3\":\"3\", ... ,\"1328\":\"1328\",\"1329\"","stream":"stdout","time":"2018-03-26T10:33:19.796343111Z"}
{"log":":\"1329\",\"1330\":\"1330\" ... ,\"1419\":\"1419\",\"1420\":\"1420\",/ # \r\n","stream":"stdout","time":"2018-03-26T10:33:53.840245637Z"}
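
A minimal sketch of that joining idea (not the actual Filebeat implementation, just a hypothetical standalone Go program): read the json-file entries, treat any entry whose log value does not end in a newline as partial, and keep appending until one does.

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// dockerLine mirrors one line of Docker's json-file log format.
type dockerLine struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long physical lines

	var pending strings.Builder
	for scanner.Scan() {
		var entry dockerLine
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // skip malformed lines in this sketch
		}
		pending.WriteString(entry.Log)
		// A complete message ends with "\n"; anything else is a partial
		// chunk produced by the 16k split, so keep accumulating.
		if strings.HasSuffix(entry.Log, "\n") {
			fmt.Print(pending.String())
			pending.Reset()
		}
	}
}

Feeding it a container's *-json.log on stdin prints the reassembled messages, one per original log call.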

@f0
Contributor

f0 commented Mar 26, 2018

@exekias
If I understand this correctly, Docker adds a PARTIAL flag to messages if they are truncated.
Here is how the Docker journald plugin handles this:
https://github.com/moby/moby/blob/80edccda708c6771824c245b4d634c2a9ce29795/daemon/logger/journald/journald.go#L110

@ruflin
Member

ruflin commented Mar 27, 2018

@f0 That is interesting. But as this is internal it is up to the specific logger to make use of it. Do you know if the json-file logger has something similar?

@f0
Contributor

f0 commented Mar 27, 2018

@ruflin I checked with 17.09.1-ce; it seems that the json-logger does not have this flag.

I can confirm that the log ends with \r\n

@bitsofinfo
Author

So does a ticket have to be opened w/ moby for this?

@wallies

wallies commented Mar 29, 2018

@ruflin @f0 What seems more interesting is this work to add partial message metadata; it might be better to base the Filebeat work off this: moby/moby#35831

@ruflin
Member

ruflin commented Mar 30, 2018

@wallies Thanks for the link. I only skimmed through the PR, but it seems that in the case of partial data the json logger would skip the newline and put the second object on the same line. It would mean we have 2 (or more) JSON objects on the same line. Not sure yet how we would deal with it on our end, but I'm sure we would find a way.

From a Filebeat perspective I would prefer if the json logger had a configurable buffer size so we would not have to glue things together. I assume a PR could be made against the json-file logger to add support for it, but I don't know if that would be an option from the moby project's perspective. Does anyone know more here?
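
(To illustrate the first paragraph: such a line would hypothetically contain two concatenated JSON documents, e.g. {"log":"first 16k chunk"...}{"log":"rest of the message\n"...}, so a consumer would need a streaming JSON decoder rather than purely line-oriented parsing.)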

@ruflin
Member

ruflin commented Mar 30, 2018

@bitsofinfo It is probably worth opening an issue about buffer support in the json-file logger to get a feeling for whether a PR in this direction would have a chance in the moby project.

@bitsofinfo
Author

bitsofinfo commented Mar 30, 2018

@ruflin Would it have more weight if someone from Elastic opened such an issue? I'm not sure which moby project or area of code to reference and point them to. I'm also concerned that any moby-side solution will take months, if not more than a year, to actually get implemented :/ Can nothing be done in Beats to handle this?

@ruflin
Member

ruflin commented Apr 3, 2018

As long as there is no flag or indication that the same event is spread across multiple lines, there is not much Beats can do.

I think the quickest way to get a change in is to open a PR and get it merged, as I doubt someone would open the PR based on the GitHub issue if there is no real urgency for it. Whether the PR is opened by a Beats team member or an external contributor should not, I assume, make a difference. It would probably be best if it were opened by someone who has made a few changes to moby / docker in the past.

Right now Filebeat fully depends on the json-file logger. There are other outputs like journald which may be supported by Filebeat in the future; at least we are investigating it. This is also a long shot, but perhaps some of these outputs already have these features built in.

@bitsofinfo The biggest pushback I expect from the moby contributors is the question of why someone would have 16kb in a log line (based on the comments in the thread). The good news is that some PRs to make this possible were already accepted, so this might become an obsolete topic. But the changes have not made it into the default logger so far.

@bitsofinfo
Author

I added an issue at moby/moby#36777; please chime in there.

@towolf

towolf commented Apr 27, 2018

Is there any way to fix this in Filebeat without waiting for movement in moby/moby?

Even if they add mechanisms that Filebeat can hook into to piece partial messages back together, it's not a given that a new Docker release can be rolled out everywhere.

Kubernetes usually states a supported Docker engine version that is a few releases behind, not everybody can freely choose their Docker version, and so on and so forth.

@exekias
Contributor

exekias commented Apr 27, 2018

I think we can fix this by checking the string ending; I'll do some tests.

@bitsofinfo
Author

+1. The changes will likely never happen in moby, or at least not within a reasonable timeframe.

@exekias
Contributor

exekias commented Apr 27, 2018

I've opened #6967 to handle partial messages when using the docker prospector. It seems to work for me; additional confirmation would be nice.

@towolf

towolf commented Apr 27, 2018

@exekias I'd love to help validate this. Can you throw a snapshot into a registry somewhere?

@exekias
Contributor

exekias commented Apr 27, 2018

I've pushed a build of my branch to exekias/filebeat:6967. Of course, it is not intended for production, just for testing this; take into account that it's based on master.

Settings should look like this (take partial.enabled into account):

prospectors:
   - type: docker
     containers:
       partial: true
       ids:
         - '*'
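
For anyone who wants to try it, a hypothetical invocation for testing only; it assumes the image follows the standard Filebeat Docker layout and reads its config from /usr/share/filebeat/filebeat.yml:

docker run --rm \
  -v "$(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  exekias/filebeat:6967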

@towolf

towolf commented Apr 27, 2018

@exekias I've tested your build and it works like magic. I hit the limit in ES, where an indexed field cannot contain more than 32k characters, but when I split the message into multiple fields, I could reach 300k with no problems.

Thanks a lot for the quick action.

exekias pushed a commit to exekias/beats that referenced this issue May 8, 2018
Docker `json-file` driver splits lines longer than 16k bytes, this
change adapts the code to detect that situation and join them again to
output a single event.

This behavior is enabled by default, it can be disabled by using the new
`combine_partials` flag.

Fixes elastic#6605
@kvch kvch closed this as completed in #6967 May 8, 2018
kvch pushed a commit that referenced this issue May 8, 2018
Docker `json-file` driver splits lines longer than 16k bytes, this
change adapts the code to detect that situation and join them again to
output a single event.

This behavior is enabled by default, it can be disabled by using the new
`combine_partials` flag.

Fixes #6605
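
For later readers: per the commit message above, the joining behavior is enabled by default and can be turned off with the new flag. A hypothetical docker prospector config using the flag name from that commit message (the exact option name and placement may differ between Filebeat versions, so check the docs for your release):

filebeat.prospectors:
- type: docker
  containers.ids:
    - '*'
  combine_partials: false   # disable automatic joining of partial lines
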
stevea78 pushed a commit to stevea78/beats that referenced this issue May 20, 2018
Docker `json-file` driver splits lines longer than 16k bytes, this
change adapts the code to detect that situation and join them again to
output a single event.

This behavior is enabled by default, it can be disabled by using the new
`combine_partials` flag.

Fixes elastic#6605
@bizmate

bizmate commented Apr 18, 2023

Hi, apologies for hijacking the issue, but based on the above I am not quite sure what the solution was. I am seeing this issue: https://discuss.elastic.co/t/filebeat-logstash-output-split-messages-into-multiple-with-approximately-a-maximum-field-size-of-8191-charactes/330174/2

I use hints-based Filebeat to collect stdout logs from Docker, through Logstash, to ES. At the moment it is splitting the messages, and I thought this was a problem with Filebeat. However, it looks like a solution was found as part of this issue to send the messages together just by changing the Filebeat configuration? Can someone confirm whether this is indeed in place, in what version, and how to tell Filebeat to collate messages that are supposed to be together?
