filebeats truncates log messages after 16k #6605

Closed

bitsofinfo opened this issue Mar 20, 2018 · 23 comments

Comments

@bitsofinfo

# filebeat 6.2.2
filebeat.prospectors:
-
  paths: [ "/var/lib/docker/containers/*/*-json.log" ]
  harvester_buffer_size: 65536
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
  fields_under_root: true
  processors:
    - add_docker_metadata: ~


output.file:
  path: "/tmp/filebeat"
  filename: filebeat
  • The docker container outputs a single line in valid json (about 18k)
  • The value of the "log" message key is also a single line in valid json
  • The message seems to be cut off at about 16k or a bit above (depending on whether you count the backslashes used for escaping)
  • A second message gets created with the remaining part of the message, including full decoration (Docker metadata, additional fields, etc.)
  • It looks like Filebeat splits the message into two separate ones
  • harvester_buffer_size has no effect
  • removing the json.* options has no effect

How to reproduce:

use my config, start a container and run the following:

#!/bin/bash
# Print a single-line JSON object of roughly 18k characters to stdout
key="somerandomkey_"
value="somerandomvalue_"
echo -n '{'
for i in $(seq 420); do
  echo -n "\"${key}${i}\":\"${value}${i}\","
done
echo '"lastkey":"end"}'

This produces a valid json output. For me it cuts off at "somerand" and a new message continues with "omkey_396".

We also tried the following and it does not fix the issue (both with and without the harvester_buffer_size option):

filebeat.prospectors:
- type: docker
  containers.ids:
    - "*"
  processors:
    - add_docker_metadata: ~

@paltryeffort https://discuss.elastic.co/t/filebeat-splits-message-after-16k/123718/4

@bmudda

bmudda commented Mar 21, 2018

+1

@nfedyk

nfedyk commented Mar 21, 2018

We encounter this with some messages as well. Can it be fixed?

@ruflin
Member

ruflin commented Mar 25, 2018

I initially tried to reproduce this without Docker to see if it is something in Filebeat itself or more likely related to Docker. As I didn't manage to reproduce it with local files, I started to investigate a bit deeper into Docker. It seems this breaking change was introduced with 1.13 and is kind of a known issue: moby/moby#34855. With moby/moby#34888 it was made possible for log drivers to make the buffer size configurable, but if no buffer size is set it falls back to the 16k we see above.

As Filebeat reads the files from disk, I assume the lines are already split up in the log file which Docker created. I need to investigate further whether the above is correct and whether there is a config option in the json-file driver to change the buffer size.
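
For illustration, a minimal standalone Go sketch (not moby code) of the mechanism described above: anything longer than a fixed 16 KiB read buffer comes out as more than one message, and the json-file driver then writes each chunk as its own line.

package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	// Simulate a single ~20 KiB log line coming from a container's stdout.
	line := strings.Repeat("x", 20*1024) + "\n"
	r := strings.NewReader(line)

	buf := make([]byte, 16*1024) // fixed 16 KiB buffer
	for {
		n, err := r.Read(buf)
		if n > 0 {
			// Each Read becomes a separate log entry; only the last chunk
			// still ends with the newline of the original line.
			fmt.Printf("chunk: %d bytes, ends with newline: %v\n", n, buf[n-1] == '\n')
		}
		if err == io.EOF {
			break
		}
	}
}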

@ruflin
Member

ruflin commented Mar 26, 2018

I did a test with the json-file logger and indeed, in the file itself the log message is split up across two lines, which means Filebeat cannot do much else than read the file line by line. I looked at the BufSize interface and the only logger which makes use of it is the cloudwatchlogs logger: https://github.com/moby/moby/search?utf8=%E2%9C%93&q=BufSize&type= Based on this I assume this option does not exist for the json-file logger. There seem to be some other logging options like max-size, but they only seem to influence the overall rollover size of the file: https://docs.docker.com/config/containers/logging/json-file/
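
For reference, the json-file options documented on that page (set for example in /etc/docker/daemon.json) only control file rotation, not the per-line buffer:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}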

@exekias
Contributor

exekias commented Mar 26, 2018

I've noticed docker format doesn't add a \r\n to messages when it splits them, so we could potentially detect that and do the join. Example:

{"log":"\"1\":\"1\",\"2\":\"2\",\"3\":\"3\", ... ,\"1328\":\"1328\",\"1329\"","stream":"stdout","time":"2018-03-26T10:33:19.796343111Z"}
{"log":":\"1329\",\"1330\":\"1330\" ... ,\"1419\":\"1419\",\"1420\":\"1420\",/ # \r\n","stream":"stdout","time":"2018-03-26T10:33:53.840245637Z"}
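
A minimal sketch of that joining idea (not the actual Filebeat implementation, just a hypothetical standalone Go program): read the json-file entries, treat any entry whose log value does not end in a newline as partial, and keep appending until one does.

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// dockerLine mirrors one line of Docker's json-file log format.
type dockerLine struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long physical lines

	var pending strings.Builder
	for scanner.Scan() {
		var entry dockerLine
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // skip malformed lines in this sketch
		}
		pending.WriteString(entry.Log)
		// A complete message ends with "\n"; anything else is a partial
		// chunk produced by the 16k split, so keep accumulating.
		if strings.HasSuffix(entry.Log, "\n") {
			fmt.Print(pending.String())
			pending.Reset()
		}
	}
}

Feeding it a container's *-json.log on stdin prints the reassembled messages, one per original log call.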

@f0
Contributor

f0 commented Mar 26, 2018

@exekias
If I understand this correctly, Docker adds a PARTIAL flag to messages if they are truncated.
Here is how the Docker journald plugin handles this:
https://github.com/moby/moby/blob/80edccda708c6771824c245b4d634c2a9ce29795/daemon/logger/journald/journald.go#L110

@ruflin
Member

ruflin commented Mar 27, 2018

@f0 That is interesting. But as this is internal it is up to the specific logger to make use of it. Do you know if the json-file logger has something similar?

@f0
Contributor

f0 commented Mar 27, 2018

@ruflin I checked with 17.09.1-ce; it seems that the json-logger does not have this flag.

I can confirm that the log ends with \r\n

@bitsofinfo
Author

So does a ticket have to be opened w/ moby for this?

@wallies

wallies commented Mar 29, 2018

@ruflin @f0 What seems more interesting is this work to add partial message metadata; it might be better to base the Filebeat work off this: moby/moby#35831

@ruflin
Member

ruflin commented Mar 30, 2018

@wallies Thanks for the link. I only skimmed through the PR, but it seems that in the case of partial data the json logger would skip the newline and put the second object on the same line. It would mean we have 2 (or more) JSON objects on the same line. Not sure yet how we would deal with it on our end, but I'm sure we would find a way.

From a Filebeat perspective I would prefer if the json logger had a configurable buffer size so we would not have to glue things together. I assume a PR could be made against the json-file logger to add support for it, but I don't know if that would be an option from the moby project's perspective. Does anyone know more here?
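
(To illustrate the first paragraph: such a line would hypothetically contain two concatenated JSON documents, e.g. {"log":"first 16k chunk"...}{"log":"rest of the message\n"...}, so a consumer would need a streaming JSON decoder rather than purely line-oriented parsing.)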

@ruflin
Member

ruflin commented Mar 30, 2018

@bitsofinfo It is probably worth opening an issue about buffer support in the json-file logger to get a feeling for whether a PR in this direction would have a chance in the moby project.

@bitsofinfo
Author

bitsofinfo commented Mar 30, 2018

@ruflin Would it have more weight if someone from Elastic opened such an issue? I'm not sure which moby project or area of code to reference and point them to. I'm also concerned that any moby-side solution will take months, if not more than a year, to actually get implemented :/ Can nothing be done in Beats to handle this?

@ruflin
Member

ruflin commented Apr 3, 2018

As long as there is no flag or indication that the same event is spread across multiple lines, there is not much Beats can do.

I think the quickest way to get a change in is to open a PR and get it merged, as I doubt someone would open the PR based on the GitHub issue if there is no real urgency for it. Whether the PR is opened by a Beats team member or an external contributor should not, I assume, make a difference. It would probably be best if it were opened by someone who has made a few changes to moby / docker in the past.

Right now Filebeat fully depends on the json-file logger. There are other outputs like journald which may be supported by Filebeat in the future; at least we are investigating it. This is also a long shot, but perhaps some of these outputs already have these features built in.

@bitsofinfo The biggest pushback I expect from the moby contributors is the question of why someone would have 16kb in a log line (based on the comments in the thread). The good news is that some PRs to make this possible were already accepted, so this might become an obsolete topic. But the changes have not made it into the default logger so far.

@bitsofinfo
Author

I added an issue at moby/moby#36777; please chime in there.

@towolf

towolf commented Apr 27, 2018

Is there any way to fix this in Filebeat without waiting for movement in moby/moby?

Even if they add mechanisms that Filebeat can hook into to piece partial messages back together, it's not a given that a new Docker release can be rolled out everywhere.

Kubernetes usually states a supported Docker engine version that is a few releases behind, not everybody can freely choose their Docker version, and so on and so forth.

@exekias
Contributor

exekias commented Apr 27, 2018

I think we can fix this by checking the string ending; I'll do some tests.

@bitsofinfo
Author

+1. The changes will likely never happen in moby, or at least not within a reasonable timeframe.

@exekias
Contributor

exekias commented Apr 27, 2018

I've opened #6967 to handle partial messages when using the docker prospector. It seems to work for me; additional confirmation would be nice.

@towolf

towolf commented Apr 27, 2018

@exekias I'd love to help validate this. Can you throw a snapshot into a registry somewhere?

@exekias
Contributor

exekias commented Apr 27, 2018

I've pushed a build of my branch to exekias/filebeat:6967. Of course, it is not intended for production, just for testing this; take into account that it's based on master.

Settings should look like this (take partial.enabled into account):

prospectors:
   - type: docker
     containers:
       partial: true
       ids:
         - '*'
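
For anyone who wants to try it, a hypothetical invocation for testing only; it assumes the image follows the standard Filebeat Docker layout and reads its config from /usr/share/filebeat/filebeat.yml:

docker run --rm \
  -v "$(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  exekias/filebeat:6967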

@towolf

towolf commented Apr 27, 2018

@exekias I've tested your build and it works like magic. I hit the limit in ES, where an indexed field cannot contain more than 32k characters, but when I split the message into multiple fields, I could reach 300k with no problems.

Thanks a lot for the quick action.

exekias pushed a commit to exekias/beats that referenced this issue May 8, 2018
Docker `json-file` driver splits lines longer than 16k bytes, this
change adapts the code to detect that situation and join them again to
output a single event.

This behavior is enabled by default, it can be disabled by using the new
`combine_partials` flag.

Fixes elastic#6605
@kvch kvch closed this as completed in #6967 May 8, 2018
kvch pushed a commit that referenced this issue May 8, 2018
Docker `json-file` driver splits lines longer than 16k bytes, this
change adapts the code to detect that situation and join them again to
output a single event.

This behavior is enabled by default, it can be disabled by using the new
`combine_partials` flag.

Fixes #6605
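
For later readers: per the commit message above, the joining behavior is enabled by default and can be turned off with the new flag. A hypothetical docker prospector config using the flag name from that commit message (the exact option name and placement may differ between Filebeat versions, so check the docs for your release):

filebeat.prospectors:
- type: docker
  containers.ids:
    - '*'
  combine_partials: false   # disable automatic joining of partial lines
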
stevea78 pushed a commit to stevea78/beats that referenced this issue May 20, 2018
Docker `json-file` driver splits lines longer than 16k bytes, this
change adapts the code to detect that situation and join them again to
output a single event.

This behavior is enabled by default, it can be disabled by using the new
`combine_partials` flag.

Fixes elastic#6605
@bizmate

bizmate commented Apr 18, 2023

Hi, apologies for hijacking the issue, but based on the above I am not quite sure what the solution was. I am seeing this issue: https://discuss.elastic.co/t/filebeat-logstash-output-split-messages-into-multiple-with-approximately-a-maximum-field-size-of-8191-charactes/330174/2

I use hints-based Filebeat to collect stdout logs from Docker, through Logstash, to ES. At the moment it is splitting the messages, and I thought this was a problem with Filebeat. However, it looks like a solution was found as part of this issue to send the messages together just by changing the Filebeat configuration? Can someone confirm whether this is indeed in place, in what version, and how to tell Filebeat to collate messages that are supposed to be together?
