
Reconstruct split docker log lines in Promtail #2281

Closed
slim-bean opened this issue Jul 1, 2020 · 31 comments · Fixed by #12374
Labels: component/promtail, keepalive (an issue or PR that will be kept alive and never marked as stale)

Comments

@slim-bean
Collaborator

As of moby/moby#22982, Docker now splits log lines longer than ~16kb.

This can totally break JSON processing of these lines in Promtail.

There are a few moving pieces here to support this. The big one is adding support for multi-line logs in Promtail, which we have been avoiding, mostly because there were other valid solutions like "show context" in Explore.

However, not being able to parse JSON in Promtail because of this split is certainly something we want to fix, so I think we are going to finally have to go down the multi-line road.

It does appear Docker has some specific metadata to help with this, however: moby/moby@0b4b0a7, which could help special-case reconstructing split Docker lines.
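
For illustration, a minimal Go sketch of how a log consumer could use that partial-message metadata to reassemble split lines. The PartialLogMetaData struct below is an assumption mirroring the fields moby attaches to split messages (ID, Ordinal, Last); this is not Promtail code.

package partiallog

import "strings"

// PartialLogMetaData is an assumed mirror of the metadata moby attaches
// to split messages: ID groups the parts of one logical line, Ordinal
// orders them, and Last marks the final part.
type PartialLogMetaData struct {
	ID      string
	Ordinal int
	Last    bool
}

// Message is one entry as emitted by the logging driver.
type Message struct {
	Line    string
	Partial *PartialLogMetaData // nil for ordinary, unsplit lines
}

// Reassembler buffers partial parts by ID and emits completed lines.
type Reassembler struct {
	pending map[string]*strings.Builder
}

func NewReassembler() *Reassembler {
	return &Reassembler{pending: map[string]*strings.Builder{}}
}

// Push returns the reconstructed line and true once a message is complete.
func (r *Reassembler) Push(m Message) (string, bool) {
	if m.Partial == nil {
		return m.Line, true // not split: pass through unchanged
	}
	buf, ok := r.pending[m.Partial.ID]
	if !ok {
		buf = &strings.Builder{}
		r.pending[m.Partial.ID] = buf
	}
	buf.WriteString(m.Line)
	if m.Partial.Last {
		delete(r.pending, m.Partial.ID)
		return buf.String(), true
	}
	return "", false // more parts to come
}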

@slim-bean added the keepalive label Jul 1, 2020
@slim-bean
Collaborator Author

Doing some checking on this: unfortunately that metadata isn't present in Kubernetes logs. It looks like the most common alternative is looking for a \n character at the end of the log line; if the line is split, the partial parts don't have a trailing newline:

moby/moby#34855 has some more details and references to different solutions.
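
As a sketch of that trailing-newline heuristic (an illustration only, assuming Docker's json-file format with the log/stream/time fields shown in the comments below): buffer entries per stream and emit only once a part ends in a newline.

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// jsonLogEntry mirrors one line written by Docker's json-file log driver.
type jsonLogEntry struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

func main() {
	// Partial lines are buffered per stream until a part ends with '\n'.
	partial := map[string]*strings.Builder{}
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		var e jsonLogEntry
		if err := json.Unmarshal(scanner.Bytes(), &e); err != nil {
			continue // skip malformed lines in this sketch
		}
		buf, ok := partial[e.Stream]
		if !ok {
			buf = &strings.Builder{}
			partial[e.Stream] = buf
		}
		buf.WriteString(e.Log)
		if strings.HasSuffix(e.Log, "\n") {
			fmt.Printf("[%s] %s", e.Stream, buf.String())
			buf.Reset()
		}
	}
}

As the next comments show, the assumption that partial parts carry no trailing newline holds for some Docker setups but not all, so it should be verified against the actual json.log output first.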

@123BLiN

123BLiN commented Oct 29, 2020

In our case (Docker version 18.09.9, build 039a7df9ba) we have docker logs like:

{"log":"{\"@timestamp\":\"2020-10-28T15:53:14.460060-04:00\",first part of the huge one line JSON here\n","stream":"stderr","time":"2020-10-28T19:53:13.080198941Z"}
{"log":"second part of huge one line JSON here\n","stream":"stderr","time":"2020-10-28T19:53:13.080332682Z"}
{"log":"last part of huge one line JSON here\"}\n","stream":"stderr","time":"2020-10-28T19:53:13.080332682Z"}

So we have a newline at the end of every split log line.
Just FYI: this solution may not work for everyone. You should check your docker JSON logs first.

@cf-sewe

cf-sewe commented Oct 29, 2020

> So we have a newline at the end of every split log line.
> Just FYI: this solution may not work for everyone. You should check your docker JSON logs first.

You got me scared for a second. I checked the log format of our AWS EKS (1.18) nodes and can confirm the behaviour regarding newlines is as expected: when split, the split entries have no trailing newline.

{"log":"{\"@timestamp\":\"2020-10-29T11:12:03.015Z\",\"@version\":\"1\",\"message\":\"==== INCREMENTAL DUMP ====\\n\\nauditSlidingTimeFrameTimer ... 4)\\njava.util.concurr","stream":"stdout","time":"2020-10-29T11:12:03.01653623Z"}

I still don't like all of this, because relying on newlines seems unreliable (e.g. what happens if there is an intentional newline right at the 64k limit?).

@123BLiN

123BLiN commented Oct 30, 2020

Sorry, my bad: in our case the newlines were added by the PHP Monolog app at about 10Kb.
I like containerd's solution and log format: they have a P/F field that tells the collector whether a line is partial or full.

@nivekuil

You might be interested in the work Vector did to address this problem: vectordotdev/vector#1488. I use it to ingest docker logs into Loki and it seems to work fine.

@slim-bean removed the keepalive label Jan 21, 2021
@slim-bean
Copy link
Collaborator Author

The multi-line support recently merged into Promtail could possibly solve this; however, it has not been tested.

@triplaaj

Can confirm that the multiline stage can be used to reconstruct the JSON. After multiline, the line breaks need to be removed to keep the JSON valid.

- multiline:
    # Heuristic start-of-block: a line beginning with '{', plus a
    # fallback pattern for lines that did not end a JSON object.
    firstline: '(^\{(.*))|([^\}\s]\s$)'
- replace:
    # Remove the newlines the multiline stage inserts between the
    # joined parts, so the result is a single valid JSON line again.
    expression: '(\n)'
    replace: ''

@slim-bean
Collaborator Author

Awesome @triplaaj! Thanks for this update!!!

@stale

stale bot commented Mar 20, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label Mar 20, 2021
@pgassmann
Contributor

pgassmann commented Mar 22, 2021

I have a related question: is the Loki docker-logging-driver using Promtail to read the json.log and already applying pipeline stages to extract the message from the JSON lines?
We have Java applications and want to use the multiline stage to combine the stacktraces etc. into one message again. The application is configured to log with the zero-width-space character, as recommended in the multiline docs.

How can multiline logs be combined with the docker-logging driver?

Example:

Source json.log file in the loki plugin root: /var/lib/docker/plugins/f7623531faa0d4bc052a32f5878645955bd9c81296b88992d29dde8f6b7c2836/rootfs/var/log/docker/a7491ff6dd01dc84f9d7a14a6c64ab1d470498b196a71826d96ef0049006fdb5/json.log

{"log":"<U+200B>[22-03-2021 10:07:32.467] [main] \u001b[34mINFO \u001b[0;39m \u001b[36mo.e.j.s.AbstractConnector\u001b[0;39m - Started ServerConnector@642505c7{HTTP/1.1, (http/1.1)}{0.0.0.0:8086}\n","stream":"stdout","time":"2021-03-22T10:07:32.468129427Z"}
{"log":"<U+200B>[22-03-2021 10:07:32.468] [main] \u001b[34mINFO \u001b[0;39m \u001b[36mo.e.j.s.Server\u001b[0;39m - Started @2626ms\n","stream":"stdout","time":"2021-03-22T10:07:32.468132422Z"}
{"log":"WARNING: An illegal reflective access operation has occurred\n","stream":"stderr","time":"2021-03-22T10:07:32.96668075Z"}
{"log":"WARNING: Illegal reflective access by com.thoughtworks.xstream.core.util.Fields (file:/app/libs/xstream-1.4.11.1.jar) to field java.util.TreeMap.comparator\n","stream":"stderr","time":"2021-03-22T10:07:32.966686732Z"}
{"log":"WARNING: Please consider reporting this to the maintainers of com.thoughtworks.xstream.core.util.Fields\n","stream":"stderr","time":"2021-03-22T10:07:32.966688455Z"}
{"log":"WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\n","stream":"stderr","time":"2021-03-22T10:07:32.966689677Z"}
{"log":"WARNING: All illegal access operations will be denied in a future release\n","stream":"stderr","time":"2021-03-22T10:07:32.966690118Z"}

The config in docker-compose:

    logging:
      driver: loki
      options:
        loki-batch-size: '102400'
        loki-external-labels: container_name={{.Name}},category=dockerlogs
        loki-pipeline-stages: |
          - multiline:
              # Identify zero-width space as first line of a multiline block. Note the string should be in single quotes.
              firstline: '^\x{200B}\['
              max_wait_time: 3s
        loki-retries: '10'
        loki-url: https://username:pass@loki.example.com/loki/api/v1/push
        max-buffer-size: 10m
        max-file: '50'
        max-size: 100m
        mode: non-blocking

@triplaaj The replace stage in your comment would also remove \n characters that are part of the message, like in my example.
@slim-bean can you prevent this from becoming stale & closed?

The stale bot removed the stale label Mar 23, 2021
@patrickhuy
Contributor

patrickhuy commented Apr 9, 2021

@triplaaj @slim-bean I tried a combination of multiline and the json stage, and that didn't seem to work. The multiline stage did work, but Promtail was apparently not able to JSON-parse / extract labels from the multiline JSON. Is there a way to make this work?
I had something like this:

- multiline:
    firstline: '(^\{(.*))|([^\}\s]\s$)'
- replace:
    expression: '(\n)'
    replace: ''
- json:
    expressions:
      msg: msg
      level: level
- labels:
    level:

but the label is not applied for multiline entries (apparently?).
Is there something that could be done to make this work?

@stale

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label Jun 2, 2021
The stale bot closed this as completed Jun 9, 2021
cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
@dannykopping reopened this Jul 29, 2021
The stale bot removed the stale label Jul 29, 2021
@dannykopping added the component/promtail and keepalive labels Jul 29, 2021
@MurzNN

MurzNN commented Sep 29, 2021

For easier debugging I've created a docker-compose repo that reproduces this problem: https://github.com/MurzNN/loki-long-lines

@MurzNN

MurzNN commented Sep 29, 2021

The workaround from #2281 (comment) should work well for most cases, but not always: if the long JSON line is split by Docker exactly at a { character, it will not merge the JSON into one line.

@MurzNN

MurzNN commented Sep 29, 2021

Does an issue exist on the Docker side to allow extending the max length of its log line?

@icebob

icebob commented Jan 13, 2022

I have the same issue when I want to parse the Elasticsearch log files. ES prints JSON logs, and in the case of errors it prints multiline JSON. I can't figure out how to create the Promtail pipeline stages.

My current pipeline part:

    - match:
        selector: '{container_name="elasticsearch_es_1"}'
        stages:
          - multiline:
              firstline: '(^\{(.*))|([^\}\s]\s$)'
              max_wait_time: 1s
              source: eslog
          - replace:
              expression: '(\n)'
              replace: ''
              source: eslog
          - json:
              expressions:
                output: message
                time: timestamp
                level: level
                node_name: node.name
                #component: component
              source: eslog

@hgranillo

hgranillo commented Jan 19, 2022

Hi all, I've also been facing this same issue. The multiline parser suggested by triplaaj seems to work in the majority of cases.

Does this happen when using containerd instead of Docker with the cri parser? It might be a good excuse to just switch container runtimes.

@patrickhuy
Contributor

> Does this happen when using containerd instead of Docker with the cri parser? It might be a good excuse to just switch container runtimes.

What do you mean by "this"? containerd also splits logs. Unfortunately, I couldn't find many references online about how that happens besides this rather old issue: containerd/cri#283

@stefan-fast

stefan-fast commented Jan 21, 2022

Hey, I'm currently facing the same issue with containerd and the cri parser. containerd also splits log lines greater than 16kb by default. I found a parameter max_container_log_line_size here, but I'm unable to change it because of missing access to that config in my cloud environment.
From what I can tell, containerd logs are split in the following way:

2022-01-21T10:10:24.993695084Z stdout P <very long log message 1>
2022-01-21T10:10:24.993695085Z stdout P <still log message 1>
2022-01-21T10:10:24.993846586Z stdout F <final part of log message 1>

Each log message that is split has its parts tagged with P and a final part tagged F, indicating that the next line is a new log message.
Shorter log messages that don't need splitting have the F tag right away:

2022-01-21T10:10:24.993846587Z stdout F <short log message 2>
2022-01-21T10:10:24.993846587Z stdout F <short log message 3>

Both Fluent-Bit and FileBeat have working options to enable reconstruction of split Docker as well as containerd/cri logs:

FileBeat

* [cri.parse_flags](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-docker.html#_cri_parse_flags) (cri)
* [combine_partial](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-docker.html#_combine_partial) (docker)

Fluent-Bit

* [multiline-parsing](https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/multiline-parsing) (`docker` & `cri` parser)

It would be really cool if Promtail had similar options that work out of the box :)
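
For illustration, a minimal Go sketch of reassembling the CRI format shown above (timestamp, stream, P/F tag, content), in the spirit of what the collectors linked above do; this is an assumed consumer, not Promtail's actual cri stage:

package crilog

import (
	"fmt"
	"strings"
)

// Joiner reassembles CRI log lines of the form
// "<RFC3339 time> <stdout|stderr> <P|F> <content>", buffering P(artial)
// parts per stream until the F(inal) part arrives.
type Joiner struct {
	partial map[string]*strings.Builder // keyed by stream
}

func NewJoiner() *Joiner {
	return &Joiner{partial: map[string]*strings.Builder{}}
}

// Push returns the complete message and true when an F-tagged part closes it.
func (j *Joiner) Push(line string) (string, bool, error) {
	fields := strings.SplitN(line, " ", 4)
	if len(fields) != 4 {
		return "", false, fmt.Errorf("malformed cri line: %q", line)
	}
	stream, tag, content := fields[1], fields[2], fields[3]
	buf, ok := j.partial[stream]
	if !ok {
		buf = &strings.Builder{}
		j.partial[stream] = buf
	}
	buf.WriteString(content)
	if tag == "F" {
		msg := buf.String()
		buf.Reset()
		return msg, true, nil
	}
	return "", false, nil // P: keep buffering
}

Feeding the three P/P/F example lines above through Push would yield one reconstructed <log message 1>.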

@hgranillo

> Does this happen when using containerd instead of Docker with the cri parser? It might be a good excuse to just switch container runtimes.

> What do you mean by "this"? containerd also splits logs. Unfortunately, I couldn't find many references online about how that happens besides this rather old issue: containerd/cri#283

Hi, yes, sorry for the lack of context: by "this" I meant the long log line getting split.

@BitProcessor

For JSON, this works:

- multiline:
    # Identify zero-width space as first line of a multiline block.
    # Note the string should be in single quotes.
    firstline: '^\x{200B}\{'
    max_wait_time: 3s
- replace:
    expression: '^(\x{200B})'
    replace: ''
- replace:
    expression: '([\n])'
    replace: ''

To keep Grafana happy and make it valid JSON again:

* the zero-width space character is removed again
* the newlines inserted in the Promtail reconstruction are removed again

Result: you can now happily ingest JSON up to the Loki item size limit (which is 64KB in Grafana Cloud at the time of writing) 🥳 🎉

@nantiferov

nantiferov commented Oct 17, 2022

@icebob This configuration seems to work fine for Elasticsearch, collecting error-level logs correctly:

    pipeline_stages:
      - multiline:
          firstline: '^\{'
          max_wait_time: 3s
      - replace:
          expression: '([\n])'
          replace: ''

@MurzNN

MurzNN commented Oct 17, 2022

@nantiferov This will work well until a line break occurs exactly before a { character in the middle of a log line's text. The probability of this is small, but still not zero!

And to handle this case properly, you would need to write a much more complex construction that counts all opening and closing braces... 😞
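
A sketch of what that more complex construction could look like: tracking brace depth while skipping braces that occur inside JSON strings or after escape characters. This is an illustration only, under the assumption that each log line is a single JSON object (arrays and top-level scalars are ignored):

package jsoncheck

// Complete reports whether s contains one balanced JSON object,
// ignoring braces inside strings and after escape characters.
func Complete(s string) bool {
	depth, started := 0, false
	inString, escaped := false, false
	for _, r := range s {
		switch {
		case escaped:
			escaped = false
		case inString:
			if r == '\\' {
				escaped = true
			} else if r == '"' {
				inString = false
			}
		case r == '"':
			inString = true
		case r == '{':
			depth++
			started = true
		case r == '}':
			depth--
			if depth < 0 {
				return false // stray closing brace: never completes
			}
		}
	}
	return started && depth == 0
}

A collector could buffer split parts until Complete returns true, instead of relying on a ^\{ firstline heuristic alone.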

@icebob

icebob commented Oct 17, 2022

@nantiferov thanks, currently we are using this (similar to your solution) and it also works:

    - match:
        selector: '{container_name="elasticsearch_es_1"}'
        stages:
          - multiline:
              firstline: '(^\{)'
              max_wait_time: 1s
          - replace:
              expression: '(\n)'
              replace: ' '
          - json:
              expressions:
                message: message
                time: timestamp
                level: level
                node_name: node.name
                #component: component
          - labels:
              level:
              node_name:
          - timestamp:
              format: RFC3339Nano
              source: time
              action_on_failure: fudge
          - output:
              source: message

@mbonanata

Hi, there is something that I don't understand. I am seeing the same behaviour with lines broken at 16k in Loki, but when I execute "kubectl logs my-pod" I see the whole lines. Is that correct?

I am using EKS v1.21.14 and Loki/Promtail 2.5.0.

@marinnedea

@icebob - will that work with Grafana Agent too, or is it Promtail-specific?

@icebob

icebob commented Jul 11, 2023

I think it should work with Grafana Agent as well.

@hu-chia

hu-chia commented Oct 24, 2023

@icebob this replace stage replaces all line breaks with a space? Why not just remove them?

@stefan-fast

> Hey, I'm currently facing the same issue with containerd and the cri parser. [...] It would be really cool if Promtail had similar options that work out of the box :)

For the cri stage, reconstructing partial logs is implemented in the latest versions of Promtail (see the docs for the cri stage) and, for me, works out of the box without any configuration. Thank you! 🎉

@jonaslb
Contributor

jonaslb commented Mar 13, 2024

It seems that docker_sd_configs does combine the split logs from docker now, without any multiline configuration. It just seems to do it in an incorrect manner (see #12197).

@andoks

andoks commented Mar 19, 2024

Got bit by this recently; it seems to still be an issue with Loki and/or the Loki Docker log plugin.

I also found a discussion about making the size limit in Docker configurable (moby/moby#32923), and in that issue the Docker maintainers suggest that the best way forward is for consumers (plugins among them) to start handling split log lines. From the rest of the discussion it also seems like split log lines are marked in some way: https://github.com/moby/moby/blob/cd14846d0cde098bb83037d99104db6fadfef039/daemon/logger/copier.go#L139-L152
