This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/fluent-bit] Document how to parse JSON logs produced by an application #10424

Closed

gajus opened this issue Jan 7, 2019 · 27 comments

@gajus

gajus commented Jan 7, 2019

Which chart:

stable/fluent-bit

What happened:

An application produces a JSON log, e.g.

{"context":{"package":"@applaudience/cinema-data-scraper-http-client","namespace":"httpClient","logLevel":20},"message":"using proxy http://proxy-gateway:8080","sequence":19715,"time":1546821495356,"version":"1.0.0"}

but ES receives the log as a plain text message (log field):

{
  "_index": "kubernetes_cluster-2019.01.07",
  "_type": "flb_type",
  [..]
  "fields": {
    "@timestamp": [
      "2019-01-07T01:30:01.815Z"
    ]
  },
  "highlight": {
    "log": [
      "2019-01-07T00:38:15.356711687Z stdout F {\"context\":{\"package\":\"@applaudience/cinema-data-scraper-http-client\",\"namespace\":\"httpClient\",\"logLevel\":20},\"message\":\"using @kibana-highlighted-field@proxy@/kibana-highlighted-field@ http://@kibana-highlighted-field@proxy@/kibana-highlighted-field@-@kibana-highlighted-field@gateway@/kibana-highlighted-field@:8080\",\"sequence\":19715,\"time\":1546821495356,\"version\":\"1.0.0\"}"
    ],
    "kubernetes.container_name": [
      "@kibana-highlighted-field@data@/kibana-highlighted-field@-@kibana-highlighted-field@manager@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1546824601815
  ]
}

What you expected to happen:

I expect fluent-bit to parse the JSON message and provide the parsed message to ES.
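
For illustration, something roughly like this (field names taken from the example log line above; the exact layout of the indexed document is not important):

{
  "_index": "kubernetes_cluster-2019.01.07",
  "_type": "flb_type",
  "_source": {
    "@timestamp": "2019-01-07T01:30:01.815Z",
    "message": "using proxy http://proxy-gateway:8080",
    "sequence": 19715,
    "version": "1.0.0",
    "context": {
      "package": "@applaudience/cinema-data-scraper-http-client",
      "namespace": "httpClient",
      "logLevel": 20
    }
  }
}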

How to reproduce it (as minimally and precisely as possible):

Using default configuration.

CC @naseemkullah @jknipper @vroyer (Recent contributors to stable/fluent-bit Chart).

What is the configuration for parsing JSON logs produced by an application running in a Kubernetes Pod?

@gajus
Author

gajus commented Jan 7, 2019

Thank you for the link @kfox1111.

I have tried multiple configuration combinations, including:

backend:
  type: es
  es:
    host: elasticsearch-client
image:
  fluent_bit:
    tag: 1.0.1
parsers:
  json:
    - extraEntries: |-
       Decode_Field_As escaped log do_next
       Decode_Field_As json log

and

backend:
  type: es
  es:
    host: elasticsearch-client
image:
  fluent_bit:
    tag: 1.0.1
parsers:
  json:
    - extraEntries: |-
       Decode_Field_As escaped_utf8 log do_next
       Decode_Field_As json log

but it continues to arrive as a plain-text log:

{
  "_index": "kubernetes_cluster-2019.01.07",
  "_type": "flb_type",
  "_id": "HnceJmgBINsxlb1YnF_W",
  "_score": 1,
  "_source": {
    "@timestamp": "2019-01-07T02:22:46.865Z",
    "log": "2019-01-07T02:22:46.865426429Z stdout F {\"context\":{\"package\":\"slonik\",\"logLevel\":20,\"executionTime\":\"1.28 ms\",\"rowCount\":1,\"sql\":\"SELECT id FROM event_seat_state_change WHERE event_id = ? ORDER BY id DESC LIMIT 1\"},\"message\":\"query\",\"sequence\":47781,\"time\":1546827766865,\"version\":\"1.0.0\"}",
    "kubernetes": {
      "pod_name": "adm-do-event-seating-lookups-7555bf849c-h4qlc",
      "namespace_name": "default",
      "pod_id": "563e747d-1208-11e9-b648-42010aa40038",
      "labels": {
        "chart-name": "data-manager",
        "chart-version": "1.0.0",
        "heritage": "Tiller",
        "pod-template-hash": "3111694057",
        "release": "adm-do-event-seating-lookups"
      },
      [...]
  },
  "fields": {
    "@timestamp": [
      "2019-01-07T02:22:46.865Z"
    ]
  }
}

@max-rocket-internet
Contributor

@gajus but your key _source.log is not valid JSON. It starts with 2019-01-07T02:22:46.865426429Z stdout F {\"context\":{, i.e. there is a bunch of random text before the JSON. Isn't that the problem here?

@gajus
Author

gajus commented Jan 7, 2019

@gajus but your key _source.log is not valid JSON. It starts with 2019-01-07T02:22:46.865426429Z stdout F {\"context\":{, i.e. there is a bunch of random text before the JSON. Isn't that the problem here?

My understanding is that the prefix to which you are referring is added by Docker.

I have SSHed to one of the GKE nodes and checked the raw logs to confirm this:

$ tail -1 /var/log/pods/6829cea2-1208-11e9-b648-42010aa40038/data-manager/0.log
2019-01-07T10:52:37.095818517Z stdout F {"context":{"package":"@applaudience/cinema-data-scraper-http-client","namespace":"httpClient","logLevel":20},"message":"using proxy http://proxy-gateway:8080","sequence":180167,"time":1546858357095,"version":"1.0.0"}

The app itself simply outputs a JSON line, i.e. just:

{"context":{"package":"@applaudience/cinema-data-scraper-http-client","namespace":"httpClient","logLevel":20},"message":"using proxy http://proxy-gateway:8080","sequence":180167,"time":1546858357095,"version":"1.0.0"}

My understanding is that fluent-bit is supposed to scan the log message for content that matches the JSON parser and extract that information.

@max-rocket-internet
Contributor

I see.

FWIW, this works for us with chart version 0.12.0 and these settings:

$ kubectl -n kube-system get -o yaml configmap fluent-bit-config
apiVersion: v1
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf

    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        Parser           json
        Tag              kube.*
        Refresh_Interval 5
        Mem_Buf_Limit    5MB
        Skip_Long_Lines  On

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           Off
        Merge_Log_Key       log
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

    [OUTPUT]
        Name  splunk
        <removed>
  parsers.conf: |
    [PARSER]
        Name json
        Format json
        Time_Keep On
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Decode_Field_As   escaped_utf8  log           do_next
        Decode_Field_As   json          log
        Decode_Field_As   escaped_utf8  logj.message  do_next
        Decode_Field_As   json          logj.message

    [PARSER]
        Name docker
        Format json
        Time_Keep On
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
kind: ConfigMap

But it only works with top level JSON keys as far as I can tell. And here's one line from an nginx container that is set to log in JSON:

# tail -1 /var/log/pods/cb921f33-xxx-xxx-xxxx-0ae5e5d97c08/nginx-ingress-controller/0.log
{"log":"{ \"@timestamp\": \"2019-01-07T11:13:06+00:00\", \"body_bytes_sent\": \"0\", \"client_ip\":...

This works for us in Splunk, not ES.

@gajus
Author

gajus commented Jan 7, 2019

Do you mind telling me what the contents of /etc/docker/daemon.json are in your Kubernetes cluster?

Mine is:

{
	"disable-legacy-registry": true,
	"live-restore": true,
	"storage-driver": "overlay2"
}

Shouldn't this say log-driver: json-file?

@gajus
Author

gajus commented Jan 7, 2019

Never mind. It appears that docker was launched with that config:

$ docker info | grep 'Logging Driver: json-file'
Logging Driver: json-file

@gajus
Author

gajus commented Jan 7, 2019

That is strange. I logged in to another cluster that I own to see what the output format is, and it is indeed just plain JSON.

Something somewhere is broken in my setup.

@max-rocket-internet
Contributor

Mine is the default for an EKS node on AWS:

{ "log-driver": "json-file", "log-opts": { "max-size": "10m", "max-file": "10" }}

I think the default GKE node has the same log-driver, no?

@gajus
Author

gajus commented Jan 7, 2019

I have identified the cause.

The reason the log format is off is that I am using containerd.

I have not discovered the fix yet.
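
For comparison, the two on-disk formats look roughly like this (illustrative payloads, not my actual logs):

# Docker json-file driver: each line is a JSON object, so the default docker parser matches
{"log":"{\"message\":\"query\"}\n","stream":"stdout","time":"2019-01-07T10:52:37.095818517Z"}

# containerd / CRI log format: plain-text prefix (time, stream, tag) followed by the raw payload
2019-01-07T10:52:37.095818517Z stdout F {"message":"query"}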

@gajus
Author

gajus commented Jan 7, 2019

There is something utterly broken with fluent-bit (or poorly documented).

fluent/fluent-bit#1012

@gajus
Author

gajus commented Jan 7, 2019

After many hours... it looks like I needed to configure Merge_Log_Trim On.

https://docs.fluentbit.io/manual/filter/kubernetes

Everything else works after configuring Merge_Log_Trim.
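
For reference, the relevant kubernetes filter options end up looking roughly like this (a sketch based on the linked docs, not the literal chart output; Merge_Log_Key and its value are optional and only an example):

[FILTER]
    Name                kubernetes
    Match               kube.*
    Merge_Log           On
    Merge_Log_Key       log_processed
    Merge_Log_Trim      On
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On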

@gajus gajus closed this as completed Jan 7, 2019
@sahlone

sahlone commented May 2, 2019

@gajus I had the same issue, and I can see it works only if you have a flat JSON map; if you have nested JSON, the decoder will still fail.
It doesn't matter whether you use the kubernetes filter, because the same result can be achieved with the normal JSON parser.

  1. If it is a simple JSON object, the keys will be parsed.
  2. If it is a nested JSON object, the decoder will fail and the keys will not be parsed (see the example below).
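
To illustrate the distinction (example payloads, not from the original report): a flat map such as

{"message":"query","level":20}

gets its keys expanded, while a nested object such as

{"context":{"package":"slonik","logLevel":20},"message":"query"}

does not get its keys expanded, because the decoder fails on the inner object.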

@naseemkullah
Collaborator

An issue was created several months ago (fluent/fluent-bit#1130) regarding this matter. Who knows C? :)

@max-rocket-internet
Contributor

If it is a nested JSON object, the decoder will fail and the keys will not be parsed.

There are many issues about this; just search. In this one they have a list of them. It's a fluent-bit problem, though, not a chart issue.

@TarekAS

TarekAS commented May 21, 2019

I found this to be a working configuration to properly parse the JSON:

input:
  tail:
    parser: dockerCustom

parsers:
  enabled: true
  json:
    - name: dockerCustom
      extraEntries: |-
       Time_Key    time
       Time_Format %Y-%m-%dT%H:%M:%S.%L
       Time_Keep   On
       Decode_Field_As escaped_utf8 log do_next
       Decode_Field_As json log
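
Assuming the chart turns each entry under parsers.json into a [PARSER] stanza with Format json (my reading of the values above, not verified against the chart templates), the rendered parser would be roughly:

[PARSER]
    Name        dockerCustom
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    Decode_Field_As escaped_utf8 log do_next
    Decode_Field_As json log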

@scottmbaker

I am also experiencing the same problem as others in this thread, and have tried the solutions presented here without success. In every case I'm seeing log: 2019-08-20T16:08:35.976976032Z stdout F {my_properly_formatted_json_map}, with no merging of the JSON map and no obvious reason why it's not being merged. I don't have nested JSON or anything fancy; it appears to be a simple map of key names to strings.

@TarekAS, @gajus Since you both have this working, is it still working for you?

@marcelocquadros

Hi, I'm facing this same problem ("log": "2019-09-17T18:17:07.799810563Z stdout F { ... }") using a managed cluster from IBM Cloud. I've tried all the options in this post without success.

@sgujrati-up

I am facing the same issue ("log": "2019-09-17T18:17:07.799810563Z stdout F { ... }") on an OpenShift cluster. Any suggestions?

@neoxack

neoxack commented Jan 31, 2020

Same issue on a k3s cluster:
"log": 2020-01-31T05:29:30.815679275+03:00 stdout F {"level":"warning","msg":"A new release has been found: 2.1.3. Please consider updating.","time":"2020-01-31T02:29:30Z"}

@boxcee
Contributor

boxcee commented Feb 13, 2020

Same here:

2020-02-13T15:45:55.192902696Z stdout F {"@timestamp":"2020-02-13T15:45:55.192+00:00","@version":"1","message":"Span reported: 852eeff1c903a4e3:19b0f345af5bac8:1ba028c3f553cb66:1 - /permissions/_find_by_traits","logger_name":"io.jaegertracing.reporters.LoggingReporter","thread_name":"http-nio-8080-exec-9","level":"INFO","level_value":20000,"trace.id":"852eeff1c903a4e3"}

@boxcee
Contributor

boxcee commented Feb 13, 2020

After many hours... it looks like I needed to configure Merge_Log_Trim On.

https://docs.fluentbit.io/manual/filter/kubernetes

Everything else works after configuring Merge_Log_Trim.

I tried the same, but it still appears to have problems with the 2019-01-07T10:52:37.095818517Z stdout F prefix.

@sgujrati-up

sgujrati-up commented Feb 13, 2020

Log messages from app containers in an OpenShift cluster are modified before they are saved to log files. If the log message from the app container is This is test, then when it is saved to the file, something like 2019-01-07T10:52:37.095818517Z stdout F is prepended to the log. If the log message itself is in JSON format, prepending it with 2019-01-07T10:52:37.095818517Z stdout F invalidates the JSON. I believe https://github.com/cri-o/cri-o/blob/0366d726c1b5b957671412efac86c96b63175987/utils/io/logger.go#L82 does it. I don't think there is any way to get rid of it, but you can configure the fluent-bit parser and input to make it more sensible.

In the fluent-bit config, have one INPUT for the k8s application pods' logs:

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.<namespace_name>.<container_name>.<pod_name>.<docker_id>-
    Tag_Regex         (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
    DB                /var/log/containers/fluentbit_kube.db
    Parser            crio
    Mem_Buf_Limit     100MB
    Skip_Long_Lines   On
    Refresh_Interval  5

and have a crio PARSER:

[PARSER]
  Name crio
  Format regex
  Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
  Time_Key    time
  Time_Format %Y-%m-%dT%H:%M:%S.%L%z     

This will change the log message to look like:

{
  "date": 1581526271.544489,
  "stream": "stdout",
  "logtag": "F",
  "log": "This is test"
}

@boxcee
Contributor

boxcee commented Feb 14, 2020

@sgujrati-up I got it partly working by simply setting input.tail.parser to cri, which then uses the built-in parser https://github.com/fluent/fluent-bit/blob/551ae7778e5e4cad0593ad299010998b52ac2ee1/conf/parsers.conf#L97.
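
For reference, the built-in cri parser at that link is essentially the following (check the linked parsers.conf for the exact definition):

[PARSER]
    Name        cri
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z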

However, some of my messages contain logs in JSON which I would also like to parse.

Some look like this:

E0214 07:59:30.060195       1 reststorage.go:147] unable to fetch pod metrics for pod mip-playground/permission-selector-ui-service-5f6f84cdbd-7f7zs: no metrics known for pod

Perfect, nothing to change here.

But some look like this:

{"@timestamp":"2020-02-14T07:59:30.057+00:00","@version":"1","message":"Could not update new incoming content for search/cognitive feed with feedId.","logger_name":"newsupdater","thread_name":"pool-2-thread-1","level":"WARN","level_value":30000}

Here I would also love to parse the JSON and merge it.

All logs come from k8s-application.

@boxcee
Contributor

boxcee commented Feb 14, 2020

Soooo, I got it working.

In addition to the above, I just needed to add another filter to rawConfig:

rawConfig: |-
  @INCLUDE fluent-bit-service.conf
  @INCLUDE fluent-bit-input.conf

  [FILTER]
      Name      parser
      Match     kube.*
      Key_Name  message
      Parser    json

  @INCLUDE fluent-bit-filter.conf
  @INCLUDE fluent-bit-output.conf

Now it works!
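
One caveat (an assumption worth double-checking, not something covered above): when the parse succeeds, the parser filter replaces the record with the parsed keys, so adding Reserve_Data keeps the remaining fields of the record as well:

[FILTER]
    Name          parser
    Match         kube.*
    Key_Name      message
    Parser        json
    Reserve_Data  On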

@bit-cloner

@boxcee, could you show us the full fluent-bit config where the log is parsed as JSON, please?
And which fluent-bit image are you using?

@lunarfs

lunarfs commented Jul 2, 2020

So I spent some time getting this working.
I am running a Docker image that writes JSON as output, so the files are as follows.

td-agent-bit.conf

[SERVICE]
    Flush 1
    Parsers_File parsers.conf

@INCLUDE inputs.conf
@INCLUDE filters.conf
@INCLUDE outputs.conf

filters.conf

[FILTER]
    Name parser
    Match_regex docker.fluentd.*
    Key_Name log
    Reserve_Data true
    Parser json

inputs.conf

[INPUT]
    Name              forward
    Listen            0.0.0.0
    Port              24224
    Buffer_Chunk_Size 32KB
    Buffer_Max_Size   64KB

To get the Docker container logging, I added tagging (the forward plugin does not support Tag), so these are the options I am using on my Docker container:

                              --log-driver fluentd \
                              --log-opt 'fluentd-address=localhost:24224' \
                              --log-opt 'fluentd-async-connect=true' \
                              --log-opt 'fluentd-buffer-limit=4m' \
                              --log-opt 'fluentd-max-retries=1000' \
                              --log-opt 'tag=docker.fluentd' \
                              --log-opt 'mode=non-blocking' \

@haisum

haisum commented Aug 12, 2020

Passing these values to the helm chart solves this issue for me on k3d clusters:

parsers:
  enabled: true
  regex:
    - name: log_parser
      regex: ^(?<logtimestamp>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
      timeKey: logtimestamp
      timeFormat: "%Y-%m-%dT%H:%M:%S.%L%z"
input:
  tail:
    parser: log_parser
