
Nested JSON parsing stopped working with fluent/fluentd-kubernetes-daemonset:v0.12-debian-elasticsearch #2073

Closed
arikunbotify opened this issue Jul 15, 2018 · 14 comments


@arikunbotify

Hi,
I'm using fluent/fluentd-kubernetes-daemonset:v0.12-debian-elasticsearch, and after updating to the new image (based on 0.12.43, and after solving the UID=0 issue reported here) I've stopped getting nested objects parsed. The kubernetes and docker fields are parsed, but the inner message in "log", which is standard JSON from the application I run, is no longer parsed.
Has anyone encountered this issue with the new image?

(Also, the image based on 0.12.33 doesn't start at all for some reason, and I can't find older version tags to try.)

Best,
AA

@repeatedly
Member

The problem may be that kubernetes-metadata-filter introduced breaking changes.
Using the parser filter resolves the problem. See #2021

@arikunbotify
Author

Thanks. In case anyone else wonders how to combine nested JSON parsing with Kubernetes fields, this is what works for me (in kubernetes.conf):

    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
    </filter>

    <filter kubernetes.var.log.containers.**>
      @type parser
      <parse>
        @type json
        json_parser json
      </parse>
      replace_invalid_sequence true
      emit_invalid_record_to_error false
      key_name log
      reserve_data true
    </filter>

@calinah

calinah commented Jul 27, 2018

Hey @arikunbotify, can you please share your full configuration? I have been troubleshooting this problem for days and my log messages are not passed as JSON to either Elasticsearch or stdout. I added the filter you suggested to my configuration, but still no luck. I can see the messages are escaped correctly, but when passed on they arrive as text, not JSON.

@arikunbotify
Author

arikunbotify commented Aug 1, 2018

@calinah I totally forgot to mention I switched to:
fluent/fluentd-kubernetes-daemonset:v1.2.2-debian-elasticsearch

I think this is the relevant config part:

    <match fluent.**>
      @type null
    </match>

    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head false
      <parse>
        @type json
        json_parser json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
    </filter>

    <filter kubernetes.var.log.containers.**>
      @type parser
      <parse>
        @type json
        json_parser json
      </parse>
      replace_invalid_sequence true
      emit_invalid_record_to_error false
      key_name log
      reserve_data true
    </filter>

Hope it helps.

@Datise

Datise commented Oct 23, 2018

@arikunbotify Sorry to dredge this up, but what is your strategy for adding the filter to the daemonset? I'm attempting to load it via a ConfigMap and am not having much luck. I would love to avoid the initContainer solution I see here:
fluent/fluentd-kubernetes-daemonset#174 (comment)
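
For reference, one common way to do this (a sketch only, with hypothetical names, not taken from this image's documentation) is to keep the image's default fluent.conf and mount just a kubernetes.conf from a ConfigMap over the default file using subPath:

    # In the DaemonSet, under the fluentd container (hypothetical names):
    volumeMounts:
    - name: fluentd-extra-config
      mountPath: /fluentd/etc/kubernetes.conf
      subPath: kubernetes.conf
    # ...and under the pod spec:
    volumes:
    - name: fluentd-extra-config
      configMap:
        name: fluentd-extra-config

The ConfigMap would then carry a kubernetes.conf key containing the filters shown earlier in this thread. Note that subPath mounts are not refreshed when the ConfigMap changes, so the pods need a restart after editing it.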

@theothermike

Since this feature used to work, why not add that config to the Docker image by default, so that everyone doesn't need to override it manually with custom ConfigMaps?

@warrenackerman

We are having this parsing issue and followed @arikunbotify's example, but the log field is not returning individual fields in Kibana. It is a single log entry and the JSON still shows escape characters.

    <source>
      @id goapp_logs
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/goapp-log.pos
      tag kubernetes.*
      read_from_head false
      <parse>
        @type json
        json_parser json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
    <filter kubernetes.var.log.containers.**>
      @type parser
      <parse>
        @type json
        json_parser json
      </parse>
      replace_invalid_sequence true
      emit_invalid_record_to_error false
      key_name log
      reserve_data true
    </filter>

Results:

"time=\"2019-02-07T22:29:13Z\" level=info msg=\"started handling request\" method=GET remote=\"10.1.1.1:34234\" request=/healthz source=\"blahh@v1.1.0/entry.go:111\"\n"

Any advice? We want the Kibana table results to show:

level         info
msg         started handling request
method    GET
remote    10.1.1.1:34234
etc...
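
These log lines are logfmt-style key=value pairs rather than JSON, so a @type json parse can never split them. A minimal sketch of a regexp-based extraction (illustrative only; it assumes the level, msg, method, and remote fields always appear in that order, and it is not a full logfmt parser):

    <filter kubernetes.var.log.containers.**>
      @type parser
      key_name log
      reserve_data true
      emit_invalid_record_to_error false
      <parse>
        @type regexp
        expression /level=(?<level>\S+) msg="(?<msg>[^"]*)" method=(?<method>\S+) remote="(?<remote>[^"]*)"/
      </parse>
    </filter>

For anything beyond a couple of fixed fields, having the application emit JSON instead of logfmt is the more robust option.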

@Gambler13

@Datise
Did you solve your problem? I'm struggling with the exact same one.

@kompiuter

kompiuter commented May 23, 2019

The following worked for me:

fluentd-config-map.yml

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <match fluent.**>
      @type null
    </match>

    <match kubernetes.var.log.containers.**fluentd**.log>
      @type null
    </match>

    <match kubernetes.var.log.containers.**kube-system**.log>
      @type null
    </match>

    <match kubernetes.var.log.containers.**kibana**.log>
      @type null
    </match>

    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head false
      <parse>
        @type json
        json_parser oj
        time_format %Y-%m-%dT%H:%M:%S
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
    </filter>

    <filter kubernetes.var.log.containers.**>
      @type parser
      <parse>
        @type json
        json_parser oj
        time_format %Y-%m-%dT%H:%M:%S
      </parse>
      key_name log
      replace_invalid_sequence true
      emit_invalid_record_to_error true
      reserve_data true
    </filter>

    <match kubernetes.**>
      @type elasticsearch
      @log_level debug
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
      ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER']}" # remove these lines if not needed
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}" # remove these lines if not needed
      logstash_format true
      logstash_prefix fluentd
      logstash_dateformat %Y%m%d
      include_tag_key true
      reload_connections true
      log_es_400_reason true
      <buffer>
        flush_thread_count 8
        flush_interval 5s
        chunk_limit_size 2M
        queue_limit_length 32
        retry_max_interval 30
        retry_forever true
      </buffer>
    </match>

fluentd-daemonset.yml

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4-debian-elasticsearch-1
        env:
          - name:  FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch.default"
          - name:  FLUENT_ELASTICSEARCH_PORT
            value: "9200"
          - name: FLUENT_ELASTICSEARCH_SCHEME
            value: "http"
          - name: FLUENT_UID
            value: "0"
          - name: FLUENT_ELASTICSEARCH_USER
            value: "foo"
          - name: FLUENT_ELASTICSEARCH_PASSWORD 
            value: "bar"
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluentd-config
          mountPath: /fluentd/etc
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluentd-config
        configMap:
          name: fluentd-config

elasticsearch image: docker.elastic.co/elasticsearch/elasticsearch:7.1.0
kibana image: docker.elastic.co/kibana/kibana:7.1.0

@jpugliesi

In our case, running fluent/fluentd-kubernetes-daemonset/v1.7.4-debian-elasticsearch7-1.0, we saw that some types of Kubernetes JSON logs were not being parsed by fluentd. The fix was adding reserve_time true to the filter, like so:

..... # standard kubernetes.conf
<filter kubernetes.**>
  @type kubernetes_metadata
  @id filter_kube_metadata
</filter>

# Fixes parsing nested json in the docker json logs
<filter kubernetes.**>
  @id filter_parser
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  replace_invalid_sequence true
  reserve_time true
  <parse>
    @type multi_format
    <pattern>
      format json
      json_parser json
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</filter>

In our case, the JSON logs that failed to parse had a time field, which apparently doesn't play nicely with the fluentd configuration unless reserve_time true is added.
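
For illustration, a hypothetical payload of this shape is the kind of record that was affected; with reserve_time true the parser filter keeps the original event time instead of deriving it from the record's own time field:

    {"time":"2020-03-01T12:00:00.000Z","level":"info","msg":"request handled","status":200}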

@peetasan

peetasan commented Sep 10, 2020

I had an issue with this config (and the original from https://github.com/fluent/fluentd-kubernetes-daemonset/tree/master/docker-image/v1.11/debian-graylog/conf) where my JSON log was parsed correctly, but the k8s metadata was packed into a kubernetes key as a single JSON value. This way I can't filter by pod_name or anything like that. Any ideas why this data is not at the top level of the record that gets sent to the output (Graylog in my case)?
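
One possible workaround (a sketch only, not tested against Graylog; it assumes the kubernetes_metadata filter has placed pod_name and namespace_name under a kubernetes key, as in the configs above) is to copy the fields you need to filter on up to the top level with the built-in record_transformer filter, placed after the kubernetes_metadata filter:

    <filter kubernetes.**>
      @type record_transformer
      enable_ruby true
      <record>
        # copy selected nested metadata fields to the top level of the record
        pod_name ${record.dig("kubernetes", "pod_name")}
        namespace_name ${record.dig("kubernetes", "namespace_name")}
      </record>
    </filter>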

@ediezh

ediezh commented Dec 7, 2020

I'm having the same issue as @peetasan

@Sieabah

Sieabah commented Dec 17, 2020

For those wondering why the "fixed" version might also no longer work (thanks fluentd, really making me work to get my logs ingested): using multi_format together with the filter causes the following error to arise.

/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/plugin.rb:125:in `new_parser': undefined method `[]' for nil:NilClass (NoMethodError)
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-multi-format-parser-1.0.0/lib/fluent/plugin/parser_multi_format.rb:21:in `block in configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-multi-format-parser-1.0.0/lib/fluent/plugin/parser_multi_format.rb:17:in `each'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-multi-format-parser-1.0.0/lib/fluent/plugin/parser_multi_format.rb:17:in `configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/plugin.rb:173:in `configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/plugin_helper/parser.rb:90:in `block in configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/plugin_helper/parser.rb:85:in `each'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/plugin_helper/parser.rb:85:in `configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/plugin/in_tail.rb:128:in `configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/plugin.rb:173:in `configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/root_agent.rb:317:in `add_source'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/root_agent.rb:158:in `block in configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/root_agent.rb:152:in `each'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/root_agent.rb:152:in `configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/engine.rb:105:in `configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/engine.rb:80:in `run_configure'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/supervisor.rb:555:in `run_supervisor'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/lib/fluent/command/fluentd.rb:341:in `<top (required)>'
        from /usr/local/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/local/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.5/bin/fluentd:8:in `<top (required)>'
        from /fluentd/vendor/bundle/ruby/2.6.0/bin/fluentd:23:in `load'
        from /fluentd/vendor/bundle/ruby/2.6.0/bin/fluentd:23:in `<main>'

Below is the config that works for me. It excludes the fluent logs (which the previous config still breaks on), and it also breaks out the Kubernetes metadata; the result looks like the following in Kibana.

[screenshot: the log fields and kubernetes.* metadata shown as individual fields in Kibana]

<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  exclude_path ["/var/log/containers/fluent*"]
  read_from_head true
  <parse>
    @type regexp
    expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
  </parse>
</source>
<filter kubernetes.**>
  @type kubernetes_metadata
  @id filter_kube_metadata
</filter>

<filter kubernetes.var.log.containers.**>
  @type parser
  <parse>
    @type json
    json_parser json
  </parse>
  replace_invalid_sequence true
  emit_invalid_record_to_error false
  key_name log
  reserve_data true
</filter>

@jamietanna

Sorry to necrobump, but this StackOverflow answer worked for me; it handles multiple formats using the multi-format parser plugin:

   <filter **>
     @type parser
     key_name message
     reserve_data true
     remove_key_name_field true
     <parse>
       @type multi_format
       <pattern>
         format json
       </pattern>
       <pattern>
         format none
       </pattern>
     </parse>
   </filter>
