
Bug 1740263: enable the parsing of docker log-driver=json-file continuation lines using fluent concat plugin #1723

Conversation

@richm (Contributor) commented Aug 14, 2019

This commit allows fluentd to reconstruct partial lines written
by the docker json-file and journald log drivers when logs
exceed the 16K byte limit.

There are two new environment variables:

  • USE_MULTILINE_JSON - by default this is false - if you do
    oc set env ds/logging-fluentd USE_MULTILINE_JSON=true
    then fluentd will be able to reconstruct docker json-file
    partial logs.

  • USE_MULTILINE_JOURNAL - by default this is false - if you do
    oc set env ds/logging-fluentd USE_MULTILINE_JOURNAL=true
    then fluentd will be able to reconstruct docker journald
    partial logs.

For json-file logs, the "log" field ends in \n for the final
part of the log, and does not end in \n for starting and
continuation lines. For journald logs, the field
CONTAINER_PARTIAL_MESSAGE=true is present for starting and
continuation lines, but is omitted for final lines.
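As a rough illustration of the json-file rule (a minimal Ruby sketch, not the plugin's actual code; `reassemble` is a hypothetical helper):

```ruby
# Docker's json-file driver writes one JSON record per chunk; the "log"
# value of every chunk except the last lacks a trailing newline, so we
# buffer chunks until one ends in "\n".
def reassemble(records)
  lines = []
  buffer = ""
  records.each do |rec|
    buffer += rec["log"]
    if buffer.end_with?("\n")   # final chunk of a logical line
      lines << buffer
      buffer = ""
    end
  end
  lines << buffer unless buffer.empty?  # flush an unterminated tail
  lines
end
```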

fluent-plugin-concat 2.4.0 was backported to work with ruby 2.0
and fluentd 0.12. The main feature was the ability to have
only multiline_end_regexp without multiline_start_regexp
which is required for docker json-file log support. The
partial_key support for journald was already there for cri-o.
The wrinkle with journald is that all records to be
reconstructed must have the CONTAINER_PARTIAL_MESSAGE field,
so a filter was added to set CONTAINER_PARTIAL_MESSAGE=false
for container log records which did not already have the
CONTAINER_PARTIAL_MESSAGE field, in order to make the concat
filter work for partial_key.
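The journald handling described above can be sketched the same way (hypothetical helpers, not the plugin's API; the normalization step mirrors the record_modifier filter shown in the config below):

```ruby
# Records with CONTAINER_ID_FULL but no CONTAINER_PARTIAL_MESSAGE field
# get CONTAINER_PARTIAL_MESSAGE="false" so every record carries the
# partial_key; MESSAGE values are then concatenated per container until
# a non-partial (final) record arrives.
def normalize(rec)
  if rec.key?("CONTAINER_ID_FULL") && !rec.key?("CONTAINER_PARTIAL_MESSAGE")
    rec["CONTAINER_PARTIAL_MESSAGE"] = "false"
  end
  rec
end

def reassemble_journal(records)
  complete = []
  buffers = Hash.new { |h, k| h[k] = "" }
  records.each do |rec|
    rec = normalize(rec)
    id = rec["CONTAINER_ID_FULL"]
    buffers[id] += rec["MESSAGE"]
    if rec["CONTAINER_PARTIAL_MESSAGE"] != "true"  # final chunk
      complete << buffers.delete(id)
    end
  end
  complete
end
```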

If you want to try this out without building the image, you can
follow these steps:

Hack fluent.conf like this:

    #@include configs.d/dynamic/input-docker-*.conf
    <source>
      @type tail
      @id docker-input
      @label @INGRESS
      path "/var/log/containers/*.log"
      pos_file "/var/log/es-containers.log.pos"
      time_format %Y-%m-%dT%H:%M:%S.%N%Z
      tag kubernetes.*
      format json
      keep_time_key true
      read_from_head "true"
      exclude_path []
      @label @CONCAT
    </source>
    <label @CONCAT>
      <filter kubernetes.**>
        @type concat
        key log
        multiline_end_regexp /\n$/
      </filter>
      <match kubernetes.**>
        @type relabel
        @label @INGRESS
      </match>
    </label>
    ...
    <label @INGRESS>
    ## filters
      @include configs.d/openshift/filter-pre-*.conf
      <filter journal>
        @type record_modifier
        <record>
          ignoreme ${if record.key?("CONTAINER_ID_FULL") && !record.key?("CONTAINER_PARTIAL_MESSAGE"); record["CONTAINER_PARTIAL_MESSAGE"] = "false"; end; "ignoreme"}
        </record>
        remove_keys ignoreme
      </filter>
      <filter journal>
        @type concat
        key MESSAGE
        separator ""
        stream_identity_key CONTAINER_ID_FULL
        partial_key CONTAINER_PARTIAL_MESSAGE
        partial_value true
      </filter>

Create a special configmap for the plugin code:

mkdir cm-fluentd-plugin
oc get pods -l component=fluentd
fpod=logging-fluentd-xxx
for file in $( oc exec $fpod -- ls /etc/fluent/plugin ) ; do
  oc exec $fpod -- cat /etc/fluent/plugin/$file > cm-fluentd-plugin/$file
done
cp cm-fluentd-plugin/filter_concat.rb cm-fluentd-plugin/filter_concat.rb.orig
cp /path/to/new/filter_concat.rb cm-fluentd-plugin/filter_concat.rb
oc create configmap fluentd-plugin --from-file=cm-fluentd-plugin/

Then, add the volume and volume mount to the fluentd daemonset:

oc edit ds/logging-fluentd

Add to volumeMounts and volumes

        volumeMounts:
        - mountPath: /etc/fluent/plugin
          name: fluentd-plugin
          readOnly: true
        ...
      volumes:
      - configMap:
          defaultMode: 420
          name: fluentd-plugin
        name: fluentd-plugin

Restart fluentd

oc delete pods -l component=fluentd

You may see warnings like this in the fluentd log:

/etc/fluent/plugin/viaq_docker_audit.rb:51: warning: already initialized constant Fluent::ViaqDockerAudit::ENV_HOSTNAME

You can ignore them.

add support for USE_MULTILINE_JOURNAL

If USE_MULTILINE_JOURNAL=true, then docker log-driver=journald logs
that are spread over multiple records using CONTAINER_PARTIAL_MESSAGE
will be concatenated together as a single record.

bug fixes

dump indices upon error

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 14, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2019
@richm richm added area/documentation area/performance autoretest Please auto-retest this PR if one of the flaky tests fail component/fluentd kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release/3.11 labels Aug 14, 2019
@richm richm requested a review from jcantrill August 14, 2019 15:21
@richm richm changed the title reconstruct partial line docker json-file and journald logs Bug 1740263: enable the parsing of docker log-driver=json-file continuation lines using fluent concat plugin Aug 14, 2019
@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 14, 2019
@openshift-ci-robot

@richm: This pull request references a valid Bugzilla bug.

In response to this:

Bug 1740263: enable the parsing of docker log-driver=json-file continuation lines using fluent concat plugin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@richm (Contributor, Author) commented Aug 14, 2019

/test json-file

1 similar comment

@nhosoi (Contributor) commented Aug 14, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nhosoi, richm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

@openshift-merge-robot openshift-merge-robot merged commit abee0bd into openshift:release-3.11 Aug 15, 2019
@openshift-ci-robot

@richm: All pull requests linked via external trackers have merged. The Bugzilla bug has been moved to the MODIFIED state.


@richm richm deleted the release-3.11-multiline-docker-json-file branch August 15, 2019 13:57
@dmilde

dmilde commented Aug 19, 2019

Hi, unfortunately this fails due to the require statement of the v2.4.0 filter_concat.rb which is for fluentd v0.14+:

https://github.com/fluent-plugins-nursery/fluent-plugin-concat/blob/v2.4.0/lib/fluent/plugin/filter_concat.rb

require "fluent/plugin/filter"

The fluentd pods go into CrashLoopBackOff with this log:

2019-08-19 14:30:26 +0200 [info]: reading config file path="/etc/fluent/fluent.conf"
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- fluent/plugin/filter (LoadError)
  from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  from /etc/fluent/plugin/filter_concat.rb:1:in `<top (required)>'
  from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/plugin.rb:89:in `block in load_plugin_dir'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/plugin.rb:87:in `each'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/plugin.rb:87:in `load_plugin_dir'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/engine.rb:138:in `load_plugin_dir'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:528:in `block in init_engine'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:525:in `each'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:525:in `init_engine'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:181:in `block in start'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:375:in `call'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:375:in `main_process'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/supervisor.rb:179:in `start'
  from /usr/share/gems/gems/fluentd-0.12.43/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
  from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  from /usr/share/gems/gems/fluentd-0.12.43/bin/fluentd:8:in `<top (required)>'
  from /usr/bin/fluentd:23:in `load'
  from /usr/bin/fluentd:23:in `<main>'

Best regards
dmilde

@richm (Contributor, Author) commented Aug 19, 2019

Hi, unfortunately this fails due to the require statement of the v2.4.0 filter_concat.rb which is for fluentd v0.14+:

https://github.com/fluent-plugins-nursery/fluent-plugin-concat/blob/v2.4.0/lib/fluent/plugin/filter_concat.rb

require "fluent/plugin/filter"

fluentd pods in CrashLoop with Log:

2019-08-19 14:30:26 +0200 [info]: reading config file path="/etc/fluent/fluent.conf"
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- fluent/plugin/filter (LoadError)

Correct, which is why we had to create our own special version of filter_concat.rb, backported from 2.4.0, but that works on fluentd 0.12 - https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/fluentd/lib/filter_concat/lib/filter_concat.rb

@dmilde

dmilde commented Aug 19, 2019

Thank you, Pods are running fine now.

@dmilde

dmilde commented Aug 19, 2019

Tested with the following log lines:

unset j; for i in {1..17000}; do j+=a; done
echo $j > /proc/1/fd/1

==> working.

echo {1..17000} > /proc/1/fd/1
==> working.


Regarding the fluent config, I think the first label to INGRESS is not needed since we always run through CONCAT?

#@include configs.d/dynamic/input-docker-*.conf
    <source>
      @type tail
      @id docker-input
      @label @INGRESS      <---
 ...

Best regards
dmilde

@richm (Contributor, Author) commented Aug 19, 2019

This is the CI test for multiline support: https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/test/docker_multiline.sh

It creates a file with a length > 16384 bytes. It then creates a test namespace and runs a test pod that writes the contents of this file as the "message" field of an embedded JSON object (with other fields such as "uniqueid" to facilitate searching for the specific logs produced by this test). It then does a search of Elasticsearch to confirm that the records have been written, and that the "message" field contains the reconstructed log lines.
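The idea behind that test can be simulated with a short Ruby sketch (assuming a 16384-byte split point; this is not the actual test script):

```ruby
# Split a >16K line into 16384-byte chunks the way the json-file driver
# would, then verify that concatenating the chunks restores the line.
LIMIT = 16_384
line = ("x" * 17_000) + "\n"
chunks = line.scan(/.{1,#{LIMIT}}/m)          # driver-style split
records = chunks.map { |c| { "log" => c } }   # one JSON record per chunk
rebuilt = records.map { |r| r["log"] }.join
raise "mismatch" unless rebuilt == line
```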

If something isn't working in your own tests, first confirm that you can see the logs using oc logs. You can also find the node where the pod is running (e.g. oc get pods -o wide), then look at the log files under /var/log/containers/*.log on that node to see if there is output for the pod.

re: the @INGRESS label - are you looking at the file /etc/fluent/configs.d/dynamic/input-docker-*.conf that is being generated in the pod? If so, that file isn't being used by your testing, if you have commented it out from the configmap logging-fluentd. The actual, real, final fix is not in an officially released Red Hat product yet (it is working its way through the pipeline). When the fix is in an officially released Red Hat product, what you will observe is that if you do oc set env daemonset/logging-fluentd USE_MULTILINE_JSON=true, when the fluentd pod is restarted, the file /etc/fluent/configs.d/dynamic/input-docker-*.conf will use @label @CONCAT and it will have the concat config logic.
