
Supporting docker log splitting in Kubernetes logging integrations #52444

Closed
crassirostris opened this issue Sep 13, 2017 · 27 comments · Fixed by #68012
Labels
area/logging, kind/bug, milestone/needs-approval, priority/critical-urgent, sig/instrumentation

Comments

@crassirostris

Follow-up of #52433

Docker 1.13 introduced a change to the logging mechanism (moby/moby#22982) that splits log lines longer than 16K into chunks of no more than that size. The current default logging integrations (the fluentd-gcp and fluentd-es addons) do not support this, which might result in broken ingestion, e.g. JSON parsing breaking for entries longer than 16K.

One possible solution is introducing a fluentd plugin into the configuration that concatenates such entries, in a similar way to how https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions does it (assuming such a plugin exists out there, cc @tagomoris @repeatedly @edsiper for that)

/cc @igorpeshansky @fgrzadkowski @piosz
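
For illustration (editorial note, not part of the original report): with the default json-file logging driver, a single application line longer than 16K shows up as several consecutive JSON entries, where only the last chunk's log field ends with a newline. Roughly like this (timestamps and payload are made up):

    {"log":"{\"severity\":\"INFO\",\"message\":\"first 16K of one long JSON event ...","stream":"stdout","time":"2017-09-13T10:00:00.000000001Z"}
    {"log":"... remaining bytes of the same event\n","stream":"stdout","time":"2017-09-13T10:00:00.000000002Z"}

A fluentd pipeline that parses each entry's log field as JSON fails on the first chunk, which is the "broken ingestion" described above; the trailing-newline heuristic is also what the concat-based workarounds later in this thread rely on.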

@crassirostris crassirostris added area/logging, kind/bug, priority/critical-urgent, sig/instrumentation labels Sep 13, 2017
@igorpeshansky

Presumably this should be done in the Docker fluentd logging driver... The discussion on moby/moby#34620 seems to be leaning that way...

@repeatedly

On the fluentd logging driver side, the journald driver's approach is better: https://github.com/moby/moby/blob/80edccda708c6771824c245b4d634c2a9ce29795/daemon/logger/journald/journald.go#L110
I'm not sure whether buffering partial messages in the logging driver is a good idea or not...

On the fluentd side, implementing an in_docker_forward or filter_docker_xxx plugin would be needed.
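
As an editorial sketch (not actual moby code), the journald-style approach boils down to splitting an oversized message into chunks and flagging every chunk except the last as partial, so a downstream consumer can reassemble them. A minimal Go sketch with made-up types:

    package main

    import "fmt"

    // maxChunk mirrors the 16KiB limit at which Docker splits log lines.
    const maxChunk = 16 * 1024

    // chunk is a hypothetical partial-message record; the field names are
    // illustrative, not Docker's.
    type chunk struct {
        Data    []byte
        Partial bool // true for every chunk except the last one of a message
    }

    // splitMessage breaks one logical log message into chunks of at most
    // maxChunk bytes and marks all but the final chunk as partial.
    func splitMessage(msg []byte) []chunk {
        var out []chunk
        for len(msg) > maxChunk {
            out = append(out, chunk{Data: msg[:maxChunk], Partial: true})
            msg = msg[maxChunk:]
        }
        return append(out, chunk{Data: msg, Partial: false})
    }

    func main() {
        msg := make([]byte, 40*1024) // a 40KiB message -> 16KiB + 16KiB + 8KiB
        for i, c := range splitMessage(msg) {
            fmt.Printf("chunk %d: %d bytes, partial=%v\n", i, len(c.Data), c.Partial)
        }
    }

The journald driver linked above expresses the same idea by marking partial chunks in the journal entry rather than buffering them in the driver; the buffering question raised here is about whether the driver itself should do the reassembly instead.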

@dims
Member

dims commented Sep 18, 2017

@crassirostris @repeatedly @igorpeshansky do we need to block the 1.8 release on this? Can we please move it out of 1.8 if not?

@crassirostris crassirostris modified the milestones: v1.8, v1.9 Sep 18, 2017
@crassirostris
Author

@dims Moved to 1.9, since it's not specific to 1.8

@JorritSalverda

Is there any way this can be fixed by running a custom fluentd? Or is there a fluentd Docker driver GitHub issue I can follow for updates on when they expect to have this fixed? Thx

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue Needs Approval

@crassirostris @kubernetes/sig-instrumentation-misc

Action required: This issue must have the status/approved-for-milestone label applied by a SIG maintainer.

Issue Labels
  • sig/instrumentation: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.

@crassirostris
Author

crassirostris commented Nov 8, 2017

Is there any way this can be fixed by running a custom fluentd?

The problem is that there's no fluentd plugin that fixes this, so running a custom fluentd won't help: there's nothing you can add to the configuration to make it work (unless you want to write your own solution)

is there a fluentd docker driver github issue I can follow for updates on when they expect to have this fixed

moby/moby#34855

They don't think it's a Docker problem and don't plan to fix it. However, it will be fixed in GKE.

@crassirostris crassirostris modified the milestones: v1.9, next-candidate Nov 8, 2017
@JorritSalverda

That seems fair from Docker's perspective. I'd rather have stable containers as well :)

So you're thinking of switching from the current log driver json-file - at least that's what my Google Container Optimized image in GKE 1.8.1 uses - to the journald driver to circumvent the problem? And then use a plugin like https://github.com/okushchenko/fluent-plugin-docker-journald-concat to recombine the partial messages?

@crassirostris
Author

@JorritSalverda

So you're thinking of switching from the current log driver json-file

No, that's not the case. The solution is TBD, but if you export your logs to Stackdriver (which has a limit of 100KB per entry) you won't see the difference

@JorritSalverda

I think there's still a difference, in that the 100KB Stackdriver logging limit is larger than the 16KB Docker limit at which the partial messaging starts, so without proper recombination of those chunked messages you lose the ability to send log lines larger than 16KB into Stackdriver.

However, we're also running our own fluentd that sends messages straight into Pub/Sub, which has a limit of 10MB per message. And although we're trying to keep the messages much smaller than 16KB, it would be nice to be able to send bigger messages without instrumenting our applications to push them into Pub/Sub themselves.

I hope you find a nice solution that will make this possible :)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Feb 7, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Mar 9, 2018
@crassirostris
Author

/remove-lifecycle rotten

Still relevant, hasn't been fixed

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Aug 18, 2018
@desaintmartin
Member

desaintmartin commented Aug 24, 2018

Would using the "concat" fluentd plugin be relevant?
https://github.com/fluent-plugins-nursery/fluent-plugin-concat
It seems to have explicit support for our specific use case.

Edit: for EFK, adding fluent-plugin-concat to the fluentd-es image (https://github.com/wiremind/kubernetes/tree/fluentd-concat/cluster/addons/fluentd-elasticsearch/fluentd-es-image, https://hub.docker.com/r/wiremind/fluentd-elasticsearch-concat/) and adding

<filter **>
  @type concat
  key log
  multiline_end_regexp /\n$/
</filter>

to the ConfigMap works well.

Would a PR with those changes be acceptable?

desaintmartin added a commit to wiremind/kubernetes that referenced this issue Aug 29, 2018
@liangjacky

Trying to solve this as well. Having the same issue as @abedwardsw, where long lines are being truncated and the MDC is gone.

desaintmartin added a commit to wiremind/kubernetes that referenced this issue Oct 16, 2018
…cker into several lines.

See kubernetes#52444.

Signed-off-by: Cédric de Saint Martin <cdesaintmartin@wiremind.fr>
vithati pushed a commit to vithati/kubernetes that referenced this issue Oct 25, 2018
…cker into several lines.

See kubernetes#52444.

Signed-off-by: Cédric de Saint Martin <cdesaintmartin@wiremind.fr>
@ju-la-berger

ju-la-berger commented Nov 5, 2018

Please see

The big problem here is concurrency. Just concatenating all partial log messages as they arrive may be wrong. Let's say your app logs two events, called a and b here. Then the Docker log messages may be written to the JSON log file (or whatever Docker logging driver is used, e.g. in plain Docker or Docker Swarm) in this order:

a (first 16 KB)
b (first 16 KB)
a (another 5 KB)
b (another 16 KB)
b (another 13 KB)

The fluentd concat plugin cannot decide what to do. Which records belong together?

The good news is: this can be solved (similarly to moby/moby#37889), as Docker logs contain all the needed information here:

https://github.com/moby/moby/blob/8e610b2b55bfd1bfa9436ab110d311f5e8a74dcb/api/types/backend/backend.go#L32

So you just have to extend these methods and the JSONLogs struct with the partial log metadata:

https://github.com/moby/moby/blob/8e610b2b55bfd1bfa9436ab110d311f5e8a74dcb/daemon/logger/jsonfilelog/jsonfilelog.go#L143
https://github.com/moby/moby/blob/8e610b2b55bfd1bfa9436ab110d311f5e8a74dcb/daemon/logger/jsonfilelog/jsonlog/jsonlogbytes.go#L22

Then you have to fix fluent-plugins-nursery/fluent-plugin-concat#59: Mind the ID and ordinal of the partial log metadata so that you concatenate the right records in the correct order.
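
To make that concrete, here is an editorial Go sketch (not moby or fluent-plugin-concat code) of consumer-side reassembly that groups chunks by the partial-metadata ID and only emits a message once the chunk flagged as last has arrived. The ID/Ordinal/Last field names mirror the PartialLogMetaData struct linked above; everything else is made up for illustration:

    package main

    import (
        "fmt"
        "sort"
    )

    // partialMeta mirrors the ID/Ordinal/Last fields of Docker's partial log
    // metadata; the surrounding types in this sketch are illustrative only.
    type partialMeta struct {
        ID      string
        Ordinal int
        Last    bool
    }

    type record struct {
        Meta partialMeta
        Log  string
    }

    // assembler buffers chunks per message ID and emits the concatenated
    // message once the chunk flagged Last has been seen.
    type assembler struct {
        pending map[string][]record
    }

    func newAssembler() *assembler {
        return &assembler{pending: map[string][]record{}}
    }

    // add returns the full message and true once all chunks of a message
    // have arrived; otherwise it just buffers the chunk.
    func (a *assembler) add(r record) (string, bool) {
        a.pending[r.Meta.ID] = append(a.pending[r.Meta.ID], r)
        if !r.Meta.Last {
            return "", false
        }
        chunks := a.pending[r.Meta.ID]
        delete(a.pending, r.Meta.ID)
        sort.Slice(chunks, func(i, j int) bool {
            return chunks[i].Meta.Ordinal < chunks[j].Meta.Ordinal
        })
        msg := ""
        for _, c := range chunks {
            msg += c.Log
        }
        return msg, true
    }

    func main() {
        a := newAssembler()
        // Interleaved chunks of two concurrent messages, as in the a/b example above.
        input := []record{
            {partialMeta{"a", 1, false}, "a-part1 "},
            {partialMeta{"b", 1, false}, "b-part1 "},
            {partialMeta{"a", 2, true}, "a-part2"},
            {partialMeta{"b", 2, false}, "b-part2 "},
            {partialMeta{"b", 3, true}, "b-part3"},
        }
        for _, r := range input {
            if msg, done := a.add(r); done {
                fmt.Printf("message %s: %q\n", r.Meta.ID, msg)
            }
        }
    }

With grouping keyed on the ID, the interleaved a/b example above reassembles correctly regardless of arrival order.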

@ju-la-berger

I just edited my above comment because I am not sure how the case of linebreaks in log messages (i.e. those written by the application) relates to this issue.

However, often you can control the logging of your application and let it write JSON to STDOUT (e.g. https://github.com/logstash/logstash-logback-encoder for Java / Spring Boot applications) so you do not have linebreaks.

Back to the original issue:

@desaintmartin: I have serious doubts that #68012 will hold up under load with a lot of concurrent partial messages. I observed this in a Docker Swarm setup with fluentd and the concat plugin configured analogously.

@ju-la-berger

@crassirostris Could you please reopen this issue?

Concurrency is the problem here: let your container simultaneously produce two log messages of >16 KB (let's say 20 KB each). Then fluentd will concatenate all four chunks into one large (16+16+4+4 KB = 40 KB) message. Fluentd (or whatever you use) needs to know the ID from the partial log metadata so that it concatenates only those chunks with identical IDs. So this needs to be implemented in the (default) local JSON file log driver in Docker (moby).

@desaintmartin
Member

The problem is that the ID only comes with Docker 18.06, if I'm not wrong, which is why I didn't use it: do we want to require a Docker version that is only 2 months old?

@coffeepac
Contributor

@crassirostris is no longer much involved with the project. If you would like to open this back up to alter the implementation, please file a new issue, @ju-la-berger.

@desaintmartin
Member

If you do so, please ping me as well.

@kahootali

@desaintmartin I have added the concat plugin

      <filter **.log>
          @type concat
          key log
          multiline_end_regexp /\n$/
      </filter>

but it is still not concatenating. Any idea?

@desaintmartin
Member

No idea why it does not work. For up-to-date information, I have inserted:

    # Concatenate multi-line logs (>=16KB)
    <filter kubernetes.var.log.containers.**>
      @type concat
      key log
      multiline_end_regexp /\n$/
      separator ""
    </filter>

Just after the kubernetes_metadata injection.
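
For context (editorial addition; the exact match patterns and surrounding config depend on your fluentd-es/fluentd-gcp setup), the placement described above looks roughly like this, with the concat filter directly after the kubernetes_metadata filter:

    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    # Concatenate log chunks that Docker split at 16KB
    <filter kubernetes.var.log.containers.**>
      @type concat
      key log
      multiline_end_regexp /\n$/
      separator ""
    </filter>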
