GELF (Graylog Extended Format) driver does not handle partial log lines or log metadata #43162

smullenrga · 2022-01-18T18:37:15Z

GELF logging driver does not handle log metadata and therefore does not handle docker partial log line reassembly or passing of the metadata to a downstream processor.

Steps to reproduce the issue:

Enable the GELF driver in your daemon.json file.
In a docker container, generate a STDOUT line greater that 16k
Review logs generated through GELF and notice they are missing metadata and are not reassembled.

Describe the results you received:
GELF generated logs from lines greater than 16K are missing the required metadata for proper reassembly and they are improperly concatenated onto each other. This issue was found running Ansible AWX 19.X in a kubernetes environment configured to use GELF. When an ansible command generates an output line greater than 16k, Docker splits the line at 16k and the GELF driver messes up the output data and thereby breaks the AWX GUI because the log lines are no longer parsable JSON.

Describe the results you expected:
Expect to see either the full 16K+ log line reassembled properly or a log message containing the partial log line metadata and the partial log message.

Additional information you deem important (e.g. issue happens only occasionally):
Looking at the source code in daemon/logger, jsonfilelog appears to have been modified to handle partial log lines by concatenating log lines until PLogMetaData.last is true (or PLogMetaData is nil).

Fluentd appears to have been updated to read metadata from the PLogMetaData and post it along with the message content.

The GELF driver seems to have no processing of PLogMetaData at all and therefor passes the message through without concatenating and without any metadata needed to reassemble the messages.

Output of docker version:

sudo docker version
Client: Mirantis Container Runtime
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        40ef3b6
 Built:             Thu Aug 19 18:59:41 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Mirantis Container Runtime (this node is not a swarm manager - check license status on a manager node)
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       e1bf5b9
  Built:            Thu Aug 19 18:57:54 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.5
  GitCommit:        72cec4be58a9eb6b2910f5d10f1c01ca47d231c0
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

sudo docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.0-zeta1-96-g44932b629b)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-tp-docker)
  cluster: Manage Mirantis Container Cloud clusters (Mirantis Inc., v1.9.0)
  containercloud: Manage Docker Enterprise Container Cloud (Mirantis Inc., v0.1.0-beta1)
  registry: Manage Docker registries (Docker Inc., 0.1.0)

Server:
 Containers: 33
  Running: 28
  Paused: 0
  Stopped: 5
 Images: 18
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: gelf
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: <REDACTED>
  Is Manager: false
  Node Address: <REDACTED>
  Manager Addresses:
   <REDACTED>
   <REDACTED>
   <REDACTED>
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 72cec4be58a9eb6b2910f5d10f1c01ca47d231c0
 runc version: v1.0.1-0-g4144b63
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1160.49.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 62.74GiB
 Name: <REDACTED>
 ID: <REDACTED>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
  com.docker.security.seccomp=enabled
 Experimental: false
 Insecure Registries:
  <REDACTED>
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

The text was updated successfully, but these errors were encountered:

thaJeztah · 2022-01-19T15:01:17Z

Thanks for opening this ticket; it looks indeed that #35831 only updated some of the logging drivers, but other logging drivers do not (yet) take advantage of the additional metadata.

I'm not super-familiar with the Gelf specs, but a quick look at the docs shows that there's some support for partial messages, mentioned in the "chunking" section; https://docs.graylog.org/docs/gelf

(...)
Prepend the following structure to your GELF message to make it chunked:

Chunked GELF magic bytes - 2 bytes: 0x1e 0x0f

Message ID - 8 bytes: Must be the same for every chunk of this message. Identifying the whole message and is used to reassemble the chunks later. Generate from millisecond timestamp + hostname for example.

Sequence number - 1 byte: The sequence number of this chunk. Starting at 0 and always less than the sequence count.

Sequence count - 1 byte: Total number of chunks this message has.

All chunks MUST arrive within 5 seconds or the server will discard all already arrived and still arriving chunks. A message MUST NOT consist of more than 128 chunks.

But I'm not sure if that's a viable solution;

it's only supported when using UDP (not when using TCP)
the prefix must contain the total number of chunks (Sequence count); the docker daemon won't be able to provide this information, as it splits the logs once it reaches the maximum log-entry size, at which point it does not yet know how much more logs will arrive. There's no real solution to that, other than buffering the whole log output before splitting, but this would defeat the reason for splitting (prevent a DOS due to memory exhaustion, which can easily happen if a container prints large amounts of data to stdout/stderr).

So to support GELF "native" chunking, I think the only solution would be to have support for this in the GELF spec (use a combination of "start of chunk" and "end of chunk" prefixes, without requiring the total number to be passed).

Alternatives

An alternative would be to include the metadata as attributes on log messages that are sent to GELF (using _[additional field]s), this would preserve the information that a message was split, but would require manually reconstructing the log-messages based on that information (so not ideal either).

smullenrga · 2022-01-19T15:23:37Z

I would propose that another possible though not ideal solution would be to modify the GELF driver to work like the jsonfilelog code where the docker log driver reassembles the message. The fluentd driver passes on the partial message with metadata. The jsonfilelog driver appears to append lines until until PLogMetaData.last is true (or PLogMetaData is nil) and then sends the message buffer (unless I'm reading that code incorrectly).

While ideally, the log driver would simply pass on the partial message and data needed to reassemble, that does not seem to be possible for some drivers where partial metadata as Docker provides is not supported. Reassembling lines as jsonfilelog does would at least provide functional processing without the requirement that the protocol supports partial messages in Docker format.

As things stand, the GELF driver produces breaking output for any systems relying upon log data where an output line exceeds 16k. Ansible AWX is a solid example of this currently. If Docker is using GELF and running AWX with Kubernetes in the environment, STDOUT and STDERR gathered by AWX controllers from the AWX Execution Environments becomes gibberish and breaks AWX. (I understand this is not necessarily "Docker's problem", I simply provided my current use case where this was found as an example where the 16k limit is directly breaking products due to lack of support).

thaJeztah · 2022-01-19T15:49:25Z

The reassembling for the json-file logging driver happens when reading the logs (docker logs); I don't think the GELF logging driver has "read" support (so the daemon only sends the messages to GELF). When using docker 20.10, it should also write to a local buffer (which uses the local logging driver) to provide read support (through docker logs); https://docs.docker.com/config/containers/logging/dual-logging/.

thaJeztah added area/logging kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. labels Jan 19, 2022

AmyChik mentioned this issue May 14, 2022

deleted #43600

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GELF (Graylog Extended Format) driver does not handle partial log lines or log metadata #43162

GELF (Graylog Extended Format) driver does not handle partial log lines or log metadata #43162

smullenrga commented Jan 18, 2022

thaJeztah commented Jan 19, 2022

smullenrga commented Jan 19, 2022

thaJeztah commented Jan 19, 2022

GELF (Graylog Extended Format) driver does not handle partial log lines or log metadata #43162

GELF (Graylog Extended Format) driver does not handle partial log lines or log metadata #43162

Comments

smullenrga commented Jan 18, 2022

thaJeztah commented Jan 19, 2022

Alternatives

smullenrga commented Jan 19, 2022

thaJeztah commented Jan 19, 2022