Upgrade to 20.10 breaks log tailing: unexpected EOF errors #41820

Closed
PandarinDev opened this issue Dec 18, 2020 · 57 comments · Fixed by #42104
Labels
area/logging kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. version/20.10

Comments

@PandarinDev

PandarinDev commented Dec 18, 2020

Description

Our E2E test suite uses testcontainers-java to run our microservices and their dependencies in containers for the duration of our tests. All of the tests were passing with 19.03.14 until we updated our build nodes to Docker 20.10.1, at which point we started repeatedly getting STDOUT: $Error grabbing logs: unexpected EOF errors, and our Wait.forLogMessage calls started timing out.

Note, however, that the issue seems sporadic, as about 1 out of 10 builds still passes. After we reverted to Docker 19.03.14 the issue went away. We're using the latest version of Testcontainers (1.15.1 at the time of reporting this issue).

After contacting the developers I was told this is most likely a Docker issue.

Steps to reproduce the issue:
Unfortunately our projects are not OSS, so I cannot give you the exact containers used. I'll try to describe a matching scenario.

  1. Set up a project that uses the latest version of testcontainers-java (1.15.1).
  2. Set up a Docker image that produces a lot of standard output.
  3. Start the image using testcontainers-java and wait for a log message using Wait.forLogMessage() (a rough CLI analog is sketched after this list).
  4. A large percentage of the time this results in the error message STDOUT: $Error grabbing logs: unexpected EOF and the wait policy never triggers, even though the expected messages are present in the log.
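
For anyone without a Testcontainers project handy, a rough CLI analog of the same flow is below; the image, container name, and READY marker are placeholders, not taken from our project:

# Placeholder chatty container; the READY marker stands in for the message the wait strategy looks for.
docker run -d --rm --name chatty ubuntu sh -c 'seq 10000000; echo READY; sleep 300'
# Follow the logs and block until the marker shows up (Testcontainers does roughly the same thing via the engine's log-follow API).
docker logs -f chatty 2>&1 | grep -m1 'READY'

When the bug triggers, the follow stream ends with error from daemon in stream: Error grabbing logs: unexpected EOF before the marker is ever seen.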

Describe the results you received:
Error message: STDOUT: $Error grabbing logs: unexpected EOF errors, wait policy for log message not triggering even though the message was present in the log.

Describe the results you expected:
No EOF error, wait policy for log message working.

Additional information you deem important (e.g. issue happens only occasionally):
As mentioned in the description, the issue is sporadic, but it happens in roughly 80% or more of our CI runs.

Output of docker version:
Previous version where the tests were passing:

Client: Docker Engine - Community
 Version:           19.03.14
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        5eb3275d40
 Built:             Tue Dec  1 19:20:26 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       5eb3275d40
  Built:            Tue Dec  1 19:18:53 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Current version where the issue is present:

Client:
 Version:           20.10.1
 API version:       1.41
 Go version:        go1.15.6
 Git commit:        831ebeae96
 Built:             Tue Dec 15 22:25:01 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.15.6
  Git commit:       f0014860c1
  Built:            Tue Dec 15 22:24:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b.m
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Previous version where the tests were passing:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 92
 Server Version: 19.03.14
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ea765aba0d05254012b0b9e595e995c09186427f
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-58-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 3
 Total Memory: 15.64GiB
 Name: build1
 ID: QWYR:EE5C:DWTE:MY65:MQ7Y:FERG:F7J2:IDTT:7LVR:VHSE:REFY:6AQH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Current version where the issue is present:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-tp-docker)

Server:
 Containers: 134
  Running: 8
  Paused: 0
  Stopped: 126
 Images: 1390
 Server Version: 20.10.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b.m
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.9.14-arch1-1
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.24GiB
 Name: RE02WL03
 ID: YC3H:JCWA:APFN:VOQ7:CWPG:B4VO:5FYW:UPVL:2H5B:PO4H:SXO7:LSGH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support

The current version and info parts were copied from an Arch box, while the previous version and info parts were copied from an Ubuntu box, but we could also reproduce the issue on Ubuntu after upgrading to 20.10.

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Could reproduce using both Ubuntu 20.10 and Arch Linux.
@cpuguy83 cpuguy83 added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. area/logging version/20.10 labels Dec 18, 2020
@tiborvass
Collaborator

Related docker/for-mac#5145

@sorinvisan89

I can also confirm this is happening. It worked fine before the Docker Engine upgrade from 19 to 20.

I have multiple containers (about 7 running); 3 of them produce generous output.

Usually, the logging for one or two containers fails with the EOF error during integration testing.

docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)
  scan: Docker Scan (Docker Inc., v0.5.0)

Server:
 Containers: 7
  Running: 7
  Paused: 0
  Stopped: 0
 Images: 20
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.121-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 4.825GiB
 Name: docker-desktop
 ID: QDG2:L34D:MF6P:RWBK:ULT5:CTNT:GNSN:UCNB:K2EU:U4GP:ZVE6:VIA7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The dockerd logs are filled with the following:

2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.781629156Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6765
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.781650830Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6766
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.781666688Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6767
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782293790Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6768
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782321359Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6769
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782337211Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6770
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782352359Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6771
2021-01-04T17:58:55Z dockerd time="2021-01-04T17:58:55.782382550Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=6772

Best regards,
Sorin

@lindycoder

lindycoder commented Jan 6, 2021

Hello!

This issue is also affecting me on docker-for-mac version 3.0.3, with a very verbose container.

$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)
  scan: Docker Scan (Docker Inc., v0.5.0)

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 206
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.121-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 1.942GiB
 Name: docker-desktop
 ID: HOTY:MO5N:FHBP:X5XL:ATD5:FECJ:TO4C:77OU:DAI2:KQJR:DATW:R4KO
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 41
  Goroutines: 46
  System Time: 2021-01-06T21:36:10.251766204Z
  EventsListeners: 3
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

I'm seeing the same error logs as the other reports in ~/Library/Containers/com.docker.docker/Data/log/vm/dockerd.log

2021-01-06T19:59:40Z dockerd time="2021-01-06T19:59:40.627461991Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=19995
2021-01-06T19:59:40Z dockerd time="2021-01-06T19:59:40.627484481Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=19996
2021-01-06T19:59:40Z dockerd time="2021-01-06T19:59:40.627534667Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=19997
2021-01-06T19:59:40Z dockerd time="2021-01-06T19:59:40.627550573Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=19998
2021-01-06T19:59:40Z dockerd time="2021-01-06T19:59:40.627568544Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=19999
2021-01-06T19:59:40Z dockerd time="2021-01-06T19:59:40.627916644Z" level=error msg="Error streaming logs: unexpected EOF" container=fb0dea1014da664d46c72b1cd3ced88fb7ecc3c7b2abf7f10fd0f43898b373bb method="(*Daemon).ContainerLogs" module=daemon
2021-01-06T19:59:40Z dockerd time="2021-01-06T19:59:40.628082325Z" level=debug msg="end logs (<nil>)" container=fb0dea1014da664d46c72b1cd3ced88fb7ecc3c7b2abf7f10fd0f43898b373bb method="(*Daemon).ContainerLogs" module=daemon

I can currently reproduce this consistently, so if there's extra information I can provide, just ask!

Thank you,
Martin

@cpuguy83
Member

cpuguy83 commented Jan 6, 2021

Any chance we can get a sample of the log file in question? This would be under /var/lib/docker/containers/<id>/<id>-json.log
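
If the container ID isn't handy, the exact path of that file can be printed with (a generic lookup, nothing specific to this issue):

docker inspect --format '{{.LogPath}}' <container-name-or-id>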

@lindycoder

lindycoder commented Jan 7, 2021

Good morning @cpuguy83

I tried to isolate a sample of the file where the error happens, but it's always a different part. I pulled the -json.log out and parsed every line as JSON, and they're all fine, which means the content being written to this file is probably not the cause.
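
For anyone who wants to repeat the same per-line check, a jq loop along these lines works (the <id> placeholders stay placeholders):

while IFS= read -r line; do
  printf '%s\n' "$line" | jq -e . > /dev/null || printf 'bad line: %s\n' "$line"
done < /var/lib/docker/containers/<id>/<id>-json.log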

My very wild guess is that there is a race condition between the writer and the reader of this file/stream, but that's just a guess.

I noticed in dockerd.log that I periodically get a single line like

2021-01-07T13:53:40Z dockerd time="2021-01-07T13:53:40.593422800Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=0

I can't really tell whether it's the same container that causes the problem, but this line shows up alone and doesn't cause any issue.

This leads me to believe that there is a resilient retry mechanism for handling the race condition, if that's what the problem is.

But sometimes it just goes into a loop and retries up to 20,000 times. The whole process takes ~2 seconds, during which the container still produces logs.

Here's the output of one failure from the docker logs -f:

2021-01-07 15:17:24.114 UTC [562] LOG:  duration: 0.207 ms  statement: SELECT table_a.setting AS table_a_setting, table_a.id AS table_a_id
error from daemon in stream: Error grabbing logs: unexpected EOF

And here are the corresponding lines in the -json.log:

{"log":"2021-01-07 15:17:24.114 UTC [562] LOG:  duration: 0.207 ms  statement: SELECT table_a.setting AS table_a_setting, table_a.id AS table_a_id \n","stream":"stderr","time":"2021-01-07T15:17:24.1158133Z"}
{"log":"\u0009FROM table_a JOIN apps ON table_a.id = apps.id \n","stream":"stderr","time":"2021-01-07T15:17:24.1158641Z"}

At the time of this failure, the -json.log file was about 35 MB.

@cpuguy83
Member

cpuguy83 commented Jan 7, 2021

Indeed we do handle the case where there can be a race between reading/writing a json entry.
It seems like we have messed up the byte stream somehow.

Working on getting more debugging in here (at least adding container ID to these error logs from the daemon).

Do you have log rotation or anything else enabled?
I have not been able to reproduce this trying in a few different ways.
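
For whoever is checking the rotation question: a container's effective log-driver options can be dumped with the command below; an empty Config map means no max-size/max-file rotation is configured.

docker inspect --format '{{json .HostConfig.LogConfig}}' <container-name-or-id>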

@cpuguy83
Member

cpuguy83 commented Jan 7, 2021

Also, is this while following logs or just tailing existing ones?

@lindycoder

@cpuguy83 I'm currently using docker-for-mac, following logs with docker logs --follow <name>. It's the default install without any special configuration, so the VM running the Docker server is pristine. The -json.log is not rotated.

It usually takes a couple of minutes to trigger. It only affects a single container that produces massive logging: a postgres container logging all the queries while running a big suite of tests, a couple of MBs of logs every test run.

I've managed to make it occur more often when there are more containers around with several docker logs --follow running. Our setup launches containers and follows each one, printing the output to a file. The stack I use to reproduce runs a dozen containers.

So that's a dozen docker logs --follow running concurrently, but the only container triggering the error is the postgres one, which produces the most logs, by far.

@SvenDowideit
Contributor

SvenDowideit commented Jan 13, 2021

I'm getting this very often - I'm running on Ubuntu - and it happens whenever I'm tailing (docker logs -f <caddy>) the logs of a very active proxy (both caddy and goproxies) that have lots of log lines

@cpuguy83
Member

@SvenDowideit Is there any chance you could provide an image that repros it?
I have been unable to reproduce it through a variety of tests.

@SvenDowideit
Contributor

yup

I'm using a more complicated version of:

	docker run -dit \
		-p 8081:8081 \
		--restart always \
		--name goproxy \
		--network cirri_proxy \
		--label "virtual.port=8081" \
		-v goproxy_cache:/go \
		goproxy/goproxy

docker logs -f goproxy

and then building something non-trivial with GOPROXY=http://localhost:8081

I don't think it happened the first time, while it was filling its cache, but after that I get it every time.

The extra complexity is that I don't have the -p part, and am instead using a caddy server hooked up to the cirri_network that has a reverse proxy config generated from the virtual.port label - no stinking port mapping for us :D

@wendigo

wendigo commented Jan 13, 2021

We've hit the same bug in https://github.com/trinodb/trino while running our product tests (Testcontainers-based). When Trino starts, it generates a lot of logs. During startup, 20% of runs fail with the EOF error.

It's pretty straightforward to reproduce it locally:

git clone https://github.com/trinodb/trino
cd trino
./mvnw clean install -DskipTests
./testing/trino-product-tests-launcher/bin/run-launcher test run --no-bind --environment  multinode-tls-kerberos -- -t io.trino.tests.cli.TestTrinoCli

@pidzama

pidzama commented Jan 21, 2021

I have the same issue when starting docker compose with ~10 Wildfly services, each generating a rather normal amount of logs during WAR/EAR deployment.

@instantlinux

I have a local gitlab instance running. The problem happened right away when I upgraded. That's a great way to generate many megabytes per hour of logs, for anyone who needs to reproduce this.

@pete-woods
Contributor

We are also seeing this issue in some integration tests that run against Docker and read container logs using the Go Docker API.

@pete-woods
Contributor

pete-woods commented Jan 27, 2021

Okay, I have come up with an easy way to reproduce it:

docker run -d --rm --name teh-logs ubuntu seq 10000000
docker logs -f teh-logs > logs.txt

In my case, I get the output:

% docker run -d --rm --name teh-logs ubuntu seq 10000000
14cabbd7f8515e5e0631a39c29535177bfaeba8c97e76269e5318f49c3b306e6
% docker logs -f teh-logs > logs.txt
error from daemon in stream: Error grabbing logs: unexpected EOF

@pete-woods
Contributor

I have found that running more than one copy of this at once (with -2, -3 appended to the container names) makes it reproduce more frequently.
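
Roughly, a loop like this (container names, file names, and count are arbitrary):

for i in 1 2 3; do
  docker run -d --rm --name "teh-logs-$i" ubuntu seq 10000000
  docker logs -f "teh-logs-$i" > "logs-$i.txt" 2>&1 &
done
wait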

@cpuguy83
Member

@pete-woods Anything special about the environment? I can't repro with this either :(

I'm assuming this is all happening in real envs and not Docker Desktop specifically?

@pete-woods
Contributor

pete-woods commented Jan 27, 2021

I have reproduced with this technique now on Docker for Mac, and in a Vagrant Ubuntu 18.04 VM.

It definitely took more poking in the VMs than the Mac to reproduce, though. Running two copies of the Ubuntu seq container and tailing the logs from both did it fairly reliably for me.

@pete-woods
Contributor

The other environment we're reproducing in is a GCP VM running Ubuntu 20.04 in our integration tests. (GCP is where CircleCI runs its VMs).

@instantlinux

My environment is a quad-core bare-metal Ubuntu 20.04 with 16GB RAM (half of it in avail state, and another 5GB in buff/cache state), running about 2 dozen containers. Load average usually about 0.80 - 1.50.

@aeriksson

aeriksson commented Feb 3, 2021

I wasn't able to reproduce using @pete-woods's approach either, but it always seems to happen in a few seconds if I run

docker run -it --name foo ubuntu sh -c "apt-get update && apt-get install -y nyancat && nyancat"

in one terminal window, and then

docker logs -f foo

in a separate terminal window.

For what it's worth, it also happens if I tail the logs over Docker's HTTP API.

Running Docker 20.10.2 on macOS on a MacBook Pro.
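
For the HTTP route, a request along these lines exercises the same follow path (the API version segment and container name are placeholders to adjust locally):

curl --no-buffer --unix-socket /var/run/docker.sock \
  "http://localhost/v1.41/containers/foo/logs?follow=true&stdout=true&stderr=true"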

@tobiasstadler

@cpuguy83 did you run docker logs -f? Do you use the json-file logging driver?

@instantlinux

Can you run a heavy mixed load on the worker and repeat the nyancat test?

@cclafferty

It breaks within 10 or so seconds for me as soon as you start requesting logs.

@PandarinDev
Author

+1 to @tobiasstadler's last comment: the issue is only present if you run docker logs -f on an already running container.

@cpuguy83
Member

cpuguy83 commented Mar 1, 2021

@tobiasstadler Yes and yes.

Looking at your suggested change.
There is something fishy here and this is related to the commit I had in mind.

@cpuguy83
Member

cpuguy83 commented Mar 1, 2021

@tobiasstadler

re: #41820 (comment)

Yes, I think that's the culprit.
I haven't been able to reproduce this with a real Docker client; however, I wrote a unit test to simulate the condition.

@cpuguy83
Member

cpuguy83 commented Mar 1, 2021

#42104 should take care of this, but will need to be backported after it is merged.

@thaJeztah thaJeztah moved this from Needs triage to High priority in 20.10.x bugs/regressions Mar 4, 2021
@emulanob

I don't know if my case contributes to the resolution at all, but all clients in our Nomad cluster are flooding the syslog with these error messages, just like @sorinvisan89 and @lindycoder reported. I'm still able to retrieve logs both via docker logs and nomad logs commands, though.

The issue was noticed because our Elasticsearch cluster alerted on low disk space.

For reference:

$ docker version
Client: Docker Engine - Community
 Version:           20.10.3
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        48d30b5
 Built:             Fri Jan 29 14:33:21 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.3
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       46229ca
  Built:            Fri Jan 29 14:31:32 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

20.10.x bugs/regressions automation moved this from High priority to Closed Mar 18, 2021
craig-willis added a commit to whole-tale/terraform_deployment that referenced this issue Mar 24, 2021
@anodyne-canvas

Thanks for the fix. When will this be included in a release?

@pete-woods
Contributor

I can't see it in the 20.10.6 milestone, where I think a lot of us would hope to see it.

@anodyne-canvas

It actually is in the 20.10.6 milestone:
https://github.com/moby/moby/milestone/93?closed=1
#42174

@rbreejen

rbreejen commented May 21, 2021

I am still encountering these errors while running the latest Docker version, 20.10.6. Can somebody else confirm?

How to reproduce (tested on Ubuntu 20.04.2 LTS & MacOS):
Terminal 1: docker run -it --name foo ubuntu sh -c "apt-get update && apt-get install -y nyancat && nyancat"
Terminal 2: docker logs -f foo (expect the same "You have nyaned for x seconds" screen as in terminal 1)
Terminal 3 (Ubuntu): journalctl -u docker -f
Terminal 3 (MacOS): tail -f ~/Library/Containers/com.docker.docker/Data/log/vm/dockerd.log
Terminal 3 (other OS): see https://stackoverflow.com/a/30970134/9103163 for how to view the logs of the Docker daemon process on your OS.

The error is showing up immediately.

@aleon1220

Docker Engine error, json-file: fix sporadic unexpected EOF errors

I spent some time researching and trying to find errors in the Docker logs. There is a bug in Docker Engine 20.10.5, which is the version I was running; the GitHub PR #42104 fixes it. I found the error while running:

$ journalctl -u docker.service

level=warning msg="got error while decoding json" error="unexpected EOF" retries=19999

The Docker log driver has a bug and won't handle some log formatting; it then floods the system's log with retry messages, which generates a very annoying CPU load.

The latest Docker Engine is 20.10.7. The issue has been fixed in the Docker Engine 20.10.6 release: https://docs.docker.com/engine/release-notes/#20106
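
To double-check which engine version a given host is actually running (and therefore whether it should already carry the fix):

docker version --format '{{.Server.Version}}'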

@emulanob

I checked a few of my instances at random and they are running version 20.10.6, where the bug should be fixed, but I still see lots of warning messages when journalctl -afu docker.service is executed:

level=warning msg="got error while decoding json" error="unexpected EOF" retries=X

Maybe there's something else causing it too?

@1it

1it commented Aug 19, 2021

I've upgraded the docker version on a test machine and still have the same warnings:

Aug 19 09:13:44 test-docker-0 dockerd[17267]: time="2021-08-19T09:13:44.285053897Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=0
Client: Docker Engine - Community
 Version:           v20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        aa949f2ad
 Built:             Thu Aug 12 15:10:34 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          v20.10.8
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       91dc595e96
  Built:            Thu Aug 12 15:10:34 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

@dragos-dgp

Hello, we upgraded the engine to the latest and we are still flooded with that error message @cpuguy83 :
[screenshot attached]

@gbolo
Copy link

gbolo commented May 2, 2022

I can also confirm this flooding. Using Server Version: 20.10.11. Also running nomad

@schemmerling

schemmerling commented May 31, 2022

ditto: I can also confirm this flooding. Using Server Version: 20.10.11. Also running nomad

@oren-nonamesecurity

Same here, docker 20.10.10 still flooding
[screenshot attached]
