Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Engine Spews GRPC Errors on Fedora 34 (FedoraCoreOS) #42285

Open
fifofonix opened this issue Apr 12, 2021 · 3 comments
Open

Docker Engine Spews GRPC Errors on Fedora 34 (FedoraCoreOS) #42285

fifofonix opened this issue Apr 12, 2021 · 3 comments

Comments

@fifofonix
Copy link

Description

Newly formed docker swarm of FedoraCoreOS (34.20210328.1.1 - Fedora 34 based) running Docker Engine 20.10.5 spews journald GRPC errors as soon as any service is created. This error occurs whether running cgroups v1 or v2. Info below from v2 example.

Steps to reproduce the issue:

  1. Create new 2-node docker swarm (docker swarm init / docker swarm join...)
  2. Create noddy service docker service create nginx

Describe the results you received:

Docker service is running fine docker service ls. Docker unit is spewing errors continuously to journal. Extract below from journald (all units) with manual edit showing point of occurrence:

Apr 12 13:07:18 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:18.078662501Z" level=info msg="Node 66cb2d7b5a7a/172.25.80.44, joined gossip cluster"
Apr 12 13:07:18 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:18.079448145Z" level=info msg="Node 66cb2d7b5a7a/172.25.80.44, added to nodes list"
...
Above is when swarm formed...below occurs when docker service create is executed...
...
Apr 12 13:07:39 l-pdm-a1-1 systemd[1692]: var-lib-docker-overlay2-1d91a5b15e3e2c5520cc76feef0947a6e580cc65c7fce708ec1690c087e5b3d6\x2dinit-merged.mount: Deact
ivated successfully.
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.540893306Z" level=error msg="[core] grpc: server failed to encode response:  rpc error: co
de = Internal desc = grpc: error while marshaling: proto: Marshal called with nil" module=grpc
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.541510912Z" level=error msg="closing session after fatal error" error="rpc error: code = I
nternal desc = grpc: error while marshaling: proto: Marshal called with nil" module=node/agent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.541684913Z" level=error msg="status reporter failed to report status to agent" error="rpc
error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil" module=node/agent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.541841435Z" level=error msg="agent: session failed" backoff=100ms error="rpc error: code =
 Internal desc = grpc: error while marshaling: proto: Marshal called with nil" module=node/agent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.541968423Z" level=info msg="[core] Channel Connectivity change to SHUTDOWN" module=grpc
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.542070025Z" level=info msg="[core] Subchannel Connectivity change to SHUTDOWN" module=grpc
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.542193391Z" level=info msg="manager selected by agent for new session: { }" module=node/ag
ent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.542320089Z" level=info msg="waiting 50.121161ms before registering session" module=node/ag
ent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 systemd[1692]: var-lib-docker-overlay2-1d91a5b15e3e2c5520cc76feef0947a6e580cc65c7fce708ec1690c087e5b3d6-merged.mount: Deactivated s
uccessfully.
Apr 12 13:07:39 l-pdm-a1-1 NetworkManager[929]: <info>  [1618232859.5617] manager: (vethf99b46e): new Veth device (/org/freedesktop/NetworkManager/Devices/29)
Apr 12 13:07:39 l-pdm-a1-1 NetworkManager[929]: <info>  [1618232859.5639] manager: (vethd802193): new Veth device (/org/freedesktop/NetworkManager/Devices/30)
Apr 12 13:07:39 l-pdm-a1-1 systemd-udevd[2857]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 12 13:07:39 l-pdm-a1-1 systemd-udevd[2857]: Using default interface naming scheme 'v247'.
Apr 12 13:07:39 l-pdm-a1-1 kernel: docker0: port 1(vethd802193) entered blocking state
Apr 12 13:07:39 l-pdm-a1-1 kernel: docker0: port 1(vethd802193) entered disabled state
Apr 12 13:07:39 l-pdm-a1-1 kernel: device vethd802193 entered promiscuous mode
Apr 12 13:07:39 l-pdm-a1-1 kernel: audit: type=1700 audit(1618232859.574:566): dev=vethd802193 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Apr 12 13:07:39 l-pdm-a1-1 audit: ANOM_PROMISCUOUS dev=vethd802193 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Apr 12 13:07:39 l-pdm-a1-1 systemd-udevd[2855]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 12 13:07:39 l-pdm-a1-1 systemd-udevd[2855]: Using default interface naming scheme 'v247'.
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.630753162Z" level=error msg="[core] grpc: server failed to encode response:  rpc error: co
de = Internal desc = grpc: error while marshaling: proto: Marshal called with nil" module=grpc
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.632322582Z" level=error msg="closing session after fatal error" error="rpc error: code = I
nternal desc = grpc: error while marshaling: proto: Marshal called with nil" module=node/agent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.632477062Z" level=error msg="status reporter failed to report status to agent" error="rpc
error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil" module=node/agent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.637585037Z" level=error msg="agent: session failed" backoff=100ms error="rpc error: code =
 Internal desc = grpc: error while marshaling: proto: Marshal called with nil" module=node/agent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.637749154Z" level=info msg="manager selected by agent for new session: { }" module=node/ag
ent node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 dockerd[2582]: time="2021-04-12T13:07:39.637872633Z" level=info msg="waiting 71.53823ms before registering session" module=node/age
nt node.id=qkymec22jd5p2y7kb8ojw4kqw
Apr 12 13:07:39 l-pdm-a1-1 audit[2582]: SYSCALL arch=c000003e syscall=44 success=yes exit=40 a0=12 a1=c001be3fb0 a2=28 a3=0 items=0 ppid=1 pid=2582 auid=42949
67295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" subj=system_u:system_r:container
_runtime_t:s0 key=(null)
Apr 12 13:07:39 l-pdm-a1-1 kernel: audit: type=1300 audit(1618232859.574:566): arch=c000003e syscall=44 success=yes exit=40 a0=12 a1=c001be3fb0 a2=28 a3=0 ite
ms=0 ppid=1 pid=2582 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="dockerd" exe="/usr/bin/dockerd" s
ubj=system_u:system_r:container_runtime_t:s0 key=(null)

Describe the results you expected:

Docker service running fine. No errors in the log just like I get in FedoraCoreOS v33 based swarms.

Output of docker version:

Client:
 Version:           20.10.5
 API version:       1.41
 Go version:        go1.16
 Git commit:        55c4c88
 Built:             Sun Mar 14 22:41:08 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.5
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16
  Git commit:       363e9a8
  Built:            2021-03-14 22:40:26.546445533+00:00
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.0~beta.3
  GitCommit:
 runc:
  Version:          1.0.0-rc93
  GitCommit:        2d1a5059279447243892c75b7c5c726065d8e4c9
 docker-init:
  Version:          0.19.0
  GitCommit:

Output of docker info:

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 20.10.5
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: yde0lkwowy6oswxy8khia8axc
  Is Manager: true
  ClusterID: sam7qg925zscs7jps6wz8v8ne
  Managers: 1
  Nodes: 2
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.229.51
  Manager Addresses:
   192.168.229.51:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: /usr/libexec/docker/docker-init
 containerd version:
 runc version: 2d1a5059279447243892c75b7c5c726065d8e4c9
 init version:
 Security Options:
  seccomp
   Profile: default
  selinux
  cgroupns
 Kernel Version: 5.11.10-300.fc34.x86_64
 Operating System: Fedora CoreOS 34.20210328.1.1
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.93GiB
 Name: l-pdm-c1-1
 ID: OAK3:PTGW:VMXY:UQD3:UGML:O45L:VHTF:YP5U:YZ5U:N77J:MR5A:MVYL
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: Support for cgroup v2 is experimental

Additional environment details (AWS, VirtualBox, physical, etc.):

Have reproduced the error on vSphere, vFusion, AWS running with either cgroups v1 or v2.

@thaJeztah
Copy link
Member

This looks like a build of docker that was built by the CoreOS packagers, and is using at least a different version of Go, and containerd; is it possible it was built with different versions of build-time go dependencies as well? For example, containerd requires specific versions of gRPC, as there are incompatible changes in other versions; https://github.com/containerd/containerd/blob/v1.5.0-rc.1/go.mod#L69-L76

@fifofonix
Copy link
Author

Thanks. Makes sense. I'm following up with FedoraCoreOS team in linked issue.

@thaJeztah
Copy link
Member

Thanks! Keep us updated! (mostly trying to avoid diving deep into issues that are related to / specific to packaging)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants