Description
The Docker node can't participate in the swarm because of gRPC message size errors, which started after a large number of task failures caused by a failing network card. I am not able to delete the node or manage the runaway tasks' history. There seems to be no way to recover from this situation.
Steps to reproduce the issue:
1. Have a working swarm node.
2. A large number of task instances fail on the same node.
3. The gRPC message is now too large.
Describe the results you received:
The gRPC message is too large. In older versions this makes most of Docker unusable; in newer versions it makes the node unable to participate in the swarm.
Describe the results you expected:
The Docker node keeps working normally after a large number of task failures, or there is a way to manage the leftover task instances so as to shrink the 'grpc message' and fix the problem.
Additional information you deem important (e.g. issue happens only occasionally):
I encountered the issue where a large number of tasks failed and 'docker service ls' stopped working with "grpc: received message larger than max (x vs. 4194304)". I fixed that by upgrading Docker. Meanwhile, a node could not seem to do anything during that time, and it is now repeatedly throwing "level=error msg="agent: session failed" backoff=100ms error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (6057386 vs. 4194304)"" while trying to participate in the swarm.
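For reference, 4194304 bytes is exactly 4 MiB (4 × 1024 × 1024), which is grpc-go's default maximum receive message size, so the node is rejecting a message of roughly 5.78 MiB. The Go sketch below pulls both sizes out of such an error line and compares them. `parseGRPCSizes` and `grpcDefaultMaxRecv` are hypothetical names used only for illustration, and the regular expression assumes the exact wording seen in the log above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// grpcDefaultMaxRecv is grpc-go's default 4 MiB receive limit, which matches
// the 4194304 reported in the log. (Hypothetical constant name.)
const grpcDefaultMaxRecv = 4 * 1024 * 1024

// sizeRe captures the "(received vs. limit)" pair from a grpc-go
// "received message larger than max" error string (assumed wording).
var sizeRe = regexp.MustCompile(`received message larger than max \((\d+) vs\. (\d+)\)`)

// parseGRPCSizes is a hypothetical helper that returns the received size and
// the limit, both in bytes, or ok=false if the message does not match.
func parseGRPCSizes(msg string) (received, limit int, ok bool) {
	m := sizeRe.FindStringSubmatch(msg)
	if m == nil {
		return 0, 0, false
	}
	r, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, 0, false
	}
	l, err := strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return r, l, true
}

func main() {
	logLine := `rpc error: code = ResourceExhausted desc = grpc: received message larger than max (6057386 vs. 4194304)`
	got, limit, ok := parseGRPCSizes(logLine)
	if !ok {
		fmt.Println("no size information found")
		return
	}
	const mib = 1 << 20
	fmt.Printf("received %d bytes (%.2f MiB), limit %d bytes (%.2f MiB)\n",
		got, float64(got)/mib, limit, float64(limit)/mib)
	fmt.Println("limit equals grpc-go default:", limit == grpcDefaultMaxRecv)
	fmt.Printf("over the limit by %d bytes\n", got-limit)
}
```

The point of the arithmetic is that the payload is about 44% over the default, so it would need to shrink by roughly 1.8 MB (for example by dropping old task records) before it fits again.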
Attempting to delete the node gives "Error response from daemon: rpc error: code = Unknown desc = raft: raft message is too large and can't be send"
I tried "docker swarm update --task-history-limit 0" to clear the task instance history, but that didn't work.
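The task history limit roughly controls how many finished tasks the managers keep around for each replica slot. The Go sketch below is a hypothetical, heavily simplified model of such a retention policy, written only to illustrate why lowering the limit is expected to shrink the task list. The `task` type and `pruneHistory` function are invented for illustration; this is not swarmkit's actual task reaper, and it does not explain why the command had no effect here.

```go
package main

import (
	"fmt"
	"sort"
)

// task is a hypothetical, simplified stand-in for a swarm task record.
type task struct {
	ID      string
	Slot    int  // replica slot the task belongs to
	Version int  // higher means newer
	Running bool // desired state is still "running"
}

// pruneHistory keeps every running task plus at most `limit` of the newest
// finished tasks per slot, and returns the (sorted) IDs that would be deleted.
// Assumption: a negative limit means "keep everything".
func pruneHistory(tasks []task, limit int) []string {
	if limit < 0 {
		return nil
	}
	// Group only the finished tasks by slot.
	bySlot := map[int][]task{}
	for _, t := range tasks {
		if !t.Running {
			bySlot[t.Slot] = append(bySlot[t.Slot], t)
		}
	}
	var doomed []string
	for _, dead := range bySlot {
		// Newest first, so dead[limit:] are the oldest surplus records.
		sort.Slice(dead, func(i, j int) bool { return dead[i].Version > dead[j].Version })
		if len(dead) > limit {
			for _, t := range dead[limit:] {
				doomed = append(doomed, t.ID)
			}
		}
	}
	sort.Strings(doomed) // map iteration order is random; make output stable
	return doomed
}

func main() {
	tasks := []task{
		{ID: "t1", Slot: 1, Version: 1},
		{ID: "t2", Slot: 1, Version: 2},
		{ID: "t3", Slot: 1, Version: 3},
		{ID: "t4", Slot: 1, Version: 4, Running: true},
		{ID: "t5", Slot: 2, Version: 1},
	}
	fmt.Println(pruneHistory(tasks, 1)) // prints [t1 t2]
}
```

With a limit of 1, slot 1 keeps only its newest finished task (t3) alongside the running one, and slot 2 keeps its single finished task.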
Output of docker version:
Master:
Client:
Version: 18.09.5
API version: 1.39
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:44:24 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.5
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:10:53 2019
OS/Arch: linux/amd64
Experimental: false
Node:
Client:
Version: 18.09.5-ce
API version: 1.39
Go version: go1.12.3
Git commit: e8ff056dbc
Built: Fri Apr 12 08:22:13 2019
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.09.5-ce
API version: 1.39 (minimum version 1.12)
Go version: go1.12.3
Git commit: e8ff056dbc
Built: Fri Apr 12 08:21:24 2019
OS/Arch: linux/amd64
Experimental: false
VynDragon changed the title from "swarm node cant stay in swarm due to grpc size limit" to "swarm node cant participate in swarm due to grpc size limit" on Apr 30, 2019.
Additional environment details (AWS, VirtualBox, physical, etc.):
All physical: the master runs Ubuntu 16.04 and the node runs an up-to-date Arch Linux.