Description
The issues seen in issue 35548 appear to have resurfaced in 18.3 and 18.4.
Steps to reproduce the issue:
1. Set up a swarm.
2. Break a node. To force this issue, we take rabbit offline, which causes containers to die and then restart as they try to reconnect.
3. Look for dead tasks in the overlay network.
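For step 3, a minimal sketch of how the entries attached to the ingress network could be listed. It assumes the usual shape of `docker network inspect` output (a JSON array whose first element has a `Containers` map keyed by container ID); the helper names `fetch_inspect_output` and `list_endpoints` are hypothetical and not part of Docker.

```python
import json
import subprocess


def fetch_inspect_output(network: str = "ingress") -> str:
    """Run `docker network inspect <network>` on this node and return its JSON text."""
    result = subprocess.run(
        ["docker", "network", "inspect", network],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def list_endpoints(inspect_output: str) -> list[tuple[str, str]]:
    """Return sorted (container_id, endpoint_name) pairs attached to the network."""
    networks = json.loads(inspect_output)  # the CLI prints a JSON array
    # "Containers" may be missing or null when nothing is attached
    containers = networks[0].get("Containers") or {}
    return sorted((cid, info.get("Name", "")) for cid, info in containers.items())
```

On a swarm node, `list_endpoints(fetch_inspect_output("ingress"))` gives the entries to check by hand against the tasks that should be running there.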
Describe the results you received:
The ingress network contains dead tasks.
Describe the results you expected:
No dead tasks in the overlay network.
Additional information you deem important (e.g. issue happens only occasionally):
We have tried Ubuntu 18 and RHEL, with no change.
We have been struggling with this issue on 19.03.5 for about a month.
We think it may have first occurred as a result of OOM on a manager node. We had a second recurrence of the issue before we could address the root cause of the OOM. We have since addressed that.
At the time of both incidents we were running 19 managers. We have since reduced that to 5.
The issue hasn't recurred, but it's difficult to prove whether either or both of our attempted mitigations made a difference, or whether our swarm is still a ticking time bomb.
As of now, there are services and tasks that are stuck in the ingress overlay network that we can't remove. These services and tasks are not visible in the list of services we get back from docker service ls, and are not visible as running containers if we run docker container ls on any node in the swarm. Attempting to remove them with docker network disconnect <containerid> results in an error message.
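A minimal sketch of how the stuck entries described here could be identified on a single node, by comparing the `Containers` map from `docker network inspect ingress` with the container IDs that node actually has (for example from `docker ps -aq --no-trunc`). The function name `find_stale_endpoints` is hypothetical. It assumes that keys which are not 64-character hex container IDs (such as the network's own `ingress-sbox` entry) are infrastructure and should be skipped, and that the inspect output and the ID list come from the same node.

```python
import json
import string

HEX_DIGITS = set(string.hexdigits)


def looks_like_container_id(key: str) -> bool:
    """True for a full 64-character hex container ID."""
    return len(key) == 64 and all(ch in HEX_DIGITS for ch in key)


def find_stale_endpoints(inspect_output: str, live_container_ids: set[str]) -> dict[str, str]:
    """Map container ID -> endpoint name for network entries with no live container."""
    networks = json.loads(inspect_output)
    containers = networks[0].get("Containers") or {}
    stale = {}
    for cid, info in containers.items():
        if not looks_like_container_id(cid):
            continue  # assumption: entries such as "ingress-sbox" are not task containers
        if cid not in live_container_ids:
            stale[cid] = info.get("Name", "")
    return stale
```

This only reports candidates; it does not remove anything. Note that `docker network disconnect` expects the network name before the container and accepts `--force`, though whether a forced disconnect clears entries in this state is not something this thread confirms.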
Does anyone know a way to manually clean up the dead tasks stuck in the overlay networks?
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
Hosts are AWS m4.xlarge instances.