
docker swarm leave results in Error response from daemon: context deadline exceeded #34140

Closed
rommik opened this issue Jul 17, 2017 · 14 comments

Comments

@rommik

rommik commented Jul 17, 2017

I'm trying to remove a Docker Swarm node, but I get
Error response from daemon: context deadline exceeded
--force results in the same error.

docker node ls shows the node's status as Down but its availability as Active.

I was able to remove the node by running
docker node rm node-name from a swarm manager.

I am able to docker-machine ssh into the node.

docker -v is Docker version 17.06.0-ce, build 02c1d87

`lsb_release -a` is

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 17.04
Release:        17.04
Codename:       zesty

What else can be done to troubleshoot this issue?

How can I manually force the node to be removed from the swarm? If I delete the /swarm folder in the Docker directory, will that do it?

UPDATE: After about 30 minutes, docker swarm leave -f worked. This is alarming; which logs should I look at to understand why this happened and what caused the delay?
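For reference, on a systemd-based install such as this Ubuntu host, the daemon logs around the failed leave can be pulled with something like this (a sketch, assuming the default docker.service unit):

```shell
# show the Docker daemon logs for the last hour
sudo journalctl -u docker.service --since "1 hour ago" --no-pager

# ask the daemon for its own view of the node's swarm state
docker info --format '{{.Swarm.LocalNodeState}}'
```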

@ganeshmuralidhar

This never worked for me. I am still facing the error. Any help here please?

@thaJeztah
Member

The context deadline exceeded message usually indicates that you lost quorum (there's no majority of managers available, and a majority is needed to make any change to the swarm cluster, including "removing nodes from the swarm" - see Raft consensus in swarm mode, and [Maintain the quorum of managers](https://docs.docker.com/engine/swarm/admin_guide/#maintain-the-quorum-of-managers)).
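Put differently, the majority requirement is just arithmetic: a cluster of n managers stays writable only while more than n/2 of them are reachable, so it tolerates the loss of floor((n - 1) / 2) managers. A quick sketch:

```shell
# Raft fault tolerance: n managers survive floor((n - 1) / 2) failures.
# Note that 1 and 2 managers both tolerate zero failures.
for n in 1 2 3 5 7; do
  echo "$n managers tolerate $(( (n - 1) / 2 )) failure(s)"
done
```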

  • If you're running a two-manager setup (which you should never do, as it brings no fault tolerance and actually doubles the probability that you lose quorum), both managers need to be available.
  • If the node you're trying to remove is a manager, it needs to be demoted first before you can remove it from the swarm.
  • docker swarm leave only affects the "local" state of the node you run it on (it destroys the local swarm state), but it does not remove the node from the swarm. Because of this, it does not change the number of managers the cluster expects to be present (the other managers still expect that node to be there), so docker node demote / docker node rm is still needed.
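Assuming at least one healthy manager is still reachable, the removal sequence described above would look like this (node-name is a placeholder):

```shell
# on a healthy manager: if the node to remove is a manager, demote it first
docker node demote node-name

# on the node being removed: drop its local swarm state
docker swarm leave

# back on the healthy manager: delete the node from the member list
docker node rm node-name      # add --force if the node is unreachable
```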

Please keep in mind that the GitHub issue tracker is not intended as a general support forum, but for reporting bugs and feature requests. For other types of questions, consider using one of the community support channels instead.

I'm closing this issue because this is a support question. If you suspect there's a bug at hand, and you can reproduce on a current version of Docker, please open an issue with the information that's requested in the issue template.

@ganeshmuralidhar

Restarting the Docker service and getting the time in sync between the manager host and the worker fixed this for me.

@thaJeztah
Member

Yes, it's important to prevent clock skew between nodes. While there is a built-in grace period, if the difference between nodes becomes too big, it's possible that (e.g.) certificates are marked "expired", resulting in problems communicating between nodes.
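On systemd-based nodes, the clock and NTP state can be checked per node with something like the following (the chronyc line only applies if chrony is the NTP client in use):

```shell
# is the system clock NTP-synchronized?
timedatectl status | grep -i synchronized

# with chrony: show the measured offset against the NTP sources
chronyc tracking | grep 'System time'
```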

@yunghoy

yunghoy commented Nov 6, 2017

My swarm cluster has lost its quorum.

The problem is that the only way to reconfigure the swarm cluster is to create a new one, and none of the nodes can leave the swarm:

$ docker swarm leave -f
Error response from daemon: context deadline exceeded

I turned off the Docker service and deleted the swarm directory.

@hushenbeg

hushenbeg commented May 15, 2018

I created a swarm by calling an init_swarm() method in Python. I passed a manager IP initially and the swarm was created. Afterwards, when I go to add a worker to that swarm, I get an error that the node is already part of a swarm, even though I am using the worker IP this time. Please help me solve this error. See the following code for more clarity:
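Since a node can belong to only one swarm at a time, a join attempt can be guarded by checking the node's local swarm state first; a shell sketch (the token and address placeholders stand in for the values printed by docker swarm init):

```shell
state=$(docker info --format '{{.Swarm.LocalNodeState}}')
if [ "$state" != "inactive" ]; then
  # this node still holds membership in an old swarm; drop it first
  docker swarm leave --force
fi
docker swarm join --token <worker-token> <manager-ip>:2377
```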

@hushenbeg

$ docker swarm init
Swarm initialized: current node (09qrryt5hp0cpfj5mhefd6gzm) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-01skg7jtkpsp8h5nww8t8z8lpu7ud3s2z2tpmj11jezth4ufzr-5onincnsiq18eyht7nt1syoqx 192.168.2.219:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
$ docker swarm join --token SWMTKN-1-01skg7jtkpsp8h5nww8t8z8lpu7ud3s2z2tpmj11jezth4ufzr-5onincnsiq18eyht7nt1syoqx 192.168.2.219:2377

Error response from daemon: This node is already part of a swarm. Use "docker swarm leave" to leave this swarm and join another one.

@askumar-dn

askumar-dn commented Jun 19, 2018

Restart the Docker service and re-attempt to leave the swarm (docker swarm leave -f). It should work.

@snth

snth commented Jul 25, 2019

None of the steps described above worked for me.

Removing /var/lib/docker/swarm didn't work for me either.

However, removing /var/lib/docker did fix the issue for me. Of course, that means you lose all locally stored images and have to download them again, but at least I regained the ability to use the node.

@HueyLiu

HueyLiu commented Sep 26, 2019

This works for me:

  • sudo service docker stop
  • sudo rm -rf /var/lib/docker/swarm
  • sudo service docker start

@rashidk08

If you get the error "Error response from daemon: context deadline exceeded", simply restart Docker on the manager node:
sudo service docker restart
then
docker swarm leave or docker swarm leave -f
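As a one-shot version of that recipe (assumes systemd; the sleep gives the daemon time to come back up before the retry):

```shell
sudo systemctl restart docker
sleep 10
# try a clean leave first, then fall back to forcing it
docker swarm leave || docker swarm leave --force
```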

@saurabhcdt

saurabhcdt commented Nov 10, 2020

docker -v
Docker version 19.03.1, build 74b1e89

I have a cluster with 3 manager nodes. We were doing an OS patching activity to upgrade CentOS from version 7.6 to 7.8.
For 2 of the nodes it went well. For the last node, after the VM was restarted, the node was Reachable but its status was Down.

Now it's not getting in sync with the other 2. I tried restarting the Docker service, but to no avail.
I tried demoting this node from another manager node, but even then docker swarm leave --force does not remove the node from the cluster; it gives the error below:

docker swarm leave --force
Error response from daemon: context deadline exceeded

Then I removed the problem node using docker node rm, but docker swarm leave --force still gave the same error.

Updating this post with the answer:
Docker 19.03.1 has a bug where a file named "tasks.db" grows large and is never purged. This is fixed in later versions, but in this specific version it keeps growing unless it's purged/removed manually. In my case it was about 15 GB, and hence swarm was taking a long time to load and start the swarm services. Because of this, we thought swarm was not coming up; had we waited longer, say 30-60 minutes, it might have come up.
So, in my case, the sequence below solved the issue:

  1. Stop the Docker service.
  2. Delete tasks.db from the /var/lib/docker/swarm/worker directory.
  3. Start the Docker service.
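Under that diagnosis, the three steps sketched in shell (the tasks.db path is as reported above; check its size before deleting anything):

```shell
sudo systemctl stop docker
# confirm tasks.db really has grown abnormally large
ls -lh /var/lib/docker/swarm/worker/tasks.db
sudo rm /var/lib/docker/swarm/worker/tasks.db
sudo systemctl start docker
```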

As I was trying to remove the node from swarm, I had to add it to swarm again.

@eyonys

eyonys commented Oct 7, 2021

This works for me:

  • sudo service docker stop
  • sudo rm -rf /var/lib/docker/swarm
  • sudo service docker start

Saved my life 😓

@SMUEric1127

SMUEric1127 commented Apr 11, 2024

sudo systemctl restart docker

works for me too
