All published services within a docker swarm are unreachable while containers deployed normally work fine. #41825

Closed
TheRealAlexV opened this issue Dec 19, 2020 · 10 comments


@TheRealAlexV

I've run into an issue that seems similar to this one: https://forums.docker.com/t/cant-access-service-in-swarm/63876. My setup is a little different, though, and I haven't found a solution to my problem yet.

The minimal, reproducible example

  1. Build a swarm cluster with at least 3 Ubuntu 20.04 Docker swarm managers.

  2. Deploy a service: docker service create --name test_web --replicas 3 --publish published=8080,target=80 nginxdemos/hello

  3. Check that the containers and services were created properly, and observe that connecting to the service fails:

demi-ubu01:~/stacks$ docker ps

CONTAINER ID   IMAGE                     COMMAND                  CREATED              STATUS              PORTS     NAMES
d4a12a3c5448   nginxdemos/hello:latest   "nginx -g 'daemon of…"   About a minute ago   Up About a minute   80/tcp    test_web.2.yul33wdycarig3qoxnehgrjrz
demi-ubu01:~/stacks$ docker service ls

ID             NAME      MODE         REPLICAS   IMAGE                     PORTS
0yqd7gvggwuh   test_web  replicated   3/3        nginxdemos/hello:latest   *:8080->80/tcp
# External test:
demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080     
curl: (7) Failed to connect to 10.100.4.5 port 8080: Connection refused

# Inside container to published service port:
demi-ubu01:~/stacks$ docker exec -it d4a12a3c5448 wget http://test_web:8080
Connecting to test_web:8080 (10.0.4.2:8080)
wget: can't connect to remote host (10.0.4.2): Host is unreachable

# Inside container to app's exposed port:
demi-ubu01:~/stacks$ docker exec -it d4a12a3c5448 wget http://localhost:80
Connecting to localhost:80 (127.0.0.1:80)
index.html    100% |****************************|  7217   0:00:00 ETA

The first curl command is expected to return HTTP 200 OK.

The detailed report

My setup is 4 nodes in total. They are identical Ubuntu 20.04 KVM virtual machines, all on the same network, with no firewalls between them. I have 3 managers and 1 worker (which I only added as a troubleshooting step).

:~/stacks$ docker node ls 
ID                            HOSTNAME     STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
kcm5v64psntjxngnqkfdj1jzh *   demi-ubu01   Ready     Active         Reachable        20.10.1
uo3rljg6ax5qkjm898pyym9t1     demi-ubu02   Ready     Active         Leader           20.10.1
pysnl8sohdp4fv67gui156z4k     demi-ubu03   Ready     Active         Reachable        20.10.1
rp2otsqpnxkgbmxbpkv21yjs6     demi-ubu04   Ready     Active                          20.10.1

I can run a container normally and reach it on the local host fine.

demi-ubu01:~/stacks$ docker run -p 8080:80 -d nginxdemos/hello
de4d0a937710acb1d6d8ae3b7eb9175860b6614dfd9ce92bc972efe619ae095f

demi-ubu01:~/stacks$ docker ps
CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS         PORTS                  NAMES
de4d0a937710   nginxdemos/hello   "nginx -g 'daemon of…"   4 seconds ago   Up 2 seconds   0.0.0.0:8080->80/tcp   pedantic_wiles

demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080
HTTP/1.1 200 OK
Server: nginx/1.13.8
Date: Sat, 19 Dec 2020 17:59:23 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Sat, 19 Dec 2020 17:59:22 GMT
Cache-Control: no-cache

However, I then deployed the same app as a service using the following compose file:

demi-ubu01:~/stacks$ cat test.yml 
version: "3.6"

services:
  web:
    image: nginxdemos/hello:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    ports:
      - target: 80
        published: 8080
        protocol: tcp
        mode: ingress
    networks:
      - webnet

networks:
  webnet:
    driver: overlay

It does not become reachable from any of the hosts at all:

demi-ubu01:~/stacks$ docker stack deploy -c test.yml test
Creating network test_webnet
Creating service test_web

demi-ubu01:~/stacks$ docker ps
CONTAINER ID   IMAGE                     COMMAND                  CREATED          STATUS         PORTS     NAMES
05030ef897a1   nginxdemos/hello:latest   "nginx -g 'daemon of…"   10 seconds ago   Up 7 seconds   80/tcp    test_web.1.kobrpkp68f2qbs4jhd6o8aebg

# Trying on all of the hosts in the cluster. No firewalls here.

demi-ubu01:~/stacks$ curl -I 10.100.4.5:8080
curl: (7) Failed to connect to 10.100.4.5 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.9:8080
curl: (7) Failed to connect to 10.100.4.9 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.10:8080
curl: (7) Failed to connect to 10.100.4.10 port 8080: Connection refused
demi-ubu01:~/stacks$ curl -I 10.100.4.11:8080
curl: (7) Failed to connect to 10.100.4.11 port 8080: Connection refused

demi-ubu01:~/stacks$ docker service ls
ID             NAME       MODE         REPLICAS   IMAGE                     PORTS
elvfm7o4v4zo   test_web   replicated   3/3        nginxdemos/hello:latest   *:8080->80/tcp
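
The per-node checks above can be scripted so that every node is probed the same way. This is a minimal sketch, assuming the four node addresses from this report. `probe_nodes` and the `PROBE` override are hypothetical names introduced here so the loop can be dry-run without a network; they are not part of Docker or curl.

```shell
# Probe one published port on each given node and print one status line per node.
# PROBE is a hypothetical override hook; when it is unset, curl sends a HEAD
# request with a 3-second connect timeout and discards the response.
probe_nodes() {
    port=$1
    shift
    for ip in "$@"; do
        if ${PROBE:-curl -s -o /dev/null -I --connect-timeout 3} "http://$ip:$port"; then
            echo "$ip:$port reachable"
        else
            echo "$ip:$port unreachable"
        fi
    done
}
```

For example, `probe_nodes 8080 10.100.4.5 10.100.4.9 10.100.4.10 10.100.4.11` prints one line per node. With the routing mesh working, every node should answer on the published port, including nodes that run no task for the service.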

I also don't see any port bindings being made on those hosts at all, so it doesn't look like any ports are being published.


INeed2Poo@demi-ubu01:~/stacks$ docker service inspect test_web
[
    ## https://pastebin.com/WqqyDnVS ##
]

demi-ubu01:~/stacks$ netstat -na | grep LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN

demi-ubu01:~/stacks$ docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
6e5f7e7cebc3   bridge            bridge    local
7a1155f87a62   docker_gwbridge   bridge    local
ab32da8ac1ec   host              host      local
46id8wzw4ayf   ingress           overlay   swarm
a24a40ef78f4   none              null      local
d9l7msysdx8m   test_webnet       overlay   swarm
INeed2Poo@demi-ubu01:~/stacks$ docker network inspect 46id8wzw4ayf
[
    https://pastebin.com/JPA0ZBjE
]
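
Scanning the `netstat` listing by eye is easy to get wrong, so the check for a listener on the published port can be made explicit. The sketch below is a hypothetical helper (`has_listener` is not a Docker or netstat feature) that reads `netstat -na`-style lines on standard input and succeeds only if some LISTEN entry has the given local port. It assumes the column layout shown above, with the local address in the fourth column and the state in the sixth.

```shell
# Exit 0 if any LISTEN line on stdin has a local address ending in ":PORT".
# Expects `netstat -na` columns: proto, recv-q, send-q, local, foreign, state.
has_listener() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # The port is the last colon-separated field of the local address,
            # which also covers IPv6 wildcards such as ":::8080".
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

Run as `netstat -na | has_listener 8080 && echo listening || echo "not listening"`. Against the listing above it would report "not listening", which matches the observation that nothing is bound to port 8080 on this host. An assumption worth verifying on a known-good swarm: a node with a working ingress network normally shows dockerd listening on the published port.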

I also can't reach the service from inside one of its own containers. After exec'ing into a container, I can hit the local app port, but I cannot reach the service by name, even though the container can resolve the service name.

## Testing the app's service from the local container fails:

demi-ubu01:~/stacks$ docker exec -it 05030ef897a1 wget http://test_web:8080
Connecting to test_web:8080 (10.0.4.2:8080)
wget: can't connect to remote host (10.0.4.2): Host is unreachable


## Testing the app's local port from the local container is successful:

demi-ubu01:~/stacks$ docker exec -it 05030ef897a1 wget http://localhost:80
Connecting to localhost:80 (127.0.0.1:80)
index.html    100% |****************************|  7217   0:00:00 ETA
demi-ubu01:~/stacks$ docker --version
Docker version 20.10.1, build 831ebea
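
One detail worth separating out in the in-container tests: the PORTS entry `*:8080->80/tcp` pairs a published port (8080, opened on the swarm nodes by the routing mesh) with a target port (80, the port the container listens on). Between containers on the same overlay network, the service name is normally contacted on the target port, so `test_web:80` is arguably a more telling in-container test than `test_web:8080`. That reading of swarm networking is an assumption here, and the "Host is unreachable" error may well have the same root cause either way. The sketch below only splits such an entry into its parts; `parse_ports_entry` is a hypothetical helper name.

```shell
# Split a PORTS entry as printed by `docker service ls`, for example
# "*:8080->80/tcp", into published port, target port, and protocol.
parse_ports_entry() {
    entry=${1#*:}            # drop the leading "*:"   -> 8080->80/tcp
    published=${entry%%->*}  # text before "->"        -> 8080
    rest=${entry#*->}        # text after "->"         -> 80/tcp
    target=${rest%%/*}       # text before "/"         -> 80
    proto=${rest#*/}         # text after "/"          -> tcp
    echo "published=$published target=$target proto=$proto"
}
```

Under that assumption, `docker exec -it 05030ef897a1 wget http://test_web:80` would exercise the overlay path on the target port, while the published port 8080 is the one to test from outside against the node addresses.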

I've also changed the default-addr-pool for the swarm cluster from the original 10.0.0.0/8 network to:

demi-ubu01:~$ docker info --format '{{json .Swarm.Cluster.DefaultAddrPool}}'
["10.135.0.0/16"]

I've made sure that I'm not using any overlapping networks that might be causing this, and I've gone so far as to completely redeploy the cluster. I've just about exhausted all of my troubleshooting ideas. Any ideas?
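
The overlap check mentioned above can be done mechanically rather than by inspection. The sketch below tests whether two IPv4 CIDR blocks share any address, using plain POSIX shell arithmetic. `ip_to_int` and `cidrs_overlap` are hypothetical helper names, and the `10.100.4.0/24` host network used in the example is an assumption: the report gives only the node addresses (10.100.4.5 and so on), not the prefix length.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Exit 0 if the two CIDR blocks share at least one address.
cidrs_overlap() {
    net_a=$(ip_to_int "${1%/*}"); len_a=${1#*/}
    net_b=$(ip_to_int "${2%/*}"); len_b=${2#*/}
    # Two blocks overlap exactly when they agree on the shorter prefix.
    if [ "$len_a" -lt "$len_b" ]; then len=$len_a; else len=$len_b; fi
    [ "$len" -eq 0 ] && return 0   # a /0 block contains every address
    mask=$(( (4294967295 << (32 - $len)) & 4294967295 ))
    [ $(( net_a & mask )) -eq $(( net_b & mask )) ]
}
```

With those assumptions, `cidrs_overlap 10.0.0.0/8 10.100.4.0/24` succeeds (the original default pool covers the host network), while `cidrs_overlap 10.135.0.0/16 10.100.4.0/24` fails. The service VIP seen earlier, 10.0.4.2, lies outside 10.135.0.0/16, so it may also be worth running the same check on each overlay network's actual subnet from `docker network inspect`, not only on the pool.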

@TheRealAlexV
Author

Update: I redeployed using Ubuntu 18.04 as my base image, and the exact same setup (deployed using Ansible) seems to work fine. So this seems to be an issue with the current version of Docker on Ubuntu 20.04.

@thaJeztah
Member

What version of Docker was installed when things didn't work: v20.10.0 or v20.10.1? Are you installing the official packages from download.docker.com or Ubuntu distro packages (packaged by Ubuntu)?

I tried reproducing on an Ubuntu 20.04 machine, but so far wasn't able to reproduce the issue.

> I also don't see any port bindings being made on those hosts at all, so it doesn't look like any ports are being published.

To narrow down possible issues: was this the same on machines that had a service instance running as on machines that did not?

@ylluminate

Any new developments on this? Projects dependent upon Moby like CapRover are continuing to recommend sticking to 18.04, but it sure would be nice to be able to move onto 20.04 in some scenarios vs sticking to the older release.

@TheRealAlexV
Author

> What version of docker was installed when things didn't work? v20.10.0 or v20.10.1 ? Are you installing the official packages from download.docker.com or Ubuntu distro packages (packaged by Ubuntu)?

> I tried reproducing on a Ubuntu 20.04 machine, but so far wasn't able to reproduce the issue.

> I also don't see any port bindings being made on those hosts at all, so it doesn't look like any ports are being published.

> To narrow down possible issues; this was the same on machines that had a service instance running, as on machines that did not have an instance running?

Sorry it took so long to reply. This was on Docker version 20.10.1, installed directly from the official Docker packages.

@xeddmc

xeddmc commented Feb 28, 2021

Hmm. I've got a new KVM VPS, and the 20.04 installer gives me the option to install Docker along with it. I'm now wondering whether I should install it and wait to see if I have any problems, or just scuttle the VM and use an 18.04 image instead, although I don't like how close that EoL is.

Have there been any updates on whether the issue is with 20.04 or with Docker?

@devius

devius commented May 14, 2021

Is this issue closed? Or do we need to worry about deploying on 20.04?

@alexander-potemkin

Wondering the same.

@ylluminate

I have it working (seemingly) with no changes on 20.04 now... Anyone else seeing success?

@ToS0

ToS0 commented Jun 20, 2022

> I have it working (seemingly) with no changes on 20.04 now... Anyone else seeing success?

Works for me too, thx! 👍

@sam-thibault

This is an old issue and appears to be working for some users. I will close it as stale. If you encounter this issue on a current engine release, please open a new issue.

@sam-thibault closed this as not planned (won't fix, can't repro, duplicate, stale) on May 5, 2023