Services not being able to reach each other on ports exposed in underlying containers #28168

Closed
jesuscript opened this Issue Nov 8, 2016 · 9 comments

jesuscript commented Nov 8, 2016

Description
I have a setup with two Ubuntu 16.04 hosts in a cloud (Scaleway) running docker swarm mode: 1 manager and 1 worker. When I deploy services into the swarm they are unable to communicate with each other on exposed ports.

Steps to reproduce the issue:

  1. Provision 2 Ubuntu 16.04 hosts
  2. Install dependencies, where needed. In my case:
# required to install docker on scaleway servers; you may not need this
$ sudo apt-get install -y linux-image-virtual linux-image-extra-virtual dmsetup
$ dmsetup mknodes 
  3. Install docker
$ curl -sSL https://experimental.docker.com/ | sh
  4. Init docker swarm
# ssh into the "manager" host
$ docker swarm init --advertise-addr <host ip address>
# copy the worker join command
  5. Connect the worker
# ssh into the "worker" host
# paste and run the worker join command copied in the previous step
  6. Create a new network (the rest of the instructions are to be executed on the "manager")
$ docker network create main --driver overlay
  7. Launch 2 separate nginx services
$ docker service create --network main --name foo1 nginx
9wr50h8lg6gu1ji8stda7qns0
$ docker service create --network main --name foo2 nginx
7e3nbqj9hp920soc7zmjsuz0q
$ docker service ls
ID            NAME  REPLICAS  IMAGE  COMMAND
7e3nbqj9hp92  foo2  1/1       nginx  
9wr50h8lg6gu  foo1  1/1       nginx  
  8. Check that foo1 is running on the "manager" and get its container ID
$ docker service ps foo1
ID                         NAME    IMAGE  NODE            DESIRED STATE  CURRENT STATE           ERROR
0tl75cc5fplbrnybfdcbgc24u  foo1.1  nginx  ubuntu-swarm-1  Running        Running 52 seconds ago  
$ docker ps 
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
8e87aa1a2234        nginx:latest        "nginx -g 'daemon off"   2 minutes ago       Up 2 minutes        80/tcp, 443/tcp     foo1.1.0tl75cc5fplbrnybfdcbgc24u
  9. Connect to the container's shell:
$ docker exec -it 8e87aa1a2234 /bin/bash
  10. First, try to ping foo2
$ ping foo2
PING foo2 (10.0.0.4): 56 data bytes
92 bytes from 8e87aa1a2234 (10.0.0.3): Destination Host Unreachable
92 bytes from 8e87aa1a2234 (10.0.0.3): Destination Host Unreachable
92 bytes from 8e87aa1a2234 (10.0.0.3): Destination Host Unreachable
^C--- foo2 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
  11. Install curl
$ apt-get update && apt-get install -y curl
  12. Curl localhost to see what output we're expecting in step 13.
$ curl localhost
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
  13. Curl foo2
$ curl foo2
curl: (7) Failed to connect to foo2 port 80: No route to host
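
When scripting steps 10 to 13, the curl exit status can be turned into a short diagnosis. The following is a minimal sketch, not part of the original reproduction; the function name `explain_curl_exit` is hypothetical. It relies only on curl's documented exit codes (6 for a host that cannot be resolved, 7 for a failed connection, 28 for a timeout).

```shell
# Map a curl exit status to a short diagnosis.
# The function name is an illustration, not part of curl or docker.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect" ;;
    28) echo "timed out" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Usage from inside the foo1 container (assumes curl is installed, as in step 11):
#   curl -s -o /dev/null --max-time 5 http://foo2/; explain_curl_exit "$?"
```

In this report the name foo2 resolved (ping printed 10.0.0.4) while curl exited with 7, which points at the data path to the service address rather than at DNS.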

Describe the results you received:
Step 10: Destination Host Unreachable
Step 13: Failed to connect to foo2 port 80: No route to host

Describe the results you expected:

Step 10: expected ping to reach the destination host (ping failing is expected in 1.12.3)
Step 13: expected curl to return the same output as in step 12

Additional information you deem important (e.g. issue happens only occasionally):
I appreciate that there are a number of open issues reporting problems similar to mine. Apologies for creating yet another one. However, all the related issues that I looked at described services failing to reach each other either intermittently or only under certain circumstances. In my case, I was never able to reach other services, regardless of which containers I used, the --advertise-addr and --listen-addr settings, etc.

Also, there is no firewall installed on either host:

$ ufw status
-bash: ufw: command not found

Output of docker version:

Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 22:07:18 2016
 OS/Arch:      linux/amd64
 Experimental: true

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 22:07:18 2016
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.12.3
Storage Driver: devicemapper
 Pool Name: docker-43:0-395319-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 2.4 GB
 Data Space Total: 107.4 GB
 Data Space Available: 44.82 GB
 Metadata Space Used: 2.548 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.145 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay null host bridge
Swarm: active
 NodeID: 5sd91ufxkr84n5tilsnatfq88
 Is Manager: true
 ClusterID: 8vu28cn3ohdf4pumouqnz43lc
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 51.15.42.252
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.5.7-std-3
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.62 GiB
Name: ubuntu-swarm-1
ID: XBYQ:N2NZ:XLFC:JNSZ:QOIK:PEQX:HC7W:AKY5:GO77:SZNA:EUUT:T5N6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Experimental: true
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04 LTS
Release:	16.04
Codename:	xenial
$ uname -a
Linux ubuntu-swarm-1 4.5.7-std-3 #1 SMP Tue Jul 12 09:56:30 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Again, sorry if this is a dupe. Happy to add any more info that might help debugging this.

Contributor

mavenugo commented Nov 8, 2016

@jesuscript

  • The ping failure in step 10 is expected in 1.12.3, since foo2 resolves to the service's virtual IP, which is backed by IPVS.
  • But curl foo2 must succeed. Since it is failing for you, we can narrow it down by checking whether the ipvs modules are installed in the kernel. Please share the output of lsmod | grep ip_vs. If the ip_vs modules are installed, then try to reproduce the issue on a single machine (to confirm whether the issue happens only in a multi-host situation using overlay networking).
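
A minimal sketch of that module check, for anyone following along. It assumes `lsmod`-style output on stdin, and the function name `ipvs_modules_loaded` is hypothetical. Note that an empty result only means no IPVS code is loaded as a module: it does not distinguish IPVS built into the kernel from IPVS missing entirely.

```shell
# List IPVS-related modules found in lsmod-style output read from stdin.
# Returns 0 if at least one ip_vs* module is listed, 1 otherwise.
ipvs_modules_loaded() {
  # lsmod prints one header line, then one module per line with the
  # module name in the first column.
  awk 'NR > 1 && $1 ~ /^ip_vs/ { print $1; found = 1 } END { exit !found }'
}

# Usage on a real host (assumes lsmod is available):
#   lsmod | ipvs_modules_loaded || echo "no ip_vs modules loaded (built in or missing)"
```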

Author

jesuscript commented Nov 8, 2016

thanks!!

$ lsmod | grep ip_vs
# no output

(what should I do next?)

Contributor

mavenugo commented Nov 8, 2016

@jesuscript thanks. So that is the issue.

You can run contrib/check-config.sh to check your kernel configs.
Can you rerun your tests after running:

$ sudo modprobe ip_vs
$ sudo modprobe ip_vs_rr

Author

jesuscript commented Nov 8, 2016

$ bash <(curl https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh -Lk)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8711  100  8711    0     0  11193      0 --:--:-- --:--:-- --:--:-- 11182
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
- CONFIG_MEMCG_KMEM: missing
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: missing
- CONFIG_CFS_BANDWIDTH: missing
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_VS: enabled
- CONFIG_IP_VS_NFCT: missing
- CONFIG_IP_VS_RR: missing
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled (as module)
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

$ sudo modprobe ip_vs
# no output
$ sudo modprobe ip_vs_rr
modprobe: FATAL: Module ip_vs_rr not found in directory /lib/modules/4.5.7-std-3

Contributor

mrjana commented Nov 8, 2016

@jesuscript Your ip_vs support is built into the kernel, so modprobe won't show you anything, and that is fine. But you are missing IPVS_NFCT and IPVS_RR (CONFIG_IP_VS_NFCT and CONFIG_IP_VS_RR in the check-config.sh output above) from your kernel config. These are needed for the load balancer to work. Please enable them in your kernel.
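
A minimal sketch for checking just these three options without running the full check-config.sh. It assumes kernel config text on stdin (for example from `zcat /proc/config.gz`, the same source check-config.sh reported reading above); the function name `ipvs_config_report` is hypothetical.

```shell
# Report the state of the IPVS options named in this thread, given kernel
# config text on stdin. Runs in a subshell so its variables do not leak.
ipvs_config_report() (
  config=$(cat)
  for opt in CONFIG_IP_VS CONFIG_IP_VS_NFCT CONFIG_IP_VS_RR; do
    # Anchor on "NAME=" so CONFIG_IP_VS does not also match CONFIG_IP_VS_RR.
    value=$(printf '%s\n' "$config" | sed -n "s/^${opt}=\(.*\)\$/\1/p" | head -n 1)
    case "$value" in
      y) echo "$opt: built-in" ;;
      m) echo "$opt: module" ;;
      *) echo "$opt: missing" ;;   # covers "# ... is not set" and absent lines
    esac
  done
)

# Usage on a host that exposes /proc/config.gz:
#   zcat /proc/config.gz | ipvs_config_report
```

On the kernel in this report, the expectation would be CONFIG_IP_VS as built-in and the other two as missing, matching the check-config.sh output.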

Author

jesuscript commented Nov 8, 2016

Thanks @mrjana. Is there a specific kernel image with ipvs support that you'd recommend installing?

Contributor

justincormack commented Nov 9, 2016

@jesuscript I believe standard Ubuntu 16.04 does have this, but as far as I remember Scaleway may be using their own kernels optimised for their hardware, so it might be worth asking them; they are pretty helpful. They may also have a more recent kernel build you can select.

Author

jesuscript commented Nov 9, 2016

@justincormack thanks a lot! I opened a support ticket. Will update this thread with my findings.

Author

jesuscript commented Nov 9, 2016

You can find the discussion here: scaleway/image-ubuntu#78. The gist is that you need to use a specific bootscript if you want to run docker on Scaleway.

Since this is clearly not an issue with Docker, I'm going to close this ticket. Before I do, I would like to thank everyone in this thread for all the help you offered. You guys saved me so much time! :)
