This repository has been archived by the owner on Aug 14, 2023. It is now read-only.

Docker with Traefik running inside swarm manager as a service is not able to redirect requests to services that are in the same overlay network outside from the manager to other service in worker nodes. #42

Closed
al-sabr opened this issue Jun 4, 2017 · 21 comments

@al-sabr

al-sabr commented Jun 4, 2017

The problem I'm having might have the same cause as, and be related to, the old kernel bug we found and fixed.

This ticket is a further investigation, with a possible link to the old closed ticket #38.

@docbobo I would like to have your perspective on this problem.

This is a brief description of my setup.

Running a Docker Swarm cluster

(2) Odroid C1 ARMv7 armhf (arm32) servers

  1. Swarm Manager (initial stack docker-compose.yml)
  2. Worker node (drone stack docker-compose.yml)

(10) Odroid C2 aarch64(arm64) servers

Docker version arm32

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:28:23 2017
 OS/Arch:      linux/arm

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:28:23 2017
 OS/Arch:      linux/arm
 Experimental: false

$ uname -a
Linux bambuserver2 3.10.104 #1 SMP PREEMPT Sun Jun 4 07:54:52 UTC 2017 armv7l GNU/Linux

Docker version arm64

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658bed6
 Built:        Tue May  9 07:22:23 2017
 OS/Arch:      linux/arm64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658bed6
 Built:        Tue May  9 07:22:23 2017
 OS/Arch:      linux/arm64
 Experimental: false

$ uname -a
Linux bambuserver12 3.14.79-109 #1 SMP PREEMPT Thu Mar 16 20:05:25 BRT 2017 aarch64 GNU/Linux

docker info arm32

Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 414
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /mnt/virtual/var/lib/docker/aufs
 Backing Filesystem: <unknown>
 Dirs: 561
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local local-persist
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: m7uvwoo1s1335vy20evjz9752
 Is Manager: true
 ClusterID: v1wra9jgbzas12b639h5oc5fm
 Managers: 1
 Nodes: 12
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.1.3
 Manager Addresses:
  192.168.1.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 3.10.104
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 940.9MiB
Name: bambuserver1
ID: 7GHE:CHRG:TDC4:UOTO:3JWM:2ZYU:CHBN:AMIE:W45Y:I5G7:AMSK:ETMY
Docker Root Dir: /mnt/virtual/var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No kernel memory limit support

docker info arm64

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /mnt/virtual/var/lib/docker/aufs
 Backing Filesystem: <unknown>
 Dirs: 8
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local local-persist
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: apddjnzk1njxiqwlm5l8pan0h
 Is Manager: false
 Node Address: 192.168.1.14
 Manager Addresses:
  192.168.1.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 3.14.79-109
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 1.928GiB
Name: bambuserver12
ID: B64M:RE77:NE6P:CZWV:7XNT:DQ3B:SVYF:LDL5:JHVX:VY2V:GFEB:J7CV
Docker Root Dir: /mnt/virtual/var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Check config arm32

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: missing
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: missing
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_MEMCG_KMEM: missing
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_NETPRIO_CGROUP: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: missing
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: missing
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: missing
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: missing
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Check config arm64

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: missing
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_MEMCG_KMEM: enabled
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: missing
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: missing
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled (as module)
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled (as module)
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
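
The two check-config listings above are long, so a small filter can help to see where the arm32 and arm64 kernels differ. The sketch below extracts every option reported as missing and prints the ones that are missing only in the first listing. The file names arm32.txt and arm64.txt are hypothetical; they stand for the saved check-config.sh output of each node.

```sh
# Print the CONFIG_ options that a check-config.sh listing (on stdin)
# reports as "missing", one per line, sorted.
missing_options() {
  sed -n 's/^[[:space:]]*- \(CONFIG_[A-Z0-9_]*\):.*missing.*$/\1/p' | sort -u
}

# Print options missing in listing $1 but not missing in listing $2.
only_missing_in_first() {
  first=$(mktemp) || return 1
  second=$(mktemp) || { rm -f "$first"; return 1; }
  missing_options < "$1" > "$first"
  missing_options < "$2" > "$second"
  comm -23 "$first" "$second"
  rm -f "$first" "$second"
}

# Example (hypothetical file names):
#   only_missing_in_first arm32.txt arm64.txt
```

Applied to the two listings in this report, the arm32-only entries should be CONFIG_SECCOMP, CONFIG_MEMCG_KMEM, CONFIG_BLK_DEV_THROTTLING, CONFIG_CFQ_GROUP_IOSCHED, CONFIG_CGROUP_HUGETLB, CONFIG_NETPRIO_CGROUP, CONFIG_DUMMY and CONFIG_OVERLAY_FS. CONFIG_VXLAN and CONFIG_IPVLAN have the same status on both kernels.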

==========================================

Steps to reproduce the problem...

  1. docker network create -d overlay traefik-net

  2. create a docker-compose.yml for initial stack

version: "3"

networks:
  traefik-net:
    external: true 
    
services:
  traefik:
    image: hypriot/rpi-traefik
    ports:
      - "80:80"
      - "443:443"
      #- "8080:8080"
    command: --web --docker --docker.swarmmode=true --docker.watch=true --docker.domain=cluster.publicvm.com -l DEBUG 
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    networks:
      - traefik-net
    deploy:
      placement:
        constraints:
          - node.role==manager
          #- node.hostname==bambuserver1
      restart_policy:
        condition: on-failure
      labels:
        traefik.docker: "true"
        traefik.docker.network: "traefik-net"
        traefik.port: 8080
        traefik.backend.loadbalancer.sticky: "true"
        traefik.backend.loadbalancer.method: "wrr"
        #traefik.backend.loadbalancer.swarm: "true"
        traefik.frontend.passHostHeader: "true"
        traefik.frontend.rule: "Host:traefik-admin.cluster.publicvm.com"
  
  portainer:
    depends_on: [ traefik ]
    image: portainer/portainer:linux-arm-1.13.1
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - portainer-datas:/data
    networks:
      - traefik-net
    #ports:
      #- 9000:9000
    deploy:
      placement:
        constraints:
          - node.role == manager
          #- node.hostname==bambuserver1
      restart_policy:
        condition: on-failure
      labels:
        traefik.enable: "true"
        traefik.docker.network: "traefik-net"
        traefik.port: "9000"
        traefik.backend.loadbalancer.sticky: "true"
        traefik.backend.loadbalancer.method: "wrr"
        #traefik.backend.loadbalancer.swarm: "true"
        traefik.frontend.passHostHeader: "true"
        traefik.frontend.rule: "Host:portainer.cluster.publicvm.com"      

  gitea:
    depends_on: [ portainer ]
    image: bambuserver1:5000/ergu/gitea-arm:1.1.1
    volumes:
      - gitea-datas:/data
    networks:
      - traefik-net
    ports:
      - 3022:22
      #- 3000:3000
    deploy:
      placement:
        constraints:
          - node.role==manager
          #- node.hostname == bambuserver2
      restart_policy:
        condition: on-failure
      labels:
        traefik.enable: "true"
        traefik.docker.network: "traefik-net"
        traefik.port: "3000"
        traefik.backend.loadbalancer.sticky: "true"
        traefik.backend.loadbalancer.method: "wrr"
        #traefik.backend.loadbalancer.swarm: "true"
        traefik.frontend.passHostHeader: "true"
        traefik.frontend.rule: "Host:gitea.cluster.publicvm.com"
    
volumes:
  portainer-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/containers/portainer
  gitea-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/containers/gitea

  3. create a docker-compose.yml for drone stack

version: "3.2"

networks:
  traefik-net:
    external: true 

services:
    server:
        image: bambuserver1:5000/ergu/drone-arm32:0.7.1
        networks:
          - traefik-net
        deploy:
            labels:
                traefik.enable: "true"
                traefik.docker.network: "traefik-net"
                traefik.port: "8000"
                traefik.backend.loadbalancer.sticky: "true"
                traefik.backend.loadbalancer.method: "drr"
                traefik.backend.loadbalancer.swarm: "true"
                traefik.frontend.passHostHeader: "true"
                traefik.frontend.rule: "Host:drone.cluster.publicvm.com"
            placement:
                constraints:
                - node.hostname==bambuserver2
        ports:
          - 8000:8000
       
        volumes:
            - drone-datas:/var/lib/drone
        environment:
          - DRONE_ADMIN=administrator
          - DRONE_DEBUG=true
          - DRONE_OPEN=false
          - DRONE_HOST=http://server:8000/
          - DRONE_SECRET=${DRONE_SECRET}
          - DRONE_SERVER_PORT=${DRONE_SERVER_PORT}
          - DRONE_GOGS=true
          - DRONE_GOGS_URL=http://gitea.cluster.publicvm.com/
          - DRONE_GOGS_SKIP_VERIFY=true
          - DRONE_PLUGIN_PRIVILEGED=armhfplugins/docker,armhfplugins/drone-docker

    agent:
        depends_on: [ server ]
        image: bambuserver1:5000/ergu/drone-arm32:0.7.1
        networks:
          - traefik-net
        command: agent
        deploy:
            placement:
                constraints:
                - node.hostname==bambuserver2
        volumes: [ "/var/run/docker.sock:/var/run/docker.sock" ]
        environment:
          - DRONE_SERVER=ws://server:8000/ws/broker
          - DRONE_SECRET=${DRONE_SECRET}
          - DRONE_DEBUG=true

volumes:              
  drone-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/volumes/drone/

  4. Run the initial stack docker-compose with docker stack deploy on the manager node
  5. Run the drone stack docker-compose with docker stack deploy on the manager node
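
Taken together, the steps above come down to three commands on the manager node. The following is a minimal sketch, not the exact commands used here: the compose file names and the stack name initial are assumptions, while the stack name drone is inferred from the drone_server service name mentioned in this report. Setting DOCKER=echo prints the commands instead of running them.

```sh
# Docker CLI to use; override with DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# deploy_stacks INITIAL_COMPOSE_FILE DRONE_COMPOSE_FILE
# Run on the Swarm manager. The stack name "initial" is hypothetical.
deploy_stacks() {
  # Create the overlay network both stacks attach to (external: true in the YAML).
  $DOCKER network create -d overlay traefik-net || return 1
  # Deploy each compose file as its own stack.
  $DOCKER stack deploy -c "$1" initial || return 1
  $DOCKER stack deploy -c "$2" drone
}

# Example (hypothetical file names):
#   deploy_stacks docker-compose.yml docker-compose-drone.yml
```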

Expected result:
Open http://drone.cluster.publicvm.com/ in a browser. Every request to that URL should go through Traefik and be routed to the drone_server service on bambuserver2, which is a worker node. The Drone login page should be displayed.

What is actually happening:

Traefik cannot reach beyond the Swarm manager to route the request to bambuserver2. Instead, the logs show this error:

time="2017-06-04T04:12:55Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 1.712028ms" 
time="2017-06-04T04:12:56Z" level=warning msg="Error forwarding to http://10.0.0.9:8000, err: dial tcp 10.0.0.9:8000: i/o timeout" 
time="2017-06-04T04:12:57Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 2.247036ms" 

I've talked with @emilevauge from Traefik, and he told me that this is not a Traefik bug but rather a Docker bug. Then @firecyberice tried my setup on amd64 via playwithdocker and was able to run my config without a problem. Based on his results and tests, I've concluded that some other disabled kernel options, like the "- CONFIG_VXLAN: missing" in ticket #38, are blocking the routing within Docker on HypriotOS.

I'm not 100% sure about this, but I would say it is 90% close to the root of the problem. What I don't know is which other kernel options are necessary to successfully route overlay network packets between nodes.
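
One way to narrow this down is to read the running kernel's config directly on each node and compare a handful of networking options. The sketch below assumes /proc/config.gz is available (check-config.sh reads it on these nodes). The option list is taken from the check-config listings in this report and is not an authoritative set of what overlay networking needs.

```sh
# Report how one option is set in a kernel config read from stdin:
# prints "y", "m", or "missing".
config_state() {
  state=$(sed -n "s/^${1}=\([ym]\)\$/\1/p" | head -n 1)
  echo "${state:-missing}"
}

# Print the state of a few networking options for the config file $1.
check_overlay_options() {
  for opt in CONFIG_VXLAN CONFIG_BRIDGE CONFIG_VETH CONFIG_BRIDGE_NETFILTER \
             CONFIG_IP_VS CONFIG_NETFILTER_XT_MATCH_IPVS; do
    printf '%s: %s\n' "$opt" "$(config_state "$opt" < "$1")"
  done
}

# Example on a node (the /tmp path is hypothetical):
#   zcat /proc/config.gz > /tmp/kernel.config
#   check_overlay_options /tmp/kernel.config
```

Running it on both a C1 and a C2 node and comparing the two outputs shows at a glance whether any of these options differ between the kernels.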

@docbobo @firecyberice and maybe others can help us on this bug.

Screenshots of the networks in my Swarm Manager

traefik-net (overlay)

Screenshots of the networks in worker node 1 (bambuserver2)

traefik-net (overlay)

Traefik successfully routing to the 3 services in the initial stack (please look at the URLs and the Traefik tab)

portainer

traefik

gitea

Traefik showing error for the 4th service on bambuserver2

drone error

What is inside the log from Traefik

time="2017-06-04T04:12:55Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 1.712028ms" 
time="2017-06-04T04:12:56Z" level=warning msg="Error forwarding to http://10.0.0.9:8000, err: dial tcp 10.0.0.9:8000: i/o timeout" 
time="2017-06-04T04:12:57Z" level=debug msg="Round trip: http://10.0.0.3:8080, code: 200, duration: 2.247036ms" 
@al-sabr changed the title from "Docker with Traefik running inside swarm manager as a service is not able to redirect services that are in the same overlay network outside from the manager other service in worker nodes." to "Docker with Traefik running inside swarm manager as a service is not able to redirect services that are in the same overlay network outside from the manager to other service in worker nodes." on Jun 4, 2017
@al-sabr changed the title from "Docker with Traefik running inside swarm manager as a service is not able to redirect services that are in the same overlay network outside from the manager to other service in worker nodes." to "Docker with Traefik running inside swarm manager as a service is not able to redirect requests to services that are in the same overlay network outside from the manager to other service in worker nodes." on Jun 4, 2017
@firecyberice
Member

I tried the setup again on my 5-node RPi 3 Model B cluster without any problems, but the RPi version of HypriotOS has all network-related features available; only some storage drivers are missing.
I think the Odroid version needs to enable CONFIG_IPVLAN, which is missing in @gdeverlant's listing.

@al-sabr
Author

al-sabr commented Jun 4, 2017

As a reminder, this is what I did before:

sudo apt-get install -y bc curl gcc git libncurses5-dev lzop make
git clone --depth 1 --single-branch -b odroidc-3.10.y https://github.com/hardkernel/linux
cd linux
make odroidc_defconfig
sed -i -e 's/# CONFIG_VXLAN is not set/CONFIG_VXLAN=m/g' .config
make -j 4 uImage dtbs modules
sudo cp arch/arm/boot/uImage arch/arm/boot/dts/*.dtb /boot
sudo make modules_install
sudo make firmware_install
sudo make headers_install INSTALL_HDR_PATH=/usr
kver=`make kernelrelease`
sudo cp .config /boot/config-${kver}
cd /boot
sudo update-initramfs -c -k ${kver}
sudo mkimage -A arm -O linux -T ramdisk -a 0x0 -e 0x0 -n initrd.img-${kver} -d initrd.img-${kver} uInitrd-${kver}
sudo cp uInitrd-${kver} /boot/uInitrd
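
The sed line in the steps above switches on a single option. When several options need the same treatment, a small helper keeps the edits uniform. This is a sketch under the assumption that the options are present in .config as "# CONFIG_... is not set" lines; the helper name is made up for illustration, and it relies on GNU sed for in-place editing.

```sh
# enable_as_module CONFIG_FILE OPTION...
# Turn "# CONFIG_<OPTION> is not set" into "CONFIG_<OPTION>=m" in place.
# Options without such a line are reported on stderr and left unchanged.
enable_as_module() {
  cfg="$1"
  shift
  for opt in "$@"; do
    if grep -q "^# CONFIG_${opt} is not set\$" "$cfg"; then
      sed -i -e "s/^# CONFIG_${opt} is not set\$/CONFIG_${opt}=m/" "$cfg"
    else
      echo "CONFIG_${opt}: no 'is not set' line, left unchanged" >&2
    fi
  done
}

# Example, run inside the kernel source tree after make odroidc_defconfig:
#   enable_as_module .config VXLAN DUMMY
```

Kconfig may still drop an option whose dependencies are not met, so it is worth re-checking .config after the build.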

@al-sabr
Author

al-sabr commented Jun 4, 2017

I don't know if this is relevant to this problem; maybe you guys can have a look:

moby/moby#27897

@al-sabr
Author

al-sabr commented Jun 4, 2017

Ahhh crap it didn't change anything :(

This is the furthest I was able to go!

https://github.com/mlinuxguy/odroid-c1-kernel-3.19

@al-sabr
Author

al-sabr commented Jun 4, 2017

This Docker feature requires Kernel Vendor libnetwork v0.7.0-dev.7 : Experimental MacVlan and IPVlan network drivers moby/moby#21122

Warning: I'm not even sure that I correctly understand my last claim, or whether it is connected with this bug :D

@al-sabr
Author

al-sabr commented Jun 4, 2017

Tried the following steps for
ODROID-C1 mainline (experimental!)

$ curl -sSL https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.8.tar.xz | unxz | tar -xvf -
$ cd linux-4.8
$ make multi_v7_defconfig

I then compared the resulting config with what ./check-config.sh reports on my Odroid C1, and found the following:

=======================================
missing but should be there

CONFIG_NET_NS
CONFIG_PID_NS
CONFIG_IPC_NS
CONFIG_UTS_NS
CONFIG_BRIDGE_NETFILTER
CONFIG_NF_NAT_IPV4
CONFIG_IP_NF_FILTER
CONFIG_IP_NF_TARGET_MASQUERADE
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE
CONFIG_NETFILTER_XT_MATCH_CONNTRACK
CONFIG_NETFILTER_XT_MATCH_IPVS
CONFIG_IP_NF_NAT
CONFIG_NF_NAT
CONFIG_NF_NAT_NEEDED
CONFIG_DEVPTS_MULTIPLE_INSTANCES

CONFIG_USER_NS
CONFIG_MEMCG_SWAP
CONFIG_MEMCG_SWAP_ENABLED
CONFIG_RESOURCE_COUNTERS

CONFIG_NET_CLS_CGROUP
CONFIG_CFS_BANDWIDTH
CONFIG_FAIR_GROUP_SCHED
CONFIG_RT_GROUP_SCHED

CONFIG_IP_VS
CONFIG_IP_VS_NFCT
CONFIG_IP_VS_RR
- "ftp,tftp client in container":
CONFIG_NF_NAT_FTP
CONFIG_NF_CONNTRACK_FTP
CONFIG_NF_NAT_TFTP
CONFIG_NF_CONNTRACK_TFTP

CONFIG_AUFS_FS
CONFIG_BTRFS_FS
CONFIG_BTRFS_FS_POSIX_ACL

CONFIG_BLK_DEV_DM
CONFIG_DM_THIN_PROVISIONING

=======================================
added manually

CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_SCHED=y
CONFIG_CPUSETS=y
CONFIG_MEMCG=y

CONFIG_VETH=y
CONFIG_BRIDGE=y

CONFIG_POSIX_MQUEUE=y

CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_PERF=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y

CONFIG_IPVLAN=m
CONFIG_VXLAN=m

CONFIG_XFRM_USER=y
CONFIG_INET_ESP=y

=======================================
added but not necessary

CONFIG_EXT4_ENCRYPTION=y
CONFIG_EXT4_DEBUG=y
CONFIG_DUMMY=m

@docbobo
Collaborator

docbobo commented Jun 5, 2017

IMHO, you need to add drone-net to traefik, otherwise it can't route traffic there...

@al-sabr
Author

al-sabr commented Jun 5, 2017

Sorry, my bad: I've updated the description to the latest correct version. As you can see, they are in the same overlay network, traefik-net. There is no drone-net :D

This is what is running right now in my cluster.

@docbobo
Collaborator

docbobo commented Jun 5, 2017

I feel this must be related to your config, since I am running traefik, consul, and node-red in a similar fashion on my C2s. Unless it's related to the rather old kernel for the C1...

Do you think you could simplify your setup to

  • Traefik in one compose file
  • Some other container - E.g. hello world - in another, exposing it via traefik

?

@al-sabr
Author

al-sabr commented Jun 5, 2017

I've tried the whoami setup with 10 replicas on the 10 Odroid C2 nodes. Traefik was seeing the service, but Docker could not give Traefik access to the overlay network. OK, let me simplify it as you said.

Do you also have a cluster or are you running everything on 1 node?

@docbobo
Collaborator

docbobo commented Jun 5, 2017

Can you also post the output of docker network inspect traefik-net?

@al-sabr
Author

al-sabr commented Jun 5, 2017

The manager node is an Odroid C1

docker network inspect traefik-net
[
    {
        "Name": "traefik-net",
        "Id": "fvbi2v9rit8yfj1ij113ks9om",
        "Created": "2017-06-05T10:38:18.527963864+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "Containers": {
            "9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
                "Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
                "EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "bambuserver1-5a1d9d8fe251",
                "IP": "192.168.1.3"
            }
        ]
    }
]
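
In the output above, Peers lists only bambuserver1 (192.168.1.3). When comparing the same command across several nodes, a filter that prints just the peer addresses makes that easier to see. This is a sketch that assumes the pretty-printed JSON layout shown above, with one key per line; it is not a general JSON parser.

```sh
# Print the IP of every entry in the "Peers" array of
# `docker network inspect` output read from stdin.
list_peer_ips() {
  awk '
    /"Peers":/                       { in_peers = 1; next }
    in_peers && /"IP":/              { gsub(/[",]/, "", $2); print $2 }
    in_peers && /^[ \t]*\],?[ \t]*$/ { in_peers = 0 }
  '
}

# Example:
#   docker network inspect traefik-net | list_peer_ips
```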

@al-sabr
Author

al-sabr commented Jun 5, 2017

yaml whoami

The whoami service is running on the Odroid C2 nodes.

version: "3"

networks:
  traefik-net:
    external: true     
    
services:
    
  whoami: 
    image: admiralobvious/whoami-aarch64
    networks:
      - traefik-net
    deploy:
        replicas: 5
        labels:
            traefik.port: "80"
            traefik.enable: "true"
            traefik.backend.loadbalancer.sticky: "true"
            traefik.backend.loadbalancer.method: "wrr"
            #traefik.backend.loadbalancer.swarm: "true"
            traefik.frontend.passHostHeader: "true"
            traefik.docker.network: "traefik-net"
            traefik.frontend.rule: "Host:whoami.cluster.publicvm.com"
        placement:
            constraints:
                - node.labels.arch==arm64

$ docker network inspect traefik-net
[
    {
        "Name": "traefik-net",
        "Id": "fvbi2v9rit8yfj1ij113ks9om",
        "Created": "2017-06-05T10:38:18.527963864+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "Containers": {
            "9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
                "Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
                "EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "bambuserver1-5a1d9d8fe251",
                "IP": "192.168.1.3"
            }
        ]
    }
]

@al-sabr
Author

al-sabr commented Jun 5, 2017

yaml Portainer

version: "3"

networks:
  traefik-net:
    external: true 
    
services:  
  portainer:
    depends_on: [ traefik ]
    image: portainer/portainer:linux-arm-1.13.2
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - portainer-datas:/data
    networks:
      - traefik-net
    deploy:
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure
      labels:
        traefik.docker.network: "traefik-net"
        traefik.port: "9000"
        traefik.frontend.rule: "Host:portainer.cluster.publicvm.com"      
        
volumes:
  portainer-datas:
    driver: local-persist
    driver_opts:
        type: volume 
        mountpoint: /mnt/virtual/docker/containers/portainer

$ docker network inspect traefik-net
[
    {
        "Name": "traefik-net",
        "Id": "fvbi2v9rit8yfj1ij113ks9om",
        "Created": "2017-06-05T10:38:18.527963864+02:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "Containers": {
            "0de2015679c7376c41a99cd57bbf380ec8b4be41593bc8642c39bc2ee9f86f3a": {
                "Name": "portainer_portainer.1.pwla0ki3quxaq3kz2guvjt2jq",
                "EndpointID": "9ad492667b6f5e92c8424bcd9ea5910e99b7d8b06e7b85226b2cdcdb0e66ea89",
                "MacAddress": "02:42:0a:00:00:0b",
                "IPv4Address": "10.0.0.11/24",
                "IPv6Address": ""
            },
            "9cac989d59cbcf32a641b8b08bda92599bb2c66a8f28fe07dbb22ca6f7debc0d": {
                "Name": "traefik_traefik.1.lot8k11oxno48nwjsh8rgnbr6",
                "EndpointID": "4aec29368b0c46b61fc01738d9c58fdb224b7de4e3e0b20f95febc5ed0924882",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "bambuserver1-5a1d9d8fe251",
                "IP": "192.168.1.3"
            }
        ]
    }
]

@al-sabr
Author

al-sabr commented Jun 5, 2017

As you can see, nothing outside the manager node is accessible to Traefik:

time="2017-06-05T09:18:00Z" level=info msg="Skipping same configuration for provider docker" 
time="2017-06-05T09:18:02Z" level=warning msg="Error forwarding to http://10.0.0.9:80, err: dial tcp 10.0.0.9:80: getsockopt: no route to host" 
time="2017-06-05T09:18:02Z" level=debug msg="Round trip: http://10.0.0.11:9000, code: 200, duration: 821.781684ms"
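
To see which backends fail without scrolling through debug output, the forwarding errors can be filtered out of the log. The sketch below assumes the log line format shown above. The file name traefik.log is hypothetical and stands for wherever the Traefik output has been saved.

```sh
# Print each distinct backend URL that Traefik failed to forward to,
# given Traefik log lines on stdin.
failed_backends() {
  sed -n 's/.*msg="Error forwarding to \([^,]*\), err: .*/\1/p' | sort -u
}

# Example (hypothetical file name):
#   failed_backends < traefik.log
```

For the three log lines above this prints http://10.0.0.9:80, an address that does not appear in the Containers list of the manager's network inspect output.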

@docbobo
Collaborator

docbobo commented Jun 5, 2017

Weird. I'll have another look in the evening.

@al-sabr
Author

al-sabr commented Jun 5, 2017

So, in your opinion, is my analysis correct that this is a bug?

@al-sabr
Author

al-sabr commented Jun 6, 2017

I've filed a ticket in moby and someone asked me to do some basic tests without Traefik in the equation.

You can read about my process here: moby/moby#33531

@al-sabr
Author

al-sabr commented Jun 6, 2017

I've just tested with only 2 Odroid C2 devices

  1. Manager Node with Traefik + Portainer
  2. Second node with drone 0.7.1

It works without any problem. It seems that the bug is in the Odroid C1 build with Docker.

@al-sabr
Author

al-sabr commented Jul 1, 2017

Closing this issue; the cause is the old kernel's missing features.

@al-sabr al-sabr closed this as completed Jul 1, 2017