Docker Swarm Connectivity issue on Ubuntu 22.04 Cluster #45541

Open
jgeorg02 opened this issue May 16, 2023 · 12 comments
Labels
area/networking/d/overlay area/networking kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage

Comments

@jgeorg02

jgeorg02 commented May 16, 2023

Description

Something seems to go wrong when using Docker Swarm on Ubuntu 22.04. When I deploy my services on a single-node Docker Swarm, everything works fine. If I deploy one service on one node and another service on a different node, they can ping each other by hostname but cannot otherwise connect to or communicate with each other. The services themselves are not at fault, since they worked fine when I was on Ubuntu 20.04.

I tried solutions from related issues that I found online, but none of them fixed my problem. I have already tried the following:

  • Disabled checksum offloading on both docker0 and docker_gwbridge with the command "ethtool -K <interface> tx off"
  • Reduced the MTU to 1350 on docker0, docker_gwbridge, and the overlay network that my services run on.
  • I do not have any firewalls set up.
  • I tried initializing my Docker swarm with the "--data-path-port" option.
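
As a rough check on the MTU item above: Swarm overlay traffic is VXLAN-encapsulated, which adds 50 bytes of headers per packet on an IPv4 underlay. The sketch below works through that arithmetic. The function names and the 1500-byte underlay MTU are illustrative assumptions, not values reported in this issue.

```python
# Header sizes in bytes for VXLAN carried over an IPv4 underlay.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET  # 50


def max_overlay_mtu(underlay_mtu: int) -> int:
    """Largest overlay MTU whose encapsulated packets fit in the underlay MTU."""
    if underlay_mtu <= VXLAN_OVERHEAD:
        raise ValueError("underlay MTU too small for VXLAN encapsulation")
    return underlay_mtu - VXLAN_OVERHEAD


def fits(overlay_mtu: int, underlay_mtu: int) -> bool:
    """True if the overlay MTU leaves room for the VXLAN headers."""
    return overlay_mtu <= max_overlay_mtu(underlay_mtu)


if __name__ == "__main__":
    # Assumed underlay MTU of 1500; replace with the real NIC MTU.
    print(max_overlay_mtu(1500), fits(1350, 1500))
```

Under these assumptions an overlay MTU of 1350 already sits below the ceiling for a 1500-byte underlay, so MTU alone would only matter if the path between the nodes has a smaller MTU than assumed.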

Reproduce

An example docker-compose.yaml, which I deploy with the command docker stack deploy --compose-file docker-compose.yaml storm-net:

version: '3.3'
services:

# NIMBUS
  engine-manager:
    image: storm:2.1.0
    hostname: engine-manager
    command: storm nimbus
    depends_on:
      - zookeeper
    networks:
      - storm-net
    restart: always
    volumes:
      - /storm.yaml:/conf/storm.yaml
    deploy:
      placement:
        constraints:
          - node.role==manager


# STORM-UI
  ui:
    image: storm:2.1.0
    hostname: ui
    command: storm ui
    depends_on:
      - engine-manager
      - zookeeper
    networks:
      - storm-net
    volumes:
      - /storm.yaml:/conf/storm.yaml
    restart: always
    ports:
      - 8080:8080
    deploy:
      placement:
        constraints:
          - node.role==manager
          
# SUPERVISOR
  supervisor:
    hostname: supervisor
    image: storm:2.1.0
    command: storm supervisor
    volumes:
      - /storm.yaml:/conf/storm.yaml
    depends_on:
      - engine-manager
    networks: 
      - storm-net
    deploy:
      placement:
        constraints:
          - node.hostname==node

# ZOOKEEPER
  zookeeper:
    image: wurstmeister/zookeeper:latest
    hostname: zookeeper
    restart: on-failure
    networks:
      - storm-net
    deploy:
      placement:
        constraints:
          - node.role==manager

networks:
  storm-net:
    external: true

storm.yaml contains the following:

nimbus.seeds: ["engine-manager"]
ui.port: 8080
storm.zookeeper.servers:
  - "zookeeper"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

Then the output of ping:

root@supervisor:/apache-storm-2.4.0# ping zookeeper
PING zookeeper (10.0.2.47) 56(84) bytes of data.
64 bytes from 10.0.2.47 (10.0.2.47): icmp_seq=1 ttl=64 time=0.074 ms
64 bytes from 10.0.2.47 (10.0.2.47): icmp_seq=2 ttl=64 time=0.068 ms
64 bytes from 10.0.2.47 (10.0.2.47): icmp_seq=3 ttl=64 time=0.066 ms
64 bytes from 10.0.2.47 (10.0.2.47): icmp_seq=4 ttl=64 time=0.087 ms
^C
--- zookeeper ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.066/0.073/0.087/0.008 ms

The output of the supervisor:

2023-05-16 11:16:24.375 o.a.s.s.o.a.z.ClientCnxn main-SendThread(zookeeper:2181) [INFO] Opening socket connection to server zookeeper/10.0.2.47:2181. Will not attempt to authenticate using SASL (unknown error)
2023-05-16 11:16:43.394 o.a.s.s.o.a.z.ClientCnxn main-SendThread(zookeeper:2181) [WARN] Client session timed out, have not heard from server in 20019ms for sessionid 0x0
2023-05-16 11:16:43.394 o.a.s.s.o.a.z.ClientCnxn main-SendThread(zookeeper:2181) [INFO] Client session timed out, have not heard from server in 20019ms for sessionid 0x0, closing socket connection and attempting reconnect
2023-05-16 11:16:44.496 o.a.s.s.o.a.z.ClientCnxn main-SendThread(zookeeper:2181) [INFO] Opening socket connection to server zookeeper/10.0.2.48:2181. Will not attempt to authenticate using SASL (unknown error)
2023-05-16 11:17:03.513 o.a.s.s.o.a.z.ClientCnxn main-SendThread(zookeeper:2181) [WARN] Client session timed out, have not heard from server in 20019ms for sessionid 0x0
2023-05-16 11:17:03.514 o.a.s.s.o.a.z.ClientCnxn main-SendThread(zookeeper:2181) [INFO] Client session timed out, have not heard from server in 20019ms for sessionid 0x0, closing socket connection and attempting reconnect
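
The ping output and this log together suggest that ICMP crosses the overlay while the TCP connection to port 2181 never gets an answer. A small TCP probe can help tell a silent drop (timeout) from a reachable host with nothing listening (refused). This is a minimal sketch: `tcp_probe` is a hypothetical helper, and it assumes Python is available wherever it is run, which the storm image may not provide.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all: packets are likely being dropped on the path.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return "refused"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and similar errors.
        return f"error: {exc}"


if __name__ == "__main__":
    # Host and port taken from the supervisor log above.
    print(tcp_probe("zookeeper", 2181))
```

A "timeout" result from the supervisor's node combined with "open" from the manager's node would point at the cross-node data path rather than at ZooKeeper itself.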

Using docker inspect I can verify that the IP of the Zookeeper service is:

            "VirtualIPs": [
                {
                    "NetworkID": "6goxww3phfk63cwcujszqq0du",
                    "Addr": "10.0.2.47/24"
                }
            ]
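
The supervisor log shows the client trying 10.0.2.47 and then 10.0.2.48, while the inspect output lists only 10.0.2.47 as the virtual IP. In Swarm, a service name resolves to its virtual IP and `tasks.<service>` resolves to the individual task IPs, so comparing the two lookups from inside a container can show which address is which. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the name to pass in may need to be the stack-qualified service name rather than the plain alias.

```python
import socket
from typing import Dict, List


def resolve_ipv4(name: str) -> List[str]:
    """Return the sorted unique IPv4 addresses a name resolves to."""
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # Name did not resolve; report that as an empty result.
        return []
    return sorted({info[4][0] for info in infos})


def compare_vip_and_tasks(service: str) -> Dict[str, List[str]]:
    """Look up the service name and its tasks.<service> DNS entry."""
    return {
        "vip": resolve_ipv4(service),
        "tasks": resolve_ipv4(f"tasks.{service}"),
    }


if __name__ == "__main__":
    print(compare_vip_and_tasks("zookeeper"))
```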

On both nodes I have installed Docker version 20.10.21, build 20.10.21-0ubuntu1~22.04.3.

Expected behavior

Docker Swarm overlay networking should work as it did on Ubuntu 20.04.

docker version

Client:
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.1
 Git commit:        20.10.21-0ubuntu1~22.04.3
 Built:             Thu Apr 27 05:57:17 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.1
  Git commit:       20.10.21-0ubuntu1~22.04.3
  Built:            Thu Apr 27 05:37:25 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.12-0ubuntu1~22.04.1
  GitCommit:        
 runc:
  Version:          1.1.4-0ubuntu1~22.04.2
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:

docker info

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 71hw16vfr2y20jq192jznzfdy
  Is Manager: true
  ClusterID: 6ssvhejrou3vo518enztjso6f
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 7777
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: xx.xx.xx.xx ** I removed it **
  Manager Addresses:
   xx.xx.xx.xx:2377 ** I removed it **
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.0-71-generic
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 11.68GiB
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

No response

@jgeorg02 jgeorg02 added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage labels May 16, 2023
@corhere
Contributor

corhere commented May 16, 2023

> networks:
>   storm-net:
>       external: true

You left out the most important information. What's the configuration of storm-net?

@jgeorg02
Author

> > networks:
> >   storm-net:
> >       external: true
>
> You left out the most important information. What's the configuration of storm-net?

I apologize for the inconvenience.
I always create it with the command: docker network create -d overlay storm-net, but I also tried: docker network create -d overlay --attachable --opt com.docker.network.driver.mtu=1350 storm-net

@corhere
Contributor

corhere commented May 17, 2023

I see that you set the data path port to 7777. I suggest you try changing it to the default of 4789. I have a hunch that the Ubuntu Docker package installs ufw configuration to allow incoming traffic on the default data path port only.
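
To make that check concrete, the sketch below reads the data path port out of `docker info` text and lists the ports Docker documents for swarm traffic (2377/tcp, 7946/tcp and udp, and the data path port over udp), so they can be compared against whatever firewall rules exist on each node. The function names are hypothetical, and the parsing assumes the "Data Path Port:" line format shown in the output above.

```python
import re
from typing import List, Optional

DEFAULT_DATA_PATH_PORT = 4789


def parse_data_path_port(docker_info_text: str) -> Optional[int]:
    """Extract the 'Data Path Port' value from `docker info` output, if present."""
    match = re.search(
        r"^\s*Data Path Port:\s*(\d+)\s*$", docker_info_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def required_swarm_ports(data_path_port: int = DEFAULT_DATA_PATH_PORT) -> List[str]:
    """Ports that must be reachable between swarm nodes."""
    return [
        "2377/tcp",  # cluster management
        "7946/tcp",  # node-to-node communication
        "7946/udp",
        f"{data_path_port}/udp",  # VXLAN overlay data path
    ]


if __name__ == "__main__":
    sample = "  SubnetSize: 24\n  Data Path Port: 7777\n"
    port = parse_data_path_port(sample) or DEFAULT_DATA_PATH_PORT
    print(required_swarm_ports(port))
```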

@jgeorg02
Author

> I see that you set the data path port to 7777. I suggest you try changing it to the default of 4789. I have a hunch that the Ubuntu Docker package installs ufw configuration to allow incoming traffic on the default data path port only.

To be honest, the only reason I changed the data path port was that some posts online suggested the port might be the problem. I experience this problem even with the default data path port.

@hlafaille

This comment was marked as off-topic.

@tusooa

tusooa commented Jul 17, 2023

Same boat. I was using Ubuntu 20.04 and it worked, but as soon as I upgraded to 22.04 everything stopped working. Absolutely no connectivity between swarm nodes. Single-node swarm still works but as soon as the network goes beyond the one machine there is no connectivity. Pinging containers works, but curling does not. No firewall on it. Port numbers are the default.

@corhere
Contributor

corhere commented Jul 17, 2023

> Something seems to go wrong when using Docker Swarm on Ubuntu 22.04. [...] There isn't something wrong with the services since they were working just fine when I used to have Ubuntu 20.04.

> Same boat. I was using Ubuntu 20.04 and it worked, but as soon as I upgraded to 22.04 everything stopped working.

@tusooa what's your docker version output? Are you also running 20.10.21-0ubuntu1~22.04.3?

@jgeorg02 @tusooa did you both upgrade Ubuntu 20.04 -> 22.04 in-place, i.e. without reinstalling the operating system from scratch? How did you upgrade?

@tusooa

tusooa commented Jul 17, 2023

> > Something seems to go wrong when using Docker Swarm on Ubuntu 22.04. [...] There isn't something wrong with the services since they were working just fine when I used to have Ubuntu 20.04.
>
> > Same boat. I was using Ubuntu 20.04 and it worked, but as soon as I upgraded to 22.04 everything stopped working.
>
> @tusooa what's your docker version output? Are you also running 20.10.21-0ubuntu1~22.04.3?
>
> @jgeorg02 @tusooa did you both upgrade Ubuntu 20.04 -> 22.04 in-place, i.e. without reinstalling the operating system from scratch? How did you upgrade?

20.10.21-0ubuntu1~22.04.3

Yes, it was an in-place upgrade, but I also tried a new 22.04 installation, with the same result.

@jgeorg02
Author

jgeorg02 commented Jul 18, 2023

> > Something seems to go wrong when using Docker Swarm on Ubuntu 22.04. [...] There isn't something wrong with the services since they were working just fine when I used to have Ubuntu 20.04.
>
> > Same boat. I was using Ubuntu 20.04 and it worked, but as soon as I upgraded to 22.04 everything stopped working.
>
> @tusooa what's your docker version output? Are you also running 20.10.21-0ubuntu1~22.04.3?
>
> @jgeorg02 @tusooa did you both upgrade Ubuntu 20.04 -> 22.04 in-place, i.e. without reinstalling the operating system from scratch? How did you upgrade?

I also upgraded from Ubuntu 20.04 -> 22.04. I tried uninstalling and reinstalling Docker, but I still couldn't make it work. I also tried a fresh installation of Ubuntu 22.04, and it still didn't work. My Docker Engine version is 23.0.5.

@cinco

cinco commented Sep 15, 2023

I have the same problem. In the same Docker swarm, I have Ubuntu 20.04 nodes where the overlay network works between them. On the 22.04 nodes, the overlay network does not work, either between 22.04 nodes or between 22.04 and 20.04 nodes.

@jgeorg02
Author

I still experience the same issue, and I haven't found any way to solve it.

@jgeorg02
Author

jgeorg02 commented Nov 1, 2023

Hello everyone, I can confirm that this bug is fixed with the updated version of Docker :) Everything is working properly now (at least for me).

5 participants