Unable to create a swarm using RHEL on Azure #33345
Comments
Running the same test using the
cc @friism
@briantd since this is Docker EE, this should be handled through support; can you open a ticket there?
CentOS checks out too (i.e.
As CentOS is CE, I used a different engine version:
For posterity, I just re-tested on Azure RHEL (fresh VMs) using the latest CE and got the same swarm error:
@friism ^^
Another data point: confirmed that the latest CE (RC4) can start a swarm on UbuntuLTS Azure VMs. RHEL remains the outlier.
@thaJeztah do you think this could be the networkmonitor thing?
@briantd out of curiosity, do you have NetworkManager running?
Results of explicitly specifying
@briantd Have you checked iptables rules on the nodes (especially the one running the manager)? The port might just be blocked unless explicitly allowed (via firewalld).
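That check can be made concrete with something like the sketch below. It assumes firewalld is active (the RHEL 7 default); the port list is the standard set swarm mode uses, not something quoted in this thread:

```shell
# Inspect what the active zone currently allows.
sudo firewall-cmd --list-all

# Open the ports swarm mode needs: 2377/tcp (cluster management),
# 7946/tcp+udp (node discovery), 4789/udp (overlay network traffic).
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload

# Raw iptables view, useful when firewalld is not the one adding rules.
sudo iptables -L -n -v --line-numbers
```

If firewalld is stopped, the raw iptables listing is the more telling output, since rules could have been injected by something other than firewalld.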
I don't see anything obvious in the chain rules (see below). Also, in the ticket above I try an experiment whereby I expose an iptables output:
UPDATE: @cpuguy83 found the root cause and a workaround
ping @sanimej @fcrisciani @ddebroy is this something that we can change / fix / check for?
@cpuguy83 How was the ICMP rule affecting the gRPC connection? Was it a PMTU issue?
It's not an ICMP rule; it blocks everything from everywhere.
@briantd Do we know:
1. If the above rule that blocked everything was present during your nginx-on-port-2377 experiment? In other words, is Docker adding the rule on RHEL, or someone else?
2. Is the rule absent in iptables on the other distros you tested?
In case we are trying to figure out who is adding the rule, one suspect may be the Azure Linux agent script for RHEL.
The rule exists on a fresh RHEL on Azure setup (no Docker).
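For readers landing here, the workaround presumably amounts to removing (or overriding) that catch-all REJECT rule. A hedged sketch follows; the exact rule text and line number will differ per host, and this thread does not quote the precise commands used:

```shell
# Find any catch-all REJECT rules in the INPUT chain, with line numbers.
sudo iptables -L INPUT -n --line-numbers | grep -i reject

# Supposing the offending rule turned out to be at line 5, delete it by
# number (verify against your own listing first):
sudo iptables -D INPUT 5
```

Note that an `iptables -D` change is not persistent across reboots; if firewalld (or the image's provisioning agent) re-adds the rule, the durable fix is to allow the swarm ports in whatever is managing the ruleset.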
- Brian Goff
I started a thread with the Azure Linux people last week about modifying this. I'm somewhat skeptical that we can actually get them to change it, though.
Let me close this ticket for now, as it looks like it went stale, although we do have some Azure people on this thread now 😅😉
Description
I'm unable to join a node to a freshly initialized swarm. Attempting to use the worker join-token from the swarm leader on a prospective worker yields this:
I checked the logs in /var/log/messages:
I also confirmed the clocks are not suffering from skew: the time is identical on both servers.
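For reference, the init/join flow being attempted looks roughly like this (the 10.0.0.4 address is a placeholder, not taken from this report):

```shell
# On the intended swarm leader: initialize the swarm, advertising the
# VM's private address.
docker swarm init --advertise-addr 10.0.0.4

# `docker swarm init` prints a ready-made join command with a worker
# token; run it on the prospective worker:
docker swarm join --token <WORKER-TOKEN> 10.0.0.4:2377

# The worker token can be re-printed on the leader at any time:
docker swarm join-token worker
```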
To confirm that this is not a network issue involving network security groups or equivalent, I spun up an nginx container on port 2378 on the swarm leader node, and attempted to connect.
Output of `sudo netstat -plnt` (this shows dockerd listening on port 2377 (swarm) and docker-proxy listening on 2378 (nginx)):
Sure enough, I can connect to nginx from the other node:
Trying to connect to the swarm port (2377) produces an odd error:
It seems that the "no route to host" is an artifact of running in Azure. Basically if Azure isn't aware of a service listening on that port, it reports that error message.
Thinking this might be a special issue with that port, I removed the swarm leader from the swarm (`docker swarm leave --force`) and spun up another nginx instance on the leader node, this time on port 2377. Note: this time it's docker-proxy listening...
Now telnet can connect to port 2377:
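The port-reachability experiment above can be reproduced without telnet using bash's built-in `/dev/tcp`. This sketch stands up a throwaway local listener in place of nginx so it is self-contained (python3 is an assumption; swap HOST/PORT for the manager's IP and 2377 in a real check):

```shell
#!/usr/bin/env bash
HOST=127.0.0.1
PORT=2378

# Throwaway stand-in for the nginx/docker-proxy listener from the thread.
python3 -c "
import socket
s = socket.socket()
s.bind(('$HOST', $PORT))
s.listen(1)
s.accept()
" &
LISTENER=$!
sleep 1

# Attempt a TCP connect via bash's /dev/tcp, with a short timeout.
if timeout 2 bash -c "echo > /dev/tcp/$HOST/$PORT" 2>/dev/null; then
  RESULT=open
else
  RESULT=closed
fi
echo "port $PORT is $RESULT"
kill $LISTENER 2>/dev/null || true
```

A filtered port (like the REJECT rule discussed later in the thread) shows up here as `closed`, the same symptom telnet reported.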
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Node successfully joins the swarm.
Additional information you deem important (e.g. issue happens only occasionally):
I've tried using both 10.0.0.0/16 and 172.17.0.? IP schemes for the underlying VMs; that didn't make a difference.
Output of `docker version`:
Output of `docker info`:
Additional environment details (AWS, VirtualBox, physical, etc.):
Azure