Docker Swarm on Windows 2019 ingress routing not working on some systems #39065

Closed
drnybble opened this issue Apr 12, 2019 · 25 comments

Comments

@drnybble

Description

I create a simple stack to run IIS. It is not reachable through ingress routing on my VM, either via localhost or from a remote machine.

Steps to reproduce the issue:
Deploy the following stack:

version: '3.3'

networks:
  mynet:
    driver: overlay
    attachable: true

services:
  iis:
    image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
    networks:
    - mynet
    ports:
    - "8000:80"

Then:

docker stack deploy -c .\iis.yml iis

Describe the results you received:
From another machine try to access port 8000 -> fails with unable to connect

Describe the results you expected:
Able to connect to IIS on port 8000.

Additional information you deem important (e.g. issue happens only occasionally):
This same testcase works on two other environments I have tried:

  • Windows 2019 running under vmware
  • Windows 2019 running under VirtualBox

This VM runs under KVM. Whether that is the reason I am not sure.

Also, if I just run the IIS container not in the Swarm so it uses the NAT network it works:

docker run -p 8000:80 mcr.microsoft.com/windows/servercore/iis

Thus, it does not appear to be firewall related (and firewall is disabled on this box).

Looking for next steps or diagnostics to understand what is going wrong.

Also -- is it a documented limitation on Windows that ingress routing is not accessible via localhost on a Swarm node? Means I cannot run a Docker registry in the Swarm and access it via localhost on Swarm nodes -- it works on Linux.

Output of docker version:

Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        142dfcedca
 Built:             02/28/2019 06:33:17
 OS/Arch:           windows/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.24)
  Go version:       go1.10.8
  Git commit:       142dfcedca
  Built:            02/28/2019 06:31:15
  OS/Arch:          windows/amd64
  Experimental:     false

Output of docker info:

Containers: 3
 Running: 1
 Paused: 0
 Stopped: 2
Images: 4
Server Version: 18.09.3
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: active
 NodeID: 63hma7e6j05oufpra909ami07
 Is Manager: true
 ClusterID: ljqnmsarwwujbmtnevq2xbb6l
 Managers: 3
 Nodes: 3
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 9.28.238.149
 Manager Addresses:
  9.28.238.142:2377
  9.28.238.144:2377
  9.28.238.149:2377
Default Isolation: process
Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
Operating System: Windows Server 2019 Datacenter Version 1809 (OS Build 17763.316)
OSType: windows
Architecture: x86_64
CPUs: 8
Total Memory: 12GiB
Name: octopus1
ID: 6MCE:SRL5:J2UB:3SD4:VFGD:SDHZ:QRCG:HZND:EDZQ:WBKT:ANEH:T3VO
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

@ghost

ghost commented Apr 26, 2019

We're seeing this too. We have tried this with Windows Server 2019 on VMware and on Hyper-V, with no luck. It is the exact same scenario: standalone containers on the nat network are accessible from the host, but the published ports of any service started in Swarm are not reachable from the host through ingress. Hopefully this will get some attention, as we can't find any help anywhere on this topic.

@olljanat
Contributor

@drnybble @cmahoski there has been a long discussion about similar issues in docker/for-win#1476, but ingress routing should work fine on Windows Server 2019 from both localhost and remote machines.

However, there were some issues with Windows Server 2019 after it was released (#38498), so please install the latest Windows updates and the latest Docker version, then try again.

@drnybble
Author

drnybble commented May 14, 2019

Just tried again with the 2019-05 cumulative update. Under a VMWare hypervisor:

The problem I am describing is that Windows 2019 running under the KVM hypervisor is not working at all (cannot access remotely either). Looking for next steps to debug/diagnose -- logging etc.

@jorioux

jorioux commented May 21, 2019

Exactly the same issue here. I'm on WS2019 build 17763, running Docker EE version 18.09.6.

@troyha

troyha commented Jun 13, 2019

Having the same issue on WS2019 build 17763.557 and running docker EE version 18.09.6.
ERR_CONNECTION_REFUSED when trying to connect to container using hostname on host, but able to connect from other hosts. Everywhere I read this is supposed to be working in 2019.
I also can't seem to find an actual link to Windows Support for Docker EE, which is also supposed to be a thing, just lots of links to other documents which say it should be working even though it isn't.

@thaJeztah
Member

/cc @ddebroy ^^

@troyha

troyha commented Jun 13, 2019

The above issue was occurring in an on-premise environment behind a corporate firewall where I had to install Docker manually using the instructions here. This essentially just has you unpack the zip file into program files, rather than installing by package.

I have since created a VM on Azure and tested both installing manually and installing via the package with the following commands:

  • Install-WindowsFeature -Name Containers
  • Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
    -- Y to install nuget
  • Install-Package -Name docker -ProviderName DockerMsftProvider
  • Restart-Computer -Force
  • Start-Service docker

Installing manually had the same issue as above, however using the package commands I can connect to the container on the host using hostname:port, but not using localhost:port, which is fine using hostname for me anyway.

Can anyone tell me why/what the difference between the package installation and manual is? Is there a way to manually configure/install whatever is missing to bridge the gap? This would be a lifesaver!

@troyha

troyha commented Jun 17, 2019

So I managed to configure PowerShell to use webproxy server to do a package install and Docker worked after doing this, however I am still having the following issues:

  • Can connect from host to container on host using IP address only, not hostname or localhost. Not too worried about this, as using the IP address is fine.
  • Cannot connect to services on the host from within a container using the IP address, e.g. to a DB, logging server, etc. on the same host. This is very inconvenient as I now need to ensure that containers never have to reach a service on their own host, e.g. Seq logging is a single-server logging instance, so services that would have been running on that node cannot log to it. I will either have to put this on a completely separate server (resources are limited so this is an issue), or figure out a way to route communications through another node first. It also raises issues with HA for services only running on 2/3 nodes, because if one of the 2 nodes that service is running on goes down, then the other is useless for the Docker services running on that node anyway.
    If the services had official Windows 2019 Docker images then I could run them in the Swarm and the issue would be solved, but it shouldn't have to be that way, should it? I thought Windows 2019 was supposed to have solved these issues?
    Is there anything I am missing? From all the documentation it seems like this "should" be working. Not having any issues with Docker on Linux, only on Windows. Unfortunately this client won't use Linux servers so Windows is the only option.

@troyha

troyha commented Jun 17, 2019

I am getting 3 Warnings in Application Events when running swarm init:

  • Failed to set datapath keys in driver overlay: not implemented
  • Failed the node discovery in driver: not implemented
  • Failed to set datapath keys in driver: not implemented

@drnybble
Author

drnybble commented Jul 3, 2019

Tried again with Docker 18.09.7 and my original problem persists. This problem is that I cannot connect to my exposed port on the ingress network from localhost OR remotely.

I verified with WireShark that my system receives the incoming connection to port 8000 but a RST is immediately sent indicating that my system is not listening on 8000.
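For anyone who wants to check the same symptom without a packet capture, here is a minimal sketch in Python. It assumes Python is available on the machine doing the test; `probe_tcp` is a hypothetical helper name, not part of Docker. A "refused" result generally corresponds to the RST described above, while "timeout" more likely means the packets are being dropped somewhere before a listener.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a RST: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and similar.
        return f"error: {exc}"
```

Calling `probe_tcp("localhost", 8000)`, `probe_tcp("127.0.0.1", 8000)`, and the same with the node's hostname or public IP, from both the Swarm node and a remote machine, should show which paths into the published port work.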

PS C:\Users\Administrator> docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                                                                 PORTS
wdrlcq94ehcp        iis_iis             replicated          1/1                 mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019   *:8000->80/

Any updates on this issue? Diagnostics to capture?

Also of note: my machine has two NICs. The external vswitch is created by Docker against the private NIC not the public one. Does that matter?

@drnybble
Author

drnybble commented Jul 4, 2019

I got it to work! I disabled the other NICs on this system so that only the NIC with the public IP was enabled. Otherwise it seems to create the external vswitch against an arbitrary NIC, I even saw it create it against the Npcap loopback adapter used by WireShark.

So the next question: on a multi-NIC system how to control how the external vswitch is created; and is this documented somewhere?

Related: docker/for-win#1399

@drnybble
Author

drnybble commented Jul 4, 2019

How to customize the ingress network: https://docs.docker.com/network/overlay/#customize-the-default-ingress-network

See also: https://docs.microsoft.com/en-us/virtualization/windowscontainers/container-networking/advanced#bind-a-network-to-a-specific-network-interface

First I did 'docker network inspect ingress' to see the subnet/gateway settings. Then remove it and recreate. Here is an example using the com.docker.network.windowsshim.interface option to specify the interface:

 docker network create --driver overlay --ingress --subnet=10.255.0.0/16 --gateway=10.255.0.1 --opt "com.docker.network.windowsshim.interface=Ethernet 2" ingress

Also, the network configuration is used on all nodes, so you had better hope the NIC has the same name everywhere.

You have to restart the Docker service for this to take effect.
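As a rough sketch of the inspect-then-recreate step, the Python snippet below reads the subnet and gateway out of `docker network inspect ingress` JSON and builds the matching `docker network create` arguments. The helper name `ingress_create_args` and the `SAMPLE_INSPECT` data are illustrative assumptions; real inspect output has many more fields. Only the flags shown in the command above are used.

```python
import json
from typing import List

# Trimmed, illustrative stand-in for `docker network inspect ingress` output.
SAMPLE_INSPECT = """
[
  {
    "Name": "ingress",
    "Driver": "overlay",
    "IPAM": {
      "Config": [
        {"Subnet": "10.255.0.0/16", "Gateway": "10.255.0.1"}
      ]
    }
  }
]
"""


def ingress_create_args(inspect_json: str, interface: str) -> List[str]:
    """Build the argv that recreates the ingress network with its current
    subnet and gateway, bound to the named Windows network interface."""
    network = json.loads(inspect_json)[0]
    ipam = network["IPAM"]["Config"][0]
    return [
        "docker", "network", "create",
        "--driver", "overlay",
        "--ingress",
        "--subnet=" + ipam["Subnet"],
        "--gateway=" + ipam["Gateway"],
        "--opt", "com.docker.network.windowsshim.interface=" + interface,
        "ingress",
    ]
```

Passing the returned list to `subprocess.run` avoids quoting problems with the space in an interface name such as "Ethernet 2". The existing ingress network still has to be removed first with `docker network rm ingress`, and per the Docker documentation linked above, services connected to it may need to be removed before that succeeds.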

@drnybble drnybble closed this as completed Jul 4, 2019
@djarvis

djarvis commented Feb 19, 2020

ERR_CONNECTION_REFUSED when trying to connect to container using hostname on host, but able to connect from other hosts.

Same here. I can connect from other hosts no problem, cannot connect from local machine with localhost, 127.0.0.1, host name, any of the network IPs given in docker network inspect, etc.

Docker version 19.03.5
Windows Server 2019 Standard 1809 17763.1039

docker network create --driver=overlay xxx

@liraelia

I have a similar problem. My host machine is running:
Docker version 19.03.5
Windows Server 2019 Standard 1809 17763.1039
and I cannot connect from anywhere, not only my local machine.
My images are on version 17763.1039 as well.

I tested my compose file with the same images (version 17763.1039) on an older version of Windows (17763.107) today and everything works the way I would expect. I can access my containers by going to http://hostname:port.
So the problem for me only happens when host machine is running the latest version of OS.

@djarvis

djarvis commented May 18, 2020

Windows Server 2019 running Docker/Swarm, ingress network was working fine until this was installed:

2020-05 Cumulative Update for Windows Server 2019 (1809) for x64-based Systems (KB4551853)

This broke something with the ingress network such that no traffic could enter through any exposed/published ports.

Uninstalling this update made it all work again.

@timparkinson

This broke something with the ingress network such that no traffic could enter through any exposed/published ports.

I believe I might have just come across the same thing on a new swarm setup - only a single port works when multiple are published. I'll try and get a simple repo together when I have time.

@masaeedu
Contributor

@djarvis Would you happen to know if there's a canonical issue somewhere in the issue tracker about this problem?

@djarvis

djarvis commented Jun 10, 2020

@djarvis Would you happen to know if there's a canonical issue somewhere in the issue tracker about this problem?

There is this: #40998

I have a ticket open with Microsoft as well.

@simmohall

This broke something with the ingress network such that no traffic could enter through any exposed/published ports.

I believe I might have just come across the same thing on a new swarm setup - only a single port works when multiple are published. I'll try and get a simple repo together when I have time.

@timparkinson I might have a similar issue to yours: #40606. I think it is unrelated to the 2020-05 Cumulative Update, though, as I've had the problem for a while :)

@KhimairaCrypto

not sure if my issue is related or not #41094

@FrankAtHexagon

Is this issue resolved? My team is seeing a similar issue where we're unable to access a service from the machine it's running on. This is a breaking issue in most cases for using Swarm.

@olljanat
Contributor

olljanat commented Feb 4, 2021

@FrankAtHexagon unfortunately Microsoft looks to be constantly breaking Swarm compatibility. The last known good state is Windows Server 2019 with https://support.microsoft.com/en-us/topic/october-20-2020-kb4580390-os-build-17763-1554-preview-ac4799c9-838f-8665-a968-0f19b6cb1049

See #40998 and #41354.

@psandeep09

  • it works on Linu

is it resolved?

@bjork-dev

The localhost issue is still present in Windows Server 2022.... Running a 3-node swarm cluster but the services cannot communicate with e.g RabbitMQ on their own host, basically making it a 2-node cluster...

There's next to no mentions of this issue anywhere except here, did anyone ever find a resolution to this!?

@djarvis

djarvis commented Mar 29, 2023

@bjork-dev I don't know about Windows Server 2022. But Microsoft has really dropped the ball on anything Docker related. Any sort of enterprise support now goes straight through Mirantis, so paid support is probably the way to get better help.
