Unable to retrieve user's IP address in docker swarm mode #25526

PanJ opened this Issue Aug 9, 2016 · 104 comments


PanJ commented Aug 9, 2016

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 22:00:36 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 155
 Running: 65
 Paused: 0
 Stopped: 90
Images: 57
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 868
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host overlay null bridge
Swarm: active
 NodeID: 0ddz27v59pwh2g5rr1k32d9bv
 Is Manager: true
 ClusterID: 32c5sn0lgxoq9gsl1er0aucsr
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot interval: 10000
  Heartbeat tick: 1
  Election tick: 3
 Dispatcher:
  Heartbeat period: 5 seconds
 CA configuration:
  Expiry duration: 3 months
 Node Address: 172.31.24.209
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-92-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.42 GiB
Name: ip-172-31-24-209
ID: 4LDN:RTAI:5KG5:KHR2:RD4D:MV5P:DEXQ:G5RE:AZBQ:OPQJ:N4DK:WCQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: panj
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

Steps to reproduce the issue:

  1. Run the following service, which publishes port 80:
docker service create \
--name debugging-simple-server \
--publish 80:3000 \
panj/debugging-simple-server
  2. Try connecting with http://<public-ip>/.

Describe the results you received:
Neither ip nor header.x-forwarded-for is the correct user's IP address.

Describe the results you expected:
ip or header.x-forwarded-for should be the user's IP address. The expected result can be achieved using a standalone docker container: docker run -d -p 80:3000 panj/debugging-simple-server. You can see both of the results via the following links:
http://swarm.issue-25526.docker.takemetour.com:81/
http://container.issue-25526.docker.takemetour.com:82/

Additional information you deem important (e.g. issue happens only occasionally):
This happens on both global mode and replicated mode.

I am not sure if I missed anything that should solve this issue easily.

In the meantime, I think I have to use a workaround, which is running a proxy container outside of swarm mode and letting it forward to the published port in swarm mode (SSL termination should be done on this container too). This defeats the purpose of swarm mode's self-healing and orchestration.

Member

thaJeztah commented Aug 9, 2016

/cc @aluzzardi @mrjana ptal

Contributor

mavenugo commented Aug 9, 2016

@PanJ can you please share some details on how debugging-simple-server determines the ip? Also, what is the expectation if a service is scaled to more than 1 replica across multiple hosts (or in global mode)?

PanJ commented Aug 9, 2016

@mavenugo it's Koa's request object, which uses Node's remoteAddress from the net module. The result should be the same for any other library that can retrieve the remote address.

The expectation is that the ip field should always be the remote address, regardless of any configuration.

marech commented Sep 19, 2016

@PanJ are you still using your workaround, or have you found a better solution?

sanimej commented Sep 19, 2016

@PanJ When I run your app as a standalone container..

docker run -it --rm -p 80:3000 --name test panj/debugging-simple-server

and access the published port from another host I get this

vagrant@net-1:~$ curl 192.168.33.12
{"method":"GET","url":"/","header":{"user-agent":"curl/7.38.0","host":"192.168.33.12","accept":"*/*"},"ip":"::ffff:192.168.33.11","ips":[]}
vagrant@net-1:~$

192.168.33.11 is the IP of the host in which I am running curl. Is this the expected behavior ?

PanJ commented Sep 19, 2016

@sanimej Yes, it is the expected behavior that should be on swarm mode as well.

PanJ commented Sep 19, 2016

@marech I am still using the standalone container as a workaround, which works fine.

In my case, there are two nginx instances: a standalone instance and a swarm instance. SSL termination and reverse proxying are done on the standalone nginx. The swarm instance is used to route to other services based on the request host.

sanimej commented Sep 19, 2016

@PanJ The way the published port of a container is accessed is different in swarm mode. In swarm mode a service can be reached from any node in the cluster. To facilitate this we route through an ingress network. 10.255.0.x is the address of the ingress network interface on the host in the cluster from which you try to reach the published port.
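
For reference, a quick way to confirm this on a manager node is to look at the ingress overlay network itself (a minimal sketch; the --format templates below are just one way to pull out the fields):

# Show the ingress network's subnet (typically 10.255.0.0/16)
docker network inspect ingress --format '{{json .IPAM.Config}}'

# List the ingress endpoints on this node; the 10.255.0.x peer address a
# service task sees corresponds to one of these interfaces
docker network inspect ingress --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'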

PanJ commented Sep 19, 2016

@sanimej I kind of saw how it works when I dug into the issue. But the use case (the ability to retrieve the user's IP) is quite common.

I have limited knowledge on how the fix should be implemented. Maybe a special type of network that does not alter source IP address?

Rancher is similar to Docker swarm mode and it seems to have the expected behavior. Maybe it is a good place to start.

marech commented Sep 20, 2016

@sanimej A good idea could be to add all IPs to the X-Forwarded-For header if possible; then we can see the whole chain.

@PanJ hmm, and how does your standalone nginx container communicate with the swarm instance, via service name or IP? Maybe you can share the part of the nginx config where you pass traffic to the swarm instance.

PanJ commented Sep 20, 2016

@marech The standalone container listens on port 80 and then proxies to localhost:8181:

server {
  listen 80 default_server;
  location / {
    proxy_set_header        Host $host;
    proxy_set_header        X-Real-IP $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header        X-Forwarded-Proto $scheme;
    proxy_pass          http://localhost:8181;
    proxy_read_timeout  90;
  }
}

If you have to do SSL termination, add another server block that listens on port 443, does the SSL termination, and proxies to localhost:8181 as well.

Swarm mode's nginx publishes 8181:80 and routes to other services based on the request host:

server {
  listen 80;
  server_name your.domain.com;
  location / {
    proxy_pass          http://your-service:80;
    proxy_set_header Host $host;
    proxy_read_timeout  90;
  }
}

server {
  listen 80;
  server_name another.domain.com;
  location / {
    proxy_pass          http://another-service:80;
    proxy_set_header Host $host;
    proxy_read_timeout  90;
  }
}

o3o3o commented Oct 24, 2016

In our case, our API rate limiting and other functions depend on the user's IP address. Is there any way to work around this problem in swarm mode?

dack commented Nov 1, 2016

I've also run into the issue when trying to run logstash in swarm mode (for collecting syslog messages from various hosts). The logstash "host" field always appears as 10.255.0.x, instead of the actual IP of the connecting host. This makes it totally unusable, as you can't tell which host the log messages are coming from. Is there some way we can avoid translating the source IP?

vfarcic commented Nov 2, 2016

+1 for a solution for this issue.

The inability to retrieve the user's IP prevents us from using monitoring solutions like Prometheus.

dack commented Nov 2, 2016

Perhaps the linux kernel IPVS capabilities would be of some use here. I'm guessing that the IP change is taking place because the connections are being proxied in user space. IPVS, on the other hand, can redirect and load balance requests in kernel space without changing the source IP address. IPVS could also be good down the road for building in more advanced functionality, such as different load balancing algorithms, floating IP addresses, and direct routing.
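
For anyone unfamiliar with it, this is roughly what the kernel-space approach looks like with ipvsadm in direct-routing mode (a sketch only, with placeholder addresses; this is not how swarm configures its mesh today):

# Create a virtual service on the VIP with round-robin scheduling
ipvsadm -A -t 203.0.113.10:80 -s rr

# Add real servers in direct-routing (gatewaying) mode; packets keep their
# original source IP and replies go straight back to the client
ipvsadm -a -t 203.0.113.10:80 -r 192.168.33.11:80 -g
ipvsadm -a -t 203.0.113.10:80 -r 192.168.33.12:80 -g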

vfarcic commented Nov 2, 2016

For me, it would be enough if I could somehow find out the relation between the virtual IP and the IP of the server the endpoint belongs to. That way, when Prometheus sends an alert related to some virtual IP, I could find out which server is affected. It would not be a good solution but it would be better than nothing.

dack commented Nov 2, 2016

@vfarcic I don't think that's possible with the way it works now. All client connections come from the same IP, so you can't translate it back. The only way that would work is if whatever is doing the proxy/nat of the connections saved a connection log with timestamp, source ip, and source port. Even then, it wouldn't be much help in most use cases where the source IP is needed.

vfarcic commented Nov 2, 2016

I probably did not explain well the use case.

I use Prometheus, which is configured to scrape exporters that are running as Swarm global services. It uses tasks.<SERVICE_NAME> to get the IPs of all replicas. So, it's not using the service but the replica endpoints (no load balancing). What I'd need is to somehow figure out the IP of the node where each of those replica IPs comes from.

vfarcic commented Nov 3, 2016

I just realized that "docker network inspect <NETWORK_NAME>" provides information about containers and IPv4 addresses of a single node only. Can this be extended so that there is cluster-wide information about a network together with the nodes?

Something like:

       "Containers": {
            "57bc4f3d826d4955deb32c3b71550473e55139a86bef7d5e584786a3a5fa6f37": {
                "Name": "cadvisor.0.8d1s6qb63xdir22xyhrcjhgsa",
                "EndpointID": "084a032fcd404ae1b51f33f07ffb2df9c1f9ec18276d2f414c2b453fc8e85576",
                "MacAddress": "02:42:0a:00:00:1e",
                "IPv4Address": "10.0.0.30/24",
                "IPv6Address": "",
                "Node": "swarm-4"
            },
...

Note the addition of the "Node".

If such information were available for the whole cluster, not only a single node, with the addition of a --filter argument, I'd have everything I need to figure out the relation between a container IPv4 address and the node. It would not be a great solution but still better than nothing. Right now, when Prometheus detects a problem, I need to execute "docker network inspect" on each node until I find the location of the address.
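
In case it helps anyone, that manual search can at least be scripted; a rough sketch (it assumes SSH access to every node, and the overlay network name and target IP are placeholders):

TARGET_IP=10.0.0.30
for node in $(docker node ls --format '{{.Hostname}}'); do
  # Ask each node which container on the overlay network owns the IP
  ssh "$node" docker network inspect my-overlay-network \
    --format '{{range .Containers}}{{.IPv4Address}} {{.Name}}{{"\n"}}{{end}}' \
    | grep "$TARGET_IP" && echo "found on $node"
done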

tlvenn commented Nov 3, 2016

I agree with @dack: given that the ingress network is using IPVS, we should solve this issue using IPVS so that the source IP is preserved and presented to the service correctly and transparently.

The solution needs to work at the IP level so that any service not based on HTTP can still work properly as well (we can't rely on HTTP headers...).

And I can't stress enough how important this is; without it, there are many services that simply can't operate at all in swarm mode.

tlvenn commented Nov 3, 2016

@kobolog might be able to shed some light on this matter given his talk on IPVS at DockerCon.

ljb2of3 commented Nov 4, 2016

Just adding myself to the list. I'm using logstash to accept syslog messages, and they're all getting pushed into elasticsearch with the host IP set to 10.255.0.4, which makes it unuseable, and I'm going to have to revert to my non-containerized logstash deployment if there's no fix for this.

Contributor

mavenugo commented Nov 4, 2016

@mrjana can you please add the suggestion you had to work around this problem?

Contributor

mrjana commented Nov 4, 2016

IPVS is not a userspace reverse proxy that can fix things up at the HTTP layer. That is the difference between a userspace proxy like HAProxy and this. If you want to use HAProxy you could do that by putting an HAProxy in the cluster and having all your service instances and HAProxy participate in the same network. That way HAProxy can fix up the HTTP header.x-forwarded-for. Or, if the L7 load balancer is external to the cluster, you can use the upcoming (in 1.13) feature for a new PublishMode called Host PublishMode, which will expose each of the individual instances of the service on its own individual port, and you can point your external load balancer to that.

dack commented Nov 5, 2016

@mrjana The whole idea of using IPVS (instead of whatever docker currently does in swarm mode) would be to avoid translating the source IP to begin with. Adding an X-Forwarded-For might help for some HTTP applications, but it's of no use whatsoever for all the other applications that are broken by the current behaviour.

tlvenn commented Nov 5, 2016

@dack my understanding is that the Docker ingress network already uses IPVS.

tlvenn commented Nov 5, 2016

If you want to use HAProxy you could do that by putting a HAProxy in the cluster and have all your service instances and HAProxy to participate in the same network. That way HAProxy can fix up HTTP header.x-forwarded-for

That would not work either, @mrjana. The only way for HAProxy to get the client IP is to run outside the ingress network using docker run or directly on the host, but then you can't use any of your services, since they are on a different network and you can't access them.

Simply put, as far as I know there is absolutely no way to deal with this as soon as you use docker services and swarm mode.

It would be interesting if the author(s) of the docker ingress network could join the discussion, as they would probably have some insight into how IPVS is configured / operated under the hood (there are many modes for IPVS) and how we can fix the issue.

dack commented Nov 5, 2016

@tlvenn Do you know where this is in the source code? I could be wrong, but I don't believe it is using IPVS based on some things I've observed:

  • The source port is translated (the whole reason for this issue). IPVS doesn't do this. Even in NAT mode, it only translates the destination address. You need to use the default route or policy routing to send return packets back to the IPVS host.
  • When a port is published in swarm mode, all the dockerd instances in the swarm listen on the published port. If IPVS was used, then it would happen in kernel space and dockerd would not be listening on the port.

tlvenn commented Nov 6, 2016

Hi @dack,

From their blog:

Internally, we make this work using Linux IPVS, an in-kernel Layer 4 multi-protocol load balancer that’s been in the Linux kernel for more than 15 years. With IPVS routing packets inside the kernel, swarm’s routing mesh delivers high performance container-aware load-balancing.

The source code should live in the swarmkit project if I am not wrong.

I wonder if @stevvooe can help us understand what the underlying issue is here.

dack commented Nov 6, 2016

OK, I've had a brief look through the code and I think I have a slightly better understanding of it now. It does indeed appear to be using IPVS as stated in the blog. SNAT is done via an iptables rule which is set up in service_linux.go. If I understand correctly, the logic behind it would be something like this (assuming node A receives a client packet for the service running on node B):

  • Swarm node A receives the client packet. IPVS/iptables translates (src ip)->(node a ip) and (dst ip)->(node B ip)
  • The packet is forwarded to node B
  • Node B sends its reply to node A (as that's what it sees as the src ip)
  • Node A translates the src and dst back to the original values and forwards the reply to the client

I think the reasoning behind the SNAT is that the reply must go through the same node that the original request came through (as that's where the NAT/IPVS state is stored). As requests may come through any node, the SNAT is used so that the service node knows which node to route the request back through. In an IPVS setup with a single load balancing node, that wouldn't be an issue.

So, the question is then how to avoid the SNAT while still allowing all nodes to handle incoming client requests. I'm not totally sure what the best approach is. Maybe there's a way to have a state table on the service node so that it can use policy routing to direct replies instead of relying on SNAT. Or maybe some kind of encapsulation could help (VXLAN?). Or, the direct routing method of IPVS could be used. This would allow the service node to reply directly to the client (rather than via the node that received the original request) and would allow adding new floating IPs for services. However, it would also mean that the service can only be contacted via the floating IP and not the individual node IPs (not sure if that's a problem for any use cases).

tlvenn commented Nov 6, 2016

Pretty interesting discovery @dack !

Hopefully a solution will be found to skip that SNAT altogether.

In the meantime, there is a workaround that was committed not long ago, which introduces host-level port publishing with PublishMode, effectively bypassing the ingress network.

docker/swarmkit#1645

Contributor

aluzzardi commented Nov 6, 2016

Hey, thanks for the large amount of feedback - we'll take a deep look at this issue after the weekend.

Some info in the meantime:

@tlvenn: @mrjana is the main author behind the ingress network feature. Source lives in docker/libnetwork mostly, some in SwarmKit

@dack: it is indeed backed by IPVS

kobolog commented Nov 6, 2016

@tlvenn as far as I know, Docker Swarm uses masquerading, since it's the most straightforward way and guaranteed to work in most configurations. Plus, this is the only mode that actually allows masquerading ports too [re: @dack], which is handy. In theory, this issue could be solved by using IPIP encapsulation mode – the packet flow would then be like this:

  • A packet arrives at the gateway server – in our case any node of the swarm – and IPVS on that node determines that it is in fact a packet for a virtual service, based on its destination IP address and port.
  • The packet is encapsulated into another IP packet and sent over to the real server, which was chosen based on the load balancing algorithm.
  • The real server receives the enclosing packet, decapsulates it and sees the real client IP as the source and the virtual service IP as the destination. All real servers are supposed to have a non-ARPable interface alias with the virtual service IP so that they will assume that this packet is actually destined for them.
  • The real server processes the packet and sends the response back to the client directly. The source IP in this case will be the virtual service IP, so no martian replies involved, which is good.

There're, of course, many caveats and things-which-can-go-wrong, but generally this is possible and IPIP mode is widely used in production.
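
For context, the IPVS side of that IPIP idea would look roughly like this with ipvsadm (a sketch with placeholder addresses; the real-server setup shown is the textbook LVS-TUN recipe, not anything swarm does today):

# On the load-balancing node: virtual service on the VIP, tunneling (-i) to real servers
ipvsadm -A -t 203.0.113.10:80 -s rr
ipvsadm -a -t 203.0.113.10:80 -r 192.168.33.11:80 -i
ipvsadm -a -t 203.0.113.10:80 -r 192.168.33.12:80 -i

# On each real server: accept the VIP on the tunnel interface without ARPing for it
modprobe ipip
ip addr add 203.0.113.10/32 dev tunl0
ip link set tunl0 up
sysctl -w net.ipv4.conf.tunl0.arp_ignore=1
sysctl -w net.ipv4.conf.tunl0.arp_announce=2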

Hoping a solution can be found soon for this, as IP-fixation and other security checks need to be able to receive the correct external IP.

se7enack commented Nov 8, 2016

Watching. Our product leverages source IP information for security and analytics.

tlvenn commented Nov 15, 2016

@aluzzardi any update for us ?

bump, we need this to be working for a very large project we are starting early next year.

dack commented Nov 16, 2016

Examining the flow, it seems to currently work like this (in this example, node A receives the incoming traffic and node B is running the service container):

  • node A performs DNAT to direct the packet into the ingress_sbox network namespace (/var/run/docker/netns/ingress_sbox)
  • ingress_sbox on node A runs IPVS in NAT mode, which performs DNAT to direct the packet to the container on node B (via the ingress overlay network) and also SNAT to change the source IP to the node A ingress overlay network IP
  • the packet is routed through the overlay to the real server
  • the return packets follow the same path in reverse, rewriting the source/dest addresses back to the original values

I think the SNAT could be avoided with something like this:

  • node A passes the packet into ingress_sbox without any NAT (iptables/policy routing ?)
  • node A ingress_sbox runs IPVS in direct routing mode, which sends packet to node B via ingress overlay network
  • container on node B receives the unaltered packet (the container must accept packets for all public IPs, but not send ARP for them; there are several ways to do this, see the IPVS docs)
  • the return packets are sent directly from node B to the client (they do not need to go back through the overlay network or node A)

As an added bonus, no NAT state needs to be stored and overlay network traffic is reduced.
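
For anyone who wants to see the current pieces on a live node, something like this should dump the IPVS table and the SNAT rule inside the ingress namespace (a sketch; the namespace path and rule layout can vary by Docker version):

# Enter the ingress_sbox network namespace on a swarm node
nsenter --net=/var/run/docker/netns/ingress_sbox sh -c '
  # IPVS virtual services and real servers used by the routing mesh
  ipvsadm -L -n
  # The POSTROUTING SNAT rule that rewrites the source IP to the ingress interface IP
  iptables -t nat -S POSTROUTING
'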

tlvenn commented Nov 25, 2016

@aluzzardi @mrjana Any update on this please ? A little bit of feedback from Docker would be very much appreciated.

Watching. Without source IP information, most of our services can't work as expected.

tlvenn commented Nov 26, 2016

How did that happen?
(screenshot: unassign_bug)

Contributor

mavenugo commented Nov 26, 2016

@tlvenn seems like a bug in Github ?

@PanJ @tlvenn @vfarcic @dack and others, PTAL at #27917. We introduced the ability to enable service publish mode = host, which will provide a way for the service to bypass IPVS, bring back docker run -p like behavior, and retain the source IP for the cases that need it.

Pls try 1.13.0-rc2 and provide feedback.
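
Against the original reproduction, the host publish mode would look something like this (a sketch using the 1.13 long --publish syntax; global mode so every node publishes the port):

docker service create \
  --name debugging-simple-server \
  --mode global \
  --publish mode=host,published=80,target=3000 \
  panj/debugging-simple-server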

tlvenn commented Nov 26, 2016

ya pretty weird @mavenugo ..

Regarding the publish mode, I had already linked it from swarmkit above; this could be a workaround, but I truly hope a proper solution comes with Docker 1.13 to address this issue for good.

This issue could very much be categorized as a bug, because preserving the source IP is the behaviour we as users expect, and it's a very serious limitation of docker services right now.

I believe both @kobolog and @dack have come up with some potential leads on how to solve this, and it's been almost 2 weeks with no follow-up on those from the Docker side.

Could we please have some visibility on who is looking into this issue at Docker, and a status update? Thanks in advance.

Contributor

mavenugo commented Nov 26, 2016

Other than #27917, there is no other solution for 1.13. The direct-return functionality needs to be analyzed for various use cases and should not be taken lightly as a bug fix. We can look into this for 1.14. But this also falls under the category of configurable LB behavior, which includes the algorithm (rr vs. 10 other methods) and the data path (LVS-DR, LVS-NAT & LVS-TUN). If someone is willing to contribute to this, please push a PR and we can get that moving.

tlvenn commented Nov 27, 2016

Fair enough I guess @mavenugo given we have an alternative now.

At the very least, can we amend the docs for 1.13 so they clearly state that when using docker services with the default ingress publishing mode, the source IP is not preserved, and hint at using host mode if this is a requirement for running the service?

I think it will help people who are migrating to services not to get burnt by this unexpected behaviour.

Contributor

mavenugo commented Nov 27, 2016

Sure, and yes, a doc update to indicate this behavior and the workaround of using publish mode=host will be useful for such use cases that fail in LVS-NAT mode.

virtuman commented Jan 6, 2017

Just checking back in to see if there have been any new developments in getting this figured out? It certainly is a huge limitation for us as well.

Is a solution on the roadmap for docker 1.14? We have delayed deploying our solutions using docker due in part to this issue.

Would love to see a custom header added to the http/https request which preserves the client IP. This should be possible, shouldn't it? I don't mind when X_Forwarded_for is overwritten, I just want to have a custom field which is only set the very first time the request enters the swarm.

sanimej commented Feb 17, 2017

@dack @kobolog In typical deployments of LVS-Tunnel and LVS-DR mode, the destination IP in the incoming packet will be the service VIP, which is also programmed as a non-ARP IP on the real servers. The routing mesh works in a fundamentally different way: the incoming request could arrive at any of the hosts. For the real server to accept the packet (in any LVS mode), the destination IP has to be changed to a local IP. There is no way for the reply packet from the backend container to go back with the right source address. Instead of direct return, we can try to get the reply packet back to the ingress host. But there is no clean way to do it except by changing the source IP, which brings us back to square one.

@thaJeztah I think we should clarify this in the documentation, suggest using host mode if the client IP has to be preserved, and close this issue.

dack commented Feb 18, 2017

@sanimej I still don't see why it's impossible to do this without NAT. Couldn't we just have the option to use, for example, the regular LVS-DR flow? Docker adds the non-arp vip to the appropriate nodes, LVS directs the incoming packets to the nodes, and outgoing packets return directly. Why does it matter that the incoming packet could hit any host? That's no different than standard LVS with multiple frontend and multiple backend servers.

pi0 commented Feb 19, 2017

@thaJeztah thanks for the workaround :)
If you are deploying your proxy with compose file version 3, the new publish syntax is not supported, so we can patch the deployed service using this command (replace nginx_proxy with your service name):

docker service update nginx_proxy \
	--publish-rm 80 \
	--publish-add "mode=host,published=80,target=80" \
	--publish-rm 443 \
	--publish-add "mode=host,published=443,target=443"

sanimej commented Feb 21, 2017

@dack In the regular LVS-DR flow the destination IP will be the service VIP. So the LB can send the packet to the backend without any dest IP change. This is not the case with routing mesh because the incoming packet's dest IP will be one of the host's IP.

tlvenn commented Feb 21, 2017

@sanimej any feedback on the proposal above to use IPIP encapsulation mode to solve this issue ?

sanimej commented Feb 21, 2017

@tlvenn The LVS-IP tunnel works very similarly to LVS-DR, except that the backend gets the packet through an IP-in-IP tunnel rather than a mac rewrite. So it has the same problem for the routing mesh use case.

From the proposal you referred to..
The real server receives the enclosing packet, decapsulates it and sees real client IP as source and virtual service IP as destination.

The destination IP of the packet would be the IP of the host to which the client sent the packet, and not the VIP. If it's not rewritten, the real server would drop it after removing the outer IP header. If the destination IP is rewritten, the real server's reply to the client will have an incorrect source IP, resulting in connection failure.

tlvenn commented Feb 21, 2017

Thanks for the clarification @sanimej. Could you perhaps implement the PROXY protocol? It would not provide a seamless solution, but at least it would offer services a way to resolve the user's IP.

sanimej commented Feb 21, 2017

There is a kludgy way to achieve source IP preservation by splitting the source port range into blocks and assigning a block to each host in the cluster. Then it's possible to do a hybrid NAT+DR approach, where the ingress host does the usual SNAT and sends the packet to a real server. On the host where the real server is running, based on the source IP, do an SNAT to change the source port to a port in the range assigned to the ingress host. Then, on the return packet from the container, match against the source port range (and the target port) and change the source IP to that of the ingress host.

Technically this would work, but it is impractical and fragile in real deployments where cluster members are added and removed quickly. It also reduces the port space significantly.

sanimej commented Feb 21, 2017

The NAT+DR approach I mentioned wouldn't work because the source IP can't be changed on the ingress host. Changing only the source port to one in the range for that particular host, and using a routing policy on the backend host to get the packet back to the ingress host, might be an option. This still has the other issues I mentioned earlier.

lpakula commented Mar 17, 2017

@thaJeztah
Is there any workaround at the moment to forward the real IP address from the Nginx container to the web container?
I have the Nginx container running in global mode and published to the host, so the Nginx container gets a correct IP address. Both containers see each other fine; however, the web container gets the Nginx container's IP address, not the client's.
Nginx is a reverse proxy for web, and web runs uwsgi on port 8000:

server {
    resolver 127.0.0.11;
    set $web_upstream http://web:8000;

    listen 80;
    server_name domain.com;
    location / {
        proxy_pass $web_upstream;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        proxy_redirect off;
        proxy_buffering off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

pi0 commented Mar 17, 2017

@lpakula Please check my answer above + this working nginx configuration

lpakula commented Mar 17, 2017

@pi0 Thanks for reply

I'm using the nginx configuration from the link, but the IP address is still wrong; I must be missing something in my configuration.

I have a docker (17.03.0-ce) swarm cluster with an overlay network and two services:

    docker service create --name nginx --network overlay_network --mode=global \
        --publish mode=host,published=80,target=80 \
        --publish mode=host,published=443,target=443 \
        nginx:1.11.10

    docker service create --name web --network overlay_network \
        --replicas 1 \
        web:newest

The nginx container uses the latest official image: https://hub.docker.com/_/nginx/
The web container runs a uwsgi server on port 8000.

I'm using the global nginx.conf from the link, and conf.d/default.conf looks as follows:

   server {
       resolver 127.0.0.11;
       set $web_upstream http://web:8000;

       listen 80;
       server_name domain.com;
       location / {
        proxy_pass $web_upstream;
      }
  }

And then nginx container logs:

  194.168.X.X - - [17/Mar/2017:12:25:08 +0000] "GET / HTTP/1.1" 200

Web container logs:

  10.0.0.47 - - [17/Mar/2017 12:25:08] "GET / HTTP/1.1" 200 -

What is missing there?

PanJ commented Mar 17, 2017

pi0 commented Mar 17, 2017

@lpakula Ah, there is another thing: your web:newest image should honor the X-Real-IP header too. nginx won't automatically change the sender's IP; it just sends a hint header.

lpakula commented Mar 17, 2017

@pi0 @PanJ
It does make sense, thanks guys!


teohhanhui commented Apr 21, 2017

nginx supports IP Transparency using the TPROXY kernel module.

@stevvooe Can Docker do something like that too?

Contributor

stevvooe commented May 1, 2017

nginx supports IP Transparency using the TPROXY kernel module.
@stevvooe Can Docker do something like that too?

Unlikely, as the entry needs to be tracked across nodes. I'll let @sanimej or @mavenugo comment.

Can swarm provide the REST API to get the client IP address?

Member

thaJeztah commented Jul 25, 2017

@tonysongtl that's not related to this issue


kmbulebu commented Aug 8, 2017

Something else to consider is how your traffic is delivered to your nodes in a highly available setup. A node should be able to go down without creating errors for clients. The current recommendation is to use an external load balancer (ELB, F5, etc) and load balance at Layer 4 to each Swarm node, with a simple Layer 4 health check. I believe F5 uses SNAT, so the best case in this configuration is to capture the single IP of your F5, and not the real client IP.

References:
https://docs.docker.com/engine/swarm/ingress/#configure-an-external-load-balancer
https://success.docker.com/Architecture/Docker_Reference_Architecture%3A_Docker_EE_Best_Practices_and_Design_Considerations
https://success.docker.com/Architecture/Docker_Reference_Architecture%3A_Universal_Control_Plane_2.0_Service_Discovery_and_Load_Balancing

sandys commented Aug 17, 2017

Mirroring the comment above: can the PROXY protocol not be used? All cloud load balancers and haproxy use this for source IP preservation.

Calico also has ipip mode - https://docs.projectcalico.org/v2.2/usage/configuration/ip-in-ip - which is one of the reasons why github uses it. https://githubengineering.com/kubernetes-at-github/


mostolog commented Aug 24, 2017

Hi.

For the sake of understanding and completeness, let me summarize and please correct me if I'm wrong:

The main issue is that containers aren't receiving the original src-IP but the swarm VIP. I have replicated this issue with the following scenario:

  1. Create a docker swarm
  2. docker service create --name web --publish 80:80 nginx
  3. The access.log source IP is 10.255.0.7 instead of the client's browser IP

It seems:

When services within the swarm are using the (default) mesh, swarm does NAT to ensure traffic from the same origin is always sent to the same host running the service?
Hence, it's losing the original src-IP and replacing it with the swarm service's VIP.

It seems @kobolog's #25526 (comment) and @dack's #25526 (comment) proposals were refuted by @sanimej in #25526 (comment) and #25526 (comment) but, TBH, his arguments aren't fully clear to me yet, nor do I understand why the thread hasn't been closed if this is definitively impossible. @stevvooe ?

@sanimej wouldn't this work?:

  1. Swarm receives a message with src-IP=A and destination="my-service-virtual-address"
  2. The packet is sent to a swarm node running that service, encapsulating the original msg.
  3. The node forwards to the task, changing the destination to the IP of the container running that service.
    Swarm and the nodes could maintain tables to ensure traffic from the same origin is forwarded to the same node whenever possible.

Wouldn't an option to enable "reverse proxy instead of NAT" for specific services solve all these issues, satisfying everybody?

On the other hand, IIUC, the only option left is to use https://docs.docker.com/engine/swarm/services/#publish-a-services-ports-directly-on-the-swarm-node, which (again, IIUC) seems to be like not using the mesh at all, hence I don't see the benefits of using swarm mode (vs compose). In fact, it looks like pre-1.12 swarm, needing Consul and so on.

Thanks for your help and patience.
Regards

mostolog commented Aug 25, 2017

@sanimej
Even more... why is Docker not just doing port-forwarding NAT (changing only the dest IP/port)?

  1. Swarm receives the message "from A to myservice"
  2. Swarm forwards the message to a host running that service, setting dest=node1
  3. Node1 receives the message "from A to node1", and forwards it, setting dest=container1
  4. Container1 receives the message "from A to container1"
  5. To reply, the container uses the default gateway route

I'd just like to chime in; while I do understand that there is no easy way to do this, not having the originating IP address preserved in some manner severely hampers a number of application use cases. Here are a few I can think of off the top of my head:

  • Being able to have metrics detailing where your users originate from is vital for network/service engineering.

  • In many security applications you need to have access to the originating IP address in order to allow for dynamic blacklisting based upon service abuse.

  • Location awareness services often need to be able to access the IP address in order to locate the user's general location when other methods fail.

From my reading of this issue thread, it does not seem that the given work-around(s) work very well when you want to have scalable services within a Docker Swarm. Limiting yourself to one instance per worker node greatly reduces the flexibility of the offering. Also, maintaining a hybrid approach of having an LB/Proxy on the edge running as a non-Swarm orchestrated container before feeding into Swarm orchestrated containers seems like going back in time. Why should the user need to maintain 2 different paradigms for service orchestration? What about being able to dynamically scale the LB/Proxy at the edge? That would have to be done manually, right?

Could the Docker team perhaps consider these comments and see if there is some way to introduce this functionality, while still maintaining the quality and flexibility present in the Docker ecosystem?

As a further aside, I'm currently getting hit by this now. I have a web application which forwards authorized/authenticated requests to a downstream web server. Our service technicians need to be able to verify whether people have reached the downstream server, which they like to use web access logs for. In the current scenario, there is no way for me to provide that functionality as my proxy server never sees the originating IP address. I want my application to be easily scalable, and it doesn't seem like I can do this with the work-arounds presented, at least not without throwing new VMs around for each scaled instance.

trajano commented Sep 6, 2017

@Jitsusama could Kubernetes solve your issue?

trajano commented Sep 6, 2017

@thaJeztah is there a way of doing the work around using docker-compose?

I tried


services:
  math:
    build: ./math
    restart: always
    ports:
      - target: 12555
        published: 12555
        mode: host

But it seems to take 172.x.x.1 as the source IP

@trajano, I have no clue. Does Kubernetes somehow manage to get around this issue?

monotykamary commented Sep 8, 2017

@Jitsusama
Yes, they have documentation on how they preserve the source IP. It is functional, but not so pretty if you don't use a load balancer, since the packet gets dropped on nodes without those endpoints. If you plan on using Rancher as your self-hosted load balancer, unfortunately it does not yet support it.
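
(For reference, the relevant Kubernetes knob is the service's external traffic policy; roughly something like the following, per their source-IP docs. The service name here is a placeholder.)

# Preserve the client source IP for a NodePort/LoadBalancer service;
# nodes without a local endpoint will then drop rather than forward the traffic
kubectl patch svc my-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'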

@trajano

But it seems to take 172.x.x.1 as the source IP

If you're accessing your application locally, that IP should be correct (if you use swarm) since the docker_gwbridge is the interface that interacts with your proxy container. You can try accessing the app from another machine within your IP network to see if it catches the right address.

As for the compose workaround, it is possible. Here, I use the jwilder/nginx-proxy image as my frontend reverse proxy (to simplify things), along with an official nginx image as the backend service. I deploy the stack in Docker Swarm mode:

version: '3.3'

services:

  nginx-proxy:
    image: 'jwilder/nginx-proxy:alpine'
    deploy:
      mode: global
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock:ro

  nginx:
    image: 'nginx:1.13.5-alpine'
    deploy:
      replicas: 3
    ports:
      - 80
      - 443
    environment:
      - VIRTUAL_HOST=website.local
$ echo '127.0.0.1 website.local' | sudo tee -a /etc/hosts
$ docker stack deploy --compose-file docker-compose.yml website

This will create a website_default network for the stack. My endpoint is defined in the environment variable VIRTUAL_HOST and accessing http://website.local gives me:

website_nginx-proxy.0.ny152x5l9sh7@Sherry    | nginx.1    | website.local 172.18.0.1 - - [08/Sep/2017:21:33:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36"
website_nginx.1.vskh5941kgkb@Sherry    | 10.0.1.3 - - [08/Sep/2017:21:33:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36" "172.18.0.1"

Note that the end of the log line for website_nginx.1.vskh5941kgkb has a hint of the original IP (172.18.0.1). X-Forwarded-For & X-Real-IP are set in the nginx.tmpl of jwilder/nginx-proxy by default.

For port 443, I was not able to add both ports in the docker-compose file, so I just use:

docker service update website_nginx-proxy \
	--publish-rm 80 \
	--publish-add "mode=host,published=80,target=80" \
	--publish-rm 443 \
	--publish-add "mode=host,published=443,target=443" \
	--network-add "<network>"

while also adding the networks that I want to reverse-proxy, for apps containing the environment variable VIRTUAL_HOST. More granular options are possible in the documentation for jwilder/nginx-proxy, or you can create your own setup.

Ingress controllers on Kubernetes essentially do the same thing, as ingress charts (usually) have support for X-Forwarded-For and X-Real-IP, with a bit more flexibility in the choice and type of ingresses and also their deployment replicas.

sandys commented Sep 9, 2017

Traefik did add proxy_protocol support a few weeks ago and is available from v1.4.0-rc1 onwards.

sandys commented Sep 10, 2017

sandys commented Sep 11, 2017

I'm also confused about the relationship of this bug to infrakit, e.g. docker/infrakit#601. Can someone comment on the direction that docker swarm is going to take?

Will swarm roll up into infrakit? I'm especially keen on the ingress side of it.

blazedd commented Oct 10, 2017

We are running into this issue as well. We want to know the client IP and the requested IP for inbound connections. For example, if the user makes a raw TCP connection to our server, we want to know what their IP is and which IP on our machine they connected to.

@blazedd As commented previously and in other threads, this is actually possible using publish mode=host, i.e. services are not handled by the mesh network.

IIUC, there is some ongoing progress towards improving how ingress handles this, but that's actually the only solution.

We ended up deploying our nginx service using publish mode=host and mode: global, to avoid external LB configuration.

blazedd commented Oct 12, 2017

@mostolog Thanks for your reply. Just a few notes:

  1. publishMode does not resolve the issue whatsoever. The inbound socket connection still resolves to the local network that swarm sets up, at least when you use the ports list with mode: host.
  2. nginx isn't really a good solution. Our application is TCP based, but isn't a web server. There aren't any headers we'll be able to use without coding it manually.
  3. If I use docker run --net=host ..., everything works fine.
  4. The only solution I've seen that works so far is to use: #25873 (comment)

@blazedd In our stack we have:

    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host

and so, I would bet we get real IPs in our logs.

trajano commented Oct 13, 2017

@mostolog It does not work on Windows at least. I am still getting the 172.0.0.x address as the source.

blazedd commented Oct 13, 2017

@mostolog mode: host doesn't expose your container to the host network. It removes the container from the ingress network, which is how Docker normally operates when running a container. It would replicate the --publish 8080:8080 used in a docker run command. If nginx is getting real IPs, it's not a result of the socket being connected to those IPs directly. To test this you should seriously consider using a raw TCP implementation or an HTTP server without a framework, and check the reported address.
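
One way to do that raw check without writing any code: a sketch that assumes the alpine/socat image (whose entrypoint is socat) and a free port 9000 on the nodes:

# Echo back the peer address socat sees, published in host mode on every node
docker service create --name peer-echo --mode global \
  --publish mode=host,published=9000,target=9000 \
  alpine/socat tcp-listen:9000,fork,reuseaddr system:'echo $SOCAT_PEERADDR'

# From another machine: nc <node-ip> 9000
# With mode=host you should see your real IP; with the default ingress
# publishing you would see a 10.255.0.x address instead.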

caoli5288 commented Oct 15, 2017

Why not use IPVS to route traffic to the container directly? Bind all swarm nodes' overlay interface IPs as virtual IPs, use ip rule from xxx table xxx to build a multi-gateway setup, and then the swarm nodes can route the client to the container directly (DNAT), without any userspace network proxy daemon (dockerd).

0xcaff commented Nov 30, 2017

@blazedd Have you tried it? I'm getting external ip addresses when following @mostolog's example.

dack commented Dec 1, 2017

I'm running up against this issue again.

My setup is as follows:

  • ipvs load balancer in DR mode (external to the docker swarm)
  • 3 docker nodes, with destination IP added to all nodes and arp configured appropriately for IPVS DR routing

I would like to deploy a stack to the swarm and have it listen on port 80 on the virtual IP without mangling the addresses.

I can almost get there by doing this:
ports:
  - target: 80
    published: 80
    protocol: tcp
    mode: host

The problem here is that it doesn't allow you to specify which IP address to bind to - it just binds to all. This creates problems if you want to run more than a single service using that port. It needs to bind only to the one IP. Using different ports isn't an option with DR load balancing. It seems that the devs made the assumption that the same IP will never exist on multiple nodes, which is not the case when using a DR load balancer.

In addition, if you use the short syntax, it will ignore the bind IP and still bind to all addresses. The only way I've found to bind to a single IP is to run a non-clustered container (not a service or stack).

So now I'm back to having to use standalone containers and having to manage them myself instead of relying on service/stack features to do that.
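
For the record, the standalone fallback looks something like this (a sketch; the VIP and image name are placeholders):

# A plain container can be pinned to one host IP, unlike a swarm service
docker run -d --name web-on-vip --restart unless-stopped \
  -p 203.0.113.10:80:80 my-web-image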

malor added a commit to xsnippet/xsnippet-infra that referenced this issue Jan 10, 2018

xsnippet-web: "publish" the exposed port to the host
"Publish" the exposed port to the host effectively configuring a DNAT
iptables rule. This is useful for us, because we get real remote IPs
in nginx logs instead of Docker Swarm VIP one.

moby/moby#25526
https://docs.docker.com/engine/swarm/services/#publish-a-services-ports-directly-on-the-swarm-node

Closes #9

blop commented Jan 10, 2018

We have the same issue.
I'd vote for a transparent solution within the docker ingress that would allow all applications (some using raw UDP/TCP, not necessarily HTTP) to work as expected.

I could use the "mode=host port publishing" workaround, as my service is deployed globally.
However, it seems that this is incompatible with the use of the macvlan network driver, which I need for some other reasons.
We get logs like "macvlan driver does not support port mappings".
I tried using multiple networks, but it does not help.

I created a specific ticket here : docker/libnetwork#2050
That leaves me no solution for now :'(
