Subnets with conflicting ranges in Docker #26912

Closed
stepanv opened this Issue Sep 26, 2016 · 10 comments

stepanv commented Sep 26, 2016

Description

Docker allows the creation of subnets with conflicting ranges. Additionally, it assigns the same IP address multiple times (either to a single container or to multiple containers).

Steps to reproduce the issue:
When using a Docker overlay network (with ZooKeeper), Docker allows creating subnets with conflicting ranges (this may not be specific to the overlay feature). Additionally, Docker assigns the same IP to

  • multiple containers
  • a single container within 2 subnets

For instance, with the implicit overlay bridge docker_gwbridge on subnet 172.19.0.0/16, Docker allows the creation of a conflicting overlay network:

docker network create --driver overlay --subnet=172.19.0.0/16 conflicting_subnet

and then running a container with --net conflicting_subnet:

docker run -ti --net conflicting_subnet alpine /bin/sh

which results in a container connected to both subnets. (Additionally, it assigns the same IP 172.19.0.2 multiple times, as listed below. Here, it assigns the same IP to the same container.)


/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
4909: eth0@if4910: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe13:2/64 scope link
       valid_lft forever preferred_lft forever
4911: eth1@if4912: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.2/16 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe13:2/64 scope link
       valid_lft forever preferred_lft forever
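
The duplicate assignment in the listing above can be spotted mechanically. The following is a minimal sketch, not part of the original report: it parses an excerpt of the `ip addr` output and reports every IPv4 address that appears on more than one interface. The function names are made up for illustration. The second helper simply reads the last four octets of the reported MAC address as an IPv4 address; that both interfaces share a MAC matching the IP is visible in the output, while the idea that the MAC is derived from the assigned IP is an assumption.

```python
import re
from collections import defaultdict

# Excerpt of the `ip addr` output from the report (eth0 and eth1 only).
IP_ADDR_OUTPUT = """\
4909: eth0@if4910: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.2/16 scope global eth0
    inet6 fe80::42:acff:fe13:2/64 scope link
4911: eth1@if4912: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.2/16 scope global eth1
    inet6 fe80::42:acff:fe13:2/64 scope link
"""


def duplicate_ipv4(ip_addr_output):
    """Map each IPv4 address found on more than one interface to those interfaces."""
    owners = defaultdict(list)
    current = None
    for line in ip_addr_output.splitlines():
        # Interface header, e.g. "4909: eth0@if4910: <...>"
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            continue
        # IPv4 line only; "inet6" does not match because of the required space.
        inet = re.match(r"^\s+inet\s+(\d+\.\d+\.\d+\.\d+)/\d+", line)
        if inet and current:
            owners[inet.group(1)].append(current)
    return {ip: ifaces for ip, ifaces in owners.items() if len(ifaces) > 1}


def mac_tail_as_ipv4(mac):
    """Read the last four octets of a MAC address as a dotted IPv4 address."""
    return ".".join(str(int(octet, 16)) for octet in mac.split(":")[2:])


print(duplicate_ipv4(IP_ADDR_OUTPUT))         # {'172.19.0.2': ['eth0', 'eth1']}
print(mac_tail_as_ipv4("02:42:ac:13:00:02"))  # 172.19.0.2
```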

Describe the results you expected:
I would expect docker to protect the subnets it manages either by

  1. disallowing a creation of subnets with conflicting ranges
  2. disallowing connecting a container to subnets with conflicting ranges
  3. not assigning a single IP address to multiple containers or multiple times to a single container
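
As an illustration of what the first two expectations would require, here is a minimal sketch of a subnet overlap check using Python's standard `ipaddress` module. It is not Docker's code; the function name and the `other_overlay` entry are hypothetical, while the two 172.19.0.0/16 subnets are the ones from this report.

```python
import ipaddress
from itertools import combinations


def conflicting_pairs(networks):
    """Return (name_a, name_b) pairs whose subnets overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]


networks = {
    "docker_gwbridge": "172.19.0.0/16",     # implicit gateway bridge from the report
    "conflicting_subnet": "172.19.0.0/16",  # overlay network created in the report
    "other_overlay": "10.0.9.0/24",         # hypothetical, non-conflicting network
}

print(conflicting_pairs(networks))
```

A check like this could in principle run either at network creation or at container attachment; which of the two is appropriate is exactly what the discussion below is about.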

Output of docker version:

# docker version
Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

# docker info
Containers: 3
 Running: 1
 Paused: 0
 Stopped: 2
Images: 3
Server Version: 1.12.1
Storage Driver: devicemapper
 Pool Name: docker-251:0-6815748-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 21.47 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 809.9 MB
 Data Space Total: 107.4 GB
 Data Space Available: 106.6 GB
 Metadata Space Used: 1.581 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /scratch/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /scratch/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.1.12-37.6.2.el7uek.x86_64
Operating System: Oracle Linux Server 7.1
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 29.47 GiB
Name: foobar.my.com
ID: 2GL5:LWVG:MB6F:OBS7:KDVF:ZWUG:6KED:U5SJ:SPBW:66RI:RZGS:AZX6
Docker Root Dir: /scratch/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://proxy.my.com:80
Https Proxy: http://proxy.my.com:80
No Proxy: localhost,127.0.0.1,.my.com,/var/run/docker.sock
Registry: https://index.docker.io/v1/
Cluster Store: zk://zoo.my.com:2181/docker
Cluster Advertise: 10.170.105.79:2376
Insecure Registries:
 mic2.my.com
 127.0.0.0/8

Member

thaJeztah commented Sep 26, 2016

I think "1. disallowing a creation of subnets with conflicting ranges" depends a bit on the kernel that you're running on. On older kernel versions (below 3.16, IIRC), creating an overlay network that overlaps with another network (or a network on the host) may result in issues. On newer kernels, overlapping overlay networks are possible. Some people use this as a feature, so I'm not sure we should deny this option by default (we do recommend providing a subnet when creating those networks):

Note: It is highly recommended to use the --subnet option when creating a network. If the --subnet is not specified, the docker daemon automatically chooses and assigns a subnet for the network and it could overlap with another subnet in your infrastructure that is not managed by docker. Such overlaps can cause connectivity issues or failures when containers are connected to that network.

"2. disallowing connecting a container to subnets with conflicting ranges" is indeed more problematic, as that would always conflict.

ping @mrjana: has this been discussed?

Contributor

aboch commented Sep 27, 2016

Besides the technical difficulties in enforcing this (at overlay network creation we may not know which subnet the default gateway network uses), I do not believe it is sustainable or recommended for a system to make sure that no invalid configuration is entered.

IIRC, Linux and other OSes also allow you to create multiple interfaces with addresses in the same subnet. What I am trying to say is that I do not see a strong reason for disallowing it in the container.

In other words, for this specific misconfiguration scenario, I'd expect the docker user to have enough knowledge to spot the issue and correct it.

Contributor

mrjana commented Sep 27, 2016

IIRC, this was erroring out some time back, i.e. if you attach a container to multiple networks which belong to the same subnet, the attachment fails. This may be a regression. In any case, I think we should error out and disallow such an attachment.

Contributor

aboch commented Sep 28, 2016

@mrjana

I am not aware of any regression regarding the specific test the submitter is performing. He is intentionally creating an overlay network whose subnet overlaps with the existing default gateway bridge network's subnet.

Given the two pools belong to two distinct address spaces and respective bridges are located on two distinct network namespaces, I am not sure how IPAM driver or libnetwork would/should detect the overlap.
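
To make the address-space point concrete, here is a toy sketch (not libnetwork's IPAM; the class, the method name, and the address-space labels `"local"` and `"global"` are all hypothetical). Each address space keeps its own independent pool, so the same subnet requested in two spaces yields the same first address twice, and nothing inside the allocator has the information to notice.

```python
import ipaddress


class ToyIPAM:
    """Hypothetical allocator: one independent pool per (address space, subnet)."""

    def __init__(self):
        self._used = {}  # (address_space, subnet) -> set of allocated addresses

    def request_address(self, address_space, cidr):
        subnet = ipaddress.ip_network(cidr)
        used = self._used.setdefault((address_space, subnet), set())
        hosts = iter(subnet.hosts())
        next(hosts)  # assumption: the first host address is kept for the gateway
        for host in hosts:
            if host not in used:
                used.add(host)
                return str(host)
        raise RuntimeError("pool exhausted")


ipam = ToyIPAM()
first = ipam.request_address("local", "172.19.0.0/16")    # e.g. the gateway bridge
second = ipam.request_address("global", "172.19.0.0/16")  # e.g. the overlay network
print(first, second)  # 172.19.0.2 172.19.0.2
```

Within a single address space the same toy allocator would hand out distinct addresses; the collision only appears across spaces, which matches the symptom in the report.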

Contributor

mrjana commented Sep 28, 2016

@aboch IPAM cannot detect it and it never did. But the code which adds the gateway IP in osl will fail if it already found another gateway IP in the subnet, with an EEXIST error. If that is not happening now, it is a regression (it does not matter whether these two attachments were driven by different drivers).

Contributor

mrjana commented Sep 28, 2016

> and respective bridges are located on two distinct network namespaces

It is not about doing conflict resolution at the bridge level but at the container namespace level, which is the same for that container. This is the primary reason we broker network resource addition into the container namespace: so that we can resolve the conflicts.

Contributor

aboch commented Sep 28, 2016

@mrjana

> [...] code which adds the gateway IP in osl will fail if it already found another gateway IP

True, but sandbox.UpdateGateway() has always made sure to first unset the current gateway, then install the new one: https://github.com/docker/docker/blob/v1.9.0/vendor/src/github.com/docker/libnetwork/sandbox.go#L338
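
The difference between the two behaviours being discussed can be sketched with a toy model. This is not the libnetwork or osl code; `ToyNamespace` and its methods are hypothetical stand-ins. A plain "add" that refuses to overwrite surfaces the conflict as EEXIST, whereas an unset-then-set update, the order described above for sandbox.UpdateGateway(), never hits that error.

```python
import errno


class ToyNamespace:
    """Hypothetical stand-in for the gateway state of one container namespace."""

    def __init__(self):
        self.gateway = None

    def add_gateway(self, ip):
        # An "add" that refuses to overwrite an existing entry (EEXIST).
        if self.gateway is not None:
            raise OSError(errno.EEXIST, "gateway already set")
        self.gateway = ip

    def unset_gateway(self):
        self.gateway = None


def update_gateway(ns, ip):
    """Unset the current gateway first, then install the new one."""
    ns.unset_gateway()
    ns.add_gateway(ip)


ns = ToyNamespace()
ns.add_gateway("172.19.0.1")
try:
    ns.add_gateway("172.19.0.1")      # second plain add: conflict is reported
except OSError as exc:
    print("add failed:", errno.errorcode[exc.errno])

update_gateway(ns, "172.19.0.1")      # unset-then-set: no error is raised
print("gateway after update:", ns.gateway)
```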

Contributor

mrjana commented Sep 28, 2016

That is probably the regression

Contributor

aboch commented Sep 28, 2016

Contributor

mrjana commented Sep 28, 2016

Maybe it was in experimental code when overlay provided its own default gateway, but I do remember that behavior. Anyhow, the point is not whether this is a regression. We should detect these conflicts during container attachment and fail the attachment, and it is possible to do so without too much difficulty.
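
A minimal sketch of what failing at attachment time could look like, under the assumption that the check runs against the interfaces already present in the container's namespace. This is an illustration only, not the proposed libnetwork change; `ToySandbox`, `AttachmentConflict`, and the `other_overlay` network are hypothetical.

```python
import ipaddress


class AttachmentConflict(Exception):
    """Raised when a new attachment overlaps a subnet already in the namespace."""


class ToySandbox:
    """Hypothetical model of one container's network namespace."""

    def __init__(self):
        self.interfaces = []  # list of (network_name, IPv4Interface)

    def attach(self, network_name, address):
        new = ipaddress.ip_interface(address)
        for name, existing in self.interfaces:
            # Compare at the container namespace level, regardless of driver.
            if existing.network.overlaps(new.network):
                raise AttachmentConflict(
                    f"{network_name} ({new.network}) overlaps {name} ({existing.network})"
                )
        self.interfaces.append((network_name, new))


sandbox = ToySandbox()
sandbox.attach("docker_gwbridge", "172.19.0.2/16")
try:
    sandbox.attach("conflicting_subnet", "172.19.0.2/16")
except AttachmentConflict as exc:
    print("attachment refused:", exc)
sandbox.attach("other_overlay", "10.0.9.5/24")  # non-overlapping, accepted
```

Because the comparison is made per container rather than per bridge or per address space, it does not depend on which driver created each network.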
