17.04 / 17.05: The disappearance of the link-local & loopback ipv6 addresses #33099

Open
euank opened this Issue May 9, 2017 · 28 comments


@euank
Contributor

euank commented May 9, 2017

Description

After upgrading to 17.04-ce or 17.05-ce, I no longer have a link-local ipv6 address by default (with ipv6 not enabled).

I think this might be an intentional change related to #20569, but I also think it's not entirely well thought out and is dangerously backwards incompatible.

Here's a simple example of something that no longer works:

package main

import (
	"fmt"
	"net"
)

func main() {
	conn, _ := net.Listen("tcp", ":8080")
	_, err := net.Dial("tcp", conn.Addr().String())
	if err != nil {
		fmt.Printf("did not expect error, but got: %v", err)
	}
}

The above program is available as euank/ipv6-repro:latest

The problem is that a program thinks ipv6 is available because it's enabled in my kernel, but docker has broken ipv6; the program tries to listen on both (woo!), and the listen even appears to succeed, but connecting over ipv6 then fails because docker tore the addresses down when creating the netns. The program doesn't know that, happily thinks it's listening on both, and .Addr() gives the ipv6-formatted address as a result.

Furthermore, if I just want a link local address (and don't want to allocate routable addresses), there's no way to configure the default bridge network to do so. Enabling ipv6 requires that I have a range of addresses to assign, but all I want is link-local addresses as we got by default before.

The workaround I've found is the following: docker run --sysctl net.ipv6.conf.all.disable_ipv6=0 euank/ipv6-repro:latest

Steps to reproduce the issue:

  1. docker run euank/ipv6-repro:latest

  2. docker run busybox ip addr

Describe the results you received:

  1. did not expect error, but got: dial tcp [::]:8080: connect: cannot assign requested address

  2. 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    128: eth0@if129: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue 
        link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.2/16 scope global eth0
           valid_lft forever preferred_lft forever
    

Describe the results you expected:

  1. No error, as was true in previous versions of docker (1.12.x, 1.13, 17.03)

  2. 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    128: eth0@if129: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue 
        link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.2/16 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:acff:fe11:2/64 scope link tentative 
           valid_lft forever preferred_lft forever
    

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:14:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:14:18 2017
 OS/Arch:      linux/amd64
 Experimental: false

Default fedora installation with ipv6 enabled on the host, config_ipv6 in the kernel, all that jazz.

@cpuguy83


Contributor

cpuguy83 commented May 9, 2017

ping @aboch
Should we make an exception for the --ipv6 setting when the user sets the sysctl?

@euank


Contributor

euank commented May 9, 2017

@cpuguy83 The real solution IMO is to have 3 levels of granularity for ipv6:

  1. default - link local ipv6 addresses are provided, the user doesn't have to carve out routable ipv6 addresses for containers. This matches the behavior in 1.12.6
  2. off - the user explicitly said ipv6=off, there are neither link local nor routable ipv6 addresses. This is the current default
  3. on + routable - the user specifies ipv6=on and specifies addresses to assign to containers, link local + routable ipv6 addresses are assigned.

Right now we're being forced to pick between 2 and 3 because ipv6=on completely arbitrarily requires specifying a range of addresses. Restoring 1 while allowing 2 fixes what the original PR intended to do IMO.

@tianon


Member

tianon commented May 9, 2017

+1 -- losing even ::1 from the lo interface was really surprising behavior, and it's not at all easy to get the old link-local-only behavior back

(and I could've sworn it should've caused a resurgence of docker-library/php#194, but I tried and tried and couldn't get it to do so 😕)

@euank


Contributor

euank commented May 9, 2017

@tianon That doesn't resurge because apparently disabling ipv6 in a specific netns doesn't prevent listening on ipv6; the go repro above happily succeeds in listening on ipv6, but is then unable to connect to the very address it listened on. The php process happily listens on both when ipv6 is enabled on the host, but not in that netns.

If poorly written (php) code decided to do something like netstat -l to parse out what php was listening on and connect to that, then it would fail.

@SpComb


SpComb commented May 10, 2017

To clarify re the title and description of this issue: I don't think losing the fe80::/64 link-local on the container eth0 interface is necessarily an issue. It's losing the ::1/128 loopback address on the lo interface that causes problems?

@euank euank changed the title from 17.04 / 17.05: The disappearance of the link-local ipv6 address to 17.04 / 17.05: The disappearance of the link-local & loopback ipv6 addresses May 10, 2017

@euank


Contributor

euank commented May 10, 2017

Good call @SpComb, I clarified the issue title.

My example that broke was based on a real piece of code that broke. Indeed, that bit of code only needed lo's loopback address.

I agree that losing the loopback address is the bigger problem, but I still think that losing eth0's link-local address is a bug. I think the option to retain it, even without assigning routable addresses, is a feature worth having (and one that used to work fine too).

@aboch


Contributor

aboch commented May 10, 2017

Thanks @SpComb and @euank for the extra clarification.

A premise: If we do not find a better solution, we can revert the change in time for 17.06.

Now, I would really like to preserve the no-ipv6-addresses behavior when --ipv6=false. I feel that the way it worked in the past, where --ipv6=false was not in fact respected, had to be fixed.

The ipv6 link-local address on the container's lo iface can be fixed. ATM I am thinking of preserving it (even when --ipv6=false) if ipv6 is enabled on the host.

For the linux-created link-local address on the container's interfaces, ATM I am thinking that it can be preserved when the user creates the network with --ipv6 but does not specify any ipv6 --subnet. As of now, this is rejected by the libnetwork core because the default IPAM driver is not able to autonomously provide an IPv6 subnet for the network.
So this requires changes in the libnetwork core to be more lenient when the IPAM driver does not give us an IPv6 subnet.

To recap:

  • docker network create abc; docker run --network abc ... => lo has ll v6 addr if v6 is enabled on host
  • docker network create --ipv6 abc; docker run --network abc ... => lo and ethX have ll v6 addr
  • docker network create --ipv6 --subnet 2001:db8::/64 abc; docker run --network abc ... => lo and ethX have ll v6 addr, ethX will have addr from v6 pool.

WDYT ?

@yosifkit


Contributor

yosifkit commented May 10, 2017

So for the default docker network, we'd be able to do --ipv6 on the daemon without giving it routable addresses? (just ensuring that this code path is also fixed and not just docker network create)

@euank


Contributor

euank commented May 10, 2017

@aboch I think that maps to my preferred solution per #33099 (comment)

The additional detail is that I think by default the default bridge network should have --ipv6=true (but no subnet) for sanity and backwards compatibility.

With the inclusion of that point, SGTM

@euank euank closed this May 12, 2017

@thaJeztah


Member

thaJeztah commented May 12, 2017

Given that the issue is not resolved yet, I'll reopen

@euank


Contributor

euank commented May 12, 2017

Thanks @thaJeztah, I definitely didn't mean to close it.. guess I hit the wrong vimperator shortcut at some point 😕. Sorry!

@thaJeztah


Member

thaJeztah commented May 15, 2017

Haha, no worries, just making sure we don't overlook the issue 👍

@jchv


jchv commented Jun 9, 2017

Just to be sure, is it a decent workaround to simply enable IPv6? Like:

/etc/docker/daemon.json:

{"ipv6": true, "fixed-cidr-v6": "2001:db8:1::/64"}

...and systemctl restart docker?

This appears to be working, but I'm wondering if maybe there's any other thought I need to put into this, like consequences of fully enabling IPv6 for example. I jumped on the 17.05 train when I noticed it had multi-stage builds and installed it on a CI server, and had to revert because of this issue. Hopefully this post will show up in Google for Go users getting the dreaded connect: cannot assign requested address message.

@SomberNight


SomberNight commented Jul 4, 2017

I encountered this on debian old-stable with my /etc/apt/sources.list pointing to a supposedly stable docker repository. Going from 17.03 to 17.06 broke a production container.

deb [arch=amd64] https://download.docker.com/linux/debian jessie stable
$ sudo docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:17:22 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:16:12 2017
 OS/Arch:      linux/amd64
 Experimental: false

Any thoughts regarding @johnwchadwick 's fix? Or another fix?

@bemeyert


bemeyert commented Aug 28, 2017

Any news on this one? It breaks our builds. I can confirm that it still happens with 17.06.0. Haven't tested 17.06.1, but from looking in the release notes there was no work done on this issue.

@jchv


jchv commented Aug 28, 2017

If you need a workaround, you can disable IPv6 all the way and it should work, I believe.

If you're dealing with tests written in Go, one somewhat poor but certainly workable solution is to explicitly use IPv4 addresses in your test, i.e. 127.0.0.1:0 instead of :0.

@bemeyert


bemeyert commented Aug 29, 2017

@jchv Thanks for the tip. But we must have IPv6 since we're testing a product's IPv6 capabilities, so neither workaround would work for us.

@jchv


jchv commented Aug 29, 2017

@bemeyert That's odd, I thought this bug only occurred when running Docker without --ipv6. Have you tried simply enabling IPv6?

@bemeyert


bemeyert commented Aug 29, 2017

@jchv No, that doesn't work at all:
Docker is started with ExecStart=/usr/bin/dockerd --bip=172.16.1.1/24 --disable-legacy-registry=false --ipv6 and it fails with:

Aug 29 16:38:06 my.host dockerd[17912]: Error starting daemon: Error initializing network controller: Error creating default "bridge" network: could not find an available, non-overlapping IPv6 address pool among the defaults to assign to the network

[root@my] docker.service.d # docker --version
Docker version 17.06.0-ce, build 02c1d87

@jchv


jchv commented Aug 29, 2017

@bemeyert You may need to specify the CIDR for allocating container addresses using --fixed-cidr-v6; see the documentation for IPv6.

Of course this doesn't make the bug less valid, but it will probably fix your issues in the meantime.

@bemeyert


bemeyert commented Aug 29, 2017

@jchv We did that too. Docker was started with --ipv6 --fixed-cidr-v6=fe80::/64, and the effect was that we were able to start Docker and one, but only one, container. With the next container came the error Address already in use.

@euank


Contributor

euank commented Aug 29, 2017

@bemeyert the workaround of docker run --sysctl net.ipv6.conf.all.disable_ipv6=0 ... I suggest in my original comment should match the pre-17.04 behaviour.

@bemeyert


bemeyert commented Aug 30, 2017

@euank Sry for not RTFMing... An initial run worked:

[root@host] ~ # docker run --sysctl net.ipv6.conf.all.disable_ipv6=0 -it --rm centos bash                                           
[root@70114e8c8f57 /]# yum -y -q install iproute
[...]
[root@70114e8c8f57 /]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
15260: eth0@if15261: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 02:42:ac:10:01:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.1.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe10:102/64 scope link 
       valid_lft forever preferred_lft forever
[root@70114e8c8f57 /]# exit

I'll be running our tests next.

Is there any chance that this is "fixed" somehow so that we don't have to use this workaround? This is a bit clumsy and my customers aren't too happy about it ;)

@excieve


excieve commented Sep 6, 2017

This issue is affecting some CI services out there, which rely on Docker to run builds. We spent a few days trying to understand why our tests in a Go application suddenly fail in such an odd way, and then discovered this. We had to force listening on IPv4 to work around it reliably, but this is ugly.

@grigorig


grigorig commented Oct 30, 2017

I just hit this problem with the NSD nameserver. The remote control service by default listens on ::1 and without this address being available, the daemon fails to start. Please change this behavior. Many applications depend on the availability of loopback and link-local IPv6 addresses.

@SpComb


SpComb commented Oct 30, 2017

I just hit this problem with the NSD nameserver. The remote control service by default listens on ::1 and without this address being available, the daemon fails to start.

NSD is what I also originally hit this on, workaround is IIRC to configure nsd with ip4-only: yes to bind to 127.0.0.1 by default instead.

Please change this behavior. Many applications depend on the availability of loopback and link-local IPv6 addresses.

Careful again: I doubt nsd relies on the availability of the IPv6 link-local fe80::/64 address on the container's eth0 interface. It does rely on the availability of the IPv6 loopback ::1/128 address on the container's lo interface. Those do not necessarily go together as one.

@squeed


squeed commented Feb 12, 2018

This is still a problem. Doing docker run --net=none still explicitly disables IPv6 in the container.

This was fixed for --net=host in #32447.

@pmichali


pmichali commented Apr 17, 2018

I see that, for kubernetes clusters running in IPv6 mode, docker versions beyond 17.03 have IPv6 disabled in the pods created (e.g. kubeadm starts up kube-dns, which tries to assign an IPv6 address to eth0 and gets a permission-denied error; the same happens with any user-created containers).

Is there a workaround for newer docker versions? Is there a fix available?

Seen with 18.03, 18.04.
