[libnetwork] Calico does not work properly on systems with kernel version 4.x+ unless ipv6 network is disabled #192

Open
ansiz opened this issue Sep 17, 2018 · 14 comments

Comments

@ansiz

ansiz commented Sep 17, 2018

When I run:

docker run --privileged -tid --rm --network net2 --name k530-net2 harbor.hpc.com/images/busybox

docker reported a problem:

15ba23b49172c9dc4f0643f3f11984ce02c878a60bafccb268becec600330a8f
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: 
starting container process caused "process_linux.go:402: container init caused 
\"process_linux.go:385: running prestart hook 0 caused \\\"error running hook: exit status 1,
 stdout: , stderr: time=\\\\\\\"2018-09-16T22:25:13-04:00\\\\\\\" level=fatal msg=
\\\\\\\"failed to add interface temp31556e7d316 to sandbox: error setting interface 
\\\\\\\\\\\\\\\"temp31556e7d316\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\" 
\\\\\\\\\\\\\\\"fe80::b448:31ff:fee4:de7d/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.

This command runs fine on standard CentOS 7.x with kernel 3.x. It also fails on Ubuntu 18.04, which has kernel 4.x. I found these entries in dmesg:

[ 2111.674564] IPv6: ADDRCONF(NETDEV_UP): temp66aa9bddf71: link is not ready
[ 2111.674700] IPv6: ADDRCONF(NETDEV_UP): cali66aa9bddf71: link is not ready
[ 2111.674710] IPv6: ADDRCONF(NETDEV_CHANGE): cali66aa9bddf71: link becomes ready
[ 2111.674760] IPv6: ADDRCONF(NETDEV_CHANGE): temp66aa9bddf71: link becomes ready
[ 2111.926941] cali0: renamed from temp66aa9bddf71
[ 2113.110629] IPv6: ADDRCONF(NETDEV_UP): tempf1169b462ad: link is not ready
[ 2113.111066] IPv6: ADDRCONF(NETDEV_CHANGE): tempf1169b462ad: link becomes ready
[ 2113.325654] cali0: renamed from tempf1169b462ad
[ 2114.395699] IPv6: ADDRCONF(NETDEV_UP): tempc99fe2a39dc: link is not ready
[ 2114.400374] IPv6: ADDRCONF(NETDEV_CHANGE): tempc99fe2a39dc: link becomes ready
[ 2114.571455] cali0: renamed from tempc99fe2a39dc
[ 2115.557923] IPv6: ADDRCONF(NETDEV_UP): tempa2528b66f07: link is not ready
[ 2115.563399] IPv6: ADDRCONF(NETDEV_CHANGE): tempa2528b66f07: link becomes ready
[ 2115.744184] cali0: renamed from tempa2528b66f07

So I tried disabling IPv6 with these commands:

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

After that, it works fine.

Expected Behavior

Calico 2.6 should work properly on systems with kernel version 4.x without IPv6 being disabled.

Possible Solution

Disable IPv6:

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

Steps to Reproduce (for bugs)

  1. Install Calico 2.6 on a system with kernel 4.x+
  2. Try to create a container on a Calico network

Context

Your Environment

  • Calicoctl version v1.6.4, build ae98f46f
  • Docker without orchestration
  • Operating System and version: CentOS Linux release 7.5.1804 (Core) Kernel: Linux 4.18.7
@ti-mo
Contributor

ti-mo commented Sep 28, 2018

Hi, thanks for the report! We're experiencing the exact same issue, but the behaviour seems very flakey. Given many retries/re-schedules, chances are most containers will be successfully started eventually. This started appearing for us when we went from 4.15.15 to 4.16.x (and now 4.18.10).

We are using libnetwork, which is likely the case for OP as well. Calico tries to set an IPv6 address on a container interface that should not be v6-enabled. Logging into the container does not show a (stateless) link-local fe80 or anything auto-assigned by the kernel. The Docker network's EnableIPv6 is set to False, and none of the containers we run have anything set in their IPv6Address fields.

# docker network inspect <net>
...
                "IPv4Address": "10.123.121.83/32",
                "IPv6Address": ""

Could this be a fallback mechanism in case IPv6Address is empty? Newer kernel versions likely reject unwanted addresses instead of silently dropping the Netlink messages, or reject them with a different errno.

@caseydavenport @fasaxc Any ideas?

@ansiz
Author

ansiz commented Sep 29, 2018

@ti-mo

We're experiencing the exact same issue, but the behaviour seems very flakey. Given many retries/re-schedules, chances are most containers will be successfully started eventually.

Yes, the container starts successfully after many retries, but its network still cannot communicate even once the container is running.

The same happens with docker network connect: the network cannot communicate even though an IP has been allocated to the container.

@caseydavenport
Member

This sounds to me like the libnetwork-plugin is trying to assign an IPv6 address when it shouldn't.

It seems to decide how to do that here:

linkLocalAddr := netns.GetLinkLocalAddr(hostInterfaceName)
if linkLocalAddr == nil {
	log.Warnf("No IPv6 link local address for %s", hostInterfaceName)
} else {
	resp.GatewayIPv6 = fmt.Sprintf("%s", linkLocalAddr)
	nextHopIPv6 := fmt.Sprintf("%s/128", linkLocalAddr)
	resp.StaticRoutes = append(resp.StaticRoutes, &network.StaticRoute{
		Destination: nextHopIPv6,
		RouteType:   1, // 1 = CONNECTED
		NextHop:     "",
	})
}

The decision is based on whether or not an IPv6 link-local address is available on the host interface. Maybe we want to make that configurable, or smarter in some way?
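
One way the "configurable" idea could look, as a minimal sketch rather than the plugin's actual code: the IPv6 route is only produced when the network is known to have IPv6 enabled and a link-local address exists. The `StaticRoute` type here is a hypothetical stand-in for `network.StaticRoute`, and `ipv6Routes`, its `enableIPv6` parameter, and `exampleLinkLocal` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// StaticRoute is a hypothetical stand-in for the plugin's network.StaticRoute.
type StaticRoute struct {
	Destination string
	RouteType   int
	NextHop     string
}

// exampleLinkLocal is the link-local address from the error message in this issue.
var exampleLinkLocal = net.ParseIP("fe80::b448:31ff:fee4:de7d")

// ipv6Routes returns the IPv6 static routes to report back to libnetwork.
// It returns nil unless the network has IPv6 enabled AND the host
// interface has a link-local address (both conditions are assumptions
// about how the guard might be written).
func ipv6Routes(enableIPv6 bool, linkLocalAddr net.IP) []StaticRoute {
	if !enableIPv6 || linkLocalAddr == nil {
		return nil
	}
	return []StaticRoute{{
		Destination: fmt.Sprintf("%s/128", linkLocalAddr),
		RouteType:   1, // 1 = CONNECTED, as in the snippet above
		NextHop:     "",
	}}
}

func main() {
	// IPv6 disabled on the network: no route is produced.
	fmt.Println(len(ipv6Routes(false, exampleLinkLocal)))
	// IPv6 enabled: one /128 route to the link-local address.
	fmt.Println(ipv6Routes(true, exampleLinkLocal)[0].Destination)
}
```

With a guard like this, a network created without IPv6 would never receive the fe80::/128 route that the kernel rejects in the error above.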

@caseydavenport caseydavenport changed the title Calico does not work properly on systems with kernel version 4.x+ unless ipv6 network is disabled [libnetwork] Calico does not work properly on systems with kernel version 4.x+ unless ipv6 network is disabled Oct 18, 2018
@ti-mo
Contributor

ti-mo commented Oct 22, 2018

@caseydavenport That's indeed what I initially thought. This can only really work properly when libnetwork-plugin can query whether or not IPv6 is enabled on the target network. The Docker network in question shows "EnableIPv6": false when running inspect on it, because we don't explicitly enable IPv6 when creating our networks (as intended).

There's also the case of IPv6 being enabled on the Docker network but disabled via sysctl on the system, though this shouldn't cause problems because it will still cause linkLocalAddr to be nil.

Any ideas how we can query EnableIPv6 in the target network?

@caseydavenport
Member

Any ideas how we can query EnableIPv6 in the target network?

Looks like we have some logic already to inspect the network, might be as simple as using something like this?

networkData, err := dockerCli.NetworkInspect(ctx, networkID, dockertypes.NetworkInspectOptions{})
if err != nil {
	err = errors.Wrapf(err, "Error inspecting network %s - retrying (T=%s)", networkID, time.Since(start))
	log.Warningln(err)
	// was unable to inspect network, let's retry
	time.Sleep(retrySleep)
	goto RETRY_NETWORK_INSPECT
}
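
To illustrate what reading the flag could involve, here is a minimal, stdlib-only sketch that decodes the "EnableIPv6" field from `docker network inspect` JSON output. The `networkInfo` struct, `ipv6Enabled` function, and `sampleInspect` data are hypothetical names for this example. In the plugin itself the value would presumably come from the inspected `networkData` rather than from raw JSON, which is an assumption not verified against the plugin source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networkInfo holds only the fields of interest from `docker network inspect`.
type networkInfo struct {
	Name       string `json:"Name"`
	EnableIPv6 bool   `json:"EnableIPv6"`
}

// ipv6Enabled parses the JSON array printed by `docker network inspect`
// and reports whether the first network in it has IPv6 enabled.
func ipv6Enabled(inspectJSON []byte) (bool, error) {
	var nets []networkInfo
	if err := json.Unmarshal(inspectJSON, &nets); err != nil {
		return false, err
	}
	if len(nets) == 0 {
		return false, fmt.Errorf("no networks in inspect output")
	}
	return nets[0].EnableIPv6, nil
}

// sampleInspect is illustrative, trimmed output; real output has many more fields.
const sampleInspect = `[{"Name": "net2", "EnableIPv6": false}]`

func main() {
	enabled, err := ipv6Enabled([]byte(sampleInspect))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("IPv6 enabled:", enabled)
}
```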

@merickso

merickso commented May 8, 2019

I believe I am having the exact same problem on CentOS 7 with kernel 3.10.0-957.12.1.el7.x86_64. I upgraded from 3.10.0-862.14.4.el7.x86_64 and immediately started to see the same problems. Running the following (as described above) fixed it immediately:
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

I didn't think this bug applied to me based on the title since I was still using kernel 3.x and my docker network has "EnableIPv6": false.

@tmjd tmjd removed their assignment Sep 23, 2019
@jasonjoo2010

So is this solved?

We hit this issue recently on some nodes after rebooting, and it cost us a whole day to track down. The affected nodes returned to normal after setting the disable_ipv6 kernel parameters.
Most nodes don't need it.

@rico-qian

I have the same problem, but disabling IPv6 did not fix it for me.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\"2020-03-26T14:30:51+08:00\\\\\\\" level=fatal msg=\\\\\\\"failed to add interface temp1181c31de18 to sandbox: error setting interface \\\\\\\\\\\\\\\"temp1181c31de18\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\" \\\\\\\\\\\\\\\"fe80::b4fc:d8ff:fe11:f2bd/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.

@jasonjoo2010

I have the same problem, but disabling IPv6 did not fix it for me.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\"2020-03-26T14:30:51+08:00\\\\\\\" level=fatal msg=\\\\\\\"failed to add interface temp1181c31de18 to sandbox: error setting interface \\\\\\\\\\\\\\\"temp1181c31de18\\\\\\\\\\\\\\\" routes to [\\\\\\\\\\\\\\\"169.254.1.1/32\\\\\\\\\\\\\\\" \\\\\\\\\\\\\\\"fe80::b4fc:d8ff:fe11:f2bd/128\\\\\\\\\\\\\\\"]: permission denied\\\\\\\"\\\\n\\\"\"": unknown.

How did you disable it? Maybe you need to disable it and then restart the Docker daemon.

@rico-qian

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

I disabled IPv6 as above, then rebooted my server.

@jasonjoo2010

echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

I disabled IPv6 as above, then rebooted my server.

Oh, you rebooted your server?
Did you check the setting after the reboot with sysctl net.ipv6.conf.all.disable_ipv6?

As far as I know, values written with echo are not persistent and revert on reboot.
To make them persistent, edit the server's /etc/rc.local or /etc/sysctl.conf. Taking sysctl.conf as an example:

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1

Then run sysctl -p to apply the configuration immediately; it will also be loaded automatically on the next reboot.
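
To confirm the live values after a reboot, the same information that sysctl reports can be read from /proc. The following is a minimal sketch; `disabledFromProc` is a hypothetical helper name, and the two paths are the ones used in the echo commands earlier in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// disabledFromProc interprets the content of a disable_ipv6 proc file:
// "1" (ignoring surrounding whitespace) means IPv6 is disabled.
func disabledFromProc(content string) bool {
	return strings.TrimSpace(content) == "1"
}

func main() {
	paths := []string{
		"/proc/sys/net/ipv6/conf/all/disable_ipv6",
		"/proc/sys/net/ipv6/conf/default/disable_ipv6",
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			// For example: not on Linux, or IPv6 support is absent from the kernel.
			fmt.Printf("%s: cannot read (%v)\n", p, err)
			continue
		}
		fmt.Printf("%s: IPv6 disabled = %v\n", p, disabledFromProc(string(data)))
	}
}
```

If either path reports false after a reboot, the persistent configuration was not applied.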

@darrena092

It may be worthwhile mentioning this in the getting started docs (I don't think I saw it there) - this was a difficult one to track down.

@oshoval

oshoval commented Apr 23, 2020

Hi, is there any workaround for this?
Is there a Calico version that works, or would using CentOS 8 help?
Thanks

@caseydavenport caseydavenport transferred this issue from projectcalico/calico Apr 20, 2021
@cucker0

cucker0 commented Oct 25, 2021

Disable IPv6 through the sysctl configuration:

Step 1: add this rule in /etc/sysctl.conf :
net.ipv6.conf.all.disable_ipv6=1

Step 2: add this rule in /etc/sysconfig/network :
NETWORKING_IPV6=no

Step 4: disable the ip6tables service :
systemctl disable ip6tables
or
chkconfig ip6tables off

Step 5: Reload the sysctl configuration:
sysctl -p
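
Before reloading, it may help to check which disable_ipv6 keys the file actually sets. The sketch below parses sysctl.conf-style text and lists the keys not set to 1. `parseSysctlConf`, `missingIPv6Keys`, and `sampleConf` are hypothetical names, and the parser only handles simple `key=value` lines plus `#` and `;` comments.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSysctlConf extracts key/value pairs from sysctl.conf-style text.
// Blank lines and lines starting with '#' or ';' are ignored.
func parseSysctlConf(content string) map[string]string {
	settings := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue
		}
		settings[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return settings
}

// missingIPv6Keys lists which of the two disable_ipv6 keys are not set to 1.
func missingIPv6Keys(settings map[string]string) []string {
	wanted := []string{
		"net.ipv6.conf.all.disable_ipv6",
		"net.ipv6.conf.default.disable_ipv6",
	}
	var missing []string
	for _, k := range wanted {
		if settings[k] != "1" {
			missing = append(missing, k)
		}
	}
	return missing
}

// sampleConf is illustrative content containing only the line from Step 1.
const sampleConf = "# illustrative content\nnet.ipv6.conf.all.disable_ipv6=1\n"

func main() {
	fmt.Println(missingIPv6Keys(parseSysctlConf(sampleConf)))
}
```

With only the Step 1 line present, the `default` key is reported as missing; the earlier sysctl.conf example in this thread sets both keys.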
