Error initializing network controller: list bridge addresses failed: no available network #2638

Closed
jhaprins opened this issue May 7, 2021 · 1 comment

@jhaprins
jhaprins commented May 7, 2021

When one of my colleagues asked me whether the network had changed, because his Docker setup was suddenly giving him a lot of problems, I did not know at first what he was talking about. After some questions it became clear that he could not start his Docker environment while his VPN connection to the office was up. The error message he received was: "Error initializing network controller: list bridge addresses failed: no available network". This was very strange, because the network he had configured in his daemon.json looked like this:
{
"default-address-pools":
[
{"base":"10.10.0.0/16","size":24}
]
}
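
To make the pool definition concrete: a base of 10.10.0.0/16 with size 24 means that /24 networks are carved out of the /16. The following is a minimal Python sketch using only the standard `ipaddress` module. It illustrates the arithmetic and is not Docker's own allocator.

```python
import ipaddress

# Pool taken from the daemon configuration above.
base = ipaddress.ip_network("10.10.0.0/16")
size = 24

# Every /24 that can be carved out of the /16 base.
subnets = list(base.subnets(new_prefix=size))

print(len(subnets))  # number of candidate networks
print(subnets[0])    # first candidate
print(subnets[-1])   # last candidate
```

The first candidate, 10.10.0.0/24, is the network that ends up on docker0 in the routing tables further down.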

In our corporate network we have a lot of RFC1918 networks: a few in the 10.0.0.0/8 range and a lot in the 172.16.0.0/12 and 192.168.0.0/16 ranges. None of them collides with the range above, and even if one did, the Docker network is purely local to his workstation, where he develops and tests some monitoring systems. He is completely free to use whatever network he wants locally, as long as he doesn't interfere with the corporate network. On the VPN router I have a default set of routes for the RFC1918 networks pointing towards the corporate routers, so everyone can reach the internal corporate networks without having to worry about anything. The firewalls take care of the rest.

I started debugging the error message and did some Google searches and I found a lot of people complaining about exactly this same problem. Some example tickets:
docker/for-linux#123
moby/moby#35121
moby/moby#33925
Most of these tickets are against other projects, and none give a solution.

At first the error didn't make any sense to me because:

  • a network is available
  • the configured network is not directly connected, so Docker has no grounds to decide that it should not use it.
  • even if an overlapping network is used somewhere else, a more specific route would be configured locally and this should prevent any routing issues.

But then I thought of something. What if the Docker code, when searching for free networks, takes the local routing table and checks the configured network against EVERY route in it? If the configured network matches or overlaps any route, it gives this error. At first I thought this couldn't be true, because the check would then always fail: a default route of 0.0.0.0/0 always matches. But what if the default route is filtered out in the code for exactly this reason? Then the hypothesis could be true.
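
The hypothesis can be written down as a few lines of Python. This is only a sketch of the suspected behaviour, not Docker's actual code: the function name `hypothesized_overlap` is made up for illustration, and for simplicity the whole /16 pool is checked rather than each /24 carved from it.

```python
import ipaddress

def hypothesized_overlap(pool, routes):
    """Return the first non-default route that overlaps the pool, or None.

    Mirrors the hypothesis only; this is not Docker's implementation.
    """
    pool_net = ipaddress.ip_network(pool)
    for route in routes:
        route_net = ipaddress.ip_network(route)
        if route_net.prefixlen == 0:
            continue  # assumption: the default route is filtered out
        if route_net.overlaps(pool_net):
            return route
    return None

# Without VPN: default route plus the local LAN.
no_vpn = hypothesized_overlap(
    "10.10.0.0/16", ["0.0.0.0/0", "192.168.178.0/24"])

# With the RFC1918 routes that the VPN router hands out.
with_vpn = hypothesized_overlap(
    "10.10.0.0/16",
    ["0.0.0.0/0", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"])

print(no_vpn)    # None
print(with_vpn)  # 10.0.0.0/8
```

If the hypothesis holds, the 10.0.0.0/8 route alone would be enough to reject the pool, even though nothing on the workstation is directly connected to it.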

I started testing locally on my own system. First I reproduced the error:

  • Set up my Docker daemon with the same configuration.
  • Kept my normal local routing table, without VPN.
  • Started Docker, and this worked fine.

The resulting routing table:
default via 192.168.178.1 dev enp62s0u1u1 proto static metric 1024
10.10.0.0/24 dev docker0 proto kernel scope link src 10.10.0.1 linkdown
192.168.178.0/24 dev enp62s0u1u1 proto kernel scope link src 192.168.178.74 metric 100

Then I started my VPN. The result was 3 extra routes:
10.0.0.0/8 via 192.168.2.1 dev tap0 proto static metric 50
172.16.0.0/12 via 192.168.2.1 dev tap0 proto static metric 50
192.168.0.0/16 via 192.168.2.1 dev tap0 proto static metric 50

I then stopped my Docker daemon and tried to start it again, and indeed I received the same error. So I could reproduce the problem. Now for my hypothesis: does the code check EVERY route in the routing table, filtering out only the default route?

To test this I removed the default route and replaced it with two more specific routes that together cover the whole internet:
0.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
128.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1

My routing table then looks like this:
0.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
128.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
192.168.178.0/24 dev enp62s0u1u1 proto kernel scope link src 192.168.178.74 metric 100

The only difference between this state and the clean state of my system is that there is no default route; instead there are two routes that together are equivalent to it. Now I tried to start the Docker daemon again. If the daemon starts fine, my hypothesis is wrong and I have to continue my search. If the daemon fails, my hypothesis must be correct, because the default route is the only thing I changed in my local configuration.
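
Why the two /1 routes are a fair stand-in for the default route can be checked with Python's standard `ipaddress` module. This is a small illustration of the address arithmetic only; it says nothing about what Docker does internally.

```python
import ipaddress

halves = [
    ipaddress.ip_network("0.0.0.0/1"),
    ipaddress.ip_network("128.0.0.0/1"),
]
pool = ipaddress.ip_network("10.10.0.0/16")

# Merging the two halves gives back the full default route.
merged = list(ipaddress.collapse_addresses(halves))
print(merged)

# Which of the two halves contains the configured pool?
covering = [str(h) for h in halves if pool.subnet_of(h)]
print(covering)
```

The two routes collapse to 0.0.0.0/0, and the pool falls inside 0.0.0.0/1. Routing behaves the same as with a default route, but a check that only skips a literal 0.0.0.0/0 entry would now see an overlapping route.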

And indeed, I received the same error again. Now I'm sure there is absolutely no reason to give this error because:

  • I don't have the 10.10.0.0/16 network anywhere in my home network
  • I have a routing table that only routes for 192.168.178.0/24 and the internet

This also proves my hypothesis that every route in the routing table is being checked against the configured network, filtering out the default route. If any route matches the configured network, the configuration is rejected.

This is a bug in the docker code. The code should be changed to only match routes with "scope link" because these routes are directly connected and would be a problem when you start a docker daemon with an overlapping network configuration. Any route that is not "scope link" should be ignored because those routes could be:

  • Injected by DHCP
  • Injected by a routing protocol
  • Injected by a VPN config.
  • Less specific behind a router somewhere remote

There is one corner case where you could give a warning, or maybe an error: when there is an equal or more specific route that is not "scope link", because this could result in routing issues towards other systems. Even then I would make it configurable, because the overlap may well be intentional, and the user should be qualified to judge whether it is a problem for him.
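
The proposed behaviour, including the corner case, could look roughly like the following Python sketch. The `Route` type, the `check_pool` function and the three result strings are hypothetical names chosen for illustration; this is not a patch for libnetwork, only the decision logic described above.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Route:
    dest: str         # destination prefix, e.g. "10.0.0.0/8"
    scope_link: bool  # True for directly connected ("scope link") routes

def check_pool(pool, routes):
    """Classify a pool as "error", "warning" or "ok" under the proposed rules."""
    pool_net = ipaddress.ip_network(pool)
    result = "ok"
    for route in routes:
        net = ipaddress.ip_network(route.dest)
        if not net.overlaps(pool_net):
            continue
        if route.scope_link:
            return "error"      # directly connected overlap: reject
        if net.prefixlen >= pool_net.prefixlen:
            result = "warning"  # equal or more specific remote route
        # Less specific remote routes (VPN, DHCP, default) are ignored.
    return result

vpn_routes = [
    Route("0.0.0.0/0", False),
    Route("10.0.0.0/8", False),
    Route("192.168.178.0/24", True),
]
print(check_pool("10.10.0.0/16", vpn_routes))                    # ok
print(check_pool("10.10.0.0/16", [Route("10.10.5.0/24", False)]))  # warning
print(check_pool("10.10.0.0/16", [Route("10.10.0.0/24", True)]))   # error
```

Whether the "warning" case should be a warning, an error, or silently allowed is exactly the part that would need to be configurable.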

I'm not a developer but a network and systems engineer, so I am not able at the moment to provide a patch for this problem, but one of my colleagues thought that he had already found the problematic code in https://github.com/moby/libnetwork/blob/master/netutils/utils_linux.go in the CheckRouteOverlaps function and he might have a fix for this issue in the near future.

The problem might very well be the same on FreeBSD and/or Windows, because I also saw tickets where people had the same problem on, at least, Apple notebooks.

The version I have tested this with is: Docker version 19.03.13, build 4484c46d9d

Cheers,
Jan Hugo Prins

@akerouanton
Member

This route overlap check was changed by moby/moby#42598 (released in v23.0) to only consider on-link routes. There's some agreement amongst maintainers that this heuristic isn't perfect, and we might revisit it, or the ability for users to influence it, at a later time.

I'm going to close this ticket as 'fixed'. Thanks for reporting here and in moby/moby.
