daemon ungraceful shutdown during starting make daemon failed to start next time with Error initializing network controller #22834
Labels
area/networking
kind/bug
Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.
Comments
coolljt0725 added the kind/bug label May 19, 2016
@coolljt0725 could you open an issue in the libnetwork repository as well? Looks like this is a libnetwork issue.
lingmann pushed a commit to lingmann/dcos that referenced this issue Aug 29, 2016

On a percentage of DC/OS agents (~5%) with Docker 1.11.2, the daemon will fail to start up with the following error:

> Error starting daemon: Error initializing network controller: Error
> creating default "bridge" network: failed to allocate gateway
> (172.17.0.1): Address already in use

This seems to be related to a Docker bug around the network controller initialization, where the controller has allocated an IP pool and persisted some state but not all of it. See:

* moby/moby#22834
* moby/moby#23078

This fix simply removes the docker0 interface if it exists before starting the Docker daemon. This fix will need to be re-evaluated if we want to enable the 1.12+ containerd live-restore like Docker options as discussed in:

* https://docs.docker.com/engine/admin/live-restore/
* moby/moby#2658
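The workaround in this commit message (remove docker0 if it exists, then start the daemon) could look roughly like the Go sketch below. It is not the code from the referenced commit, which is not shown on this page. The helper names (`removeStaleBridge`, `interfaceExists`, `deleteWithIP`) and the `--apply` switch are hypothetical. It assumes the `ip` tool from iproute2 is on the PATH and that the process has enough privileges to delete interfaces.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"os/exec"
)

// removeStaleBridge deletes the named interface when exists reports that it
// is present. It returns true if a delete was attempted.
// The exists and del callbacks are injected so the decision logic can be
// exercised without touching real network interfaces.
func removeStaleBridge(name string, exists func(string) bool, del func(string) error) (bool, error) {
	if !exists(name) {
		return false, nil
	}
	return true, del(name)
}

// interfaceExists reports whether the kernel currently has an interface
// with the given name.
func interfaceExists(name string) bool {
	_, err := net.InterfaceByName(name)
	return err == nil
}

// deleteWithIP removes the interface by shelling out to iproute2.
func deleteWithIP(name string) error {
	return exec.Command("ip", "link", "delete", name).Run()
}

func main() {
	// Hypothetical safety switch: without --apply the program only reports
	// what it would do.
	apply := len(os.Args) > 1 && os.Args[1] == "--apply"
	del := deleteWithIP
	if !apply {
		del = func(name string) error {
			fmt.Println("dry run: would run: ip link delete", name)
			return nil
		}
	}
	attempted, err := removeStaleBridge("docker0", interfaceExists, del)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cleanup failed:", err)
		os.Exit(1)
	}
	fmt.Println("delete attempted:", attempted)
}
```

A step like this would have to run before the Docker daemon starts. As the commit message notes, it would need to be revisited if live-restore is enabled, since containers that are still running may depend on the bridge.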
lingmann pushed further commits with the same message to lingmann/dcos that referenced this issue on Aug 29 (two more), Aug 30, Aug 31, and Sep 1 (two), 2016. mellenburg pushed a commit with the same message to mellenburg/dcos that referenced this issue on Sep 2, 2016.
If the daemon is shut down ungracefully while it is starting, it may fail to start the next time with this error:
FATA[0001] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): Address already in use
We hit this problem on one of our servers. On this server, the daemon failed to start with a timeout (because another process was consuming all of the CPU, 400% on a 4-core server), and after several timeout failures, the failure became
Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): Address already in use
I investigated and found that this may be caused by an ungraceful shutdown happening between https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/libnetwork/controller.go#L507 and https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/libnetwork/controller.go#L535. At that point the controller has allocated the IP pool and saved the bitmap to the store, but the network has not finished initializing and has not been saved to the store. The IP pool therefore doesn't belong to any network, so it can't be cleaned up. This is hard to reproduce. To reproduce it, you can add a statement that logs a message
after https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/libnetwork/controller.go#L506 and then kill the daemon when you see the message
after ipamAllocate, please kill the daemon
and then start it again. The daemon will fail with
FATA[0001] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): Address already in use
To fix this, I think we should have a way to clean up unused allocated IP pools, or to keep the stored bitmap and the stored network in sync.
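One way to picture the first option (cleaning up unused allocations) is a reconciliation pass at startup that releases every allocated address no persisted network refers to. The sketch below uses hypothetical names (`state`, `releaseOrphans`) and a simplified in-memory model. It is not libnetwork code and not a proposed patch.

```go
package main

import (
	"fmt"
	"sort"
)

// state is a simplified view of what is assumed to be persisted:
// allocated gateway addresses and the networks that own them.
type state struct {
	pools    map[string]bool   // allocated gateway addresses
	networks map[string]string // network name -> gateway address
}

// releaseOrphans frees every allocated gateway that no persisted network
// refers to, and returns the released addresses in sorted order.
func releaseOrphans(s *state) []string {
	used := make(map[string]bool, len(s.networks))
	for _, gw := range s.networks {
		used[gw] = true
	}
	var released []string
	for gw := range s.pools {
		if !used[gw] {
			released = append(released, gw)
		}
	}
	for _, gw := range released {
		delete(s.pools, gw)
	}
	sort.Strings(released)
	return released
}

func main() {
	s := &state{
		// 172.17.0.1 was allocated but its network was never persisted.
		pools:    map[string]bool{"172.17.0.1": true, "172.18.0.1": true},
		networks: map[string]string{"mynet": "172.18.0.1"},
	}
	fmt.Println(releaseOrphans(s)) // prints [172.17.0.1]
}
```

The second option, keeping both writes in sync, would remove the window altogether, but it depends on the store being able to write both records atomically.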