GCI: Race condition when deleting docker0 #29756

Open
maisem opened this Issue Jul 28, 2016 · 11 comments


@maisem
Contributor
maisem commented Jul 28, 2016

During GCI bootup configuration, the docker0 bridge is deleted before kubelet starts. This works fine when no NETWORK_PROVIDER is used.

If a NETWORK_PROVIDER is used (e.g. kubenet), kubelet won't restart docker, which introduces a race condition between the config scripts restarting docker and the docker0 bridge being deleted.

cc @bprashanth @thockin @Amey-D @fabioy @roberthbailey

@bprashanth
Contributor
bprashanth commented Jul 29, 2016 edited

We don't need to restart docker with kubenet. New containers are created on cbr0 by the network plugin, and if someone SSHes into the node and runs "docker run -it busybox /bin/sh", the container still gets an IP from docker0. Now docker0 and cbr0 should not have overlapping CIDRs, because docker0 will be created with the default 172.17.0.1 gateway, and cbr0 will be created from the pod CIDR range.

We should probably make sure the node CIDR doesn't overlap the default 172.17.0.0/16 range, or bad things can happen.
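A check along these lines could catch that overlap early. This is only a sketch using Python's stdlib ipaddress module, not code from the actual configure scripts:

```python
import ipaddress

# Docker's default bridge network; docker0 takes 172.17.0.1 as its gateway.
DOCKER0_DEFAULT = ipaddress.ip_network("172.17.0.0/16")

def overlaps_docker0(cidr: str) -> bool:
    """Return True if the given node/pod CIDR collides with docker0's default range."""
    # strict=False tolerates values with host bits set, e.g. "172.17.42.1/16".
    return ipaddress.ip_network(cidr, strict=False).overlaps(DOCKER0_DEFAULT)

print(overlaps_docker0("10.244.0.0/14"))   # False: safe to use as a node CIDR
print(overlaps_docker0("172.17.42.1/16"))  # True: overlaps docker0, bad things can happen
```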

@bprashanth
Contributor

Also isn't the only thing running GCI the master, on which we don't use kubenet?

@fabioy
Member
fabioy commented Jul 29, 2016

GCI is an option for GKE customers for the node as well. It'd be bad if it was broken for them.

@bprashanth
Contributor

@maisem can you clarify the race? We delete docker0 on the master to avoid overlapping CIDRs. There might actually be better ways to solve the race (like actually applying the master kubelet's --pod-cidr arg to cbr0, and making sure it doesn't overlap with docker0), but I don't think we need to target everything for 1.3.4. The docker0 deletion shouldn't be an issue on nodes; in fact we shouldn't even need to delete docker0, because we start docker without the --bridge option, so it's going to recreate docker0 anyway.

@maisem
Contributor
maisem commented Jul 29, 2016

We need to restart docker at least once for it to pick up the new command-line arguments.
We didn't need to restart it explicitly pre-kubenet, as kubelet would do that.
Pre-kubenet, the following happens:

  1. Create docker flags
  2. Delete docker0 bridge
  3. Start kubelet
  4. kubelet restarts docker

When we start using kubenet, the following happens:

  1. Create docker flags
  2. Restart docker
  3. Delete docker0 bridge
  4. Start kubelet

There is a race between steps 2 and 3. If docker starts before the bridge is deleted, then with kubenet enabled docker crashes because it is unable to use 172.17.0.1, which causes the startup scripts to fail:

Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): Address already in use

#29757 changes the order of the steps to

  1. Create docker flags
  2. Delete docker0 bridge
  3. Restart docker
  4. Start kubelet

I hope that clarifies things.
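The race between steps 2 and 3 can be modeled as a toy simulation. Everything here is hypothetical and only illustrates the ordering; it is not the actual GCI startup script:

```python
import ipaddress

DOCKER0_GATEWAY = ipaddress.ip_address("172.17.0.1")

class Node:
    """Toy model of a node's bridge state during bootup."""
    def __init__(self):
        # A stale docker0 bridge is still holding the default gateway address.
        self.bridges = {"docker0": DOCKER0_GATEWAY}

    def delete_bridge(self, name):
        self.bridges.pop(name, None)

    def restart_docker(self):
        # The daemon tries to claim 172.17.0.1 for its default "bridge"
        # network; if something else already holds it, startup fails with
        # "failed to allocate gateway (172.17.0.1): Address already in use".
        if DOCKER0_GATEWAY in self.bridges.values():
            raise RuntimeError("Address already in use")
        self.bridges["docker0"] = DOCKER0_GATEWAY

# Losing the race: docker restarts (step 2) before docker0 is deleted (step 3).
racy = Node()
try:
    racy.restart_docker()
except RuntimeError as e:
    print("startup failed:", e)

# The reordering in #29757: delete docker0 first, then restart docker.
fixed = Node()
fixed.delete_bridge("docker0")
fixed.restart_docker()
print("startup ok")
```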

@bprashanth
Contributor

Thanks, will check tomorrow

@bprashanth
Contributor

Reading your previous comment, we don't need to delete docker0 with kubenet. In fact, with the "correct" ordering (step 2 before step 3 in your list), doesn't docker just recreate docker0 for itself if we simply service restart docker? How does deleting the bridge even help?

There are a few situations to handle though:

  • On the master:
    • either don't use kubenet and continue what we're doing today. It doesn't really matter, because everything but fluentd runs with host networking on the master
    • don't pass a CIDR that overlaps with the default docker0 range (--pod-cidr=172.17.42.1/16 on GKE vs --pod-cidr=10.123.45.0/30 on GCE; on GCE the docker0 deletion is not required)
    • make the docker0 deletion logic smarter: only delete docker0 if docker is started with --bridge, because then a restart of docker will not recreate docker0.
  • On the node: no docker0 deletion is required, just like we do currently

The easiest option is to get rid of the docker0 deletion hack by just changing the --pod-cidr given to the master on GKE. Is there a reason it is what it is today? Then everywhere, we use kubenet, never delete docker0, and respect --pod-cidr if specified, otherwise use the podCIDR from the node object.
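For reference, the two --pod-cidr values mentioned above can be compared against docker0's default range directly (a stdlib sketch; strict=False is needed because 172.17.42.1/16 has host bits set):

```python
import ipaddress

docker0 = ipaddress.ip_network("172.17.0.0/16")  # docker0's default range

gke_pod_cidr = ipaddress.ip_network("172.17.42.1/16", strict=False)  # GKE master
gce_pod_cidr = ipaddress.ip_network("10.123.45.0/30")                # GCE master

print(docker0.overlaps(gke_pod_cidr))  # True: the GKE master needs the workaround
print(docker0.overlaps(gce_pod_cidr))  # False: on GCE the deletion is not required
```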

@maisem
Contributor
maisem commented Jul 29, 2016

This isn't an issue with the master.
This is an issue with the node.

docker0 deletion for GCI was introduced in #27016 to fix #26379. #29757 merely reorders it.

@bprashanth
Contributor

docker0 deletion isn't required for node.

@bprashanth
Contributor

#26379 is only an issue on the GKE master because we pass --pod-cidr=172.17.42.1/16 on GKE, which overlaps with the default range of docker0.

@bprashanth
Contributor

I take that back, #26379 was an issue because we were not using kubenet. Once we started using kubenet, GCE masters should work with or without docker0 deletion. GKE masters won't work because of the --pod-cidr argument overlapping with docker0.
