GCI: Race condition when deleting docker0 #29756

Closed
maisem opened this Issue Jul 28, 2016 · 11 comments

Comments

maisem (Contributor) commented Jul 28, 2016

During GCI bootup config, the docker0 bridge is deleted before kubelet starts, which works fine when no NETWORK_PROVIDER is used.

If a NETWORK_PROVIDER is used (e.g. kubenet), kubelet won't restart docker; this introduces a race between the config restarting docker and the docker0 bridge being deleted.

cc @bprashanth @thockin @Amey-D @fabioy @roberthbailey

bprashanth (Member) commented Jul 29, 2016

We don't need to restart docker with kubenet. New containers are created on cbr0 by the network plugin, and if someone SSHes into the node and runs "docker run -it busybox /bin/sh", the container still gets an IP from docker0. Now, docker0 and cbr0 should not have overlapping CIDRs, because docker0 is created from the default 172.17.0.1 while cbr0 is created from the pod CIDR range.

We should probably make sure the node CIDR doesn't overlap the 172 range, or bad things can happen.
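That overlap check can be sketched in shell; this is only an illustration of the CIDR math, not code from the startup scripts (`ip2int` and `cidr_overlap` are names made up here):

```shell
#!/usr/bin/env bash
# Illustrative check that a proposed pod CIDR does not overlap docker0's
# default 172.17.0.0/16 range. Not part of any real startup script.

ip2int() {  # dotted quad -> 32-bit integer
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

cidr_overlap() {  # exit 0 (true) if the two CIDRs overlap
  local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
  local min=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( min == 0 ? 0 : (0xFFFFFFFF << (32 - min)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$net1") & mask )) -eq $(( $(ip2int "$net2") & mask )) ]
}

cidr_overlap 172.17.42.1/16 172.17.0.0/16 && echo "172.17.42.1/16 overlaps docker0"
cidr_overlap 10.123.45.0/30 172.17.0.0/16 || echo "10.123.45.0/30 does not"
```

The two example ranges are the GKE and GCE --pod-cidr values discussed in this thread.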


bprashanth (Member) commented Jul 29, 2016

Also, isn't the only thing running GCI the master, on which we don't use kubenet?


fabioy (Member) commented Jul 29, 2016

GCI is an option for GKE customers for the node as well. It'd be bad if it were broken for them.


bprashanth (Member) commented Jul 29, 2016

@maisem can you clarify the race? We delete docker0 on the master to avoid overlapping CIDRs. There might actually be better ways to solve the race (like actually applying the master kubelet's --pod-cidr arg to cbr0, and making sure it doesn't overlap with docker0), but I don't think we need to target everything for 1.3.4. The docker0 deletion shouldn't be an issue on nodes; in fact, we shouldn't even need to delete docker0, because we start docker without the --bridge option, so it's going to create docker0 anyway.


maisem (Contributor) commented Jul 29, 2016

We need to restart docker at least once for it to pick up the new command line arguments.
We didn't need to explicitly restart it pre-kubenet, as kubelet would do that.
Pre-kubenet, the following happens:

  1. Create docker flags
  2. Delete docker0 bridge
  3. Start kubelet
  4. kubelet restarts docker

When we start using kubenet, the following happens:

  1. Create docker flags
  2. Restart docker
  3. Delete docker0 bridge
  4. Start kubelet

There is a race between steps 2 and 3: with kubenet enabled, if docker starts before the bridge is deleted, it crashes because it is unable to use 172.17.0.1, which causes the startup scripts to fail.

Error starting daemon: Error initializing network controller: Error creating default "bridge" network: failed to allocate gateway (172.17.0.1): Address already in use

#29757 changes the order of the steps to:

  1. Create docker flags
  2. Delete docker0 bridge
  3. Restart docker
  4. Start kubelet

I hope that clarifies things.
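The corrected ordering can be sketched as shell functions; every name below (`assemble_docker_flags` and friends) is an illustrative placeholder, not the actual GCI configure script:

```shell
#!/usr/bin/env bash
# Sketch of the reordered startup per #29757. All function names are
# placeholders; the real logic lives in the GCI configure scripts.

assemble_docker_flags() {   # 1. write the new docker daemon flags
  :                         #    (placeholder body)
}

delete_docker0_bridge() {   # 2. remove docker0 while docker is down,
  if ip link show docker0 >/dev/null 2>&1; then  # so 172.17.0.1 is free
    ip link set docker0 down
    ip link delete docker0
  fi
}

restart_docker() {          # 3. docker comes up with the new flags and a
  systemctl restart docker  #    free gateway address: no race left
}

start_kubelet() {           # 4. kubelet starts last
  systemctl start kubelet
}

# Order matters: step 2 must complete before step 3.
# assemble_docker_flags && delete_docker0_bridge && restart_docker && start_kubelet
```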


bprashanth (Member) commented Jul 29, 2016

Thanks, will check tomorrow


bprashanth (Member) commented Jul 29, 2016

Reading your previous comment: we don't need to delete docker0 with kubenet. In fact, with the "correct" ordering (2 before 3 in your list), doesn't docker just recreate docker0 for itself if we simply service restart docker? How does deleting the bridge even help?

There are a few situations to handle though:

  • On the master:
    • Either don't use kubenet and continue what we're doing today. It doesn't really matter, because everything but fluentd runs with host networking on the master.
    • Don't pass a CIDR that overlaps with docker0's default range (--pod-cidr=172.17.42.1/16 on GKE vs --pod-cidr=10.123.45.0/30 on GCE; on GCE the docker0 deletion is not required).
    • Make the docker0 deletion logic smarter: only delete docker0 if docker is started with --bridge, because then a restart of docker will not recreate docker0.
  • On the node: no docker0 deletion required, just like we do currently.

The easiest option is to get rid of the docker0 deletion hack by just changing the --pod-cidr given to the master on GKE. Is there a reason it is what it is today? Then everywhere, we use kubenet, never delete docker0, and respect --pod-cidr if specified, otherwise use the podCIDR in the node object.
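The "smarter deletion" option could look roughly like this; `docker_uses_custom_bridge` and `maybe_delete_docker0` are names invented for this sketch, not code from the actual scripts:

```shell
#!/usr/bin/env bash
# Sketch of conditional docker0 deletion: only remove the bridge when the
# daemon flags select a custom bridge, since only then will a docker
# restart not simply recreate docker0. Illustrative, not the real script.

docker_uses_custom_bridge() {
  case " $1 " in
    *" --bridge="* | *" --bridge "*) return 0 ;;
    *) return 1 ;;
  esac
}

maybe_delete_docker0() {
  if docker_uses_custom_bridge "$1"; then
    ip link set docker0 down 2>/dev/null || true
    ip link delete docker0 2>/dev/null || true
  fi
}
```

With flags like `--bridge=cbr0 --iptables=false` the bridge is removed; with the default flags it is left alone, matching the node case described above.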


maisem (Contributor) commented Jul 29, 2016

This isn't an issue with the master; it's an issue with the node.

docker0 deletion for GCI was introduced in #27016 to fix #26379. #29757 merely reorders it.


bprashanth (Member) commented Jul 29, 2016

docker0 deletion isn't required on the node.


bprashanth (Member) commented Jul 29, 2016

#26379 is only an issue on the GKE master because we pass --pod-cidr=172.17.42.1/16 on GKE, which overlaps with the default range of docker0.


bprashanth (Member) commented Jul 29, 2016

I take that back, #26379 was an issue because we were not using kubenet. Once we started using kubenet, GCE masters should work with or without docker0 deletion. GKE masters won't work because of the --pod-cidr argument overlapping with docker0.

