Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: don't create sandbox when bridge is none to avoid docker restart hang. #42454

Conversation

payall4u
Copy link
Contributor

@payall4u payall4u commented Jun 2, 2021

From #42461

- What I did
If we create container with docker daemon disable bridge, we'll create needless sandbox.
And when the container be removed, the sandbox will be stayed here. Only when dockerd be restarted, those sandbox would be deleted by sandboxCleanup(). 4000 such sandbox will cost nearly 10 minutes. It means restarting docker daemon will hang more then 10 minutes.

func (daemon *Daemon) allocateNetwork(container *container.Container) (retErr error) {
	var (
		start          = time.Now()
		controller     = daemon.netController
	)

	// container.Config.NetworkDisabled never be assigned.
	if container.Config.NetworkDisabled && container.HostConfig.NetworkMode.IsContainer() {
		return nil
	}
	// ...
        // container.Config.NetworkDisabled will be true here (in connectToNetwork).
	// when we create set NetworkDisabled, we won't delete sandbox after deleting container.
	if nConf, ok := container.NetworkSettings.Networks[defaultNetName]; ok {
		cleanOperationalData(nConf)
		if err := daemon.connectToNetwork(container, defaultNetName, nConf.EndpointSettings, updateSettings); err != nil {
			return err
		}

	}

	networks := make(map[string]*network.EndpointSettings)
	for n, epConf := range container.NetworkSettings.Networks {
		if n == defaultNetName {
			continue
		}

		networks[n] = epConf
	}
	// len of networks is 0, we'll create sandbox here
	if len(networks) == 0 {
	}
}

So I think we should assign the NetworkDisabled earlier.
- How I did it
I fix the bug by setting NetworkDisabled correctly.

- How to verify it

  1. Update the docker config set "bridge": "none"
  2. Use for i in $(seq 1 4000); do docker run --rm ubuntu; done to create many sandbox.
  3. Restart docker and wait.(We can verify it by read network local-kv.db directly.)

- Description for the changelog

Setting NetworkDisabled correctly to avoid restart docker hang.

- A picture of a cute animal (not mandatory but encouraged)
馃惂馃惂馃惂

@payall4u payall4u force-pushed the payall4u/fix-creating-sandbox-when-disable-bridge branch from af3d378 to a790247 Compare June 2, 2021 09:17
@payall4u payall4u changed the title fix: check whether disable bridge to avoid create sandbox. Fix: don't create sandbox when bridge is none to avoid docker restart hang. Jun 3, 2021
@payall4u
Copy link
Contributor Author

ping @AkihiroSuda @fuweid

@thaJeztah
Copy link
Member

I'm a bit confused; wouldn't the container still need a sandbox?

And when the container be removed, the sandbox will be stayed here

Do you know why it's not removed? Feels like that may be the actual bug?

@payall4u
Copy link
Contributor Author

payall4u commented Jun 15, 2021

I'm a bit confused; wouldn't the container still need a sandbox?

And when the container be removed, the sandbox will be stayed here

Do you know why it's not removed? Feels like that may be the actual bug?

Summary

When we set bridge to "none" and create a container, we'll get:

  1. create container
  2. allocate network <=== we'll set Container.Config.NetworkDisabled to true here
  3. create sandbox however

When we delete the container, it will:

  1. release network <=== if Container.Config.NetworkDisabled is true jump to step 3
  2. delete sandbox
  3. delete container

Detail

Create container

func (daemon *Daemon) allocateNetwork(container *container.Container) (retErr error) {

	if nConf, ok := container.NetworkSettings.Networks[defaultNetName]; ok {
		// set container.Config.NetworkDisabled here
		if err := daemon.connectToNetwork(container, defaultNetName, nConf.EndpointSettings, updateSettings); err != nil {
			return err
		}
	}

	// Create its network sandbox now if not present
	if len(networks) == 0 {
		if nil == daemon.getNetworkSandbox(container) {
			sb, err := daemon.netController.NewSandbox(container.ID, sbOptions...)
			if err != nil {
				return err
			}
			updateSandboxNetworkSettings(container, sb)
			defer func() {
				if retErr != nil {
					sb.Delete()
				}
			}()
		}
	}
}

func (daemon *Daemon) connectToNetwork(container *container.Container, idOrName string, ...) (err error) {
	// ...
	if containertypes.NetworkMode(idOrName).IsBridge() &&
		daemon.configStore.DisableBridge {
		container.Config.NetworkDisabled = true
		return nil
	}
	// ...
}

Delete container

When deleting the container, we'll check container.Config.NetworkDisabled. If the value is true, we won't delete the sandbox.

func (daemon *Daemon) releaseNetwork(container *container.Container) {
	if container.HostConfig.NetworkMode.IsContainer() || container.Config.NetworkDisabled {
		return
	}
	// ...
	sb, err := daemon.netController.SandboxByID(sid)
	// ...
	if err := sb.Delete(); err != nil {
		logrus.Errorf("Error deleting sandbox id %s for container %s: %v", sid, container.ID, err)
	}
	// ...
}

@thaJeztah Sorry for my late response.

@thaJeztah
Copy link
Member

release network <=== if Container.Config.NetworkDisabled is true jump to step 3

Is the sandbox ID set in that case? Because if that's the case, removing the || container.Config.NetworkDisabled would fix that; looking at the code;

if container.HostConfig.NetworkMode.IsContainer() || container.Config.NetworkDisabled {
return
}
sid := container.NetworkSettings.SandboxID
settings := container.NetworkSettings.Networks
container.NetworkSettings.Ports = nil
if sid == "" {
return
}

If the sandbox-ID is not set (empty), it would already return early. and if it's set, it would clean it up. If the sandbox ID is set in this case, I think that's something that should be done (even with this fix applied)

Looking further; I wonder if there's other situations that are overlooked for which things should be skipped; thinking of host mode networking; I see there's some handling in daemon.initializeNetworking(), but it does not return before daemon.allocateNetwork() is called;

if container.HostConfig.NetworkMode.IsHost() {
if container.Config.Hostname == "" {
container.Config.Hostname, err = os.Hostname()
if err != nil {
return err
}
}
}
if err := daemon.allocateNetwork(container); err != nil {
return err
}

@payall4u payall4u force-pushed the payall4u/fix-creating-sandbox-when-disable-bridge branch from a790247 to fcde313 Compare June 15, 2021 10:47
@@ -1014,7 +1014,7 @@ func (daemon *Daemon) releaseNetwork(container *container.Container) {
if daemon.netController == nil {
return
}
if container.HostConfig.NetworkMode.IsContainer() || container.Config.NetworkDisabled {
if container.HostConfig.NetworkMode.IsContainer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that container.NetworkSettings is a pointer, so could (potentially) be nil; perhaps we need a check for that below, to prevent a panic if that ever happens (not sure if possible)

Copy link
Contributor Author

@payall4u payall4u Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that container.NetworkSettings is a pointer, so could (potentially) be nil; perhaps we need a check for that below, to prevent a panic if that ever happens (not sure if possible)

I think that container.NetworkSettings is never nil, unless someone update the container config on disk and restart docker daemon. It's always good to check.

Signed-off-by: payall4u <payall4u@qq.com>
@payall4u payall4u force-pushed the payall4u/fix-creating-sandbox-when-disable-bridge branch from fcde313 to 33bfb32 Compare June 15, 2021 12:29
@payall4u payall4u requested a review from thaJeztah June 17, 2021 01:19
@payall4u
Copy link
Contributor Author

Ping @thaJeztah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants