Starting Windows Docker containers with a too low memory limit causes the Docker service to hang #37429

Open
claudiubelu opened this issue Jul 10, 2018 · 5 comments

@claudiubelu

Description

Creating and starting Windows Docker containers with a low memory limit (e.g., 30 MB) can cause the Docker service to hang and remain unresponsive until the service is forcefully restarted.

This issue has been observed while using Kubernetes 1.11 with Windows Server 1803 nodes. The Kubernetes pods fail to start (they will remain in ContainerCreating state [1]) and the nodes will end up in NotReady state because the Docker service is hanging on the Windows nodes. To properly restore the Windows nodes, the pods will have to be deleted before restarting the Docker service.
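The recovery order described above (delete the pods first, then restart the Docker service) could be captured roughly as follows. This is only a sketch: the `recovery_plan` helper and the location labels are hypothetical, the service name `docker` is assumed to be the default Windows service name, and `Restart-Service ... -Force` may itself block if the service is badly hung.

```python
def recovery_plan(pod_names, service_name="docker"):
    """Return (where_to_run, argv) pairs in the order they should be executed.

    Pods are deleted first so that they are not rescheduled onto the node and
    do not hang the Docker service again right after it comes back.
    """
    plan = [
        ("kubectl-host", ["kubectl", "delete", "pod", name])
        for name in pod_names
    ]
    # Assumption: the Docker service is registered under the name "docker".
    plan.append(
        ("windows-node",
         ["powershell", "-Command", f"Restart-Service {service_name} -Force"])
    )
    return plan
```

For example, `recovery_plan(["pod-limit"])` yields the `kubectl delete pod pod-limit` step followed by the service restart on the Windows node.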

Creating containers with a higher memory limit (e.g., 50 MB) works fine and does not cause any issues.

[1] https://paste.ubuntu.com/p/Px25F9Tnfs/

Steps to reproduce the issue:

Via Docker:

  1. Create container: docker create --name test-container --memory 30m microsoft/windowsservercore:1803 ping -t localhost
  2. Check container: docker ps -a
  3. Start container: docker start test-container
  4. Notice that the CLI hangs. Docker is not responding.
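Because the CLI hangs at step 3, it can help to drive the steps above from a script that enforces a timeout instead of waiting indefinitely. A minimal sketch, assuming `docker` is on the PATH of the Windows node; the helper names and the 60-second timeout are arbitrary choices, not part of the original report.

```python
import subprocess

# The three commands from the steps above, in order.
REPRO_STEPS = [
    ["docker", "create", "--name", "test-container", "--memory", "30m",
     "microsoft/windowsservercore:1803", "ping", "-t", "localhost"],
    ["docker", "ps", "-a"],
    ["docker", "start", "test-container"],
]


def run_with_timeout(argv, timeout_s):
    """Run argv and return (hung, returncode).

    hung is True when the command did not finish within timeout_s seconds;
    in that case the child process is killed and returncode is None.
    """
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True, None
    return False, completed.returncode


def reproduce(timeout_s=60):
    """Run the steps in order; return the first command that hung, or None."""
    for argv in REPRO_STEPS:
        hung, _ = run_with_timeout(argv, timeout_s)
        if hung:
            return argv
    return None
```

Calling `reproduce()` on an affected node would be expected to return the `docker start` command. Note that killing the CLI does not recover the Docker service itself.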

Via Kubernetes:

  1. Create pod.yaml file: https://paste.ubuntu.com/p/kDnYP3zBYS/
  2. Create pod (kubectl create -f pod.yaml)
  3. Wait for the container to be created (kubectl describe po/pod-limit)
  4. Check that the Windows node is in NotReady state (kubectl get nodes)
  5. Connect to the Windows node and use the docker CLI (docker ps -a)
  6. Notice that there is no output. Docker is not responding.
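The linked pod.yaml is not reproduced in this thread, so the following is only a guess at an equivalent manifest, generated as JSON (which `kubectl create -f` also accepts). The pod name `pod-limit` and the image come from the report; the `30Mi` limit, the container name, the `ping` command, and the `beta.kubernetes.io/os` node selector are assumptions.

```python
import json


def build_pod_manifest(name="pod-limit", memory_limit="30Mi"):
    """Build a pod spec with a low memory limit on a Windows node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Assumption: Windows nodes carry this label (Kubernetes 1.11 era).
            "nodeSelector": {"beta.kubernetes.io/os": "windows"},
            "containers": [
                {
                    "name": "test-container",  # hypothetical container name
                    "image": "microsoft/windowsservercore:1803",
                    "command": ["ping", "-t", "localhost"],
                    "resources": {"limits": {"memory": memory_limit}},
                }
            ],
        },
    }


def write_manifest(path="pod.json", **kwargs):
    """Write the manifest to path so it can be passed to kubectl create -f."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_pod_manifest(**kwargs), handle, indent=2)
```

After `write_manifest("pod.json")`, the remaining steps would use `kubectl create -f pod.json` in place of the YAML file.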

Describe the results you received:

Docker becomes unresponsive. The CLI doesn't receive any response and will also hang until interrupted.

Describe the results you expected:

The Docker service should still be working.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:       17.06.2-ee-13
 API version:   1.30
 Go version:    go1.8.7
 Git commit:    ac44d73
 Built:         Mon Jun  4 16:46:59 2018
 OS/Arch:       windows/amd64

Server:
 Engine:
  Version:      17.06.2-ee-13
  API version:  1.30 (minimum version 1.24)
  Go version:   go1.8.7
  Git commit:   ac44d73
  Built:        Mon Jun  4 16:58:47 2018
  OS/Arch:      windows/amd64
  Experimental: false

Output of docker info:

Containers: 3
 Running: 0
 Paused: 0
 Stopped: 3
Images: 21
Server Version: 17.06.2-ee-13
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 17134 (17134.1.amd64fre.rs4_release.180410-1804)
Operating System: Windows Server Datacenter
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 8GiB
Name: 38923k8s9001
ID: XQOJ:4F3E:ERXL:FLLP:4WX6:7Y7E:FCOC:EXXN:PL4N:ERNU:GIQY:SJ6A
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Virtual Machine with Windows Server 1803
  • Used docker image: microsoft/windowsservercore:1803

@thaJeztah
Member

ping @johnstep @jhowardmsft

@johnstep
Member

I can reproduce the problem and will investigate today. Thanks.

@johnstep
Member

johnstep commented Jul 10, 2018

I believe HCS relies on the CExecSvc process in the container for container communication. At 30 MB, the container sometimes doesn't have enough memory to start CExecSvc, and when it does start, something must be failing later, either in CExecSvc or perhaps in another process.

I think there are a couple of things that can help here:

  1. Harden the container components to be more resilient to memory failure issues. At a minimum, consider monitoring CExecSvc from the host so that the container can be terminated on failure. Some of this will be challenging because the component binaries are the same as on the host and are likely not robust to extremely low memory conditions.

  2. Establish a minimum recommended memory value for each of the base Windows images. However, this will just be a rough guideline, and assumes doing no work within the container. It's also going to vary by image and by version. Based on my tests, reliable ping requires about 35 MB (for continuous pinging, and clean, fast exit) with microsoft/nanoserver:1803 and about 58 MB with microsoft/windowsservercore:1803.
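The second suggestion could take the form of an advisory pre-flight check on the client side. A rough sketch using only the two figures measured above; the function names are hypothetical, "MB" is assumed to mean MiB, and the table would need entries for other images and versions.

```python
import re

_UNIT_BYTES = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# Rough idle minimums from the measurements above (ping only, no real work).
RECOMMENDED_MIN_BYTES = {
    "microsoft/nanoserver:1803": 35 * 1024 ** 2,
    "microsoft/windowsservercore:1803": 58 * 1024 ** 2,
}


def parse_memory(value):
    """Convert a docker-style memory string such as '30m' into bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit]


def check_memory_limit(image, memory):
    """Return a warning string if memory is below the guideline, else None.

    Unknown images and a limit of 0 (treated as unlimited) produce no warning.
    """
    floor = RECOMMENDED_MIN_BYTES.get(image)
    requested = parse_memory(memory)
    if floor is None or requested == 0 or requested >= floor:
        return None
    return (f"{memory} is below the rough minimum of "
            f"{floor // 1024 ** 2} MB for {image}")
```

With this table, `check_memory_limit("microsoft/windowsservercore:1803", "30m")` returns a warning, while a `64m` limit returns `None`. This is a guideline only and does not by itself prevent the hang.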

@jhowardmsft Can you take a look at this from a Microsoft Windows team perspective?

@cpuguy83
Member

Note that there is also a hard minimum limit for Linux containers (4 MB), which we enforce in the API.
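For illustration, an API-side hard minimum of this kind amounts to rejecting the request before any container is created. The sketch below is not the daemon's actual code (which is written in Go); the names and the error type are hypothetical, and 4 MB is taken as 4 MiB.

```python
LINUX_MIN_MEMORY_BYTES = 4 * 1024 * 1024  # assumed to mean 4 MiB


class MemoryLimitTooLow(ValueError):
    """Raised when a non-zero memory limit is below the hard minimum."""


def validate_memory_limit(memory_bytes, minimum_bytes=LINUX_MIN_MEMORY_BYTES):
    """Return memory_bytes if acceptable, otherwise raise.

    A value of 0 means no limit was requested and is always accepted.
    """
    if memory_bytes < 0:
        raise ValueError("memory limit cannot be negative")
    if memory_bytes != 0 and memory_bytes < minimum_bytes:
        raise MemoryLimitTooLow(
            f"minimum memory limit allowed is {minimum_bytes} bytes, "
            f"got {memory_bytes}"
        )
    return memory_bytes
```

A comparable check for Windows containers would pass a larger `minimum_bytes`, chosen per base image.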
