Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux containers take too long to stop when executed with LCOW switch on #37919

Open
jtsalten opened this issue Sep 26, 2018 · 14 comments

Comments

Projects
None yet
9 participants
@jtsalten
Copy link

commented Sep 26, 2018

No description provided.

@jtsalten

This comment has been minimized.

Copy link
Author

commented Sep 26, 2018

I've just installed the Docker last version available for Windows 10 Pro (18.09.0-ce-beta1
) . When Running on mode Windows Containers to be able to run Linux and Windows containers at same time I realize that I can't stop the Linux ones.
If I run, for example, docker stop ubuntu it keeps waiting more than 5 minutes till the container is stopped (it should take seconds).
Also docker kill ubuntu doesn't work.
To stop it quickly I have to run hcsdiag kill , but I think is not the right way.

On the other side I can stop/kill perfectly a windows container with: docker stop nano or docker kill nano (being nano the name of a windows container)

Docker Version:
Server Version: 18.09.0-ce-beta1
Storage Driver: windowsfilter (windows) lcow (linux)

Docker Info:

Containers: 2
Running: 0
Paused: 0
Stopped: 2
Images: 3
Server Version: 18.09.0-ce-beta1
Storage Driver: windowsfilter (windows) lcow (linux)
Windows:
LCOW:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: hyperv
Kernel Version: 10.0 17134 (17134.1.amd64fre.rs4_release.180410-1804)
Operating System: Windows 10 Pro Version 1803 (OS Build 17134.286)
OSType: windows
Architecture: x86_64
CPUs: 4
Total Memory: 15.85GiB
Name: ESXPSSALVA
ID: TSLU:U7XJ:OZ3M:KGIT:TZ5Q:4L3G:VZ5Q:P7OG:QMXN:M5PK:WZOW:TWVQ
Docker Root Dir: C:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: -1
Goroutines: 26
System Time: 2018-09-26T20:37:34.4583496+02:00
EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
docker.iscinternal.com
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

@jtsalten

This comment has been minimized.

Copy link
Author

commented Sep 26, 2018

A basic ubuntu container takes 30 seconds to stop. An IRIS container (also based on Ubuntu) takes up to 5 min... it shouldn't take more than 1 min.

Console output:
PS C:> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
88fd8bf51b6b docker.iscinternal.com/intersystems/iris:2018.2.0.491.0qdtestsys "/iris-main" 25 hours ago Up 40 seconds (health: starting) 0.0.0.0:5173->51773/tcp, 0.0.0.0:5273->52773/tcp iris
PS C:> Get-Date; docker stop iris; Get-Date

miércoles, 26 de septiembre de 2018 20:21:13
iris
miércoles, 26 de septiembre de 2018 20:26:21

PS C:> docker start ubuntu
ubuntu
PS C:> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
37d12e01f2e9 ubuntu:latest "/bin/sh" 9 minutes ago Up 5 seconds ubuntu
PS C:> Get-Date; docker stop ubuntu; Get-Date

miércoles, 26 de septiembre de 2018 20:26:57
ubuntu
miércoles, 26 de septiembre de 2018 20:27:29

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Sep 27, 2018

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2018

@jtsalten Can you confirm what kernel you are running for lcow, and where you got it from? I'm guessing/assuming it's from a Docker-For-Windows installation? If so, what version of D4W?

docker run --rm alpine uname -r will get you the kernel version.

And also in the second case, how -exactly- did you create that ubuntu container so I can attempt to repro like-for-like.

@jtsalten

This comment has been minimized.

Copy link
Author

commented Oct 2, 2018

@jhowardmsft sorry for answering late … i didn't see the notification.

running alpine: 4.14.35-linuxkit

I tested with 2 images, one the basic ubuntu, pulled from docker store. The other was an InterSystems IRIS image based on Ubuntu (published by InterSystems just for customers). Both took more time than expected to stop.

The ubuntu image is public so I guess is easier to reproduce the issue.
I include below output of:

docker image inspect ubuntu
docker images
get-date; docker stop ubuntu; get-date

docker image inspect ubuntu

PS C:\WINDOWS\system32> docker image inspect ubuntu
[
    {
        "Id": "sha256:cd6d8154f1e16e38493c3c2798977c5e142be5e5d41403ca89883840c6d51762",
        "RepoTags": [
            "ubuntu:latest"
        ],
        "RepoDigests": [
            "ubuntu@sha256:de774a3145f7ca4f0bd144c7d4ffb2931e06634f11529653b23eba85aef8e378"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2018-09-05T22:20:12.306501594Z",
        "Container": "9ec7e3534677f7131ca24d817f350f61a5f55b90213e72639a5d4544bc4ca661",
        "ContainerConfig": {
            "Hostname": "9ec7e3534677",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "#(nop) ",
                "CMD [\"/bin/bash\"]"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:7305906e8de52626527daa8a6d557980a47021e3ad0fbc59259c63f9d09abd78",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {}
        },
        "DockerVersion": "17.06.2-ce",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/bash"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:7305906e8de52626527daa8a6d557980a47021e3ad0fbc59259c63f9d09abd78",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 121113088,
        "VirtualSize": 121113088,
        "GraphDriver": {
            "Data": {
                "dir": "C:\\ProgramData\\Docker\\lcow\\55ff159f03bbb53445c9b458a96fe1c8ea6975999c6468f0977eecb9de90b589"
            },
            "Name": "lcow"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:a30b835850bfd4c7e9495edf7085cedfad918219227c7157ff71e8afe2661f63",
                "sha256:6267b420796f78004358a36a2dd7ea24640e0d2cd9bbfdba43bb0c140ce73567",
                "sha256:f73b2816c52ac5f8c1f64a1b309b70ff4318d11adff253da4320eee4b3236373",
                "sha256:6a061ee02432e1472146296de3f6dab653f57c109316fa178b40a5052e695e41",
                "sha256:8d7ea83e3c626d5ef1e6a05de454c3fe8b7a567db96293cb094e71930dba387d"
            ]
        },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]

docker images

REPOSITORY                                 TAG                       IMAGE ID            CREATED             SIZE
jtsalvador/iris                            jtsc                      074807d28fa4        26 minutes ago      1.64GB
alpine                                     latest                    196d12cf6ab1        2 weeks ago         6.82MB
ubuntu                                     latest                    cd6d8154f1e1        3 weeks ago         121MB

docker stop

_(get-date; docker stop ubuntu; get-date)_

PS C:\WINDOWS\system32> docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                     PORTS                                              NAMES
b99ff108f394        074807d28fa4        "/iris-main"        9 minutes ago       Up 8 minutes    0.0.0.0:5175->51773/tcp, 0.0.0.0:5275->52773/tcp   jts2018c
5d25f8cbabf0        ubuntu              "bash"              13 minutes ago      Up 6 minutes                                                                  ubuntu
PS C:\WINDOWS\system32> get-date; docker stop ubuntu; get-date

martes, 2 de octubre de 2018 12:54:06
ubuntu
martes, 2 de octubre de 2018 12:59:09

@thaJeztah thaJeztah added the area/lcow label Oct 30, 2018

@mrfrankbell

This comment has been minimized.

Copy link

commented Nov 20, 2018

Seeing the same on fresh install of Docker Desktop v2.0.0.0-win78 (28905) on Windows 10 1809.

PS C:\WINDOWS\system32> docker pull nginx:alpine
alpine: Pulling from library/nginx
4fe2ade4980c: Pull complete
e11b023b07a0: Pull complete
aaf7d5908de5: Pull complete
8ae3a5ca453f: Pull complete
Digest: sha256:41dbd69d560c20712f90011689596f20a5db6d7f85436edda7f177a2e53d4da2
Status: Downloaded newer image for nginx:alpine
PS C:\WINDOWS\system32> docker run -d nginx:alpine
f4a23f43c9a589f41b3a908aabe85eefda8a307b4ef813fc2c9b20e19afb6766
PS C:\WINDOWS\system32> docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
f4a23f43c9a5        nginx:alpine        "nginx -g 'daemon of…"   10 seconds ago      Up 6 seconds        80/tcp              amazing_knuth
PS C:\WINDOWS\system32> get-date; docker stop f4a23f43c9a5; get-date

20 November 2018 12:49:44
f4a23f43c9a5
20 November 2018 12:54:49

Related error event in Application log at 12:54:49:

failed to terminate hccshim container [signal=9 error=container f4a23f43c9a589f41b3a908aabe85eefda8a307b4ef813fc2c9b20e19afb6766 encountered an error during Terminate: hcsshim: the handle has already been closed module=libcontainerd namespace=moby container=f4a23f43c9a589f41b3a908aabe85eefda8a307b4ef813fc2c9b20e19afb6766 process=init pid=356]

Results of docker info:

PS C:\WINDOWS\system32> docker info
Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 18.09.0
Storage Driver: windowsfilter (windows) lcow (linux)
 Windows:
 LCOW:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: hyperv
Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
Operating System: Windows 10 Pro Version 1809 (OS Build 17763.134)
OSType: windows
Architecture: x86_64
CPUs: 16
Total Memory: 31.92GiB
Name: *******
ID: 5YGG:JBRH:KC5D:MXVS:XN3B:AMSI:VDZL:YICW:OPZZ:VUP3:D7L2:MWYW
Docker Root Dir: C:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 28
 System Time: 2018-11-20T12:55:11.7222861Z
 EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
@joostmeijles

This comment has been minimized.

Copy link

commented Dec 3, 2018

Same issue here, it's quite annoying. Only workaround seems restarting the Docker daemon.

Might this be related to this change: #37583 ?

Any ETA on a fix for this?

@Barsonax

This comment has been minimized.

Copy link

commented Dec 17, 2018

Also waiting for a fix for this. Any ETA?

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

commented Jan 2, 2019

Absolutely not related to #37583. I suspect more likely it's the linuxkit kernel rather than docker itself. I'll try to investigate over the next few days. If someone can attach the relevant linuxkit kernel to a comment here (zipped), I can compare and contrast the performance of an internally built kernel to the linuxkit one.

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

commented Jan 2, 2019

Actually - don't need the above. It's available at https://github.com/linuxkit/lcow/releases/tag/v4.14.35-v0.3.9

@jhowardmsft

This comment has been minimized.

Copy link
Contributor

commented Jan 2, 2019

OK, this is somewhat interesting. It's looking more like an libcontainerd, opengcs or HCS issue. Exiting an interactive LCOW container works fine and is (relatively) immediate - ie the exit is initiated from inside the container, not from the host. It's only the case of docker stop. I can see from the daemon logs that host-initiated shutdown call appears to succeed, but some notification isn't being caught, so the daemon waits 30 seconds before giving up and trying to force terminate the container instead which succeeds immediately. This is where the delay is. Will investigate more.

@hulucc

This comment has been minimized.

Copy link

commented Feb 15, 2019

docker kill -s 9

@jterry75

This comment has been minimized.

Copy link
Contributor

commented Feb 19, 2019

@jhowardmsft - Do you have the logs timestamps between the signal notify and the hcsshim Wait return? We can eliminate the GCS between those two traces.

@Logiqx

This comment has been minimized.

Copy link

commented Apr 13, 2019

I'm seeing exactly the same behaviour; attempting to stop LCOW containers takes 300 seconds.

In an attempt to find a workaround, I've tried "docker attach" to send a ctrl-c but that hasn't worked either.

My Docker Compose file is available on GitHub. It's a simple two container NGINX and PHP-FPM setup which runs fine using LCOW but it can't be stopped cleanly.

Using docker kill isn't super-critical for containers that I regard as both immutable and ephemeral but it's not ideal for containers which write persistent data to a volume / bind mount, etc.

Docker version:

Client: Docker Engine - Community
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        6247962
 Built:             Sun Feb 10 04:11:47 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 04:13:06 2019
  OS/Arch:          linux/amd64
  Experimental:     true

Docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 15
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: fkpezvdswhcj82aggd46l0b18
 Is Manager: true
 ClusterID: zswgp20g71sn240wvbptjv7hm
 Managers: 1
 Nodes: 1
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.65.3
 Manager Addresses:
  192.168.65.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.125-linuxkit
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 5.8GiB
Name: linuxkit-00155db9b108
ID: AZKJ:OHY4:HRZQ:XN26:Z4U2:2PZ6:B7JX:LWDN:6HKX:EPSL:MJJE:5SL2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 38
 Goroutines: 167
 System Time: 2019-04-13T18:57:36.7022241Z
 EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.