Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport: No container locks on docker ps #375

Merged
merged 28 commits into from
Jun 30, 2020

Conversation

kolyshkin
Copy link

@kolyshkin kolyshkin commented May 1, 2020

Problem

Our openshift-ci cluster was experiencing both long delays and lockups on docker ps. Investigation shown that, for the lockup case, all the instances of daemon.Container() (the API call producing the list of containers) are waiting for a lock for a particular container which was taken 2+ hours ago and never released. A long delay is also caused by waiting for the same lock, when some other container is performing a long operation (e.g. container creation, which includes selinux relabeling).

A typical waiting goroutine backtrace looks like this:

goroutine 145541 [semacquire, 153 minutes]:
sync.runtime_SemacquireMutex(0xc42031f7a4, 0x0)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc42031f7a0)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/sync/mutex.go:134 +0x108
github.com/docker/docker/daemon.(*Daemon).reducePsContainer(0xc420472200, 0xc42289d400, 0xc4242b6120, 0xc420b3d680, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/daemon/list.go:202 +0x46
github.com/docker/docker/daemon.(*Daemon).reduceContainers(0xc420472200, 0xc4247c60c0, 0xc420b3d680, 0xc420b3d690, 0x1, 0x424858, 0x40, 0xc4214ce598)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/daemon/list.go:184 +0x12e
github.com/docker/docker/daemon.(*Daemon).Containers(0xc420472200, 0xc4247c60c0, 0x0, 0x0, 0x0, 0xed4d7ae6870956f6, 0x424858)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/daemon/list.go:107 +0x54
......

Root cause

Alas, I was unable to find a place in the code which acquired but not released the lock, since there are way too many places like that in the code (and I'm still looking). Still, even if the bug will be found, docker ps can easily be stuck for a few minutes, since many container operations take this lock for this time.

A container being created got stuck on getdents(2) in the middle of selinux relabeling. This is most probably a volume on a networked block device (provided by GCE).

daemon.(*Daemon).containerStart() holds a per-container lock for the duration of the container creation.

Here's an except from dockerd goroutine dump:

goroutine 145539 [syscall, 138 minutes]:
syscall.Syscall(0xd9, 0x8a, 0xc423880000, 0x1000, 0xc4204f1200, 0xc4224c2c80, 0x640)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.Getdents(0x8a, 0xc423880000, 0x1000, 0x1000, 0xc42576bc38, 0x18, 0xc42576bc30)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/syscall/zsyscall_linux_amd64.go:482 +0x5f
syscall.ReadDirent(0x8a, 0xc423880000, 0x1000, 0x1000, 0x15ccce0, 0x100000000000001, 0xc422a2dc00)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/syscall/syscall_linux.go:772 +0x49
internal/poll.(*FD).ReadDirent(0xc4219865a0, 0xc423880000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/internal/poll/fd_unix.go:404 +0xad
os.(*File).readdirnames(0xc42619ab40, 0xffffffffffffffff, 0x0, 0x0, 0xc42619ab40, 0x0, 0x0)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/os/dir_unix.go:67 +0x1bf
<...skipped....>
path/filepath.Walk(0xc4203fd400, 0x97, 0xc420ff4940, 0x0, 0xc42338c6d8)
        /opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/path/filepath/path.go:403 +0x106
github.com/docker/docker/vendor/github.com/opencontainers/runc/libcontainer/selinux.Chcon(0xc4203fd400, 0x97, 0xc421d6c510, 0x2f, 0x25aa601, 0x6e, 0x28)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/vendor/github.com/opencontainers/runc/libcontainer/selinux/selinux.go:482 +0xc3
github.com/docker/docker/vendor/github.com/opencontainers/runc/libcontainer/label.Relabel(0xc4203fd400, 0x97, 0xc421d6c510, 0x2f, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/vendor/github.com/opencontainers/runc/libcontainer/label/label_selinux.go:146 +0x1fb
github.com/docker/docker/volume.(*MountPoint).Setup.func1(0xc42338c908, 0xc4227a3680, 0xc42576c8f8, 0xc421d6c510, 0x2f)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/volume/volume.go:139 +0x1e0
github.com/docker/docker/volume.(*MountPoint).Setup(0xc4227a3680, 0xc421d6c510, 0x2f, 0x0, 0x0, 0xc4270424c0, 0xc4247ca790, 0x97, 0x0, 0x0)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/volume/volume.go:187 +0x447
github.com/docker/docker/daemon.(*Daemon).setupMounts(0xc420472200, 0xc42289d400, 0x0, 0x0, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/daemon/volumes_unix.go:58 +0x3be
github.com/docker/docker/daemon.(*Daemon).createSpec(0xc420472200, 0xc42289d400, 0x0, 0x0, 0xc420472200)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/daemon/oci_linux.go:703 +0xab2
github.com/docker/docker/daemon.(*Daemon).containerStart(0xc420472200, 0xc42289d400, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/daemon/start.go:149 +0x25d
github.com/docker/docker/daemon.(*Daemon).ContainerStart(0xc420472200, 0xc42565eb07, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x424858, 0x10)
        /builddir/build/BUILD/docker-b2f74b220126bddbae585cfe2406498ced3ae065/_build/src/github.com/docker/docker/daemon/start.go:87 +0x195

Solution

It was not a great idea in the first place to take per-container locks for such frequently used and critical operation like docker ps. Thus here is a backport of upstream PR moby#31273 which removed the lock. It does not resolves the root cause (lock being held forever in case of I/O operation being stuck) but fixes the bad consequences (docker ps being stuck).

Backport details

This was not an easy backport. Most of the merge conflicts were caused by old codebase here, but there were some caused by commits in this repo missed from upstream. Every cherry-pick that resulted in conflicts has them documented in the commit message.

The main strategy was to skip porting most of the preceding PRs and resolve the merge conflicts manually.

Skipped PRs

These PRs were looked at carefully but not backported. The list is not complete, the order is random.

Ported PRs

These were analysed and deemed to be a necessary prerequisites for the PR being backported.

Finally:

LK4D4 and others added 8 commits April 30, 2020 15:33
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
(cherry picked from commit b0f9958)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This drops support for migrations from pre-1.10 Docker versions, which
should be done via an external tool or an intermediate upgrade.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
(cherry picked from commit f8119bb)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes sqlite dependencies which were not removed by the previous
(backported) commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since Register() will never return err,remove the return value

Signed-off-by: NickrenREN <yuquan.ren@easystack.cn>
(cherry picked from commit 78b50c1)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Container state counts are used for reporting in the `/info` endpoint.
Currently when `/info` is called, each container is iterated over and
the containers 'StateString()' is called. This is not very efficient
with lots of containers, and is also racey since `StateString()` is not
using a mutex and the mutex is not otherwise locked.

We could just lock the container mutex, but this is proven to be
problematic since there are frequent deadlock scenarios and we should
always have the `/info` endpoint available since this endpoint is used
to get general information about the docker host.

Really, these metrics on `/info` should be deprecated. But until then,
we can just keep a running tally in memory for each of the reported
states.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e4c0362)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/delete.go: different imports
 - daemon/info.go: commit 62563e1 missing from upstream
 - vendor/github.com/docker/go-metrics/README.md: newer vndr
After running the test suite with the race detector enabled I found
these gems that need to be fixed.
This is just round one, sadly lost my test results after I built the
binary to test this... (whoops)

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 7917a36)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - api/types/network/network.go: typo in a comment
 - daemon/container.go: missing github.com/moby/moby/29683
 - daemon/monitor.go: multiple commits missing from upstream

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
(cherry picked from commit bd33a99)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/prune.go: missing moby#29226,
   moby#29963
`ConnectToNetwork` is modfying the container but is not locking the
object.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 4d0888e)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@rh-atomic-bot
Copy link

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

@kolyshkin
Copy link
Author

That'd be nice to run CI overnight (at least to see the beast compiles)

@kolyshkin kolyshkin changed the title WIP backport: No container locks on docker ps backport: No container locks on docker ps May 1, 2020
@kolyshkin kolyshkin marked this pull request as ready for review May 1, 2020 18:37
@kolyshkin
Copy link
Author

@rhatdan PTAL

@mrunalp
Copy link

mrunalp commented May 4, 2020

@kolyshkin Were you able to run and smoke test this locally? We can ask for a auto-test run for a scratch build of this run.

xulike666 and others added 14 commits May 4, 2020 10:17
Signed-off-by: Aaron.L.Xu <likexu@harmonycloud.cn>
(cherry picked from commit e0577d5)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - man/src/container/create-example.md, man/src/container/create.md,
   man/src/plugin/ls.md: files were removed
In some cases a mount spec would not be properly backported which could
lead to accidental removal of the underlying volume on container remove
(which should never happen with named volumes).

Adds unit tests for this as well. Unfortunately I had to add a daemon
depdency for the backport function due to looking up `VolumesFrom`
specs.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 3cf1859)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit d6e1cb7)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 481a92c)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The Solaris version (previously daemon/inspect_solaris.go) was
apparently missing some fields that should be available on that
platform.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit cfc404a)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
- container/container_unix.go (minor, due to commit b7732b0 missed
  from upstream)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This can be used by readers/queries so they don't need locks.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 054728b)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit eed4c7b)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/daemon.go: missing moby#33241
 - container/container_unix.go: missing moby#32336
 - container/container_windows.go, daemon/container_operations.go: minor conflicts

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Container queries are now served from the consistent in-memory db, and
don't need to grab a lock on every container being listed.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 8e425eb)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/list.go: missing github.com/moby/pull/33241
Also hide ViewDB behind an inteface.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit aacddda)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/container_operations.go, daemon/daemon.go: missing
   https://github.com/moby/moby/33241
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 2ed6f92)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
State will be saved on the following operation once the container is
properly registered on the daemon.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 03aa247)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It operates on containers that have already been registered on the
daemon, and are visible to other goroutines.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 201a37f)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
it is already being saved (with a lock held) on the subsequent
operations.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit f668af4)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reuse existing structures and rely on json serialization to deep copy
Container objects.

Also consolidate all "save" operations on container.CheckpointTo, which
now both saves a serialized json to disk, and replicates state to the
ACID in-memory store.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit edad527)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/container_operations.go: missing
   moby#30117
 - daemon/daemon.go: missing moby#32821
 - daemon/list.go: missing moby#27557,
   moby#33241 etc.
@rhatdan
Copy link
Member

rhatdan commented May 4, 2020

LGTM

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 9134e87)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - container/container.go: missing moby#33241,
   moby#32687

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit a43be34)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/container_operations.go: missing
   moby#31996
initHealthMonitor and updateHealthMonitor can cause container state to
be changed (State.Health).

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 04bd768)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 66b231d)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Migrate legacy volumes (Daemon.verifyVolumesInfo) before containers are
registered on the Daemon, so state on disk is not overwritten and legacy
fields lost during registration.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 76d9641)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - integration-cli/docker_cli_daemon_test.go: missing
   moby#29289

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Fix a deadlock caused by re-entrant locks on container objects.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
(cherry picked from commit 37addf0)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Conflicts:
 - daemon/container_operations.go: missing
   moby#30117
@kolyshkin
Copy link
Author

Update: found a root cause, updated the description. This PR is still a valid workaround.

A solution would be to cancel container creation upon a timeout

@kolyshkin
Copy link
Author

A solution would be to cancel container creation upon a timeout

Which, of course, is also a workaround, just more targeted.

@kolyshkin
Copy link
Author

Minor updates to PR description.

A docker build with this PR was tested by QE team (they have some smoke tests under their belt), and also live-tested in a production environment (OC CI cluster doing search). The issue appears to be resolved, and no regressions were observed.

@mrunalp PTAL

@mrunalp
Copy link

mrunalp commented Jun 29, 2020

LGTM

@rhatdan rhatdan merged commit f0068f2 into projectatomic:docker-1.13.1-rhel Jun 30, 2020
@alibo
Copy link

alibo commented Jan 22, 2021

is it also backported to docker-1.13.1 (centos 7)?
I've installed v1.13.1-203 and I'm still facing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.