-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport: No container locks on docker ps
#375
backport: No container locks on docker ps
#375
Conversation
Signed-off-by: Alexander Morozov <lk4d4@docker.com> (cherry picked from commit b0f9958) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This drops support for migrations from pre-1.10 Docker versions, which should be done via an external tool or an intermediate upgrade. Signed-off-by: Justin Cormack <justin.cormack@docker.com> (cherry picked from commit f8119bb) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes sqlite dependencies which were not removed by the previous (backported) commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since Register() will never return err,remove the return value Signed-off-by: NickrenREN <yuquan.ren@easystack.cn> (cherry picked from commit 78b50c1) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Container state counts are used for reporting in the `/info` endpoint. Currently when `/info` is called, each container is iterated over and the containers 'StateString()' is called. This is not very efficient with lots of containers, and is also racey since `StateString()` is not using a mutex and the mutex is not otherwise locked. We could just lock the container mutex, but this is proven to be problematic since there are frequent deadlock scenarios and we should always have the `/info` endpoint available since this endpoint is used to get general information about the docker host. Really, these metrics on `/info` should be deprecated. But until then, we can just keep a running tally in memory for each of the reported states. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit e4c0362) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/delete.go: different imports - daemon/info.go: commit 62563e1 missing from upstream - vendor/github.com/docker/go-metrics/README.md: newer vndr
After running the test suite with the race detector enabled I found these gems that need to be fixed. This is just round one, sadly lost my test results after I built the binary to test this... (whoops) Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 7917a36) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - api/types/network/network.go: typo in a comment - daemon/container.go: missing github.com/moby/moby/29683 - daemon/monitor.go: multiple commits missing from upstream Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> (cherry picked from commit bd33a99) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/prune.go: missing moby#29226, moby#29963
`ConnectToNetwork` is modfying the container but is not locking the object. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 4d0888e) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Can one of the admins verify this patch?
|
That'd be nice to run CI overnight (at least to see the beast compiles) |
c8b2cab
to
3bfb2f7
Compare
docker ps
docker ps
@rhatdan PTAL |
@kolyshkin Were you able to run and smoke test this locally? We can ask for a auto-test run for a scratch build of this run. |
Signed-off-by: Aaron.L.Xu <likexu@harmonycloud.cn> (cherry picked from commit e0577d5) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - man/src/container/create-example.md, man/src/container/create.md, man/src/plugin/ls.md: files were removed
In some cases a mount spec would not be properly backported which could lead to accidental removal of the underlying volume on container remove (which should never happen with named volumes). Adds unit tests for this as well. Unfortunately I had to add a daemon depdency for the backport function due to looking up `VolumesFrom` specs. Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit 3cf1859) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Brian Goff <cpuguy83@gmail.com> (cherry picked from commit d6e1cb7) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 481a92c) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The Solaris version (previously daemon/inspect_solaris.go) was apparently missing some fields that should be available on that platform. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit cfc404a) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - container/container_unix.go (minor, due to commit b7732b0 missed from upstream) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This can be used by readers/queries so they don't need locks. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 054728b) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Replicate relevant mutations to the in-memory ACID store. Readers will then be able to query container state without locking. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit eed4c7b) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/daemon.go: missing moby#33241 - container/container_unix.go: missing moby#32336 - container/container_windows.go, daemon/container_operations.go: minor conflicts Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Container queries are now served from the consistent in-memory db, and don't need to grab a lock on every container being listed. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 8e425eb) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/list.go: missing github.com/moby/pull/33241
Also hide ViewDB behind an inteface. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit aacddda) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/container_operations.go, daemon/daemon.go: missing https://github.com/moby/moby/33241
Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 2ed6f92) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
State will be saved on the following operation once the container is properly registered on the daemon. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 03aa247) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It operates on containers that have already been registered on the daemon, and are visible to other goroutines. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 201a37f) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
it is already being saved (with a lock held) on the subsequent operations. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit f668af4) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reuse existing structures and rely on json serialization to deep copy Container objects. Also consolidate all "save" operations on container.CheckpointTo, which now both saves a serialized json to disk, and replicates state to the ACID in-memory store. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit edad527) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/container_operations.go: missing moby#30117 - daemon/daemon.go: missing moby#32821 - daemon/list.go: missing moby#27557, moby#33241 etc.
3bfb2f7
to
4050df2
Compare
LGTM |
Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 9134e87) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - container/container.go: missing moby#33241, moby#32687 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit a43be34) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/container_operations.go: missing moby#31996
initHealthMonitor and updateHealthMonitor can cause container state to be changed (State.Health). Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 04bd768) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 66b231d) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Migrate legacy volumes (Daemon.verifyVolumesInfo) before containers are registered on the Daemon, so state on disk is not overwritten and legacy fields lost during registration. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 76d9641) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - integration-cli/docker_cli_daemon_test.go: missing moby#29289 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Fix a deadlock caused by re-entrant locks on container objects. Signed-off-by: Fabio Kung <fabio.kung@gmail.com> (cherry picked from commit 37addf0) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Conflicts: - daemon/container_operations.go: missing moby#30117
4050df2
to
6e5a2b9
Compare
Update: found a root cause, updated the description. This PR is still a valid workaround. A solution would be to cancel container creation upon a timeout |
Which, of course, is also a workaround, just more targeted. |
Minor updates to PR description. A docker build with this PR was tested by QE team (they have some smoke tests under their belt), and also live-tested in a production environment (OC CI cluster doing search). The issue appears to be resolved, and no regressions were observed. @mrunalp PTAL |
LGTM |
is it also backported to docker-1.13.1 (centos 7)? |
Problem
Our openshift-ci cluster was experiencing both long delays and lockups on
docker ps
. Investigation shown that, for the lockup case, all the instances ofdaemon.Container()
(the API call producing the list of containers) are waiting for a lock for a particular container which was taken 2+ hours ago and never released. A long delay is also caused by waiting for the same lock, when some other container is performing a long operation (e.g. container creation, which includes selinux relabeling).A typical waiting goroutine backtrace looks like this:
Root cause
Alas, I was unable to find a place in the code which acquired but not released the lock, since there are way too many places like that in the code (and I'm still looking). Still, even if the bug will be found,docker ps
can easily be stuck for a few minutes, since many container operations take this lock for this time.A container being created got stuck on
getdents(2)
in the middle of selinux relabeling. This is most probably a volume on a networked block device (provided by GCE).daemon.(*Daemon).containerStart()
holds a per-container lock for the duration of the container creation.Here's an except from dockerd goroutine dump:
Solution
It was not a great idea in the first place to take per-container locks for such frequently used and critical operation like
docker ps
. Thus here is a backport of upstream PR moby#31273 which removed the lock. It does not resolves the root cause (lock being held forever in case of I/O operation being stuck) but fixes the bad consequences (docker ps being stuck).Backport details
This was not an easy backport. Most of the merge conflicts were caused by old codebase here, but there were some caused by commits in this repo missed from upstream. Every cherry-pick that resulted in conflicts has them documented in the commit message.
The main strategy was to skip porting most of the preceding PRs and resolve the merge conflicts manually.
Skipped PRs
These PRs were looked at carefully but not backported. The list is not complete, the order is random.
publish
andexpose
filter fordocker ps --filter
moby/moby#27557--filter until=<timestamp>
fordocker container/image prune
moby/moby#29226Ported PRs
These were analysed and deemed to be a necessary prerequisites for the PR being backported.
Finally:
docker ps
moby/moby#31273 No container locks ondocker ps