No container locks on docker ps
#31273
Conversation
Force-pushed from 23a8be6 to 6da800d
Ping @cpuguy83 @tonistiigi 🎉
also related to #28754
Thanks for working on this @fabiokung !
container/snapshot.go
Outdated

```go
StartedAt: container.StartedAt,
Running:   container.Running,
Paused:    container.Paused,
Restarted: container.Paused,
```
Restarted
oops, good catch
Force-pushed from 6da800d to 5d5ff2f
I like the idea. This looks way simpler than the original iteration that involved deep copies. It looks like
Good point on moving the replication to
I'm a bit torn about this. I stand by my earlier comment that we shouldn't be afraid of locks, just the wrong implementation of them. This problem doesn't only show in performance; most places we call
container/snapshot.go
Outdated

```go
}

// Snapshot provides a read only view of a Container. Callers must hold a Lock on the container object.
func (container *Container) Snapshot() *Snapshot {
```
This is effectively the same thing as the deep copy, and has to be maintained in the same way.
It ends up being pretty fragile, as any change, even to one of the underlying types, could wind up breaking things, e.g. adding a new reference type to a struct which is not properly copied.
It is similar, but not necessarily the same. It's more of a mapping to what queries need to see, rather than only a (deep) copy. The rest of the code already has precedent for similar mappings to different views/structs being done at different layers.
It is fragile in the same sense that public API handlers can break when new fields are added and not properly mapped to the structs being serialized for responses.
Instead of defining a new type, do you think we could use the API container type?
I started with that, but it had a pointer to *NetworkSettings and was missing a few fields that daemon/list.go requires. Since types.Container is a public API, I didn't want to break API compatibility or mess with its JSON serialization. It's probably possible, but a bit riskier I think.
@tonistiigi fair point. This doesn't block going back to locking for reads when/if you are comfortable enough that all concurrency and locking is being properly handled across the codebase. It would be a bit of throwaway work, but at least it fixes the problem we are having today in the short term, until a more long-term solution lands (what you propose with proper synchronization between lifecycle states, "lock-free" transactional mutations everywhere, ...). I'm not proposing we lock down a decision on how concurrency will be handled long term, but this is a big pain right now and I don't feel we can wait much longer for a big re-architecture to happen.
@aaronlehmann I moved the checkpointing/replication operation to
I liked it more, since now it is clearer that checkpointing should probably be done when saving state to disk. It also allowed
Thanks, I think that's an improvement.
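The idea agreed on here, doing the view replication at the same point where container state is rendered for persistence, can be sketched as follows. This is a hedged illustration, not the actual moby code: `checkpoint` and `viewStore` are hypothetical names standing in for the method the comment refers to (its name is elided above) and for the in-memory store.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Snapshot is the queryable view of a container (illustrative fields only).
type Snapshot struct {
	ID      string `json:"ID"`
	Running bool   `json:"Running"`
}

// viewStore is a stand-in for the in-memory replicated view.
type viewStore map[string]Snapshot

// checkpoint renders the snapshot once and uses it for both destinations:
// the on-disk JSON (what ToDisk persists) and the in-memory view. Coupling
// the two makes it hard to update one and forget the other.
func checkpoint(snap Snapshot, store viewStore) ([]byte, error) {
	data, err := json.Marshal(snap) // bytes that would be written to disk
	if err != nil {
		return nil, err
	}
	store[snap.ID] = snap // replicate to the lock-free view
	return data, nil
}

func main() {
	store := viewStore{}
	data, err := checkpoint(Snapshot{ID: "abc", Running: true}, store)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // {"ID":"abc","Running":true}
}
```

The design point is the single rendering step: any mutation path that forgets to checkpoint fails to persist to disk as well, which is easier to catch in tests.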
Force-pushed from af35a93 to 8bf082e
This needs a rebase, but it looks like some of the requested changes were made, so it could use another review as well.
Force-pushed from 8bf082e to 882fbe3
Rebased.
container/snapshot.go
Outdated

```go
	"github.com/docker/go-connections/nat"
)

// Snapshot is a read only view for Containers
```
Please add a comment about how to maintain this structure, e.g. why HostConfig.{NetworkMode, Isolation} is needed but other fields are not?
Also, I wonder if we can embed api/types.Container for deduplication.
I'll add a comment about the structure.

Re: using types.Container, here's a brief discussion we had about it on a commit that was previously on this PR: #31273 (comment)
I wonder if instead of having this intermediate object, we can store a duplicate of the container object like so (warning, pseudo-code follows):

```go
func (c *Container) ToDisk() {
	// normal stuff
	snapshot := &Container{ID: c.ID}
	snapshot.FromDisk()
	c.snapshotsStore.Add(snapshot)
}
```
@cpuguy83 isn't it what we discussed here: #31273 (comment)? Or is it something else I'm missing?
@fabiokung Not the same, this would be the actual
The dump shows in the daemon logs, not on the command-line. I'm not sure docker 1.6.2 had this already; possibly it needs
That debugging functionality was added in docker 1.7 #10786. Anyway, this is not really the best location for this discussion; feel free to message me on Slack if you need assistance.
I googled for the docker slack channel before but found no info. What is the link to the docker slack channel? @thaJeztah
Here's more info about the slack channel: https://blog.docker.com/2016/11/introducing-docker-community-directory-docker-community-slack/; register for the docker community, and to get an invitation for the Slack channel, here: http://dockr.ly/community
Thanks @thaJeztah, as I cannot get approved for the community, I want to ask you the last question here, hope that is OK. ;-) Regarding the
Got the answer myself, we can turn on the
One question about the issue: before your fix, why does
Another issue I want to mention is that when such an issue happens, not only
@gyliu513 it uses locks to prevent partial reads and data corruption, since container data is being concurrently modified. Even in read-only operations, locking is how it ensures the state being read will be consistent. This fix only applies to
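To illustrate the point about consistent reads (a minimal sketch, not moby code; `State`, `Pause`, and `Read` are hypothetical names): a reader that copies related fields while holding the same lock as writers can never observe a half-applied transition, such as a paused container that is not running.

```go
package main

import (
	"fmt"
	"sync"
)

// State holds fields that must change together: a container can only be
// paused while it is running.
type State struct {
	mu      sync.Mutex
	Running bool
	Paused  bool
}

// Pause flips the paused flag inside one critical section, preserving the
// invariant that only a running container can be paused.
func (s *State) Pause() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.Running {
		s.Paused = true
	}
}

// Read returns a consistent pair: because it holds the same lock as
// writers, it can never see Paused set on a non-running container.
func (s *State) Read() (running, paused bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.Running, s.Paused
}

func main() {
	s := &State{Running: true}
	s.Pause()
	r, p := s.Read()
	fmt.Println(r, p) // true true
}
```

Without the lock in `Read`, a concurrent writer could be observed between field updates, which is exactly the torn read the old per-container locking in `docker ps` was preventing.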
@fabiokung this fix is available in what docker versions?
@krnayankk this will be included in docker 17.07 and up
@thaJeztah How about docker-ee? Will we have it in 17.06-ee3 or ee4? Or should we wait until 17.09?
@thiagoalves It will not be in EE until the next major release. I do not see this one getting backported.
@thiagoalves btw, if you have a specific case where you are seeing a deadlock, please report it so it can be fixed. This patch just makes the deadlock less apparent; it doesn't actually fix deadlocks.
17.07 sounds great, when can we expect to see it? I'm only seeing 17.05 with the last commit from May of this year, and old-school docker 1.13.1 from Feb of this year. How long do critical fixes affecting an entire ecosystem of projects typically bake before a release is cut?
@robertglen 17.07 was released in July...
@robertglen You probably need to update your apt/yum repositories from dockerproject.org to download.docker.com. https://docs.docker.com/engine/installation/
heh, OK, I just went to the code portion of this repo, looked at branches and tags, and it stopped at 17.05; I had completely dismissed visiting docker's website :\ my bad.
The Moby project doesn't make releases of this repo; rather, Docker Inc makes releases of the Docker Community Edition software (which includes code from this repo). Those releases are tagged in the docker-ce repository.
- What I did

Fixes #28183

We are seeing a fair amount of lockups on `docker ps`. More details here. While things seem to get better with every new version of Docker, there may always be a legitimate reason to hold container locks during some not-very-quick operations. Container queries (`docker ps`) currently try to grab a lock on every single container being inspected. The risk of lockups is big.

I started by trying to pick up @cpuguy83's work on #30225 (the memdb branch). Completely moving the container in-memory store to a consistent ACID DB is a noble cause, but I quickly realized it would be a huge undertaking (references that would need to be deep copied, structs that would need to be broken apart, etc.).
This is a more incremental step towards that goal. Containers keep being stored in the existing in-memory store, and mutations still grab a `Lock()` to avoid races. We then keep a consistent view of all containers, rendered during all of these mutations, so readers (queries) do not need to `Lock()` anything.

Queries and `docker ps` are now very cheap. There are now virtually no chances of them getting stuck in a lockup, at the expense of some more lock contention during some mutation operations, because the current `MemDB` implementation (using hashicorp/go-memdb) does a table-level `Lock()` during write transactions.

These write locks are very short and hopefully won't be a problem (`memdb.Save()`). If they are, in the future another in-memory ACID implementation that supports row-level locking could be investigated. Or replication could be done asynchronously (and optimistically), reducing lock contention but causing the read-only view to be eventually consistent.

There is also admittedly some risk of missing parts of the code that mutate containers without replicating state when they should. However, we already replicate container state to disk (`container.ToDisk()`), so any of these cases should be treated as bugs and covered by tests.

- How I did it
All data that is necessary to serve queries (`docker ps`) is snapshotted during operations that mutate that data. Typically these mutations already hold a lock on the container object they are mutating. All places in the code calling `container.ToDisk()` and `container.ToDiskLocking()` are good candidates to also replicate state to the in-memory rendered consistent view.

Queries use a read-only transaction on the replicated in-memory DB, and don't need to grab locks on each individual container being inspected.
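The snapshot-on-mutation pattern described above can be sketched with stdlib primitives. This is an illustration under stated assumptions, not moby's implementation: the PR uses hashicorp/go-memdb rather than an `atomic.Value`, and `ViewStore`, `Save`, and `List` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Snapshot is an immutable view of the container fields that queries need.
type Snapshot struct {
	ID      string
	Running bool
	Paused  bool
}

// ViewStore keeps the latest snapshot of every container. Writers serialize
// on mu (analogous to memdb's table-level write lock); readers only load an
// immutable map and never block.
type ViewStore struct {
	mu   sync.Mutex
	view atomic.Value // holds map[string]Snapshot
}

func NewViewStore() *ViewStore {
	s := &ViewStore{}
	s.view.Store(map[string]Snapshot{})
	return s
}

// Save replicates a snapshot into the view. Mutation paths would call this
// while already holding the container's own lock, alongside ToDisk().
func (s *ViewStore) Save(snap Snapshot) {
	s.mu.Lock()
	defer s.mu.Unlock()
	old := s.view.Load().(map[string]Snapshot)
	next := make(map[string]Snapshot, len(old)+1)
	for id, v := range old {
		next[id] = v
	}
	next[snap.ID] = snap
	s.view.Store(next) // publish a fresh immutable map
}

// List serves queries (docker ps) without touching any container lock.
func (s *ViewStore) List() []Snapshot {
	m := s.view.Load().(map[string]Snapshot)
	out := make([]Snapshot, 0, len(m))
	for _, snap := range m {
		out = append(out, snap)
	}
	return out
}

func main() {
	store := NewViewStore()
	store.Save(Snapshot{ID: "abc123", Running: true})
	store.Save(Snapshot{ID: "def456", Paused: true})
	fmt.Println(len(store.List())) // 2
}
```

The trade-off matches the description: readers are wait-free on an immutable view, while writers pay a short serialized copy, which is acceptable because mutations were already taking per-container locks.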
In the future, more read operations can be served from the in-memory ACID store, e.g.:

- `docker inspect` (shouldn't be too hard, but will require deep copying some pointers currently being held by the container during `Snapshot()`)
- `container.RWLayer`
- `ExecConfigs`

This way we can incrementally move things as needed/wanted, and even potentially one day completely eliminate container locks (#30225). One day `container.ToDisk()` could also be replaced by checkpointing the in-memory DB to disk.

I explicitly left out some code paths that mutate parts of the container that queries don't currently care about:
- `container.RemovalInProgress`
- `container.BaseFS` (all calls to `mount.Mount()` may mutate the container: `daemon.ContainerArchivePath`, `daemon.ContainerCopy`, `daemon.ContainerExtractToDir`, `daemon.ContainerStatPath`, ...)
- `container.HasBeenManuallyStopped`
- `container.HostsPath`, `container.ResolvConfPath`: being mutated by some networking code paths

- How to verify it
All related `docker ps` and `GetContainerApi` CLI integration tests are passing. This should be treated as an internal refactoring; no functionality is being added or changed.

- Description for the changelog
Lock-free `docker ps`, reducing the chances of daemon lockups during queries.
- A picture of a cute animal (not mandatory but encouraged)
My dog, Jilly: