Store container names in memdb #33886
Conversation
Edit: Actually, there's no point in doing this, because we want to pre-reserve the name before we create the container. Otherwise, we might go through all the work of creating the container, and then find out the name is taken when we try to insert it into the memdb. So I think the PR makes sense as-is, and we just have to accept that some names in the DB might not have corresponding containers yet (see #33883).
(force-pushed from c76f9e6 to 864b6d2)
Rebased and added test coverage. PTAL
(force-pushed from 864b6d2 to b5b7d7c)
I like this.
Other than the inline comments, the only remark I have is that this adds more lock contention on all container update operations. Hopefully all name reservation operations should be "quick" and not hold the lock for too long. All I could spot seemed to be O(1), but it may not be a bad idea to put this through some concurrency tests and try to measure lock contention with lots of names reserved and containers being modified in parallel.
container/view.go
Outdated
	return nil
}

txn.Insert(memdbNamesTable, nameAssociation{name: name, containerID: containerID})
Any reason why the error is being swallowed here?
container/view.go
Outdated
// A name reservation is globally unique
func (db *memDB) ReserveName(name, containerID string) error {
	txn := db.store.Txn(true)
	defer txn.Commit()
don't we need to check the returned error and txn.Abort() if necessary on errors?
container/view.go
Outdated
@@ -106,25 +144,75 @@ func (db *memDB) Snapshot(index *registrar.Registrar) View {
 func (db *memDB) Save(c *Container) error {
 	txn := db.store.Txn(true)
 	defer txn.Commit()
-	return txn.Insert(memdbTable, c)
+	return txn.Insert(memdbContainersTable, c)
 }

 // Delete removes an item by ID
 func (db *memDB) Delete(c *Container) error {
 	txn := db.store.Txn(true)
 	defer txn.Commit()
Probably should abort on errors here too.
container/view.go
Outdated
func (db *memDB) ReleaseName(name string) {
	txn := db.store.Txn(true)
	txn.Delete(memdbNamesTable, nameAssociation{name: name})
	txn.Commit()
check all errors and txn.Abort() as necessary
we may need to change the interface here to return an error too
container/view.go
Outdated
}

// Delete removes an item by ID
func (db *memDB) Delete(c *Container) error {
	txn := db.store.Txn(true)
	defer txn.Commit()
	return txn.Delete(memdbTable, NewBaseContainer(c.ID, c.Root))

	// Delete any names referencing this container's ID.
can lines below (156-168) reuse func getNames?
I've added a new commit with the requested changes.

I have mixed feelings about these changes. I think you are correct that adding these error checks and aborts is technically the right thing to do, but I'm concerned that it makes the code more complex and fragile, since we can no longer just defer txn.Commit() unconditionally.

I'm not concerned about the lock contention. I think that's the price that has to be paid for keeping a consistent view of naming. All of these reservation and release operations should be trivial and only hold the lock very briefly.
This may be a bit more robust way to abort on errors, which will avoid leaking the transaction lock: https://play.golang.org/p/KGszH6RfPP. The error handling could be extracted into a helper.

Alternatively, you can also go the functional route.

I've taken the functional approach.
container/view_test.go
Outdated
assert.EqualError(t, db.ReserveName("name2", "containerid3"), ErrNameReserved.Error())

// Releasing a name allows the name to point to something else later.
db.ReleaseName("name2")
wrap in assert.NoError()
container/view_test.go
Outdated
assert.EqualError(t, err, ErrNameNotReserved.Error())

// Releasing and re-reserving a name doesn't affect the snapshot.
db.ReleaseName("name2")
assert.NoError
@@ -77,7 +76,7 @@ func (daemon *Daemon) reserveName(id, name string) (string, error) {
 }

 func (daemon *Daemon) releaseName(name string) {
-	daemon.nameIndex.Release(name)
+	daemon.containersReplica.ReleaseName(name)
worth at least logging the error here?
Not sure if this is used in cases where the name may not be reserved. I didn't want to change the behavior of this part of the code, only the implementation.
-	id, err := daemon.nameIndex.Get(name)
+	if err := daemon.containersReplica.ReserveName(name, id); err != nil {
+		if err == container.ErrNameReserved {
+			id, err := daemon.containersReplica.Snapshot().GetID(name)
minor nit here: there's still a small chance for a race. If the name gets released after the attempt failed, id will be empty and the output will look weird. Not sure it matters much since it is a corner case and only informational, but it may cause confusion if there's enough concurrency on the daemon to trigger this.

The proper fix for this would be to return the conflicting id on the Reserve call that failed (from the same memdb transaction).
I agree, this really should be transactional. But I think that since the impact of the race is only missing information in the log message, it's not very important to fix right now.
@@ -55,7 +55,7 @@ func (daemon *Daemon) ContainerRename(oldName, newName string) error {
 }

 	for k, v := range links {
-		daemon.nameIndex.Reserve(newName+k, v.ID)
+		daemon.containersReplica.ReserveName(newName+k, v.ID)
not necessarily related to the changes being made, all these errors being swallowed (here and lines 69, 61 and 74) make me a bit nervous 😬
@@ -128,7 +128,6 @@ func (daemon *Daemon) cleanupContainer(container *container.Container, forceRemo
 	return errors.Wrapf(err, "unable to remove filesystem for %s", container.ID)
 }

-	daemon.nameIndex.Delete(container.ID)
Maybe I missed it, but I didn't see this being replaced by something else. Missing an if err := daemon.containerReplica.ReleaseName(...); err != nil {...} here?
The name is released by daemon.containersReplica.Delete(container) below.
(force-pushed from 30a2f21 to 4ec8864)
LGTM
SGTM. Needs a rebase.
What do you think about also doing the same treatment for IDs, currently stored in a separate patricia-trie for prefix matching (as a separate PR)?
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Rebased.
This would be an excellent way to support ID lookups. memdb uses a prefix tree internally and supports prefix matching for lookups. The IDs are already stored in the container objects, so there would be no need to store anything else. We could just remove other redundant code.
(force-pushed from 4ec8864 to 0e57eb9)
LGTM
LGTM
Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.
We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.
Note that while this change makes the snapshotting atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.
See #33863 and #33885
I still need to validate these changes and add test coverage.
cc @fabiokung @cpuguy83