
Store container names in memdb #33886

Merged 3 commits on Jul 17, 2017

Conversation

aaronlehmann
Contributor

Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.

We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.

Note that while this change makes the snapshotting atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.

See #33863 and #33885

I still need to validate these changes and add test coverage.

cc @fabiokung @cpuguy83

@aaronlehmann
Contributor Author

aaronlehmann commented Jun 30, 2017

Note that while this change makes the snapshotting atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.

Thinking about this a little bit: usually the only name we care about is container.Name. The only case where we have other names is when links are in use, and I don't think there's any point in adding those transactionally. So maybe all we need to do here is automatically reserve container.Name when a container is added to the database. We're already automatically unregistering the name when the container is deleted.

Edit: Actually, there's no point in doing this, because we want to pre-reserve the name before we create the container. Otherwise, we might go through all the work of creating the container, and then find out the name is taken when we try to insert it into the memdb. So I think the PR makes sense as-is, and we just have to accept that some names in the DB might not have corresponding containers yet (see #33883).
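
To illustrate the ordering described here, a minimal sketch of the flow (createWithReservedName and createContainerFS are hypothetical names; ReserveName, ReleaseName, and Save are the memdb methods this PR adds):

// Hypothetical flow: reserve the name up front so a conflict is detected
// before any expensive creation work happens.
func (daemon *Daemon) createWithReservedName(name, id string) error {
	if err := daemon.containersReplica.ReserveName(name, id); err != nil {
		return err // name already taken; nothing has been created yet
	}
	ctr, err := createContainerFS(id) // hypothetical stand-in for the expensive creation work
	if err != nil {
		// Creation failed: release the reservation so the name can be reused.
		daemon.containersReplica.ReleaseName(name)
		return err
	}
	// Between ReserveName and this Save, the name exists in the DB without a
	// corresponding container (see #33883).
	return daemon.containersReplica.Save(ctr)
}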

@LK4D4 LK4D4 requested a review from cpuguy83 on July 6, 2017 at 16:36
@aaronlehmann aaronlehmann changed the title from "[WIP] Store container names in memdb" to "Store container names in memdb" on Jul 6, 2017
@aaronlehmann
Contributor Author

Rebased and added test coverage. PTAL

Contributor

@fabiokung fabiokung left a comment

I like this.

Other than inline comments, the only remark I have is that this adds more lock contention on all container update operations.

Hopefully all name reservation operations are "quick" and don't hold the lock for too long. All the ones I could spot seemed to be O(1), but it may not be a bad idea to put this through some concurrency tests and try to measure lock contention with lots of names reserved and containers being modified in parallel.
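
As a rough illustration of such a test (a sketch only, using fmt, sync, and testing; NewViewDB is an assumed constructor and the exact names may differ):

func TestConcurrentNamesAndSaves(t *testing.T) {
	db, err := NewViewDB() // assumed constructor for the memdb-backed store
	if err != nil {
		t.Fatal(err)
	}
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				id := fmt.Sprintf("c-%d-%d", w, i)
				// Reserve a unique name and save a container while other
				// goroutines do the same, to surface lock contention.
				if err := db.ReserveName("name-"+id, id); err != nil {
					t.Error(err)
					return
				}
				if err := db.Save(NewBaseContainer(id, "/var/lib/docker/containers/"+id)); err != nil {
					t.Error(err)
					return
				}
			}
		}(w)
	}
	wg.Wait()
}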

return nil
}

txn.Insert(memdbNamesTable, nameAssociation{name: name, containerID: containerID})
Contributor

Any reason why error is being swallowed here?

// A name reservation is globally unique
func (db *memDB) ReserveName(name, containerID string) error {
	txn := db.store.Txn(true)
	defer txn.Commit()
Contributor

don't we need to check the returned error and txn.Abort() if necessary on errors?

@@ -106,25 +144,75 @@ func (db *memDB) Snapshot(index *registrar.Registrar) View {
func (db *memDB) Save(c *Container) error {
	txn := db.store.Txn(true)
	defer txn.Commit()
-	return txn.Insert(memdbTable, c)
+	return txn.Insert(memdbContainersTable, c)
}

// Delete removes an item by ID
func (db *memDB) Delete(c *Container) error {
	txn := db.store.Txn(true)
	defer txn.Commit()
Contributor

Probably should abort on errors here too.

func (db *memDB) ReleaseName(name string) {
	txn := db.store.Txn(true)
	txn.Delete(memdbNamesTable, nameAssociation{name: name})
	txn.Commit()
Contributor

check all errors and txn.Abort() as necessary

Contributor

we may need to change the interface here to return an error too

}

// Delete removes an item by ID
func (db *memDB) Delete(c *Container) error {
	txn := db.store.Txn(true)
	defer txn.Commit()
-	return txn.Delete(memdbTable, NewBaseContainer(c.ID, c.Root))

	// Delete any names referencing this container's ID.
Contributor

can lines below (156-168) reuse func getNames?

@aaronlehmann
Contributor Author

I've added a new commit with the requested changes. I have mixed feelings about these changes. I think you are correct that adding these error checks and aborts is technically the right thing to do, but I'm concerned that it makes the code more complex and fragile. Since we can't use defer txn.Commit(), it will be much easier to leak the transaction lock in the future. I'm not sure there are any failure modes in practice that would cause Abort vs Commit to make a difference - the only paths that would allow us to commit a change partially seem to be internal memdb errors that shouldn't happen.

I'm not concerned about the lock contention. I think that's the price that has to be paid for keeping a consistent view of naming. All of these reservation and release operations should be trivial and only hold the lock very briefly.

@fabiokung
Contributor

fabiokung commented Jul 10, 2017

(snip), but I'm concerned that it makes the code more complex and fragile.

This may be a somewhat more robust way to abort on errors, which avoids leaking the transaction lock: https://play.golang.org/p/KGszH6RfPP

The error handling could be extracted into a func commitOrAbort(err error), or something similar, and be used as defer one-liners (defer commitOrAbort(retErr)) as well.
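
A minimal sketch of that pattern, assuming go-memdb's *memdb.Txn and a named return value so the deferred call sees the error the function is about to return (identifiers here are illustrative, not the merged code):

// commitOrAbort commits the transaction when the enclosing function is about
// to return nil, and aborts it otherwise, so the write lock is always released.
func commitOrAbort(txn *memdb.Txn, retErr error) {
	if retErr != nil {
		txn.Abort()
		return
	}
	txn.Commit()
}

func (db *memDB) ReserveName(name, containerID string) (retErr error) {
	txn := db.store.Txn(true)
	// The closure is needed so the deferred call observes the final value of retErr.
	defer func() { commitOrAbort(txn, retErr) }()

	// memdbIDIndex is assumed to be the primary index of the names table.
	s, err := txn.First(memdbNamesTable, memdbIDIndex, name)
	if err != nil {
		return err
	}
	if s != nil {
		if s.(nameAssociation).containerID != containerID {
			return ErrNameReserved
		}
		return nil
	}
	return txn.Insert(memdbNamesTable, nameAssociation{name: name, containerID: containerID})
}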

@fabiokung
Contributor

Alternatively, you can also go the functional route with:

func ...() {
	withTx(func(tx boltdb.Tx) error {
		// ...
	})
}

@aaronlehmann
Contributor Author

I've taken the functional approach.
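
For reference, a helper along these lines captures that approach against go-memdb (an illustrative sketch; the merged code may differ in details):

// withTxn runs cb inside a write transaction and commits only if cb succeeds;
// on any error the transaction is aborted, so the write lock cannot leak.
func (db *memDB) withTxn(cb func(*memdb.Txn) error) error {
	txn := db.store.Txn(true)
	if err := cb(txn); err != nil {
		txn.Abort()
		return err
	}
	txn.Commit()
	return nil
}

// Example use:
func (db *memDB) ReleaseName(name string) error {
	return db.withTxn(func(txn *memdb.Txn) error {
		return txn.Delete(memdbNamesTable, nameAssociation{name: name})
	})
}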

assert.EqualError(t, db.ReserveName("name2", "containerid3"), ErrNameReserved.Error())

// Releasing a name allows the name to point to something else later.
db.ReleaseName("name2")
Contributor

wrap in assert.NoError()

assert.EqualError(t, err, ErrNameNotReserved.Error())

// Releasing and re-reserving a name doesn't affect the snapshot.
db.ReleaseName("name2")
Contributor

assert.NoError

@@ -77,7 +76,7 @@ func (daemon *Daemon) reserveName(id, name string) (string, error) {
}

func (daemon *Daemon) releaseName(name string) {
-	daemon.nameIndex.Release(name)
+	daemon.containersReplica.ReleaseName(name)
Contributor

worth at least logging the error here?

Contributor Author

Not sure if this is used in cases where the name may not be reserved. I didn't want to change the behavior of this part of the code, only the implementation.

id, err := daemon.nameIndex.Get(name)
if err := daemon.containersReplica.ReserveName(name, id); err != nil {
if err == container.ErrNameReserved {
id, err := daemon.containersReplica.Snapshot().GetID(name)
Contributor

minor nit here: there's still a small chance for a race. If the name gets released after the attempt failed, id will be empty and the output will look weird. Not sure it matters much since it is a corner case and only informational, but it may cause confusion if there's enough concurrency on the daemon to trigger this.

The proper fix for this would be to return the conflicting ID from the Reserve call that failed (within the same memdb transaction).
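
Building on the withTxn sketch above, a hypothetical variant could look like this (purely illustrative, not what the PR implements; it reports the holder's ID from the same transaction that rejected the reservation):

func (db *memDB) ReserveName(name, containerID string) (string, error) {
	var conflictingID string
	err := db.withTxn(func(txn *memdb.Txn) error {
		s, err := txn.First(memdbNamesTable, memdbIDIndex, name)
		if err != nil {
			return err
		}
		if s != nil {
			if owner := s.(nameAssociation).containerID; owner != containerID {
				// Capture the holder's ID in the same transaction that failed.
				conflictingID = owner
				return ErrNameReserved
			}
			return nil
		}
		return txn.Insert(memdbNamesTable, nameAssociation{name: name, containerID: containerID})
	})
	return conflictingID, err
}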

Contributor Author

I agree, this really should be transactional. But I think that since the impact of the race is only missing information in the log message, it's not very important to fix right now.

@@ -55,7 +55,7 @@ func (daemon *Daemon) ContainerRename(oldName, newName string) error {
	}

	for k, v := range links {
-		daemon.nameIndex.Reserve(newName+k, v.ID)
+		daemon.containersReplica.ReserveName(newName+k, v.ID)
Contributor

not necessarily related to the changes being made, all these errors being swallowed (here and lines 69, 61 and 74) make me a bit nervous 😬

@@ -128,7 +128,6 @@ func (daemon *Daemon) cleanupContainer(container *container.Container, forceRemo
		return errors.Wrapf(err, "unable to remove filesystem for %s", container.ID)
	}

-	daemon.nameIndex.Delete(container.ID)
Contributor

Maybe I missed it, but I didn't see this being replaced by something else.

Missing an if err := daemon.containersReplica.ReleaseName(...); err != nil {...} here?

Contributor Author

The name is released by daemon.containersReplica.Delete(container) below.

@fabiokung
Contributor

LGTM

@cpuguy83
Member

SGTM

Needs a rebase.

@cpuguy83
Member

cpuguy83 commented Jul 13, 2017

What do you think about also doing the same treatment for IDs, which are currently stored in a separate patricia trie for prefix matching (as a separate PR)?

@aaronlehmann
Contributor Author

Rebased.

What do you think about also doing the same treatment for IDs, which are currently stored in a separate patricia trie for prefix matching (as a separate PR)?

This would be an excellent way to support ID lookups. memdb uses a prefix tree internally and supports prefix matching for lookups. The IDs are already stored in the container objects, so there would be no need to store anything else. We could just remove the now-redundant code.
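
As a sketch of what that could look like (illustrative only: go-memdb answers prefix queries when the index name carries a "_prefix" suffix; the table and index constants are assumptions, and errors is the standard library package):

// Resolve a short ID by prefix directly from memdb, replacing the patricia trie.
func getByIDPrefix(txn *memdb.Txn, prefix string) (*Container, error) {
	iter, err := txn.Get(memdbContainersTable, memdbIDIndex+"_prefix", prefix)
	if err != nil {
		return nil, err
	}
	var match *Container
	for obj := iter.Next(); obj != nil; obj = iter.Next() {
		c := obj.(*Container)
		if match != nil {
			return nil, errors.New("ambiguous container ID prefix: " + prefix)
		}
		match = c
	}
	if match == nil {
		return nil, errors.New("no container matches prefix: " + prefix)
	}
	return match, nil
}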

Member

@cpuguy83 cpuguy83 left a comment

LGTM

@ehazlett
Contributor

LGTM
