Store container names in memdb #33886

Merged
merged 3 commits into moby:master on Jul 17, 2017

Conversation

7 participants
@aaronlehmann
Contributor

aaronlehmann commented Jun 30, 2017

Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.

We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.

Note that while this change makes the snapshotting atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.

See #33863 and #33885
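
For illustration, a minimal sketch of how name associations could be kept in go-memdb; the table, index, and type names here are assumptions for the example, not necessarily what this change uses:

package main

import (
	"errors"
	"fmt"

	"github.com/hashicorp/go-memdb"
)

// nameAssociation maps a reserved name to the container ID that owns it.
type nameAssociation struct {
	Name        string
	ContainerID string
}

var errNameReserved = errors.New("name is reserved")

// schema defines a "names" table keyed by Name, with a secondary
// (non-unique) index on ContainerID so all names owned by a container
// can be found and released together.
var schema = &memdb.DBSchema{
	Tables: map[string]*memdb.TableSchema{
		"names": {
			Name: "names",
			Indexes: map[string]*memdb.IndexSchema{
				"id": {
					Name:    "id",
					Unique:  true,
					Indexer: &memdb.StringFieldIndex{Field: "Name"},
				},
				"containerid": {
					Name:    "containerid",
					Indexer: &memdb.StringFieldIndex{Field: "ContainerID"},
				},
			},
		},
	},
}

func main() {
	db, err := memdb.NewMemDB(schema)
	if err != nil {
		panic(err)
	}

	// Reserving a name: check and insert inside one write transaction.
	txn := db.Txn(true)
	if existing, err := txn.First("names", "id", "/web"); err != nil || existing != nil {
		txn.Abort()
		panic(errNameReserved)
	}
	if err := txn.Insert("names", &nameAssociation{Name: "/web", ContainerID: "abc123"}); err != nil {
		txn.Abort()
		panic(err)
	}
	txn.Commit()

	// A snapshot is just a read transaction: it sees a consistent view of
	// names (and, in the real schema, containers) without copying anything.
	view := db.Txn(false)
	if raw, _ := view.First("names", "id", "/web"); raw != nil {
		fmt.Println(raw.(*nameAssociation).ContainerID) // "abc123"
	}
}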

I still need to validate these changes and add test coverage.

cc @fabiokung @cpuguy83

Contributor

aaronlehmann commented Jun 30, 2017

Note that while this change makes the snapshotting atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.

Thinking about this a little bit, usually the only name we care about is container.Name. The only case where we have other names is when links are in use, and I don't think there's any point in adding those transactionally. So maybe all we need to do here is automatically reserve container.Name when a container is added to the database. We're already automatically unregistering the name when the container is deleted.

Edit: Actually, there's no point in doing this, because we want to pre-reserve the name before we create the container. Otherwise, we might go through all the work of creating the container, and then find out the name is taken when we try to insert it into the memdb. So I think the PR makes sense as-is, and we just have to accept that some names in the DB might not have corresponding containers yet (see #33883).
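
A rough sketch of the ordering described above, under the assumption of a simple name-reservation interface (all names here are illustrative, not the daemon's actual API):

// nameReserver is a stand-in for whatever the names database exposes.
type nameReserver interface {
	ReserveName(name, containerID string) error
	ReleaseName(name string) error
}

// createContainer sketches the ordering discussed above: reserve the name
// first so a conflict is caught before any expensive work, release it again
// if creation fails, and accept that between the two calls the name exists
// in the database without a corresponding container (see #33883).
func createContainer(db nameReserver, name, id string, build func() error) error {
	if err := db.ReserveName(name, id); err != nil {
		return err // name already taken; nothing was created
	}
	if err := build(); err != nil {
		db.ReleaseName(name)
		return err
	}
	return nil
}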

LK4D4 requested a review from cpuguy83 on Jul 6, 2017

aaronlehmann changed the title from "[WIP] Store container names in memdb" to "Store container names in memdb" on Jul 6, 2017

Contributor

aaronlehmann commented Jul 6, 2017

Rebased and added test coverage. PTAL

@fabiokung

I like this.

Other than the inline comments, the only remark I have is that this adds more lock contention to all container update operations.

Hopefully all name reservation operations are "quick" and don't hold the lock for long. Everything I could spot seemed to be O(1), but it may not be a bad idea to put this through some concurrency tests and try to measure lock contention with lots of names reserved and containers being modified in parallel.
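
As a sketch of the kind of contention test suggested here (in a _test.go file importing fmt, sync/atomic, and testing; newTestViewDB, ReserveName, and ReleaseName are placeholders for whatever the final API exposes):

func BenchmarkReserveReleaseParallel(b *testing.B) {
	db := newTestViewDB(b) // assumed test helper returning the memdb-backed view
	var counter int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			// Unique names per iteration so the benchmark measures lock
			// contention rather than reservation conflicts.
			name := fmt.Sprintf("/name-%d", atomic.AddInt64(&counter, 1))
			if err := db.ReserveName(name, "some-container-id"); err != nil {
				b.Error(err)
				return
			}
			if err := db.ReleaseName(name); err != nil {
				b.Error(err)
				return
			}
		}
	})
}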

Contributor

aaronlehmann commented Jul 10, 2017

I've added a new commit with the requested changes. I have mixed feelings about these changes. I think you are correct that adding these error checks and aborts is technically the right thing to do, but I'm concerned that it makes the code more complex and fragile. Since we can't use defer txn.Commit(), it will be much easier to leak the transaction lock in the future. I'm not sure there are any failure modes in practice that would cause Abort vs Commit to make a difference - the only paths that would allow us to commit a change partially seem to be internal memdb errors that shouldn't happen.

I'm not concerned about the lock contention. I think that's the price that has to be paid for keeping a consistent view of naming. All of these reservation and release operations should be trivial and only hold the lock very briefly.

Contributor

fabiokung commented Jul 10, 2017

(snip), but I'm concerned that it makes the code more complex and fragile.

This may be a slightly more robust way to abort on errors, and it avoids leaking the transaction lock: https://play.golang.org/p/KGszH6RfPP

The error handling could be extracted into a func commitOrAbort(err error), or something similar, and be used as defer one-liners (defer commitOrAbort(retErr)) as well.
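
One wrinkle with defer commitOrAbort(retErr) is that the argument would be evaluated when the defer statement runs (while retErr is still nil), so retErr needs to be a named return value captured by a closure, or passed as a pointer. A minimal sketch, with memDB, store, ErrNameReserved, and nameAssociation standing in for whatever the real code uses:

// ReserveName sketches the defer-based variant described above.
func (db *memDB) ReserveName(name, containerID string) (retErr error) {
	txn := db.store.Txn(true)
	// commitOrAbort reads the named return value when the function exits,
	// so the decision is based on the final error, not the value at the
	// time the defer statement ran.
	commitOrAbort := func() {
		if retErr != nil {
			txn.Abort()
			return
		}
		txn.Commit()
	}
	defer commitOrAbort()

	if existing, err := txn.First("names", "id", name); err != nil {
		return err
	} else if existing != nil {
		return ErrNameReserved
	}
	return txn.Insert("names", &nameAssociation{Name: name, ContainerID: containerID})
}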

Contributor

fabiokung commented Jul 10, 2017

Alternatively, you can also go the functional route with:

func ...() {
	withTx(func(tx boltdb.Tx) error {
		// ...
	})
}
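
Concretely, the functional route might look something like the sketch below; withTxn, memDB, store, and the "names" table are illustrative, and the store in this PR is memdb rather than boltdb:

// withTxn is a sketch of the functional wrapper: the callback only ever
// sees a transaction that this helper then commits or aborts, so callers
// cannot leak the write lock.
func (db *memDB) withTxn(cb func(*memdb.Txn) error) error {
	txn := db.store.Txn(true)
	if err := cb(txn); err != nil {
		txn.Abort()
		return err
	}
	txn.Commit()
	return nil
}

// Example caller: releasing a name can no longer forget to commit or abort.
func (db *memDB) ReleaseName(name string) error {
	return db.withTxn(func(txn *memdb.Txn) error {
		return txn.Delete("names", &nameAssociation{Name: name})
	})
}
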
Contributor

aaronlehmann commented Jul 10, 2017

I've taken the functional approach.

@@ -77,7 +76,7 @@ func (daemon *Daemon) reserveName(id, name string) (string, error) {
 }
 func (daemon *Daemon) releaseName(name string) {
-	daemon.nameIndex.Release(name)
+	daemon.containersReplica.ReleaseName(name)

@fabiokung

fabiokung Jul 10, 2017

Contributor

worth at least logging the error here?

@aaronlehmann

aaronlehmann Jul 10, 2017

Contributor

I'm not sure whether this is used in cases where the name may not be reserved. I didn't want to change the behavior of this part of the code, only the implementation.
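
For reference, a minimal sketch of what logging without otherwise changing behavior could look like (assuming logrus, which the daemon already uses, is in scope):

if err := daemon.containersReplica.ReleaseName(name); err != nil {
	logrus.WithError(err).Warnf("could not release container name %q", name)
}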

-			id, err := daemon.nameIndex.Get(name)
+	if err := daemon.containersReplica.ReserveName(name, id); err != nil {
+		if err == container.ErrNameReserved {
+			id, err := daemon.containersReplica.Snapshot().GetID(name)

@fabiokung

fabiokung Jul 10, 2017

Contributor

minor nit here: there's still a small chance of a race. If the name gets released after the attempt failed, id will be empty and the output will look weird. Not sure it matters much, since it's a corner case and only informational, but it may cause confusion if there's enough concurrency on the daemon to trigger it.

The proper fix would be to return the conflicting id from the Reserve call that failed (from the same memdb transaction).

@aaronlehmann

aaronlehmann Jul 10, 2017

Contributor

I agree, this really should be transactional. But I think that since the impact of the race is only missing information in the log message, it's not very important to fix right now.
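
A sketch of that transactional variant, reusing the assumed "names" table and nameAssociation type from the earlier sketch: the conflicting ID is looked up inside the same write transaction as the failed reservation, so it can never be stale or empty.

func (db *memDB) ReserveName(name, containerID string) (string, error) {
	txn := db.store.Txn(true)

	existing, err := txn.First("names", "id", name)
	if err != nil {
		txn.Abort()
		return "", err
	}
	if existing != nil {
		// The conflicting ID is read in the same transaction as the failed
		// reservation, so it cannot have been released in the meantime.
		txn.Abort()
		return existing.(*nameAssociation).ContainerID, ErrNameReserved
	}
	if err := txn.Insert("names", &nameAssociation{Name: name, ContainerID: containerID}); err != nil {
		txn.Abort()
		return "", err
	}
	txn.Commit()
	return "", nil
}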

@@ -55,7 +55,7 @@ func (daemon *Daemon) ContainerRename(oldName, newName string) error {
 	}
 	for k, v := range links {
-		daemon.nameIndex.Reserve(newName+k, v.ID)
+		daemon.containersReplica.ReserveName(newName+k, v.ID)

@fabiokung

fabiokung Jul 10, 2017

Contributor

Not necessarily related to the changes being made, but all these swallowed errors (here and on lines 69, 61 and 74) make me a bit nervous 😬

@@ -128,7 +128,6 @@ func (daemon *Daemon) cleanupContainer(container *container.Container, forceRemo
 		return errors.Wrapf(err, "unable to remove filesystem for %s", container.ID)
 	}
-	daemon.nameIndex.Delete(container.ID)

@fabiokung

fabiokung Jul 10, 2017

Contributor

Maybe I missed it, but I didn't see this being replaced by something else.

Missing an if err := daemon.containersReplica.ReleaseName(...); err != nil {...} here?

@aaronlehmann

aaronlehmann Jul 10, 2017

Contributor

The name is released by daemon.containersReplica.Delete(container) below.
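
For context, a sketch of how Delete can cover the name release inside a single write transaction (the containers table, containerid index, and Container type are assumptions for the example, not the PR's actual code):

func (db *memDB) Delete(c *Container) error {
	txn := db.store.Txn(true)

	// Drop every name reservation pointing at this container ID, then the
	// container itself, inside a single write transaction.
	if _, err := txn.DeleteAll("names", "containerid", c.ID); err != nil {
		txn.Abort()
		return err
	}
	if err := txn.Delete("containers", c); err != nil {
		txn.Abort()
		return err
	}
	txn.Commit()
	return nil
}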

Contributor

fabiokung commented Jul 11, 2017

LGTM

Contributor

cpuguy83 commented Jul 13, 2017

SGTM

Needs a rebase.

Contributor

cpuguy83 commented Jul 13, 2017

What do you think about also doing the same treatment for IDs, which are currently stored in a separate patricia trie for prefix matching (as a separate PR)?

aaronlehmann added some commits Jun 30, 2017

Store container names in memdb
Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.

We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.

Note that while this change makes the *snapshotting* atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

container: Abort transactions when memdb calls fail
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

container: Use wrapper to ensure commit/abort happens
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Contributor

aaronlehmann commented Jul 13, 2017

Rebased.

What do you think about also doing the same treatment for IDs, which are currently stored in a separate patricia trie for prefix matching (as a separate PR)?

This would be an excellent way to support ID lookups. memdb uses a prefix tree internally and supports prefix matching for lookups. The IDs are already stored in the container objects, so there would be no need to store anything else. We could just remove the other, now-redundant code.
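
As a sketch of what that could look like with go-memdb's built-in prefix queries (the containers table and Container type are placeholders for the daemon's actual types):

// containersByIDPrefix returns every container whose ID starts with the
// given prefix, using memdb's "<index>_prefix" lookup on a StringFieldIndex.
func containersByIDPrefix(db *memdb.MemDB, prefix string) ([]*Container, error) {
	view := db.Txn(false) // read-only snapshot
	iter, err := view.Get("containers", "id_prefix", prefix)
	if err != nil {
		return nil, err
	}
	var matches []*Container
	for obj := iter.Next(); obj != nil; obj = iter.Next() {
		matches = append(matches, obj.(*Container))
	}
	return matches, nil
}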

@cpuguy83

LGTM

Contributor

ehazlett commented Jul 17, 2017

LGTM

ehazlett merged commit 458f671 into moby:master on Jul 17, 2017

6 checks passed

dco-signed: All commits are signed
experimental: Jenkins build Docker-PRs-experimental 35662 has succeeded
janky: Jenkins build Docker-PRs 44278 has succeeded
powerpc: Jenkins build Docker-PRs-powerpc 4655 has succeeded
windowsRS1: Jenkins build Docker-PRs-WoW-RS1 15656 has succeeded
z: Jenkins build Docker-PRs-s390x 4354 has succeeded