Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multiple CreateContainer operations at the same time. #1355

Merged
merged 7 commits into from
Apr 22, 2022

Conversation

anmaxvl
Copy link
Contributor

@anmaxvl anmaxvl commented Apr 15, 2022

Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:

  • Add a new "Status" field to Container object, which can be either
    "Creating" or "Created".
  • Remove locking from CreateContainer function
  • Rework "GetContainer" to "GetCreatedContainer", which returns
    the container object only when it's in "Created" state, otherwise
    either gcserr.HrVmcomputeSystemNotFound or gcserr.HrVmcomputeInvalidState
    error returned.
  • Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  • Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Created" only when the container
    was successfully created.

Reworking "GetContainer" to "GetCreatedContainer" seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If "GetContainer" is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which hasn't been created yet.

Signed-off-by: Maksim An maksiman@microsoft.com

@anmaxvl anmaxvl force-pushed the gcs-rework-locks branch 3 times, most recently from 9bff60f to 2fba6ce Compare April 15, 2022 09:47
@anmaxvl anmaxvl marked this pull request as ready for review April 15, 2022 17:15
@anmaxvl anmaxvl requested a review from a team as a code owner April 15, 2022 17:15
@anmaxvl
Copy link
Contributor Author

anmaxvl commented Apr 18, 2022

@svolos @SeanTAllen @KenGordon FYI

@SeanTAllen
Copy link
Contributor

Maksim and I discussed, I pointed out a race condition in this version. Maksim will be changing slightly to address.

@anmaxvl anmaxvl marked this pull request as draft April 18, 2022 19:34
@anmaxvl anmaxvl marked this pull request as ready for review April 18, 2022 20:10
internal/guest/runtime/hcsv2/container.go Outdated Show resolved Hide resolved
internal/guest/bridge/bridge_v2.go Outdated Show resolved Hide resolved
@SeanTAllen
Copy link
Contributor

This addresses the issue that @svolos discovered, LGTM.

@@ -318,7 +331,7 @@ func (h *Host) CreateContainer(ctx context.Context, id string, settings *prot.VM
}
}

h.containers[id] = c
c.status = containerCreated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to do this under lock as well, otherwise we could be changing c.status just as another thread is reading it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container level lock I assume?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be the map lock, as long as we ensure only one accessor. You could also look at using an atomic here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently that isn't possible with the existing implementation but it was a concern that i raised with @anmaxvl as something to look at for the future if not wasn't addressed now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to atomic update. had to switch to uint32 without custom type definition.

@@ -318,7 +332,7 @@ func (h *Host) CreateContainer(ctx context.Context, id string, settings *prot.VM
}
}

h.containers[id] = c
atomic.StoreUint32(&c.status, containerCreated)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this is a good approach. You need an atomic for any reads as well.

Given the lack of usage of atomics in the codebase, I can see this being incorrectly done (as it is currently in this PR). I think coming up with a more robust approach would be better. I'm not a Go expert so I'm not sure of the best pattern, but making all access atomic (store and load) without relying on authors of later changes to get it correct would be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go-winio has a custom atomic bool that basically wraps these operations and I have been looking for an excuse to export it for a while.
since status is currently creating/created, a bool would suffice for now, i think...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid go-winio route will take too long. I'll just add getter/setter.

Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Reworking "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If "GetContainer" is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
Signed-off-by: Maksim An <maksiman@microsoft.com>
Signed-off-by: Maksim An <maksiman@microsoft.com>
Additionally update synchronization CRI tests to use go routines
to properly reproduce the scenario.

Signed-off-by: Maksim An <maksiman@microsoft.com>
Signed-off-by: Maksim An <maksiman@microsoft.com>
Signed-off-by: Maksim An <maksiman@microsoft.com>
Copy link
Contributor

@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Agree on Kevin's comment on removing the switch though. There's some safety on what we pass due to the type

Signed-off-by: Maksim An <maksiman@microsoft.com>
Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anmaxvl anmaxvl merged commit 4a33ed5 into microsoft:master Apr 22, 2022
@anmaxvl anmaxvl deleted the gcs-rework-locks branch April 22, 2022 22:04
anmaxvl added a commit that referenced this pull request Feb 7, 2023
Sync ADO with upstream to enable including test GCS binaries as
part of dev-pipeline

Related work items: #1311, #1322, #1341, #1343, #1345, #1347, #1348, #1350, #1353, #1354, #1355, #1358, #1361, #1365, #1368, #1369, #1370
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
…t#1355)

Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a `CreateRuntime` OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because `CreateContainer` request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "status" field to Container object and atomic setter/getter,
    which can be either "Created" or "Creating". New `uint32` type alias
    and constants were added to represent the values (`containerCreated`
    and `containerCreating`)
  - Remove locking from `CreateContainer` function
  - Rework `GetContainer` to `GetCreatedContainer`, which returns
    the container object only when it's in `containerCreated` state,
    otherwise either `gcserr.HrVmcomputeSystemNotFound` or
    `gcserr.HrVmcomputeInvalidState` error returned.
  - Add new `AddContainer(id, container)` function, which updates the
    containers map with new container instances.
  - Rework `CreateContainer` to initially add new container objects into
    the containers map and set the "status" to `containerCreating` at the
    start of the function and set it to `containerCreated` only when the
    container is successfully created in runtime.

Reworking `GetContainer` to `GetCreatedContainer` seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If `GetContainer` is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Additionally update synchronization CRI tests to use go routines
to properly reproduce the scenario.

Signed-off-by: Maksim An <maksiman@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants