Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CPUManager stored state semantics #84462

Conversation

@klueska
Copy link
Contributor

klueska commented Oct 28, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test

/kind feature

/kind flake

What this PR does / why we need it:
This PR updates the information that is stored as part of the CPUManager state. Instead of storing the CPUManager state keyed off of the ContainerID, we now store it keyed off of the combination of PodUIDand ContainerName. This change is necessary moving forward as we move to a model where we assign CPUs to containers at pod admit time, rather than container creation time.

Does this PR introduce a user-facing change?:

The underlying format of the `CPUManager` state file has changed. Upgrades should be seamless, but any third-party tools that rely on reading the previous format need to be updated.
@klueska

This comment has been minimized.

Copy link
Contributor Author

klueska commented Oct 28, 2019

/assign @ConnorDoyle

@klueska klueska changed the title Update CPUmanager state information Update CPUManager state information Oct 28, 2019
@k8s-ci-robot k8s-ci-robot requested review from resouer and sjenning Oct 28, 2019
@klueska klueska force-pushed the klueska:upstream-cpu-manager-update-state-semantics branch 3 times, most recently from 17094b7 to ac82cb0 Oct 28, 2019
@klueska

This comment has been minimized.

Copy link
Contributor Author

klueska commented Oct 28, 2019

/test pull-kubernetes-integration

@klueska

This comment has been minimized.

Copy link
Contributor Author

klueska commented Oct 28, 2019

/test pull-kubernetes-e2e-gce

Copy link
Contributor

mattjmcnaughton left a comment

I've been a little detached from the reviews on the CPUManager internals, so I'll leave technical specifics around this change to someone else :)

But, I do have one question. From reading the PR description, it looked like we plan to merge this diff to master before the subsequent necessary diff you mentioned is approved and ready to merge? If yes, that makes me a little nervous. If for some reason, the subsequent diff takes longer, it'd be unfortunate (I think? perhaps you can share more about the ramifications) to have the first diff in 1.17 but not the second.

@klueska

This comment has been minimized.

Copy link
Contributor Author

klueska commented Oct 29, 2019

@mattjmcnaughton This patch should only be merged together with the patch to support upgrades. I'm working on this right now and will add it as a follow-on commit in this same PR once it's done, and update the description accordingly.

@mattjmcnaughton

This comment has been minimized.

Copy link
Contributor

mattjmcnaughton commented Oct 30, 2019

@mattjmcnaughton This patch should only be merged together with the patch to support upgrades. I'm working on this right now and will add it as a follow-on commit in this same PR once it's done, and update the description accordingly.

Gotcha, thanks @klueska ! Mind adding a [WIP] prefix to your PR title, so its clear to reviewers and doesn't get accidentally merged? Thanks!

@klueska klueska changed the title Update CPUManager state information Update CPUManager stored state semantics Oct 30, 2019
@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Nov 4, 2019

@klueska this appears to take commits from the #84196 that is also in progress. updating the title to WIP pending the core PRs merging first.

@derekwaynecarr derekwaynecarr changed the title Update CPUManager stored state semantics [WIP] Update CPUManager stored state semantics Nov 4, 2019
@klueska klueska force-pushed the klueska:upstream-cpu-manager-update-state-semantics branch from ac82cb0 to 8fabb2a Nov 5, 2019
@klueska

This comment has been minimized.

Copy link
Contributor Author

klueska commented Nov 5, 2019

@ConnorDoyle @derekwaynecarr

This PR has now been rebased on top of master (well technically on top of #84525 which I'm hoping will merge soon).

However, we can't call this PR complete until we have a migration path of the old state format (keyed off of containerID) to the new state format (keyed off of podUID, containerName).

I'm curious what your thoughts are for how to ensure state stored by 1.16 kubelets will be readable by 1.17 kubelets, transformed to the new semantics, and then written back in the new format.

I imagine there must be some a sort of precedent on best practices for this, but my (admittedly limited) search for it hasn't turned up much.

klueska added 12 commits Dec 10, 2019
Previously it keyed off of a pointer to the actual pod / container,
which was unnecessary, and hard to work with (especially on the
retrieval side).
Previously, the state was keyed off of containerID intead of podUID and
containerName. Unfortunately, this is no longer possible as we move to a
to model where we we allocate CPUs to containers at pod adit time rather
than container start time.

This patch is the first step towards full migration to the new
semantics. Only the unit tests in cpumanager/state are passing. In
subsequent commits we will update the CPUManager itself to use these new
semantics.

This patch also includes code to do migration from the old checkpoint format
to the new one, assuming the existence of a ContainerMap with the proper
mapping of (containerID)->(podUID, containerName). A subsequent commit
will update code in higher layers to make sure that this ContainerMap is
made available to this state logic.
For now, we just pass 'nil' as the set of 'initialContainers' for
migrating from old state semantics to new ones. In a subsequent commit
will we pull this information from higher layers so that we can pass it
down at this stage properly.
These information associatedd with these containers is used to migrate
the CPUManager state from it's old format to its new (i.e. keyed off of
podUID and containerName instead of containerID).
@klueska klueska force-pushed the klueska:upstream-cpu-manager-update-state-semantics branch from 35c2299 to f553286 Dec 11, 2019
@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Dec 17, 2019

/assign

i will review this today.

@@ -96,6 +97,10 @@ type manager struct {
// and the containerID of their containers
podStatusProvider status.PodStatusProvider

// containerMap provides a mapping from (pod, container) -> containerID

This comment has been minimized.

Copy link
@derekwaynecarr

derekwaynecarr Dec 17, 2019

Member

nit: its containerID -> (pod, container) per earlier commits

type CPUManagerCheckpointV2 struct {
PolicyName string `json:"policyName"`
DefaultCPUSet string `json:"defaultCpuSet"`
Entries map[string]map[string]string `json:"entries,omitempty"`

This comment has been minimized.

Copy link
@derekwaynecarr

derekwaynecarr Dec 17, 2019

Member

nit: a map to a map can be cumbersome, we can come back to this later with an actual type.

@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Dec 17, 2019

the breakdown per commit was really helpful, and the mechanical changes all made sense.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 17, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Dec 17, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, klueska

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a1fc96f into kubernetes:master Dec 17, 2019
16 checks passed
16 checks passed
cla/linuxfoundation klueska authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-kind-ipv6 Job succeeded.
Details
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kind Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 17, 2019
@neolit123

This comment has been minimized.

Copy link
Member

neolit123 commented Dec 27, 2019

@klueska i have a suspicion that this PR is making the 1.18 kubelet unstable, resulting in panics.

see this issue for more details:
#86676

klueska added a commit to klueska/kubernetes that referenced this pull request Dec 28, 2019
The updated CPUManager from PR kubernetes#84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:

```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct {  ...  }
type CPUManagerCheckpointV2 struct {  ...  }
```

This replaces the old definition of just:

```
type CPUManagerCheckpoint struct {  ...  }
```

Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.

This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.

This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of ``CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it wil never change in the future (for pre-1.18 checkpoint
files at least).
klueska added a commit to klueska/kubernetes that referenced this pull request Dec 28, 2019
The updated CPUManager from PR kubernetes#84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:

```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct {  ...  }
type CPUManagerCheckpointV2 struct {  ...  }
```

This replaces the old definition of just:

```
type CPUManagerCheckpoint struct {  ...  }
```

Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.

This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.

This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of ``CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it wil never change in the future (for pre-1.18 checkpoint
files at least).
klueska added a commit to klueska/kubernetes that referenced this pull request Dec 28, 2019
The updated CPUManager from PR kubernetes#84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:

```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct {  ...  }
type CPUManagerCheckpointV2 struct {  ...  }
```

This replaces the old definition of just:

```
type CPUManagerCheckpoint struct {  ...  }
```

Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.

This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.

This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of ``CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it wil never change in the future (for pre-1.18 checkpoint
files at least).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.