Update logic in CPUManager `reconcileState()` #84300
Conversation
force-pushed from 89cc0ec to 158ba82
/assign @ConnorDoyle
/hold Will revisit once #84462 is merged.
force-pushed from 158ba82 to 94a0cb6
force-pushed from 94a0cb6 to 0c78497
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: klueska. The full list of commands accepted by this bot can be found here. The pull request process is described here.
force-pushed from 0c78497 to 7be9b0f
```go
klog.V(4).Infof("[cpumanager] reconcileState: container is not present in state - trying to add (pod: %s, container: %s, container id: %s)", pod.Name, container.Name, containerID)
err := m.AddContainer(pod, &container, containerID)
```

```go
if cstatus.State.Waiting != nil ||
	(cstatus.State.Waiting == nil && cstatus.State.Running == nil && cstatus.State.Terminated == nil) {
```
Just curious here, and making sure it's not an oversight: if all three states (waiting, running, and terminated) are nil, does that mean the correct state is in fact 'waiting'?
Correct. From the comment underneath https://godoc.org/k8s.io/api/core/v1#ContainerState:
> ContainerState holds a possible state of container. Only one of its members may be specified. If none of them is specified, the default one is ContainerStateWaiting.
/lgtm
/unhold
/test pull-kubernetes-node-kubelet-serial-cpu-manager
/retest
/test pull-kubernetes-node-kubelet-serial-cpu-manager
@klueska after this PR, if a container has been assigned a dedicated cpuset and it restarts inside the pod, it ends up on the default cpuset after the restart instead of the dedicated cpuset it had before.
Yes. That was an unfortunate oversight. Please see the following for the (long) discussion and PR to fix it: https://kubernetes.slack.com/archives/C0BP8PW9G/p1587155932390500
thx @klueska, let me check the PR you mentioned.
**What type of PR is this?**

/kind cleanup

**What this PR does / why we need it:**

This PR cleans up the logic for `reconcileState()` in the `CPUManager`.

As background...

The `CPUManager` maintains state about the `CPUSet` that should be associated with any given container on a node. For containers that require a dedicated set of CPUs, the `CPUManager` tracks a mapping of `(containerID) -> (CPUSet)`, where this `CPUSet` contains the list of dedicated CPUs that the `CPUManager` has granted to the container. For containers that do not require a dedicated set of CPUs, the `CPUManager` maintains no explicit per-container state. Instead, it maintains a `DefaultCPUSet`, which contains the set of all CPUs that are not part of any `(containerID) -> (CPUSet)` mapping.
Once this state has been established, the actual `CPUSet` of a container can be updated via an `Update(CPUSet)` call on the specific `ContainerRuntime` in use. As each container is added to the system, a single `ContainerRuntime.Update(CPUSet)` call is made with the appropriate `CPUSet` for the container.

As you can imagine, however, as containers come and go in the system, the `DefaultCPUSet` can change quite frequently. Whenever it does, the containers at the mercy of the `DefaultCPUSet` need to have their `CPUSet`s updated via a new call to `ContainerRuntime.Update(DefaultCPUSet)`. However, at present, there is no easy path to synchronously update these containers whenever the `DefaultCPUSet` changes.
Instead, a function called `reconcileState()` is run in an asynchronous loop every `reconcilePeriod` seconds to accomplish this task. It looks at the set of all active containers on the node and calls `ContainerRuntime.Update(CPUSet)` with the appropriate `CPUSet` for each one. While this doesn't guarantee that all containers have the correct `CPUSet` associated with them at all times, it does guarantee that all containers converge to the correct value every `reconcilePeriod` seconds.
Unfortunately, the logic inside `reconcileState()` has become convoluted over time. While the basic idea behind `reconcileState()` is fairly straightforward, edge cases were found that caused the basic flow to diverge from its original intended purpose.

For example, there is currently a path inside `reconcileState()` that makes a call out to `AddContainer()` if an active container is found that has no `CPUSet` associated with it. Presumably, this was added to cover the case where `reconcileState()` began to execute asynchronously to the container in question actually being created. Since `AddContainer()` was designed to be idempotent, whoever got to the call first (either `reconcileState()` or the container creation path itself) would do the `AddContainer()` and everything could continue forward as expected.

As we know, however, any containers that don't have dedicated CPUs also don't have a `CPUSet` associated with them. This means that this `AddContainer()` call is erroneously being made on all containers that don't have any dedicated CPUs. This is OK because of the idempotency of the `AddContainer()` call, but it convolutes the logic significantly.

One of the reasons this `AddContainer()` call is necessary inside `reconcileState()` is that there is currently no way to "skip" a container that is not yet ready for further processing. All logic to decide whether a container should be known by the `CPUManager` is gleaned from the existence of the container in the `PodStatus` (regardless of that container's specific state).
This PR attempts to clean this up and make the logic inside `reconcileState()` a bit more sane. It does this through a combination of the `containerMap` introduced in #84196 and a move to a model that looks at the specific state of a given container inside the `PodStatus`, rather than just looking for the existence of the container in the `PodStatus`.

The `containerMap` lets us know for sure whether a container has completed an `AddContainer()` call and should have its state reconciled. If it has not, we simply skip it.
Using the state of the container lets us decide what to do, depending on whether it is currently `waiting`, `running`, or `terminated`. When `waiting`, we skip with a warning. When `terminated`, we skip without warning and remove the container so that it is never attempted again in the future. Only when `running` do we continue on to attempt a reconciliation.

**Does this PR introduce a user-facing change?:**