no processing for successfully exited pods #765
Branch updated from 169a3c5 to a4df231.
Codecov Report
Base: 68.40% // Head: 68.45% // Increases project coverage by +0.05%.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #765      +/-   ##
==========================================
+ Coverage   68.40%   68.45%   +0.05%
==========================================
  Files         208      208
  Lines       23956    24008      +52
==========================================
+ Hits        16387    16435      +48
- Misses       6426     6428       +2
- Partials     1143     1145       +2
pkg/koordlet/resmanager/cpu_burst.go (outdated)
@@ -198,6 +198,12 @@ func (b *CPUBurst) start() {
// ignore non-burstable pod, e.g. LSR, BE pods
continue
}
if podMeta.Pod.Status.Phase == corev1.PodSucceeded {
How about other phases like corev1.PodFailed? We could skip the pod whenever its phase is neither Running nor Pending.
What if its restart policy is set to Never?
PodFailed indicates that all containers have terminated, so koordlet would fail to update any of the container-level cgroups, since their cgroup paths are no longer valid.
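On the restart-policy question, the sketch below is a simplified model, based on our understanding of how the kubelet computes pod phase, of which phase a pod reaches once all of its regular containers have terminated. It ignores init containers and containers in an unknown state. The types mirror k8s.io/api/core/v1 so the example builds with the standard library only, and the helper name phaseAfterAllTerminated is hypothetical. It shows why checking only PodSucceeded is not enough: with restartPolicy Never, a pod with a failed container ends up in PodFailed, where its containers (and their cgroups) are just as gone.

```go
package main

import "fmt"

// PodPhase and RestartPolicy mirror the string types in k8s.io/api/core/v1;
// they are redeclared here so the sketch compiles with the standard library only.
type PodPhase string
type RestartPolicy string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"

	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// phaseAfterAllTerminated is a hypothetical, simplified helper: it returns the
// phase a pod ends up in once every regular container has terminated, given the
// pod's restart policy and whether all containers exited with code 0.
func phaseAfterAllTerminated(policy RestartPolicy, allSucceeded bool) PodPhase {
	if policy == RestartPolicyAlways {
		// Containers are restarted, so the pod stays Running.
		return PodRunning
	}
	if allSucceeded {
		// Never or OnFailure, and every container exited with 0.
		return PodSucceeded
	}
	if policy == RestartPolicyNever {
		// At least one container failed and nothing will be restarted.
		return PodFailed
	}
	// OnFailure: the failed container is restarted, so the pod stays Running.
	return PodRunning
}

func main() {
	for _, p := range []RestartPolicy{RestartPolicyAlways, RestartPolicyOnFailure, RestartPolicyNever} {
		fmt.Printf("%-9s success=%-9s failure=%s\n", p,
			phaseAfterAllTerminated(p, true), phaseAfterAllTerminated(p, false))
	}
}
```

Under this model, pods with Always (or OnFailure after a failure) stay Running while containers restart, so they must keep being processed; only Never can produce a Failed pod whose containers are all gone.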
In fact, a container's cgroup files exist only as long as the container exists, so we should only handle running pods, right?
// These are the valid statuses of pods.
const (
// PodPending means the pod has been accepted by the system, but one or more of the containers
// has not been started. This includes time before being bound to a node, as well as time spent
// pulling images onto the host.
PodPending PodPhase = "Pending"
// PodRunning means the pod has been bound to a node and all of the containers have been started.
// At least one container is still running or is in the process of being restarted.
PodRunning PodPhase = "Running"
// PodSucceeded means that all containers in the pod have voluntarily terminated
// with a container exit code of 0, and the system is not going to restart any of these containers.
PodSucceeded PodPhase = "Succeeded"
// PodFailed means that all containers in the pod have terminated, and at least one container has
// terminated in a failure (exited with a non-zero exit code or was stopped by the system).
PodFailed PodPhase = "Failed"
// PodUnknown means that for some reason the state of the pod could not be obtained, typically due
// to an error in communicating with the host of the pod.
// Deprecated in v1.21: It isn't being set since 2015 (74da3b14b0c0f658b3bb8d2def5094686d0e9095)
PodUnknown PodPhase = "Unknown"
)
The comments indicate that Pending pods may have some containers running, so I still recommend handling both PodPending and PodRunning.
For example: a pod with only some of its init containers started, or a pod with some of its regular containers terminated.
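A minimal sketch of the two checks discussed in this thread, side by side: the Succeeded-only skip from the diff above, and the reviewer's suggestion to process only Pending and Running pods. PodPhase and its constants are redeclared here to mirror corev1 so the example builds without k8s.io/api; the function names skipSucceededOnly and skipInactive are hypothetical.

```go
package main

import "fmt"

// PodPhase mirrors corev1.PodPhase from k8s.io/api/core/v1; redeclared so the
// sketch builds with the standard library only.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
	PodUnknown   PodPhase = "Unknown"
)

// skipSucceededOnly is the check shown in the diff: skip only Succeeded pods.
func skipSucceededOnly(phase PodPhase) bool {
	return phase == PodSucceeded
}

// skipInactive is the reviewer's suggestion: process only Pending and Running
// pods, since per the constant comments only those phases can still have
// containers running (and therefore live cgroups).
func skipInactive(phase PodPhase) bool {
	return phase != PodPending && phase != PodRunning
}

func main() {
	for _, phase := range []PodPhase{PodPending, PodRunning, PodSucceeded, PodFailed, PodUnknown} {
		fmt.Printf("%-9s succeeded-only skip=%-5t active-only skip=%t\n",
			phase, skipSucceededOnly(phase), skipInactive(phase))
	}
}
```

The two checks differ for Failed (and Unknown) pods: the allowlist skips them, while the Succeeded-only check keeps writing to their missing cgroups. Inside the loop in cpu_burst.go, such a check would presumably sit next to the existing non-burstable filter as an early continue, using the real corev1 constants.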
Okay, I'll change the code then.
Branch updated from 08de4dd to ec0e0fb.
Signed-off-by: lucming <2876757716@qq.com>
Branch updated from ec0e0fb to 87d918f.
New changes are detected. LGTM label has been removed.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: hormes
Ⅰ. Describe what this PR does
After a pod has exited successfully, koordlet still keeps trying to modify the pod and container cgroups, which is unnecessary.
Because the associated cgroup files are deleted once the pod exits successfully, these writes always fail and the following error log is reported repeatedly.
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
V. Checklist
make test