-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add not-ready reap logic and unit test #58
Conversation
4c6aa3b
to
b93cb14
Compare
pkg/reaper/pdbreaper/pdbreaper.go
Outdated
if pod.Status.Phase == corev1.PodPending && isPodReadinessThresholdPast(pod.Status.StartTime, thresholdSeconds) { | ||
notReadyCount++ | ||
} | ||
|
||
if pod.Status.Phase == corev1.PodRunning && isPodReadinessThresholdPast(pod.Status.StartTime, thresholdSeconds) { | ||
for _, condition := range pod.Status.Conditions { | ||
if condition.Type == "ContainersReady" && condition.Status == "False" { | ||
notReadyCount++ | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of pod.Status.StartTime
, (which is the start of the Pod) would it be possible to check if the threshold is past when the container last entered the NotReady state? For example, the lastTransitionTime
of the Ready
or ContainersReady
condition?
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2021-09-15T22:27:29Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2021-09-15T22:27:15Z"
message: 'containers with unready status: [app]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2021-09-15T22:27:15Z"
message: 'containers with unready status: [app]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2021-09-15T22:27:15Z"
status: "True"
type: PodScheduled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
cmd/governor/app/pdbreaper.go
Outdated
@@ -49,4 +49,7 @@ func init() { | |||
pdbReaperCmd.Flags().BoolVar(&pdbReaperArgs.AllCrashLoop, "all-crashloop", true, "Only deletes PDBs for crashlooping pods when all pods are in crashloop") | |||
pdbReaperCmd.Flags().IntVar(&pdbReaperArgs.CrashLoopRestartCount, "crashloop-restart-count", 5, "Minimum restart count to when considering pods in crashloop") | |||
pdbReaperCmd.Flags().StringSliceVar(&pdbReaperArgs.ExcludedNamespaces, "excluded-namespaces", []string{}, "Namespaces excluded from scanning") | |||
pdbReaperCmd.Flags().BoolVar(&pdbReaperArgs.ReapNotReady, "reap-not-ready", true, "Deletes PDBs which have pods in not-ready state") | |||
pdbReaperCmd.Flags().IntVar(&pdbReaperArgs.ReapNotReadyThreshold, "not-ready-threshold-seconds", 400, "Minimum seconds to wait when considering pods in not-ready state") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default threshold for deleting a NotReady PDB should probably match the upgrade-manager drainTimeout
default, which is 1800 seconds (30 minutes). If a Pod is still not ready after 30 minutes then probably something is wrong. After 400 seconds (6.6 minutes) maybe something is wrong, but there may be cases where containers take longer than 6.6 minutes to start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: sbadla1 <sahil_badla@intuit.com>
Signed-off-by: sbadla1 <sahil_badla@intuit.com>
Signed-off-by: sbadla1 <sahil_badla@intuit.com>
e8b39d1
to
added1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov Report
@@ Coverage Diff @@
## master #58 +/- ##
==========================================
+ Coverage 59.71% 59.85% +0.14%
==========================================
Files 9 9
Lines 1390 1420 +30
==========================================
+ Hits 830 850 +20
- Misses 467 476 +9
- Partials 93 94 +1
Continue to review full report at Codecov.
|
* add not-ready reap logic and unit test Signed-off-by: sbadla1 <sahil_badla@intuit.com> Signed-off-by: shaoxt <shaoxt@gmail.com>
pdb-reaper not deleting the pdb's where pods are not fully ready(pod(s) in pending state or pod containers in not ready state)
Changes: