-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support to track pod crash/restart during sleep #52
Added support to track pod crash/restart during sleep #52
Conversation
Please have a look @chaitanyaenr @mffiedler |
Can one of the admins verify this patch? |
start_cerberus.py
Outdated
@@ -165,6 +167,10 @@ def main(cfg): | |||
watch_namespaces_status, failed_nodes, | |||
failed_pods_components) | |||
|
|||
if iteration != 1 and crashed_restarted_pods: | |||
logging.info("Pods that were crashed/restarted during the sleep: %s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: "Pods that crashed/restarted during iteration %s : %s" % (iteration, crashed_restarted_pods)
cerberus/kubernetes/client.py
Outdated
pods_tracker[pod]["creation_timestamp"] = pod_creation_timestamp | ||
pods_tracker[pod]["restart_count"] = pod_restart_count | ||
else: | ||
crashed_restarted_pods.append(pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list of crashed/restarted pods is currently the pod name. A possible improvement is to append <pod_namspace>:<pod_name> to the list to make it clear where the pod is. This is minor though, I am fine if we move forward like it is now.
start_cerberus.py
Outdated
@@ -165,6 +167,10 @@ def main(cfg): | |||
watch_namespaces_status, failed_nodes, | |||
failed_pods_components) | |||
|
|||
if iteration != 1 and crashed_restarted_pods: | |||
logging.info("Pods that were crashed/restarted during the sleep: %s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: "Pods that were crashed/restarted during iteration %s : %s" % (iteration, crashed_restarted_pods)"
Not sure why the duplicate comment. In any case my comments are nitpicks. I tested this and it looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits.
cerberus/kubernetes/client.py
Outdated
|
||
# Load kubeconfig and initialize kubernetes python client | ||
def initialize_clients(kubeconfig_path): | ||
global cli | ||
global cli, pods_tracker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe initialize pods_tracker outside this function as it's meant for initializing client?
cerberus/kubernetes/client.py
Outdated
crashed_restarted_pods = [] | ||
for pod in pods: | ||
try: | ||
pod_info = cli.read_namespaced_pod_status(pod, namespace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a separate function and call it to provide the given pod status? This way we can reuse it in other places as well. Thoughts?
cerberus/kubernetes/client.py
Outdated
@@ -80,6 +82,41 @@ def check_sdn_namespace(): | |||
please specify the correct networking namespace in config file") | |||
|
|||
|
|||
def namespace_sleep_track(namespace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment to describe the functionality of this function.
e664d2b
to
f22f53f
Compare
I have made all the above changes. PTAL @mffiedler @chaitanyaenr |
This commit enables the pod crash/restart to be tracked during the wait time between each iteration. This prevents the Cerberus from missing pod crash/restarts when the pod enters the running phase before the next iteration.
f22f53f
to
456aff2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Nice job @yashashreesuresh. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/LGTM
This commit enables the pod crash/restart to be tracked during the wait time between each iteration. This prevents the Cerberus from missing pod crash/restarts when the pod enters the running phase before the next iteration.
Fixes: #51