
Added support to track pod crash/restart during sleep #52

Merged: 1 commit, May 15, 2020


@yashashreesuresh (Contributor) commented May 13, 2020

This commit enables pod crashes/restarts to be tracked during the wait time between iterations. This prevents Cerberus from missing a pod crash/restart when the pod is back in the Running phase before the next iteration.
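A minimal sketch of the idea, assuming the wait between iterations is split into shorter polling intervals and that a caller-supplied `check` function (hypothetical, standing in for whatever per-namespace pod comparison Cerberus runs) returns the names of pods that crashed or restarted since the last check. `sleep_with_tracking` and its parameters are illustrative names, not part of Cerberus:

```python
import time
from typing import Callable, List


def sleep_with_tracking(total_seconds: float,
                        poll_seconds: float,
                        check: Callable[[], List[str]],
                        sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Sleep for total_seconds, calling check() after every poll_seconds.

    Hypothetical helper: splitting one long sleep into short intervals means a
    pod that crashes and is back in Running before the next iteration is
    still reported. Returns unique pod names in the order first reported.
    """
    if poll_seconds <= 0:
        raise ValueError("poll_seconds must be positive")
    seen: List[str] = []
    remaining = total_seconds
    while remaining > 0:
        step = min(poll_seconds, remaining)
        sleep(step)  # wait one polling interval
        remaining -= step
        for pod in check():  # pods that crashed/restarted since the last check
            if pod not in seen:
                seen.append(pod)
    return seen
```

Polling more often narrows the window in which a quick crash-and-recover goes unseen, at the cost of more API calls per iteration.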

sleep_tracker1

Fixes: #51

@yashashreesuresh (Contributor, Author):

Please have a look @chaitanyaenr @mffiedler

@rht-perf-ci:
Can one of the admins verify this patch?

```diff
@@ -165,6 +167,10 @@ def main(cfg):
                watch_namespaces_status, failed_nodes,
                failed_pods_components)

        if iteration != 1 and crashed_restarted_pods:
            logging.info("Pods that were crashed/restarted during the sleep: %s"
```
Collaborator:

Suggestion: "Pods that crashed/restarted during iteration %s : %s" % (iteration, crashed_restarted_pods)

```python
            pods_tracker[pod]["creation_timestamp"] = pod_creation_timestamp
            pods_tracker[pod]["restart_count"] = pod_restart_count
        else:
            crashed_restarted_pods.append(pod)
```
Collaborator:

The list of crashed/restarted pods currently contains only the pod names. A possible improvement is to append <pod_namespace>:<pod_name> to the list instead, to make it clear where each pod is. This is minor, though; I am fine with moving forward as it is now.
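A minimal sketch combining both review suggestions, assuming crashed pods are collected as hypothetical `(namespace, pod_name)` tuples rather than bare names; `crashed_pods_message` is an illustrative helper, not code from this PR:

```python
from typing import List, Tuple


def crashed_pods_message(iteration: int, crashed: List[Tuple[str, str]]) -> str:
    """Build the log line for pods that crashed/restarted in an iteration."""
    # Qualify each pod with its namespace so pods with the same name in
    # different namespaces can be told apart, e.g. "openshift-etcd:etcd-0".
    qualified = ["%s:%s" % (namespace, pod) for namespace, pod in crashed]
    # Wording follows the reviewer's suggested message format.
    return "Pods that crashed/restarted during iteration %s : %s" % (iteration, qualified)
```

The returned string could then be passed to `logging.info(...)` wherever the crashed/restarted pods are reported.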

```diff
@@ -165,6 +167,10 @@ def main(cfg):
                watch_namespaces_status, failed_nodes,
                failed_pods_components)

        if iteration != 1 and crashed_restarted_pods:
            logging.info("Pods that were crashed/restarted during the sleep: %s"
```
Collaborator:

Suggestion: "Pods that were crashed/restarted during iteration %s : %s" % (iteration, crashed_restarted_pods)

@mffiedler (Collaborator):

Not sure why the comment was duplicated. In any case, my comments are nitpicks. I tested this and it looks good to me.

@chaitanyaenr (Collaborator) left a comment:

Minor nits.


```diff
 # Load kubeconfig and initialize kubernetes python client
 def initialize_clients(kubeconfig_path):
-    global cli
+    global cli, pods_tracker
```
Collaborator:

Maybe initialize pods_tracker outside this function, since this function is meant for initializing the client?
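A sketch of the suggested split, assuming module-level state named as in the diff (`cli`, `pods_tracker`). `config.load_kube_config` and `client.CoreV1Api` are real Kubernetes Python client calls; `reset_pods_tracker` is a hypothetical helper included only to show that tracker state can now be managed independently of client setup:

```python
from typing import Any, Dict

# Module-level state, initialized once at import time rather than inside
# initialize_clients(), which now only sets up the API client.
pods_tracker: Dict[str, Dict[str, Any]] = {}
cli = None


def initialize_clients(kubeconfig_path: str) -> None:
    """Load kubeconfig and initialize the Kubernetes Python client."""
    global cli
    # Imported lazily so this sketch loads without the kubernetes package.
    from kubernetes import client, config
    config.load_kube_config(kubeconfig_path)
    cli = client.CoreV1Api()


def reset_pods_tracker() -> None:
    """Hypothetical helper: clear tracked pod state without touching cli."""
    pods_tracker.clear()
```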

```python
    crashed_restarted_pods = []
    for pod in pods:
        try:
            pod_info = cli.read_namespaced_pod_status(pod, namespace,
```
Collaborator:

Could we make this a separate function that returns the status of a given pod? That way we can reuse it in other places as well. Thoughts?
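One way the suggested helper could look, as a sketch: `get_pod_status` is a hypothetical name, and summing `restart_count` over all containers is an assumption (the PR may read a single container). It relies only on the Kubernetes Python client's `CoreV1Api.read_namespaced_pod_status(name, namespace)`, which returns a `V1Pod`:

```python
from typing import Any, Dict, Optional


def get_pod_status(cli: Any, pod: str, namespace: str) -> Optional[Dict[str, Any]]:
    """Return the creation timestamp and total restart count of a pod.

    `cli` is expected to be a kubernetes.client.CoreV1Api instance.
    Returns None if the pod status cannot be read.
    """
    try:
        pod_info = cli.read_namespaced_pod_status(pod, namespace)
    except Exception:
        # Sketch simplification: real code would likely catch
        # kubernetes.client.rest.ApiException instead of every Exception.
        return None
    # container_statuses can be None while a pod is still being scheduled.
    statuses = pod_info.status.container_statuses or []
    return {
        "creation_timestamp": pod_info.metadata.creation_timestamp,
        "restart_count": sum(cs.restart_count for cs in statuses),
    }
```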

```diff
@@ -80,6 +82,41 @@ def check_sdn_namespace():
                      please specify the correct networking namespace in config file")


def namespace_sleep_track(namespace):
```
Collaborator:

Let's add a comment describing what this function does.
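A sketch of the kind of descriptive comment requested, attached to a self-contained, hypothetical version of the comparison logic (not the PR's actual `namespace_sleep_track()`). It assumes pods are keyed by name and that a snapshot maps each name to `(creation_timestamp, restart_count)`:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical snapshot shape: pod name -> (creation_timestamp, restart_count)
Snapshot = Dict[str, Tuple[str, int]]


def track_pod_changes(tracker: Dict[str, Dict[str, Any]],
                      snapshot: Snapshot) -> List[str]:
    """Compare a fresh snapshot of pods against the tracker.

    A pod counts as crashed/restarted if it was already tracked and either
    its creation timestamp changed (the pod was recreated) or its restart
    count increased. The tracker is updated in place with the new values,
    and the names of the crashed/restarted pods are returned.
    """
    crashed_restarted_pods = []
    for pod, (creation_timestamp, restart_count) in snapshot.items():
        previous = tracker.get(pod)
        if previous is not None and (
                previous["creation_timestamp"] != creation_timestamp
                or restart_count > previous["restart_count"]):
            crashed_restarted_pods.append(pod)
        # Record the latest values so the next comparison is incremental.
        tracker[pod] = {"creation_timestamp": creation_timestamp,
                        "restart_count": restart_count}
    return crashed_restarted_pods
```

Because pods are keyed by name here, a pod recreated under a new name (as with most Deployment-managed pods) would show up as a new entry rather than as a restart.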

@yashashreesuresh (Contributor, Author) commented May 15, 2020:

I have made all the above changes. PTAL @mffiedler @chaitanyaenr
Commit: sleep_tracker1

This commit enables the pod crash/restart to be tracked during the
wait time between each iteration. This prevents the Cerberus from
missing pod crash/restarts when the pod enters the running phase
before the next iteration.
@chaitanyaenr (Collaborator) left a comment:

LGTM

@chaitanyaenr (Collaborator):

Nice job @yashashreesuresh.

@mffiedler (Collaborator) left a comment:

/LGTM

@chaitanyaenr merged commit e749679 into krkn-chaos:master on May 15, 2020.
@yashashreesuresh deleted the sleep_tracker branch on May 17, 2020 at 13:48.
Linked issue (may be closed by merging this pull request): Track Kube/OpenShift object restarts
4 participants