
add check for scheduled pods #4728

Conversation

juanvallejo
Contributor

Adds a check that ensures pods have been scheduled (and that containerStatuses exist) before attempting to retrieve pod container statuses.

Addresses BZ: 1468760

cc @sosiouxme @rhcarvalho
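
For context, a minimal sketch of the kind of guard this description refers to, assuming the ensure_scheduled_pods name and the project's OpenShiftCheckException from the diff under review; the body below is an illustration, not the exact PR code (in the PR it is a method on the check class):

# Sketch only: illustrates the "pods have been scheduled" guard described above.
# The helper name and exception type come from the diff in this review; the body
# is an assumption, not the merged code.
from openshift_checks import OpenShiftCheckException  # import path as used by the health checks

def ensure_scheduled_pods(pods):
    """Fail early if no pod reports containerStatuses yet (i.e. none scheduled)."""
    if not any(pod.get('status', {}).get('containerStatuses') for pod in pods):
        raise OpenShiftCheckException(
            'No pods have been scheduled; cannot retrieve container statuses.'
        )

Called at the top of not_running_pods, a guard like this surfaces a clear check failure instead of an unhandled error when containerStatuses is missing.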

@juanvallejo juanvallejo force-pushed the jvallejo/handle-unscheduled-logging-stack-pods branch 2 times, most recently from d5cba42 to b42d7d1 Compare July 10, 2017 20:25
@juanvallejo
Contributor Author

aos-ci-test

@openshift-bot

error: aos-ci-jenkins/OS_3.5_containerized for b42d7d1 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.6_containerized for b42d7d1 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.5_NOT_containerized for b42d7d1 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.6_NOT_containerized for b42d7d1 (logs)

def not_running_pods(self, pods):
    """Returns: list of pods not in a ready and running state
    or an OpenShiftCheckException if no pods have been scheduled."""
    self.ensure_scheduled_pods(pods)
Member

So after thinking about this for a bit, while I like the idea of informing users about specific problems with their pods, this just does not feel like the right place to do it. And I think there could be legitimate reasons why a pod might not be working but not actually impacting anything because there's already a replacement, so I wouldn't want the check to fail on any particular pod having problems, just if there are the wrong number of pods actually in service.

WDYT?

Contributor Author

@sosiouxme

And I think there could be legitimate reasons why a pod might not be working but not actually impacting anything because there's already a replacement, so I wouldn't want the check to fail on any particular pod having problems

I think that is a valid point. I'll update this change to only fail if no pods or > 1 pods are found to be Ready.

def not_running_pods(self, pods):
    """Returns: list of pods not in a ready and running state
    or an OpenShiftCheckException if no pods have been scheduled."""
    self.ensure_scheduled_pods(pods)
    return [
        pod for pod in pods
        if any(
Member

We could just stick to the goal of filtering out pods that aren't actually contributing, and fix the check breakage like so:

if not pod.get('status', {}).get('containerStatuses') or any(
... and later ...
    for condition in pod['status'].get('conditions', [])
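Putting that suggestion together, the filter would look roughly like the sketch below (written as a standalone function for illustration; in the check itself it is a @staticmethod, and the 'Ready' condition test mirrors the existing code):

# Sketch of the suggested shape, not the merged code: pods with no
# containerStatuses yet (e.g. not scheduled) are simply treated as not running.
def not_running_pods(pods):
    """Returns: list of pods not in a ready and running state."""
    return [
        pod for pod in pods
        if not pod.get('status', {}).get('containerStatuses') or any(
            container['ready'] is False
            for container in pod['status']['containerStatuses']
        ) or not any(
            condition['type'] == 'Ready' and condition['status'] == 'True'
            for condition in pod['status'].get('conditions', [])
        )
    ]

# Hypothetical example, using pod dicts shaped like `oc get pods -o json` items:
pods = [
    {'status': {}},  # pending pod: no containerStatuses yet
    {'status': {'containerStatuses': [{'ready': True}],
                'conditions': [{'type': 'Ready', 'status': 'True'}]}},
]
assert len(not_running_pods(pods)) == 1  # only the pending pod is flagged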

@sosiouxme
Member

sosiouxme commented Jul 11, 2017 via email

@juanvallejo
Contributor Author

@sosiouxme thanks, updated check based on #4728 (comment)

@@ -51,15 +51,16 @@ def get_pods_for_component(self, execute_module, namespace, logging_component, t

     @staticmethod
     def not_running_pods(pods):
-        """Returns: list of pods not in a ready and running state"""
+        """Returns: list of pods not in a ready and running state
+        or an OpenShiftCheckException if no pods have been scheduled."""
Member

doesn't throw an exception now :)

Member

otherwise LGTM... can you squash commits?

Contributor Author

doesn't throw an exception now :)

:/ saw this as I started to make changes and knew I would somehow forget to edit it out before pushing

@juanvallejo juanvallejo force-pushed the jvallejo/handle-unscheduled-logging-stack-pods branch from 1c99a90 to 0192385 Compare July 11, 2017 19:21
@juanvallejo
Contributor Author

@sosiouxme thanks for the feedback. commits squashed

@juanvallejo
Contributor Author

aos-ci-test

@sosiouxme sosiouxme mentioned this pull request Jul 11, 2017
@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 0192385 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 0192385 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for 0192385 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for 0192385 (logs)

@sosiouxme
Member

merged in #4737

@sosiouxme sosiouxme closed this Jul 12, 2017
@juanvallejo juanvallejo deleted the jvallejo/handle-unscheduled-logging-stack-pods branch July 12, 2017 13:46