
add check for scheduled pods #4728

Conversation

juanvallejo
Contributor

Adds a check that ensures pods have been scheduled (and that containerStatuses exist) before attempting to retrieve pod container statuses.

Addresses BZ: 1468760

cc @sosiouxme @rhcarvalho
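
For context, a minimal sketch of the kind of guard this description refers to, assuming the ensure_scheduled_pods name and the project's OpenShiftCheckException from the diff under review; the body below is an illustration, not the exact PR code (in the PR it is a method on the check class):

# Sketch only: illustrates the "pods have been scheduled" guard described above.
# The helper name and exception type come from the diff in this review; the body
# is an assumption, not the merged code.
from openshift_checks import OpenShiftCheckException  # import path as used by the health checks

def ensure_scheduled_pods(pods):
    """Fail early if no pod reports containerStatuses yet (i.e. none scheduled)."""
    if not any(pod.get('status', {}).get('containerStatuses') for pod in pods):
        raise OpenShiftCheckException(
            'No pods have been scheduled; cannot retrieve container statuses.'
        )

Called at the top of not_running_pods, a guard like this surfaces a clear check failure instead of an unhandled error when containerStatuses is missing.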

@juanvallejo juanvallejo force-pushed the jvallejo/handle-unscheduled-logging-stack-pods branch 2 times, most recently from d5cba42 to b42d7d1 Compare July 10, 2017 20:25
@juanvallejo
Contributor Author

aos-ci-test

@openshift-bot

error: aos-ci-jenkins/OS_3.5_containerized for b42d7d1 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.6_containerized for b42d7d1 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.5_NOT_containerized for b42d7d1 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.6_NOT_containerized for b42d7d1 (logs)

def not_running_pods(self, pods):
    """Returns: list of pods not in a ready and running state
    or an OpenShiftCheckException if no pods have been scheduled."""
    self.ensure_scheduled_pods(pods)
Member

So after thinking about this for a bit, while I like the idea of informing users about specific problems with their pods, this just does not feel like the right place to do it. And I think there could be legitimate reasons why a pod might not be working but not actually impacting anything because there's already a replacement, so I wouldn't want the check to fail on any particular pod having problems, just if there are the wrong number of pods actually in service.

WDYT?

Contributor Author

@sosiouxme

And I think there could be legitimate reasons why a pod might not be working but not actually impacting anything because there's already a replacement, so I wouldn't want the check to fail on any particular pod having problems

I think that is a valid point. I'll update this change to only fail if no pods or > 1 pods are found to be Ready.

def not_running_pods(self, pods):
    """Returns: list of pods not in a ready and running state
    or an OpenShiftCheckException if no pods have been scheduled."""
    self.ensure_scheduled_pods(pods)
    return [
        pod for pod in pods
        if any(
Member

We could just stick to the goal of filtering out pods that aren't actually contributing, and fix the check breakage like so:

if not pod.get('status', {}).get('containerStatuses') or any(
... and later ...
    for condition in pod['status'].get('conditions', [])
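Putting that suggestion together, the filter would look roughly like the sketch below (written as a standalone function for illustration; in the check itself it is a @staticmethod, and the 'Ready' condition test mirrors the existing code):

# Sketch of the suggested shape, not the merged code: pods with no
# containerStatuses yet (e.g. not scheduled) are simply treated as not running.
def not_running_pods(pods):
    """Returns: list of pods not in a ready and running state."""
    return [
        pod for pod in pods
        if not pod.get('status', {}).get('containerStatuses') or any(
            container['ready'] is False
            for container in pod['status']['containerStatuses']
        ) or not any(
            condition['type'] == 'Ready' and condition['status'] == 'True'
            for condition in pod['status'].get('conditions', [])
        )
    ]

# Hypothetical example, using pod dicts shaped like `oc get pods -o json` items:
pods = [
    {'status': {}},  # pending pod: no containerStatuses yet
    {'status': {'containerStatuses': [{'ready': True}],
                'conditions': [{'type': 'Ready', 'status': 'True'}]}},
]
assert len(not_running_pods(pods)) == 1  # only the pending pod is flagged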

@sosiouxme
Member

sosiouxme commented Jul 11, 2017 via email

@juanvallejo
Contributor Author

@sosiouxme thanks, updated check based on #4728 (comment)

@@ -51,15 +51,16 @@ def get_pods_for_component(self, execute_module, namespace, logging_component, t

     @staticmethod
     def not_running_pods(pods):
-        """Returns: list of pods not in a ready and running state"""
+        """Returns: list of pods not in a ready and running state
+        or an OpenShiftCheckException if no pods have been scheduled."""
Member

doesn't throw an exception now :)

Member

otherwise LGTM... can you squash commits?

Contributor Author

doesn't throw an exception now :)

:/ saw this as I started to make changes and knew I would somehow forget to edit it out before pushing

@juanvallejo juanvallejo force-pushed the jvallejo/handle-unscheduled-logging-stack-pods branch from 1c99a90 to 0192385 Compare July 11, 2017 19:21
@juanvallejo
Contributor Author

@sosiouxme thanks for the feedback. commits squashed

@juanvallejo
Contributor Author

aos-ci-test

@sosiouxme sosiouxme mentioned this pull request Jul 11, 2017
@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 0192385 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 0192385 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for 0192385 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for 0192385 (logs)

@sosiouxme
Member

merged in #4737

@sosiouxme sosiouxme closed this Jul 12, 2017
@juanvallejo juanvallejo deleted the jvallejo/handle-unscheduled-logging-stack-pods branch July 12, 2017 13:46