USHIFT-1389: Enhance greenboot check script to print cluster debugging information on failures #2150

ggiguash · 2023-07-31T15:59:52Z

openshift-ci-robot · 2023-07-31T15:59:55Z

@ggiguash: This pull request references USHIFT-1389 which is a valid jira issue.

In response to this:

Closes USHIFT-1389

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ggiguash · 2023-07-31T17:32:21Z

/assign @pmtk @dhellmann

dhellmann · 2023-07-31T19:28:27Z

packaging/greenboot/microshift-running-check.sh

+        for srv in microshift.service microshift-etcd.service ; do
+            log_failure_cmd "${srv}" "journalctl -xu ${srv} -n 1000 --no-pager"
+        done
+        # Always log the list of pods


Could we log this sort of information closer to the place where it is being checked? For example, "Expected N pods in $namespace, found M: $(oc get pods -n $namespace)"
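A minimal sketch of what reporting at the point of the check could look like. The names `list_pods` and `check_pod_count` and the exact message wording are hypothetical and not taken from the script; `list_pods` is kept as a separate function only so the `oc` call can be stubbed out.

```shell
#!/bin/bash
# Hypothetical wrapper around the pod listing so it can be stubbed in tests.
list_pods() {
    local namespace=$1
    oc get pods -n "${namespace}" --no-headers 2>/dev/null
}

# Compare the number of listed pods with the expected count and, on a
# mismatch, print the listing right next to the error message.
check_pod_count() {
    local namespace=$1
    local expected=$2
    local pods found

    pods=$(list_pods "${namespace}")
    # grep -c counts non-empty lines; it prints 0 (and returns 1) when there are none
    found=$(grep -c . <<<"${pods}" || true)

    if [ "${found}" -ne "${expected}" ]; then
        echo "Expected ${expected} pods in ${namespace}, found ${found}:"
        echo "${pods}"
        return 1
    fi
    return 0
}
```

Called as `check_pod_count openshift-foo 5`, the function returns non-zero on a mismatch, so a surrounding wait loop can keep retrying and still leave the last listing in the log.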

ggiguash · Aug 1, 2023

That is difficult because the checks run in loops and in the background, and the script waits for their status reports.
I think, however, that producing a pod list (and events?) before exit helps to identify the problem on the spot.

So there will be a message that the pod restart check failed, followed by a pod list snapshot taken about 1s after that error.
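The "snapshot before exit" idea can be sketched with an `EXIT` trap. This is only an illustration under assumptions: the `SNAPSHOT_CMDS` list and the function names `run_snapshot_cmd` and `snapshot_on_failure` are hypothetical, and the script in this PR may collect the information differently.

```shell
#!/bin/bash
# Hypothetical list of diagnostic commands to run when the check fails.
SNAPSHOT_CMDS=(
    "oc get pods -A -o wide"
    "oc get events -A --sort-by=.metadata.creationTimestamp"
)

# Run one command and print its output under a header, never failing itself.
run_snapshot_cmd() {
    local cmd=$1
    echo "--- ${cmd}"
    bash -c "${cmd}" 2>&1 || echo "(command failed: ${cmd})"
}

# EXIT handler: dump the snapshot only when the script exits with an error.
snapshot_on_failure() {
    local rc=$?
    local cmd
    if [ "${rc}" -ne 0 ]; then
        echo "Check failed with exit code ${rc}, collecting cluster state"
        for cmd in "${SNAPSHOT_CMDS[@]}"; do
            run_snapshot_cmd "${cmd}"
        done
    fi
    return "${rc}"
}

# Install the handler; it runs once, when the script exits.
trap 'snapshot_on_failure' EXIT
```

Because the handler runs after all background checks have reported, the snapshot lags the first error slightly, which is the trade-off described above.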

dhellmann · Aug 1, 2023

How is a user supposed to know which pods should be present so they can compare that to the list of pods that are present?

ggiguash · Aug 1, 2023

I guess that requires some internal knowledge of MicroShift. However, it's "easy" to see from the pod count if something has not started.
I'm not saying it's ideal, so I'm open to suggestions on how to improve this.

dhellmann · Aug 1, 2023

In the loops where the script looks for pods in specific namespaces, it knows when it does not find them. It should report whatever details it can at that point. Messages like "Expected 5 ready pods in openshift-foo but found 4" point directly to issues within the namespace. "Deployment embedded-component has no ready pods" is even better because it points directly to the thing that is broken.
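As a rough illustration of the deployment-level message, the sketch below reads `<deployment> <ready-replicas>` pairs on stdin and names each deployment without ready pods. The function name `report_unready_deployments` and the input format are assumptions made for this example, not part of the script; such input could be produced, for instance, with an `oc get deployments -o jsonpath=...` query.

```shell
#!/bin/bash
# Read "<deployment> <ready-replicas>" lines on stdin and report every
# deployment in the given namespace that has no ready pods.
# Returns non-zero if at least one such deployment was found.
report_unready_deployments() {
    local namespace=$1
    local name ready
    local rc=0

    while read -r name ready; do
        # Skip blank lines
        [ -z "${name}" ] && continue
        # A missing count is treated as zero ready replicas
        if [ "${ready:-0}" -eq 0 ]; then
            echo "Deployment ${name} in ${namespace} has no ready pods"
            rc=1
        fi
    done
    return "${rc}"
}
```

A message of this kind names the broken component directly, so the reader does not need to know how many pods a healthy namespace should contain.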

ggiguash · Aug 2, 2023

I added error messages in all the loops. Is this any better?

dhellmann · Aug 2, 2023

/lgtm

openshift-ci-robot · 2023-08-02T21:29:13Z

/retest-required

Remaining retests: 0 against base HEAD d0a6112 and 2 for PR HEAD a61c625 in total

openshift-ci-robot · 2023-08-02T23:29:10Z

/retest-required

Remaining retests: 0 against base HEAD 54c213a and 1 for PR HEAD a61c625 in total

openshift-ci-robot · 2023-08-03T07:29:18Z

/retest-required

Remaining retests: 0 against base HEAD c96eb62 and 0 for PR HEAD a61c625 in total

ggiguash · 2023-08-03T07:38:05Z

I reverted the "optimization" of the service status check here.
It looks like it was introducing a race condition.

pmtk · 2023-08-03T12:41:48Z

/lgtm

openshift-ci · 2023-08-03T12:44:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann, ggiguash, pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dhellmann,ggiguash,pmtk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2023-08-03T13:47:31Z

/retest-required

Remaining retests: 0 against base HEAD c96eb62 and 2 for PR HEAD a1e7213 in total

openshift-ci-robot · 2023-08-03T14:49:57Z

/retest-required

Remaining retests: 0 against base HEAD f7cb029 and 1 for PR HEAD a1e7213 in total

ggiguash · 2023-08-03T15:52:43Z

The e2e-openshift-conformance-reduced test is not related to this change.
/override ci/prow/e2e-openshift-conformance-reduced

openshift-ci · 2023-08-03T15:57:13Z

@ggiguash: Overrode contexts on behalf of ggiguash: ci/prow/e2e-openshift-conformance-reduced

openshift-ci-robot · 2023-08-03T18:49:45Z

/retest-required

Remaining retests: 0 against base HEAD 32f9818 and 0 for PR HEAD a1e7213 in total

ggiguash · 2023-08-03T19:28:44Z

/override ci/prow/e2e-openshift-conformance-reduced

openshift-ci · 2023-08-03T19:32:31Z

@ggiguash: Overrode contexts on behalf of ggiguash: ci/prow/e2e-openshift-conformance-reduced

openshift-ci · 2023-08-03T19:36:26Z

@ggiguash: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/microshift-e2e-arm | `a1e7213` | link | false | `/test microshift-e2e-arm` |
| ci/prow/e2e-openshift-conformance-reduced-arm | `a1e7213` | link | false | `/test e2e-openshift-conformance-reduced-arm` |

openshift-ci-robot added the jira/valid-reference label Jul 31, 2023

openshift-ci bot requested review from jogeo and pacevedom July 31, 2023 16:05

openshift-ci bot added the approved label Jul 31, 2023

openshift-ci bot assigned dhellmann and pmtk Jul 31, 2023

dhellmann reviewed Jul 31, 2023

pmtk reviewed Aug 1, 2023

ggiguash requested review from dhellmann and pmtk August 1, 2023 12:59

ggiguash force-pushed the greenboot_error_logging branch from df7d3ef to a61c625 Compare August 2, 2023 15:28

dhellmann reviewed Aug 2, 2023

openshift-ci bot added the lgtm label Aug 2, 2023

ggiguash force-pushed the greenboot_error_logging branch from a61c625 to f6fe532 Compare August 3, 2023 07:32

openshift-ci bot removed the lgtm label Aug 3, 2023

Commit a1e7213: Enhance greenboot check script to print cluster debugging information on failures

ggiguash force-pushed the greenboot_error_logging branch from f6fe532 to a1e7213 Compare August 3, 2023 12:37

openshift-ci bot added the lgtm label Aug 3, 2023

openshift-merge-robot merged commit d0c1a88 into openshift:main Aug 3, 2023
8 of 10 checks passed

ggiguash deleted the greenboot_error_logging branch August 4, 2023 08:13
