STOR-1892: Add option to disable all monitors #28815

jsafrane · 2024-05-20T12:28:14Z

`openshift-tests run --disable-monitor all` disables all monitors, so the output and artifacts are not polluted by monitor data and contain only the ginkgo test results.

This is useful when a human runs the tests outside CI, because the monitors are very talkative on stdout. We use openshift-tests as a certification suite for CSI drivers, and 3rd party CSI driver vendors are not interested in megabytes of logs about OCP health; they are interested in the CSI driver test results.

In addition, when all monitors are disabled, do not collect AdditionalEvents__in_cluster_disruption.json from nodes: with no monitors there are no monitoring DaemonSets running, and thus nothing to collect from them.
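
For illustration, here is a minimal sketch of the flag semantics described above. It is written in Python for brevity and is not the code from this PR (openshift-tests itself is written in Go). The `select_monitors` helper and the monitor names are hypothetical; only the special value `all` comes from the description.

```python
def select_monitors(registered, disabled):
    """Return the monitor names that stay enabled.

    registered: every known monitor name (hypothetical names below).
    disabled:   values passed via --disable-monitor; the special
                value "all" switches every monitor off.
    """
    if "all" in disabled:
        # No monitors at all: only the ginkgo test results remain.
        return []
    skip = set(disabled)
    return [name for name in registered if name not in skip]


# Hypothetical monitor names, for illustration only.
registered = ["monitor-a", "monitor-b", "monitor-c"]
print(select_monitors(registered, ["monitor-b"]))  # ['monitor-a', 'monitor-c']
print(select_monitors(registered, ["all"]))        # []
```

Treating `all` as a reserved value keeps the existing per-monitor behavior unchanged, assuming no real monitor is named `all`.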

openshift-ci-robot · 2024-05-20T12:28:34Z

@jsafrane: This pull request references STOR-1892 which is a valid jira issue.

openshift-ci · 2024-05-20T12:32:36Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jsafrane
Once this PR has been reviewed and has the lgtm label, please assign soltysh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

`openshift-tests run --disable-monitor all` disables all monitors, so the output and artifacts are not polluted by monitor data and contain only the ginkgo test results. This is useful when a human runs the tests outside CI.

When no in-cluster monitoring daemon sets run, StartInClusterMonitors() is not called and `namespace` is empty. In that case nothing produces /tmp/artifacts/junit/AdditionalEvents__in_cluster_disruption.json on the nodes, so do not try to collect it. This removes the following error: found errors fetching in-cluster data: [failed to list files in disruption event folder on node ip-10-0-19-0.ec2.internal: the server could not find the requested resource ]
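
The guard described above can be sketched roughly as follows. This is a Python illustration, not the Go code from the commit. The `collect_in_cluster_events` function and the `fetch` callable are hypothetical stand-ins; only the empty `namespace` condition and the error wording are taken from the commit message.

```python
def collect_in_cluster_events(namespace, nodes, fetch):
    """Collect in-cluster disruption events from every node.

    namespace: namespace of the monitoring daemon sets; it stays empty
               when the in-cluster monitors were never started.
    fetch:     hypothetical callable (namespace, node) -> list of events.
    Returns a tuple (events, errors).
    """
    if not namespace:
        # No monitors ran, so no node has an event file: skip collection
        # instead of reporting a fetch error for every node.
        return [], []

    events, errors = [], []
    for node in nodes:
        try:
            events.extend(fetch(namespace, node))
        except Exception as err:
            errors.append(
                f"failed to list files in disruption event folder on node {node}: {err}"
            )
    return events, errors
```

With an empty `namespace` the function returns before touching any node, which is why the error quoted above would no longer appear in this sketch.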

openshift-ci · 2024-05-20T17:48:04Z

@jsafrane: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-single-node | `d9a2c99` | link | false | `/test e2e-aws-ovn-single-node` |
| ci/prow/e2e-aws-ovn-single-node-upgrade | `d9a2c99` | link | false | `/test e2e-aws-ovn-single-node-upgrade` |
| ci/prow/e2e-aws-ovn-single-node-serial | `d9a2c99` | link | false | `/test e2e-aws-ovn-single-node-serial` |

openshift-trt-bot · 2024-05-20T18:07:17Z

Job Failure Risk Analysis for sha: d9a2c99

| Job Name | Failure Risk |
| --- | --- |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | IncompleteTests Tests for this run (99) are below the historical average (2301): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial | IncompleteTests Tests for this run (95) are below the historical average (820): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node | IncompleteTests Tests for this run (98) are below the historical average (1745): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |

openshift-ci-robot · 2024-06-03T10:34:05Z

@jsafrane: This pull request references STOR-1892 which is a valid jira issue.

openshift-ci-robot · 2024-06-03T10:36:16Z

@jsafrane: This pull request references STOR-1892 which is a valid jira issue.

dgoodwin · 2024-06-03T11:16:40Z

This is a reasonable line of thought but I fear it's too big of a hammer. The monitor tests contain an immense amount of testing, without which the CSI tests could be giving false positives where the tests pass but the cluster was outlandishly unwell. They check the state of cluster operators, network connectivity, watch for suspicious kube events, watch for suspicious etcd and kubelet logging, etc.

Particularly given the issue appears to be with how much is logged, it seems like a better first step would be to analyze what's logging so much and see if we could adjust there.

openshift-bot · 2024-09-02T01:00:38Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-merge-robot · 2024-09-02T01:00:48Z

PR needs rebase.

jsafrane changed the title ~~Add option to disable all monitors~~ STOR-1892: Add option to disable all monitors May 20, 2024

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 20, 2024

openshift-ci bot requested review from deads2k and sjenning May 20, 2024 12:32

jsafrane added 2 commits May 20, 2024 16:47

Add option to disable all monitors

4683436

jsafrane force-pushed the monitor-regexp branch from 01fb682 to d9a2c99 Compare May 20, 2024 14:48

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2024

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 2, 2024
