STOR-1892: Add option to disable all monitors #28815
base: master
@jsafrane: This pull request references STOR-1892 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jsafrane The full list of commands accepted by this bot can be found here.
`openshift-tests run --disable-monitor all` disables all monitors, so the output and artifacts are not polluted by monitor output and contain only the ginkgo test results. This is useful when the tests are run by a human rather than in CI.
When no in-cluster monitoring daemon sets run, StartInClusterMonitors() is not called and `namespace` is empty. Nothing produces /tmp/artifacts/junit/AdditionalEvents__in_cluster_disruption.json on the nodes, so do not try to collect it. This removes the following error: found errors fetching in-cluster data: [failed to list files in disruption event folder on node ip-10-0-19-0.ec2.internal: the server could not find the requested resource ]
@jsafrane: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Job Failure Risk Analysis for sha: d9a2c99
This is a reasonable line of thought but I fear it's too big of a hammer. The monitor tests contain an immense amount of testing, without which the CSI tests could be giving false positives where the tests pass but the cluster was outlandishly unwell. They check the state of cluster operators, network connectivity, watch for suspicious kube events, watch for suspicious etcd and kubelet logging, etc. Particularly given the issue appears to be with how much is logged, it seems like a better first step would be to analyze what's logging so much and see if we could adjust there.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
PR needs rebase.
`openshift-tests run --disable-monitor all` disables all monitors, so the output and artifacts are not polluted by monitors and contain only the ginkgo test results. This is useful when the tests are run by a human rather than in CI, because the monitors are very talkative on stdout. We use openshift-tests as a certification suite for CSI drivers, and 3rd party CSI driver vendors are not interested in megabytes of logs about OCP health; they're interested in the CSI driver test results.
In addition, when all monitors are disabled, do not collect `AdditionalEvents__in_cluster_disruption.json` from the nodes: with no monitoring there are no monitoring DaemonSets running, and thus there is nothing to collect from them.
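One way to picture the `--disable-monitor all` behaviour is as a filter over the registered monitors, with `all` acting as a sentinel value. The sketch below is illustrative only and assumes the flag takes a list of monitor names. The `enabledMonitors` helper and the monitor names in `main` are hypothetical and are not taken from the openshift-tests source.

```go
package main

import (
	"fmt"
	"sort"
)

// disableAll is the sentinel value that switches every monitor off.
const disableAll = "all"

// enabledMonitors returns the sorted names of the registered monitors that
// remain enabled after applying the disabled list. If the list contains the
// "all" sentinel, no monitor stays enabled and nil is returned.
func enabledMonitors(registered, disabled []string) []string {
	skip := make(map[string]bool, len(disabled))
	for _, name := range disabled {
		if name == disableAll {
			return nil
		}
		skip[name] = true
	}

	var enabled []string
	for _, name := range registered {
		if !skip[name] {
			enabled = append(enabled, name)
		}
	}
	sort.Strings(enabled)
	return enabled
}

func main() {
	// Made-up monitor names, for illustration only.
	registered := []string{"example-operator-state", "example-etcd-logs", "example-network"}

	// Disabling everything leaves an empty list.
	fmt.Println(enabledMonitors(registered, []string{"all"}))

	// Disabling one monitor by name leaves the other two.
	fmt.Println(enabledMonitors(registered, []string{"example-etcd-logs"}))
}
```

With an empty result, a caller can also skip any follow-up work that depends on monitors having run, such as collecting per-node event files.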