
move remaining disruption tests to invariants #28144

Merged

Conversation

deads2k (Contributor) commented Aug 7, 2023

Dropped the new apiserver testing that Vadim is working on in a separate PR.

@openshift-ci openshift-ci bot requested review from bparees and csrwng August 7, 2023 15:46
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2023
deads2k commented Aug 7, 2023

Looks to be working. ci-cluster-network-liveness still exists in the backends and would produce failed intervals, but it doesn't have an associated test because it never fails.

deads2k commented Aug 7, 2023

/retest

@@ -153,6 +152,9 @@ a running cluster.
`),

RunE: func(cmd *cobra.Command, args []string) error {
	if true {
		return fmt.Errorf("this command got nerfed")
	}
Contributor:

I have to at least ask, you really want this in here?

deads2k (Contributor, Author):

What is this command trying to do and does it make sense this way with invariant tests? Would a rewrite when we need it again make more sense?

I'm up for removing the command if you're down with it.

Contributor:

As far as removing it goes, I would be interested in checking with @dgoodwin to see what his original intent was and whether it is still of use. If it's left disabled, a comment explaining why it was disabled would suffice.

Contributor:

This command lets developers work on disruption test code and run it against a given intervals file containing disruption we're interested in testing against, getting feedback in seconds rather than hours waiting for CI to run. I would prefer to keep it operational.

With your new testing, would it be possible to run through the JUnit generation parts of your interface but skip the setup/generate-intervals portions?

deads2k (Contributor, Author):

That capability was already lost at a previous point in the refactor. Is it used frequently enough to pre-emptively recreate it, or should it just be rebuilt the next time it's needed?

I'm really not clear on which part it's trying to test. Possibilities:

  1. the code doing sample collection
  2. the code recording the sample failures
  3. the code summarizing the disruption summary json file
  4. the code rendering a timeline
  5. the code looking up historical values
  6. the code creating junit reports.

Which part is this command trying to do?

Contributor:

It is supposed to run the "should remain available" disruption tests against the historical data on disk. It assumes it's already given an intervals file with the observed disruption. In your list, #5, plus running the tests and viewing the output to see what would fail and with what values.

Deep is going to need something similar very soon for alerts.

if backendName == externalservice.LivenessProbeBackend {
	aed := allowedExternalDisruption
	return &aed, "forgiving limit for disruption to an external service", nil
}
Contributor:

IIRC this was in place so we didn't fail due to our check for whether the cluster running the tests was having connection issues.

deads2k (Contributor, Author):

The test was always ok. I've left the backend check and it reports intervals and overall disruption. Do we also need the "never fail" test?

Contributor:

Nope, I see you are dropping the tests. When I saw the change I just wanted to be sure we weren't risking failures on these checks again.


neisw commented Aug 8, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 8, 2023

openshift-ci bot commented Aug 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented Aug 8, 2023

@deads2k: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-aws-ovn-single-node-serial
Commit: 3aeb2d7
Required: false
Rerun command: /test e2e-aws-ovn-single-node-serial


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-trt-bot

Job Failure Risk Analysis for sha: 3aeb2d7

pull-ci-openshift-origin-master-e2e-aws-ovn-serial — failure risk: High
  [sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate [Suite:openshift/conformance/serial] [Suite:k8s]
  This test has passed 100.00% of 53 runs on jobs ['periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-serial'] in the last 14 days.

pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial — failure risk: Low
  [sig-storage] PersistentVolumes-local Stress with local volumes [Serial] should be able to process many pods and reuse local volumes [Suite:openshift/conformance/serial] [Suite:k8s]
  This test has passed 70.59% of 34 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-single-node-serial'] in the last 14 days.


neisw commented Aug 8, 2023

/override ci/prow/e2e-aws-ovn-serial

Failure for [sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate is not related


openshift-ci bot commented Aug 8, 2023

@neisw: Overrode contexts on behalf of neisw: ci/prow/e2e-aws-ovn-serial

In response to this:

/override ci/prow/e2e-aws-ovn-serial

Failure for [sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate is not related

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit e973bdc into openshift:master Aug 8, 2023
22 of 23 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

5 participants