move remaining disruption tests to invariants #28144
Conversation
looks to be working.
/retest
@@ -153,6 +152,9 @@ a running cluster.
 		`),
 		RunE: func(cmd *cobra.Command, args []string) error {
+			if true {
+				return fmt.Errorf("this command got nerfed")
+			}
I have to at least ask: do you really want this in here?
What is this command trying to do, and does it make sense this way with invariant tests? Would rewriting it when we need it again make more sense?
I'm up for removing the command if you're down with it.
As far as removing it goes, I would be interested in checking with @dgoodwin to see what his original intent was and whether it is still of use. If we leave it disabled, a comment explaining why it was disabled would suffice.
This command lets developers work on disruption test code and run it against a given intervals file containing the disruption we're interested in testing against, getting feedback in seconds rather than waiting hours for CI to run. I would prefer to keep it operational.
With your new testing, would it be possible to run through the junit generation parts of your interface, but skip the setup/generate-intervals portions?
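For readers unfamiliar with the workflow being described, here is a minimal sketch of the kind of fast local loop the command gives developers: read an intervals file from disk and total up the observed disruption per backend, instead of waiting for a full CI run. This is not origin's actual API; the `DisruptionInterval` type and the JSON layout are illustrative assumptions only.

```go
// Hypothetical sketch of a local disruption-dev loop; not origin's real code.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// DisruptionInterval is an assumed, simplified stand-in for one observed
// outage window recorded in an intervals file.
type DisruptionInterval struct {
	Backend string    `json:"backend"`
	From    time.Time `json:"from"`
	To      time.Time `json:"to"`
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: disruption-dev <intervals.json>")
		os.Exit(1)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var intervals []DisruptionInterval
	if err := json.Unmarshal(data, &intervals); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Sum observed disruption per backend so a developer can eyeball the
	// numbers the "should remain available" tests would judge.
	totals := map[string]time.Duration{}
	for _, in := range intervals {
		totals[in.Backend] += in.To.Sub(in.From)
	}
	for backend, total := range totals {
		fmt.Printf("%s: %s of observed disruption\n", backend, total.Round(time.Second))
	}
}
```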
That capability was already lost at a previous point in the refactor. Is it used frequently enough to pre-emptively recreate it, or should it just be rebuilt the next time it's needed?
I'm really not clear on which part it's trying to test. Possibilities:
1. the code doing sample collection
2. the code recording the sample failures
3. the code summarizing the disruption summary json file
4. the code rendering a timeline
5. the code looking up historical values
6. the code creating junit reports
Which of these parts is this command trying to do?
It is supposed to run the "should remain available" disruption tests against the historical data on disk. It assumes it's already given an intervals file with the observed disruption. In your list, #5, plus running the tests and viewing the output to see what would fail and with what values.
Deep is going to need something similar very soon for alerts.
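As a rough illustration of what that check does conceptually, here is a hedged sketch, not the real implementation: compare the observed disruption for a backend against an allowance derived from historical data on disk (item #5 above) and report a pass/fail message like the junit output would. All names and the 15-second allowance are illustrative assumptions.

```go
// Conceptual sketch of comparing observed disruption to a historical allowance.
package main

import (
	"fmt"
	"time"
)

// historicalAllowance stands in for per-backend limits derived from
// historical data on disk; the value here is made up for illustration.
var historicalAllowance = map[string]time.Duration{
	"kube-api-new-connections": 15 * time.Second,
}

// checkDisruption returns a message and whether the check passed, failing
// only when observed disruption exceeds the historical allowance.
func checkDisruption(backend string, observed time.Duration) (string, bool) {
	allowed, ok := historicalAllowance[backend]
	if !ok {
		return fmt.Sprintf("no historical data for %s, skipping", backend), true
	}
	if observed > allowed {
		return fmt.Sprintf("%s was unreachable for %s, longer than the allowed %s", backend, observed, allowed), false
	}
	return fmt.Sprintf("%s was unreachable for %s, within the allowed %s", backend, observed, allowed), true
}

func main() {
	msg, passed := checkDisruption("kube-api-new-connections", 22*time.Second)
	fmt.Printf("%s (passed=%v)\n", msg, passed)
}
```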
// Use a forgiving limit for the external-service liveness-probe backend so a
// connection problem on the cluster running the tests doesn't fail the run.
if backendName == externalservice.LivenessProbeBackend {
	aed := allowedExternalDisruption
	return &aed, "forgiving limit for disruption to an external service", nil
}
IIRC this was in place so we didn't fail due to our check of whether the cluster running the tests was having connection issues.
The test was always ok. I've left the backend check and it reports intervals and overall disruption. Do we also need the "never fail" test?
Nope, I see you are dropping the tests out. When I saw the change I just wanted to be sure we weren't risking failures on these checks again.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: deads2k, neisw
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@deads2k: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Job Failure Risk Analysis for sha: 3aeb2d7
/override ci/prow/e2e-aws-ovn-serial
@neisw: Overrode contexts on behalf of neisw: ci/prow/e2e-aws-ovn-serial
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Merged commit e973bdc into openshift:master
dropped new apiserver testing that Vadim is working on in a separate PR.