Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

install: increase timeout to 40 minutes #4181

Merged
merged 1 commit into from Sep 16, 2020

Conversation

deads2k
Copy link
Contributor

@deads2k deads2k commented Sep 16, 2020

We're seeing a significant uptick in install failures. Many appear to actually finish successfully before
artifact collection. We have good signal in sippy about successful install ratios. We can increase
this timeout by 10 minutes to see if we get an improved install percentage and then continue from there.

We're seeing a significant uptick in install failures.  Many appear to actually finish successfully before
artifact collection.  We have good signal in sippy about successful install ratios.  We can increase
this timeout by 10 minutes to see if we get an improved install percentage and then continue from there.
@sdodson
Copy link
Member

sdodson commented Sep 16, 2020

/approve
/lgtm
/hold for feedback from @abhinavdahiya though

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 16, 2020
@sdodson
Copy link
Member

sdodson commented Sep 16, 2020

/hold cancel
I guess since there's already agreement to revert if this doesn't yield positive results there's no harm in getting this in now so we can begin measuring the impact.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2020
@sdodson sdodson added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Sep 16, 2020
@abhinavdahiya
Copy link
Contributor

/hold

I don't think it is appropriate to increase the timeout from 30 to 40 mins. Moving past the 30 min deadline is a regression in my opinion and moving the needle doesn't help figuring out which operator is regressing.
We should instead update our CI to

catch create cluster timeour
capture all the operators that were not done
run wait-for install-complete to allow success
report all the operator from the list above to e2e as failure so that we can open bugs.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2020
@sdodson
Copy link
Member

sdodson commented Sep 16, 2020

Do we have that timing data from historic releases to compare per operator timings in a meaningful manner? To me this seems like a pragmatic approach to debug but agree this should only be done with the goal of reverting this after we've gathered data.

@deads2k
Copy link
Contributor Author

deads2k commented Sep 16, 2020

/retest

@abhinavdahiya
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2020
@openshift-ci-robot
Copy link
Contributor

@deads2k: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-fips 9348f3a link /test e2e-aws-fips
ci/prow/e2e-crc 9348f3a link /test e2e-crc
ci/prow/e2e-openstack 9348f3a link /test e2e-openstack
ci/prow/e2e-aws 9348f3a link /test e2e-aws
ci/prow/e2e-aws-workers-rhel7 9348f3a link /test e2e-aws-workers-rhel7
ci/prow/e2e-ovirt 9348f3a link /test e2e-ovirt

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@deads2k
Copy link
Contributor Author

deads2k commented Sep 16, 2020

CI is having general difficulties on gcp quota, golang 1.15 changes, ART shas missing, distgit down. Force merging so we'll get it in the next good payload. Some installs succeeded. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4181/pull-ci-openshift-installer-master-e2e-aws/1306247987332124672 for instance

@deads2k deads2k merged commit b894e4c into openshift:master Sep 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants