
Add timeout config for node_e2e tests #83268

Merged
merged 1 commit into from Jan 29, 2021


odinuge
Member

@odinuge odinuge commented Sep 28, 2019

What type of PR is this?
/kind feature
/sig testing

What this PR does / why we need it:

Some test suites run longer than the default 45m timeout, which causes the test run to fail. Without invoking the Go runner manually, there is currently no way to set the timeout when running the tests.

Here is an example of a test suite taking ~2 hours.
[Screenshot from 2019-09-25 19-30-25]

The current 45m limit comes from here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/remote/remote.go#L34

Which issue(s) this PR fixes:

Fixes # N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. labels Sep 28, 2019
@k8s-ci-robot
Contributor

@odinuge: The label(s) area/testing cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other


@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 28, 2019
@@ -95,6 +95,7 @@ if [ "${remote}" = true ] ; then
cleanup=${CLEANUP:-"true"}
delete_instances=${DELETE_INSTANCES:-"false"}
preemptible_instances=${PREEMPTIBLE_INSTANCES:-"false"}
test_timeout=${TIMEOUT:-"45m"}
Member Author

Should we avoid hardcoding "45m" here (ref. https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/remote/remote.go#L34) and only pass the test-timeout flag when a timeout is given? And if so, should we still write "Defaults to 45m" in the Makefile?

Contributor

Yeah, I'm +1 on avoiding hardcoding 45m; let's keep remote.go as the source of truth for the default. I think the Makefile should also point out that remote.go is the source of truth.
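A minimal sketch of the approach agreed on above, assuming a Bash script in the style of the diff: the `--test-timeout` flag is emitted only when `TIMEOUT` is non-empty, so the 45m default is defined only in remote.go. The helper name `timeout_flag` is hypothetical and not taken from the merged change.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical helper (not from the merged change): print the timeout flag only
# when TIMEOUT is set and non-empty. Otherwise print nothing, so the Go flag
# default in test/e2e_node/remote/remote.go stays the single source of truth.
timeout_flag() {
  local timeout="${TIMEOUT:-}"
  if [[ -n "${timeout}" ]]; then
    printf '%s\n' "--test-timeout=${timeout}"
  fi
}

# TIMEOUT=2h yields "--test-timeout=2h"; an unset or empty TIMEOUT yields "".
timeout_arg="$(timeout_flag)"
echo "timeout_arg='${timeout_arg}'"
```

With this shape, the Makefile help text can simply say that the default comes from remote.go rather than repeating the value.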

@odinuge
Member Author

odinuge commented Sep 28, 2019

/cc BenTheElder
/priority backlog

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 28, 2019
@fejta
Contributor

fejta commented Sep 29, 2019

@kubernetes/sig-node-pr-reviews
/assign @dchen1107
/uncc @fejta @BenTheElder

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Sep 29, 2019
@odinuge
Member Author

odinuge commented Oct 22, 2019

/cc @mattjmcnaughton @derekwaynecarr

Contributor

@mattjmcnaughton mattjmcnaughton left a comment


I can only see benefits from adding the ability to configure the timeout :) None of the default behaviors are changed, nor is the behavior when running via CI. We are just giving the individual dev more flexibility.

LGTM, modulo your point about not hardcoding "45m" in two places. Please ping me when you've updated the diff and I'll mark it as "lgtm".


@odinuge odinuge force-pushed the e2e_node_timeout branch 2 times, most recently from e5fcce1 to 14d386a Compare October 23, 2019 17:17
@odinuge
Member Author

odinuge commented Oct 23, 2019

/test pull-kubernetes-node-e2e-containerd

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2020
@@ -151,6 +155,7 @@ if [ "${remote}" = true ] ; then
--delete-instances="${delete_instances}" --test_args="${test_args}" --instance-metadata="${metadata}" \
--image-config-file="${image_config_file}" --system-spec-name="${system_spec_name}" \
--preemptible-instances="${preemptible_instances}" --extra-envs="${extra_envs}" --test-suite="${test_suite}" \
"${timeout_arg}" \
Member


What does this do if empty? Is an empty arg tolerated?

Member Author

Ahh, yeah, that works fine! Running with TIMEOUT=20s, with TIMEOUT="", and without TIMEOUT set all work.
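To make the empty-argument question concrete, the sketch below counts how many arguments a command receives. A quoted scalar such as `"${timeout_arg}"` always expands to exactly one word, so an empty value is passed as one empty-string argument, while an empty Bash array expands to no words at all. Whether the receiving program tolerates the empty argument depends on its parser; Go's standard `flag` package, for example, stops parsing flags at the first argument that does not start with `-`, and an empty string is such an argument. The names `count_args`, `scalar_form` and `array_form`, and the `--before`/`--after` flags, are hypothetical placeholders; the array form is an alternative pattern, not the one shown in the diff.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Stand-in for the real command: print how many arguments it received.
count_args() {
  echo "$#"
}

# Quoted scalar, as in the diff: an empty value still becomes one "" argument.
# --before/--after are placeholder flags used only to surround the timeout.
scalar_form() {
  local timeout_arg="$1"
  count_args --before "${timeout_arg}" --after
}

# Hypothetical array-based alternative: an empty array expands to no words.
# The ${arr[@]+...} guard avoids an "unbound variable" error under `set -u`
# on Bash versions older than 4.4.
array_form() {
  local timeout="$1"
  local -a timeout_args=()
  if [[ -n "${timeout}" ]]; then
    timeout_args=(--test-timeout="${timeout}")
  fi
  count_args --before ${timeout_args[@]+"${timeout_args[@]}"} --after
}

scalar_form ""     # prints 3
array_form ""      # prints 2
array_form "20s"   # prints 3
```

The array form trades a slightly noisier expansion for the guarantee that nothing at all is passed when no timeout is configured.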

Member Author

ping @liggitt

@alejandrox1
Contributor

@odinuge would you be able to continue working on this?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2020
@alejandrox1 alejandrox1 added this to PRs that need attention from the author in SIG Node CI/Test Board Aug 4, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 9, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2020
@odinuge
Member Author

odinuge commented Aug 9, 2020

/retest

@SergeyKanzhelev
Member

/retest

@alejandrox1
Contributor

Looking through owners of build and hack...
/assign @spiffxp
ptal 🙏

@alejandrox1 alejandrox1 moved this from PRs that need attention from the author to PRs - Reviewer approved in SIG Node CI/Test Board Oct 20, 2020
@SergeyKanzhelev SergeyKanzhelev moved this from PRs - Reviewer lgtm'd to PRs - Review in progress in SIG Node CI/Test Board Jan 4, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 18, 2021
SIG Node CI/Test Board automation moved this from PRs - Review in progress to PRs - Reviewer lgtm'd Jan 28, 2021
Member

@spiffxp spiffxp left a comment


/remove-lifecycle stale
/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 28, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, odinuge, spiffxp


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 28, 2021
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit d7cb340 into kubernetes:master Jan 29, 2021
SIG Node CI/Test Board automation moved this from PRs - Reviewer lgtm'd to Done Jan 29, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 29, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
Archived in project