Add a new neverTerminate job behavior just for upgrade #122643

Merged: 2 commits, Jan 8, 2024

@soltysh soltysh commented Jan 8, 2024

What type of PR is this?

/kind failing-test
/kind regression

What this PR does / why we need it:

In #121538, the command for the notTerminate job behavior was changed from sleep 1000000 to a pause image. The problem is that this behavior is also used in the upgrade job to ensure the job keeps running during an upgrade, especially when nodes are being rolled over (i.e. terminated) and the job controller has to keep the job running the entire time.

Since the goal of that other PR was to optimize execution, I'm adding a new neverTerminate behavior that will be used only during upgrades (with a comment explaining why it is needed).
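
As a rough illustration of the split described above, here is a minimal sketch of selecting a container command by behavior name. The `container` type, the `applyBehavior` helper, and the `pause` and `busybox` image names are hypothetical placeholders, and the exact shape of notTerminate after #121538 is an assumption; the actual change lives in the e2e job fixtures and works on Kubernetes API types.

```go
package main

import "fmt"

// container is a simplified stand-in for a pod container spec; the real
// e2e helper operates on Kubernetes API types instead.
type container struct {
	Image   string
	Command []string
}

// applyBehavior is a hypothetical helper that configures c for a named job
// behavior. Only the two behaviors discussed in this PR are modeled.
func applyBehavior(c *container, behavior string) error {
	switch behavior {
	case "notTerminate":
		// Assumed shape after #121538: use a pause image (placeholder
		// name) and set no explicit command.
		c.Image = "pause"
		c.Command = nil
	case "neverTerminate":
		// Upgrade-only behavior: keep the long sleep so the job keeps
		// running while nodes are rolled over.
		c.Command = []string{"sleep", "1000000"}
	default:
		return fmt.Errorf("unknown behavior %q", behavior)
	}
	return nil
}

func main() {
	for _, b := range []string{"notTerminate", "neverTerminate"} {
		c := container{Image: "busybox"} // placeholder default image
		if err := applyBehavior(&c, b); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: image=%s command=%v\n", b, c.Image, c.Command)
	}
}
```

Keeping the two behaviors as separate cases means the optimization from #121538 stays in place for regular tests, while the upgrade test opts into the long-running command explicitly.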

Special notes for your reviewer:

/assign @mimowo @sairameshv

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Jan 8, 2024
@k8s-ci-robot k8s-ci-robot added kind/regression Categorizes issue or PR as related to a regression from a prior release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 8, 2024
soltysh commented Jan 8, 2024

/sig apps
/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 8, 2024
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 8, 2024
@mimowo mimowo left a comment

/lgtm

// this job is being used in an upgrade job see test/e2e/upgrades/apps/job.go
// it should never be optimized, as it always has to restart during an upgrade
// and continue running
job.Spec.Template.Spec.Containers[0].Command = []string{"sleep", "1000000"}
Contributor

non-blocking nit: we could have TerminationGracePeriodSeconds: 1 to make it faster :)

Contributor Author

Great idea, updated.

Contributor

You probably need = ptr.To[int64](1)
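
The suggestion above follows from the field being a pointer (`*int64` in the pod spec), so a bare literal cannot be assigned to it. A minimal self-contained sketch follows; `podSpec` is a simplified stand-in type and the local `to` helper only mirrors the idea behind `ptr.To`, so neither is the actual code from this PR.

```go
package main

import "fmt"

// podSpec is a simplified stand-in for the pod spec; in the Kubernetes API
// the TerminationGracePeriodSeconds field is likewise a *int64.
type podSpec struct {
	TerminationGracePeriodSeconds *int64
}

// to mirrors the idea behind ptr.To: it returns a pointer to a copy of v.
func to[T any](v T) *T {
	return &v
}

func main() {
	var spec podSpec
	// Assigning the literal 1 directly would not compile, because the
	// field expects *int64 rather than int64.
	spec.TerminationGracePeriodSeconds = to[int64](1)
	fmt.Println(*spec.TerminationGracePeriodSeconds) // prints 1
}
```

The explicit type argument matters here: without it, the untyped constant 1 would be inferred as `int`, and the resulting `*int` would not be assignable to a `*int64` field.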

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2024
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 79190a25a9c2cc7d4dd3c3c30baf96ff244c595a

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 8, 2024
@@ -70,7 +73,7 @@ func (t *JobUpgradeTest) Teardown(ctx context.Context, f *framework.Framework) {
// rely on the namespace deletion to clean up everything
}

-// ensureAllJobPodsRunning uses c to check in the Job named jobName in ns
+// ensureAllJobPodsRunning uses c to check if the Job named jobName in ns
Contributor

I know this is an old function, but how do we know the pods have already been recreated (and are running)? I'm wondering if we are missing a wait here, but even if so, this is a separate issue.

Contributor Author

This is already handled at the upgrade framework level (see https://github.com/kubernetes/kubernetes/blob/master/test/e2e/upgrades/upgrade_suite.go). The jobs here implement the upgrades.Test interface, so you only need to focus on the Setup (create the job and make sure it's running) and Test (check the job after the upgrade) steps.
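
The division of labor described above can be sketched roughly as follows. This is a simplified illustration, not the framework's code: `upgradeTest`, `jobTest`, and `runUpgrade` are hypothetical stand-ins, and the real interface methods take additional parameters such as a context and a framework handle.

```go
package main

import "fmt"

// upgradeTest is a simplified stand-in for the upgrades.Test interface.
type upgradeTest interface {
	Setup()
	Test(done <-chan struct{})
	Teardown()
}

// jobTest records the order in which the harness calls its methods.
type jobTest struct{ events []string }

// Setup would create the job and make sure it is running.
func (j *jobTest) Setup() { j.events = append(j.events, "setup") }

// Test blocks until the upgrade has finished, then checks the job.
func (j *jobTest) Test(done <-chan struct{}) {
	<-done
	j.events = append(j.events, "test")
}

// Teardown would clean up; the real test relies on namespace deletion.
func (j *jobTest) Teardown() { j.events = append(j.events, "teardown") }

// runUpgrade is a hypothetical harness: it owns the sequencing, so the
// test itself never has to decide when the upgrade is complete.
func runUpgrade(t upgradeTest, upgrade func()) {
	t.Setup()
	done := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		t.Test(done)
	}()
	upgrade()   // the upgrade runs while Test waits on done
	close(done) // signal that the upgrade is complete
	<-finished  // wait for the post-upgrade check
	t.Teardown()
}

func main() {
	j := &jobTest{}
	runUpgrade(j, func() { j.events = append(j.events, "upgrade") })
	fmt.Println(j.events) // prints [setup upgrade test teardown]
}
```

Because the harness signals completion through the channel, the post-upgrade check does not need its own wait for the upgrade to finish.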

@mimowo mimowo left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2024
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 8c04d42974a6f18fd47dbef03a5ce31f4cfad7bf

@k8s-ci-robot k8s-ci-robot merged commit 4142dda into kubernetes:master Jan 8, 2024
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Jan 8, 2024
@soltysh soltysh deleted the never_terminate branch January 8, 2024 14:11