[DO NOT MERGE] Reproduce missing the nodeSelector issue on the PyTorchJob #1422

Closed

Conversation

tenzen-y
Member

@tenzen-y tenzen-y commented Dec 7, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR shows that #1407 is reproducible only in the E2E tests.

Here is a reproduced Job: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_kueue/1422/pull-kueue-test-e2e-main-1-27/1733124744468762624

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 7, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


netlify bot commented Dec 7, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit b664426
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/657323926eb95f0008623b74

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 7, 2023
@tenzen-y
Member Author

tenzen-y commented Dec 7, 2023

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2023
@tenzen-y tenzen-y force-pushed the missing-nodeselector-pytorchjob branch from d9cb9a7 to 54c45f1 Compare December 7, 2023 15:32
@tenzen-y
Member Author

tenzen-y commented Dec 7, 2023

This is still being implemented.

@alculquicondor
Contributor

Have you figured out why the node selectors are getting dropped?

@tenzen-y
Member Author

tenzen-y commented Dec 8, 2023

Have you figured out why the node selectors are getting dropped?

I'm still sorting out the cause, so I will create another PR.

@tenzen-y tenzen-y force-pushed the missing-nodeselector-pytorchjob branch 6 times, most recently from 2b1ba5d to 0e62f1a Compare December 8, 2023 13:31
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
@tenzen-y tenzen-y force-pushed the missing-nodeselector-pytorchjob branch from 0373371 to 30fcf25 Compare December 8, 2023 14:01
Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
@k8s-ci-robot
Contributor

@tenzen-y: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kueue-test-e2e-main-1-28 | b664426 | link | true | /test pull-kueue-test-e2e-main-1-28 |
| pull-kueue-test-e2e-main-1-27 | b664426 | link | true | /test pull-kueue-test-e2e-main-1-27 |
| pull-kueue-test-e2e-main-1-25 | b664426 | link | true | /test pull-kueue-test-e2e-main-1-25 |
| pull-kueue-test-e2e-main-1-26 | b664426 | link | true | /test pull-kueue-test-e2e-main-1-26 |
| pull-kueue-verify-main | b664426 | link | true | /test pull-kueue-verify-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@tenzen-y
Member Author

tenzen-y commented Dec 8, 2023

I found that my reproduction steps were invalid.
/close

@k8s-ci-robot
Contributor

@tenzen-y: Closed this PR.

In response to this:

I found that my reproduction steps were invalid.
/close

@tenzen-y tenzen-y deleted the missing-nodeselector-pytorchjob branch December 8, 2023 15:17
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

3 participants