WIP DNM testing CI #119590

Closed

Conversation

ffromani
Contributor

DNM Test CI

Signed-off-by: Francesco Romani <fromani@redhat.com>
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Jul 26, 2023
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 26, 2023
@k8s-ci-robot
Contributor

This cherry pick PR is for a release branch and has not yet been approved by Release Managers.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick, it must first be approved (/lgtm + /approve) by the relevant OWNERS.

If you didn't cherry-pick this change to all supported release branches, please leave a comment describing why other cherry-picks are not needed to speed up the review process.

If you're not sure whether it is required to cherry-pick this change to all supported release branches, please consult the cherry-pick guidelines document.

AFTER it has been approved by code owners, please leave the following comment on a line by itself, with no leading whitespace: /cc kubernetes/release-managers

(This command will request a cherry pick review from Release Managers and should work for all GitHub users, whether they are members of the Kubernetes GitHub organization or not.)

For details on the patch release process and schedule, see the Patch Releases page.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 26, 2023
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 26, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 26, 2023
@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@bart0sh bart0sh added this to WIP in SIG Node PR Triage Jul 26, 2023
ffromani added a commit to ffromani/test-infra that referenced this pull request Jul 26, 2023
The gpu jobs on the 1.27 branch (and likely earlier ones) are failing at the
startup stage (xref: kubernetes/kubernetes#119590,
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/119590/pull-kubernetes-e2e-gce-device-plugin-gpu/1684154811563380736/).

One message in the log clearly stands out:
```
I0726 11:21:54.238]       Begin Captured GinkgoWriter Output >>
I0726 11:21:54.238]         ...
I0726 11:21:54.238]         Jul 26 11:21:52.897: INFO: Nvidia GPUs not available on Node: "e2e-4a2d554360-8baae-minion-group-dhf5"
I0726 11:21:54.238]         Jul 26 11:21:52.925: INFO: Get container nvidia-driver-installer-28m4c/nvidia-driver-installer usage on node e2e-4a2d554360-8baae-minion-group-trvc. CPUUsageInCores: 1.146806814, MemoryUsageInBytes: 4332822528, MemoryWorkingSetInBytes: 621940736
I0726 11:21:54.238]         Jul 26 11:21:52.925: INFO: Get container nvidia-gpu-device-plugin-tdtrj/nvidia-gpu-device-plugin usage on node e2e-4a2d554360-8baae-minion-group-trvc. CPUUsageInCores: 3.468e-05, MemoryUsageInBytes: 1650688, MemoryWorkingSetInBytes: 1650688
I0726 11:21:54.238]         Jul 26 11:21:53.342: INFO: Get container nvidia-gpu-device-plugin-2zrxr/nvidia-gpu-device-plugin usage on node e2e-4a2d554360-8baae-minion-group-dhf5. CPUUsageInCores: 4.1366e-05, MemoryUsageInBytes: 1503232, MemoryWorkingSetInBytes: 1503232
I0726 11:21:54.238]         Jul 26 11:21:53.342: INFO: Get container nvidia-driver-installer-fvzlz/nvidia-driver-installer usage on node e2e-4a2d554360-8baae-minion-group-dhf5. CPUUsageInCores: 0.996069722, MemoryUsageInBytes: 4413071360, MemoryWorkingSetInBytes: 271339520
I0726 11:21:54.238]         Jul 26 11:21:53.898: INFO: Getting list of Nodes from API server
I0726 11:21:54.238]         Jul 26 11:21:53.941: INFO: gpuResourceName nvidia.com/gpu
I0726 11:21:54.238]         Jul 26 11:21:53.941: INFO: Nvidia GPUs not available on Node: "e2e-4a2d554360-8baae-minion-group-dhf5"
I0726 11:21:54.238]         Jul 26 11:21:54.170: INFO: Get container nvidia-driver-installer-5clvp/nvidia-driver-installer usage on node e2e-4a2d554360-8baae-minion-group-nbjs. CPUUsageInCores: 0.977369235, MemoryUsageInBytes: 4303605760, MemoryWorkingSetInBytes: 269389824
I0726 11:21:54.238]         Jul 26 11:21:54.170: INFO: Get container nvidia-gpu-device-plugin-8nbnk/nvidia-gpu-device-plugin usage on node e2e-4a2d554360-8baae-minion-group-nbjs. CPUUsageInCores: 3.7186e-05, MemoryUsageInBytes: 1507328, MemoryWorkingSetInBytes: 1507328
I0726 11:21:54.238]       << End Captured GinkgoWriter Output
```

Cross-checking the job config against the working pre-submit job, the
missing presets stand out. According to the Prow docs, presets do not seem
to be inherited across files, so let's clone them from the working job.

Signed-off-by: Francesco Romani <fromani@redhat.com>
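
The log excerpt quoted in the commit message above can be triaged mechanically. What follows is a minimal Python sketch, assuming only the log line format shown in that excerpt; the function name `nodes_without_gpus` and the shortened `sample` text are illustrative and are not part of the Kubernetes or test-infra code.

```python
import re

# Matches the e2e log line quoted in the commit message above, e.g.
#   INFO: Nvidia GPUs not available on Node: "<node-name>"
# The pattern is an assumption based solely on that excerpt.
GPU_MISSING = re.compile(r'Nvidia GPUs not available on Node: "([^"]+)"')


def nodes_without_gpus(log_text: str) -> list[str]:
    """Return node names reported as lacking GPUs, deduplicated, in first-seen order."""
    seen: dict[str, None] = {}
    for match in GPU_MISSING.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Shortened sample taken from the excerpt above (two occurrences of the same node).
sample = '''
I0726 11:21:54.238]         Jul 26 11:21:52.897: INFO: Nvidia GPUs not available on Node: "e2e-4a2d554360-8baae-minion-group-dhf5"
I0726 11:21:54.238]         Jul 26 11:21:53.941: INFO: gpuResourceName nvidia.com/gpu
I0726 11:21:54.238]         Jul 26 11:21:53.941: INFO: Nvidia GPUs not available on Node: "e2e-4a2d554360-8baae-minion-group-dhf5"
'''

if __name__ == "__main__":
    for node in nodes_without_gpus(sample):
        print(node)
```

Running this over a full build log would list each node that never advertised `nvidia.com/gpu`, which is a quick way to tell a per-node flake from a job-wide setup problem such as the missing presets described above.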
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 26, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ffromani
Once this PR has been reviewed and has the lgtm label, please assign ahg-g, klueska for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jul 26, 2023
@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 27, 2023
@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani
Contributor Author

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Contributor Author

/test pull-kubernetes-e2e-capz-windows-1-27

@ffromani
Contributor Author

ffromani commented Aug 2, 2023

/test pull-kubernetes-e2e-capz-windows-1-27

@ffromani
Contributor Author

ffromani commented Aug 2, 2023

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Contributor Author

ffromani commented Aug 3, 2023

/test pull-kubernetes-e2e-gce-device-plugin-gpu

@ffromani
Contributor Author

ffromani commented Aug 8, 2023

/test pull-kubernetes-e2e-capz-windows-1-27

@k8s-ci-robot
Contributor

@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-device-plugin-gpu | 55fe670 | link | false | /test pull-kubernetes-e2e-gce-device-plugin-gpu |
| pull-kubernetes-e2e-capz-windows-1-27 | 55fe670 | link | false | /test pull-kubernetes-e2e-capz-windows-1-27 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ffromani
Contributor Author

there were real issues, fixed by kubernetes/test-infra#30450 and kubernetes/test-infra#30352

@ffromani ffromani closed this Aug 22, 2023
SIG Node CI/Test Board automation moved this from PRs Waiting on Author to Done Aug 22, 2023
SIG Node PR Triage automation moved this from WIP to Done Aug 22, 2023
@ffromani ffromani deleted the test-ci-20230726-1.27 branch August 22, 2023 06:06
Labels
area/kubelet
area/test
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
do-not-merge/cherry-pick-not-approved (Indicates that a PR is not yet approved to merge into a release branch.)
do-not-merge/needs-kind (Indicates a PR lacks a `kind/foo` label and requires one.)
do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.)
do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
needs-priority (Indicates a PR lacks a `priority/foo` label and requires one.)
needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
sig/node (Categorizes an issue or PR as relevant to SIG Node.)
sig/scheduling (Categorizes an issue or PR as relevant to SIG Scheduling.)
sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

2 participants