This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

testing: prow/gpu-operator.sh: use the GPU Operator must-gather image #294

Merged 2 commits on Dec 8, 2021


kpouget
Collaborator

@kpouget kpouget commented Dec 1, 2021

  • build/Dockerfile: unprepare the image for GPU Operator must-gather support

The GPU Operator image now provides its own must-gather script.

https://gitlab.com/nvidia/kubernetes/gpu-operator/-/blob/master/hack/must-gather.sh


  • testing: prow/gpu-operator.sh: use the GPU Operator must-gather image

... or our script toolbox/gpu-operator/must-gather.sh if it doesn't
exist. Support integrated in v1.9.0.


@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2021
@kpouget
Collaborator Author

kpouget commented Dec 1, 2021

/test gpu-operator-e2e

@kpouget
Collaborator Author

kpouget commented Dec 1, 2021 via email

@kpouget
Collaborator Author

kpouget commented Dec 1, 2021

/test gpu-operator-e2e

@kpouget
Collaborator Author

kpouget commented Dec 1, 2021

/test gpu-operator-e2e

@kpouget
Collaborator Author

kpouget commented Dec 2, 2021

/test gpu-operator-e2e

@kpouget
Collaborator Author

kpouget commented Dec 2, 2021

/test gpu-operator-e2e

@kpouget
Collaborator Author

kpouget commented Dec 2, 2021

/test gpu-operator-e2e
/test test-commit

@kpouget
Collaborator Author

kpouget commented Dec 2, 2021

/test gpu-operator-e2e
/test test-commit

@kpouget kpouget changed the title from "WIP: testing: prow/gpu-operator.sh: use the GPU Operator must-gather image" to "testing: prow/gpu-operator.sh: use the GPU Operator must-gather image" Dec 2, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 2, 2021
@kpouget
Collaborator Author

kpouget commented Dec 2, 2021

new behavior:

  • calls gpu-operator capture_deployment_state if must-gather is not available in the image (e.g., 1.8.2)

  • calls must-gather if it is available in the image

  • calls gpu-operator capture_deployment_state afterwards, until we're sure we're happy with the must-gather image.
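
The behavior listed above could be sketched roughly as follows. This is only an illustration: `image_has_must_gather` and `plan_collection` are hypothetical names, and the version check is a stand-in for however the real script detects the must-gather script inside the operator image.

```shell
# Sketch of the collection order described above. All names are
# illustrative assumptions, not the contents of prow/gpu-operator.sh.

# Hypothetical probe: succeeds if the operator image ships a must-gather
# script. Stubbed on the version string so the sketch runs without a cluster.
image_has_must_gather() {
    case "$1" in
        1.8.*) return 1 ;;  # e.g. 1.8.2: no must-gather script in the image
        *)     return 0 ;;  # v1.9.0 and later, or master
    esac
}

# Print the collection steps to run, in order, one per line.
plan_collection() {
    if image_has_must_gather "$1"; then
        echo "image-must-gather"
    fi
    # Always run for now, until the must-gather image output is trusted.
    echo "capture_deployment_state"
}

plan_collection "1.8.2"
plan_collection "1.9.0"
```

With the stub above, an image without must-gather only gets the `capture_deployment_state` step, while a newer image gets both steps in sequence.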

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-psap_ci-artifacts/294/pull-ci-openshift-psap-ci-artifacts-master-gpu-operator-e2e/1466473963252092928/artifacts/gpu-operator-e2e/presubmit-operatorhub/artifacts/
--> see must-gather script being broken in v1.9.0-beta

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-psap_ci-artifacts/294/pull-ci-openshift-psap-ci-artifacts-master-gpu-operator-e2e/1466473963252092928/artifacts/gpu-operator-e2e/presubmit-master/artifacts/

--> see must-gather script being fixed in master

@kpouget
Collaborator Author

kpouget commented Dec 2, 2021

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2021
@kpouget
Collaborator Author

kpouget commented Dec 7, 2021

@omertuc PTAL

@kpouget
Collaborator Author

kpouget commented Dec 8, 2021

thanks for the feedback @omertuc, I followed the suggestions

/test gpu-operator-e2e
/test test-commit

@omertuc
Contributor

omertuc commented Dec 8, 2021

/lgtm
/hold wait for job

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 8, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2021
@kpouget
Collaborator Author

kpouget commented Dec 8, 2021

did you see the force push when you LGTM'd? 🙃 it seems that both happened at the same time :)

anyway:

/test gpu-operator-e2e
/test test-commit

@openshift-ci
Contributor

openshift-ci bot commented Dec 8, 2021

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/test-commit | d40638d | link | true | /test test-commit |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kpouget
Collaborator Author

kpouget commented Dec 8, 2021

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/test-commit | d40638d | link | true | /test test-commit |

Full PR test history. Your PR dashboard.

our test passed:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-psap_ci-artifacts/294/pull-ci-openshift-psap-ci-artifacts-master-test-commit/1468561748058443776/artifacts/test-commit/test-commit/artifacts/prow_gpu-operator_test_operatorhub_1.8.2_v1.8/

the failure occurred in the must-gather step:

   * could not run steps: step test-commit failed: "test-commit" post steps failed: "test-commit" pod "test-commit-gather-extra" failed: the pod ci-op-m549vkn3/test-commit-gather-extra failed after 59s (failed containers: test): ContainerFailed one or more containers exited 

... or our script `toolbox/gpu-operator/must-gather.sh` if it doesn't
exist. Support integrated in v1.9.0.

test-path: prow gpu-operator test_operatorhub 1.8.2 v1.8
@kpouget
Collaborator Author

kpouget commented Dec 8, 2021

/test gpu-operator-e2e

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2021
@omertuc
Contributor

omertuc commented Dec 8, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2021
@kpouget
Collaborator Author

kpouget commented Dec 8, 2021 via email

@kpouget
Collaborator Author

kpouget commented Dec 8, 2021

must-gather looks good!
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-psap_ci-artifacts/294/pull-ci-openshift-psap-ci-artifacts-master-gpu-operator-e2e/1468588232127025152/artifacts/gpu-operator-e2e/presubmit-master/artifacts/011__gpu-operator__must-gather/
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-psap_ci-artifacts/294/pull-ci-openshift-psap-ci-artifacts-master-gpu-operator-e2e/1468588232127025152/artifacts/gpu-operator-e2e/presubmit-operatorhub/artifacts/012__gpu-operator__must-gather/

I'll remove the rmdir in another PR

Running exit finalizers ...
Running finalizer 'collect_must_gather' ...
Running the GPU Operator must-gather image ...
Operator image: nvcr.io/nvidia/gpu-operator@sha256:57e7be259af342ccdf281d23d79e15d474e2e6f506f84c0cdae3db2a199b3395
Copying must-gather results to /logs/artifacts/012__gpu-operator__must-gather ...
rmdir: failed to remove '/tmp/gpu-operator_Rlqi': Directory not empty
Running gpu_operator capture_deployment_state ...
Running gpu_operator capture_deployment_state ... done.

/hold cancel
/approve
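
The `rmdir: ... Directory not empty` line in the log above is expected behavior: `rmdir` only removes empty directories. A minimal reproduction, with one possible cleanup, might look like the sketch below. The paths and variable names are assumptions made for illustration, and the thread does not say whether the follow-up PR switched to `rm -rf` or simply dropped the call.

```shell
# Throwaway directories standing in for /tmp/gpu-operator_XXXX and the
# artifacts destination; the real script's variable names are not shown.
tmp_dir="$(mktemp -d)"
dest_dir="$(mktemp -d)"
echo "collected data" > "$tmp_dir/must-gather.log"

# rmdir refuses to remove a directory that still contains files.
if rmdir "$tmp_dir" 2>/dev/null; then
    rmdir_status="removed"
else
    rmdir_status="not-empty"
fi

# Copy the results out first, then remove the directory with its contents.
cp -r "$tmp_dir/." "$dest_dir/"
rm -rf "$tmp_dir"

echo "rmdir: $rmdir_status"
```

Copying before deleting keeps the gathered files available in the destination even though the temporary directory is removed together with its contents.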

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 8, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 8, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kpouget

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit b88b1e3 into openshift-psap:master Dec 8, 2021
@kpouget kpouget deleted the must-gather branch December 8, 2021 17:16
@kpouget kpouget mentioned this pull request Dec 14, 2021
10 tasks
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

3 participants