[node-e2e] add test cases for serialize and parallel image pulling #121604

Open
pacoxu wants to merge 4 commits into master from image-pull-e2e


@pacoxu
Member

pacoxu commented Oct 30, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR fixes:

xref kubernetes/enhancements#3673

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3673-kubelet-parallel-image-pull-limit

@k8s-ci-robot added the release-note-none, size/L, kind/cleanup, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority, area/test, sig/node, and sig/testing labels and removed the do-not-merge/needs-sig label Oct 30, 2023
@pacoxu changed the title from "Add Image pull e2e" to "[node-e2e] add test cases for serialize and parallel image pulling" Oct 30, 2023
@pacoxu force-pushed the image-pull-e2e branch 10 times, most recently from 724eb51 to 390ac56 on October 31, 2023 10:15
@k8s-ci-robot added the area/dependency label Oct 31, 2023
@pacoxu
Member Author

pacoxu commented Oct 31, 2023

/remove-area dependency

@k8s-ci-robot removed the area/dependency label Oct 31, 2023
@k8s-ci-robot removed the do-not-merge/work-in-progress label Feb 29, 2024
@pacoxu
Member Author

pacoxu commented Feb 29, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

Added logic to check whether the pulls happened at the same time. If they did not, the test is skipped.

@pacoxu
Member Author

pacoxu commented Mar 1, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

@pacoxu
Member Author

pacoxu commented Mar 1, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

@pacoxu
Member Author

pacoxu commented Mar 1, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

@pacoxu
Member Author

pacoxu commented Mar 1, 2024

/hold
still a problem

Image pulling is sometimes so quick that it takes only ~100ms. Switching to a larger image may help.

@k8s-ci-robot added the do-not-merge/hold label Mar 1, 2024
@pacoxu
Member Author

pacoxu commented Mar 5, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

Both passed.

@pacoxu
Member Author

pacoxu commented Mar 5, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

@pacoxu
Member Author

pacoxu commented Mar 6, 2024

/hold still a problem

Image pulling is sometimes so quick that it takes only ~100ms. Switching to a larger image may help.

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

The last two runs of both jobs passed. Using a larger image would still be more stable.

@pacoxu
Member Author

pacoxu commented Mar 7, 2024

Flake in an unrelated test: E2eNode Suite: [It] [sig-node] Device Manager [Serial] [Feature:DeviceManager] [NodeFeature:DeviceManager] With sample device plugin [Serial] [Disruptive] should deploy pod consuming devices first but fail with admission error after kubelet restart in case device plugin hasn't re-registered

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

The most recent 6+ runs did not flake.
/unhold

@ruiwen-zhao @SergeyKanzhelev would you take another look?

@k8s-ci-robot removed the do-not-merge/hold label Mar 7, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pacoxu, ruiwen-zhao
Once this PR has been reviewed and has the lgtm label, please ask for approval from dims. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu
Member Author

pacoxu commented Mar 8, 2024

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2
I switched to a CUDA image because it takes longer to pull. This should make the test more stable.

@pacoxu
Member Author

pacoxu commented Mar 11, 2024

After switching to a big image, the wait time should be longer.

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-node-e2e-containerd-serial-ec2

@k8s-ci-robot
Contributor

@pacoxu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-node-e2e-containerd-serial-ec2 | 8741280 | link | false | /test pull-kubernetes-node-e2e-containerd-serial-ec2 |
| pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial | 8741280 | link | false | /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pacoxu
Member Author

pacoxu commented Apr 1, 2024

The new image is so big that the test times out after 10 minutes. I need to find another image for the test 😓.

@bart0sh bart0sh moved this from WIP to Waiting on Author in SIG Node PR Triage Apr 11, 2024
@SergeyKanzhelev
Member

> The new image is so big that the test times out after 10 minutes. I need to find another image for the test 😓.

So the earlier flake happened because the image downloaded too fast? Since the test is mostly a timestamp comparison, why was it flaking? Did the download take less than a second?

If this is the case, the next step may be to install an ImageService proxy that introduces a delay. It is not such a bad idea long term, as it would let us test other failure modes and edge cases in the future. But it will require maintaining this proxy. @ruiwen-zhao @pacoxu wdyt?

Labels
- area/test
- cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
- kind/cleanup (Categorizes issue or PR as related to cleaning up code, process, or technical debt.)
- priority/important-longterm (Important over the long term, but may not be staffed and/or may need multiple releases to complete.)
- release-note-none (Denotes a PR that doesn't merit a release note.)
- sig/node (Categorizes an issue or PR as relevant to SIG Node.)
- sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
- size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
- triage/accepted (Indicates an issue or PR is ready to be actively worked on.)
Projects
SIG Node CI/Test Board
PRs Waiting on Author
SIG Node PR Triage
Waiting on Author

9 participants