Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix requeue logic in the common controller #317

Merged
merged 1 commit into from Jul 8, 2020

Conversation

xing-yang
Copy link
Collaborator

What type of PR is this?
/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fixes the requeue logic in the snapshot controller.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 22, 2020
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 22, 2020
@xing-yang xing-yang force-pushed the requeue_common branch 2 times, most recently from cdd0b80 to 815ab01 Compare May 22, 2020 14:59
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 22, 2020
@xing-yang xing-yang changed the title WIP: Fix requeue logic in the common controller Fix requeue logic in the common controller May 27, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 27, 2020
@xing-yang
Copy link
Collaborator Author

/assign @yuxiangqian

Copy link
Contributor

@yuxiangqian yuxiangqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP

pkg/common-controller/snapshot_controller.go Outdated Show resolved Hide resolved
pkg/common-controller/snapshot_controller.go Show resolved Hide resolved
pkg/common-controller/snapshot_controller_base.go Outdated Show resolved Hide resolved
pkg/common-controller/snapshot_controller_base.go Outdated Show resolved Hide resolved
pkg/common-controller/snapshot_controller_base.go Outdated Show resolved Hide resolved
pkg/common-controller/snapshot_controller_base.go Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@xing-yang xing-yang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuxiangqian Addressed your comments. PTAL. Thanks.

pkg/common-controller/snapshot_controller.go Show resolved Hide resolved
pkg/common-controller/snapshot_controller.go Outdated Show resolved Hide resolved
pkg/common-controller/snapshot_controller_base.go Outdated Show resolved Hide resolved
pkg/common-controller/snapshot_controller_base.go Outdated Show resolved Hide resolved
@xing-yang
Copy link
Collaborator Author

/assign @bswartz

@xing-yang
Copy link
Collaborator Author

@yuxiangqian Addressed your comments. PTAL.

@bswartz
Copy link
Contributor

bswartz commented Jun 11, 2020

Tested this PR, and it works as well or better than master in my torture tests.

@xing-yang
Copy link
Collaborator Author

Thanks @bswartz !

@msau42
Copy link
Collaborator

msau42 commented Jun 11, 2020

@bswartz side topic but is it possible to generalize your stress test and add it to the kubernetes storage e2e test suite? I think we don't do enough stress testing in k8s so this would be a great addition.

@bswartz
Copy link
Contributor

bswartz commented Jun 11, 2020

@bswartz side topic but is it possible to generalize your stress test and add it to the kubernetes storage e2e test suite? I think we don't do enough stress testing in k8s so this would be a great addition.

My stress test is just an open-ended loop that creates volume from snapshots and snapshots from volumes, reading and writing data each time. I kick it off and let it run for an arbitrary period of time, sometime 5 minutes, sometimes 5 hours.

The pseudocode looks like this:

create_empty_volume(0) // creates empty volume 0
for i = 0; true; i++ {
create_pod_and_read_write_data(i) // creates a pod to read volume i, validate the data, and write some new data, and flush
create_snapshot(i) // create snapshot i from volume i
delete_pod(i)
delete_volume(i)
create_volume_from_snap(i+1) // creates volume i+1 from snapshot i
delete_snapshot(i)
}

After a few hours this loop can run hundreds of times. I was able to shake out some bugs in my CSI drivers using this approach because certain failures only manifested after a dozen or so cycles of create/delete.

I'm not sure how this can become an e2e test because of its open-ended nature. There's no obvious place to stop.

@msau42
Copy link
Collaborator

msau42 commented Jun 11, 2020

I've added a similar stress test (without snapshots) and made the parallelization and # iterations configurable:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/testsuites/stress.go

@yuxiangqian
Copy link
Contributor

/cc @AndiLi99

@k8s-ci-robot
Copy link
Contributor

@yuxiangqian: GitHub didn't allow me to request PR reviews from the following users: andili99.

Note that only kubernetes-csi members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @AndiLi99

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yuxiangqian
Copy link
Contributor

@xing-yang lgtm, please squash

@xing-yang
Copy link
Collaborator Author

@yuxiangqian Squashed commits. PTAL.

@yuxiangqian
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 7, 2020
@yuxiangqian
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xing-yang, yuxiangqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [xing-yang,yuxiangqian]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 097b1fc into kubernetes-csi:master Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants