Add E2E stress test suite for creation / deletion of VolumeSnapshot resources #95971
Conversation
@chrishenzie: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @chrishenzie. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/assign @xing-yang @msau42
Force-pushed from 1ca3a6e to 8429d12.
Updated last commit to incorporate similar changes to here: setup / teardown of resources.
Force-pushed from 8429d12 to 442d0dd.
/test pull-kubernetes-conformance-kind-ipv6-parallel
Force-pushed from 442d0dd to 4ee69f6.
/retest
createPodsAndVolumes := func() {
	for i := 0; i < stressTest.testOptions.NumPods; i++ {
		framework.Logf("Creating resources for pod %v/%v", i, stressTest.testOptions.NumPods-1)
Use %d for i and stressTest.testOptions.NumPods-1
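For reference, a minimal sketch of the suggested change, reusing the variable names from the snippet above:

framework.Logf("Creating resources for pod %d/%d", i, stressTest.testOptions.NumPods-1)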
if _, err := cs.CoreV1().Pods(pod.Namespace).Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
	stressTest.cancel()
	framework.Failf("Failed to create pod-%v [%+v]. Error: %v", i, pod, err)
Use %d for i
if err := e2epod.WaitForPodRunningInNamespace(cs, pod); err != nil {
	stressTest.cancel()
	framework.Failf("Failed to wait for pod-%v [%+v] turn into running status. Error: %v", i, pod, err)
Use %d for i
var errs []error
for _, snapshot := range stressTest.snapshots {
	framework.Logf("Deleting snapshot %v", snapshot.Vs.GetName())
Can you print out the namespace as well (namespace/name)? Use %s.
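A small sketch of what the suggested log line could look like, assuming the snapshot object exposes GetNamespace() alongside GetName():

framework.Logf("Deleting snapshot %s/%s", snapshot.Vs.GetNamespace(), snapshot.Vs.GetName())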
	errs = append(errs, snapshot.CleanupResource())
}
for _, pod := range stressTest.pods {
	framework.Logf("Deleting pod %v", pod.Name)
%v -> %s
	errs = append(errs, e2epod.DeletePodWithWait(cs, pod))
}
for _, volume := range stressTest.volumes {
	framework.Logf("Deleting volume %v", volume.Pvc.GetName())
%v -> %s
	errs = append(errs, volume.CleanupResource())
}
errs = append(errs, tryFunc(stressTest.driverCleanup))
framework.ExpectNoError(errors.NewAggregate(errs), "While cleaning up resources")
While -> while
case <-stressTest.ctx.Done():
	return
default:
	framework.Logf("Pod-%v [%v], Iteration %v/%v", podIndex, pod.Name, j, stressTest.testOptions.NumSnapshots-1)
Use %d for podIndex, use %s for name, and %d for j and NumSnapshots
if err := snapshot.CleanupResource(); err != nil {
	stressTest.cancel()
	framework.Failf("Failed to delete snapshot for pod-%v [%+v]. Error: %v", podIndex, pod, err)
use %d for podIndex
stressTest.snapshots = append(stressTest.snapshots, snapshot)
stressTest.snapshotsMutex.Unlock()
if err := snapshot.CleanupResource(); err != nil { |
For testpatterns.DynamicSnapshotRetain, can you change the policy to Delete before deleting the VolumeSnapshot? This is to make sure we don't leave physical snapshot resources behind after cleaning up.
I think the CleanupResource method should be responsible for it instead of each test case
It looks like this is already handled inside of CleanupResource(): kubernetes/test/e2e/storage/testsuites/snapshottable.go, Lines 498 to 505 in c82d5ee
if boundVsContent.Object["spec"].(map[string]interface{})["deletionPolicy"] != "Delete" {
	// The purpose of this block is to prevent physical snapshotContent leaks.
	// We must update the SnapshotContent to have Delete Deletion policy,
	// or else the physical snapshot content will be leaked.
	boundVsContent.Object["spec"].(map[string]interface{})["deletionPolicy"] = "Delete"
	boundVsContent, err = dc.Resource(SnapshotContentGVR).Update(context.TODO(), boundVsContent, metav1.UpdateOptions{})
	framework.ExpectNoError(err)
}
Sure
Force-pushed from 67c938d to 61ef469.
test/e2e/storage/drivers/in_tree.go (outdated)
@@ -788,6 +788,11 @@ func InitHostPathDriver() testsuites.TestDriver {
	testsuites.CapSingleNodeVolume: true,
	testsuites.CapTopology: true,
},
StressTestOptions: &testsuites.StressTestOptions{
This should be added for csi hostpath, not intree hostpath
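For illustration, a hedged sketch of what adding the options to the CSI hostpath driver definition (rather than the in-tree one) could look like; the values are placeholders, not the ones from the PR:

StressTestOptions: &testsuites.StressTestOptions{
	// Illustrative values only; the PR sets these per driver.
	NumPods:      10,
	NumSnapshots: 10,
},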
// Name must be unique, so let's base it on namespace name
"name": ns + "-" + suffix,
// Name must be unique, so let's base it on namespace name and use GenerateName
"name": names.SimpleNameGenerator.GenerateName(ns + "-" + suffix),
can you create a followup issue to clean this up later?
tsInfo: TestSuiteInfo{
	Name: "snapshottable-stress",
	TestPatterns: []testpatterns.TestPattern{
		testpatterns.DynamicSnapshotDelete,
We should also add raw block support once the pattern is added in the other pr. cc @Jiawei0227
SG
// Check preconditions before setting up namespace via framework below.
ginkgo.BeforeEach(func() {
	driverInfo = driver.GetDriverInfo()
	if driverInfo.StressTestOptions == nil {
Hmm maybe fail? Right now it's really easy to miss that a test case got skipped.
Also can you add similar validation to the volume_stress suite?
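A rough sketch of the fail-instead-of-skip precondition the comment suggests, assuming the driverInfo variable from the snippet above:

ginkgo.BeforeEach(func() {
	driverInfo = driver.GetDriverInfo()
	if driverInfo.StressTestOptions == nil {
		// Fail loudly so a missing configuration is noticed instead of silently skipped.
		framework.Failf("Driver %q does not define StressTestOptions", driverInfo.Name)
	}
})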
	createPodsAndVolumes()
})
f.AddAfterEach("cleanup", func(f *framework.Framework, failed bool) { |
Thanks! Can you reference the bug in the comment so we can followup on it later?
@@ -21,7 +21,8 @@ spec:
  serviceAccountName: csi-gce-pd-controller-sa
  containers:
    - name: csi-snapshotter
      image: gcr.io/gke-release/csi-snapshotter:v2.1.1-gke.0
      # TODO: Replace this with the gke image once available.
I think it's fine to have the OSS tests use the OSS images.
@@ -21,7 +21,8 @@ spec:
  serviceAccountName: csi-gce-pd-controller-sa
  containers:
    - name: csi-snapshotter
      image: gcr.io/gke-release/csi-snapshotter:v2.1.1-gke.0
      # TODO: Replace this with the gke image once available.
      image: k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.2
Can you also update the pdcsi image to v1.0.1-gke.0? There were some fixes related to snapshots I think.
Done
Force-pushed from 61ef469 to 4064950.
Is it possible we need to increase the snapshot timeout or make it configurable? The last failed test logs indicated some snapshots were ready but not all of them.
kubernetes/test/e2e/framework/util.go, Lines 134 to 135 in 396b90f
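One possible shape for making the timeout configurable, sketched with hypothetical names (SnapshotTimeout is not a field in this PR, and the plumbing is assumed, not the framework's actual API):

// Hypothetical sketch: let per-driver options override the default snapshot timeout.
snapshotTimeout := 5 * time.Minute // the framework default referenced above
if stressTest.testOptions.SnapshotTimeout > 0 { // hypothetical field
	snapshotTimeout = stressTest.testOptions.SnapshotTimeout
}
framework.Logf("Waiting up to %v for snapshots to become ready", snapshotTimeout)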
Force-pushed from 4064950 to bc50ef4.
Synced offline with @xing-yang, who found this in the logs:
it seems that GCE only allows for one snapshot to be created for a volume at a time, which is a cause of slowness in this test. I am going to test with fewer snapshots and see if it succeeds in time.
Yes, I think we need to either make the timeout longer or reduce the number of snapshots being created from the same volume at the same time. For example, it took more than 7 minutes for
Force-pushed from bc50ef4 to d3120a4.
@@ -490,6 +495,11 @@ func InitGcePDCSIDriver() testsuites.TestDriver {
	StressTestOptions: &testsuites.StressTestOptions{
		NumPods: 10,
Should we increase the number of pods here?
Maybe we need the snapshot test to use its own set of options, so it won't impact the volume stress test settings
I wanted to rename this to VolumeStressTestOptions, but it seems like we'd need to mark this as deprecated if users depend on this in their testdriver.yaml files. Maybe we can add some custom serialization logic and log a deprecation warning? What is the timeline for fully deprecating something like this?
Maybe just mark this with a TODO and a tracking issue for the time being, adding a custom decoder seems more involved.
How about we create a new VolumeSnapshotStressTestOptions?
Filed #96241 for this work.
Agreed, added. I can rename it to that to be even more specific.
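A rough sketch of the separate options type being discussed; the field names follow the conversation, but the exact shape is an assumption rather than the final API tracked in #96241:

// VolumeSnapshotStressTestOptions would sit alongside StressTestOptions in the
// testsuites package, so snapshot stress settings don't affect volume stress.
type VolumeSnapshotStressTestOptions struct {
	// NumPods is the number of pods (and volumes) to create.
	NumPods int
	// NumSnapshots is the number of snapshots to create per pod/volume.
	NumSnapshots int
}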
Force-pushed from 7272818 to bfc2e29.
/test pull-kubernetes-conformance-kind-ga-only-parallel
Can you squash your commits?
Introduces a new test suite that creates and deletes many VolumeSnapshots simultaneously to test snapshottable storage plugins under load.
Force-pushed from bfc2e29 to fb6bc4f.
/lgtm
@msau42 do you have more comments?
/approve
@xing-yang I noticed that the hostpath stress tests are running longer than the gce tests. That's a bit surprising, as the hostpath driver doesn't do anything and should be much faster. Can you take a look?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chrishenzie, msau42
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Maybe because hostpath driver creates 10 snapshots while gce creates 2 per volume? I'll take a look.
Ah I did my math wrong. Yes, I think the hostpath test creates 100 snapshots whereas the gce test creates 40. That makes sense, thanks!
If that is an issue let me know and I can reduce it to maybe 4 pods and 10 snapshots to be consistent.
@chrishenzie I don't think you need to change anything. That's fine. Thanks for the great work!
…-#95971-upstream-release-1.19 Automated cherry pick of #95971: E2E stress test suite for VolumeSnapshots
/sig storage
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces an E2E test suite for stress testing the creation / deletion of VolumeSnapshot objects.
It works by spinning up a set of pods, and launching len(pods) goroutines which repeatedly create and delete VolumeSnapshots. Users can configure the number of pods and number of snapshots by setting NumPods and NumSnapshots in their testdriver.yaml.
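A simplified sketch of that pattern (not the PR's exact code); createSnapshot and deleteSnapshot are placeholders for the suite's snapshot resource helpers:

var wg sync.WaitGroup
for _, pod := range pods {
	wg.Add(1)
	go func(pod *v1.Pod) {
		defer wg.Done()
		for i := 0; i < numSnapshots; i++ {
			// Placeholders standing in for the suite's snapshot helpers.
			snapshot := createSnapshot(pod)
			deleteSnapshot(snapshot)
		}
	}(pod)
}
wg.Wait()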
Which issue(s) this PR fixes:
Fixes #95969
Special notes for your reviewer:
I split this into three distinct changes to make it easier for reviewing. I can squash everything if reviewers prefer.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Additional details:
The testdriver.yaml described above can be supplied to the e2e.test binary via -storage.testdriver. It serializes to this struct: kubernetes/test/e2e/storage/testsuites/testdriver.go, Lines 159 to 193 in da941d8
@msau42