STOR-1838: add test for vsphere driver snapshot configuration #28717
Conversation
/test ?
@RomanBednar: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from a66f1f6 to 2327793
/test e2e-vsphere-ovn-techpreview-serial
@RomanBednar: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 2327793 to 6b248e2
@RomanBednar: This pull request references STOR-1838 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Job Failure Risk Analysis for sha: 6b248e2
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c632ad20-fcc9-11ee-81ee-fbc318190246-0
})

func setClusterCSIDriverSnapshotOptions(oc *exutil.CLI, snapshotOptions string, value int) {
	patch := []byte(fmt.Sprintf("{\"spec\":{\"driverConfig\":{\"vSphere\":{\"%s\": %d}}}}", snapshotOptions, value))
When you want to build the patch as a string, at least use backticks (a raw string literal) to avoid escaping every \".
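As a minimal sketch of this suggestion (the function name and signature here are hypothetical, not from the PR), a raw string literal produces the same merge patch with no escaping:

```go
package main

import "fmt"

// buildVSpherePatch is a hypothetical sketch of the reviewer's suggestion:
// a raw string literal (backticks) builds the ClusterCSIDriver merge patch
// without escaping every quote character.
func buildVSpherePatch(option string, value int) []byte {
	return []byte(fmt.Sprintf(`{"spec":{"driverConfig":{"vSphere":{"%s": %d}}}}`, option, value))
}
```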
Removed, using Update() instead.
clusterCSIDriverOptions: map[string]int{
	snapshotOptions["vsan"]["clusterCSIDriver"]: 4,
},
cloudConfigOptions: map[string]int{
	snapshotOptions["vsan"]["cloudConf"]: 4,
},
Why so complicated?

clusterCSIDriverOptions: opv1.VSphereCSIDriverConfigSpec{
	GranularMaxSnapshotsPerBlockVolumeInVSAN: ptr.To(uint32(4)),
},
cloudConfigOptions: map[string]string{
	"granular-max-snapshots-per-block-volume-vsan": "4",
},

(note the change of the field types)

It will be slightly harder to compute the ClusterCSIDriver patch, but you can use Update() instead, or encode VSphereCSIDriverConfigSpec to JSON.

On the positive side, it's crystal clear which field is set and which ini file entry is expected, without reading a separate map.
Changed and bumping openshift/api to pull in the new clusterCSIDriver fields.
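The JSON-encoding route the reviewer mentions can be sketched as below. The struct here is a hypothetical stand-in mirroring one field of the real `VSphereCSIDriverConfigSpec` from github.com/openshift/api/operator/v1, and `ptrTo` stands in for `k8s.io/utils/ptr.To`:

```go
package main

import "encoding/json"

// vSphereCSIDriverConfigSpec is a hypothetical mirror of the openshift/api
// field discussed in this thread; the real type lives in
// github.com/openshift/api/operator/v1.
type vSphereCSIDriverConfigSpec struct {
	GranularMaxSnapshotsPerBlockVolumeInVSAN *uint32 `json:"granularMaxSnapshotsPerBlockVolumeInVSAN,omitempty"`
}

// ptrTo stands in for k8s.io/utils/ptr.To.
func ptrTo[T any](v T) *T { return &v }

// encodeDriverConfig marshals the typed spec to JSON, instead of
// hand-assembling the patch string.
func encodeDriverConfig(spec vSphereCSIDriverConfigSpec) (string, error) {
	b, err := json.Marshal(spec)
	return string(b), err
}
```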
pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
o.Expect(err).NotTo(o.HaveOccurred())

_, err = createTestPod(oc, pvc.Name, oc.Namespace())
o.Expect(err).NotTo(o.HaveOccurred())

// Wait for pvc to be bound.
o.Eventually(func() error {
	pvc, err := oc.AdminKubeClient().CoreV1().PersistentVolumeClaims(oc.Namespace()).Get(context.Background(), "test-pvc", metav1.GetOptions{})
	if err != nil {
		return err
	}
	if pvc.Status.Phase != v1.ClaimBound {
		return fmt.Errorf("PVC not bound")
	}
	return nil
})

for i := 0; i < t.successfulSnapshotsCreated; i++ {
	err := createSnapshot(oc, oc.Namespace(), fmt.Sprintf("test-snapshot-%d", i), "test-pvc")
	o.Expect(err).NotTo(o.HaveOccurred())
}

// Next snapshot creation should be over the set limit and fail.
err = createSnapshot(oc, oc.Namespace(), "test-snapshot-failed", "test-pvc")
o.Expect(err).NotTo(o.HaveOccurred())

readyToUse, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.readyToUse}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

errMsg, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.error.message}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

e2e.Logf("VolumeSnapshot error message: %s readyToUse %s", errMsg, readyToUse)
if !strings.Contains(errMsg, "failed to take snapshot of the volume") && readyToUse != "false" {
	e2e.Failf("VolumeSnapshot \"test-snapshot-failed\" should have failed and should not be ready to use")
}
}
This should go into its own function, just like loadAndCheckCloudConf.
Moved to separate function.
for option, value := range t.cloudConfigOptions {
	o.Eventually(func() error {
		return loadAndCheckCloudConf(oc, "Snapshot", option, value)
	}, time.Minute, time.Second).Should(o.Succeed())
}
To check all three ini file keys, you Get the same object from the API server three times? loadAndCheckCloudConf can take a map[string]string and check all of them with a single Get.
Yeah, I missed that, changed.
section, err := cfg.GetSection(sectionName)
if err != nil {
	return fmt.Errorf("section %s not found in cloud.conf: %v", sectionName, err)
}

key, err := section.GetKey(keyName)
if err != nil {
	return fmt.Errorf("key %s not found in section %s: %v", keyName, sectionName, err)
}

o.Expect(key.String()).To(o.Equal(fmt.Sprintf("%d", expectedValue)))
You should check that the other keys are not set. I wonder what section.KeysHash() returns. Is it the same map as t.cloudConfigOptions, if you used map[string]string instead of int as I suggest? Will DeepEqual work then?
DeepEqual() seems to work well in this case, thanks for the suggestion.
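The comparison the thread settles on can be sketched like this (the helper name is hypothetical; `KeysHash()` from gopkg.in/ini.v1 returns the section's key/value pairs as a map[string]string):

```go
package main

import "reflect"

// checkSectionKeys sketches the DeepEqual comparison discussed above: the
// ini section's key/value map should exactly equal the expected options
// map, which also catches keys that must NOT be set.
func checkSectionKeys(got, want map[string]string) bool {
	return reflect.DeepEqual(got, want)
}
```

Because the maps must match exactly, an extra, unexpected key in the parsed section makes the check fail, which covers the "other keys are not set" concern without any additional code.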
}

if t.successfulSnapshotsCreated > 0 {
	pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
Where is the PVC deleted?
Added removal. The cleanup slows down the test a bit; how bad is it to rely on the project teardown (in oc from util.CLI)?
pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
o.Expect(err).NotTo(o.HaveOccurred())

_, err = createTestPod(oc, pvc.Name, oc.Namespace())
Where is the Pod deleted?
Same as PVC.
})

for i := 0; i < t.successfulSnapshotsCreated; i++ {
	err := createSnapshot(oc, oc.Namespace(), fmt.Sprintf("test-snapshot-%d", i), "test-pvc")
Where are the snapshots deleted?
Same as PVC - snapshot removal is taking the longest.
}

// Next snapshot creation should be over the set limit and fail.
err = createSnapshot(oc, oc.Namespace(), "test-snapshot-failed", "test-pvc")
It's not guaranteed that it's the last snapshot that is going to fail. The test creates several VolumeSnapshots in quick succession, and the controller + CSI driver may process them in any order. You should wait until the "good" snapshots are ready to use.
You're right, adding a simple check with o.Eventually() counting ready snapshots.
readyToUse, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.readyToUse}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

errMsg, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.error.message}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

e2e.Logf("VolumeSnapshot error message: %s readyToUse %s", errMsg, readyToUse)
if !strings.Contains(errMsg, "failed to take snapshot of the volume") && readyToUse != "false" {
	e2e.Failf("VolumeSnapshot \"test-snapshot-failed\" should have failed and should not be ready to use")
}
I'm missing a loop waiting for the snapshot to become failed.
Adding.
Force-pushed from 6b248e2 to 1dfd605
/unhold
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3f876ab0-ff06-11ee-9af2-ff5ed051b6b0-0
Job Failure Risk Analysis for sha: 1dfd605
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d157a940-ff25-11ee-8c2f-58a38c674904-0
Job Failure Risk Analysis for sha: 172d0ba
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cdad4690-0c65-11ef-8463-344af23f030f-0
Job Failure Risk Analysis for sha: 15db814
openshift/vmware-vsphere-csi-driver-operator#230 has merged

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/32e1d770-0ed1-11ef-97f2-5abf97a9ff70-0
registry.AddPathologicalEventMatcherOrDie(&SimplePathologicalEventMatcher{
	name: "StorageOperatorsFlipsProgressingTooOften",
	locatorKeyRegexes: map[monitorapi.LocatorKey]*regexp.Regexp{
		monitorapi.LocatorNamespaceKey: regexp.MustCompile(`^openshift-cluster-storage-operator$`),
	},
	messageReasonRegex: regexp.MustCompile(`^OperatorStatusChanged$`),
	messageHumanRegex:  regexp.MustCompile(`Progressing changed.*`),
	jira: "https://issues.redhat.com/browse/OCPBUGS-24061",
})
Can you drop this commit and run the payload job again now that we have a fix merged?
@dobsonj Just did that and the tests passed, thanks!
Force-pushed from 15db814 to 41dd374
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/467feb70-128a-11ef-8406-c69368090d20-0
/unhold
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jsafrane, RomanBednar, soltysh. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
/label acknowledge-critical-fixes-only
/retest-required
/test e2e-gcp-ovn-builds
@RomanBednar: The following tests failed, say

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/test e2e-gcp-ovn-builds
Merged 3aac758 into openshift:master
[ART PR BUILD NOTIFIER] This PR has been included in build openshift-enterprise-tests-container-v4.17.0-202405171013.p0.g3aac758.assembly.stream.el9 for distgit openshift-enterprise-tests.