STOR-2793: Add test cases for snapshot of images #30891
chao007 wants to merge 1 commit into openshift:main from
Conversation
Pipeline controller notification: For optional jobs, comment. This repository is configured in: automatic mode.
@chao007: This pull request references STOR-2793 which is a valid jira issue. Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough: Adds a new GCP-specific e2e test file that validates GCE PD CSI "images" VolumeSnapshot behavior and restore flows for filesystem and block volumes, plus VolumeSnapshotClass checks, multi-snapshot scenarios, and helper functions for PVC/pod lifecycle and cleanup.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3) Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: chao007. The full list of commands accepted by this bot can be found here. Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/extended/storage/gcp_snapshot_images.go (1)
558-579: Reuse the shared VolumeSnapshot helper instead of copying it.
createGCPSnapshot duplicates the existing createSnapshot(...) apply-manifest flow in test/extended/storage/driver_configuration.go. Extending that shared helper to accept a class name keeps snapshot creation logic in one place and avoids drift. As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/storage/gcp_snapshot_images.go` around lines 558 - 579, createGCPSnapshot duplicates the apply-manifest flow implemented by the shared helper createSnapshot in test/extended/storage/driver_configuration.go; refactor to reuse that helper by extending createSnapshot to accept an optional volumeSnapshotClassName (or additional parameter for the snapshot class) and update createGCPSnapshot to call createSnapshot with the namespace, snapshotName, pvcName and the class name, then remove the duplicated manifest construction and apply logic from createGCPSnapshot so all VolumeSnapshot creation is centralized in createSnapshot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/storage/gcp_snapshot_images.go`:
- Around line 335-340: The test currently only compares
boundVolumeSnapshotContentName (handle1/handle2) which doesn't prove
point-in-time restores; update the test to perform actual restores from
snapshot1 and snapshot2 using the oc.Run restore/create-PVC logic (same helper
used elsewhere in this file) and mount or read the restored PVC contents, then
assert that the restore from snapshot1 yields testData1 and the restore from
snapshot2 yields testData2; reference snapshot1, snapshot2, testData1, testData2
and the existing oc.Run("get")/restore helpers to locate where to add the
restore-and-assert steps and clean up restored PVCs afterwards.
- Around line 115-118: The poll loop currently swallows errors from
isSnapshotReady; update each Eventually call that calls isSnapshotReady (e.g.,
the block using isSnapshotReady(oc, snapshotName)) to capture the returned error
and, when ready==false and err!=nil, call getSnapshotErrorMessage(...) (and
include the get error) to surface the real failure instead of timing out — i.e.,
change the lambda to return a boolean only when ready is true, but if err != nil
invoke the test failure with the getSnapshotErrorMessage output (and the
underlying error) so API/controller errors fail immediately; apply the same
change to the other Eventually sites mentioned (lines around 212-215, 304-307,
330-333) referencing the same isSnapshotReady/getSnapshotErrorMessage pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5e101960-889d-42ff-a8ea-266489063225
📒 Files selected for processing (1)
test/extended/storage/gcp_snapshot_images.go
o.Eventually(func() bool {
	ready, _ := isSnapshotReady(oc, snapshotName)
	return ready
}, gcpSnapshotPollTimeout, gcpSnapshotPollInterval).Should(o.BeTrue(), "snapshot should be ready")
Don't suppress snapshot readiness errors in the poll loops.
These Eventually blocks discard isSnapshotReady errors, so API failures or controller-reported snapshot errors only surface as a generic 10-minute timeout. test/extended/storage/driver_configuration.go already has getSnapshotErrorMessage(...); use that and the get error to fail with the real reason.
Also applies to: 212-215, 304-307, 330-333
Scheduling required tests:
@chao007: The following test failed, say
Full PR test history. Your PR dashboard. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 96c84a6
New tests seen in this PR at sha: 96c84a6
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/extended/storage/gcp_snapshot_images.go (1)
113-128: ⚠️ Potential issue | 🟠 Major: Fail snapshot polling on the first API/controller error.
These Eventually blocks still turn isSnapshotReady(...) failures into false and also discard the getSnapshotErrorMessage(...) lookup error, so a broken snapshot just burns the full 10-minute timeout and loses the real cause. Fail immediately when the readiness check errors, and include any snapshot error/get failure in that message. Also applies to: 220-235, 324-337, 360-373
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/storage/gcp_snapshot_images.go` around lines 113 - 128, Do not swallow snapshot API/controller errors inside the Eventually block: when isSnapshotReady(oc, snapshotName) returns an error, immediately fail the test with e2e.Failf and include that error; likewise, when !ready call getSnapshotErrorMessage(oc, snapshotName) and if it returns a non-empty message or an error, immediately fail with e2e.Failf including both the snapshot error string and any lookup error. Update the anonymous func used by Eventually (the block around isSnapshotReady/getSnapshotErrorMessage) to call e2e.Failf on those error cases instead of returning false so failures surface immediately.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/storage/gcp_snapshot_images.go`:
- Around line 97-103: The current polling loop that waits for pods like
writerPod to reach v1.PodSucceeded never exits early if a pod reaches
v1.PodFailed, causing long timeouts; add a shared helper (e.g.,
waitForPodSucceeded) that polls the pod phase via
oc.AdminKubeClient().CoreV1().Pods(...).Get(...), and if the phase is
v1.PodFailed immediately stop polling, fetch and emit the pod.Status.Reason and
recent logs, and return a failing assertion so tests surface failure
immediately; replace the repeated blocks that call
o.Eventually(...).Should(o.Equal(v1.PodSucceeded)) (locations referencing
writerPod and similar reader pods) with calls to this helper. Ensure the helper
accepts ctx, oc, pod name/object and the polling timeout/interval so it can be
reused in all the listed locations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0e1eabf-476e-43ab-ae9c-52de98ed66fc
📒 Files selected for processing (1)
test/extended/storage/gcp_snapshot_images.go
// Wait for Pod to complete
e2e.Logf("Waiting for writer Pod to complete")
o.Eventually(func() v1.PodPhase {
	pod, err := oc.AdminKubeClient().CoreV1().Pods(oc.Namespace()).Get(ctx, writerPod.Name, metav1.GetOptions{})
	o.Expect(err).NotTo(o.HaveOccurred())
	return pod.Status.Phase
}, gcpSnapshotPollTimeout, gcpSnapshotPollInterval).Should(o.Equal(v1.PodSucceeded))
Stop polling once a pod reaches Failed.
All of these waits only succeed on v1.PodSucceeded, so any writer/reader pod that exits unsuccessfully will sit in Failed until gcpSnapshotPollTimeout expires. That turns real data-mismatch or attach/mount failures into an extra 10 minutes of noise instead of surfacing the pod reason/logs immediately. A shared waitForPodSucceeded helper would fix this in one place.
Also applies to: 159-165, 204-210, 266-272, 311-315, 347-351, 403-407, 430-434
Scheduling required tests:
Add 5 test cases for STOR-2793.