Skip to content

Merge https://github.com/vmware-tanzu/velero:main (66ac235) into oadp-dev#483

Open
oadp-rebasebot-app[bot] wants to merge 56 commits intoopenshift:oadp-devfrom
oadp-rebasebot:rebase-bot-oadp-dev
Open

Merge https://github.com/vmware-tanzu/velero:main (66ac235) into oadp-dev#483
oadp-rebasebot-app[bot] wants to merge 56 commits intoopenshift:oadp-devfrom
oadp-rebasebot:rebase-bot-oadp-dev

Conversation

@oadp-rebasebot-app
Copy link

@oadp-rebasebot-app oadp-rebasebot-app bot commented Mar 12, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed Data Block Restore getting stuck when CSI snapshots are unavailable in cloud provider.
    • Improved error detection and cleanup for snapshot content failures.
  • Documentation

    • Added itemOperationTimeout field to Schedule API type documentation.
    • Removed DocSearch integration from documentation site.
  • Tests

    • Added end-to-end test coverage for restore with multiple exec hooks.

dongqingcc and others added 30 commits March 3, 2026 17:53
…estore exec hooks.

Signed-off-by: dongqingcc <dongqingcc@vmware.com>
Revert PR 6105.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
…mware-tanzu#9581)

* Fix DBR stuck when CSI snapshot no longer exists in cloud provider

During backup deletion, VolumeSnapshotContentDeleteItemAction creates a
new VSC with the snapshot handle from the backup and polls for readiness.
If the underlying snapshot no longer exists (e.g., deleted externally),
the CSI driver reports Status.Error but checkVSCReadiness() only checks
ReadyToUse, causing it to poll for the full 10-minute timeout instead of
failing fast. Additionally, the newly created VSC is never cleaned up on
failure, leaving orphaned resources in the cluster.

This commit:
- Adds Status.Error detection in checkVSCReadiness() to fail immediately
  on permanent CSI driver errors (e.g., InvalidSnapshot.NotFound)
- Cleans up the dangling VSC when readiness polling fails

Fixes vmware-tanzu#9579

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Add changelog for PR vmware-tanzu#9581

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

* Fix typo in pod_volume_test.go: colume -> volume

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

---------

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
The itemOperationTimeout field was missing from the Schedule API type
documentation even though it is supported in the Schedule CRD template.
This led users to believe the field was not available per-schedule.

Fixes vmware-tanzu#9598

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
…disable_search_in_site

Disable Algolia docs search
Signed-off-by: lou <alex1988@outlook.com>
Co-authored-by: lou <alex1988@outlook.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
1. Skip deleting the restore files from storage if the backup/BSL is not found
2. Allow deleting the restore files from storage even though the BSL is readonly

Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Add PSA audit and warn labels.

Signed-off-by: Xun Jiang <jxun@vmware.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: allenxu404 <qix2@vmware.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
(cherry picked from commit 8e01d1b)

Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
Replace the base image with paketobuildpacks image

Fixes vmware-tanzu#6851

Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: allenxu404 <qix2@vmware.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Use informer cache with dynamic client for Get calls on restore
When enabled, also make the Get call before create.

Add server and install parameter to allow disabling this feature,
but enable by default

Signed-off-by: Scott Seago <sseago@redhat.com>
When creating resources with generateName, apimachinery
does not guarantee uniqueness when it appends the random
suffix to the generateName stub, so if it fails with
already exists error, we need to retry.

Signed-off-by: Scott Seago <sseago@redhat.com>
Signed-off-by: Sebastian Glab <sglab@catalogicsoftware.com>
Signed-off-by: allenxu404 <qix2@vmware.com>
PVC block mode backup and restore introduced some OS specific
system calls. Those calls are not available for Windows, so
add both non Windows version and Windows version code, and
return error for block mode on the Windows platform.

Signed-off-by: Xun Jiang <jxun@vmware.com>
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
Signed-off-by: allenxu404 <qix2@vmware.com>
add UBI dockerfiles
Use numeric user for velero-restic-restore-helper
Enable multiarch builds (openshift#135)
Use arm64-graviton2 for arm builds (openshift#137)
Add required keys for arm builds (openshift#139)
Update Travis build job to work w/o changes on new branches
Use a full VM for arm
Use numeric non-root user for nonroot SCC compatibility
kaovilai and others added 21 commits March 12, 2026 02:06
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fix: ARM images

Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>

* fixup! fix: ARM images

Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>

---------

Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
…#336)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
…openshift#334) (openshift#338)

add missing unit test for kopia hashing algo (openshift#337)

Introduction of downstream only option to override Kopia default:
 - hashing algorithm
 - splitting algorithm
 - encryption algorithm

With introduction of 3 environment variables it is possible to override
Kopia algorithms used by Velero:

KOPIA_HASHING_ALGORITHM
KOPIA_SPLITTER_ALGORITHM
KOPIA_ENCRYPTION_ALGORITHM

If the env algorithms are not set or they are not within
Kopia SupportedAlgorithms, the default algorithm will be used.
This behavior is consistent with current behavior without this
change.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
Signed-off-by: Shubham Pampattiwar <shubhampampattiwar7@gmail.com>
The rework of Makefile to make it more readable and
inclusion of lint as a target as well extract
golangci-lint version from the upstream Dockerfile,
so we test in PROW or locally on the same version as upstream.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
This fixes the PR openshift#334 where one additional line was
in the code. This was not exposed previously as we
did not had downstream CI Lint jobs.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
* run oadp-operator e2e test from the velero repo

execute openshift/oadp-operator e2e tests directly
against the velero repo locally or via prow ci

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>

* update variable names, add a cleanup

* make sure env variable overrides default velero_image

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>

* add options to build, push, and only test

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>

* add arch to name

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>

* remove duplicated clean/rm operator checkout

* simplify by dropping export var and use a oneliner

Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>

* drop export and use oneliner

Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>

* just in case, allow oadp to be deployed from makefile

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>

* Update Makefile.prow

Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>

---------

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
Fixes linting error.

Signed-off-by: oadp-team-rebase-bot <oadp-maintainers@redhat.com>
Fix golangci-lint version extraction and disable concat-loop check.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 12, 2026

Walkthrough

The pull request addresses a CSI snapshot cleanup issue by adding error handling and fast-fail logic to VolumeSnapshotContent processing, includes corresponding unit and end-to-end tests, removes DocSearch integration from documentation, adds itemOperationTimeout field documentation, and updates the changelog.

Changes

Cohort / File(s) Summary
CSI VolumeSnapshotContent Error Handling
internal/delete/actions/csi/volumesnapshotcontent_action.go, internal/delete/actions/csi/volumesnapshotcontent_action_test.go
Added cleanup logic to delete created VSC on timeout in Execute method; added fast-fail path in checkVSCReadiness to detect Status.Error.Message. Includes test cases for error scenarios and helper functions (boolPtr, stringPtr).
DocSearch Integration Removal
site/algolia-crawler.json, site/layouts/docs/docs.html, site/layouts/partials/head-docs.html
Removed Algolia DocSearch crawler configuration, client-side search UI initialization, and external DocSearch resources (preconnect and CSS script) from documentation site.
Documentation and Changelog
changelogs/unreleased/9581-shubham-pampattiwar, site/content/docs/main/api-types/schedule.md
Added changelog entry for CSI snapshot fix and documented new itemOperationTimeout field in Schedule API spec.template with 4h default.
End-to-End Test for Restore Exec Hooks
test/e2e/basic/restore_exec_hooks.go, test/e2e/e2e_suite_test.go
Introduced new RestoreExecHooks test type with Init, CreateResources, and Verify methods to validate restore with multiple exec hooks. Registered test in e2e suite.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 4

❌ Failed checks (4 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title describes a merge operation rather than the substantive changes in the changeset. It does not clearly summarize the main technical changes (VSC cleanup logic, error handling, doc updates, or test additions). Replace the generic merge title with a specific summary of the primary changes, such as 'Fix DBR stuck when CSI snapshot no longer exists in cloud provider' which reflects the core bugfix and is already documented in the changelog.
Description check ⚠️ Warning The pull request has no description provided, making it impossible to verify compliance with the required template sections including summary, issue reference, DCO acceptance, changelog creation, and documentation updates. Add a detailed pull request description following the template: include a summary of changes, reference the issue being fixed, confirm DCO acceptance and changelog creation, and verify documentation has been updated.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning CSI unit test assertions lack meaningful failure messages, violating requirement #4; E2E test properly includes context-specific messages in all assertions. Add meaningful failure messages to all assertions in volumesnapshotcontent_action_test.go, such as 'require.NoError(t, err, "VSC should be created successfully")' to match the pattern demonstrated in the E2E test.
✅ Passed checks (1 passed)
Check name Status Explanation
Stable And Deterministic Test Names ✅ Passed The Ginkgo test title is stable and deterministic, containing only static descriptive language without dynamic values like pod names, timestamps, UUIDs, or node names. Dynamic values are appropriately placed in the test body, following best practices.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 12, 2026
@openshift-ci
Copy link

openshift-ci bot commented Mar 12, 2026

Hi @oadp-rebasebot-app[bot]. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
internal/delete/actions/csi/volumesnapshotcontent_action_test.go (2)

97-109: Test coverage could be more explicit about cleanup verification.

The test case validates that an error is returned when CSI reports an error, but it doesn't explicitly verify that the dangling VSC was cleaned up (deleted). Consider adding an assertion to check that the VSC no longer exists in the fake client after Execute returns, to fully validate the cleanup behavior described in the test name.

💡 Suggested enhancement to verify cleanup

After p.Execute(...) returns, you could add:

// Verify VSC was cleaned up
vscList := &snapshotv1api.VolumeSnapshotContentList{}
require.NoError(t, crClient.List(context.Background(), vscList))
require.Empty(t, vscList.Items, "Expected VSC to be cleaned up")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go` around
lines 97 - 109, The test should explicitly verify that the dangling
VolumeSnapshotContent was deleted after Execute; after calling p.Execute(...)
use the test client (crClient) to List VolumeSnapshotContent objects
(snapshotv1api.VolumeSnapshotContentList) and assert no items remain by calling
require.NoError for the List result and require.Empty on the list.Items
(referencing the existing vsc variable and the Execute call) so the test
confirms cleanup behavior rather than only checking expectErr.

242-248: Consider reusing existing pointer utilities.

The codebase already has pkg/util/boolptr (used in the main implementation file). For consistency, consider reusing that package for boolPtr. For stringPtr, this local helper is acceptable since there doesn't appear to be a corresponding utility in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go` around
lines 242 - 248, The local test helper boolPtr should be replaced with the
existing utility to ensure consistency: remove the local boolPtr function and
use the package pkg/util/boolptr (used in the implementation) wherever boolPtr()
is referenced in this test (e.g., in volumesnapshotcontent_action_test.go); keep
the local stringPtr helper as-is since no shared string pointer utility exists.
Ensure the import for the util boolptr package is added and update references
from boolPtr(...) to the util's API.
test/e2e/basic/restore_exec_hooks.go (1)

51-54: Unusual format string for namespace numbering.

The format %00000d is unusual and likely not producing the intended result. The sequence 00000 means the minimum width is 0 (the rightmost digit), so there's no zero-padding. If the intent is 5-digit zero-padded numbers, use %05d instead.

💡 Suggested fix
 	for nsNum := 0; nsNum < r.NamespacesTotal; nsNum++ {
-		createNSName := fmt.Sprintf("%s-%00000d", r.CaseBaseName, nsNum)
+		createNSName := fmt.Sprintf("%s-%05d", r.CaseBaseName, nsNum)
 		*r.NSIncluded = append(*r.NSIncluded, createNSName)
 	}

Note: This also appears at lines 78 and 142 in CreateResources() and Verify().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/basic/restore_exec_hooks.go` around lines 51 - 54, The namespace
name formatting in the loop that builds createNSName uses the incorrect format
string "%00000d" which doesn't produce zero-padded numbers; change it to "%05d"
in the loop that constructs createNSName (inside CreateResources()/the
namespace-building loop) and update the other occurrences of the same format
(the instances referenced in Verify() and the second CreateResources()
occurrence) to use "%05d" so namespace numbers are zero-padded to 5 digits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@site/content/docs/main/api-types/schedule.md`:
- Around line 66-69: Update the grammatical errors in the documentation values:
change "The default value is 4 hour." to "The default value is 4 hours." for the
itemOperationTimeout entry (referenced by the symbol itemOperationTimeout) and
also change "10 minute" to "10 minutes" in the related schedule/config entry
mentioned earlier (ensure the corresponding field name remains correct in the
same doc), keeping plural "minutes" and "hours" for the default value
descriptions.

---

Nitpick comments:
In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go`:
- Around line 97-109: The test should explicitly verify that the dangling
VolumeSnapshotContent was deleted after Execute; after calling p.Execute(...)
use the test client (crClient) to List VolumeSnapshotContent objects
(snapshotv1api.VolumeSnapshotContentList) and assert no items remain by calling
require.NoError for the List result and require.Empty on the list.Items
(referencing the existing vsc variable and the Execute call) so the test
confirms cleanup behavior rather than only checking expectErr.
- Around line 242-248: The local test helper boolPtr should be replaced with the
existing utility to ensure consistency: remove the local boolPtr function and
use the package pkg/util/boolptr (used in the implementation) wherever boolPtr()
is referenced in this test (e.g., in volumesnapshotcontent_action_test.go); keep
the local stringPtr helper as-is since no shared string pointer utility exists.
Ensure the import for the util boolptr package is added and update references
from boolPtr(...) to the util's API.

In `@test/e2e/basic/restore_exec_hooks.go`:
- Around line 51-54: The namespace name formatting in the loop that builds
createNSName uses the incorrect format string "%00000d" which doesn't produce
zero-padded numbers; change it to "%05d" in the loop that constructs
createNSName (inside CreateResources()/the namespace-building loop) and update
the other occurrences of the same format (the instances referenced in Verify()
and the second CreateResources() occurrence) to use "%05d" so namespace numbers
are zero-padded to 5 digits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9dacb3df-e509-4af3-8614-195975c0171b

📥 Commits

Reviewing files that changed from the base of the PR and between ca5442f and 3e35877.

📒 Files selected for processing (9)
  • changelogs/unreleased/9581-shubham-pampattiwar
  • internal/delete/actions/csi/volumesnapshotcontent_action.go
  • internal/delete/actions/csi/volumesnapshotcontent_action_test.go
  • site/algolia-crawler.json
  • site/content/docs/main/api-types/schedule.md
  • site/layouts/docs/docs.html
  • site/layouts/partials/head-docs.html
  • test/e2e/basic/restore_exec_hooks.go
  • test/e2e/e2e_suite_test.go
💤 Files with no reviewable changes (3)
  • site/layouts/docs/docs.html
  • site/layouts/partials/head-docs.html
  • site/algolia-crawler.json

Comment on lines +66 to +69
# ItemOperationTimeout specifies the time used to wait for
# asynchronous BackupItemAction operations
# The default value is 4 hour.
itemOperationTimeout: 4h
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix grammatical error in documentation.

Line 68 contains a grammatical error: "The default value is 4 hour." should be "The default value is 4 hours." (plural).

Note: The same grammatical issue exists in line 65 ("10 minute" should be "10 minutes"). Consider fixing both for consistency.

📝 Proposed fix
     # ItemOperationTimeout specifies the time used to wait for
     # asynchronous BackupItemAction operations
-    # The default value is 4 hour.
+    # The default value is 4 hours.
     itemOperationTimeout: 4h

If fixing line 65 as well:

     # CSISnapshotTimeout specifies the time used to wait for
     # CSI VolumeSnapshot status turns to ReadyToUse during creation, before
-    # returning error as timeout. The default value is 10 minute.
+    # returning error as timeout. The default value is 10 minutes.
     csiSnapshotTimeout: 10m
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# ItemOperationTimeout specifies the time used to wait for
# asynchronous BackupItemAction operations
# The default value is 4 hour.
itemOperationTimeout: 4h
# ItemOperationTimeout specifies the time used to wait for
# asynchronous BackupItemAction operations
# The default value is 4 hours.
itemOperationTimeout: 4h
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site/content/docs/main/api-types/schedule.md` around lines 66 - 69, Update
the grammatical errors in the documentation values: change "The default value is
4 hour." to "The default value is 4 hours." for the itemOperationTimeout entry
(referenced by the symbol itemOperationTimeout) and also change "10 minute" to
"10 minutes" in the related schedule/config entry mentioned earlier (ensure the
corresponding field name remains correct in the same doc), keeping plural
"minutes" and "hours" for the default value descriptions.

@openshift-ci
Copy link

openshift-ci bot commented Mar 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oadp-rebasebot-app[bot], sseago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.