From ed4123b27d79d539cdc0c1173641fce75109c80b Mon Sep 17 00:00:00 2001 From: Shubham Pampattiwar Date: Wed, 9 Apr 2025 19:21:11 -0700 Subject: [PATCH 1/5] Add design for DataProtectionTest CRD and Controller minor fix --- docs/design/data-protection-test.md | 243 ++++++++++++++++++++++++++++ 1 file changed, 243 insertions(+) create mode 100644 docs/design/data-protection-test.md diff --git a/docs/design/data-protection-test.md b/docs/design/data-protection-test.md new file mode 100644 index 0000000000..b524a93536 --- /dev/null +++ b/docs/design/data-protection-test.md @@ -0,0 +1,243 @@ +# DataProtectionTest CRD and Controller Design + +## Abstract + +This design introduces the **DataProtectionTest** (DPT) CRD and its controller to evaluate data protection performance—including backup upload and volume snapshot readiness—in an OpenShift cluster.. It supports two tests: + +1. **Upload Speed Test:** Measures the speed at which a dummy file is uploaded to cloud object storage using configuration from a BackupStorageLocation. +2. **CSI VolumeSnapshot Test:** Creates a VolumeSnapshot from a specified PVC and measures the time taken (as a duration) for the snapshot to become ready. + +Additionally, the design includes a mechanism to determine the S3-compatible vendor. + +## Background + +In several customer environments, backup/restore operations have experienced delays or stalls due to poor network connectivity, throttling at the cloud provider, or misconfigurations in BackupStorageLocation settings. Given the critical role of backup performance in disaster recovery, administrators need real-time, accurate metrics on how fast data uploads occur and how promptly snapshots are created. + +This design addresses that need by measuring: + +- **Upload speed:** How quickly data is transferred to object storage. +- **Snapshot performance:** How long a CSI VolumeSnapshot takes to become ReadyToUse. 
+- **S3 compatibility:** Identifying the specific S3 vendor by inspecting HTTP response headers from the storage endpoint. + +## Goals + +- Upload a test file and compute the speed in Mbps. +- Create a CSI VolumeSnapshot and measure the time taken for it to become ready (reported as a duration). +- Identify the S3-compatible vendor using an HTTP call that inspects response headers. +- Expose all results—including upload speed, snapshot ready duration, and S3 vendor—in the CRD status. +- Leverage existing BackupStorageLocation (from Velero/DPA) for configuration. + + +## Non-Goals + +- This design will not create or modify BackupStorageLocation entries in OADP. +- It will not implement download or latency tests, focusing solely on upload speed. +- Scheduling of recurring tests is not supported in the initial version. + +## High-Level Design +Components involved and their responsibilities: + +- **DataProtectionTest (DPT) CRD:** + - **Spec:** + - **backupLocation:** Contains Velero-based backup storage configuration. + - **uploadSpeedTestConfig:** Test parameters for the upload (file size, test timeout). + - **CSIVolumeSnapshotTestConfig:** Test parameters for the CSI VolumeSnapshot test (snapshot class, source PVC name, and importantly the PVC namespace, plus snapshot timeout). + - **Status:** + - **UploadTestStatus:** Groups upload speed (in Mbps), success flag, and error messages. + - **SnapshotTestStatus:** Groups the snapshot test results, reporting status and the duration taken for the snapshot to be ready. + - **S3Vendor:** Reports the detected S3 vendor string from vendor determination. +- **DataProtectionTest Controller:** + - Monitors DataProtectionTest CRs. + - Extracts configuration from the Velero backup location. + - Determines the S3 vendor via an HTTP HEAD call. + - Initializes the appropriate cloud provider using the CloudProvider interface. + - Executes the upload test and, if enabled, the CSI VolumeSnapshot test. 
+ - Updates the CRD status with grouped results. +- **CloudProvider Interface:** + - Defines an `UploadTest(ctx, config, bucket, fileSizeMB) (int64, error)` function. + - AWS-specific implementation (S3Provider) is provided using the AWS SDK. +- **Vendor Determination Logic:** + - A helper function performs an HTTP HEAD call to the **s3Url** and inspects headers (especially the `Server` header) to determine the vendor (e.g., "AWS", "MinIO", etc.). +- **OADP/DPA Integration:** + - BSL Configuration details from the DPA/Velero CR flow into the DataProtectionTest CR through established integrations. + +## Detailed Design + +#### Proposed DataProtectionTest CRD: + +DataProtectionTest (DPT) CRD would look like: + +```yaml +apiVersion: oadp.openshift.io/v1alpha1 +kind: DataProtectionTest +metadata: + name: my-data-protection-test +spec: + backupLocation: + velero: + provider: aws # Cloud provider type (aws, azure, gcp) + default: true + objectStorage: + bucket: sample-bucket + prefix: velero + config: + region: us-east-1 + profile: "default" + insecureSkipTLSVerify: "true" + s3Url: "https://s3.amazonaws.com" # indicates s3 compatibility + credential: + name: cloud-credentials # Secret for cloud credentials + key: cloud + uploadSpeedTestConfig: + fileSize: "100MB" # Size of file to upload for testing + testTimeout: "60s" # Maximum duration for upload test + CSIVolumeSnapshotTestConfig: + snapshotClassName: "csi-snapclass" + volumeSnapshotSource: + persistentVolumeClaimName: "my-pvc" + persistentVolumeClaimNamespace: "my-pvc-namespace" + timeout: "120s" # Snapshot readiness timeout +status: + lastTested: "2024-10-08T10:00:00Z" + uploadTest: + speedMbps: 55.3 + success: true + errorMessage: "" + snapshotTest: + status: "Ready" + readyDuration: "2m" # Duration taken for snapshot to be ready + errorMessage: "" + s3Vendor: "AWS" + +``` + +#### CloudProvider interface + +```go +package cloudprovider + +// CloudProvider defines the interface for cloud-based upload tests. 
+type CloudProvider interface { + // UploadTest performs a test upload and returns the speed in Mbps or error. + UploadTest(ctx context.Context, config v1alpha1.UploadSpeedTestConfig, bucket string, fileSizeMB int) (int64, error) +} + +``` + +#### Changes to DPA CRD BSL Configuration spec: + +The DPA CRD configuration includes fields to enable and configure the Upload Speed Test (UST) within the BSL configuration. + +```yaml +apiVersion: oadp.openshift.io/v1alpha1 +kind: DataProtectionApplication +metadata: + name: sample-dpa +spec: + backupStorageLocations: + - name: aws-backup-location + BSLSpec: ... + uploadSpeedTest: + enabled: true # Flag to enable upload speed test + fileSize: "10MB" # Size of the file to be uploaded + testTimeout: "60s" # Timeout for the upload test +``` + +#### DPT controller workflow General workflow for DPT CR processing - user created or via DPA CR): + +1. Retrieve the DPT CR: At the start of the reconcile loop the controller fetches the DPT CR from the API server +2. Determine the s3 compatible vendor if applicable: Identify the s3 vendor by performing an HTTP request to the configured s3 storage and inspecting the response headers. +```go +// determineVendor inspects the s3Url via an HTTP HEAD request +// and extracts the S3-compatible vendor name from the Server header. +func (r *DataProtectionTestReconciler) determineVendor(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest) error { + // check if s3Url is specified + // Send an HTTP HEAD request to the storage URL + // Parse response 'Server' header to detect vendor +} +``` +3. Initialize the Cloud Provider for the Upload Test: Instantiate a cloud provider based on the BSL config. +```go +// initializeProvider constructs a CloudProvider (currently S3) based on config and credentials from the DPT CR. 
+func (r *DataProtectionTestReconciler) initializeProvider(dpt *oadpv1alpha1.DataProtectionTest) (cloudprovider.CloudProvider, error) { + + // Get region (default to us-east-1 if not specified) + // s3Url is required for custom endpoints (e.g., MinIO, Ceph) + // Load credentials from Kubernetes secret + // Parse access/secret key pair from the secret data + // Return a CloudProvider instance (S3Provider) +} + +``` +4. Execute the Upload Speed test: Upload a dummy file of specified size to the object storage to measure the data transfer speed. +```go +// runUploadTest uploads a dummy file to object storage and calculates the speed in Mbps. +func (r *DataProtectionTestReconciler) runUploadTest(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest, cp cloudprovider.CloudProvider) error { + // Parse the file size (e.g., "100MB" -> 100) + // Upload to the target bucket defined in the BackupLocation + // Success: update speed on status +} + +``` +5. Execute the CSI VolumeSnapshot Test (If enabled): Create a CSI VolumeSnapshot for a specified PVC and measure the time taken for it to be ready. +```go +// runSnapshotTest creates a CSI VolumeSnapshot from a PVC and measures the time until it's ReadyToUse. +func (r *DataProtectionTestReconciler) runSnapshotTest(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest) error { + // Get PVC and snapshot class info + // Parse snapshot readiness timeout duration + // Create the VolumeSnapshot + // Poll for ReadyToUse status within timeout + // Check if snapshot is ReadyToUse + // Success: capture duration +} + +``` +6. Update the Status of the DPT CR: Consolidate results from the upload test, snapshot test, and vendor detection, and update the CR status. + + +```mermaid +flowchart TD + A[Start Reconciliation] --> B[Fetch DPT CR
If not found, exit] + B --> C[Determine S3 Vendor via inspecting HTTP request
If s3Compatible is true] + C --> D[Initialize Cloud Provider from BSL config] + D --> E[Check if uploadSpeedTestConfig is present] + E -->|Yes| F[Run Upload Speed Test
Upload dummy file, measure speed] + E -->|No| G[Skip Upload Test] + F --> H + G --> H + H[Check if CSI VolumeSnapshot Test is enabled] -->|Yes| I[Create VolumeSnapshot
Poll until ReadyToUse] + H -->|No| J[Skip Snapshot Test] + I --> K[Calculate readyDuration] + J --> K + K --> L[Update DPT Status
Set uploadSpeed, snapshot ready duration, vendor, lastTested] + L --> M[End Reconciliation] + +``` + +#### Integration with DPA controller: +1. During reconciliation, the DPA controller will inspect each BackupStorageLocation (BSL) defined in the DataProtectionApplication (DPA) CR. +2. If a BSL has `uploadSpeedTest.enabled: true`, the controller will: + 1. Construct a corresponding `DataProtectionTest` (DPT) CR **per BSL**. + 2. Populate `spec.backupLocation` using the BSL's provider, object storage, config, and credentials. + 3. Populate `spec.uploadSpeedTestConfig` using values from `uploadSpeedTest.fileSize` and `uploadSpeedTest.testTimeout`. + +**Note:** +- Upload speed test configuration is supported per `BackupStorageLocation` (BSL) via the DPA CR using the `uploadSpeedTest` field. +- There is no support for CSI VolumeSnapshot test configuration in the DPA CR. +- Users who wish to run snapshot readiness tests must manually create a `DataProtectionTest` (DPT) CR with the appropriate `CSIVolumeSnapshotTestConfig`. + + +## Implementation + +- We are targeting this feature for OADP 1.5 +- The implementation would be done in small phases: + 1. First phase would independent introduction of DPT CRD and controller (only for AWS provider) + 1. Then next would be enabling integration with OADP/DPA + 1. Followed by remaining cloud provider Azure and GCP + +## Future Scope + +- Recurring Tests: Support for recurring tests could be added by integrating with a scheduling system. +- Enhanced Metrics: Consider additional metrics like latency or download speed. 
+ +## Open Questions From b3c7f8ed8934d0a6a72b09c3d93a972a26de6927 Mon Sep 17 00:00:00 2001 From: Shubham Pampattiwar Date: Mon, 14 Apr 2025 10:14:45 -0700 Subject: [PATCH 2/5] remove DPA integration and add support for multiple CSI VS --- docs/design/data-protection-test.md | 124 +++++++++++----------------- 1 file changed, 50 insertions(+), 74 deletions(-) diff --git a/docs/design/data-protection-test.md b/docs/design/data-protection-test.md index b524a93536..bc259baf58 100644 --- a/docs/design/data-protection-test.md +++ b/docs/design/data-protection-test.md @@ -2,10 +2,10 @@ ## Abstract -This design introduces the **DataProtectionTest** (DPT) CRD and its controller to evaluate data protection performance—including backup upload and volume snapshot readiness—in an OpenShift cluster.. It supports two tests: +This design introduces the **DataProtectionTest** (DPT) CRD and its controller to evaluate data protection performance—including backup upload and volume snapshot readiness—in an OpenShift cluster. It supports two tests: -1. **Upload Speed Test:** Measures the speed at which a dummy file is uploaded to cloud object storage using configuration from a BackupStorageLocation. -2. **CSI VolumeSnapshot Test:** Creates a VolumeSnapshot from a specified PVC and measures the time taken (as a duration) for the snapshot to become ready. +1. **Upload Speed Test:** Measures the speed at which a test data is uploaded to cloud object storage using configuration from a BackupStorageLocation. +2. **CSI VolumeSnapshot Test:** Creates a VolumeSnapshot(s) from a specified PVC source(s) and measures the time taken (as a duration) for the snapshot(s) to become ready. Additionally, the design includes a mechanism to determine the S3-compatible vendor. @@ -16,16 +16,16 @@ In several customer environments, backup/restore operations have experienced del This design addresses that need by measuring: - **Upload speed:** How quickly data is transferred to object storage. 
-- **Snapshot performance:** How long a CSI VolumeSnapshot takes to become ReadyToUse. +- **Snapshot performance:** How long a CSI VolumeSnapshot(s) take to become ReadyToUse. - **S3 compatibility:** Identifying the specific S3 vendor by inspecting HTTP response headers from the storage endpoint. ## Goals -- Upload a test file and compute the speed in Mbps. -- Create a CSI VolumeSnapshot and measure the time taken for it to become ready (reported as a duration). +- Upload data and compute the speed in Mbps. +- Create a CSI VolumeSnapshot(s) and measure the time taken for it to become ready (reported as a duration). - Identify the S3-compatible vendor using an HTTP call that inspects response headers. - Expose all results—including upload speed, snapshot ready duration, and S3 vendor—in the CRD status. -- Leverage existing BackupStorageLocation (from Velero/DPA) for configuration. +- Must gather will be updated th gather the DPT CR(s) for insights. ## Non-Goals @@ -33,6 +33,7 @@ This design addresses that need by measuring: - This design will not create or modify BackupStorageLocation entries in OADP. - It will not implement download or latency tests, focusing solely on upload speed. - Scheduling of recurring tests is not supported in the initial version. +- Integration via DPA CR ## High-Level Design Components involved and their responsibilities: @@ -48,7 +49,7 @@ Components involved and their responsibilities: - **S3Vendor:** Reports the detected S3 vendor string from vendor determination. - **DataProtectionTest Controller:** - Monitors DataProtectionTest CRs. - - Extracts configuration from the Velero backup location. + - Extracts configuration from the backup location spec. - Determines the S3 vendor via an HTTP HEAD call. - Initializes the appropriate cloud provider using the CloudProvider interface. - Executes the upload test and, if enabled, the CSI VolumeSnapshot test. 
@@ -58,8 +59,6 @@ Components involved and their responsibilities: - AWS-specific implementation (S3Provider) is provided using the AWS SDK. - **Vendor Determination Logic:** - A helper function performs an HTTP HEAD call to the **s3Url** and inspects headers (especially the `Server` header) to determine the vendor (e.g., "AWS", "MinIO", etc.). -- **OADP/DPA Integration:** - - BSL Configuration details from the DPA/Velero CR flow into the DataProtectionTest CR through established integrations. ## Detailed Design @@ -74,39 +73,48 @@ metadata: name: my-data-protection-test spec: backupLocation: - velero: - provider: aws # Cloud provider type (aws, azure, gcp) - default: true - objectStorage: - bucket: sample-bucket - prefix: velero - config: - region: us-east-1 - profile: "default" - insecureSkipTLSVerify: "true" - s3Url: "https://s3.amazonaws.com" # indicates s3 compatibility - credential: - name: cloud-credentials # Secret for cloud credentials - key: cloud + provider: aws # Cloud provider type (aws, azure, gcp) + default: true + objectStorage: + bucket: sample-bucket + prefix: velero + config: + region: us-east-1 + profile: "default" + insecureSkipTLSVerify: "true" + s3Url: "https://s3.amazonaws.com" # indicates s3 compatibility + credential: + name: cloud-credentials # Secret for cloud credentials + key: cloud uploadSpeedTestConfig: fileSize: "100MB" # Size of file to upload for testing testTimeout: "60s" # Maximum duration for upload test - CSIVolumeSnapshotTestConfig: - snapshotClassName: "csi-snapclass" - volumeSnapshotSource: - persistentVolumeClaimName: "my-pvc" - persistentVolumeClaimNamespace: "my-pvc-namespace" - timeout: "120s" # Snapshot readiness timeout + CSIVolumeSnapshotTestConfigs: + - snapshotClassName: "csi-snapclass" + volumeSnapshotSource: + persistentVolumeClaimName: "db-pvc-1" + persistentVolumeClaimNamespace: "app1" + timeout: "60s" + - snapshotClassName: "csi-snapclass" + volumeSnapshotSource: + persistentVolumeClaimName: "db-pvc-2" + 
persistentVolumeClaimNamespace: "app1" + timeout: "60s" status: lastTested: "2024-10-08T10:00:00Z" uploadTest: speedMbps: 55.3 success: true errorMessage: "" - snapshotTest: + snapshotTests: + - persistentVolumeClaimName: "db-pvc-1" + persistentVolumeClaimNamespace: "app1" status: "Ready" - readyDuration: "2m" # Duration taken for snapshot to be ready - errorMessage: "" + readyDuration: "45s" + - persistentVolumeClaimName: "db-pvc-2" + persistentVolumeClaimNamespace: "app1" + status: "Failed" + errorMessage: "timeout waiting for snapshot readiness" s3Vendor: "AWS" ``` @@ -124,25 +132,6 @@ type CloudProvider interface { ``` -#### Changes to DPA CRD BSL Configuration spec: - -The DPA CRD configuration includes fields to enable and configure the Upload Speed Test (UST) within the BSL configuration. - -```yaml -apiVersion: oadp.openshift.io/v1alpha1 -kind: DataProtectionApplication -metadata: - name: sample-dpa -spec: - backupStorageLocations: - - name: aws-backup-location - BSLSpec: ... - uploadSpeedTest: - enabled: true # Flag to enable upload speed test - fileSize: "10MB" # Size of the file to be uploaded - testTimeout: "60s" # Timeout for the upload test -``` - #### DPT controller workflow General workflow for DPT CR processing - user created or via DPA CR): 1. Retrieve the DPT CR: At the start of the reconcile loop the controller fetches the DPT CR from the API server @@ -169,23 +158,23 @@ func (r *DataProtectionTestReconciler) initializeProvider(dpt *oadpv1alpha1.Data } ``` -4. Execute the Upload Speed test: Upload a dummy file of specified size to the object storage to measure the data transfer speed. +4. Execute the Upload Speed test: Upload test data of specified size to the object storage to measure the data transfer speed. ```go -// runUploadTest uploads a dummy file to object storage and calculates the speed in Mbps. +// runUploadTest uploads test data to object storage and calculates the speed in Mbps. 
func (r *DataProtectionTestReconciler) runUploadTest(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest, cp cloudprovider.CloudProvider) error { - // Parse the file size (e.g., "100MB" -> 100) + // Parse the data size (e.g., "100MB" -> 100) // Upload to the target bucket defined in the BackupLocation // Success: update speed on status } ``` -5. Execute the CSI VolumeSnapshot Test (If enabled): Create a CSI VolumeSnapshot for a specified PVC and measure the time taken for it to be ready. +5. Execute the CSI VolumeSnapshot Test: Create a CSI VolumeSnapshot(s) for a specified PVC(s) and measure the time taken for it to be ready. ```go -// runSnapshotTest creates a CSI VolumeSnapshot from a PVC and measures the time until it's ReadyToUse. +// runSnapshotTest creates a CSI VolumeSnapshot(s) from a PVC(s) and measures the time until it's ReadyToUse. func (r *DataProtectionTestReconciler) runSnapshotTest(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest) error { - // Get PVC and snapshot class info - // Parse snapshot readiness timeout duration - // Create the VolumeSnapshot + // Get PVC(s) and snapshot class info + // Parse snapshot(s) readiness timeout duration + // Create the VolumeSnapshot(s) // Poll for ReadyToUse status within timeout // Check if snapshot is ReadyToUse // Success: capture duration @@ -205,7 +194,7 @@ flowchart TD E -->|No| G[Skip Upload Test] F --> H G --> H - H[Check if CSI VolumeSnapshot Test is enabled] -->|Yes| I[Create VolumeSnapshot
Poll until ReadyToUse] + H[Check for CSI VolumeSnapshot Test config] -->|Yes| I[Create VolumeSnapshot
Poll until ReadyToUse] H -->|No| J[Skip Snapshot Test] I --> K[Calculate readyDuration] J --> K @@ -214,25 +203,12 @@ flowchart TD ``` -#### Integration with DPA controller: -1. During reconciliation, the DPA controller will inspect each BackupStorageLocation (BSL) defined in the DataProtectionApplication (DPA) CR. -2. If a BSL has `uploadSpeedTest.enabled: true`, the controller will: - 1. Construct a corresponding `DataProtectionTest` (DPT) CR **per BSL**. - 2. Populate `spec.backupLocation` using the BSL's provider, object storage, config, and credentials. - 3. Populate `spec.uploadSpeedTestConfig` using values from `uploadSpeedTest.fileSize` and `uploadSpeedTest.testTimeout`. - -**Note:** -- Upload speed test configuration is supported per `BackupStorageLocation` (BSL) via the DPA CR using the `uploadSpeedTest` field. -- There is no support for CSI VolumeSnapshot test configuration in the DPA CR. -- Users who wish to run snapshot readiness tests must manually create a `DataProtectionTest` (DPT) CR with the appropriate `CSIVolumeSnapshotTestConfig`. - ## Implementation - We are targeting this feature for OADP 1.5 - The implementation would be done in small phases: 1. First phase would independent introduction of DPT CRD and controller (only for AWS provider) - 1. Then next would be enabling integration with OADP/DPA 1. 
Followed by remaining cloud provider Azure and GCP ## Future Scope From 6103b669270dc3ba9eda4208c8f617e5e2f9fa53 Mon Sep 17 00:00:00 2001 From: Shubham Pampattiwar Date: Mon, 14 Apr 2025 17:39:53 -0700 Subject: [PATCH 3/5] fix typos --- docs/design/data-protection-test.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/design/data-protection-test.md b/docs/design/data-protection-test.md index bc259baf58..618e8f9b09 100644 --- a/docs/design/data-protection-test.md +++ b/docs/design/data-protection-test.md @@ -4,7 +4,7 @@ This design introduces the **DataProtectionTest** (DPT) CRD and its controller to evaluate data protection performance—including backup upload and volume snapshot readiness—in an OpenShift cluster. It supports two tests: -1. **Upload Speed Test:** Measures the speed at which a test data is uploaded to cloud object storage using configuration from a BackupStorageLocation. +1. **Upload Speed Test:** Measures the speed at which test data is uploaded to cloud object storage using configuration from a BackupStorageLocation. 2. **CSI VolumeSnapshot Test:** Creates a VolumeSnapshot(s) from a specified PVC source(s) and measures the time taken (as a duration) for the snapshot(s) to become ready. Additionally, the design includes a mechanism to determine the S3-compatible vendor. @@ -25,7 +25,7 @@ This design addresses that need by measuring: - Create a CSI VolumeSnapshot(s) and measure the time taken for it to become ready (reported as a duration). - Identify the S3-compatible vendor using an HTTP call that inspects response headers. - Expose all results—including upload speed, snapshot ready duration, and S3 vendor—in the CRD status. -- Must gather will be updated th gather the DPT CR(s) for insights. +- Must gather will be updated to gather the DPT CR(s) for insights. 
## Non-Goals @@ -42,10 +42,10 @@ Components involved and their responsibilities: - **Spec:** - **backupLocation:** Contains Velero-based backup storage configuration. - **uploadSpeedTestConfig:** Test parameters for the upload (file size, test timeout). - - **CSIVolumeSnapshotTestConfig:** Test parameters for the CSI VolumeSnapshot test (snapshot class, source PVC name, and importantly the PVC namespace, plus snapshot timeout). + - **CSIVolumeSnapshotTestConfig:** Test parameters for the CSI VolumeSnapshot test (snapshot class, source PVC name and namespace for each PVC, plus snapshot timeout). - **Status:** - **UploadTestStatus:** Groups upload speed (in Mbps), success flag, and error messages. - - **SnapshotTestStatus:** Groups the snapshot test results, reporting status and the duration taken for the snapshot to be ready. + - **SnapshotTestStatus:** Groups the snapshot test results, reporting status and the duration taken for the snapshot(s) to be ready. - **S3Vendor:** Reports the detected S3 vendor string from vendor determination. - **DataProtectionTest Controller:** - Monitors DataProtectionTest CRs. From c8affd45c8ee4c3bd14f4bb7b3abba3fe35fa4a7 Mon Sep 17 00:00:00 2001 From: Shubham Pampattiwar Date: Mon, 14 Apr 2025 23:32:08 -0700 Subject: [PATCH 4/5] add duration, bucket metadata support: encyrption and versioning status --- docs/design/data-protection-test.md | 53 ++++++++++++++++++++++------- 1 file changed, 40 insertions(+), 13 deletions(-) diff --git a/docs/design/data-protection-test.md b/docs/design/data-protection-test.md index 618e8f9b09..69e0ffc316 100644 --- a/docs/design/data-protection-test.md +++ b/docs/design/data-protection-test.md @@ -7,7 +7,7 @@ This design introduces the **DataProtectionTest** (DPT) CRD and its controller t 1. **Upload Speed Test:** Measures the speed at which test data is uploaded to cloud object storage using configuration from a BackupStorageLocation. 2. 
**CSI VolumeSnapshot Test:** Creates a VolumeSnapshot(s) from a specified PVC source(s) and measures the time taken (as a duration) for the snapshot(s) to become ready. -Additionally, the design includes a mechanism to determine the S3-compatible vendor. +Additionally, the design includes a mechanism to determine the S3-compatible vendor. DPT will also report object storage bucket metadata, such as encryption algorithm and versioning status ## Background @@ -24,7 +24,8 @@ This design addresses that need by measuring: - Upload data and compute the speed in Mbps. - Create a CSI VolumeSnapshot(s) and measure the time taken for it to become ready (reported as a duration). - Identify the S3-compatible vendor using an HTTP call that inspects response headers. -- Expose all results—including upload speed, snapshot ready duration, and S3 vendor—in the CRD status. +- Gather information on object storage bucket metadata like versioning and encryption. +- Expose all results—including upload speed, snapshot ready duration, bucket metadata and S3 vendor—in the CRD status. - Must gather will be updated to gather the DPT CR(s) for insights. @@ -40,13 +41,17 @@ Components involved and their responsibilities: - **DataProtectionTest (DPT) CRD:** - **Spec:** - - **backupLocation:** Contains Velero-based backup storage configuration. + - **backupLocationSpec:** Contains Velero-based backup storage configuration. + - **backupLocationRef:** Contains the Name and Namespace reference for Velero BSL - **uploadSpeedTestConfig:** Test parameters for the upload (file size, test timeout). - **CSIVolumeSnapshotTestConfig:** Test parameters for the CSI VolumeSnapshot test (snapshot class, source PVC name and namespace for each PVC, plus snapshot timeout). - **Status:** - - **UploadTestStatus:** Groups upload speed (in Mbps), success flag, and error messages. + - **UploadTestStatus:** Groups upload speed (in Mbps), duration of time taken to upload data, success flag, and error messages. 
- **SnapshotTestStatus:** Groups the snapshot test results, reporting status and the duration taken for the snapshot(s) to be ready. - - **S3Vendor:** Reports the detected S3 vendor string from vendor determination. + - **S3Vendor:** Reports the detected S3 vendor string from vendor determination. (This is only applicable for S3-compatible object storage providers) + - **BucketMetadata:** Reports the encryptionAlgorithm used for the bucket as well as the versioningStatus. + +**Note:** Either `backupLocationSpec` or `backupLocationRef` will be processed for a particular DPT instance, if both are specified DPT would error out. - **DataProtectionTest Controller:** - Monitors DataProtectionTest CRs. - Extracts configuration from the backup location spec. @@ -56,6 +61,7 @@ Components involved and their responsibilities: - Updates the CRD status with grouped results. - **CloudProvider Interface:** - Defines an `UploadTest(ctx, config, bucket, fileSizeMB) (int64, error)` function. + - Defines a `GetBucketMetadata(ctx context.Context, bucket string) (*v1alpha1.BucketMetadata, error)` function. - AWS-specific implementation (S3Provider) is provided using the AWS SDK. - **Vendor Determination Logic:** - A helper function performs an HTTP HEAD call to the **s3Url** and inspects headers (especially the `Server` header) to determine the vendor (e.g., "AWS", "MinIO", etc.). 
@@ -72,7 +78,10 @@ kind: DataProtectionTest metadata: name: my-data-protection-test spec: - backupLocation: + backupLocationRef: # optional, either this or backupLocationSpec + name: aws-bsl + namespace: openshift-adp + backupLocationSpec: provider: aws # Cloud provider type (aws, azure, gcp) default: true objectStorage: @@ -102,8 +111,12 @@ spec: timeout: "60s" status: lastTested: "2024-10-08T10:00:00Z" + bucketMetadata: + encryptionAlgorithm: AES256 + versioningStatus: Enabled uploadTest: speedMbps: 55.3 + duration: 4.1s success: true errorMessage: "" snapshotTests: @@ -126,8 +139,11 @@ package cloudprovider // CloudProvider defines the interface for cloud-based upload tests. type CloudProvider interface { - // UploadTest performs a test upload and returns the speed in Mbps or error. - UploadTest(ctx context.Context, config v1alpha1.UploadSpeedTestConfig, bucket string, fileSizeMB int) (int64, error) + // UploadTest performs a test upload and returns the speed in Mbps or error. + UploadTest(ctx context.Context, config v1alpha1.UploadSpeedTestConfig, bucket string, fileSizeMB int) (int64, error) + + // GetBucketMetadata Fetches the object storage bucket metadata like encryptionAlgorithm and versioning status + GetBucketMetadata(ctx context.Context, bucket string) (*v1alpha1.BucketMetadata, error) } ``` @@ -164,11 +180,21 @@ func (r *DataProtectionTestReconciler) initializeProvider(dpt *oadpv1alpha1.Data func (r *DataProtectionTestReconciler) runUploadTest(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest, cp cloudprovider.CloudProvider) error { // Parse the data size (e.g., "100MB" -> 100) // Upload to the target bucket defined in the BackupLocation - // Success: update speed on status + // Success: update speed and duration on status } ``` -5. Execute the CSI VolumeSnapshot Test: Create a CSI VolumeSnapshot(s) for a specified PVC(s) and measure the time taken for it to be ready. +5. Fetch the object storage bucket metadata. 
+```go +// getBucketMetadata fetches the object storage bucket metadata +func (r *DataProtectionTestReconciler) getBucketMetadata(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest, cp cloudprovider.CloudProvider) error { + // Get the bucket metadata + // Success: update status to add encryptionAlgorithm and versioning status +} + +``` + +6. Execute the CSI VolumeSnapshot Test: Create a CSI VolumeSnapshot(s) for a specified PVC(s) and measure the time taken for it to be ready. ```go // runSnapshotTest creates a CSI VolumeSnapshot(s) from a PVC(s) and measures the time until it's ReadyToUse. func (r *DataProtectionTestReconciler) runSnapshotTest(ctx context.Context, dpt *oadpv1alpha1.DataProtectionTest) error { @@ -190,15 +216,16 @@ flowchart TD B --> C[Determine S3 Vendor via inspecting HTTP request
If s3Compatible is true] C --> D[Initialize Cloud Provider from BSL config] D --> E[Check if uploadSpeedTestConfig is present] - E -->|Yes| F[Run Upload Speed Test
Upload dummy file, measure speed] + E -->|Yes| F[Run Upload Speed Test
Upload test data, measure speed] + F --> FF[Get Bucket Metadata like versioning status and encryption algorithm used] E -->|No| G[Skip Upload Test] - F --> H + FF --> H G --> H H[Check for CSI VolumeSnapshot Test config] -->|Yes| I[Create VolumeSnapshot
Poll until ReadyToUse] H -->|No| J[Skip Snapshot Test] I --> K[Calculate readyDuration] J --> K - K --> L[Update DPT Status
Set uploadSpeed, snapshot ready duration, vendor, lastTested] + K --> L[Update DPT Status
Set uploadSpeed, snapshot ready duration, vendor, bucket metadata, lastTested] L --> M[End Reconciliation] ``` From 1a18bbf3b5c13e99f96eb99b258613aa782790f1 Mon Sep 17 00:00:00 2001 From: Shubham Pampattiwar Date: Tue, 15 Apr 2025 07:40:01 -0700 Subject: [PATCH 5/5] Remove BSL NS Ref, just use name --- docs/design/data-protection-test.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/design/data-protection-test.md b/docs/design/data-protection-test.md index 69e0ffc316..8ce42e8f0c 100644 --- a/docs/design/data-protection-test.md +++ b/docs/design/data-protection-test.md @@ -37,12 +37,15 @@ This design addresses that need by measuring: - Integration via DPA CR ## High-Level Design + +This controller and CRD will be part of the OADP Operator. The controller will run inside the OADP controller manager pod; there won't be a separate pod. + Components involved and their responsibilities: - **DataProtectionTest (DPT) CRD:** - **Spec:** - **backupLocationSpec:** Contains Velero-based backup storage configuration. - - **backupLocationRef:** Contains the Name and Namespace reference for Velero BSL + - **backupLocationName:** Name of the Velero BSL to use. - **uploadSpeedTestConfig:** Test parameters for the upload (file size, test timeout). - **CSIVolumeSnapshotTestConfig:** Test parameters for the CSI VolumeSnapshot test (snapshot class, source PVC name and namespace for each PVC, plus snapshot timeout). - **Status:** @@ -51,7 +54,8 @@ Components involved and their responsibilities: - **S3Vendor:** Reports the detected S3 vendor string from vendor determination. (This is only applicable for S3-compatible object storage providers) - **BucketMetadata:** Reports the encryptionAlgorithm used for the bucket as well as the versioningStatus. -**Note:** Either `backupLocationSpec` or `backupLocationRef` will be processed for a particular DPT instance, if both are specified DPT would error out. 
+**Note:** Either `backupLocationSpec` or `backupLocationName` is processed for a given DPT instance; if both are specified, the DPT errors out. + - **DataProtectionTest Controller:** - Monitors DataProtectionTest CRs. - Extracts configuration from the backup location spec. @@ -65,6 +69,8 @@ Components involved and their responsibilities: - AWS-specific implementation (S3Provider) is provided using the AWS SDK. - **Vendor Determination Logic:** - A helper function performs an HTTP HEAD call to the **s3Url** and inspects headers (especially the `Server` header) to determine the vendor (e.g., "AWS", "MinIO", etc.). +- **Bucket Metadata Retrieval:** + - The DPT controller retrieves encryption and versioning configuration for the target object storage bucket using the cloud provider SDK. ## Detailed Design @@ -78,9 +84,7 @@ kind: DataProtectionTest metadata: name: my-data-protection-test spec: - backupLocationRef: # optional, either this or backupLocationSpec - name: aws-bsl - namespace: openshift-adp + backupLocationName: aws-bsl # optional, either this or backupLocationSpec backupLocationSpec: provider: aws # Cloud provider type (aws, azure, gcp) default: true