-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bring Volume Snapshot to Beta #1133
Conversation
|
||
The rest of the document includes required information missing from the original design document: test plan and graduation criteria. | ||
|
||
## Test Plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also want to add a section for changes from the original design?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a section for changes since the original design doc. PTAL.
Is there a window of possibility this KEP will be merged and in an implementable state by EOD tomorrow? Enhancement Freeze is 7/30 and this is going to hinder this feature from being included in 1.16. Thanks |
@kacole2 Yes, we are working on it. Thanks. |
839f544
to
f0139af
Compare
|
||
## Changes | ||
|
||
Here are the changes since the original design proposal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also describe the changes to the CRD lifecycle and controller management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
* e2e tests for creating/deleting snapshot. | ||
* e2e tests for creating volume from snapshot. | ||
|
||
## Graduation Criteria |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add one more section for implementation history for alpha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
@msau42 your comments are addressed. PTAL. |
|
||
## Implementation History | ||
|
||
Volume snapshot is implemented as alpha feature in this repo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add what release it was added as alpha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
* Added Finalizer on the snapshot source PVC to prevent it from being deleted when a snapshot is being created from it. | ||
* Added check to see whether ListSnapshots is supported by the CSI driver. If it is supported, ListSnapshots will be called to find out the status of a snapshot during static binding; otherwise it is assumed the snapshot ID provided by the admin is valid. | ||
* If snapshot creation times out, we should allow CSI driver to continue to finish the snapshot creation in the background. Timeout will happen at the foreground so VolumeSnapshot is marked as failed. VolumeSnapshotContent will be created but eventually be marked as failed because there is no VolumeSnapshot requesting for it. This work is on-going. | ||
* Investigation is on-going to determine whether we should separate common controller logic from other logic that belongs to the sidecar. Can be in the same external-snapshotter repo. The common controller should not be deployed with the driver. It should be deployed by the cluster deployer, or we can provide a way to deploy it as a separate Statefulset, not together with the driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for sure CRD installation should not be done as part the driver. So we should call that out and who is responsible for managing the CRD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we classify three groups of people? Cluster admin, storage admin, and users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
* Renamed `Ready` to `ReadyToUse` in the `Status` field of `VolumeSnapshot` API object. | ||
* Changed type of `RestoreSize` in `CSIVolumeSnapshotSource` from `*resource.Quantity` to `*int64`. | ||
* Lease based Leader Election support is added. | ||
* Added `VolumeSnapshotContent` deletion policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deletion policy can be specified in VolumeSnapshotClass, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added clarification that it can be specified in VolumeSnapshotClass. We have it in both Content and Class. What I meant to say is that this is a policy for the content.
* Added `VolumeSnapshot` and `VolumeSnapshotContent` in Use Protection using Finalizers. | ||
* Added Finalizer on the snapshot source PVC to prevent it from being deleted when a snapshot is being created from it. | ||
* Added check to see whether ListSnapshots is supported by the CSI driver. If it is supported, ListSnapshots will be called to find out the status of a snapshot during static binding; otherwise it is assumed the snapshot ID provided by the admin is valid. | ||
* If snapshot creation times out, we should allow CSI driver to continue to finish the snapshot creation in the background. Timeout will happen at the foreground so VolumeSnapshot is marked as failed. VolumeSnapshotContent will be created but eventually be marked as failed because there is no VolumeSnapshot requesting for it. This work is on-going. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for timeout, VolumeSnapshot status will not be marked as failed so that controller will continue to retry to create until the operation either succeeded or failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Fixed it.
/lgtm |
/assign @saad-ali |
* Snapshot feature is used as a basic building block in other advanced applications. | ||
* Feature deployed in production and have gone through at least one K8s upgrade. | ||
|
||
## Changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we separate this list between what is already complete and what is in progress for beta so it is easier to understand current state vs what is being done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
* Create/delete volume snapshots | ||
* Create new volumes from a snapshot | ||
* SnapshotContent Deletion/Retain Policy | ||
* Snapshot Object in Use Protection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Beta API? Secrets? New controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Hi @saad-ali, addressed your comments. PTAL. Thanks! |
### E2E tests | ||
|
||
* e2e tests for creating/deleting snapshot. | ||
* e2e tests for creating volume from snapshot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make those P0.
And add e2e tests for:
- P1 delete/retain policy
- P1 deleting API objects out of order (snapshot protection)
- P2 test for secret fields
- P2 tests for metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
* If snapshot creation times out, VolumeSnapshot status will not be marked as failed so that controller will continue to retry to create until the operation either succeeds or fails. It is up to the user or an upper level application that uses the VolumeSnapshot to determine what to do with the snapshot. This work is on-going. | ||
* Investigation is on-going to determine whether we should separate common controller logic from other logic that belongs to the sidecar. Can be in the same external-snapshotter repo. The common controller should not be deployed with the driver. It should be deployed by the cluster deployer, or we can provide a way to deploy it as a separate Statefulset, not together with the driver. CRD installation should also be done by the cluster deployer. | ||
* Add secrets field to list-snapshots RPC in the CSI spec. Add “data-source-secret” in create-volume intended for accessing the data source. Implement them in external-snapshotter and external-provisioner. | ||
* Add metrics support in the snapshot controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make the list of metrics to-be-added more specific? I am thinking about following for BETA:
- operational end to end latency metrics
labels:
a. operation_name, i.e., creation-snapshot, delete-snapshot
b. csi-driver-name - operation error count
labels:
a. operation_name, i.e., creation-snapshot, delete-snapshot
b. csi-driver-name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Hi @saad-ali and @yuxiangqian, your comments are addressed. PTAL. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: msau42, saad-ali, xing-yang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This KEP is for the volume snapshot feature.