Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split snapshot controller using beta APIs #182

Merged

Conversation

@xing-yang
Copy link
Collaborator

xing-yang commented Oct 21, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR splits the snapshot controller into two, a common controller to be deployed on the controller node, and a sidecar controller to be deployed together with the CSI driver.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
This PR depends on the snapshot API beta changes.

Does this PR introduce a user-facing change?:

Split the snapshot controller into two controllers, a common snapshot controller and a sidecar snapshot controller. Only the sidecar controller should be deployed with the CSI driver.
Copy link
Contributor

yuxiangqian left a comment

some initial thoughts.

// points to a specific VolumeSnapshotContent and the VolumeSnapshotContent also
// points back for this VolumeSnapshot.
//
// The dynamic snapshot creation is multi-step process: first controller triggers

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

this comment no longer applies.

klog.V(5).Infof("syncContent: check if we should remove Finalizer for VolumeSnapshotContent[%s]", content.Name)
// It is a deletion candidate if DeletionTimestamp is not nil and
// VolumeSnapshotContentFinalizer is set.
if utils.IsContentDeletionCandidate(content) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

the deletion workflow needs to be revisited a bit here.
after split, common controller will no longer be removing finalizer on volumesnapshotcontent, rather the sidecar should be doing it.

}
}

if utils.NeedToAddContentFinalizer(content) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

should the finalizer only be added after binding?

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

I still think this should be added after binding, i.e., after line 97?
unless the sidecar will act as following:
if deleteTimestamp set && uid == "", exec deletion policy, and then remove finalizer

Most of the finalizers here are following the in-tree pv/pvc finalizers where they are added very early, but let's revisit this later.

snapshot = nil
}

if snapshot == nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

this if clause is no longer needed as deletion will be handled by sidecar controller

content := obj.(*crdv1.VolumeSnapshotContent)
if content.Spec.VolumeSnapshotRef.Name == snapshot.Name &&
content.Spec.VolumeSnapshotRef.Namespace == snapshot.Namespace &&
content.Spec.VolumeSnapshotRef.UID == snapshot.UID &&

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

It might be missing one case here for pre-existing snapshot binding. UID of content.Spec.VolumeSnapshotRef will be empty, it should also return the content as a match.

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

nvm, findContentfromStore should handle, these two functions needs a bit renaming in future version

}

// Set AnnBoundByController if it is not set yet
if !metav1.HasAnnotation(contentClone.ObjectMeta, utils.AnnBoundByController) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

Is this annotation still needed in the new workflow.

snapshotCopy.Status.BoundVolumeSnapshotContentName = &snapshotContent.Name

// Set AnnBoundByController if it is not set yet
if !metav1.HasAnnotation(snapshotCopy.ObjectMeta, utils.AnnBoundByController) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

are these annotations still needed?

}

// TODO(xyang): update snapshot status
if snapshotContent.Status != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

as you pointed out before, snapshot needs to be re-read from API server for the status update to be successful.
and since there is a retry loop, this line should be removed.

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

I think the BoundVolumeSnapshotContentName on VolumeSnapshot.Status needs to be updated regardless whether snapshotContent.Status == nil or not.
this function is effectively a call to bind and update.

Sure


// TODO(xyang): update snapshot status
if newContent.Status != nil {
_, err = ctrl.updateSnapshotStatus(snapshot, newContent)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

these is a retry loop, this line is a duplication?

updated = true
} else {
newStatus = snapshotObj.Status.DeepCopy()
if newStatus.BoundVolumeSnapshotContentName == nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 25, 2019

Contributor

It might worth checking whether if BoundVolumeSnapshotContentName == boundContentName to decide whether or not to continue here? That case it should be invalid?

@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch from 2035cfc to ac7f89e Oct 25, 2019
@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch from ac7f89e to ef3e675 Oct 26, 2019
@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch 7 times, most recently from 7e85b30 to 855f6ea Oct 27, 2019
Copy link
Contributor

yuxiangqian left a comment

still in process, first pass on common controller

obj, found, err := ctrl.snapshotStore.GetByKey(snapshotName)
if err != nil {
return err
}
if !found {
klog.V(4).Infof("synchronizing VolumeSnapshotContent[%s]: snapshot %s not found", content.Name, snapshotRefKey(content.Spec.VolumeSnapshotRef))
klog.V(4).Infof("synchronizing VolumeSnapshotContent[%s]: snapshot %s not found", content.Name, utils.SnapshotRefKey(&content.Spec.VolumeSnapshotRef))

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

maybe reuse "snapshotName"? Ditto everywhere

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

Done

// Treat the content as bound to a missing snapshot.
snapshot = nil
} else {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

not sure whether this else block is needed.
if syncContent is not doing binding, i.e., in pre-existing VSC case, does it make sense to do the status update here? syncSnapshot will be doing it anyways?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

I tested create snapshot several times and observed that the status update in snapshot happens a lot later than status update in the content without this block of code here. I had to keep typing "kubectl describe volumesnapshot" to check the snapshot status and wait for it to be in sync with content status (this is much slower than before the controller split).

So I added updateSnapshotStatus here, and observed that snapshot status update happens as soon as the content status update.

I don't know if there is a better way to trigger snapshot status update to happen as soon as content status update.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

Maybe I can add this line "ctrl.snapshotQueue.Add(snapshotName)" to trigger the snapshot status update instead of adding this block here. Will give that a try.

// sidecar controller.
// Snapshot won't be deleted until content is deleted
// due to the finalizer
if snapshot == nil || utils.IsSnapshotDeletionCandidate(snapshot) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

There is one case missed here to handle the following steps:

  1. create a VSC (pre-existing snapshot, uid == "")
  2. delete the VSC
    in this case, delete VSC will not be handled as it returns in previous checks.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

I was wondering how to handle that as well as we don't have a way to validate anything about snapshot in the sidecar controller any more.

This means if admin has made a mistake about the VSC he/she created, he/she can't delete it without first binding it to a snapshot.

We probably should just check the deletion timestamp and deletion policy in syncContent in the sidecar controller and delete the content if both conditions meet? Otherwise admin could be stuck with a VSC that he/she wants to delete. It is likely that an admin could make mistakes when creating a VSC and wants to delete it and create a new one.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

Maybe this "ShouldDelete" annotation should only apply to dynamic case? For static case, we just check deletion timestamp and deletion policy?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

If timestamp set in content:

  • If snapshot uid="", sidecar syncContent deletes it because no finalizer.
  • If snapshot uid is set, sidecar syncContent checks "ShouldDelete" annotation and then delete.

This comment has been minimized.

Copy link
@jingxu97

jingxu97 Oct 30, 2019

Collaborator

If we only add finalizer for "bound" snapshots which means that having finalizer == "bound", how about this

If DeletionTimestamp set in content:

  • if no finalizer, do nothing (actually this case should not exist, because VSC will be deleted immediately if no finalizer

  • has a finalizer: if annotation (How about name it as "BoundVolumeSnapshotIsDeleted") exists, depending on DeletionPolicy, delete physical snapshot (or not), remove finalizer.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

Discussed with Jing again about this. We can let the common controller remove snapshot uid when it is going to delete the content but don't add the ShouldDelete annotation.
If deletion timestamp is set in the content:

  • if snapshot uid="", sidecar syncContent invokes the CSI driver to delete the physical snapshot.
  • if snapshot uid is set, don't delete.
// Snapshot won't be deleted until content is deleted
// due to the finalizer
if snapshot == nil || utils.IsSnapshotDeletionCandidate(snapshot) {
if content.Spec.DeletionPolicy != crdv1.VolumeSnapshotContentDelete {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

in this case, should the finalizer on VolumeSnapshot be removed?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

Changed the logic to only add snapshot bound finalizer if snapshot is bound to a content and deletion policy is delete.


if isSnapshotDeletionCandidate(snapshot) {
// Check is snapshot deletionTimestamp is set and any finalizer is on
if utils.IsSnapshotDeletionCandidate(snapshot) {
// Volume snapshot should be deleted. Check if it's used
// and remove finalizer if it's not.
// Check if a volume is being created from snapshot.
isUsed := ctrl.isVolumeBeingCreatedFromSnapshot(snapshot)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

rather inUse?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 30, 2019

Author Collaborator

Done

// and removed it if needed.
func (ctrl *csiSnapshotCommonController) checkandRemoveSnapshotSourceFinalizer(snapshot *crdv1.VolumeSnapshot) error {
// Get snapshot source which is a PVC
pvc, err := ctrl.getClaimFromVolumeSnapshot(snapshot)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

per above, anyways the PVC has already been found out.

klog.Error(strerr)
ctrl.eventRecorder.Event(newSnapshot, v1.EventTypeWarning, "CreateSnapshotContentFailed", strerr)
return nil, newControllerUpdateError(snapshotKey(snapshot), err.Error())
updateSnapshot, err := ctrl.clientset.SnapshotV1beta1().VolumeSnapshots(snapshot.Namespace).Update(snapshotCopy)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

this will not update the status of the snapshot? BoundVolumeSnapshotContentName needs to be updated by invoke UpdateStatus()
since Status will be updated later, this Update call mostly will not do anything. Is this still needed?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

That's right. I think we don't need this any more.

updated := false
// UpdateSnapshotStatus updates snapshot status based on content status
func (ctrl *csiSnapshotCommonController) updateSnapshotStatus(snapshot *crdv1.VolumeSnapshot, content *crdv1.VolumeSnapshotContent) (*crdv1.VolumeSnapshot, error) {
klog.V(5).Infof("updateSnapshotStatus[%s]", utils.SnapshotKey(snapshot))
if content.Status == nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

should BoundContentName be updated even if the status of content is nil?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

That's my question too.

}
if removeBoundFinalizer {
snapshotClone.ObjectMeta.Finalizers = slice.RemoveString(snapshotClone.ObjectMeta.Finalizers, utils.VolumeSnapshotBoundFinalizer, nil)
}
_, err := ctrl.clientset.SnapshotV1beta1().VolumeSnapshots(snapshotClone.Namespace).Update(snapshotClone)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

Update returns the latest copy of snapshotclone on success, rather do snapshotClone, err = ......

if err != nil {
return newControllerUpdateError(pvcClone.Name, err.Error())
func (ctrl *csiSnapshotCommonController) contentExists(snapshot *crdv1.VolumeSnapshot) (bool, error) {
if snapshot.Status.BoundVolumeSnapshotContentName == nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 29, 2019

Contributor

is snapshot.Status == nil checking needed here?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

Added.

@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch 2 times, most recently from 8e84325 to 7261d2e Oct 30, 2019
Copy link
Contributor

yuxiangqian left a comment

on going

// Map of scheduled/running operations.
runningOperations goroutinemap.GoRoutineMap

createSnapshotContentRetryCount int

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

the sidecar will not be creating SnapshotContent, there two variable names seems to be not appropriate

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

Actually these are not needed any more. Removed.


// verify whether the driver specified in VolumeSnapshotContent matches the controller's driver name
func (ctrl *csiSnapshotSideCarController) isDriverMatch(content *crdv1.VolumeSnapshotContent) bool {
if content.Spec.Source.VolumeHandle == nil && content.Spec.Source.SnapshotHandle == nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

It's ok to have it check here, a webhook will be added to enforce either one of those fields should have been specified when creating in API server

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

ok

return false
}
snapshotClassName := content.Spec.VolumeSnapshotClassName
if snapshotClassName != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

what about default class?
In cases where a snapshot class name is not specified in content, the default snapshot class (if exists) should be used to verify against the sidecar snapshotterName instead of return true simply.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

The check for default class is in the common controller.

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

@xing-yang that case only covers the dynamic case, however does not cover the pre-existing case, and even for dynamic case,
sidecar controller should not assume the existence of the driver name? Rather I think it should refer to the default snapshotclass
if the field is not provided.
We should have the sidecar controller verifying it as well.

@yuxiangqian sorry, I don't know how to reply to this, so just directly adding it here. Finding default snapshot class is the job of the common controller. I don't think we should ask the sidecar to repeat that.

_ = ctrl.contentStore.Delete(content)
klog.V(4).Infof("content %q deleted", content.Name)

snapshotName := utils.SnapshotRefKey(&content.Spec.VolumeSnapshotRef)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

this check is no longer needed?
I am also not sure whether the content should be removed from contentStore at this point.
A VSC will only be deleted after DeletionPolicy has been carried. Maybe it still worth keeping in the contentStore?
the following comments do not apply as well

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

We are done with the content. It is already deleted from API, so should be removed from cache store.

klog.V(5).Infof("deleteContent[%q]: content not bound", content.Name)
return
}
// sync the snapshot when its content is deleted. Explicitly sync'ing the

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

comments no longer apply

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

Removed

var snapshotID string

if content.Spec.Source.VolumeHandle == nil && content.Spec.Source.SnapshotHandle != nil {
klog.V(5).Infof("checkandUpdateBoundSnapshotStatusOperation: checking whether snapshot is pre-bound to content [%s]", content.Name)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

log does not match

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

Fixed

var driverName string
var snapshotID string

if content.Spec.Source.VolumeHandle == nil && content.Spec.Source.SnapshotHandle != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

first check is not necessary

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

Why? We need to differentiate between static vs dynamic case.

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

this check means as long as VolumeHandle exists, its a dynamic snapshot.
will checking: content.Spec.Source.SnapshotHandle != nil be just sufficient?
those two fields should be enforced by API server later. I think

if content.Spec.Source.SnapshotHandle != nil {
snapshotID = *content.Spec.Source.SnapshotHandle
}
} else if content.Spec.Source.VolumeHandle != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

If content.Status.SnapshotHandle already exists, we should just call ListSnapshots?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

ListSnapshots is optional. We discussed that we always want to call CreateSnapshot to find out snapshot status if possible.

// content.Status will be created for the first time after a snapshot
// is created by the CSI driver. If content.Status is not nil,
// we should update content status without creating snapshot again.
if content.Status != nil && content.Status.Error != nil && content.Status.Error.Message != nil && !isControllerUpdateFailError(content.Status.Error) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

minor thing, I'd move all nil checks into isControllerUpdateFailError

}

// updateSnapshotContentSize update the restore size for snapshot content
func (ctrl *csiSnapshotSideCarController) updateSnapshotContentSize(content *crdv1.VolumeSnapshotContent, size int64) error {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Oct 30, 2019

Contributor

not used?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Oct 31, 2019

Author Collaborator

Removed

@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch from 7261d2e to fc7ba19 Oct 31, 2019
// If content exists, set DeletionTimeStamp on the content;
// content won't be deleted immediately due to the finalizer
if content != nil && deleteContent {
err := ctrl.clientset.SnapshotV1beta1().VolumeSnapshotContents().Delete(*snapshot.Status.BoundVolumeSnapshotContentName, &metav1.DeleteOptions{})

This comment has been minimized.

Copy link
@jingxu97

jingxu97 Oct 31, 2019

Collaborator

what if *snapshot.Status is nil (status is lost case?),

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

Actually I don't need to rely on snapshot.Status here. We have content so I can use content.Name directly.

@xing-yang xing-yang changed the title WIP: Split snapshot controller using beta APIs Split snapshot controller using beta APIs Nov 1, 2019
@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch from ee47cd1 to 1a2de22 Nov 1, 2019
@yuxiangqian

This comment has been minimized.

Copy link
Contributor

yuxiangqian commented Nov 1, 2019

@saad-ali please put your comments whenever you got a chance. We will proceed while you are off. thanks

Copy link
Contributor

yuxiangqian left a comment

finished another round on sidecar, moving to common controller.


// scheduleOperation starts given asynchronous operation on given volume. It
// makes sure the operation is already not running.
func (ctrl *csiSnapshotSideCarController) scheduleOperation(operationName string, operation func() error) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

for content, the operationName should just be the vsc's name. (@saad-ali)

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

done

@@ -20,15 +20,6 @@ metadata:
# rename if there are conflicts

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

maybe rename this file to rbac-sidecar to follow the pattern?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

done

func NewCSISnapshotSideCarController(
clientset clientset.Interface,
client kubernetes.Interface,
snapshotterName string,

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

I would call it driverName to match its semantic, is it the name returned by CSI?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

done

return false
}
snapshotClassName := content.Spec.VolumeSnapshotClassName
if snapshotClassName != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

@xing-yang that case only covers the dynamic case, however does not cover the pre-existing case, and even for dynamic case,
sidecar controller should not assume the existence of the driver name? Rather I think it should refer to the default snapshotclass
if the field is not provided.
We should have the sidecar controller verifying it as well.

@yuxiangqian sorry, I don't know how to reply to this, so just directly adding it here. Finding default snapshot class is the job of the common controller. I don't think we should ask the sidecar to repeat that.


// Design:
//
// The fundamental key to this design is the bi-directional "pointer" between

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

most of these comments no longer apply.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

I'll leave the high-level design description in the common controller and just talk about the sidecar here.

// createSnapshot starts new asynchronous operation to create snapshot
func (ctrl *csiSnapshotSideCarController) createSnapshot(content *crdv1.VolumeSnapshotContent) error {
klog.V(5).Infof("createSnapshot for content [%s]: started", content.Name)
opName := fmt.Sprintf("create-%s/%s[%s]", string(content.Spec.VolumeSnapshotRef.Namespace), string(content.Spec.VolumeSnapshotRef.Name), string(content.Spec.VolumeSnapshotRef.UID))

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

using content.Name?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

sure

klog.Errorf("getCSISnapshotInput failed to getClassFromVolumeSnapshot %s", err)
return nil, nil, err
}
} else {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

pls see my previous comments on this comments.

var driverName string
var snapshotID string

if content.Spec.Source.VolumeHandle == nil && content.Spec.Source.SnapshotHandle != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

this check means as long as VolumeHandle exists, its a dynamic snapshot.
will checking: content.Spec.Source.SnapshotHandle != nil be just sufficient?
those two fields should be enforced by API server later. I think

klog.Errorf("checkandUpdateContentStatusOperation: failed to call create snapshot to check whether the snapshot is ready to use %q", err)
return nil, err
}
}

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

this could cause potential issue ATM (before webhooks are added to enforce existence of exact one of these two fields.
what if both are nil? in this case, we should NOT be updating the status with default values, but rather append error?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

You just said in an earlier comment that we don't have to check if both are nil because it will be enforced by API server later?

if metav1.HasAnnotation(content.ObjectMeta, utils.AnnShouldDelete) {
return true
}
// 3) shouldDelete returns true for content with Retain policy.

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 2, 2019

Contributor

the last check may introduces problem and not necessary.
for example:
dynamically created snapshotcontent with retain policy = retain, user issues a delete via kubectl on VSC,
it should NOT delete the content BEFORE its corresponding snapshot has been deleted.
in a case where the VSC's bound VS has been deleted by user, common controller should have removed the finalizer on the VSC if retain policy is set to retain,
in this case, it will be simply an no-op in sidecar controller, the content will be deleted directly from the API server (unless there are other finalizers added via other controllers/user)

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 2, 2019

Author Collaborator

in a case where the VSC's bound VS has been deleted by user, common controller should have removed the finalizer on the VSC if retain policy is set to retain,

Okay, that is fine, but then it behaves the same as what we have in the current PR. Since finalizer is already removed by the common controller, the content will be removed BEFORE its corresponding snapshot has been deleted. I think it is better just let sidecar controller handle the removal of the content finalizer so that it is not in both the common controller and the sidecar.

@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch 3 times, most recently from bf9a419 to bc596b3 Nov 2, 2019
Copy link
Member

saad-ali left a comment

/approve

Will let @yuxiangqian do final review and lgtm

runningOperations: goroutinemap.NewGoRoutineMap(true),
resyncPeriod: resyncPeriod,
contentStore: cache.NewStore(cache.DeletionHandlingMetaNamespaceKeyFunc),
contentQueue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "csi-snapshotter-content"),

This comment has been minimized.

Copy link
@saad-ali

saad-ali Nov 3, 2019

Member

Let's make sure there are finalizers in case the sidecar crashes and the content object is deleted

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 3, 2019

Author Collaborator

sure. Adding content finalizer is handled by the common controller and the removal is handled by the sidecar right now.

func (ctrl *csiSnapshotSideCarController) syncContent(content *crdv1.VolumeSnapshotContent) error {
klog.V(5).Infof("synchronizing VolumeSnapshotContent[%s]", content.Name)

if metav1.HasAnnotation(content.ObjectMeta, utils.AnnShouldDelete) {

This comment has been minimized.

Copy link
@saad-ali

saad-ali Nov 3, 2019

Member

For GA, this annotation should become a first class field probably in VolumeSnapshotContent.Status.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 3, 2019

Author Collaborator

ok

@@ -0,0 +1,6 @@
FROM gcr.io/distroless/static:latest

This comment has been minimized.

Copy link
@saad-ali

saad-ali Nov 3, 2019

Member

Eventually we should consider putting csi-snapshotter-common in to its own repo.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 3, 2019

Author Collaborator

ok

# This YAML file shows how to deploy the common snapshot controller

---
kind: StatefulSet

This comment has been minimized.

Copy link
@saad-ali

saad-ali Nov 3, 2019

Member

We used to do StatefulSet because there was no leader election. Is this still the best choice? CC @misterikkit thoughts?


import (
crdv1 "github.com/kubernetes-csi/external-snapshotter/pkg/apis/volumesnapshot/v1beta1"
"k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
//metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

This comment has been minimized.

Copy link
@saad-ali

saad-ali Nov 3, 2019

Member

Remove if not needed.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 3, 2019

Author Collaborator

sure

pkg/utils/util.go Outdated Show resolved Hide resolved
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Nov 3, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: saad-ali, xing-yang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch from bc596b3 to f49fe34 Nov 3, 2019
}
}

if utils.NeedToAddContentFinalizer(content) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

I still think this should be added after binding, i.e., after line 97?
unless the sidecar will act as following:
if deleteTimestamp set && uid == "", exec deletion policy, and then remove finalizer

Most of the finalizers here are following the in-tree pv/pvc finalizers where they are added very early, but let's revisit this later.


var err error
// Check is snapshot deletionTimestamp is set and any finalizer is on
if utils.IsSnapshotDeletionCandidate(snapshot) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

it is a bit confusing to me when to call this function, it's name does not match well with the logic, maybe move the utils.IsSnapshotDeletionCandidate to caller to check?

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

could you shed some light on why existences of "VolumeSnapshotAsSourceFinalizer" OR "VolumeSnapshotBoundFinalizer"
finalizers are checked in this call "IsSnapshotDeletionCandidate"? is not deletion timestamp itself enough?
I am thinking about cases when there newly created snapshto was never bound to a content (i.e., content failed to be created), how would it be deleted?

This comment has been minimized.

Copy link
@xing-yang

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

one thing I can think of is to save the API call if there is no finalizer added, we will revisit this later.

klog.V(5).Infof("syncSnapshot[%s]: set DeletionTimeStamp on content.", utils.SnapshotKey(snapshot))
// If content exists, set DeletionTimeStamp on the content;
// content won't be deleted immediately due to the finalizer
if content != nil && deleteContent {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

what if inUse = true?
set "DeletionTimestamp" in API server would trigger sidecar to execute deletion policy and cause data lost if the system is trying to create a volume from the snapshot?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 5, 2019

Author Collaborator

Added the check.

// It checks if contents exists, it checks if snapshot has bi-directional binding, it checks if
// finalizers should be added or removed, and it checks if content should be deleted and deletes it
// if needed.
func (ctrl *csiSnapshotCommonController) processFinalizersandDeleteContent(snapshot *crdv1.VolumeSnapshot) error {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

maybe beneficial enough to break this function into two based on whether a snapshot is a deletion candidate or not.
mixing adding/deleting into one single function is kinda of confusing.

func (ctrl *csiSnapshotCommonController) checkandAddSnapshotFinalizers(snapshot *crdv1.VolumeSnapshot, snapshotBound bool, deleteContent bool) {
addSourceFinalizer := false
addBoundFinalizer := false
if utils.NeedToAddSnapshotAsSourceFinalizer(snapshot) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

Should not this finalizer being added only when a VolumeSnapshot object is being used as source to restore a volume?
It seems to me it will be always added in this routine regardless whether it's under using or not.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 5, 2019

Author Collaborator

This follows the in-tree pv/pvc finalizers pattern. Even though it will be added early, it will be deleted when necessary.

// we delete content before the snapshot
klog.V(5).Infof("Content for snapshot %s not found. It may be already deleted as expected.", uniqueSnapshotName)
} else {
// TODO(xyang): may not need this check any more

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

in this case, we should be updating snapshot.status based on content.status?

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 5, 2019

Author Collaborator

Whenever there is an update on the content status, we will be triggering a snapshot status update. So it should not need it here.

} else {
klog.Errorf("failed to getCreateSnapshotInput %s without a snapshot class", snapshot.Name)
return nil, nil, "", nil, fmt.Errorf("failed to take snapshot %s without a snapshot class", snapshot.Name)
klog.Infof("Added protection finalizer to persistent volume claim %s", pvc.Name)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

the else piece seems to be gone, nvm.

}
for _, snap := range snapshots {
// Skip static bound snapshot without a PVC source
if snap.Spec.Source.PersistentVolumeClaimName == nil && snap.Spec.Source.VolumeSnapshotContentName == nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

should be snap.Spec.Source.VolumeSnapshtoContentName != nil

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 5, 2019

Author Collaborator

Yes

}

// isPVCBeingUsed checks if a PVC is being used as a source to create a snapshot
func (ctrl *csiSnapshotCommonController) isPVCBeingUsed(snapshot *crdv1.VolumeSnapshot) bool {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

lets take in a PersistentVolumeClaim object as input, anyways the pvc has been found out before this call

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 5, 2019

Author Collaborator

ok

}

// removePVCFinalizer removes a Finalizer for VolumeSnapshot Source PVC.
func (ctrl *csiSnapshotCommonController) removePVCFinalizer(snapshot *crdv1.VolumeSnapshot) error {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

lets take in a PersistentVolumeClaim object as input, anyways the pvc has been found out before this call

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 5, 2019

Author Collaborator

ok

@xing-yang xing-yang force-pushed the xing-yang:new_beta_split_controller branch from f49fe34 to 4954e80 Nov 5, 2019

var err error
// Check is snapshot deletionTimestamp is set and any finalizer is on
if utils.IsSnapshotDeletionCandidate(snapshot) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

one thing I can think of is to save the API call if there is no finalizer added, we will revisit this later.

} else { // snapshot.Source.Spec.VolumeSnapshotContentName == nil
// find a matching volume snapshot content
} else { // snapshot.Spec.Source.VolumeSnapshotContentName == nil - dynamically creating snapshot
klog.V(5).Infof("before getMatchSnapshotContent for snapshot %s", uniqueSnapshotName)
if contentObj := ctrl.getMatchSnapshotContent(snapshot); contentObj != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

error out should be good, I think an error event/message should be populated in VolumeSnapshot.Status as well.

Added


// If PVC is not being deleted and finalizer is not added yet, a finalizer should be added to PVC until snapshot is created
klog.V(5).Infof("createSnapshotContent: Check if PVC is not being deleted and add Finalizer for source of snapshot [%s] if needed", snapshot.Name)
err := ctrl.ensurePVCFinalizer(snapshot)

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

"ensurePVCFinalizer" should return error when "pvc.ObjectMeta.DeletionTimestamp != nil" -> the PVC/PV are getting deleted from K8s
so that snapshot will fail.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 6, 2019

Author Collaborator

sure

content := obj.(*crdv1.VolumeSnapshotContent)
if content.Spec.VolumeSnapshotRef.Name == snapshot.Name &&
content.Spec.VolumeSnapshotRef.Namespace == snapshot.Namespace &&
content.Spec.VolumeSnapshotRef.UID == snapshot.UID &&

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

nvm, findContentfromStore should handle, these two functions needs a bit renaming in future version

return false
}
for _, pvc := range pvcList {
if pvc.Spec.DataSource != nil && len(pvc.Spec.DataSource.Name) > 0 && pvc.Spec.DataSource.Name == snapshot.Name {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

please remove unnecessary len check

Done

}

// TODO(xyang): update snapshot status
if snapshotContent.Status != nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

I think the BoundVolumeSnapshotContentName on VolumeSnapshot.Status needs to be updated regardless whether snapshotContent.Status == nil or not.
this function is effectively a call to bind and update.

Sure


// SetDefaultSnapshotClass is a helper function to figure out the default snapshot class from
// PVC/PV StorageClass and update VolumeSnapshot with this snapshot class name.
func (ctrl *csiSnapshotCommonController) SetDefaultSnapshotClass(snapshot *crdv1.VolumeSnapshot) (*crdv1.VolumeSnapshotClass, *crdv1.VolumeSnapshot, error) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

this function does not work for a pre-existing snapshot which does not have StorageClass supplied.
its called at:

newSnapshot, err := ctrl.checkAndUpdateSnapshotClass(snapshot)

For pre-existing snapshot, it does not have PVC, thus there will not be storage class

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 7, 2019

Author Collaborator

Changed it. Now we don't return error if default snapshot class not found for pre-provisioned snapshots.

}

func (ctrl *csiSnapshotCommonController) contentExists(snapshot *crdv1.VolumeSnapshot) (*crdv1.VolumeSnapshotContent, error) {
if snapshot.Status == nil || snapshot.Status.BoundVolumeSnapshotContentName == nil {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

@jingxu97 I think Jing posted a concern w.r.t the cases where status is lost. only refer to the Status.BoundVolumeSnapshotContentName seems to be not sufficient.
my rough thoughts to get conentName is:
if snapshot.Status == nil || snapshot.Status.BoundVolumeSnapshotContentName == nil --> use BoundVolumeSnapshotContentName
else if snapshot.Spec.VolumeSnapshotContentName != nil -> use VolumeSnapshotContentName --> this is to handle pre-existing case where binding did not happen?
else for dynamic creation case (to avoid situations like controller crashes before VS.Status has been updated), use the generated name.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 7, 2019

Author Collaborator

Done

// verifySnapshotClass gets the VolumeSnapshotClass from VolumeSnapshot.
// It also detects if snapshotter in the VolumeSnapshotClass is the same as
// the snapshotter in external controller.
func (ctrl *csiSnapshotCommonController) verifySnapshotClass(snapshot *crdv1.VolumeSnapshot) (*crdv1.VolumeSnapshot, error) {

This comment has been minimized.

Copy link
@yuxiangqian

yuxiangqian Nov 5, 2019

Contributor

returning a new snapshot is not useful here.

This comment has been minimized.

Copy link
@xing-yang

xing-yang Nov 7, 2019

Author Collaborator

This function is not needed any more.

Makefile Outdated Show resolved Hide resolved
cmd/csi-snapshotter-common/main.go Outdated Show resolved Hide resolved
cmd/csi-snapshotter-common/main.go Outdated Show resolved Hide resolved
cmd/csi-snapshotter-sidecar/main.go Outdated Show resolved Hide resolved
deploy/kubernetes/common-snapshot-controller.yaml Outdated Show resolved Hide resolved
deploy/kubernetes/rbac-common.yaml Outdated Show resolved Hide resolved
deploy/kubernetes/setup-csi-snapshotter.yaml Outdated Show resolved Hide resolved