
Add group volume snapshot CRD #222

Merged
harsh-px merged 3 commits into libopenstorage:master from harsh-px:group-snap-crd
Jan 14, 2019

Conversation


@harsh-px harsh-px commented Dec 21, 2018

This change adds a CRD and controller for group local and cloud snapshots, including 3D snaps.

This version also deprecates group snapshots taken using legacy VolumeSnapshots; hence those specs have been removed from the integration tests.

Integration tests will be added for GroupVolumeSnapshots once this and portworx/sched-ops#98 are merged.

@harsh-px harsh-px self-assigned this Dec 21, 2018
@harsh-px harsh-px requested a review from disrani-px December 21, 2018 01:27
@harsh-px harsh-px force-pushed the group-snap-crd branch 2 times, most recently from d708f1f to f8c44c9 Compare January 11, 2019 06:18
@disrani-px disrani-px (Contributor) left a comment

Still reviewing the controller


credID := getCredIDFromSnapshot(groupSnap.Spec.Options)

resp, err := p.volDriver.CloudBackupGroupCreate(&api.CloudBackupGroupCreateRequest{
Contributor:

If something crashes after triggering the cloudsnap but before we store the task IDs, those cloudsnaps will be orphaned. We should pass in a task ID to this API, similar to the Migration API. This doesn't need to be part of this change.

Author:

You mean pass a task ID to the OSD API to allow it to resume from a previously triggered backup?

Contributor:

Yes, we should pass in the same taskID for the same cloudsnap when we resume. The response should return an appropriate error if the same taskID has already been triggered and we can deal with it here.
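The idempotent-resume idea above can be sketched as follows. This is an illustrative, self-contained Go sketch, not the actual libopenstorage API: `backupTaskID`, `errAlreadyRunning`, `triggerGroupCloudBackup`, and `ensureGroupCloudBackup` are all hypothetical names, and the driver-side state is simulated with a map.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical sketch: derive a deterministic task ID from the
// GroupVolumeSnapshot UID so that a retry after a crash reuses the same ID
// instead of triggering a duplicate cloudsnap.

var errAlreadyRunning = fmt.Errorf("backup with this task ID already running")

// backupTaskID is stable across retries for the same snapshot object.
func backupTaskID(groupSnapUID string) string {
	return "stork-groupsnap-" + groupSnapUID
}

// started simulates driver-side state keyed by task ID.
var started = map[string]bool{}

func triggerGroupCloudBackup(taskID string) error {
	if started[taskID] {
		return errAlreadyRunning
	}
	started[taskID] = true
	return nil
}

func ensureGroupCloudBackup(groupSnapUID string) (string, error) {
	taskID := backupTaskID(groupSnapUID)
	err := triggerGroupCloudBackup(taskID)
	if err == errAlreadyRunning {
		// The backup was already triggered before the crash; resume by
		// tracking the existing task instead of starting a new one.
		return taskID, nil
	}
	return taskID, err
}

func main() {
	id1, _ := ensureGroupCloudBackup("uid-123")
	id2, _ := ensureGroupCloudBackup("uid-123") // retry after simulated restart
	fmt.Println(id1 == id2, strings.HasPrefix(id1, "stork-groupsnap-")) // true true
}
```

Because the ID is derived from the object rather than generated randomly, the retry path and the first-run path converge on the same driver task.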

Contributor:

Can you file a bug to track this?


// event is already being processed. In such a situation, the operator framework
// will provide a groupSnapshot which is the same version as the previous groupSnapshot.
// If we reprocess an outdated object, this can throw off the status checks in the snapshot stage.
m.minResourceVersion = groupSnapshot.ResourceVersion
Contributor:

I don't think this is required, it doesn't happen in the migration controller.

Author:

I verified this by printing the actual objects between 2 updates. This was happening very regularly when the sync period was 15 seconds.

The first print showed that I was updating the snapshots in the status of the group snapshot using sdk.Update().
The second print showed the new handle event coming in at the same timestamp. The groupSnapshot object in the event did not have the snapshots in the status and had the same Metadata.ResourceVersion.
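The stale-event guard being discussed can be sketched like this. It is an illustrative, self-contained Go sketch, not the stork code: `controller`, `shouldProcess`, and `minResourceVersion` are the sketch's own names, and the numeric comparison rests on the assumption (stated in the comment) that ResourceVersion parses as an integer on etcd-backed clusters.

```go
package main

import (
	"fmt"
	"strconv"
)

// Sketch: skip handle events that carry a ResourceVersion no newer than the
// one we last wrote, so a stale cached copy cannot undo status updates.

type groupSnapshot struct {
	ResourceVersion string
}

type controller struct {
	minResourceVersion string
}

// shouldProcess compares ResourceVersions numerically. Kubernetes does not
// guarantee ResourceVersion is numeric in general, but etcd-backed clusters
// emit monotonically increasing integers, which is what this check assumes.
func (c *controller) shouldProcess(snap *groupSnapshot) bool {
	if c.minResourceVersion == "" {
		return true
	}
	min, err1 := strconv.ParseUint(c.minResourceVersion, 10, 64)
	cur, err2 := strconv.ParseUint(snap.ResourceVersion, 10, 64)
	if err1 != nil || err2 != nil {
		return true // fall back to processing on parse failure
	}
	return cur > min
}

func main() {
	c := &controller{}
	fresh := &groupSnapshot{ResourceVersion: "100"}
	fmt.Println(c.shouldProcess(fresh)) // true: nothing recorded yet
	c.minResourceVersion = fresh.ResourceVersion
	stale := &groupSnapshot{ResourceVersion: "100"} // same version redelivered
	fmt.Println(c.shouldProcess(stale))             // false: already handled
}
```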

validateCRDTimeout time.Duration = 1 * time.Minute
resyncPeriod = 60 * time.Second

updateCRD = true
Contributor:

These seem odd; it's clearer to just return true or false from the function where this is being used. If you still want to use it, just define one variable and use !updateCRD.

Author:

It was done to make the returns more readable. Made the change to just have one variable.
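The single-constant style that came out of this exchange can be sketched as below; `ensureCRD` is a hypothetical stand-in for the function under review, shown only to illustrate how one named constant plus `!updateCRD` keeps both return sites readable.

```go
package main

import "fmt"

// One named constant instead of a true/false pair: return sites read as
// "updateCRD" or "!updateCRD" rather than as bare booleans.
const updateCRD = true

// ensureCRD is illustrative: it reports whether the caller still needs to
// create/update the CRD.
func ensureCRD(alreadyExists bool) (bool, error) {
	if alreadyExists {
		return !updateCRD, nil // nothing to do
	}
	return updateCRD, nil // caller should register the CRD
}

func main() {
	need, _ := ensureCRD(false)
	skip, _ := ensureCRD(true)
	fmt.Println(need, skip) // true false
}
```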

Recorder: recorder,
}
if err := groupsnapshotInst.Init(); err != nil {
log.Fatalf("Error initializing groupsnapshot controller due to: %v", err)
Contributor:

Remove "due to" to keep it consistent with the other messages above.

case crdv1.PortworxSnapshotTypeCloud:
return p.getGroupCloudSnapStatus(snap)
case crdv1.PortworxSnapshotTypeLocal:
return nil, fmt.Errorf("status API not supported for local group snapshots")
Contributor:

This can't return an error. The caller wouldn't know how to deal with a legitimate error as opposed to this one.

Author:

Changed this to errors.ErrNotSupported. However, there is no change in the controller handling. It still needs to record the err as an event so it clearly shows up in the snapshot events. The controller won't treat driver errors as sync errors, so it won't return the err from .Handle().


return !updateCRD, err
}
} else {
logrus.Infof("Creating new group snapshot")
Contributor:

Log with GroupSnapshotLog

response, err = m.Driver.GetGroupSnapshotStatus(groupSnap)
if err != nil {
logrus.Errorf("group snapshot status returned err: %v", err)
m.Recorder.Event(groupSnap,
Contributor:

This will log multiple events? Handle() is also recording events when this function returns an error.

Same for other places in this function where event is being recorded.
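One way to resolve the double-recording concern above is to make Handle() the single place that turns a returned error into an event, which can be sketched as below. All names (`recorder`, `checkStatus`, `handle`) are illustrative, not the stork code.

```go
package main

import "fmt"

// Sketch: inner functions only return errors; the caller owns event
// recording, so each failure produces exactly one event.

type recorder struct{ count int }

func (r *recorder) Event(reason, msg string) { r.count++ }

func checkStatus() error {
	// Return the error instead of recording an event here.
	return fmt.Errorf("group snapshot status returned err")
}

func handle(r *recorder) error {
	if err := checkStatus(); err != nil {
		r.Event("Failed", err.Error()) // recorded exactly once
		return err
	}
	return nil
}

func main() {
	r := &recorder{}
	_ = handle(r)
	fmt.Println(r.count) // 1
}
```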

func (m *GroupSnapshotController) handleDelete(groupSnap *stork_api.GroupVolumeSnapshot) error {
err := m.Driver.DeleteGroupSnapshot(groupSnap)
if err != nil {
m.Recorder.Event(groupSnap,
Contributor:

I think the object is gone at this time, so the event won't really show up.

pkg/log/log.go Outdated
appv1beta1 "k8s.io/api/apps/v1beta1"
appv1beta2 "k8s.io/api/apps/v1beta2"
"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
Contributor:

Not required

destroyAndWait(t, ctx)
}

func groupSnapshotTest(t *testing.T) {
Contributor:

We need to have similar tests for the new groupsnapshot CRD

Author:

Will add the integration tests after this and portworx/sched-ops#98 merges.

@disrani-px (Contributor):

Also need to add sub commands to storkctl

@harsh-px (Author):

> Also need to add sub commands to storkctl

Created #226 for this. storkctl support was absent for prior group snapshots too, and we are already too late for the 2.0.2 release, so this is out of scope for the new changes.

@harsh-px (Author):

Have addressed the review comments.

@@ -0,0 +1,50 @@
apiVersion: v1
Contributor:

The specs directory is for the deployments. Can you move these under /examples so that they are easier to find?

Author:

Moved.

if csStatus.status == api.CloudBackupStatusDone {
conditions = getReadySnapshotConditions()
doneTasks = append(doneTasks, taskID)
snapIDsPendingRevert = append(snapIDsPendingRevert, csStatus.cloudSnapID)
Contributor:

My point was that it's confusing to say that completed cloudsnaps need to be reverted. The decision that they need to be reverted is made somewhere else. This should just keep track of snaps that are done, and the variable name should reflect that.

@harsh-px (Author):

Addressed last 2 comments.

Harsh Desai added 3 commits January 14, 2019 14:46
Signed-off-by: Harsh Desai <harsh@portworx.com>
Signed-off-by: Harsh Desai <harsh@portworx.com>

Address review comments

- process cloudsnap failed tasks
- revert local and cloudsnaps as soon as the first failure is observed
- allow deletes of legacy group snapshots
- group snapshot controller part of snapshot controller
- fix cassandra restore pvcs
- fix v1 imports
- fix duplicate events
- revert active cloudsnapshots too
- use groupsnap logger
- move specs to examples
- use explicit variables to track done and active IDs

Signed-off-by: Harsh Desai <harsh@portworx.com>
Signed-off-by: Harsh Desai <harsh@portworx.com>
@harsh-px harsh-px merged commit 2d46472 into libopenstorage:master Jan 14, 2019
@harsh-px harsh-px deleted the group-snap-crd branch January 14, 2019 23:17
@disrani-px disrani-px added enhancement release-note Information about this change needs to be added to the release note labels Feb 19, 2019
@disrani-px disrani-px added this to the 2.1.0 milestone Feb 19, 2019