
Implement Recovery for Restic Backup #202

Merged
merged 27 commits into master from recovery on Nov 10, 2017

Conversation

@diptadas (Contributor) commented Oct 16, 2017

@diptadas diptadas requested a review from tamalsaha Oct 16, 2017

@tamalsaha tamalsaha changed the title Add Recovery for Restic Backup Implement Recovery for Restic Backup Oct 16, 2017

@tamalsaha (Contributor) commented Oct 16, 2017

v1 -> /src1

v2 -> /src2

FileGroups:

  • /s1/a
  • /s2/b

Backup: >>>>

Recover: >>>

[]Volume: {v1, v2}

[]VolumeMount: {v1 -> s1, v2 -> s2}

restic restore --snapshot=123

/s1/a, /s2/b
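
A rough sketch of how the recovery pod could wire those volumes to the FileGroup paths before running the restore; the volume names, sources, and helper below are illustrative, not the actual implementation (imports assumed: core "k8s.io/api/core/v1"):

func recoveryVolumes() ([]core.Volume, []core.VolumeMount) {
    volumes := []core.Volume{
        {Name: "v1", VolumeSource: core.VolumeSource{ /* same source as the original v1 */ }},
        {Name: "v2", VolumeSource: core.VolumeSource{ /* same source as the original v2 */ }},
    }
    mounts := []core.VolumeMount{
        {Name: "v1", MountPath: "/s1"}, // FileGroup /s1/a is restored here
        {Name: "v2", MountPath: "/s2"}, // FileGroup /s2/b is restored here
    }
    return volumes, mounts
}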

@tamalsaha (Contributor) commented Oct 17, 2017

api revision:
a67d0b1

@diptadas diptadas force-pushed the recovery branch from 04b0b1b to 6fa790b Oct 17, 2017

@tamalsaha (Contributor) commented Oct 18, 2017

// NodeSelector is a selector which must be true for the pod to fit on a node.

// Selector which must match a node's labels for the pod to be scheduled on that node.
// More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
// +optional
NodeSelector map[string]string `json:"nodeSelector,omitempty" protobuf:"bytes,7,rep,name=nodeSelector"`
@tamalsaha (Contributor) commented Oct 18, 2017

Workload "deploy/xyz" , "rc/xyz", "rs/xyz"

SmartPrefix = deploy/xyz | RESTIC_REPOSITORY= s3://bucket/restic/deploy/xyz
Hostname = xyz

Workload "ds/xyz"

SmartPrefix = ds/NODE_NAME/xyz | RESTIC_REPOSITORY= s3://bucket/restic/ds/NODE_NAME/xyz
Hostname = NODE_NAME/xyz

Workload "ss/xyz"

SmartPrefix = ss/POD_NAME | RESTIC_REPOSITORY= s3://bucket/restic/ss/POD_NAME
Hostname = POD_NAME


Recovery

Workload: Kind/Name
NodeName (only for DS)
PodOrdinal (only for SS)
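
A minimal sketch of deriving these values from the workload kind, assuming the abbreviations above; the function and parameter names are illustrative:

func smartPrefixAndHostname(kind, name, nodeName, podName string) (string, string) {
    switch kind {
    case "deploy", "rc", "rs":
        return kind + "/" + name, name
    case "ds":
        return "ds/" + nodeName + "/" + name, nodeName + "/" + name
    case "ss":
        return "ss/" + podName, podName
    default:
        return "", ""
    }
}

RESTIC_REPOSITORY would then be s3://bucket/restic/ plus the returned prefix.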


type RecoverySpec struct {
    Restic   string `json:"restic,omitempty"`
    Workload string `json:"workload,omitempty"`

@tamalsaha (Contributor) commented Oct 25, 2017

I think we should use Workload LocalTypedReference instead of Workload string
https://github.com/appscode/voyager/blob/master/apis/voyager/v1beta1/types.go
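
For reference, a sketch of what the field change might look like; the exact LocalTypedReference definition should be taken from the linked voyager types, so this shape is an assumption:

type LocalTypedReference struct {
    Kind       string `json:"kind,omitempty"`
    Name       string `json:"name,omitempty"`
    APIVersion string `json:"apiVersion,omitempty"`
}

// RecoverySpec would then reference the workload as:
// Workload LocalTypedReference `json:"workload,omitempty"`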

    Restic       string            `json:"restic,omitempty"`
    Workload     string            `json:"workload,omitempty"`
    PodOrdinal   string            `json:"podOrdinal,omitempty"`
    NodeSelector map[string]string `json:"nodeSelector,omitempty"`

@tamalsaha (Contributor) commented Oct 25, 2017

I think instead of taking NodeSelector, the user should provide NodeName string, since we are dealing with a single node.
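
Combining both suggestions, the spec might look roughly like this (illustrative sketch, not the merged definition):

type RecoverySpec struct {
    Restic     string              `json:"restic,omitempty"`
    Workload   LocalTypedReference `json:"workload,omitempty"`
    PodOrdinal string              `json:"podOrdinal,omitempty"`
    NodeName   string              `json:"nodeName,omitempty"`
}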

@@ -111,3 +111,15 @@ func (w *ResticWrapper) Forget(resource *api.Restic, fg api.FileGroup) error {
}
return nil
}

func (w *ResticWrapper) Restore(path, host string) error {

@tamalsaha (Contributor) commented Oct 25, 2017

Do we need to consider the tags?
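
If tags are passed through, Restore could forward them as snapshot filters; a minimal sketch assuming the wrapper shells out to the restic CLI, where the extra tags parameter and the exact flag usage are assumptions, not the merged code (imports assumed: "os", "os/exec"):

func (w *ResticWrapper) Restore(path, host string, tags []string) error {
    // Repository location and credentials are assumed to come from the environment
    // (RESTIC_REPOSITORY etc.) already prepared by the wrapper.
    args := []string{"restore", "latest", "--target", path, "--path", path, "--host", host}
    for _, tag := range tags {
        args = append(args, "--tag", tag) // only consider snapshots carrying this tag
    }
    cmd := exec.Command("restic", args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    return cmd.Run()
}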

func (c *StashController) runRecoveryJob(rec *api.Recovery) error {
    restic, err := c.stashClient.Restics(rec.Namespace).Get(rec.Spec.Restic, metav1.GetOptions{})
    if err != nil {
        log.Infoln(err)

@tamalsaha (Contributor) commented Oct 25, 2017

No need to log here; it is logged by the caller (LN#114).

But since this is probably a user error, we need to record an event. Then the user can run kubectl describe rec <> to see the error.
https://github.com/appscode/voyager/blob/4f89df1c10bda6d31a203f5359bcab143227a92d/pkg/ingress/hostport.go#L129
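
A minimal sketch of recording such an event, assuming the controller holds a record.EventRecorder; the recorder field, helper name, and reason string are illustrative (imports assumed: core "k8s.io/api/core/v1", "k8s.io/client-go/tools/record"):

func (c *StashController) recordRecoveryFailure(rec *api.Recovery, reason string, err error) {
    // The event shows up under Events in `kubectl describe rec <name>`.
    c.recorder.Eventf(rec, core.EventTypeWarning, reason, "%v", err)
}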

    }

    if err = restic.IsValid(); err != nil {
        log.Infoln(err)

@tamalsaha (Contributor) commented Oct 25, 2017

No need to log here; it is logged by the caller (LN#114).

But since this is probably a user error, we need to record an event. Then the user can run kubectl describe rec <> to see the error.
https://github.com/appscode/voyager/blob/4f89df1c10bda6d31a203f5359bcab143227a92d/pkg/ingress/hostport.go#L129

@@ -307,3 +308,97 @@ func RecoveryEqual(old, new *api.Recovery) bool {
    }
    return reflect.DeepEqual(oldSpec, newSpec)
}

func CreateRecoveryJob(recovery *api.Recovery, restic *api.Restic, tag string) *batch.Job {
    job := &batch.Job{

@tamalsaha (Contributor) commented Oct 25, 2017

It should have an OwnerRef to the Recovery so that the Job is deleted if the user deletes the Recovery object midway.
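
A minimal sketch of wiring that up when building the Job; the Recovery kind string and the api.SchemeGroupVersion helper are assumptions here:

job.OwnerReferences = []metav1.OwnerReference{
    {
        APIVersion: api.SchemeGroupVersion.String(), // assumed group/version helper in the stash api package
        Kind:       "Recovery",
        Name:       recovery.Name,
        UID:        recovery.UID,
    },
}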

    return c.runRecoveryJob(d)
}

func (c *StashController) runRecoveryJob(rec *api.Recovery) error {

@tamalsaha (Contributor) commented Oct 25, 2017

Where are we deleting the Job and its pods? We don't want the pods to linger in Succeeded status.
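
One option, sketched below, is to delete the Job with background propagation once the Recovery status is recorded, so its pods are garbage collected too; the client and field names are illustrative:

policy := metav1.DeletePropagationBackground
if err := c.kubeClient.BatchV1().Jobs(job.Namespace).Delete(job.Name, &metav1.DeleteOptions{
    PropagationPolicy: &policy, // background propagation also removes the Job's pods
}); err != nil {
    return err
}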

    metav1.TypeMeta   `json:",inline,omitempty"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              RecoverySpec `json:"spec,omitempty"`
    Status            string       `json:"status,omitempty"`

@tamalsaha (Contributor) commented Oct 25, 2017

RecoveryStatus needs to be redone. See how it is done for Pods: https://github.com/kubernetes/api/blob/409c3b2393cd3359f275bd8b883b4d9c5aec41f6/core/v1/types.go#L2751

type RecoveryPhase string

const (
    ...
    RecoverySucceeded RecoveryPhase = "Succeeded"
)

type RecoveryStatus struct {
    Phase RecoveryPhase  `json:"phase,omitempty"`
    Stats []RestoreStats `json:"stats,omitempty"`
}

type RestoreStats struct {
    Path     string `json:"path,omitempty"`
    Duration string `json:"duration,omitempty"`
}
return err
}

nodeName := os.Getenv("NODE_NAME")

@tamalsaha (Contributor) commented Oct 25, 2017

nodeName comes from rec.Spec.NodeName

    }

    for _, fg := range restic.Spec.FileGroups {
        if err = cli.Restore(fg.Path, hostname); err != nil {

@tamalsaha (Contributor) commented Oct 25, 2017

We need to time these calls so that we can record the duration in the status and also send it to Prometheus.

e.g.:
https://github.com/appscode/stash/blob/3910849819dc4da098aab7d8aa525ddb76dec2d6/pkg/scheduler/scheduler.go#L287
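
A minimal sketch of timing each restore, assuming the RestoreStats shape proposed above; the helper name is illustrative, and pushing the durations to Prometheus would follow the linked scheduler code (import assumed: "time"):

func restoreFileGroups(cli *ResticWrapper, restic *api.Restic, hostname string) ([]api.RestoreStats, error) {
    var stats []api.RestoreStats
    for _, fg := range restic.Spec.FileGroups {
        start := time.Now()
        err := cli.Restore(fg.Path, hostname)
        stats = append(stats, api.RestoreStats{Path: fg.Path, Duration: time.Since(start).String()})
        if err != nil {
            return stats, err
        }
    }
    return stats, nil
}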

@tamalsaha tamalsaha force-pushed the recovery branch from bc38b89 to f7f7b44 Nov 10, 2017

@tamalsaha (Contributor) commented Nov 10, 2017

I have filed a bunch of follow-up tasks. We can merge this for now.

@tamalsaha tamalsaha merged commit f476f44 into master Nov 10, 2017

@tamalsaha tamalsaha deleted the recovery branch Nov 10, 2017
