Don't log error if restic volume is empty #1480

Merged · 2 commits merged into vmware-tanzu:master from c-restic-log-error on May 30, 2019

Conversation

carlisia (Contributor)

Fixes #1266.

A couple of notes:

I thought this error would happen here: https://github.com/heptio/velero/blob/ee8f8ca1db5917b631547b5f59e4075caee66b20/pkg/controller/pod_volume_backup_controller.go#L233

But the execution path doesn't go through there.

I made small modifications to some error messages to make them distinct from one another. Let me know if it's best to leave them as they were.

Signed-off-by: Carlisia <carlisiac@vmware.com>

@skriss added this to the v1.x milestone on May 20, 2019
@skriss (Member) commented on May 20, 2019:

@carlisia it definitely looks to me like the error is occurring on L232-233 in the pod volume backup controller. Here's the error I see in the log of the restic pod that ran the backup:

time="2019-05-20T19:32:09Z" level=error msg="Error running command=restic backup --repo=azure:velero-rc1:/restic/nginx --password-file=/tmp/velero-restic-credentials-nginx253408481 --cache-dir=/scratch/.cache/restic . --tag=volume=nginx-logs --tag=backup=empty-dir --tag=backup-uid=f3568c2b-7b35-11e9-bf06-be4461881575 --tag=ns=nginx --tag=pod=nginx-deployment-f56465b95-tkbjr --tag=pod-uid=4167f363-7aa8-11e9-9e1c-d22329cbafb3 --host=velero, stdout=open repository\n, stderr=Fatal: unable to save snapshot: snapshot is empty\n" backup=velero/empty-dir controller=pod-volume-backup error="exit status 1" error.file="/go/src/github.com/heptio/velero/pkg/controller/pod_volume_backup_controller.go:232" error.function="github.com/heptio/velero/pkg/controller.(*podVolumeBackupController).processBackup" logSource="pkg/controller/pod_volume_backup_controller.go:232" name=empty-dir-gmr2d namespace=velero

Remember, the pod volume backup controller is not running in the main velero deployment pod; it's running in the restic daemonset.

So I think that's the appropriate place to check for this error, not in resource_backupper.go.

Additionally, we need to decide what to do with the PodVolumeBackup custom resource when this occurs. Right now it goes into a Failed phase; I think it would make more sense to have it go to Completed. However, we'd then have Completed PVBs that didn't have SnapshotIDs - would need to consider whether this is an issue or not. I'm not sure off the top of my head what the right thing is to do here, but please look into it and propose something.
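
For context, a minimal runnable sketch of the check being discussed: run the restic command, and if it fails with restic's "snapshot is empty" fatal error (as in the log above), treat the backup as successful but snapshot-less instead of logging an error. The function name runResticBackup and its wiring are illustrative only, not the merged controller code.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runResticBackup runs a restic backup command and treats restic's
// "snapshot is empty" fatal error as a successful, snapshot-less backup.
// Sketch only: the real controller integrates this into the
// PodVolumeBackup processing loop with its own logging and phase helpers.
func runResticBackup(cmd *exec.Cmd) (emptySnapshot bool, err error) {
	var stderrBuf strings.Builder
	cmd.Stderr = &stderrBuf

	if runErr := cmd.Run(); runErr != nil {
		stderr := stderrBuf.String()
		// restic exits non-zero with "Fatal: unable to save snapshot:
		// snapshot is empty" when the volume contains no files; don't
		// log this as an error or fail the PodVolumeBackup.
		if strings.Contains(stderr, "snapshot is empty") {
			return true, nil
		}
		return false, fmt.Errorf("error running restic backup: %v, stderr=%s", runErr, stderr)
	}
	return false, nil
}

func main() {
	// Illustrative invocation; a real run needs --repo, credentials, etc.
	emptySnapshot, err := runResticBackup(exec.Command("restic", "backup", "."))
	fmt.Println(emptySnapshot, err)
}
```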

@carlisia (Contributor, Author)

I know why I didn't see the changes I had made at pod_volume_backup_controller.go:232: I was looking exclusively in the backup log, and there I did see the error that came from the resource_backupper. So this error shows up in the restic pod log. I'll fix it in the proper place then.

@carlisia (Contributor, Author)

As for the phase: this is not a failure, so that leaves us with Completed or PartiallyFailed. Since we will end up with PVBs that don't have SnapshotIDs, I wouldn't expect to see Completed. By process of elimination, that leaves PartiallyFailed, with a proper message indicating why. This is what I would do. WDYT?
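
For reference, a sketch of the PodVolumeBackup phase values in the velero v1 API of that era (reproduced for context; only PodVolumeBackupPhaseCompleted appears verbatim in the diff below):

```go
// Reference sketch of the PodVolumeBackup phases, per the velero v1 API
// of that era. Note there is no PartiallyFailed phase here.
package velerov1api

type PodVolumeBackupPhase string

const (
	PodVolumeBackupPhaseNew        PodVolumeBackupPhase = "New"
	PodVolumeBackupPhaseInProgress PodVolumeBackupPhase = "InProgress"
	PodVolumeBackupPhaseCompleted  PodVolumeBackupPhase = "Completed"
	PodVolumeBackupPhaseFailed     PodVolumeBackupPhase = "Failed"
)
```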

@skriss (Member) commented on May 21, 2019:

So, a couple of things:

  • PodVolumeBackups don't currently have a phase of PartiallyFailed; adding one would be a breaking change.
  • From the user's perspective, I would not consider this scenario to be a failure or partial failure at all - so I'd say the overall backup should end as Completed (assuming no other errors).
  • For the PodVolumeBackup, I think it probably makes sense to mark it as Completed, with no SnapshotID, and something in the status.message field indicating that the volume was empty. This is a non-breaking change that is still reasonably obvious to the user.
  • https://github.com/heptio/velero/blob/master/pkg/restic/backupper.go#L168-L176 is where the main velero deployment process waits for PVBs to complete and returns results. We'd need to update this to properly handle Completed PVBs with no snapshot ID (probably by removing them from the map that's returned; see the sketch after this list). Take a look at where this function is called - pkg/backup uses the returned results to write annotations to the pod, which are later used during restore to know which volumes need to be restored using restic. Since we won't have a snapshot for the empty volumes, we won't want to write the annotation for them onto the pod.
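
A runnable sketch of that filtering, using a trimmed-down stand-in for the real status type (the type and function names here are illustrative, not velero's actual API):

```go
package main

import "fmt"

// PodVolumeBackupStatus is a trimmed-down stand-in for the real
// velerov1api status type, with only the fields this sketch needs.
type PodVolumeBackupStatus struct {
	Phase      string // "Completed", "Failed", ...
	SnapshotID string
}

// recordSnapshot decides whether a completed PodVolumeBackup should
// contribute a volume->snapshotID entry to the results map that
// pkg/backup later turns into pod annotations. Completed PVBs with an
// empty SnapshotID (empty volumes) are skipped: there is no restic
// snapshot to restore, so no annotation should be written for them.
func recordSnapshot(snapshots map[string]string, volume string, status PodVolumeBackupStatus) {
	if status.Phase != "Completed" || status.SnapshotID == "" {
		return
	}
	snapshots[volume] = status.SnapshotID
}

func main() {
	snapshots := map[string]string{}
	recordSnapshot(snapshots, "nginx-logs", PodVolumeBackupStatus{Phase: "Completed"}) // empty volume: skipped
	recordSnapshot(snapshots, "data", PodVolumeBackupStatus{Phase: "Completed", SnapshotID: "abc123"})
	fmt.Println(snapshots) // map[data:abc123]
}
```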

Signed-off-by: Carlisia <carlisiac@vmware.com>
@carlisia (Contributor, Author)

This is what the output looks like:

[screenshot of the resulting output]

@carlisia requested review from skriss and nrb on May 29, 2019 01:05
@skriss (Member) left a comment:

Mostly LGTM, two small comments. Let's add a changelog for this as well.

```
@@ -168,6 +168,10 @@ ForEachVolume:
		case res := <-resultsChan:
			switch res.Status.Phase {
			case velerov1api.PodVolumeBackupPhaseCompleted:
				if res.Status.SnapshotID == "" {
```
@skriss (Member):

let's add a comment here explaining that this would be the case if the volume was empty and therefore we don't actually have a restic snapshot, since it's not immediately obvious.

```
	r.Status.Phase = velerov1api.PodVolumeBackupPhaseCompleted
	r.Status.SnapshotID = snapshotID
	if emptySnapshot {
		r.Status.Message = "backup completed with an empty volume"
```
@skriss (Member):

maybe something like "volume was empty so no snapshot was taken" would be slightly clearer?

Signed-off-by: Carlisia <carlisiac@vmware.com>
@skriss (Member) left a comment:

LGTM, can be squashed on merge.

@nrb merged commit 0804f34 into vmware-tanzu:master on May 30, 2019
@carlisia deleted the c-restic-log-error branch on May 30, 2019 16:51

Successfully merging this pull request may close these issues: don't error when taking a restic backup if volume is empty (#1266)