Pruning Vertical Backup storage gone awry #465

droolio opened this issue Jul 23, 2018 · 14 comments

droolio commented Jul 23, 2018

Some background: I'm using Vertical Backup to back up 3 VMs to local SFTP storage on a Debian VM, and then using the Duplicacy copy command to copy that storage to a remote SFTP server.

So far, because I've had to rearrange the remote storage a bit (had to buy a bigger HDD!) and the quantity of data is quite large, some time has passed since the last full copy, so I'm copying each VM one at a time (manually at the moment), from the latest revision only rather than all revisions. While this was going on, I also took the opportunity to run prune manually a few times on the local storage, for the first time ever.
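
For reference, the per-VM copies look roughly like this (the storage names and the revision number here are placeholders for my setup; -id and -r restrict the copy to a single snapshot id and revision):

duplicacy copy -from default -to offsite -id CDSServ@cds-esxi6 -r 75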

During the copy of one VM I received this error:

Chunk 60a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b can't be found

Apparently, this chunk was removed after two consecutive prunes.

user@cdsback:~/dummy/.duplicacy/logs$ grep 60a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b prune-log-*
prune-log-20180721-174544:Marked fossil 60a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b
prune-log-20180721-212613:Deleted fossil 60a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b (collection 1)
user@cdsback:~/dummy/.duplicacy/logs$ ls -l
total 82004
-rw------- 1 user user     16447249 Jul 21 20:14 prune-log-20180721-174544
-rw------- 1 user user     24153881 Jul 21 23:34 prune-log-20180721-212613

(There are other prune logs in there but I trimmed the listing for relevance; I included these two mainly to show the timestamps - when each prune was started, and when it completed.)

In order to understand the chain of events, I took the emailed Vertical Backup log and edited it (here) - adjusted the timestamps (which were in UTC) to local time and inserted events according to the two log files:

prune-log-20180721-174544.txt
prune-log-20180721-212613.txt

Note CDSServ is the VM with the missing chunk and none of the removed snapshot revisions are that recent. Subsequent backups by Vertical Backup, however, haven't replaced the missing chunk!

From my understanding, it shouldn't be possible for prune to remove chunks that are still used by snapshots. A fossil collection shouldn't be deleted until at least one backup from every repository has been made.

I don't have a wonderful understanding of the code, but this comment drew my attention:

presumably one host can do one backup at a time

That's not what actually happens though, is it? Vertical Backup does one VM backup after another - they're not atomic.

Is Duplicacy treating the backup of the first VM as satisfying the fossil deletion step for the collected fossils?

gilbertchen (Owner) commented

Which revision of CDS-Server referenced the missing chunk? Is it 67?

droolio commented Jul 23, 2018

CDSServ (not CDS-Server) is the one I was trying to copy, from a revision (I can't remember exactly which) made since those prunes were run. But it seems the chunk isn't referenced in 67, though it is in 68 onwards (even 72, which completed a few hours ago):

user@cdsback:~/dummy$ duplicacy list -id CDSServ@cds-esxi6 -chunks -r 68 >chunks.txt
user@cdsback:~/dummy$ grep 60a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b chunks.txt
chunk: 60a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b

Edit: I'm currently running a check command but it's taking a loooong time - stuck on listing all chunks - will see in the morning. :)

gilbertchen (Owner) commented

I think this was caused by a bug in determining the set of new snapshots:

if snapshot.EndTime > collection.EndTime+int64(extraTime) {
    hasNewSnapshot[hostID] = true
    newSnapshots = append(newSnapshots, snapshot)
    break
} else {

A snapshot not seen when the fossil collection was created should always be added to newSnapshots, regardless of what its EndTime is. From the log it looks like revision 68 was not included in newSnapshots, because it completed before the first prune command finished.
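
Roughly, the check should be along these lines (just a sketch, not the final code; collectionHasRevision is a placeholder for however the collection records the revisions it saw when it was created):

if !collectionHasRevision(collection, snapshot.ID, snapshot.Revision) {
    // Any snapshot the collection never saw is new by definition, even if it
    // finished before the prune command did.
    newSnapshots = append(newSnapshots, snapshot)
    if snapshot.EndTime > collection.EndTime+int64(extraTime) {
        hasNewSnapshot[hostID] = true
    }
}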

I'll roll out a fix tomorrow.

droolio commented Jul 23, 2018

Excellent, thanks.

How do I go about fixing the missing chunk? :) I'd rather not start from scratch (it's 480GB and mostly copied onto remote storage).

It seems like the backup command in Vertical Backup needs a -hash or -force option, but I guess that would entail listing all the chunks on the storage, which takes an extraordinarily long time on my system. (I'm still trying to run a check command and it's still on "Listing chunks/8f/" etc.)

gilbertchen (Owner) commented

The easiest way to fix this is to create a new empty storage directory on your sftp server and copy over the config file to this new directory. Then modify the .verticalbackup/preferences file to change the storage url to point to this new storage directory. Run a backup of CDSServ, which will be an initial backup, so it will attempt to upload every chunk. After that you may be able to find the missing chunks in the new storage directory and copy them to the old storage directory.
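
Roughly, with placeholder paths (your directory layout and storage url will differ; this also assumes the default chunks/xx/ nesting):

# on the sftp server: a new empty storage directory next to the old one, reusing its config
mkdir /backups/storage-new
cp /backups/storage/config /backups/storage-new/config

# in .verticalbackup/preferences, change the storage url, e.g.
#   sftp://user@cdsback//backups/storage  ->  sftp://user@cdsback//backups/storage-new

# after the initial backup of CDSServ finishes, copy any chunk the old storage is missing, e.g.
cp /backups/storage-new/chunks/60/a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b \
   /backups/storage/chunks/60/a85ee0721cd38cac2cc1372e822ea2d56db93d311687a827a8477c894fbc4b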

droolio commented Jul 23, 2018

That certainly seems like the safest way to go, but I'm running out of space on the SFTP server and don't have any to spare, locally at least. This sounds hacky, but what if I could go back to revision 67 by temporarily removing snapshot files 68 and above? I'd probably have to remove/rename the cache. Then I could put the original 68-75 revisions back and clean up. Would this work?

droolio commented Jul 23, 2018

My dilemma solved itself! Subsequent backups no longer reference this chunk and a check command (which took a helluva long time; to be investigated another day!) says all chunks referenced by the snapshot at revision 74 exist. My guess is that part of the .vmdk was overwritten by the running VM.

gilbertchen (Owner) commented

I think this is because revision 73 didn't reference that missing chunk, so when revision 74 was created, Duplicacy did a lookup on that chunk and uploaded a new copy since it couldn't be found. By default, any chunks referenced by the latest revision are assumed to exist, so no lookups are needed for them.
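
In pseudocode the behaviour is roughly this (names here are illustrative, not Duplicacy's actual identifiers):

// chunks referenced by the latest revision are assumed to exist on the storage
knownChunks := chunksReferencedBy(latestRevision)

for _, hash := range chunksOfNewRevision {
    if knownChunks[hash] {
        continue // no lookup performed for these
    }
    if !chunkExistsOnStorage(hash) {
        uploadChunk(hash) // this is how revision 74 re-uploaded the missing chunk
    }
}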

The bug has been fixed by 72dfaa8. There is also a new release, 2.1.1, that includes this bug fix if you need the latest binaries.

droolio commented Jul 24, 2018

Wonderful, thanks. I can't see a v2.1.1 on the CLI release page; is that for the GUI version? No matter, I wanted to get around to compiling it myself anyway...

gilbertchen (Owner) commented

Sorry, it was saved as a draft and I thought it was published. It should be available on the release page now.

droolio commented Jul 24, 2018

I am still thinking about the check command and what happens if the latest revision has chunks that are missing, for whatever reason. A rare (maybe impossible) occurrence, but that's why we have the check command... Even if check discovers these missing chunks, there seems to be no easy way for the user to repair the backup chain - future backups will continue to make inconsistent revisions.

What if check could dump a list of missing chunks once discovered - much like how fossil collections are stored in the repository's cache - and then backup could look for this file and cross-reference these missing chunks with everything to be backed up (which implies the entire source needs to be re-hashed)?

Or, back to my hacky idea...

Would it be feasible, and simply easier, to set up a new temporary repository id and run an initial backup to the same storage? Most of it would be de-duplicated, but would it guarantee that missing chunks referenced by the damaged repository id are recreated? I can imagine it would in the case of Vertical Backup, where chunks are fixed size, but what about a normal Duplicacy file-based backup?

gilbertchen (Owner) commented

It is fairly easy to implement an option to avoid using the last snapshot as the cache and force a lookup on every chunk. I just couldn't find a good short and descriptive name for it.

But I also wonder whether it is really needed. For Duplicacy you can just edit the preferences file to switch to a new repository id, and then the next backup will be an initial backup. For Vertical Backup the repository id is tied to the VM name, so this trick isn't applicable and such an option is definitely needed.
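
For example, .duplicacy/preferences has entries along these lines, and changing the id field is all that's needed (the values here are placeholders and other fields are omitted):

[
    {
        "name": "default",
        "id": "my-repo-rebuild",
        "storage": "sftp://user@cdsback/path/to/storage"
    }
]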

droolio commented Jul 24, 2018

For Duplicacy, would such a procedure guarantee missing chunks are recreated? As time passes between backups, directories may get reorganised and the boundaries between files and chunks change.

For Vertical Backup, one could temporarily rename the snapshot folder on the storage out of the way (along with the local cache). The majority of chunks would already exist on the storage and an initial backup would replace the missing ones, correct?

gilbertchen (Owner) commented

For Duplicacy, would such a procedure guarantee missing chunks are recreated? As time passes between backups, directories may get reorganised and the boundaries between files and chunks change.

Right, you may not be able to recreate the missing chunks even if you still have the original files.

For Vertical Backup, one could temporarily rename the snapshot folder on the storage out of the way (along with the local cache). The majority of chunks would already exist on the storage and an initial backup would replace the missing ones, correct?

You're correct. Renaming the snapshot folder should work for Vertical Backup. I hadn't thought about this.
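
On the storage that would look something like this (paths and the snapshot id are placeholders for your setup; the per-VM revision files live under snapshots/<snapshot id>/):

# move the VM's revision files out of the way (and clear or rename the local cache)
mv /backups/storage/snapshots/CDSServ@cds-esxi6 /backups/storage/snapshots/CDSServ@cds-esxi6.old
# run the backup; with no previous revisions visible it behaves as an initial
# backup and re-uploads any chunks the storage is missing
# afterwards, move the original revision files back and remove the temporary ones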
