Implement --backup(-dir) #98

Closed
mkiesel opened this issue Aug 14, 2015 · 53 comments


@mkiesel mkiesel commented Aug 14, 2015

rsync's --backup(-dir) moves all files on the destination that would be deleted or overwritten with newer data to a backup directory. This feature is very helpful for creating incremental backups.

@ncw ncw added the enhancement label Aug 16, 2015
Member

@ncw ncw commented Aug 16, 2015

A nice idea and I didn't realise rsync had that feature


@roms2000 roms2000 commented Nov 20, 2015

+1, and then rclone could be used for everyday server backups.
If you try to implement it, --backup-dir is very useful in rsync because it keeps the directory structure intact and moves modified or deleted files into this backup dir.


@dbcm dbcm commented Dec 11, 2015

+1

Member

@ncw ncw commented Feb 10, 2016

This has some similar ideas to #18

@ncw ncw added this to the Unplanned milestone Feb 10, 2016

@balazer balazer commented Jul 19, 2016

A --backup-dir option would be a great addition. Server-side copying goes part of the way to providing versioned backups, but it takes a very long time when there are a lot of files, and it can make wasteful use of the remote storage, depending on how the remote storage deals with duplicate files.

I think all you need is a --backup-dir option, and not a --backup option with suffixes. Suffixes would be more complicated, because then you have to worry about what suffix to add, and what filters to add to exclude the suffixed files.

The logic for --backup-dir would be simple:

  • If a copy, sync, or move operation would replace or remove an existing destination file, first move that file to the backup-dir path.

Like with a normal move, the destination path and backup-dir path should not overlap. Some consideration must be made for what to do when a file to be moved already exists in the backup-dir path. For versioned backups, I'd argue that it makes sense to preserve the file that already exists in the backup-dir path. But in other circumstances, it might make more sense to replace the file in the backup-dir path. Maybe that should be an option.

Sample command:

rclone sync "c:\My Documents" "remote:backup/current/My Documents" --backup-dir "remote:backup/old versions/My Documents"

Relative paths should be preserved. So if remote:backup/current/My Documents/Folder/file.txt existed in the destination and would be replaced or removed, it would be moved to remote:backup/old versions/My Documents/Folder/file.txt.

Under this backup scheme, the destination path would be a proper sync of the source, and old versions & deleted files could be found in some backup-dir path. In typical usage, the backup-dir path would contain a date, and it would be up to the user to remove old folders after sufficient time has passed. This seems simpler than some of the ideas in issue #18 .

You might add a restriction that the backup-dir path be in the same remote as the destination path, if that makes things easier.
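
The path mapping described above can be sketched in shell. This is only an illustration of the proposed rule, not rclone code: the helper name `backup_path` is made up for the example, and it assumes the file path really does start with the destination root.

```shell
#!/bin/sh
# Illustration only: backup_path is a hypothetical helper, not part of rclone.
# Usage: backup_path DEST_ROOT BACKUP_ROOT FILE
# Prints where FILE (a path under DEST_ROOT) would land under BACKUP_ROOT,
# keeping its path relative to DEST_ROOT unchanged.
backup_path() {
    dest_root=$1
    backup_root=$2
    file=$3
    # Strip the destination root and the following slash to get the relative path.
    rel=${file#"$dest_root"/}
    printf '%s/%s\n' "$backup_root" "$rel"
}

backup_path "remote:backup/current/My Documents" \
            "remote:backup/old versions/My Documents" \
            "remote:backup/current/My Documents/Folder/file.txt"
```

In typical usage the backup root would also carry a date component, so that each run gets its own folder of old versions.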


@robjlg robjlg commented Aug 23, 2016

Hello, let me show how I use the rsync's backup-dir, backup and suffix options. This would be a very good feature to add to rclone.

I use 3 directories:

RSYNC - the destination dir
RSYNC_BAK - where the changed files are stored
RSYNC_DEL - where the deleted files are stored

I use 2 commands. The first does not delete files on the destination (RSYNC) that were removed from the source; it just moves changed files to the backup-dir (RSYNC_BAK) and adds a date and time label to the end of each file name before storing the newer versions. To make it easy to find files from a specific date, we add a directory for each month in the backup-dir path.

The second command moves the deleted files to the backup-dir (RSYNC_DEL). This command runs just after the first one.

As an example, let's say a local file /drawing/ele/E1500025.dwg was changed a lot before being removed, and the file /drawing/hid/H1500012.dwg was also changed but not removed. We would end up with these files on the remote:

RSYNC/drawing/hid/H1500012.dwg

RSYNC_BAK/2016_08/drawing/hid/H1500012.dwg__2016_08_17_224039
RSYNC_BAK/2016_08/drawing/hid/H1500012.dwg__2016_08_18_125240
RSYNC_BAK/2016_08/drawing/hid/H1500012.dwg__2016_08_22_224135

RSYNC_BAK/2016_08/drawing/ele/E1500025.dwg__2016_08_03_224007
RSYNC_BAK/2016_08/drawing/ele/E1500025.dwg__2016_08_04_125246
RSYNC_BAK/2016_08/drawing/ele/E1500025.dwg__2016_08_08_224054
RSYNC_BAK/2016_08/drawing/ele/E1500025.dwg__2016_08_11_223952
RSYNC_BAK/2016_08/drawing/ele/E1500025.dwg__2016_08_17_224039

RSYNC_DEL/drawing/ele/E1500025.dwg

The 2 commands I use are like the ones below:

rsync -av --acls --xattrs --stats --backup --backup-dir=/RSYNC_BAK/2016_08 --suffix=__2016_08_17_224039 /drawing /RSYNC

rsync -av --acls --xattrs --stats --backup --backup-dir=/RSYNC_DEL --delete /drawing /RSYNC

Or

BASE="/RSYNC"
BASE_DEL="/RSYNC_DEL"
BASE_BAK="/RSYNC_BAK"

dir="/drawing"

rsync -av --acls --xattrs --stats --backup --backup-dir=${BASE_BAK} --suffix=$(date +"__%Y_%m_%d_%H%M%S") ${dir} ${BASE}

rsync -av --acls --xattrs --stats --backup --backup-dir=${BASE_DEL} --delete ${dir} ${BASE}

Thank you


@stephenjamieson stephenjamieson commented Sep 8, 2016

This would be awesome!

Member

@ncw ncw commented Sep 12, 2016

Now that most remotes can do Copy or Move/Delete, this is a practical feature to implement.

It would also require #197 and #721 in an ideal world.

@ncw ncw modified the milestones: Soon, Unplanned / Help Wanted Sep 12, 2016

@dsrbecky dsrbecky commented Sep 13, 2016

+1


@jkaberg jkaberg commented Sep 19, 2016

Might I suggest to at least consider/think about deduplication when this is implemented. Deduplication is a huge win, space wise :-)


@Cadish Cadish commented Dec 27, 2016

Really looking forward to a backup feature like this! For me, this is the only thing missing in rclone.


@MONKiCODE MONKiCODE commented Jan 4, 2017

How's the "backup" feature coming along?

I assume this would also be working with encryption & ideally have a deduplication feature as well.

@ncw ncw modified the milestones: v1.36, Soon Jan 4, 2017
Member

@ncw ncw commented Jan 16, 2017

I've implemented this now - please find it in this beta. Any feedback much appreciated!

http://beta.rclone.org/v1.35-33-g47ebd07/ (uploaded in 15-30 mins)

@ncw ncw closed this in 3745c52 Jan 16, 2017
Member

@ncw ncw commented Jan 16, 2017

Here are the docs for --backup-dir

--backup-dir=DIR

When using sync, copy or move, any files which would have been
overwritten or deleted are moved in their original hierarchy into this
directory.

The remote in use must support server side move or copy and you must
use the same remote as the destination of the sync. The backup
directory must not overlap the destination directory.

For example

rclone sync /path/to/local remote:current --backup-dir remote:old

will sync /path/to/local to remote:current, but any files
which would have been updated or deleted will be stored in
remote:old.


@scorbisiero scorbisiero commented Jan 16, 2017

Thanks a lot for this one! I'll download it as soon as I have a minute.
I am guessing files previously present in "backup-dir" will be overwritten?

So let's say I have Folder A with File1.txt and File2.txt. I copy it to Destination\A\File1.txt and File2.txt. Then I modify File1.txt and run the job with backup-dir. This will move Destination\A\File1.txt into the backup-dir.
Then I modify File1.txt again; I guess this will overwrite Destination\A\File1.txt?


@Cadish Cadish commented Jan 16, 2017

Very nice, will test it when it's available for my platform (QNAP).

Thanks a lot!


@dsrbecky dsrbecky commented Jan 16, 2017

@simnether Good question. My intention is to always set a different backup-dir based on date.


@scorbisiero scorbisiero commented Jan 16, 2017

@dsrbecky Actually, yours is a good point, I think I will do that too 👍
For the function itself, perhaps adding an _# would help. So for instance if File1.txt is already in the backup dir, it could create File1_1.txt, File1_2.txt and so on.. just a thought

Member

@ncw ncw commented Jan 16, 2017

@simnether wrote:

I am guessing files previously present in "backup-dir" will be overwritten?

Yes you are correct. I'll add that to the documentation.

The intention is that you'd make a new backup dir for each day, or each backup, so it is up to you how granular you want the old backups to be. I don't really want to rename the files - that would complicate the implementation.

Member

@ncw ncw commented Jan 16, 2017

@dsrbecky wrote

My intention is to always set a different backup-dir based on date.

My plan is that rclone will grow a backup command eventually which will automate the use of --backup-dir which will do exactly that.


@scorbisiero scorbisiero commented Jan 17, 2017

@ncw - Seems to be working fine so far :)


@mizzuri mizzuri commented Jan 17, 2017

ACD would complain about naming conflicts if the filenames are the same in "backup-dir". But then we should be using new backup-dir names every time anyway.


@robjlg robjlg commented Jan 17, 2017

Excellent, it is working like a charm. Perfect !

In DRIVE, every version in the backup-dir stays with the same name. This is very good.

Thank you, very very much

Member

@ncw ncw commented Jan 17, 2017

@balazer wrote

One note for your documentation, on Google Drive at least, I found that if the file in the backup-dir already exists, it will remain there alongside the moved file. This is totally fine, and I actually prefer having duplicates instead of overwriting in this case.

Hmm, that is unexpected! That probably means I haven't thought through enough what happens if there is an existing file in the backup-dir. @unnfav - you are right ACD complains about naming conflicts here in my tests.

I don't really like rclone creating duplicate file names. Even though drive allows it, it causes trouble with practically everything else! I could allow this behavior for fses which allow duplicate files I suppose, but I foresee it causing problems!

So my preferred course of action would be to overwrite the files in the backup-dir.

As for the technical mechanism... The solution is quite simple - to pass in a dst Object to Move if one exists and it will delete it first.

I'll re-open the ticket to remind me to fix this.

@ncw ncw reopened this Jan 17, 2017

@robjlg robjlg commented Jan 17, 2017

@ncw wrote

I could allow this behavior for fses which allow duplicate files I suppose, but I foresee it causing problems! So my preferred course of action would be to overwrite the files in the backup-dir.

Is it very difficult to allow duplicate files on the backup-dir for DRIVE and others that allow it? As an option?

Could you consider this option?

Please note that it could be a very useful feature. In my case, there is a particular folder that I sync every 30 minutes, and there are a lot of changes. Being able to have all those changes in the backup-dir is a huge advantage.

I was thinking about creating one backup-dir per month and having all file versions for a month there. It is not a good solution to create a new backup-dir for every sync.

The program RSYNC has the --suffix option that allows changing file names by attaching the provided suffix to each file that goes to the backup-dir. Implementing this could be another solution.

Maybe an option to allow duplicate files on the backup-dir could be simple but will not work for all storages supported by rclone.


@mizzuri mizzuri commented Jan 18, 2017

@robjlg wrote

The program RSYNC has the --suffix option that allows changing file names by attaching the provided suffix to each file that goes to the backup-dir. Implementing this could be another solution.

That'd be great.

As for allowing duplicate naming, wouldn't it be a nightmare to do restores if various versions with the same filename exist in a directory?


@robjlg robjlg commented Jan 18, 2017

@unnfav wrote

As for allowing duplicate naming, wouldn't it be a nightmare to do restores if various versions with the same filename exist in a directory?

For me it is OK! We can select the correct file based on its time-stamp.

In my RSYNC (bash shell) scripts, the --suffix option is configured as:

--suffix=$(date +"__%Y_%m_%d_%H%M%S")

The file FILE_TESTE.txt will be stored on the backup-dir as

FILE_TESTE.txt__2017_01_17_125337

Member

@ncw ncw commented Jan 18, 2017

@unnfav wrote

As for allowing duplicate naming, wouldn't it be a nightmare to do restores if various versions with the same filename exist in a directory?

Yes it would.

So what does everyone think about this plan?

  • --backup-dir will overwrite existing files when storing new files in the DIR
  • you can set --suffix to give those files a new name - by default it will be empty so files will be stored with their original name

I don't intend to implement --suffix without --backup-dir for rclone - rsync has to jump through hoops of fire to make that work with automatic filters etc.

To fix the original overwrite issue, the most efficient thing to do will be to load the metadata for the objects in backup-dir into memory. This should be a small fraction of the objects in the destination dir, which are also loaded into memory.


@robjlg robjlg commented Jan 18, 2017

@ncw wrote

So what does everyone think about this plan?

  • --backup-dir will overwrite existing files when storing new files in the DIR
  • you can set --suffix to give those files a new name - by default it will be empty so files will be stored with their original name

For me this is the best solution. I will use --suffix with --backup-dir.


@scorbisiero scorbisiero commented Jan 18, 2017

@ncw - I would implement that as well to keep it simpler. As of right now the backups are stored in separate folders and then a subfolder is created with the current date. Using a suffix will probably make it easier when restoring:
It'll all be in the same folder, meaning I will know what file version I have by looking at the same location (I will also use the date on the file).

I guess I can set --backup-dir to target the same (backupto) folder to keep everything in one place?

As I'm using it right now, I have a SYNC job towards ACL, then I do a COPY with --backup-dir of ACL's to another location in ACL that'll keep all the stuff I delete. [For anyone who thinks it might be a pain because I need to download/reupload: it's not, as a) I am backing a NAS up, so I would still need download traffic and b) my bandwidth is higher than what Amazon provides me with, so I won't see any improvement]

ncw added a commit that referenced this issue Jan 19, 2017
This also makes sure we remove files we are about to override in the
--backup-dir properly.
Member

@ncw ncw commented Jan 19, 2017

OK Here is the next revision. It supports --suffix and won't duplicate files in drive or cause 409 errors with ACD

http://beta.rclone.org/v1.35-40-gb6848a3/ (uploaded in 15-30 mins)

Please test and let me know how you get on - thanks!

New docs

--backup-dir=DIR

When using sync, copy or move, any files which would have been
overwritten or deleted are moved in their original hierarchy into this
directory.

If --suffix is set, then the moved files will have the suffix added
to them. If there is a file with the same path (after the suffix has
been added) in DIR, then it will be overwritten.

The remote in use must support server side move or copy and you must
use the same remote as the destination of the sync. The backup
directory must not overlap the destination directory.

For example

rclone sync /path/to/local remote:current --backup-dir remote:old

will sync /path/to/local to remote:current, but any files
which would have been updated or deleted will be stored in
remote:old.

If running rclone from a script you might want to use today's date as
the directory name passed to --backup-dir to store the old files, or
you might want to pass --suffix with today's date.

--suffix=SUFFIX

This is for use with --backup-dir only. If this isn't set then
--backup-dir will move files with their original name. If it is set
then the files will have SUFFIX added on to them.

See --backup-dir for more info.
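
The scripted use suggested in the docs above might look roughly like this. It is a sketch under assumptions: `SRC`, `DEST` and `OLD` are placeholders taken from the example, the date format and the leading dot in the suffix are arbitrary choices, and the script only prints the rclone commands instead of running them.

```shell
#!/bin/sh
# Sketch only: SRC, DEST and OLD are placeholder values from the docs example.
SRC="/path/to/local"
DEST="remote:current"
OLD="remote:old"

# Variant 1: a fresh backup directory per run, named after the date stamp in $1.
dated_dir_command() {
    printf 'rclone sync %s %s --backup-dir %s/%s\n' "$SRC" "$DEST" "$OLD" "$1"
}

# Variant 2: one backup directory, with the date stamp appended to each moved file.
dated_suffix_command() {
    printf 'rclone sync %s %s --backup-dir %s --suffix .%s\n' "$SRC" "$DEST" "$OLD" "$1"
}

today=$(date +%Y-%m-%d)
dated_dir_command "$today"
dated_suffix_command "$today"
```

With variant 1 each run's old files stay together and a whole day can be pruned by deleting one directory. With variant 2 all versions of a file sit next to each other, but two runs that produce the same suffix would overwrite each other's backups. Paths containing spaces would need quoting before the printed commands are run.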


@balazer balazer commented Jan 19, 2017

@ncw, I just tested v1.35-40-gb6848a3 here with Google Drive. It's working fine. Existing files in the backup-dir get replaced. It doesn't keep the old revision, like a move would. Thanks again for your hard work. I already converted my backup scheme to use --backup-dir, and Google is thrilled about all of the data I have stored in their servers.

Member

@ncw ncw commented Jan 20, 2017

@balazer thanks for testing :-)


@robjlg robjlg commented Jan 20, 2017

@ncw wrote

OK Here is the next revision. It supports --suffix and won't duplicate files in drive or cause 409 errors with ACD

http://beta.rclone.org/v1.35-40-gb6848a3/ (uploaded in 15-30 mins)

Please test and let me know how you get on - thanks!

Hello, I tested this new version with DRIVE, and it worked as stated. With --suffix all versions remain in the backup-dir, each one with its different suffix.

With no --suffix provided, only one version is kept.

Thank you very much for this marvelous version


@scorbisiero scorbisiero commented Jan 20, 2017

Tested with ACL and seems to be working as expected :)

Member

@ncw ncw commented Jan 20, 2017

Whoohoo!

Thanks for testing @robjlg and @simnether .

I think I'll close this ticket now which is another one ticked off for the 1.36 release :-)

@ncw ncw closed this Jan 20, 2017

@robjlg robjlg commented Jan 23, 2017

Hi,

When do you intend to release the 1.36 version?

Member

@ncw ncw commented Jan 24, 2017

@robjlg wrote:

When do you intend to release the 1.36 version?

The plan is by 19th Feb


@robjlg robjlg commented Jan 25, 2017

Thanks,

anxiously waiting


@xfrankbx xfrankbx commented Feb 2, 2017

Sorry if this is in the wrong spot, new to posting in community resources.
I read through this ticket and went and downloaded the following Betas
rclone-v1.35-33-g47ebd07β
rclone-v1.35-40-gb6848a3β
rclone-v1.33-100-gcb40511β

rclone-v1.35-33-g47ebd07β had an issue where it moved changed files over, as the original for ACD...
rclone-v1.35-40-gb6848a3β seems to work perfectly
rclone-v1.33-100-gcb40511β Is missing the --backup-dir option completely.

But good work, from my testing on rclone-v1.35-40-gb6848a3β, everything is working nicely with ACD.

Member

@ncw ncw commented Feb 2, 2017

@fboyd78 thanks for testing.


@vb0 vb0 commented Feb 4, 2017

I always missed this option from rclone, it is something EXTREMELY useful and not easy (and not completely) possible to work around. It is fantastic for making simple (=reliable and hard to screw up) incremental backups, very easy to access, very easy to test everything is there (as the destination folder will always mirror the original while the --backup-dir directories will contain all the removed/changed files), very easy to prune (just delete all --backup-dir from last year), etc, etc, etc.

Now that we have it rclone is a one-line powerful almost no-setup-needed (encrypted too if desired!) all-included incremental backup system. No database needed, no big multi-GB archive files holding who-knows-what-where, everything just filesystem-based. You can set it on multiple computers with minimal effort, you can access your files from any machine with minimal setup (just get your config and rclone binary).

For reference (maybe it helps somebody less familiar with Windows's peculiarities) I'm using in Windows %datetime% generated from cygwin's date binary and of course in linux the same date directly:

for /f %%i in ('c:\cygwin64\bin\date.exe +"%%Y%%m%%d%%H%%M%%S"') do set datetime=%%i
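
On Linux the same timestamp can be captured directly in a shell script. A minimal sketch; the variable name `datetime` simply mirrors the Windows line above:

```shell
#!/bin/sh
# Same timestamp format as the Windows line above: YYYYmmddHHMMSS.
datetime=$(date +"%Y%m%d%H%M%S")
echo "$datetime"
```

The value can then be used as part of the --backup-dir path or passed to --suffix.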

Two inconsequential details I've noticed testing this:

  • there's no "--backup" switch needed (actually accepted at all) like for rsync, just use --backup-dir
  • it doesn't seem to work for "copy", only "sync". Perfectly fine, anyway we wanted it for sync but if you're coming from copy (like I was, because I wanted sync actually but was too afraid it'll sync some accidentally removed folders) you might be surprised it isn't working as expected.

In any case absolutely fantastic job. Thanks a lot.


@balazer balazer commented Feb 4, 2017

--backup-dir is supposed to work with sync, copy, and move. I've tested it with sync and copy to Google Drive, and it is working fine. @vb0, if it's not working for you, maybe you can provide details so it can be debugged.


@vb0 vb0 commented Feb 4, 2017

My mistake, it does work as expected with "copy" - I was probably too quick and the web GUI wasn't showing the folders yet.


@traynier traynier commented Mar 8, 2017

I'm a very recent convert to rclone, and love it. Thanks for all your hard work on it! :-)

I have been testing this new functionality out, and just updated to the latest beta rclone v1.35-163-gc45c604β, and the following error(s) still happen:

Note my remote is onedrive, and using crypt with encrypted filenames (although using the onedrive remote NOT using crypt doesn't seem to make any difference?)

I am copying from local to my remote, and specifying a backup-dir:

rclone sync test oneenc:Z/test --backup-dir oneenc:Zold

Note that if the backup-dir does not exist, it gets stuck "Waiting for checks to finish", and in digging down into --dump-bodies gets in a retry loop calling GET /v1.0/monitor/REDACTED over and over after the call to POST /v1.0/drive/items/REDACTED/action.copy

The monitor returns HTTP 500, which seems to make rclone think it's rate limited and gets the pacer to continually retry, but the HTTP 500 monitor response shows the server side copy actually failed in the body returned:

{"operation":"ItemCopy","percentageComplete":0.0,"status":"failed","statusDescription":"Completed 0/0 files; 0/85 bytes"}

If the backup-dir exists, but the file with the same name already exists in the backup-dir, it fails to copy:

2017/03/08 18:35:42 ERROR : test.txt: Failed to copy: can't copy "002i3q2u20jlcd0ietvoa93ri8" -> "002i3q2u20jlcd0ietvoa93ri8" as are same name when lowercase

If I use --suffix ".old", but a file with the new filename (i.e. including the new suffix) already exists in backup-dir, it gets stuck in a loop calling monitor the same as if the directory doesn't exist.

[Edit] I will confirm if the backup-dir exists, and the file doesn't exist in the backup-dir it does work as expected. :)
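
To illustrate the point about the monitor response: the body itself says whether the copy failed, regardless of the HTTP status code. The sketch below is not how rclone handles it; `monitor_status` is a hypothetical helper, and sed is used only to avoid depending on a JSON parser.

```shell
#!/bin/sh
# Illustration only: monitor_status is a hypothetical helper, not rclone code.
# Reads a one-line JSON body on stdin and prints the value of its "status" field.
# A real script should use a proper JSON parser; sed keeps this dependency-free.
monitor_status() {
    sed -n 's/.*"status":"\([^"]*\)".*/\1/p'
}

# The monitor response body quoted above.
body='{"operation":"ItemCopy","percentageComplete":0.0,"status":"failed","statusDescription":"Completed 0/0 files; 0/85 bytes"}'

if [ "$(printf '%s\n' "$body" | monitor_status)" = "failed" ]; then
    echo "server side copy failed: stop polling instead of retrying"
fi
```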
