
rclone backup: new command with incremental strategy #18

Open
briceburg opened this issue Jan 19, 2015 · 28 comments
@briceburg

briceburg commented Jan 19, 2015

Hi Nick,

I'm currently using s3ql to mount remote S3/GCS data as a local filesystem (through FUSE), and then use a shell script to implement rsync-based backups to this filesystem.

I'd like to make use of rclone-based backups in ansible-pcd -- for its simplicity and for user-friendliness (e.g. browsing on the remote end will display the backed-up files themselves -- versus s3ql, which displays indecipherable filesystem metadata).

For the majority of "backups", it's important to have incremental functionality that allows you to restore a file from "yesterday" without having "today's" changes override it and make that impossible. E.g. snapshots/rotations/&c.

Most incremental strategies also make subsequent backups efficient by limiting what gets backed up to changed components only, speeding up runtime and transfer time, and saving space at the backup destination.

Protecting against file corruption making its way into downstream [snapshots/rotations/&c] is also a plus.

I think rclone handles the syncing component well -- although I don't see anything apparent regarding incremental snapshotting. Do you plan to implement this, or could you share your thoughts on this feature?

@briceburg
Author

My initial thought is to use remote-to-remote for this, e.g.

First Backup ("base")

rclone copy /path/to/backup remote:/backups/base

Subsequent Backups

date=`date "+%Y%m%d_%H:%M:%S"`
rclone sync remote:/backups/base remote:/backups/$date
rclone sync /path/to/backup remote:/backups/$date

Not sure about the efficiency of the remote-to-remote. Bad idea?

Also, the README indicates:

[sync] Deletes any files that exist in source that don't exist in destination.

I'm used to behavior that deletes files which exist in the destination but do not exist in the source. I'm worried that the rclone behavior would remove [new] files from /path/to/backup ...

Thanks!

@ncw
Member

ncw commented Jan 19, 2015

Your idea for the remote-to-remote copy is how I would approach it.

This has one disadvantage with rclone as it stands today, in that it will effectively download the data and re-upload it. However, I have been thinking about allowing bucket-to-bucket copies, which would be exactly what you want. S3, Swift and GCS all allow this. Here are the docs for GCS.

So if I were to implement that, then the copy-to-backup-first approach would work quite well, I think.

As for

[sync] Deletes any files that exist in source that don't exist in destination.

I think it is badly worded; it "deletes files in the destination that don't exist in the source", as you would expect. I'll fix the wording.

@briceburg
Author

Nick,

Great! This is pretty exciting. Bucket-to-bucket copying sounds promising. What about this approach as well:

rclone sync /path/to/backup remote:/backups/base remote:/backups/changes.2015-01-19

Where rclone would compare /path/to/backup against remote:/backups/base, and copy changes to remote:/backups/changes.2015-01-19

Obviously this would mess with the delete behavior, which could be dealt with by adding a flag that removes deleted files from remote:/backups/base and optionally preserves them elsewhere (e.g. copying them to remote:/backups/deleted-files). We could then run a janitorial command that removes files older than X days from remote:/backups/deleted-files, and also take advantage of bucket-to-bucket copying without incurring the cost of doubling storage space with each snapshot.
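
As a rough illustration only -- the deleted-files path is the example from above, the 30-day window stands in for "X days", and this leans on rclone's --min-age filter -- such a janitorial pass could look like:

# illustrative sketch: purge preserved deletions older than 30 days
rclone delete remote:/backups/deleted-files --min-age 30d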

@ncw
Member

ncw commented Jan 19, 2015

Interesting idea!

I think I'd simplify the logic slightly and make it a new rclone command

rclone sync3 /path/to/backup remote:/backups/base remote:/backups/changes.2015-01-19
  • for every file in /path/to/backup
    • if it is in base unchanged - skip
    • if it is modified in base
      • copy the file from base to changes if it exists in base
      • upload the file to base
  • for every file in base but not in backup
    • move it from base to changes

This would mean that base would end up with a proper sync of backup, but changes would have any old files which changed or were deleted. It would then effectively be a delta, and you would have all the files at both points in time.

You could re-create the old filesystem easily, except if you uploaded new files into base - there would be no way of telling, just by looking at base and changes, whether those new files were new or just unchanged old files. This may or may not be a problem!
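
For illustration only (the flag did not exist when this comment was written; it arrives later in this thread): the per-file logic above corresponds closely to a single sync with a backup directory, e.g.

rclone sync /path/to/backup remote:/backups/base --backup-dir remote:/backups/changes.2015-01-19

so base ends up as a mirror of the source and changes receives the old copies of anything modified or deleted.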

@briceburg
Author

Nick,

Right -- totally agree re: new rclone command for this. The precursor is getting bucket-to-bucket functionality.

Just FYI, I'm super excited about rclone && am loving your work. For now, I'm going to go with an incredibly simple approach to backups with rclone.

Basically I'll have two targets: one that is synced weekly, and one that is synced daily. E.g. my cron will look like:

10     2     *     *     *  rclone sync ~/VAULT google:vault/nesta/daily
10     4     *     *     0  rclone sync ~/VAULT google:vault/nesta/weekly

This will [hopefully] preserve deleted files in the weekly snapshot. Could also add a monthly &c.

I think this will work for now, but certainly interested in helping with bucket-to-bucket and incremental strategies. If I can help, please let me know. May need to learn Go :)

@briceburg
Author

I've added ansible scripts to 1) install rclone and 2) implement the above backup strategy on a crontab based system (still need to make a systemd timer compatible version for archlinux &c). Sharing for fun.

Install rclone:

---

- name: install rclone 
  hosts: all
  sudo: true
  sudo_user: root

  vars:
    # check http://rclone.org/downloads/ for latest...
    rclone_version: 1.07
    rclone_vstr: rclone-v{{ rclone_version }}-linux-amd64
    rclone_target: /opt/rclone/{{ rclone_vstr }}

  pre_tasks:
    - stat: path={{ rclone_target }}
      register: stat_rclone 

  tasks:
    - name: download rclone
      uri:
        dest=/tmp/
        follow_redirects=all
        url=http://downloads.rclone.org/{{ rclone_vstr }}.zip
      when: not stat_rclone.stat.exists

    - name: unpack rclone
      command: unzip /tmp/{{ rclone_vstr }}.zip -d /opt/rclone
        creates={{ rclone_target }}

    - name: add rclone to path
      file:
        state=link
        dest=/usr/local/bin/rclone
        src={{ rclone_target }}/rclone

Backup Strategy

---
- name: vault backup 
  hosts: all

  vars:
    vault_base: "google:iceburg-vault/{{ TARGET_USER }}"
    vault_daily: "{{ vault_base }}/daily" 
    vault_weekly: "{{ vault_base }}/weekly"

  tasks:
    - name: $HOME/.rclone.conf
      file:
        state=link
        dest={{ TARGET_USER_HOME }}/.rclone.conf
        src={{ DOTFILES_DIR }}/.rclone.conf
        force={{ FORCE_LINKS }}

    - name: fetch vault
      command: rclone copy {{ vault_daily }} ~/VAULT
        creates=~/VAULT

    - name: schedule daily vault backup
      cron:
        name="daily vault backup"
        minute=40
        hour=4
        job="rclone sync ~/VAULT {{ vault_daily }}"

    - name: schedule weekly vault backup
      cron:
        name="weekly vault backup"
        minute=40
        hour=5
        job="rclone sync ~/VAULT {{ vault_weekly }}"

@briceburg
Author

Nick,

I've been playing with Syncthing of late. It uses the very cool idea of "versions", I believe derived from Dropbox and/or BitTorrent Sync. Versus the incremental ideas outlined above -- perhaps an incremental versioning scheme is preferred and easier to implement?

The "simple" Versioning scheme in Syncthing allows you to specify a folder name and number of copies you would like to preserve. E.g.

  1. During a sync, if a file is changed, copy the original version to the "versioned" folder.
    E.g. <remote>:/.versions/<path>/filename.
  2. If more than X versions of a file exist, delete the oldest.

So for the sync

rclone sync-versioned /path/to/backup remote:/backups

If remote:/backups/apache/virtualhost.a was found, but deleted or changed from /path/to/backup/apache/virtualhost.a, rclone would (a rough sketch follows this list):

  • make sure remote:/backups/.versions/apache folder exists (assuming .versions is the configured folder name)
  • copy remote:/backups/apache/virtualhost.a to remote:/backups/.versions/apache/virtualhost.a
    • if remote:/backups/.versions/apache/virtualhost.a exists, apply versioning scheme. E.g. rename older backups to remote:/backups/.versions/apache/virtualhost.a.[1-4] if configured to preserve 5 versions of a file.
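
A very rough shell sketch of that rotation for a single file, using today's rclone subcommands -- the paths, the .versions name and the five-version limit are just the example values above, and a real implementation would presumably live inside rclone itself:

# hypothetical rotation for one changed/deleted file, keeping at most 5 old versions
SRC=remote:/backups/apache/virtualhost.a
VER=remote:/backups/.versions/apache/virtualhost.a
rclone deletefile $VER.4 2>/dev/null || true        # drop the oldest version, if present
for i in 3 2 1; do
  rclone moveto $VER.$i $VER.$((i+1)) 2>/dev/null || true   # shift older versions down one slot
done
rclone moveto $VER $VER.1 2>/dev/null || true       # the previously saved copy becomes version 1
rclone copyto $SRC $VER                             # preserve the outgoing copy (a move would do for a deletion)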

Personally I think versions may be more accessible, and they don't involve deltas. What do you think?

@ncw ncw added the enhancement label Feb 4, 2015
@ncw
Member

ncw commented Feb 13, 2015

Sorry, I missed your last comment.

Yes, versions sound like they would be simpler for people to understand.

The renaming scheme needs a bit of thought - Windows doesn't deal well with files with funny extensions.

Implementation-wise, it is quite similar to the schemes above.


@ncw
Member

ncw commented Feb 10, 2016

I'll just note that rclone now has bucket-to-bucket copy and sync, which may be helpful!

@ncw ncw added this to the Unplanned milestone Feb 10, 2016
@leocrawford

leocrawford commented Aug 8, 2016

A feature along the lines of #18 or #98 would be very welcome. I agree that it is desirable to store full files rather than diffs, for simplicity and ease of restoration, but I wonder if we could improve on the versioned-folders idea?

The main drawback of that approach is that when a file is moved (or repeatedly removed and created) we get a lot of copies of the same file. Instead, if we treated the .backup directory as content-addressable storage, such that each backed-up file was stored using its MD5 hash as a filename, we would only need a little metadata stored to allow a restore.

I'd suggest that what we would need to store for each version is a JSON file that contains a line for each filesystem change, along the lines of:

operation, metadata, blob

here:

  • Operation would be add, delete, mkdir or similar (probably to match operations in fs)
  • Metadata would contain chmod, date, etc.
  • blob would be the MD5 hash of the file in question

I'd suggest that the version file itself is named after the MD5 of its contents and contains a reference (probably in the first line) to the previous backup. The most recent backup would be tracked by writing its MD5 to a file called HEAD in the .backup directory; this would be the only file that would ever need to change. (In effect we're creating a Merkle tree.)
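
To make that concrete, a purely hypothetical version file -- the hashes, paths and field names below are invented for illustration -- might contain something like:

9b1c2d3e...                                                   (first line: MD5 of the previous backup's version file)
add    {"path":"apache/virtualhost.a","mode":"0644","mtime":"2016-08-08T10:00:00Z"}  5d41402abc4b2a76b9719d911017c592
delete {"path":"old/notes.txt"}                                                       -
mkdir  {"path":"new-dir","mode":"0755"}                                               -

with the file itself stored under the MD5 of these contents, and .backup/HEAD holding that MD5.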

The advantage of this approach is that as well as restoring files we can restore other changes readily, by returning to any arbitrary point in the history (including deleted files, metadata, etc.), and it could cope with multi-way syncing with a little work. I also believe this approach could support a full two-way sync more readily than simple versioning, as the metadata allows us to determine what changes have been made since the last sync, improving our ability to determine which update to propagate rather than simply having to mark a potential conflict.

In practice, the easiest way of doing a restore is to allow the source to have an optional version specified (either the MD5 hash or simply an integer representing the number of steps back to go), so a restore could simply be a copy from the (old) destination.

One interesting way to implement this would be to provide a SourceVersionWrapper and a DestinationVersionWrapper which wrap any existing fs object; SourceVersionWrapper would allow an arbitrary version to be specified, and DestinationVersionWrapper would simply create the .backup metadata and blobs.

The advantage of this would be that if you did implement FUSE support (#494) then you would have, in effect, created a versioned filesystem for free. :-)

@thibaultmol
Contributor

New feature from Backblaze for B2: https://www.backblaze.com/blog/backblaze-b2-lifecycle-rules/

(might be relevant)


@ncw
Member

ncw commented Nov 10, 2017

rclone now supports --backup-dir which, with a tiny amount of scripting, gives all the tools necessary for incremental backups.

I keep meaning to wrap this into an rclone backup command, but I haven't got round to it yet!
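
The "tiny amount of scripting" can be as small as generating a dated backup directory per run. A minimal sketch, with the paths and layout purely as examples:

#!/bin/sh
SNAP=$(date +%Y-%m-%d_%H%M%S)
rclone sync /path/to/data remote:backup/current --backup-dir remote:backup/archive/$SNAP
# old snapshots under remote:backup/archive/ can later be pruned, e.g. with rclone purge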


@ncw
Member

ncw commented Jan 29, 2018

@navotera --backup-dir does a server-side move (or possibly a server-side copy followed by a delete if server-side move isn't available).


@guestisp

So, by using something like:

rclone sync /path/to/local remote:current --backup-dir remote:$(date)

remote:current will hold the latest backup (thus, the "current" version of files), and every change between the current version and the previous one would be stored in remote:$(date), resulting in something like rsnapshot?

In other words, if yesterday I had a file called "foo" that was deleted today, then with today's rclone run this file will be removed from the current remote and placed in the remote with yesterday's date, right?

Isn't it easier to run a remote copy before a new sync? Like the following:

rclone sync remote:current remote:yesterday
rclone sync /path/to/local remote:current

Exactly like rsnapshot

@ncw
Member

ncw commented Aug 10, 2018

@guestisp

So, by using something like:

rclone sync /path/to/local remote:current --backup-dir remote:$(date)

remote:current will hold the latest backup (thus, the "current" version of files), and every change between the current version and the previous one would be stored in remote:$(date), resulting in something like rsnapshot?

Yes, that is right.

In other words, if yesterday I had a file called "foo" that was deleted today, then with today's rclone run this file will be removed from the current remote and placed in the remote with yesterday's date, right?

Yes.

Isn't it easier to run a remote copy before a new sync? Like the following:

That will use a lot more storage - you'll have a complete copy for yesterday and a complete copy for current.

@guestisp

But with --backup-dir, do I have to search for a file in every backup directory, or is each directory a complete copy, like with rsnapshot and hardlinks?

@ncw
Member

ncw commented Aug 11, 2018

But with --backup-dir, do I have to search for a file in every backup directory, or is each directory a complete copy, like with rsnapshot and hardlinks?

Yes, searching will be necessary, as not many cloud providers support hard links. (A few, like Google Drive, do.)

I intend to make an rclone backup command which hides this from the user at some point, though.

@guestisp

I'm trying to use the suggested method (--backup-dir) but something is not working as expected.

This is a simple script that I'm running:

#!/bin/sh

BACKUP_DIR=$(/bin/date +'%F_%R')
for dir in /etc /var/www /var/backups /var/spool/backups; do 
   rclone sync $dir amazon_s3:mybuket/current/$dir --backup-dir amazon_s3:mybuket/${BACKUP_DIR} --exclude '*/storage/logs/*' --stats 2s --log-level ERROR
done

I would expect that on the first run everything would be synced to mybuket/current (and this is working properly), and then on every subsequent run changed files should be moved to mybuket/${BACKUP_DIR}, but this is not working. Files are still synced to current.

I would like to have something like rsnapshot: current should hold the latest sync, and any changes between the latest sync and the previous one should be moved to the backup-dir.
For example, yesterday I had file1 and file2. These are synced to current. Today I remove file2 and change file1's content. On the next run, today's versions should be synced to current and yesterday's versions should be moved to 20180817_0930.

@ncw
Member

ncw commented Aug 28, 2018

What should happen is that any files which are changed or deleted get moved to the backup-dir, which I think is what you are asking for.

Here is a simple example

$ tree src
src
└── file1

0 directories, 1 file
$ rclone sync src dst/current --backup-dir dst/backup1
$ tree dst
dst
└── current
    └── file1

1 directory, 1 file
$ date > src/file1
$ date > src/file2
$ rclone sync src dst/current --backup-dir dst/backup1
$ tree dst
dst
├── backup1
│   └── file1
└── current
    ├── file1
    └── file2

2 directories, 3 files
$ rm src/file1
$ rclone sync src dst/current --backup-dir dst/backup2
$ tree dst
dst
├── backup1
│   └── file1
├── backup2
│   └── file1
└── current
    └── file2

3 directories, 3 files
$ 

I would say also that amazon_s3:mybuket/${BACKUP_DIR} in your script should be amazon_s3:mybuket/${BACKUP_DIR}/$dir to fit in with the naming scheme.

@guestisp

So, removing file1 results in removal from current and a copy stored in backup1, right?

I'm trying to figure out a proper naming scheme; for example, I would like to create hourly backups. Currently I'm testing this:

BACKUP_DIR=$(/bin/date +'%F_%R' -d '1 hour ago')
rclone sync $dir amazon_s3:mybucket/current$dir --backup-dir amazon_s3:mybucket/${BACKUP_DIR}$dir

in an hourly cron.
This should move any old file from current to the last hour's directory and keep the current backup in current, so if file1 is removed now (15:10), on the next run (16:00) current will lose file1 but the 15:00 directory will keep it. Right?

There is a huge drawback: the backup-dir will only hold changes, not the full tree like rsnapshot does via links. Probably the following would create something more similar to rsnapshot:

rclone sync remote:current remote:1-hour-ago
rclone sync /path/to/local remote:current

but using much more space.

@ncw
Member

ncw commented Aug 31, 2018

So, removing file1 results in removal from current and a copy stored in backup1, right?

That is correct.

This should move any old file from current to the last hour's directory and keep the current backup in current, so if file1 is removed now (15:10), on the next run (16:00) current will lose file1 but the 15:00 directory will keep it. Right?

Yes, that sounds correct too.

There is a huge drawback: the backup-dir will only hold changes, not the full tree like rsnapshot does via links

I intend to fix this with a dedicated backup command at some point but we are not there yet.

rclone sync remote:current remote:1-hour-ago
rclone sync /path/to/local remote:current

Yes, that would work. The first rclone command would use server-side copies, so it would be relatively quick too. It does use a lot more space though. Some might say that is a good thing, as you then have two actually independent backups.

@ivandeex ivandeex changed the title incremental strategy rclone backup: new command with incremental strategy Dec 2, 2020

@ivandeex
Member

@ncw
The last comment here is 3 years old
Do you think that rclone backup is still a viable idea?

@hmoffatt

hmoffatt commented Jul 28, 2022

Could you use --compare-dest with a list of all the directories since the last full backup in order to make an incremental backup?

  • Full backup: possibly use --copy-dest from all of the previous incrementals to avoid uploading again
  • Incremental backup: --compare-dest all the incrementals + the last full
  • Differential backup: --compare-dest the last full backup only
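
A hedged sketch of those three cases -- the bucket layout and dates are invented, and this relies on --compare-dest/--copy-dest accepting multiple directories, which newer rclone versions allow:

# incremental: upload only what is absent from the last full and the later incrementals
rclone copy /data remote:backup/inc-2022-07-28 \
  --compare-dest remote:backup/full-2022-07-01 \
  --compare-dest remote:backup/inc-2022-07-14

# differential: compare against the last full only
rclone copy /data remote:backup/diff-2022-07-28 --compare-dest remote:backup/full-2022-07-01

# next full: server-side copy unchanged files from previous backups instead of re-uploading
rclone copy /data remote:backup/full-2022-08-01 \
  --copy-dest remote:backup/full-2022-07-01 \
  --copy-dest remote:backup/inc-2022-07-28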
