Backup planning: multiple repositories better than one? #1015

Closed
mholt opened this issue Jun 10, 2017 · 11 comments

mholt commented Jun 10, 2017

This isn't an issue/bug, but I wanted to ask the question and see what you (and other users) think.

I have some files on my laptop and some files on a mounted network drive. The network drive is only mounted sometimes when I am at home, but both contain master copies of their data. Because their data is disjoint, I want to back up both my laptop and the network drive to another local hard drive as well as to B2 for redundancy.

Do you recommend a single repo in each backup location, each one shared by my laptop and network drive as a destination? Or should I use one repo per device per backup location?

From my perspective, both these devices are "my master copy of my data" and the only reason I use a network drive is lack of space on my laptop. They're both my data, it just so happens they're split between two devices.

What are the advantages or disadvantages of using one repo per device in this case, or sharing the same repo between both devices?

fd0 commented Jun 10, 2017

That's a good question. First: both are possible (using a separate repo per data source, or using the same repo). If the data in both sources is completely different, then it doesn't make a huge difference whether you use the same repo or not, because deduplication won't find anything to share between them.

The second question is whether to back up to one (local) repo and sync that to B2 (e.g. via rclone), or to run two separate backups, one to the local repo and one to B2. I don't see any particular pro or con for either workflow. If you sync a local repo, you need to make sure that the sync also propagates deletions (so files deleted locally, e.g. by prune, are also deleted on B2).
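The "sync including deletes" requirement can be sketched as a one-way mirror plan: upload whatever is missing or different on B2, and delete whatever exists on B2 but no longer exists locally. This is a minimal illustration, not how rclone is implemented: plan_mirror is a hypothetical helper, and comparing files by size alone is a simplifying assumption (it is only tolerable here because restic names most repository files by a hash of their content and never rewrites them). In practice, rclone sync performs this kind of one-way mirroring, including deleting extra files on the destination.

```python
from __future__ import annotations


def plan_mirror(local: dict[str, int], remote: dict[str, int]) -> tuple[list[str], list[str]]:
    """Compute a one-way mirror from local to remote (hypothetical helper).

    Both arguments map a repository-relative path to its size in bytes.
    Returns (to_upload, to_delete); applying both makes remote match local.
    """
    # Upload files that are missing remotely or whose size differs.
    to_upload = sorted(path for path, size in local.items() if remote.get(path) != size)
    # Delete files that exist remotely but are gone locally, e.g. packs removed
    # by prune or snapshots removed by forget. Skipping this step leaves stale
    # files behind on B2.
    to_delete = sorted(path for path in remote if path not in local)
    return to_upload, to_delete


if __name__ == "__main__":
    # Hypothetical listings: a new backup added p2/s2 locally, and
    # forget/prune removed s1/p1 locally, but B2 still has the old files.
    local = {"config": 155, "data/aa/p2": 4096, "snapshots/s2": 300}
    remote = {"config": 155, "data/aa/p1": 4096, "snapshots/s1": 290}
    print(plan_mirror(local, remote))
    # (['data/aa/p2', 'snapshots/s2'], ['data/aa/p1', 'snapshots/s1'])
```

A copy-only sync would perform just the first half of this plan, which is exactly the failure mode the comment above warns about.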

Anything else I missed?

mholt commented Jun 11, 2017

@fd0 Indeed, the data on my laptop and the network drive is different, so I guess locking won't be an issue if backing up to the same repo? And one repo is probably easier to keep track of and manage than two.

I hadn't thought about your second point, about just syncing the repo to B2 using rclone or something. 🤔 Backups are CPU-intensive, and on a Raspberry Pi that's a lot to ask, so maybe syncing is a better option. I'll have to think about this; it seems kind of silly that I'd never use the B2 backend I'd been advocating for 😄

fd0 commented Jun 11, 2017

Locking isn't an issue, even when the data is exactly the same and you run the backups for both sources in parallel. In that case a bit too much data may be saved, but that is cleaned up the next time you run prune.
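Why parallel backups of identical data can store "a bit too much" can be illustrated with a toy model. Everything here (the Repo class, the pack names, and both functions) is hypothetical and far simpler than restic's real repository format. The assumption being modeled: each backup decides which blobs to upload based on the index it saw when it started, so two backups running at the same time can both upload the same blob, and a later prune-like step keeps only one copy of each.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repo:
    # Toy model only: pack name -> IDs of the blobs stored in that pack.
    packs: dict[str, set[str]] = field(default_factory=dict)

    def known_blobs(self) -> set[str]:
        return set().union(*self.packs.values())

    def stored_blob_count(self) -> int:
        # Counts duplicates: the same blob in two packs is counted twice.
        return sum(len(blobs) for blobs in self.packs.values())


def run_parallel_backups(repo: Repo, sources: dict[str, set[str]]) -> None:
    # All backups read the index before any of them writes, which is
    # what can happen when they run at the same time.
    seen = repo.known_blobs()
    for name, blobs in sources.items():
        new_blobs = blobs - seen
        if new_blobs:
            repo.packs[f"pack-{name}"] = set(new_blobs)


def prune_duplicates(repo: Repo) -> None:
    # Keep the first copy of every blob and drop the rest; packs left
    # empty are removed entirely (a crude stand-in for repacking).
    covered: set[str] = set()
    kept: dict[str, set[str]] = {}
    for name in sorted(repo.packs):
        remaining = repo.packs[name] - covered
        if remaining:
            kept[name] = remaining
            covered |= remaining
    repo.packs = kept


if __name__ == "__main__":
    repo = Repo()
    run_parallel_backups(repo, {"laptop": {"a", "b"}, "server": {"a", "b"}})
    print(repo.stored_blob_count())  # 4: blobs a and b are stored twice
    prune_duplicates(repo)
    print(repo.stored_blob_count())  # 2: one copy of each blob remains
```

With disjoint data, as in the original question, the same model stores each blob exactly once, so there is nothing extra for prune to clean up.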

It's not silly to sync back and forth if that's the best solution. After all, the B2 backend gives you many additional possibilities: restoring directly from B2 and, more importantly, mounting the repo via FUSE directly from B2 to restore a single directory or file. That's valuable :)

So, can we close this issue?

mholt commented Jun 11, 2017

Oh, so if I only sync the repo to B2 instead of backing up to it directly, I can't restore from it or mount as a drive?

Yeah, we can close this. :) My main question(s) have been answered.

Edit: I just re-read your answer and I think I understand now. I can just sync to B2 and then use the B2 backend to do restores and mount it from B2 directly, without having to sync back locally first. That's really great! The repos are "portable" in that sense.

mholt closed this Jun 11, 2017

fd0 commented Jun 11, 2017

Hm, maybe I was unclear: if you back up to a local repository and sync that (including deletes) to B2, then you can use the repo on B2 the same way as the local repo: mount, restore, it all works. You shouldn't back up directly to the B2 repo in this setup, as that may have unwanted side effects (for example, the next sync from the local repo would delete whatever that direct backup wrote).

mholt commented Jun 11, 2017

Ah yes, thanks for the clarification! I edited my comment at the same time, when I eventually figured that out, haha. Thanks again.

mholt commented Jun 13, 2017

@fd0 Oh, one more follow-up question. Is it a problem if rclone syncs while the backup is being performed? Of course the sync is read-only at the source (where the backup is actively happening), but still: could syncing and backing up at the same time lead to a copy of the repo on B2 that's in an inconsistent state?

fd0 commented Jun 13, 2017

Hm, excellent question. I don't think starting the sync while the backup is still running is a problem. You may end up with a repo on B2 that contains extra data and lacks the latest snapshot (because the snapshot file is written last). Apart from that, it should work just fine. Please let me know if anything goes wrong; you can easily test this by running restic check directly against the B2 repository (ignore reports about pack files not being in the index, that's expected).

fd0 commented Jun 13, 2017

Ah, maybe another comment on the restic repository format: it is designed so that during a backup no data is ever modified or removed; a backup only creates new files. Only forget removes snapshot files, and prune removes data files (and adds new ones), but even those operations never modify existing files.
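Why a sync that runs in the middle of a backup stays usable follows from this append-only design plus the write order mentioned earlier in the thread (data first, snapshot last). The sketch below is a hypothetical, heavily simplified model: the file names, the BACKUP_WRITES sequence, and the reference sets are illustrations, and real snapshots reference trees rather than pack files directly. It checks that every prefix of the write sequence (every state a concurrent sync could copy) has no snapshot or index pointing at missing files, and that the only oddity is packs not yet listed in an index, which corresponds to the check warning fd0 says to ignore.

```python
from __future__ import annotations

# Toy model of one backup run: (path, paths this file refers to), in write order.
# Top-level directory names follow restic's layout; the references are simplified.
BACKUP_WRITES: list[tuple[str, set[str]]] = [
    ("data/p1", set()),
    ("data/p2", set()),
    ("index/i1", {"data/p1", "data/p2"}),
    ("snapshots/s1", {"data/p1", "data/p2"}),
]


def consistent(files: dict[str, set[str]]) -> bool:
    # Every index or snapshot file may only refer to files that are present.
    return all(
        ref in files
        for path, refs in files.items()
        if not path.startswith("data/")
        for ref in refs
    )


def every_prefix_consistent(writes: list[tuple[str, set[str]]]) -> bool:
    # A sync running during the backup copies some prefix of the writes.
    return all(consistent(dict(writes[:n])) for n in range(len(writes) + 1))


def unindexed_packs(files: dict[str, set[str]]) -> list[str]:
    # Packs that were uploaded but are not listed in any index file yet.
    indexed = set().union(*(refs for path, refs in files.items() if path.startswith("index/")))
    return sorted(path for path in files if path.startswith("data/") and path not in indexed)


if __name__ == "__main__":
    print(every_prefix_consistent(BACKUP_WRITES))  # True
    # Writing the snapshot first would break this guarantee.
    print(every_prefix_consistent(BACKUP_WRITES[3:] + BACKUP_WRITES[:3]))  # False
    print(unindexed_packs(dict(BACKUP_WRITES[:2])))  # ['data/p1', 'data/p2']
```

The design choice being illustrated is ordering rather than locking: because no existing file is ever rewritten and the snapshot is written only after everything it needs, a copy taken at any moment is at worst incomplete, not contradictory.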

mholt commented Jun 13, 2017

Excellent, thanks! What I have so far is:

  • My laptop and a home file server backing up different things to the same repo on a network drive
  • The home server is syncing the backup repo to B2 using rclone

Amazingly, it's working. I've tested restores and syncing (with rclone) while backing up and you're right, the snapshots don't appear on B2 until the next sync, but that's fine, since they'll both run regularly.

So, that's all! Nice work. I think I'm ready to use this in prime time for my backups. 👍

fd0 commented Jun 13, 2017

Nice, thanks for the feedback!
