
Allow "pulling" backups from remote servers #299

Open
fd0 opened this Issue Sep 9, 2015 · 40 comments

@fd0
Member

fd0 commented Sep 9, 2015

At the moment, restic only supports creating backups of local files and directories and saving the data to a remote server. This issue is about the other way around: having a dedicated backup server that logs into the systems to be backed up, gets the data, and saves it locally. For example, this may be implemented by creating an SSH connection to the remote server and starting a restic binary there, which communicates with the local restic binary over stdin/stdout.

Is that a relevant or interesting use case? What do you think?

@fd0 fd0 added rfc feature labels Sep 9, 2015

@cowai

cowai commented Sep 9, 2015

I for one would love this feature. In my case I really trust the backup server, as I have cared for it much more than the web servers where my websites are hosted. So for me, this issue's use case is much better than the default restic behaviour.

Also, it would be awesome if restic didn't need to be installed on the remote server, but could be pushed on the fly (if it's the same arch as the local machine, of course). This way I could upgrade restic on my backup server and all potential compatibility problems would simply go away.

@ar-jan

ar-jan commented Sep 9, 2015

+1, that's exactly my use case. I was thinking of using sshfs to accomplish this for now, but what you describe sounds like it'll be faster and more reliable.

@fw42

Member

fw42 commented Sep 9, 2015

👍 I think this would be useful (preferably without needing code execution access on the remote server, just sftp).

@bchapuis

Contributor

bchapuis commented Sep 9, 2015

👍 but I think the deduplication should still be performed on the remote server, before transmitting the data, in order to optimize the bandwidth.

@fd0

Member Author

fd0 commented Sep 9, 2015

@fw42 doing it without code execution, via sftp only, is possible, but harder to implement (I think).

@fw42

Member

fw42 commented Sep 9, 2015

Good point about deduplicating server-side; I guess that doesn't work without real ssh access.

@cowai

cowai commented Sep 9, 2015

sftp/sshfs is way too slow for big backups. Getting a file list over sshfs is multiple orders of magnitude slower; we are talking minutes instead of seconds. So even if no files have changed, it takes a long time just to detect whether anything has changed.

@cowai

cowai commented Sep 9, 2015

Some benchmarks with the find command:
I ran everything twice to let the cache warm up. The second runs are obviously fast because of disk caching, though...

[cowai@cowai-desktop ~]$ sshfs server:/media/raid servermount
[cowai@cowai-desktop ~]$ time(find servermount|wc -l)
76441

real    5m1.678s
user    0m0.760s

[cowai@cowai-desktop ~]$ ssh server 'time(find /media/raid|wc -l)'
76441

real    0m0.156s
user    0m0.032s
sys 0m0.136s

[cowai@cowai-desktop ~]$ time(find servermount|wc -l)
76441

real    4m34.652s
user    0m0.743s
sys 0m2.717s

[cowai@cowai-desktop ~]$ ssh server 'time(find /media/raid|wc -l)'
76441

real    0m0.145s
user    0m0.044s
sys 0m0.112s
@fd0

Member Author

fd0 commented Sep 9, 2015

Yes, sftp (which is used by sshfs) is a rather old and clunky protocol. Which version of sshfs is that? I'm curious because I once submitted a patch that speeds up READDIR by several orders of magnitude... does it have the option -o sync_readdir?

@cowai

cowai commented Sep 9, 2015

$ sshfs -V
SSHFS version 2.5
FUSE library version: 2.9.4
fusermount version: 2.9.4
using FUSE kernel interface version 7.19

I am on Arch Linux, connecting to a Debian 7 server.
I tried with -o sync_readdir, but the command took a minute longer with it.

@fd0

Member Author

fd0 commented Sep 10, 2015

Ok, then you have a version with my patch included. -o sync_readdir enables the old behavior and is expected to be slower.

@yatesco

yatesco commented Jan 8, 2016

For me, I wouldn't be interested in this and would be concerned about the added complexity. It feels somewhat like re-inventing the wheel given how many excellent tools there are already in this space.

The approach I use is that the central backup server contains a copy of the data to be backed up, runs restic on that backup to a local destination and then rsyncs the repo offsite.

To get the data to the backup server I either use cron/rsync or backuppc (brilliant tool BTW) with a regular dump to the local FS.

Sure - this requires 3 times the amount of disk space (once for the data on the client machine, once for the source copy on the central server, and once for the local restic repo) but:

  • disk space is cheap, really cheap
  • built-in redundancy
  • restorations are really quick due to the local repo

So for me, no, I wouldn't use this. Ironically, one of the reasons I am considering restic instead of borg is because it has no requirements on the remote server (initially I was going to have restic push to an sftp destination rather than local rsync) and yes, I can do the same local rsync with borg.

@dimsua

dimsua commented Feb 4, 2016

+1, that's exactly my use case. We run backups centrally from a backup server.

@rubenv

Contributor

rubenv commented Feb 6, 2016

This feature would allow replacing BackupPC with restic. It's a very real use-case!

@yatesco

yatesco commented Feb 7, 2016

Hmmm, I wouldn't say it completely replaces BackupPC:

  • clients need rsync (Linux) or smbfs (Windows) which are easier to install than restic
  • BackupPC UI is really quite nice
  • remote in-place restore

restic rocks, and its delta syncing capability is far better than BackupPC's hard links, but at the moment I don't think it is feature-compatible with BackupPC.

Don't get me wrong, I think restic is absolutely fantastic, and I use it at home and at work, but I see them as filling different purposes. I do think restic could easily surpass BackupPC but it currently needs a lot more hand-holding to do that.

Just my 2 cents :-)

@rubenv

Contributor

rubenv commented Feb 7, 2016

@yatesco obviously not a full replacement from day one, but it is the key feature needed to do so. The rest is UI, scheduling and management, which can be built on top of restic.

As for rsync and smb: you'll never get the efficiency of restic when relying on those. Also, the whole point of "difficult to install" is moot with restic: it's a single Go binary. Ideally restic could just copy itself over to the remote server and start the backup. Just set up SSH access and you're good to go.

@yatesco

yatesco commented Feb 8, 2016

Hi @rubenv - I agree - restic's dedup is great. However, "UI, scheduling and management" is very non-trivial, any Go binary is harder to install on Windows than SMB, and writing cross-platform code needs more than just a cross-platform compiler - restic itself has issues running on Windows.

As I said, I think restic is great, actually really great, but it is a little disingenuous to compare a new project (with amazing capability already) against the tried-and-true, much more feature-complete, and Windows-friendly BackupPC.

And, to repeat, I don't have a bad thing to say about restic, I simply take issue with the apples/oranges comparison.

The only reason I care is because I think BackupPC is a rock star and I wanted to make sure we are being fair to them as well and ironically I felt it was unfair to compare a pretty new project with the established BackupPC.

I am not sure this many words is worth the very minor quibble I had with your assertion that this feature alone makes restic a BackupPC replacement, and in hindsight I am not sure this is adding much. So peace - let's shake hands and part as gentlemen :-).

@iowelinux

iowelinux commented Feb 10, 2016

It would be a brilliant feature.
For instance, I really need it, and I believe a lot of people would be grateful for it.

@fd0

Member Author

fd0 commented Feb 10, 2016

Ok, thanks for commenting on this issue. Please only add a comment if you have/need a use case that hasn't been described yet, thanks!

@messju

messju commented Sep 17, 2016

One point that was not mentioned before (I think): if the backed-up machine gets compromised, the backups can easily be compromised as well, since they are pushed by that machine. This is the main reason for me to have a pulling backup server instead of a server that pushes its own backups somewhere.

@aparcar

aparcar commented Sep 29, 2016

Would data deduplication work between multiple remote hosts?

@fd0

Member Author

fd0 commented Sep 29, 2016

That depends on how exactly this feature gets (eventually) implemented.

@sanmai

sanmai commented Feb 4, 2017

I really wish restic had a feature similar to the serve and append-only features borg has.

This is useful for scenarios where multiple machines back up to a central backup server using borg serve, since a hacked machine cannot permanently delete backups.

If this feature deserves a separate issue, I'll open one.

While pull backups would solve a similar problem, they're a compliance nightmare, at least in my case.

@mschiff

mschiff commented Sep 21, 2017

There is a workaround: you can pull backups by connecting from the backup server to the client with 'ssh -R' and then letting restic do the backup to e.g. sftp:127.0.0.1. I tested this and it works fine. I have not done any performance tests, though.

For me this was important, because it is not possible for the clients to connect to the backup server via SSH but it is possible to connect from the backup server to the clients.

Example command on the server: ssh -R 127.0.0.1:20022:<backup server IP>:22 <client IP> "sudo restic -r 'sftp:127.0.0.1:/path/to/repo' -p .resticpw backup -x /"

For that to work you need:

  • a .ssh/config specifying the "Port 20022" for 127.0.0.1 on the client
  • proper known_hosts files
  • ssh pubkeys exchanged between client and server (via 127.0.0.1)
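
A minimal sketch of the client-side ~/.ssh/config entry the first bullet refers to (hypothetical; the port matches the example command above, everything else is standard ssh_config syntax):

```
# On the client, in ~/.ssh/config: make connections to 127.0.0.1
# (as used by restic's sftp backend) go to the reverse-forwarded port.
Host 127.0.0.1
    Port 20022
```
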
@fd0

Member Author

fd0 commented Sep 21, 2017

It's also possible to use the REST server for that, instead of sftp. It now also has an option for append-only backup.

@hagenbauer

hagenbauer commented Oct 2, 2017

I guess the normal use case would be that, for security reasons, the backup server is behind a firewall or in a dedicated VLAN. The remote SSH approach would punch a big hole in this security measure, because it would allow access to the backup server from the client that is being backed up.

So there is still a need for a "pull" model when the connection from the "backed up client" to the "backup server" is restricted for security reasons.

@anarcat

Contributor

anarcat commented Jan 24, 2018

My use case is slightly different from the "backuppc/bacula"-style "just fetch the files from clients directly" approach. In some backup systems I have deployed, we would have all clients push their backups to a central server, which, in effect, behaves like a shell server, with all the security complications that involves. Obviously, we lock down the accounts as much as we can with ForceCommand and everything, but we always consider the possibility of that host being completely compromised.

This is where pull backups come in: an offsite backup server logs in as root on the backup server and pulls all the backups from there. That way, even if the main backup server is compromised, it has no access to the offsite server, so it can't alter that final safeguard.

We use rsync or rsnapshot to do the offsites now, but it would be nice to have smarter storage than hardlinks: it would allow incredibly efficient deduplication across all servers in the farm...

And I don't mind requiring a client/server protocol here, an idea that is considered in #187: such a pull system probably can't be implemented securely and efficiently with native remote filesystem tools. We're not in Plan 9 here; we're in the messy world of SFTP and Samba and who knows what horrors. So requiring a pipe to a possible restic serve that could act not only as a backup server (serving and storing backup chunks) but also as a backup client (serving actual files) would be quite interesting.

@pasikarkkainen

pasikarkkainen commented Feb 4, 2018

Yes, pull-mode backups are needed for restic. Is there currently a method to do pull-mode backups without "double ssh" (using the SSH remote port forwarding mentioned earlier)?

Also, it's quite common to have a use case where you want to back up multiple (non-trusted) clients to a central server, all to the same repository to get the full benefits of global deduplication, but still with proper restrictions in place on what each client is allowed to access. Basically, one client should only be able to access its own data, and none of the data of other hosts in the same repository. Has there been any thought about implementing per-host security restrictions? (These restrictions are also needed for pull mode, at least when/if it's implemented via SSH port forwarding.)
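
For the per-host restriction part, rest-server has a --private-repos flag that limits each HTTP auth user to a repository under its own name. A sketch (paths, usernames, and the server address are illustrative; note that this gives every host its own repository, which trades away the cross-host deduplication asked about above):

```
# One HTTP auth user per client host:
htpasswd -B -c /srv/restic/.htpasswd host1
htpasswd -B /srv/restic/.htpasswd host2

# Each user may then only touch the repository matching its username:
rest-server --path /srv/restic --append-only --private-repos

# host1 can use:    rest:http://host1:PASSWORD@backupserver:8000/host1/
# host1 cannot use: rest:http://host1:PASSWORD@backupserver:8000/host2/
```
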

@sanmai

sanmai commented Feb 5, 2018

@pasikarkkainen see comments for #784

@colans

colans commented Aug 28, 2018

So there is still a need for a "pull" model when the connection from the "backed up client" to the "backup server" is restricted for security reasons

I'm not convinced there's still a need for that type of security model. If someone really wants to set things up that way, should it really be Restic's job to handle it?

After reading through the comments here again, I believe all use cases are now sufficiently handled by #784 and its follow-ups. Maybe it's time we close this one?

@fd0 fd0 removed the rfc label Aug 28, 2018

@fd0

Member Author

fd0 commented Aug 28, 2018

This issue is subtly different. I'd like to keep it open and (eventually) implement it. It's a common use case (especially in "enterprise" settings), and while it's possible to do this right now with the REST server, it's too complicated.

@chris-aeviator

chris-aeviator commented Jan 21, 2019

Currently trying to achieve this with:

  • an always-on Raspberry Pi that
  • mounts remote server folders via SSHFS
  • mounts external storage (NAS) via cifs/samba (which currently breaks restic on ARM for me)
  • runs regular backup tasks via cron/systemd

which in theory is not a complicated setup, and you interact with restic as if everything were local.

@nmeum

nmeum commented Jan 22, 2019

My workaround for this issue is a script which starts a rest-server locally and uses SSH remote port forwarding. It works pretty well and is very straightforward.

#!/bin/sh
set -e

RESTIC_USER="${RESTIC_USER:-restic}"
RESTIC_PATH="${RESTIC_PATH:-/data/backups/restic-repo}"

RPORT=4242  # port of the tunnel endpoint on the remote host
LPORT=5432  # port the local rest-server listens on

abort() {
	echo "$@" 1>&2
	exit 1
}

if [ $# -lt 2 ]; then
	abort "USAGE: ${0##*/} HOST FILES..."
elif [ ! -d "${RESTIC_PATH}" ]; then
	abort "Directory '${RESTIC_PATH}' doesn't exist"
fi

hostname="${1}"
shift

# Start an append-only rest-server in the background and make sure
# it is killed again when the script exits.
(sudo -u "${RESTIC_USER}" rest-server \
	--listen "localhost:${LPORT}" \
	--append-only --path "${RESTIC_PATH}" >/dev/null) &
trap "sudo -u ${RESTIC_USER} kill $!" INT EXIT

# Run restic on the remote host, pointing it back through the tunnel
# to the local rest-server.
ssh -t -R "${RPORT}:localhost:${LPORT}" "${hostname}" \
	sudo restic -r "rest:http://localhost:${RPORT}" backup "$@"
@paulvt

paulvt commented Jan 23, 2019

@nmeum But how do you deal with credentials? Do you use this non-interactively?

@sanmai

sanmai commented Feb 13, 2019

@nmeum isn't the problem that someone else on the server can access your backups too while you make them?

Here's one reason why it is really important that no one can alter backups, even from within a "safe" environment. Proper backups should be append-only at all times.

(I'm guilty too of having backup servers accessible with the same SSH key I normally use. Guess this has to change.)

@tcely

tcely commented Feb 13, 2019

@sanmai If someone is on your backup server, you have many more problems.

The solution @nmeum proposed is running a script on the backup server that connects to the client (the source of backups), which then sends the backup data back to the server via the rest-server with the --append-only flag.

A user on the client could send another backup, but they could not delete or overwrite existing backups with this solution.

@paulvt The typical solution is to set up SSH key(s) on the server that can log in to the clients we are pulling from (hostname in the script).

@sanmai

sanmai commented Feb 13, 2019

They can still access all backups, as much as they want. Say you have /etc/shadow backed up. Everyone on the server being backed up can read that file from any old backup over the REST protocol. This is not cool. Please correct me if I'm wrong.

@tcely

tcely commented Feb 13, 2019

@sanmai There is HTTP auth to address this concern. I'd suggest a modification to the script to generate a new set of credentials to be used by the SSH connection to the client. I might go about this by moving the current .htpasswd file out of the way, then concatenating it with the new credential, and setting up a trap to put the previous version back in place when the script ends.

From https://github.com/restic/rest-server/blob/master/README.md:

If you want to disable authentication, you must add the --no-auth flag. If this flag is not specified and the .htpasswd cannot be opened, rest-server will refuse to start.

Also, don't forget that you still need to provide a key to access the restic repository, which presumably isn't available to other users on the client machine. Setting up the SSH server to accept a new environment variable, and the sudoers file to preserve it, is often worth the extra effort.

@paulvt

paulvt commented Feb 13, 2019

@tcely Of course I was aware of the SSH keys; it is also in the script. But now I glean that you also pass environment variables, presumably including the repository key, which is very important as pointed out above. The latter is not in the script, so I'm curious how you deal with the repository key. Or do you run this interactively?

@tcely

tcely commented Feb 13, 2019

@paulvt Let's set up some shortcuts first.

I'll call the execution environment on the backup server (A), and the rest-server running in (A) I'll call (B). The server we are backing up data from will be called (C). So (A) connects to (C) over SSH with a remote tunnel and also runs the sudo restic ... command on (C). The remote end of the tunnel on (C) goes back to (B) and is encrypted along with the rest of the SSH traffic. The restic binary running as root (because of sudo) on (C) we'll call (D).

On (A), set your environment variables. Set up the .htpasswd file and load your SSH private key data. If the keys are encrypted with a passphrase, we can use ssh-agent or gpg-agent to handle accessing them. Other methods also exist for dealing with this; Vault is a fairly popular solution.

Before connecting to (C), we need an ssh_config or .ssh/config file set up to handle things like SendEnv on (A). You also need to set up (C) by putting the proper SSH public key into .ssh/authorized_keys and adding AcceptEnv to sshd_config to allow passing environment variables. Then on (C) you set up sudoers to keep the variables (env_keep) you are passing from (A).
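
The three pieces of configuration described above might look like this (the host name and variable list are illustrative; the directives themselves are standard OpenSSH and sudoers syntax):

```
# On (A), in ~/.ssh/config: forward the variables to (C)
Host client.example.com
    SendEnv RESTIC_PASSWORD RESTIC_REPOSITORY

# On (C), in /etc/ssh/sshd_config: accept them
AcceptEnv RESTIC_PASSWORD RESTIC_REPOSITORY

# On (C), in /etc/sudoers (edit via visudo): let sudo keep them for restic
Defaults env_keep += "RESTIC_PASSWORD RESTIC_REPOSITORY"
```
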

With all this set up, (A) runs (B), then connects to (C) over SSH, and (D) starts running. The variables you had set on (A) should now be available to (D).

Setting up the RESTIC_PASSWORD and RESTIC_REPOSITORY variables is the most common approach. Of course, there are a few more you could be using.

https://restic.readthedocs.io/en/stable/040_backup.html#environment-variables

For rest-server with HTTP auth, I'd set up RESTIC_REPOSITORY with generated credentials every time the script runs. You don't even need to care what these are if you generate them on (A), set the RESTIC_REPOSITORY URL to include them, and let everything get passed through to (D).
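
A sketch of generating such throwaway credentials on (A) (names and paths are illustrative; htpasswd comes from apache2-utils and is shown commented out):

```shell
# Generate a one-time HTTP auth credential for this backup run.
BACKUP_USER="pull-$(date +%s)"
BACKUP_PASS="$(openssl rand -hex 16)"   # 32 hex characters

# Add it to rest-server's .htpasswd (and restore the old file when done):
# htpasswd -Bb /srv/restic/.htpasswd "$BACKUP_USER" "$BACKUP_PASS"

# Embed the credential in the URL that gets passed through to (D):
RESTIC_REPOSITORY="rest:http://${BACKUP_USER}:${BACKUP_PASS}@localhost:4242/"
export RESTIC_REPOSITORY
```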

A setup like this is a good workaround, but it's not really "pulling" backups from the remote machines: the data is being "pushed" to the rest-server over a remote SSH tunnel after restic is executed on the remote machine. However, it's a much better solution than having your RESTIC_PASSWORD or other credentials lying around on all the remote machines. It may be good enough for most use cases.
