
Add command to copy all data to another repository #323

Open
fd0 opened this issue Oct 25, 2015 · 19 comments · May be fixed by #2606

Comments

@fd0 (Member) commented Oct 25, 2015

During the discussion in #320 we discovered that it may be helpful to have functionality to copy all data (data blobs, tree blobs, snapshots) from a repository to a new one, recreating pack files and indexes on the fly. This would allow creating a new repository in a different location (e.g. moving from a local repository to an sftp server) and using that from then on without losing any history or old snapshots.

This issue tracks the implementation of this feature and can be closed when it is implemented.

@Intensity commented Nov 3, 2015

Is this intended to handle a one-time copy from one repository (A) to a new one (B)? Or is this meant to be more general by performing a "sync" or update of changed content between (A) and (B) since the last sync?

@fd0 (Member, Author) commented Nov 3, 2015

At the moment this is intended to handle a one-time copy only, so that users can migrate to a different repository in a different location, or with a new master key.

@witeshadow commented Dec 13, 2016

Given a slow internet connection, I would like the ability to back up to s3 and another location as efficiently as possible.

@middelink (Member) commented Jun 30, 2017

@witeshadow I'm not sure how that can be done efficiently, as the data is encrypted in repo A with master key A', and needs to go to repo B with a different master key B'. We need to read in all the data, decrypt it with A', encrypt it with B' and write it out. There is no way to optimize this for slow bandwidth. It's gonna hurt...

The only optimization I can think of is having a selection criterion on the source repo A, using the host, path and tags filters, so you don't have to copy everything. However, that depends on your use case.

@mholt (Contributor) commented Apr 9, 2018

@fd0 I just wanted to add my vote for this feature request. Anything I can do to make it happen?

@fd0 (Member, Author) commented Apr 9, 2018

You could implement it... The functionality itself is not hard to do; configuring the two backends is the hard part. We don't support accessing more than one backend (e.g. there's only one $B2_ACCOUNT_ID)... so I think this feature depends on a proper config file (see #16).

Let's say we have two repos, A and B, and you'd like to sync A->B so that after the process is finished, the set of blobs (and snapshots) in B is a superset of the set of blobs in A.

So, you open both repos and load the index files for each one. Then you iterate over the index of A, for each blob checking if the blob is also contained in B. If this is true, move on to the next. If it's false, download, decrypt, encrypt and upload it to B.

The last step is copying the snapshot files over. For each snapshot file in A, decrypt the file, encrypt it again for B, store it there, and you're done.

As I said, the technical implementation is rather easy :)

@mholt (Contributor) commented Apr 9, 2018

Great! Thanks for the tips. I have this itch, so I will see if I can make time to scratch it -- but for the short-term I will have to go without this restic merge feature. If someone gets to it before I do, that's fine -- or I'll circle back around to this eventually!

@middelink (Member) commented Apr 12, 2018

I think I have this implemented already... /me scratches head and looks for it...
... https://github.com/middelink/restic/tree/fix-323
I need to check if it still compiles though, that branch is 228 commits behind ...

@matthijskooijman commented Aug 27, 2018

It might be useful to allow copying not just the full repository, but also a subset of snapshots. This would support a use case suggested by #1910 (backup to a primary repo often, and from there backup to offsite/slower/more expensive storage less often) and, I think, would not be much harder to implement than a full copy. Might be a future addition, though :-)

@sergeevabc commented Nov 21, 2018

Err… Any news for mere users without dev skills to compile and try out @middelink’s suggestion?

@klmitch commented Jan 23, 2019

This is mostly a "me too" comment, but I'd like to have the ability to copy only specific snapshots from one repo to another, rather than a "copy-all" or "sync" semantic; e.g., make daily backups to local storage, then once a week copy only the most recent daily to an s3 bucket, etc.

@middelink (Member) commented Jan 24, 2019

Well, then you are in luck: my copy command takes one or more snapshot IDs. In fact, copy-all is not something it does. You would have to list your snapshot IDs first and then concatenate them on the "restic copy" command line. As I see that as a degenerate use case, I'm good with it.
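For illustration, the workflow described above might look like this (the repo paths and snapshot IDs are placeholders, and the `--repo2` flag is an assumption based on the fix-323 branch; the flag names in any merged version may differ):

```
# List snapshot IDs in the source repo, then copy selected ones to the target.
restic -r /srv/repoA snapshots
restic -r /srv/repoA copy --repo2 /srv/repoB 79766175 bdbd3439
```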

@Fjodor42 commented Jan 24, 2019

Without delving too deep into this, perhaps some discussions with ncw/rclone could be of use...

@keesse commented Aug 13, 2019

I'm also interested in the merge/copy functionality. I have a repository on a USB stick that I would like to merge/copy into my central repository (same passwords).
Any news on this?

@theoretical2019 commented Sep 23, 2019

Looks like the fork branch was updated to master, but there's not yet a PR for it.

@middelink Is your code finished / mergeable? If not, what still needs to be done? This is a feature I really want :)

@middelink (Member) commented Sep 24, 2019

@theoretical2019 The code itself is finished, but each time I sit down to create an official PR, I keep finding things I need to do before it's ready. Like documentation, like a changelog entry...
Oh, and tests! Did I mention tests? It needs tests :P

@seqizz commented Jan 29, 2020

@middelink FYI, I have tested your branch by rebasing it onto upstream master and it works pretty well. It created a new snapshot with the same host, tags and date 👍
Waiting for the PR 🎉

With such a feature, I can create a secondary repository which the clients use only while the first repository is locked for maintenance (e.g. prune). The prune task can then trigger a copy from the secondary after it finishes, so there are no missing backups and hence zero downtime on the backup service.

@rawtaz (Contributor) commented Feb 26, 2020

@middelink Would you be so kind as to create a PR of your code? When doing so, please also allow edits from maintainers - this way, we can help you with the changelog, documentation and so on.

The important thing is that we get a base PR to work on. I'd love to get your great work moving, and so would others I think :) Let me know if you need any help creating the PR!

@middelink (Member) commented Feb 27, 2020

@rawtaz Sure. Let me sync up and all that stuff. For some reason I have not found the time to do so earlier, but it looks like I have some time now.

@middelink middelink linked a pull request that will close this issue Feb 27, 2020
5 of 7 tasks complete