backup: Add options --set-path and --set-path-from #3200
There has already been some discussion on such an option in #2714. |
Force-pushed from 600ad35 to 8679128.
Thanks for the hint! I added this to the description and adapted the changelog file. |
chrahunt
left a comment
Some hints in the docs for users would be helpful. I think the including files section would be a good place to mention it, along with an explanation that if it is not set then the individual paths will be included in the snapshot metadata, which can have negative impacts for users specifying large numbers of files.
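To illustrate the point about snapshot metadata, here is a minimal sketch that compares the serialized size of a path list with one entry per file against a single overriding path. The `snapshotMeta` struct, the host name, and the file names are hypothetical stand-ins for illustration, not restic's actual snapshot type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshotMeta is a hypothetical, trimmed-down stand-in for snapshot
// metadata; it only carries the fields needed for this comparison.
type snapshotMeta struct {
	Hostname string   `json:"hostname"`
	Paths    []string `json:"paths"`
}

// metaSize returns the size in bytes of the JSON-encoded metadata.
func metaSize(paths []string) int {
	buf, err := json.Marshal(snapshotMeta{Hostname: "example-host", Paths: paths})
	if err != nil {
		panic(err)
	}
	return len(buf)
}

// manyFiles builds n made-up file paths, as an external file selector might.
func manyFiles(n int) []string {
	paths := make([]string, n)
	for i := range paths {
		paths[i] = fmt.Sprintf("/srv/data/file-%06d.bin", i)
	}
	return paths
}

func main() {
	fmt.Println("one path per file:", metaSize(manyFiles(10000)), "bytes")
	fmt.Println("single set path:  ", metaSize([]string{"/srv/data"}), "bytes")
}
```

The metadata grows linearly with the number of listed files, which is the cost the docs hint should warn about.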
Force-pushed from 574256c to f81d412.
|
I added some hints in the docs and realized that an extra options |
|
Also added some checks for the new flags in combination with stdin, analogous to the checks of |
|
About |
|
Thanks a lot for this ! I was playing with this PR today and although
I think that we should expect the same final state either doing it with normal backup or using the combination of Roberto. |
|
@robvalca Thanks for your feedback! |
|
Hi @aawsome , thanks for the quick reply! I only have that single path in the txt. I thought that there would be a merge with the files on the list and the previous snapshot of the path specified by |
|
I very much look forward to seeing this merged! What would be extra useful would be the ability to also change the path of existing snapshots so as not to have to do a full backup of everything after this option is in use. My use case is backing up a series of backups created with dirvish (for various reasons we can't use restic to directly back up the hosts), so they all end up in /var/backups/dirvish///tree. I'm making the path consistent between backups by bind mounting the .../tree directory but this feature would completely remove the need for that complexity and allow me to directly back up the directory without having to bind mount it, as well as making restoration on to the server a whole lot easier, but then the path would be inconsistent with all of the existing months of backups I've got in the restic repository. |
|
Another use case not described above is when using filesystem snapshots (eg, with btrfs or zfs) as the source for the restic backup but wanting the restic backup path to clearly represent the actual data source. Using the filesystem snapshot makes it much more likely to have a consistent backup of things like database files. I also envision using the --set-path feature to write a utility that can take a list of local filesystem snapshots and ensure that they are all replicated in restic (including correct creation timestamps and parent relationship) as that would allow simple interfacing of an existing local backup system with its creation and expiry schedules and the more flexible off-site restic backup. |
|
@amuckart I agree that changing the path would be a useful feature. But maybe that could be left for another PR (so this one doesn't get held up)? |
|
I would also like to leave the scope of this PR as it is. Can we open a new issue to discuss this changing of trees in the repo? I think it's not just changing the tree names (which should be pretty simple) but the syntax how to specify which path maps to which tree. I can imagine that there are many pitfalls which should be discussed first. |
|
👍 for this proposal. My use-case is for a selective backup that I cannot perform with an exclusion-list :
With an inclusion list (files-from) :
I read the patch and it seems sane, but I surely do not have enough experience with the restic codebase to give an informed opinion. Is there anything I can do to help? (I'm building and trying this patch on a working repository; I'll give some feedback on this). |
|
Working with my repository:
|
@hamishcoleman This is exactly what I was trying to do yesterday and realized it didn't exist. Anxiously awaiting the approval of this PR. I'll probably build 0.12.0 from source with this patch applied just so I can use this functionality! This is going to make backing up ZFS snapshots clean. |
|
@lpulley I have worked around it for the moment by making my backup script take a temporary snap of the real snap but using a single stable name - then destroying the temporary snap after restic has backed things up - which is working well. |
* restic_list_commands added; if present, stdout from restic_list_commands is used to populate a file list; this file list is passed as the files-from restic argument. restic_src is still used to set the snapshot's paths attribute. This needs a custom restic version (restic/restic#3200)
* restic_rclone_remotes allows producing an enhanced rclone configuration with multiple remotes. Useful to perform operations on multiple remotes
chrahunt
left a comment
2 minor comments, otherwise this LGTM.
|
Is this just awaiting review? |
From my side: yes |
|
is updated and rebased onto current master.
if len(opts.Paths) > 0 {
	return errors.Fatal("--stdin and --set-path cannot be used together")
}
if len(opts.PathsFrom) > 0 {
	return errors.Fatal("--stdin and --set-paths-from cannot be used together")
}
This conflicts with the deprecation of the --stdin-filename option at R120.
If --set-path/--set-paths-from are supposed to replace --stdin-filename, then using --stdin together with them must be possible.
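One way the checks could be relaxed to address this review comment is sketched below. It is an assumption for illustration, not the PR's code: `BackupOptions` is a hypothetical trimmed-down option struct, and the rule chosen here (with --stdin, at most one --set-path is accepted, while --set-path-from stays rejected) is only one possible design.

```go
package main

import (
	"errors"
	"fmt"
)

// BackupOptions is a hypothetical stand-in for the backup option struct;
// only the fields needed for this sketch are included.
type BackupOptions struct {
	Stdin     bool
	Paths     []string // values given via --set-path
	PathsFrom []string // files given via --set-path-from
}

// checkStdinOptions applies an assumed relaxed rule: when reading from
// stdin, a single --set-path may name the stored file (taking over the role
// of --stdin-filename), but a list of paths makes no sense for one stream.
func checkStdinOptions(opts BackupOptions) error {
	if !opts.Stdin {
		return nil
	}
	if len(opts.PathsFrom) > 0 {
		return errors.New("--stdin and --set-path-from cannot be used together")
	}
	if len(opts.Paths) > 1 {
		return errors.New("--stdin accepts at most one --set-path")
	}
	return nil
}

func main() {
	ok := BackupOptions{Stdin: true, Paths: []string{"/db/dump.sql"}}
	bad := BackupOptions{Stdin: true, Paths: []string{"/a", "/b"}}
	fmt.Println(checkStdinOptions(ok))
	fmt.Println(checkStdinOptions(bad))
}
```

With a rule like this, --set-path could replace --stdin-filename without the two checks contradicting the deprecation.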
|
I've been using this PR on top of https://github.com/flokli/borg2restic to migrate from a mounted borg repo to restic. I can confirm it still does what it's supposed to do on |
|
Reading through this pull request, the two remaining blockers appear to be:
|
There has been no feedback regarding the actual design of this PR yet from the core devs. The PR is by now relatively far up in my review queue, but it will still take some more time to get there. From a quick look, my main criticism is that it allows setting arbitrary paths which don't reflect the structure within a snapshot in any way. What's the use case for that? (To remove a prefix from all paths, I'd rather prefer something like |
|
I use that behavior from switching prefixes from Resulting snapshots: |
I create lists of files to backup mostly dynamically and use
|
|
It looks like findParentSnapshot gets called with the unoverridden path, though; not sure if that's intentional. EDIT: never mind, it appears that the change got lost to bitrot during a rebase. |
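The reason this matters can be shown with a minimal sketch of parent selection by host and path set. The types and the functions `findParent`, `effectivePaths`, and `demoSnapshots` are hypothetical and only loosely modelled on the behavior described in this thread; they are not restic's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// Snapshot is a hypothetical, minimal snapshot record.
type Snapshot struct {
	ID       string
	Hostname string
	Paths    []string
	Time     time.Time
}

// pathKey builds an order-independent key for a list of paths.
func pathKey(paths []string) string {
	sorted := append([]string(nil), paths...)
	sort.Strings(sorted)
	return strings.Join(sorted, "\x00")
}

// effectivePaths returns the --set-path override if given, else the targets.
func effectivePaths(targets, setPaths []string) []string {
	if len(setPaths) > 0 {
		return setPaths
	}
	return targets
}

// findParent returns the ID of the newest snapshot with the same host and
// the same path set, or "" if there is none.
func findParent(snaps []Snapshot, host string, paths []string) string {
	want := pathKey(paths)
	var best *Snapshot
	for i := range snaps {
		s := &snaps[i]
		if s.Hostname != host || pathKey(s.Paths) != want {
			continue
		}
		if best == nil || s.Time.After(best.Time) {
			best = s
		}
	}
	if best == nil {
		return ""
	}
	return best.ID
}

// demoSnapshots returns two earlier snapshots that were stored as /home.
func demoSnapshots() []Snapshot {
	t0 := time.Date(2021, 1, 1, 0, 0, 0, 0, time.UTC)
	return []Snapshot{
		{ID: "a1", Hostname: "web", Paths: []string{"/home"}, Time: t0},
		{ID: "b2", Hostname: "web", Paths: []string{"/home"}, Time: t0.Add(24 * time.Hour)},
	}
}

func main() {
	targets := []string{"/mnt/snap-2021-01-03/home"} // made-up snapshot mount
	setPaths := []string{"/home"}
	// Lookup with the unoverridden targets finds no parent.
	fmt.Printf("%q\n", findParent(demoSnapshots(), "web", targets))
	// Lookup with the overridden paths finds the latest matching snapshot.
	fmt.Printf("%q\n", findParent(demoSnapshots(), "web", effectivePaths(targets, setPaths)))
}
```

If the lookup uses the raw targets, every backup from a differently named mount point starts without a parent, which defeats the purpose of overriding the paths.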
|
I thought about this a bit. Actually, I don't like this idea that much, as it allows setting paths which do not correspond to the trees within the snapshot. So, I'm closing this PR, hoping someone will find time to implement a better solution. |
|
I think there is a clear need for this type of functionality. Unfortunately, I suspect that the currently proposed solutions don't really address the underlying issue. They will provide solutions to many existing use cases, but as @aawsome and @MichaelEischer have noted, they do introduce other problems.
I'd like to propose a radically different approach that I think will address the underlying issue itself. Unfortunately, what I propose will almost certainly require changes to the repository format, necessitating a new repo version. Thus, I understand adoption of this approach will need to meet a high bar in terms of its value proposition.
What follows is an explanation of the underlying issue as I understand it, an explanation of the new approach and its benefits, examples of how it would be used, and my limited understanding of potential obstacles or issues, along with an invitation for critique and alternate suggestions. I accept that I may have some things wrong - happy to be (respectfully) corrected.
But first, I would like to say that Restic is truly remarkable technology. My sincere congrats to the creators on an amazing tool. And more broadly, well done to all the contributors who have made Restic what it is today. My contribution in this post is well-meaning and constructive.
Restic provides a repository that can be shared among multiple hosts, which of course is really neat. To identify which backup snapshots belong to which hosts, the hostname is, by default, attached to each backup snapshot. What is backed up in a snapshot on that host is recorded by one or more absolute paths. Unlike the host, which is but a simple text label, paths have intrinsic meaning. And a snapshot can comprise multiple paths. A backup is a snapshot in time of data stored on a particular host, at one or more particular absolute paths on that host.
For the purpose of this writing, I am defining a backup 'set' as all snapshots in the repository for the combination of absolute paths and host. This unit is important for how we identify files we wish to restore in the dimensions of host, path location and time. It is also important for subsequent backups that seek to identify their parent backup snapshot (i.e. --parent). This triangulated identification of backup snapshot sets is where I see a problem.
The flaw in this strategy, I think, is that Restic tightly couples the identification of backup snapshot sets to absolute paths on the host virtual filesystem or, on Windows, to drive letters. The core issue is that these absolute paths have their own meaning, but they can also change.
The virtual filesystem comprises separate logical and/or physical storage devices attached at different path locations. In the simple case of Windows, separate storage is attached to different drive letters. Filesystem boundaries are significant. Restic already has a --one-file-system option that recognises this importance. Filesystem snapshots are a clear example.
And while the virtual filesystem, using fstab, has pre-determined mount points, there are exceptions, notably the attachment of external storage devices. Linux and macOS use filesystem volume labels, often mounting under /var/media and /Volumes respectively, which can obviously collide, while Windows can assign arbitrary drive letters to attached storage devices, influenced by the order of attachment.
Working around this variability by fudging absolute paths to correctly align with backup snapshot sets is complicated and risky for mere humans - it's easy to bugger it up! I suspect it is also making it difficult to implement some requested features that relate to paths, such as path translations, because of how they are tethered to identifying backup snapshot sets. Hope I have understood things correctly here.
What I propose is to decouple the paths from the identifier for a backup snapshot set. Instead, uniquely identify backup sets by a user-defined label alongside the host, but still separate from tags. So the host/label combination is a unique identifier for backup snapshot sets instead of host/paths. The labels themselves could be arbitrary, and can be chosen meaningfully by the data owner, much like a volume/filesystem label. But they would be abstract, so they aren't tied to the diverse semantics associated with paths, like arbitrary drive letters selected by the OS, or absolute virtual filesystem paths. This concept makes it portable.
From a technical standpoint, which paths are given for the backup at that host/label combination doesn't matter, as Restic clearly knows which previous --parent to use. As Restic currently stands, relative paths can be given for a backup, but Restic still resolves them back to an absolute path for the backup to identify its parent. So this change would also mean that all backup paths can be considered relative, without conflation with the backup snapshot set identity itself.
I anticipate this opens up much easier implementation strategies for path translations during backup, but also for retrospective changes to existing backup snapshot sets, because the paths are no longer tightly coupled to the backup snapshot set identity. As a concrete example, I suspect the tar command's --strip-components, --transform and --absolute-names options would be much easier to implement. It also simplifies the --stdin option, because you identify the backup with the label rather than with an additional and specific --stdin-filename parameter. So it removes another special case.
From a usability standpoint, this change also allows Restic to behave with similar semantics to other well-established backup tools, such as tar, cpio, zip, etc. One of the things I struggled with when first using Restic was the effective reliance on absolute paths.
Backup utilities that I am familiar with all use semantics that allow for relative paths.
Another consideration is that in many circumstances, restoring from backups can be a very stressful event. Turning to backups generally means Plan A & B failed - or there was no Plan A. :) So investing time and effort into the backup strategy in exchange for keeping the restore procedure simple can reduce cognitive load during an emergency situation, limit potential human error, reduce stress, and expedite the recovery. So while you can rename paths after restoring from backups, I think it is better to design your backups so that restores can be performed with minimal post-processing. This means having paths appropriately captured during the backup phase. And performing backups per filesystem is a common strategy, because that is how restores are often reconstructed after a hardware failure. This is my experience anyway. But I am old. :)
So how might this look from a usage perspective? If you perform a tar backup, for instance, the unique identifier for the destination backup is the filename (-f option) or stdout. The selector for the source can be provided using relative or absolute paths.
For the existing traditional case as an example, a backup of /home and /data (i.e. all data files) on a host could be:
Restic could look like the following, where 'data' is the backup label:
These commands would store home and data as part of the backup file paths.
Alternatively, you could back up filesystems separately and relative to the mount points. So for just /home
And the equivalent for Restic could be:
If you snapshot /home and mount the snapshot as /mnt/home, then tar and Restic could be:
If you snapshot multiple filesystems and mount the snapshots for a full system backup (without mucking around with chroot):
For @ArsenArsen use case with zfs snapshots e.g.
For @aawsome use case of backups performed on the same physical volume, being attached to cross-platform hosts:
Example usage with stdin - Backup for a database dump:
So what are the issues? Here, I can contribute little as I don't have enough skin in the game. So I am going to assume there are plenty more issues, some substantial, with what I propose. There is also a pretty good chance I've made mistakes with some of the facts I have used to underpin my arguments. And to this extent, I accept that what I propose may not be viable because the benefits don't outweigh the obstacles (combined).
Potential obstacles include (but not limited to):
So what issues have I missed? And what mistakes have I made or factors I have overlooked? Do you agree that this would make Restic easier to use? Does it create other issues? Or does it not help or make things worse? If this is worth exploring further, what other ideas might strengthen this concept? If you made it this far, thanks for reading. :) Damo. |
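The difference between the two identification schemes discussed in this comment can be sketched as follows. Everything here is hypothetical: the `BackupRun` type, the `Label` field, and both key functions only model the proposal and are not part of restic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// BackupRun is a hypothetical description of one backup invocation.
type BackupRun struct {
	Host  string
	Label string   // user-defined label, as suggested in the proposal
	Paths []string // paths as they appear on the host at backup time
}

// keyByPaths models the existing scheme: a backup set is identified by the
// host together with its (sorted) absolute paths.
func keyByPaths(r BackupRun) string {
	sorted := append([]string(nil), r.Paths...)
	sort.Strings(sorted)
	return r.Host + "|" + strings.Join(sorted, ",")
}

// keyByLabel models the proposed scheme: host plus label, paths ignored.
func keyByLabel(r BackupRun) string {
	return r.Host + "|" + r.Label
}

func main() {
	// The same external disk, mounted at two different (made-up) locations.
	monday := BackupRun{Host: "nas", Label: "photos", Paths: []string{"/media/usb0/photos"}}
	tuesday := BackupRun{Host: "nas", Label: "photos", Paths: []string{"/media/usb1/photos"}}

	fmt.Println(keyByPaths(monday) == keyByPaths(tuesday)) // false: treated as different sets
	fmt.Println(keyByLabel(monday) == keyByLabel(tuesday)) // true: same set despite the new mount point
}
```

Under the label scheme a changed mount point no longer breaks the link to the previous snapshot, at the cost of requiring users to choose and keep track of labels.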
|
@damoclark I agree that using the path or paths to identify backup source location can be very misleading. For example, if you have
However, I think the question of how to identify what we have backed up is the second priority; the first priority is the intention behind running that backup. This can vary a lot, which becomes clear if we ask: if things change, what should be backed up then? For example, what should be backed up if
Now, I fully agree that saving the path(s) cannot fully reflect all situations. But I don't think that adding an additional identifier simplifies the problem either. There are already tags which can be used. Moreover, there are simple setups where the path is able to identify the backup. The path (even with the problems above) is still very relevant information which helps to find candidates for the snapshots you might need for a given restore task. Plus, I reckon a mandatory additional identifier might irritate new users, as they simply wouldn't know what to use and what it is good for, except if they simply use the path they provide to backup.
About absolute vs. relative paths: You are right here, but from another point of view, relative paths could always be seen as absolute paths (if you take them relative to
So, I'd vote for keeping the path(s) but trying to rework the places where their usage causes problems. For instance, this is what I did in rustic:
|
|
Reopened by accident (mouse slip), hence closing this again. |
|
Hey @aawsome I appreciate you engaging with my idea. A few clarifications and counter-arguments follow. :)
The idea of a label isn't to add an identifier for a snapshot set. It is to replace the existing identifier, which historically is the path, or a composite of multiple paths, and which is only implicitly stated. A label would be explicit, and thus more reliably authoritative. If it needs to change, that could be done explicitly too, and with far less complexity than a path or combination of paths. The tags are for a different purpose, in terms of classifying backup snapshots, especially in ways that could encompass different combinations of backup snapshot sets. And chiefly, a tag internally, or semantically, does not uniquely identify a backup snapshot set.
Information about what paths given to backup with the label would still need to be stored. They just wouldn't be used as the identifier for that backup set for the purposes of subsequent backups, nor identifying the
This is tricky to argue either way, because use cases and contexts can vary considerably for a backup utility. Ranging from data centres with overworked sys admins, through to home server warriors (old sys admins like me), through to hobbyists backing up their personal computer data. Still with that diversity, I'm not convinced the requirement for an identifier would be that much of an irritant, and in many cases, might instead be valued. Labels are commonplace in technology. We label just about everything. :) So I just don't see it as a barrier. But I invite further debate if I have overlooked something. Also, an explicit label for a backup, in my mind is far more meaningful than the implicit path/s alone upon which the files were mounted when the backup was taken. With the many cases where these paths can and do change, I just don't think paths are sufficiently authoritative to meaningfully identify backup snapshot sets in anything other than the most trivial of use cases. And in those cases, the label could simply be the path anyway. I'm reminded of filesystem labels being used in I didn't mention in my original message (which was already way too long), that I think Restic is also missing opportunities for further user-defined metadata. The obvious example that came to my mind was a 'notes' or 'description' field in the snapshot records. The use case for me recently was a manual backup snapshot I took of a host before I performed a substantial change on it. I wanted to write comments about why I took that backup, what it was for, and when it might be safe to forget it. I think the --as-path option is a good stop-gap solution for the meantime. But long term, by introducing the --as-path option, you allow arbitrary fudging of the paths, that remain the authoritative identifier for backup snapshot sets. I think this will create far more confusion than a label. 
For instance, by fudging the paths, they may collide with other existing path locations within the VFS on the same host. There is also the complexity of when paths are added or removed for a backup snapshot. Which parent do they belong to? It's all determined implicitly. These examples, to me, seem a far more likely cause of confusion for users than using a label. What I am proposing is a substantial change, and likely a great deal of work. So I get that it has to be worth it. |
|
@damoclark Let's move the discussion into a new issue. An already closed pull request isn't the right place. So far the suggestion sounds like slightly beefed up tags in addition to a Regarding snapshot descriptions: have a look at #2376. |
|
@damoclark For rustic I opened three issues to discuss backup label, snapshot descriptions and the relative/absolute path discussion, see the links above. Feel free to contribute to the discussion if you want to support the development of rustic. |
This is a resurrection of the deleted aawsome:backup-set-path branch from closed PR#3200 by Alex Weiss. See also Issues restic#3198,restic#2993,restic#2714... restic#3200 restic#3198
What does this PR change? What problem does it solve?
Adds an option --set-path to backup which allows manually setting the path(s) saved in the snapshot and used for finding the parent snapshot. Also, the option --set-path-from is added to read the paths from a file.
Both options are useful e.g. if the files to back up are selected by an external tool.
As --set-path functionally replaces --stdin-filename, the latter is now marked as deprecated.
Was the change discussed in an issue or in the forum before?
closes #2714
closes #3198
allows users to use an easy workaround for #1514 by using --files-from-raw in combination with fd (or similar find tools) and --set-path
maybe also closes #2246
closes #1376
closes #2092
Checklist
changelog/unreleased/ that describes the changes for our users (template here)
gofmt on the code in all commits