@aawsome
Contributor

@aawsome aawsome commented Dec 29, 2020

What does this PR change? What problem does it solve?

Adds an option --set-path to backup, which allows manually setting the path(s) saved in the snapshot and used for finding the parent snapshot. The option --set-paths-from is also added to read the paths from a file.

Both options are useful e.g. if the files to backup are selected by an external tool.

As --set-path functionally replaces --stdin-filename, the latter is now marked as deprecated.

Was the change discussed in an issue or in the forum before?

closes #2714
closes #3198
allows users to use an easy workaround for #1514 by using --files-from-raw in combination with fd (or similar find tools) and --set-path
maybe also closes #2246
closes #1376
closes #2092

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@MichaelEischer
Member

There has already been some discussion on such an option in #2714.

@aawsome
Contributor Author

aawsome commented Dec 29, 2020

There has already been some discussion on such an option in #2714.

Thanks for the hint! I added this to the description and adapted the changelog file.

@chrahunt chrahunt left a comment

Some hints in the docs for users would be helpful. I think the including files section would be a good place to mention it, along with an explanation that if it is not set then the individual paths will be included in the snapshot metadata, which can have negative impacts for users specifying large numbers of files.

@aawsome aawsome force-pushed the backup-set-path branch 2 times, most recently from 574256c to f81d412 Compare January 1, 2021 12:55
@aawsome
Contributor Author

aawsome commented Jan 1, 2021

I added some hints to the docs and realized that an extra option, --set-paths-from, would also be quite handy. It is now added as well.

@aawsome
Contributor Author

aawsome commented Jan 1, 2021

Also added some checks for the new flags in combination with stdin, analogous to the checks for --files-from*.

@aawsome aawsome changed the title backup: Add option --set-path backup: Add options --set-path and --set-paths-from Jan 1, 2021
@aawsome
Contributor Author

aawsome commented Jan 1, 2021

About --stdin, we might think of making --stdin-filename an alias of --set-path and mark it deprecated. Then, for --stdin the only check we need is that only one path is given.
Should I add this to this PR?

@robvalca

robvalca commented Jan 15, 2021

Thanks a lot for this! I was playing with this PR today and although --set-path works as expected, I'm seeing some inconsistencies compared to the normal backup command. This was always like that, but I don't know if this is also expected after this PR. I'll detail here what I'm doing:

  1. I do a normal initial backup:
restic_dev -r local:/tmp/test backup /opt
  2. I add a new file /opt/this_is_a_test and add its path to a list.txt file, which I pass to restic using --files-from. I also pass --set-path /opt. (I tried with and without the --parent flag, but the result is the same.)
restic_dev -r local:/tmp/test backup --files-from /list.txt  --set-path /opt --parent 5976257d
  3. The backup ends correctly, and in the snapshots view I see both snapshots with the same path, so this is OK.
ID        Time                 Host                 Tags        Paths
---------------------------------------------------------------------
5976257d  2021-01-15 16:29:35  cbox-restic.cern.ch              /opt
e5fa3044  2021-01-15 16:38:56  cbox-restic.cern.ch              /opt
---------------------------------------------------------------------
  4. Now, if I mount the repo and go to snapshots/latest, I see only the file added in the last snapshot:
[16:39][root@cbox-restic (qa:box/restic/dev) restic]# cd snapshots/latest/
[16:39][root@cbox-restic (qa:box/restic/dev) latest]# ls
opt
[16:39][root@cbox-restic (qa:box/restic/dev) latest]# cd opt/
[16:39][root@cbox-restic (qa:box/restic/dev) opt]# ls -la
total 1
drwxr-xr-x. 2 root root  0 Jan 15 16:31 .
dr-xr-xr-x. 2 root root  0 Jan 15 16:38 ..
-rw-r--r--. 1 root root 13 Jan 15 16:31 this_is_a_test
  5. If I try to restore the snapshot, only that file is recovered.
  6. If, instead of combining --files-from, --set-path and --parent, I run a normal backup after adding the this_is_a_test file, snapshots/latest shows the full /opt view including the recently added file.

I think that we should expect the same final state either doing it with normal backup or using the combination of --files-from and --set-path, what do you think?

Roberto.

@aawsome
Contributor Author

aawsome commented Jan 15, 2021

@robvalca Thanks for your feedback!
Does your /list.txt contain only the line /opt/this_is_a_test, or does it also list the other files?
As you write "This was always like this", I guess it contains only that line. In that case, yes, the new snapshot should only contain this one file.
Note that in this case there is no point in using a parent, as there are no "old" files which should be in the new snapshot.

@robvalca

Hi @aawsome, thanks for the quick reply! I only have that single path in the txt. I thought that there would be a merge of the files on the list and the previous snapshot of the path specified by --set-path. In any case, with this PR the internal view of the snapshots makes more sense. I'm looking forward to having this merged. Thanks!

@amuckart

I very much look forward to seeing this merged!

What would be extra useful would be the ability to also change the path of existing snapshots so as not to have to do a full backup of everything after this option is in use.

My use case is backing up a series of backups created with dirvish (for various reasons we can't use restic to directly back up the hosts), so they all end up in /var/backups/dirvish///tree. I'm making the path consistent between backups by bind mounting the .../tree directory but this feature would completely remove the need for that complexity and allow me to directly back up the directory without having to bind mount it, as well as making restoration on to the server a whole lot easier, but then the path would be inconsistent with all of the existing months of backups I've got in the restic repository.

@hamishcoleman

Another use case not described above is when using filesystem snapshots (eg, with btrfs or zfs) as the source for the restic backup but wanting the restic backup path to clearly represent the actual data source. Using the filesystem snapshot makes it much more likely to have a consistent backup of things like database files.

I also envision using the --set-path feature to write a utility that can take a list of local filesystem snapshots and ensure that they are all replicated in restic (including correct creation timestamps and parent relationship) as that would allow simple interfacing of an existing local backup system with its creation and expiry schedules and the more flexible off-site restic backup.

@wren

wren commented Mar 2, 2021

@amuckart I agree that changing the path would be a useful feature. But maybe that could be left for another PR (so this one doesn't get held up)?

@aawsome
Contributor Author

aawsome commented Mar 4, 2021

I would also like to leave the scope of this PR as it is. Can we open a new issue to discuss this changing of trees in the repo? I think it's not just changing the tree names (which should be pretty simple) but also the syntax for specifying which path maps to which tree. I can imagine that there are many pitfalls which should be discussed first.

@lalmeras

lalmeras commented Mar 24, 2021

👍 for this proposal.

My use-case is for a selective backup that I cannot perform with an exclusion-list :

  • >100.000 files to exclude and growing, even with optimization done in 0.11.0 and 0.12.0, scanning file takes 2h30
  • no general glob / regexp pattern applicable to reduce exclusion list

With an inclusion list (files-from) :

  • I did not observe the incorrect parent lookup behavior myself, but my first snapshot with files-from took twice as long
  • I can see that the snapshot list would become an issue because of the growing paths attribute

I read the patch and it seems sane, but I surely do not have enough experience with the restic codebase to give an informed opinion.

Is there anything I can do to help ? (I'm building and trying this patch on a working repository ; I'll give some feedback on this).

@lalmeras

Working with my repository:

  • the parent snapshot is correctly detected; CPU and I/O usage seem to confirm that no unnecessary re-reads are done
  • backup elapsed time is consistent with timings on a similar tree that does not need include/exclude tricks
  • the snapshots list displays the expected paths
  • restoring the backup gives me back the expected files

@lpulley

lpulley commented Mar 25, 2021

using filesystem snapshots (eg, with btrfs or zfs) as the source for the restic backup but wanting the restic backup path to clearly represent the actual data source

@hamishcoleman This is exactly what I was trying to do yesterday and realized it didn't exist. Anxiously awaiting the approval of this PR. I'll probably build 0.12.0 from source with this patch applied just so I can use this functionality!

This is going to make backing up ZFS snapshots clean.

@hamishcoleman

@lpulley I have worked around it for the moment by making my backup script take a temporary snap of the real snap but using a single stable name - then destroying the temporary snap after restic has backed things up - which is working well.

lalmeras added a commit to powo-roles/powo.restic that referenced this pull request Mar 27, 2021
* restic_list_commands added; if present, stdout from
  restic_list_commands are used to populate a file list; this file list
  is used as files-from restic argument. restic_src is still used to set
  snapshot's paths attribute. This needs a custom restic version
  (restic/restic#3200)
* restic_rclone_remotes allow to produce an enhanced rclone
  configuration with multiple remote. Useful to perform operation on
  multiple remotes

@chrahunt chrahunt left a comment

2 minor comments, otherwise this LGTM.

@lpulley

lpulley commented Apr 27, 2021

Is this just awaiting review?

@aawsome
Contributor Author

aawsome commented Apr 28, 2021

Is this just awaiting review?

From my side: yes

@aawsome
Contributor Author

aawsome commented Jan 16, 2022

This is now updated and rebased onto current master.
I also marked the flag --stdin-filename as deprecated, as this can now be done with --set-path.

Comment on lines +283 to +288
	if len(opts.Paths) > 0 {
		return errors.Fatal("--stdin and --set-path cannot be used together")
	}
	if len(opts.PathsFrom) > 0 {
		return errors.Fatal("--stdin and --set-paths-from cannot be used together")
	}

@rsnitsch rsnitsch Mar 12, 2022

This conflicts with the deprecation of the --stdin-filename option at R120.

If --set-path/--set-paths-from are supposed to replace --stdin-filename, then using --stdin together with them must be possible.
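
One way to reconcile the two, following the earlier suggestion in this thread that for --stdin "the only check we need is that only one path is given", might look like the sketch below. The helper name `checkStdinPaths` and its parameters are hypothetical and not taken from the PR; it only illustrates the shape of the relaxed check.

```go
package main

import (
	"errors"
	"fmt"
)

// checkStdinPaths is a hypothetical helper (not from the PR). Instead of
// rejecting every overridden path when --stdin is used, it only rejects
// the case where more than one path is set.
func checkStdinPaths(stdin bool, setPaths []string) error {
	if !stdin {
		// Without --stdin there is no restriction on the number of paths.
		return nil
	}
	if len(setPaths) > 1 {
		return errors.New("--stdin accepts at most one path from --set-path/--set-paths-from")
	}
	return nil
}

func main() {
	fmt.Println(checkStdinPaths(true, []string{"/dump.sql"}))
	fmt.Println(checkStdinPaths(true, []string{"/a", "/b"}))
}
```

With a check of this kind, a single overridden path could take over the role of --stdin-filename.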

@MichaelEischer MichaelEischer linked an issue Aug 21, 2022 that may be closed by this pull request
@flokli

flokli commented Aug 27, 2022

I've been using this PR on top of https://github.com/flokli/borg2restic to migrate from a mounted borg repo to restic. I can confirm it still does what it's supposed to do on 0.14.0 just fine (only needs a minimal rebase).

@Hurricos

Reading through this pull request, the two remaining blockers appear to be:

@MichaelEischer
Member

Reading through this pull request, the two remaining blockers appear to be:

There has been no feedback regarding the actual design of this PR yet from the core devs. The PR is by now relatively far up in my review queue, but it will still take some more time to get there. From a quick look, my main criticism is that it allows setting arbitrary paths which don't reflect the structure within a snapshot in any way. What's the use case for that? (To remove a prefix from all paths, I'd rather prefer something like --strip-prefix /prefix/path)
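
As a rough illustration of what prefix stripping could mean for a single path, here is a minimal sketch. No such option exists in the code under review; the function name `stripPrefix` and the choice to match only on path-component boundaries are assumptions.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// stripPrefix is a hypothetical helper. It removes prefix from p, but only
// on a path-component boundary, and returns the result as an absolute path.
// ok is false if p does not lie below prefix.
func stripPrefix(prefix, p string) (result string, ok bool) {
	prefix = path.Clean(prefix)
	p = path.Clean(p)
	switch {
	case prefix == "/":
		return p, true
	case p == prefix:
		return "/", true
	case strings.HasPrefix(p, prefix+"/"):
		return strings.TrimPrefix(p, prefix), true
	default:
		return "", false
	}
}

func main() {
	fmt.Println(stripPrefix("/prefix/path", "/prefix/path/home/user"))
	fmt.Println(stripPrefix("/prefix/path", "/prefix/pathology"))
}
```

Unlike an arbitrary path override, the stripped result still mirrors the tree that ends up in the snapshot.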

@ArsenArsen

I use that behavior for switching prefixes from /home/.zfs/snapshot/restic2022-08-20 to simply /home.

Resulting snapshots:

[c] ~$ restic snapshots 
repository f84e9c1d opened successfully, password is correct
found 3 old cache directories in /home/arsen/.cache/restic, run `restic cache --cleanup` to remove them
ID        Time                 Host              Tags        Paths
------------------------------------------------------------------
758dbc14  2022-02-05 13:41:43  bstg.arsen.local              /home
010297d2  2022-02-07 03:00:01  bstg.arsen.local              /home
...

@DRON-666
Contributor

What's the use case for that?

I create lists of files to backup mostly dynamically and use --set-path with fake paths like A:\SOMETHING.
This solves two problems for me:

  • no need to bother with getting the snapshot ID for the --parent flag
  • logs no longer overflow with multi-megabyte lists of unnecessary paths (not all commands support the --compact flag).

@ArsenArsen

ArsenArsen commented Sep 19, 2022

Looks like findParentSnapshot gets called with the non-overridden path, though; not sure if that's intentional.

EDIT: never mind, it'd appear that the change got lost in bitrot during a rebase
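
For reference, the intended behavior as described in the PR (the overridden paths are saved in the snapshot and also used for finding the parent) can be sketched as follows. The helper `effectivePaths` is hypothetical and not the PR's code; it only shows that one list should feed both places.

```go
package main

import "fmt"

// effectivePaths is a hypothetical helper. It returns the paths that should
// be both recorded in the snapshot and used for the parent lookup: the
// override if one was given, otherwise the backup targets themselves.
func effectivePaths(targets, override []string) []string {
	if len(override) > 0 {
		return override
	}
	return targets
}

func main() {
	targets := []string{"/home/.zfs/snapshot/restic2022-08-20"}
	paths := effectivePaths(targets, []string{"/home"})
	// The same slice would be passed to the snapshot metadata and to the
	// parent lookup, so the two cannot drift apart.
	fmt.Println(paths)
}
```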

@aawsome
Contributor Author

aawsome commented Nov 3, 2022

I thought about this a bit. Actually, I no longer like this idea that much, as it allows setting paths which do not correspond to the trees within the snapshot.
A much better idea would be to replace the paths both in the snapshot entry and in the tree structure, see e.g. rustic-rs/rustic#280.

So, I'm closing this PR hoping someone will find time to implement a better solution.

@aawsome aawsome closed this Nov 3, 2022
@damoclark
Contributor

I think there is a clear need for this type of functionality.

Unfortunately, I suspect that currently proposed solutions don't really address the underlying issue. They will provide solutions to many existing use cases, but as @aawsome and @MichaelEischer have noted, it does introduce other problems.

I'd like to propose a radically different approach that I think will address the underlying issue itself. Unfortunately what I propose will almost certainly require changes to the repository format, necessitating a new repo version. Thus, I understand adoption of this approach will need to meet a high bar in terms of its value proposition. What follows is an explanation of the underlying issue as I understand it, an explanation of the new approach and its benefits, examples of how it would be used, and my limited understanding of potential obstacles or issues, along with an invitation for critique and alternate suggestions. I accept that I may have some things wrong - happy to be (respectfully) corrected.

But first, I would like to say that Restic is truly remarkable technology. My sincere congrats to the creators on an amazing tool. And more broadly, well done to all the contributors who have made Restic what it is today. My contribution in this post is well-meaning and constructive.

Restic provides a repository that can be shared among multiple hosts. Which of course is really neat. To identify which backup snapshots belong to which hosts, the hostname, by default is attached to each backup snapshot. What is backed up in a snapshot on that host, is recorded by one or more absolute paths. Unlike the host, which is but a simple text label, paths have intrinsic meaning. And a snapshot can comprise multiple paths.

A backup is a snapshot in time of data stored on a particular host, at one or more particular absolute paths on that host. For the purpose of this writing, I am defining a backup 'set' as all snapshots in the repository for the combination of absolute paths and host. This unit is important for how we identify files we wish to restore in the dimensions of host, path location and time. It is also important for subsequent backups that seek to identify their parent backup snapshot (i.e. --parent). This triangulated identification of backup snapshot sets is where I see a problem.

The flaw in this strategy, I think, is that Restic tightly couples the identification of backup snapshot sets to absolute paths on the host virtual filesystem. Or, on Windows, to drive letters. The core issue is that these absolute paths have their own meaning, but they can also change.

The virtual filesystem comprises separate logical and/or physical storage devices attached at different path locations. In the simple case of Windows, separate storage is attached to different drive letters. Filesystem boundaries are significant. Restic already has a --one-file-system option that recognises this importance. Filesystem snapshots are a clear example. And while the virtual filesystem, using fstab, has pre-determined mount points, there are exceptions, notably the attachment of external storage devices. Linux and macOS use filesystem volume labels, often mounting under /var/media and /Volumes respectively, which can obviously collide, while Windows can assign arbitrary drive letters to attached storage devices, influenced by the order of attachment.

Working around this variability by fudging absolute paths to correctly align with backup snapshot sets is complicated and risky for mere humans - it's easy to bugger it up! I suspect it is also making it difficult to implement some requested features that relate to paths, such as path translations because of how it is tethered to identifying backup snapshot sets. Hope I have understood things correctly here.

What I propose is to decouple the paths as part of the identifier for a backup snapshot set. Instead, uniquely identify backup sets by a user-defined label alongside the host, but still separate to tags. So the host/label combination is a unique identifier for backup snapshot sets instead of host/paths. The labels themselves could be arbitrary, and can be chosen meaningfully by the data owner. Much like a volume/filesystem label. But they would be abstract - so they aren't tied to diverse semantics associated with paths. Like arbitrary drive letters selected by the OS, or absolute virtual filesystem paths. This concept makes it portable.

From a technical standpoint, which paths are given for the backup at that host/label combination doesn't matter as Restic clearly knows which previous --parent to use. As Restic currently stands, relative paths can be given for a backup, but Restic still resolves them back to an absolute path for the backup to identify its parent. So this change would also mean that all backup paths can be considered relative, without conflation of the backup snapshot set identity itself. I anticipate this opens up much easier implementation strategies for path translations during backup, but also retrospective changes to existing backup snapshot sets, because the paths are no longer tightly coupled to the backup snapshot set identity. As a concrete example, the tar command's --strip-components --transform --absolute-names I suspect would be much easier to implement. It also simplifies the --stdin option because you identify the backup with the label, rather than an additional and specific --stdin-filename parameter. So it removes another special case.
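
To make the difference concrete, here is a minimal sketch of the two ways of identifying a parent. It is a simplified model and not Restic's actual implementation: the `snapshot` struct, the `Label` field, the host name, the mount paths and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// snapshot is a simplified, hypothetical model of a snapshot record.
type snapshot struct {
	ID    string
	Host  string
	Label string // hypothetical field from the proposal
	Paths []string
	Time  time.Time
}

// pathsKey builds an order-independent key from a list of paths.
func pathsKey(paths []string) string {
	sorted := append([]string(nil), paths...)
	sort.Strings(sorted)
	return strings.Join(sorted, "\x00")
}

// latest returns the newest snapshot accepted by match, or nil if none is.
func latest(snaps []snapshot, match func(snapshot) bool) *snapshot {
	var best *snapshot
	for i := range snaps {
		s := &snaps[i]
		if match(*s) && (best == nil || s.Time.After(best.Time)) {
			best = s
		}
	}
	return best
}

// parentByPaths models the host/paths identification.
func parentByPaths(snaps []snapshot, host string, paths []string) *snapshot {
	key := pathsKey(paths)
	return latest(snaps, func(s snapshot) bool {
		return s.Host == host && pathsKey(s.Paths) == key
	})
}

// parentByLabel models the proposed host/label identification.
func parentByLabel(snaps []snapshot, host, label string) *snapshot {
	return latest(snaps, func(s snapshot) bool {
		return s.Host == host && s.Label == label
	})
}

// demoSnapshots returns two backups of the same data taken from
// differently named (made-up) filesystem snapshot mounts.
func demoSnapshots() []snapshot {
	day := func(d int) time.Time { return time.Date(2022, 8, d, 0, 0, 0, 0, time.UTC) }
	return []snapshot{
		{ID: "a1", Host: "host1", Label: "home", Paths: []string{"/mnt/snap-1"}, Time: day(19)},
		{ID: "b2", Host: "host1", Label: "home", Paths: []string{"/mnt/snap-2"}, Time: day(20)},
	}
}

func main() {
	snaps := demoSnapshots()
	// A third backup from yet another mount point has no path-based parent,
	// but the label still identifies the set.
	fmt.Println(parentByPaths(snaps, "host1", []string{"/mnt/snap-3"}) == nil)
	fmt.Println(parentByLabel(snaps, "host1", "home").ID)
}
```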

From a usability standpoint, this change also allows Restic to behave with similar semantics to other well established backup tools, such as tar, cpio, zip, etc. One of the things I struggled with when first using Restic was the effective reliance on absolute paths. Backup utilities that I am familiar with all use semantics that allow for relative paths.

Another consideration is that in many circumstances, restoring from backups can be a very stressful event. Turning to backups generally means Plan A & B failed - or there was no Plan A. :) So investing time and effort into the backup strategy in exchange for keeping the restore procedure simple can reduce cognitive load during an emergency situation, limit potential human error, reduce stress, and expedite the recovery. So while you can rename paths after restoring from backups, I think it is better to design your backups so that restores can be performed with minimal post processing. This means having paths appropriately captured during the backup phase. And performing backups per filesystem is a common strategy, because that is how restores are often reconstructed from a hardware failure. This is my experience anyway. But I am old. :)

So how might this look from a usage perspective?

If you perform a tar backup for instance, the unique identifier for the destination backup is the filename (-f option) or stdout. The selector for the source can be provided using relative or absolute paths.

For the existing traditional case as an example, a backup of /home and /data (i.e. all data files) on a host, could be:

tar cf filename.tar /home /data
# or
cd /
tar cf filename.tar home data

Restic, could look like the following, where 'data' is the backup label:

restic backup -r repo --label data /home /data
# or
cd /
restic backup -r repo --label data home data

These commands would store home and data as part of the backup file paths. Alternatively, you could back up filesystems separately and relative to the mount points. So for just /home:

tar cf filename.tar -C /home .
# or
cd /home
tar cf filename.tar .

And the equivalent for Restic could be:

restic backup -r repo --label home -C /home .
# or
cd /home
restic backup -r repo --label home .

If you snapshot /home and mount the snapshot as /mnt/home then tar and Restic could be:

tar cf filename.tar -C /mnt/home .
# or
restic backup -r repo --label home -C /mnt/home .

# or if you want to retain the /home prefix

tar cf filename.tar -C /mnt home
restic backup -r repo --label home -C /mnt home

If you snapshot multiple filesystems and mount the snapshots for a full system backup (without mucking around with chroot):

mount /dev/vg00/root_snap /mnt/backup/root
mount /dev/vg00/home_snap /mnt/backup/root/home
mount /dev/vg00/var_snap /mnt/backup/root/var
restic backup -r repo --label fullbackup -C /mnt/backup/root .
umount -R /mnt/backup/root

For @ArsenArsen use case with zfs snapshots e.g. /home/.zfs/snapshot/restic2022-08-20 :

restic backup -r repo --label home -C /home/.zfs/snapshot/restic2022-08-20 .

# if you wish to retain the home prefix, using syntax from tar

restic backup -r repo --label home --transform='s,^\.,home,' -C /home/.zfs/snapshot/restic2022-08-20 .

For @aawsome use case of backups performed on the same physical volume, being attached to cross-platform hosts:

# Unix-like
restic backup -r repo --label myfiles -C /var/media/aawsome/myfiles .
# Windows
restic backup -r repo --label myfiles -C X:\ .

Example usage with stdin - Backup for a database dump:

pg_dump -Fc -Z 0 -f - mydatabase | restic backup -r repo --label mydatabase --stdin

So what are the issues? Here, I can contribute little as I don't have enough skin in the game. So I am going to assume there are plenty more issues, some substantial with what I propose. There is also a pretty good chance I've made mistakes with some of the facts I have used to underpin my arguments. And to this extent, I accept that what I propose may not be viable because the benefits don't outweigh the obstacles (combined).

Potential obstacles include (but not limited to):

  • with a new mandatory parameter, script/automation changes will be necessary to use the new repo format;
  • it could make some existing use cases more difficult or impossible;
  • it breaks future plans of the core Restic team; or,
  • it is too difficult a change to make without breaking the core philosophy of the technology (i.e. guaranteed backwards compatibility)

So what issues have I missed? And what mistakes have I made or factors I have overlooked?

Do you agree that this would make Restic easier to use? Does it create other issues? Or does it not help or make things worse?

If this is worth exploring further, what other ideas might strengthen this concept?

If you made it this far, thanks for reading. :)

Damo.

@aawsome
Contributor Author

aawsome commented Nov 10, 2022

@damoclark I agree that using the path or paths to identify the backup source location can be very misleading. For example, if you have /path/a, /path/b and /path/c and want to back up the first two but omit the third, you can

  • run backup /path/a /path/b
  • run backup /path and provide exclude options to exclude /path/c
  • run backup --files-from* with a suitable file source (actually many possibilities here...)
  • may be able to run backup /path --one-file-system if /path/c is located on a different file system

I however think that the discussion about how to identify what we have backed up is the second priority, whereas the first priority is the intention we have when running that backup. This can vary a lot, which becomes clear if we ask: if things change, what should be backed up then? For example, what should be backed up if /path/d is added, /path/a is removed and /path/b is renamed to /path/e? Under what circumstances do we want to (or have to) change the backup command (or exclude list or file list)? And when does this impact backup runs, for example by needing to re-read the source if no parent is found?

Now, I fully agree that saving the path(s) cannot fully reflect all situations. But I don't think that adding an additional identifier also simplifies the problem. There are already tags which can be used. And moreover there are simple settings, where the path is able to identify the backup. Moreover, the path (knowing the problems above) is still very relevant information which helps to find candidates for the snapshots you might need for a given restore duty. Plus, I would reckon a needed additional identifier might irritate new users as they simply wouldn't know what to use and what it is good for - except if they simply use the path they provide to backup.

About the absolute vs. relative paths: You are right here, but from another point of view, relative paths could be always seen as absolute paths (if you take them relative to /). In fact the tree stored within a snapshot would always look the same, so it's only about semantics how to create and handle them.
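
A small sketch of that point of view, assuming forward-slash paths: interpreting every path relative to / makes the relative and the absolute spelling of a location identical. The function name `asAbsolute` is made up for this illustration.

```go
package main

import (
	"fmt"
	"path"
)

// asAbsolute interprets p relative to "/" so that relative and absolute
// spellings of the same location compare equal.
func asAbsolute(p string) string {
	return path.Join("/", p)
}

func main() {
	fmt.Println(asAbsolute("home/data") == asAbsolute("/home/data"))
	fmt.Println(asAbsolute("."))
}
```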

So, I'd vote for keeping the path(s) but trying to rework where there are problems with its usage.

For instance this is what I did in rustic:

  • Added more powerful include/exclude options which allow users to get the results they would get from the various --files-from* options while still specifying just the "base" path as the source.
  • The --files-from* options are not supported, as I think these are the root of most problems we are discussing here... (and workarounds exist with the include/exclude options)
  • The mentioned --as-path option, which IMO fully allows tackling the absolute/relative paths problem.
  • The snapshot JSON files contain not only the path(s) but also the command used to generate this specific backup. This allows identifying special include/exclude rules.

@aawsome aawsome reopened this Nov 10, 2022
@aawsome
Contributor Author

aawsome commented Nov 10, 2022

Reopened by accident (mouse slip), hence closing this again.

@aawsome aawsome closed this Nov 10, 2022
@aawsome aawsome deleted the backup-set-path branch November 10, 2022 05:48
@damoclark
Contributor

Hey @aawsome I appreciate you engaging with my idea.

A few clarifications and counter-arguments follow. :)

But I don't think that adding an additional identifier also simplifies the problem. There are already tags which can be used.

The idea of a label isn't to add an identifier for a snapshot set. It is to replace the existing identifier, which historically is the path, or a composite of multiple paths, and which is only stated implicitly. A label would be explicit, and thus more reliably authoritative. If it needs to change, that could be done explicitly too, and with far less complexity than a path or combination of paths.

The tags are for a different purpose, in terms of classifying backup snapshots, especially in ways that could encompass different combinations of backup snapshot sets. And chiefly, a tag internally, or semantically, does not uniquely identify a backup snapshot set.

And moreover there are simple settings, where the path is able to identify the backup. Moreover, the path (knowing the problems above) is still very relevant information which helps to find candidates for the snapshots you might need for a given restore duty.

Information about which paths were given to backup with the label would still need to be stored. They just wouldn't be used as the identifier for that backup set for the purposes of subsequent backups, nor for identifying the --parent. At restore time, you could still search based on path. But you could also more meaningfully specify the label as well, triangulating the exact files you want.

Plus, I would reckon a needed additional identifier might irritate new users as they simply wouldn't know what to use and what it is good for - except if they simply use the path they provide to backup.

This is tricky to argue either way, because use cases and contexts can vary considerably for a backup utility. Ranging from data centres with overworked sys admins, through to home server warriors (old sys admins like me), through to hobbyists backing up their personal computer data.

Still with that diversity, I'm not convinced the requirement for an identifier would be that much of an irritant, and in many cases, might instead be valued. Labels are commonplace in technology. We label just about everything. :) So I just don't see it as a barrier. But I invite further debate if I have overlooked something.

Also, an explicit label for a backup, in my mind is far more meaningful than the implicit path/s alone upon which the files were mounted when the backup was taken. With the many cases where these paths can and do change, I just don't think paths are sufficiently authoritative to meaningfully identify backup snapshot sets in anything other than the most trivial of use cases. And in those cases, the label could simply be the path anyway. I'm reminded of filesystem labels being used in /etc/fstab to map mount locations.

I didn't mention in my original message (which was already way too long), that I think Restic is also missing opportunities for further user-defined metadata. The obvious example that came to my mind was a 'notes' or 'description' field in the snapshot records. The use case for me recently was a manual backup snapshot I took of a host before I performed a substantial change on it. I wanted to write comments about why I took that backup, what it was for, and when it might be safe to forget it.

I think the --as-path option is a good stop-gap solution for the meantime. But long term, by introducing the --as-path option, you allow arbitrary fudging of the paths, which remain the authoritative identifier for backup snapshot sets. I think this will create far more confusion than a label. For instance, by fudging the paths, they may collide with other existing path locations within the VFS on the same host. There is also the complexity of when paths are added or removed for a backup snapshot. Which parent do they belong to? It's all determined implicitly. These examples seem to me far more likely to cause confusion for users than a label.

What I am proposing is a substantial change, and likely a great deal of work. So I get that it has to be worth it.

@MichaelEischer
Member

@damoclark Let's move the discussion into a new issue. An already closed pull request isn't the right place.

So far the suggestion sounds like slightly beefed up tags in addition to a --strip-prefix or -C option. But I'm not sure which problem exactly you're trying to solve: identifying the parent snapshot, efficiently handling renamed mountpoints, or something else? Identifying the parent snapshot could just look at the tags instead of the paths (which would solve the parent snapshot identification problem). Removing a prefix from all backup paths would also be relatively simple to implement (could be used to completely remove the changing part of the filename).

Regarding snapshot descriptions: have a look at #2376.

@aawsome
Contributor Author

aawsome commented Nov 13, 2022

@damoclark For rustic I opened three issues to discuss backup label, snapshot descriptions and the relative/absolute path discussion, see the links above. Feel free to contribute to the discussion if you want to support the development of rustic.
