Delete Files from Existing Snapshot #14

Open
scoddy opened this issue Nov 15, 2014 · 41 comments

@scoddy
Member

commented Nov 15, 2014

In case of an accidental backup of, e.g., overly large files, I would like to be able to delete specific files or directories (incl. recursion) from existing snapshots.

@scoddy scoddy added this to the 2014-48 milestone Nov 16, 2014

@scoddy scoddy modified the milestone: 2014-48 Nov 27, 2014

@viric

Contributor

commented Feb 16, 2016

That'd be really nice.

@teknico

commented Mar 24, 2016

It would also allow removing sensitive data that got included unwittingly.

@zcalusic

Member

commented Oct 16, 2016

This would be a great feature!

@alphapapa

commented Jan 16, 2018

Any feedback from the devs on this idea? It would be very nice. For example, I just discovered that a program I build from git checkouts has been creating enormous binaries (almost 100 MB), and these have been getting backed up in my Restic backups unnecessarily. I haven't been using Restic for very long, as I'm still in a testing phase, so it's not a problem to delete the old snapshots in question. But this issue can happen quite easily, and it would be good to have long-term solutions for it, other than forgetting every snapshot.

I suppose it would be possible to write a script to restore every snapshot, delete undesired files, and re-backup the snapshot by setting the date manually, but obviously that would take a very long time. It would be great if Restic could do this natively.
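A rough sketch of that workaround, for illustration only. It just builds the command lines so they can be reviewed before anything is run. The helper name `rebackup_commands` and all example values are made up; the restic subcommands are the existing `restore`, `backup`, `forget` and `prune`, and it assumes `backup` accepts a `--time` option for setting the snapshot time, which is worth checking against your restic version.

```python
import shlex


def rebackup_commands(repo, snapshot_id, snapshot_time, scratch_dir, unwanted_paths):
    """Return shell command lines for the restore/delete/re-backup workaround.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    cmds = [
        # Restore the whole snapshot into a scratch directory.
        ["restic", "-r", repo, "restore", snapshot_id, "--target", scratch_dir],
    ]
    # Delete each unwanted path from the restored copy.
    for path in unwanted_paths:
        cmds.append(["rm", "-rf", "--", f"{scratch_dir}/{path.lstrip('/')}"])
    cmds += [
        # Back up the cleaned copy, reusing the original snapshot time
        # (assumes the --time option is available).
        ["restic", "-r", repo, "backup", "--time", snapshot_time, scratch_dir],
        # Drop the original snapshot, then reclaim the space.
        ["restic", "-r", repo, "forget", snapshot_id],
        ["restic", "-r", repo, "prune"],
    ]
    return [shlex.join(cmd) for cmd in cmds]


for line in rebackup_commands(
    "/srv/repo", "abcd1234", "2018-01-16 12:00:00", "/tmp/scratch", ["/home/u/big.bin"]
):
    print(line)
```

Note that the new snapshot would record the files under the scratch directory rather than their original location, and every snapshot has to be fully restored first, which is exactly why this is slow.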

Thanks.

@rawtaz

Contributor

commented Jan 16, 2018

I think there are multiple valid use cases for this. Seems like a really good feature to have. I would probably use it myself at some point.

@dnnr

commented Jan 18, 2018

It probably doesn't really change the implementation effort, but from a UX viewpoint, this might be done with a rather low profile by extending the backup command instead of adding an entirely new command:

restic backup [flags] FILE/DIR/SNAPSHOT [FILE/DIR/SNAPSHOT] ...

So instead of offering a command that modifies snapshots, this would allow making a new backup based on an existing snapshot ID. Deleting a file would be achieved with exclude rules.
All the documentation on restic backup could basically be "reused" (that is, almost nothing would need to be added for this new feature).

@alphapapa

commented Jan 18, 2018

@dnnr See #1550 (comment)

However, I don't follow you here. Removing data from old snapshots is definitely a distinct operation and should have its own command. Something like:

restic purge --snapshots abcd1234 deadbeef --paths /path/to/file1 /path/to/file2

And --snapshots should probably accept an all keyword to operate on all snapshots (or all snapshots with the specified --tag). And the command should probably require confirmation by typing yes.

It would also be good for it to have a --patterns option, which would delete paths matching the given patterns.

purge is one possibility for the command's name. erase might also be a good choice, as well as delete. Whatever is chosen, it should make it clear that the operation permanently deletes data. This is backup software we're talking about, and any dangerous operations should be distinct, explicit, and require confirmation.

@dnnr

commented Jan 18, 2018

Well, I left out the step where you'd delete the source snapshot afterwards (using forget, then maybe prune), because I thought that was obvious.

In my opinion, doing it like this would keep the command set more orthogonal compared to adding a new command that overlaps with the functionality of existing commands. Right now, there is backup, forget and prune and they all do completely separate things. Adding a purge like you describe it, changes that. My suggestion doesn't.

@alvarolm

commented Jan 18, 2018

Since we are proposing operations on individual files, it would be nice to be able to rename them as well.

@rawtaz

Contributor

commented Jan 18, 2018

I agree with @alphapapa that there should be a distinct command for this type of operation. It might be purge, that's not a bad name, then again there might be other similar operations in the future, e.g. @alvarolm already suggested being able to rename files.

For that reason I think perhaps adding a rewrite command is the best alternative in this case, and make that command have e.g. --purge and --rename options, assuming the latter is relevant to implement. So the final commands would be e.g. restic -r foo rewrite --purge snap1,snap2 path1 path2 ... and restic -r foo rewrite --rename snap1,snap2 pathFrom pathTo.

That said I'm not entirely sure renaming is something that's reasonable to implement - it goes quite a long way from what a backup program is about. But sure, why not.

I don't think it's wise to have the purge stuff be part of the backup command. In one perspective, you could argue that it's fine - you are doing an operation on your backup. But with that rationale the prune and unlock and forget actions should also be part of the backup command, as they too are about maintaining stuff in your backup. I don't think that makes sense, so I think it should indeed be a separate operation/command, e.g. rewrite or purge.

@alphapapa

commented Jan 18, 2018

@dnnr

> Well, I left out the step where you'd delete the source snapshot afterwards (using forget, then maybe prune), because I thought that was obvious.

It's definitely not obvious. It's also better if Restic handles that for the user, rather than the user having to keep track of which snapshot IDs have changed and need to be forgotten--which would be quite a burden if the user were rewriting all snapshots in the repo.

> In my opinion, doing it like this would keep the command set more orthogonal compared to adding a new command that overlaps with the functionality of existing commands.

I don't understand what you mean. The opposite is the case. This proposed purge/delete/rewrite command does not overlap with backup at all--it deletes data from existing snapshots. It is orthogonal to existing commands.

> Right now, there is backup, forget and prune and they all do completely separate things. Adding a purge like you describe it, changes that. My suggestion doesn't.

Again, no idea what you're thinking here. purge is completely separate from backup, forget, and prune:

  • backup: Creates a new snapshot of given paths.
  • forget: Removes existing snapshots.
  • prune: Garbage-collects unused blobs from forgotten snapshots.
  • purge/rewrite/whatever: Deletes files from existing snapshots.
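To make the distinction concrete, here is a toy model of the four operations. It is only an illustration: the dictionaries, the blob IDs and the `rewrite` function are invented for this sketch and do not mirror restic's real repository format or any existing command.

```python
# Toy model: a snapshot maps file paths to blob IDs; blobs hold the data.
blobs = {"b1": b"notes", "b2": b"huge binary", "b3": b"photo"}
snapshots = {
    "snap1": {"/home/u/notes.txt": "b1", "/home/u/build/app": "b2"},
    "snap2": {"/home/u/notes.txt": "b1", "/home/u/photo.jpg": "b3"},
}


def forget(snapshot_id):
    """Remove a whole snapshot; its blobs stay in place until prune runs."""
    del snapshots[snapshot_id]


def prune():
    """Garbage-collect blobs that no remaining snapshot references."""
    used = {blob for files in snapshots.values() for blob in files.values()}
    for blob_id in list(blobs):
        if blob_id not in used:
            del blobs[blob_id]


def rewrite(snapshot_id, unwanted_path):
    """Hypothetical operation: drop one path from an existing snapshot."""
    snapshots[snapshot_id].pop(unwanted_path, None)


# Remove the accidentally backed-up binary, then reclaim its space.
rewrite("snap1", "/home/u/build/app")
prune()
```

In this model, removing a path only changes what the snapshot references; the data itself disappears once prune finds that nothing points at it any more.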

You are proposing making the backup command operate in two modes, one of which backs up data, and the other of which would delete data.

@rawtaz Yes, rewrite is a good suggestion, because it literally rewrites existing snapshots. I'd suggest a UI like:

restic --repo REPO rewrite --snapshots abcd1234 deadbeef --delete /path/to/file1 "*.unwanted-file-extension-glob"

I recommend against using commas as separators, because it makes constructing command lines in scripts much more complicated.

@dnnr

commented Jan 18, 2018

> backup: Creates a new snapshot of given paths.

Well, in a sense, modifying the contents of a snapshot is creating a new snapshot (because it's not the same snapshot as before). Think git commit --amend, which creates a new commit based on an existing commit. The analogy is actually pretty fitting, since this ticket seems to move rapidly towards reinventing Git.
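A small sketch of why that is, assuming snapshot IDs are derived from a hash of the snapshot's content, as in any content-addressed store. The `snapshot_id` helper and the record layout are made up for this example and are not restic's actual format.

```python
import hashlib
import json


def snapshot_id(snapshot):
    """Derive an ID from the snapshot content (illustrative scheme only)."""
    data = json.dumps(snapshot, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()


original = {"time": "2018-01-18T10:00:00Z", "files": {"/a": "b1", "/big": "b2"}}
# "Amending" the snapshot: same metadata, one file removed.
amended = {**original, "files": {"/a": "b1"}}

print(snapshot_id(original)[:8], snapshot_id(amended)[:8])
```

Because the ID depends on the content, the amended snapshot necessarily gets a different ID, so the old one still has to be forgotten separately.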

> You are proposing making the backup command operate in two modes, one of which backs up data, and the other of which would delete data.

I didn't say that. Why would it? There is forget and prune, which are perfectly fine for removing things.

@alphapapa

commented Jan 19, 2018

> Well, in a sense, modifying the contents of a snapshot is creating a new snapshot (because it's not the same snapshot as before). Think git commit --amend, which creates a new commit based on an existing commit. The analogy is actually pretty fitting, since this ticket seems to move rapidly towards reinventing Git.

You're right. But at the same time, Restic is not git, and it's not designed to require knowledge of content-based addressing to work. Regardless of how it works under the hood, I think that, to users, the command we are proposing should be considered to modify an existing snapshot, not create a new one, therefore it should be a distinct command.

> I didn't say that. Why would it?

Well, you said:

> from a UX viewpoint, this might be done with a rather low profile by extending the backup command instead of adding an entirely new command

Maybe you should explain in more detail.

> There is forget and prune, which are perfectly fine for removing things.

Let's be specific. forget removes snapshots, and prune removes blobs. We're proposing a command to remove files within snapshots. It should be a distinct command.

@fd0

Member

commented Jan 19, 2018

I'd like to add my opinion:

I think having a way to modify snapshots in the repo is valuable, judging by how many people have said they would like to have something like this.

The command should be independent of the backup command, not only for orthogonality reasons (which is quite Go-like), but also out of practical consideration: The backup command is already complex enough so I'd like to separate the other command from it.

I don't like the name purge, because of the similarity to prune. What about change? Then we have restic backup, restic restore and restic change.

For the supported operations of the command, I've seen requests for:

  • Delete files, e.g. --delete
  • Rename files, e.g. --rename

The former is exactly what this issue (originally) is about, but are there really use cases for renaming files?

@fd0 fd0 added the feature request label Jan 19, 2018

@rawtaz

Contributor

commented Jan 19, 2018

I think change sounds more like taking something out and putting something in, rather than modifying the contents of something.

Imagine the repo/backup/snapshot is a bucket. Change is more like swapping the bucket itself for something else, or taking something out of it and putting another thing in, rather than picking something in the bucket up, modifying it a bit, and putting it back.

Perhaps a native English speaker knows which is more proper :) It boils down to linguistics, I think.

@fd0

Member

commented Jan 19, 2018

Hm, modify then?

@rawtaz

Contributor

commented Jan 19, 2018

modify is definitely better than change. So either rewrite or modify out of what's been proposed so far. Curious what others think :)

@dimejo

Contributor

commented Jan 19, 2018

If this is only about deleting files, would it make sense to enhance the forget command to work with snapshots and files? Or would this be too complex?

If this new feature is about deleting and renaming (or something else) I'd vote for modify.

@rawtaz

Contributor

commented Jan 19, 2018

Thanks for your input @dimejo 👍

I think that when you're renaming and/or deleting, you are not forgetting (at least not in the former case).

@pvgoran

commented Jan 19, 2018

IMHO "rewrite" conveys the meaning the best.

@fd0

Member

commented Jan 19, 2018

The forget command is also very complex, we won't add anything to that if we can help it ;)

@dnnr

commented Jan 19, 2018

If it's gonna be a separate command, calling it modify would be my favorite as well (I'd also like modify-snapshot, even though it is rather long). It's also generic enough to be an appropriate place for all kinds of modifying file operations (renaming, maybe even adding). However, I still think that anything beyond removing files smells strongly of feature creep.

By the way, I feel that restic would benefit from command categories, similar to what Git has with its plumbing commands. Right now, restic -h lists all commands in lexical order, mixing low-level commands (e.g., cat, list, which will never be needed by "normal" users) with the primary high-level commands.

@zcalusic

Member

commented Jan 19, 2018

You might also consider update.

@teknico

commented Jan 19, 2018

+1 for rewrite, it has a nice Orwellian ring to it. :-)

@armhold

Contributor

commented Jan 19, 2018

alter
discard
evict
expel
expunge
extrude
oust
...
nuke? 😄

@scoddy

Member Author

commented Jan 20, 2018

I'd like to propose a new edit command. Based on all the feedback re this issue it appears to me that we might end up with multiple actions to edit one or multiple snapshots.

For the time being it could be just something like:

$ restic edit 40dc1520 remove dir/file

In the future we could implement deletion of one file from multiple snapshots (input list of snapshot ID or date range).

Other commands under the edit context might be

  • rename to rename files and folders
  • move to correct file/dir structures that may have changed

I believe it is important that we allow these actions to be executed on one or multiple snapshots (by ID or possibly a number of dates or a range).
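As a sketch of what selecting snapshots "by ID or date range" could mean, under the assumption that each snapshot is known by an ID and a timestamp. The function `select_snapshots` and the sample data are hypothetical and only illustrate the selection logic, not any existing restic interface.

```python
from datetime import datetime


def select_snapshots(snapshots, ids=None, start=None, end=None):
    """Pick snapshot IDs by explicit ID or by an inclusive time range."""
    selected = []
    for snap in snapshots:
        by_id = ids is not None and snap["id"] in ids
        by_date = (
            (start is not None or end is not None)
            and (start is None or snap["time"] >= start)
            and (end is None or snap["time"] <= end)
        )
        if by_id or by_date:
            selected.append(snap["id"])
    return selected


snapshots = [
    {"id": "40dc1520", "time": datetime(2018, 1, 10)},
    {"id": "79766175", "time": datetime(2018, 1, 15)},
    {"id": "bdbd3439", "time": datetime(2018, 1, 20)},
]

# Every snapshot taken on or after 12 January.
print(select_snapshots(snapshots, start=datetime(2018, 1, 12)))
```

The edit action (remove, for now) would then be applied to each selected snapshot in turn.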

@rawtaz

Contributor

commented Jan 20, 2018

I'm still not sure about how much restic should be able to do with backed up data. I mean, it's meant to back up data to preserve what things looked like at a certain point in time. It's not meant to be a NAS.

I especially don't see the validity in the use case of renaming and removing files. I mean, why would you change files on your local disk and then go fiddle with your backups to keep their file tree in sync with your current data? It doesn't make sense to me. Can you elaborate on that use case?

@dnnr

commented Jan 20, 2018

@rawtaz
My thoughts (almost) exactly.

I'd argue the validity of removing files lies in the scenario where you discover a mistake in your exclude rules after already having made backups with those rules. So removing files basically serves as the retroactive application of exclude rules. It seems that regardless of the controversy in this thread, everybody agrees on that particular use case.
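A minimal sketch of "retroactive exclude rules", assuming a snapshot can be viewed as a list of paths. It uses Python's `fnmatch` globbing as a stand-in; restic's own exclude pattern syntax differs in details, and the function name and sample paths are invented for the example.

```python
from fnmatch import fnmatch


def apply_excludes(paths, patterns):
    """Split paths into (kept, removed) according to glob patterns."""
    kept, removed = [], []
    for path in paths:
        if any(fnmatch(path, pattern) for pattern in patterns):
            removed.append(path)
        else:
            kept.append(path)
    return kept, removed


snapshot_paths = [
    "/home/u/notes.txt",
    "/home/u/build/app.bin",
    "/home/u/video/holiday.mkv",
]
kept, removed = apply_excludes(snapshot_paths, ["*.bin", "/home/u/video/*"])
print("kept:", kept)
print("removed:", removed)
```

Running the same patterns over every affected snapshot gives the result the exclude rules would have produced had they been right from the start.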

Concerning operations beyond that (i.e., renaming, adding), I share your doubts. It's feature creep and not in the scope of a backup tool, IMHO.

@alphapapa

commented Jan 20, 2018

I agree: deleting files from snapshots is important, as it's very easy to accidentally back up files that one didn't intend to. This is often necessary for both security and disk-usage reasons. Having this feature could mean the difference between being able to keep old backup data or having to "throw out the baby with the bathwater."

But renaming or moving files within a snapshot is probably not a good idea. To be frank I've never heard of backup software that can do this, and it seems like a weird feature. If a user absolutely needed this, it could be implemented outside of Restic by restoring the snapshot, rearranging the files, and backing it up again with the date set explicitly (although this might become more complicated in the future when Restic starts using absolute paths).

Granted, the remove-paths-from-snapshots feature could also be implemented this way, but since it seems much more likely to be needed, I think it's reasonable for it to be included in Restic.

@fd0

Member

commented Jan 20, 2018

So, thank you all for your feedback, we have a clear way forward: implement a command which allows removing files from an existing snapshot. The name of the command will be decided when we get there. We can revisit the other use cases when the need arises.

I don't think we need more discussion here, thanks for participating!

@naggie

commented Mar 24, 2018

Suggestion for the command name: restic purge.

I'm looking forward to this feature. Thanks

@Schnitzel

commented Sep 16, 2018

@fd0
Any update on this feature? Would love to use it :)
We're using restic in a government environment and deletion of a single file from a backup is required for them. We could fund some of the work if needed!

@NovacomExperts

commented Oct 2, 2018

I'm looking forward to this too! I propose using something like the base structure of restic find:

restic purge [flags] PATTERN

Where you could limit the purge to hosts (-H), snapshots (-s), or paths (--path).

Then maybe a restic prune would do the actual deletion afterwards.

This would be soooo helpful when an unforeseen file gets backed up by mistake (a large video in a documents folder, or maybe some confidential file). Right now, I run restic find and then delete every snapshot containing the file... This is less than desirable if the file goes far back in the repo's history.

Thanks!

@fd0

Member

commented Oct 3, 2018

No update, sorry. You'll get notified by subscribing to this issue when something happens.

@nullcake

commented Dec 6, 2018

It sounds like most people want to be able to clone a backup's metadata, but exclude offending files - without having to restore them all in a scratch location. The idea of cloning a backup would copy metadata with the ability to remove certain pointers.

Is this the use case?

  • restic backup --exclude <something> --clone <original backup id> [new feature]
  • restic forget <original backup id>
  • restic prune

rewrite and modify could be macros to the above process.

@RPDiep

commented Dec 6, 2018

For me, that would indeed suffice @nullcake

@zcalusic

Member

commented Dec 6, 2018

Not too bad, @nullcake.

Though, based on my past experience, I usually only notice that I have been backing up loads of worthless stuff days or weeks later, when I have some time to investigate. This means that by the time I understand I need some specific --exclude, there are probably a dozen or more backups affected.

Of course, even if any kind of cleanup is implemented based on a single backup, like you suggest, it would still be a great step forward. We, of course, know how to script. ;)

So, thumbs up. :)

@fd0

Member

commented Dec 7, 2018

While this is an interesting idea, I fear that the backup command is way too complex already, and adding another "source" for a backup will complicate it even more. Also, this function would only operate on data already in the repo (only on metadata, to be precise). A separate command (e.g. purge or so) could encapsulate the functionality nicely.

@bherila

commented Dec 7, 2018

CrashPlan had an interesting behavior: when a file is excluded, it is purged from all existing snapshots. That could be something to consider.

@ifelsefi

commented Jan 17, 2019

This would be a great feature. Has it been added?

@mholt

Contributor

commented Jan 17, 2019

Nope.
