Backup pruning should be smarter #7

Closed
ewxrjk opened this issue Jun 14, 2014 · 6 comments

Owner

@ewxrjk ewxrjk commented Jun 14, 2014

Currently pruning involves removing the largest contiguous chunk of backups, starting at the oldest, which is consistent with the prune-age and min-backups constraints.
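
As a rough illustration of the behaviour described above, here is a minimal sketch. It is not rsbackup's actual code: the function name, the representation of backups as ages in days, and the strict "older than" comparison are all assumptions made for the example.

```python
def contiguous_prune(ages, prune_age, min_backups):
    """Return the ages (in days) that would be pruned, oldest first.

    Assumptions for this sketch: a backup is prunable only when its age
    is strictly greater than prune_age, and at least min_backups backups
    must always remain.
    """
    remaining = sorted(ages)  # newest (smallest age) first, oldest last
    pruned = []
    # Remove from the oldest end until either constraint stops us.
    while len(remaining) > min_backups and remaining[-1] > prune_age:
        pruned.append(remaining.pop())
    return pruned
```

For example, `contiguous_prune([1, 10, 40, 50, 60], prune_age=30, min_backups=2)` selects the three oldest backups, while the same call on `[40, 50, 60]` selects only the oldest one because of the min-backups floor.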

This doesn't make for very good use of storage space. An alternative policy would be to thin out backups non-contiguously. For instance, the last week could keep daily backups, the rest of the last month could keep weekly backups, and the rest of the last year could keep monthly backups.
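
The daily/weekly/monthly example can be sketched as a bucketing scheme. This is only one possible reading: the 7/30/365-day boundaries, the choice to keep the newest backup in each bucket, and the decision to prune everything older than a year are assumptions, and `tiered_thin` is a hypothetical name.

```python
def tiered_thin(ages):
    """Split backup ages (in days) into (kept, pruned) lists.

    Assumed tiers: one backup per day for ages 0-6, one per 7-day
    block up to day 29, one per 30-day block up to day 364.
    Anything older is pruned (an assumption; the text does not say).
    """
    newest_in_bucket = {}
    pruned = []
    for age in sorted(ages):  # visit newest first
        if age < 7:
            bucket = ("day", age)
        elif age < 30:
            bucket = ("week", age // 7)
        elif age < 365:
            bucket = ("month", age // 30)
        else:
            pruned.append(age)
            continue
        if bucket in newest_in_bucket:
            pruned.append(age)  # bucket already has a newer backup
        else:
            newest_in_bucket[bucket] = age
    kept = sorted(newest_in_bucket.values())
    return kept, pruned
```

With 400 consecutive daily backups, this keeps 23 of them: 7 daily, 4 weekly and 12 monthly.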

More generally, an interface could be defined for operators to select completely arbitrary pruning policies.

@ewxrjk ewxrjk modified the milestone: 2.0 Jun 14, 2014
@ewxrjk ewxrjk added the feature label Jun 14, 2014

@senji-rsbackup senji-rsbackup commented Aug 31, 2014

I would certainly appreciate an option to have a prune-selection-hook of some kind that would allow me to produce an arbitrary pruning policy.

Also, it would be nice to be able to specify pruning policies per device. For example, I currently have one device that I keep locally at all times for easy correction of user/sysadmin errors and minor problems, and a number of devices that I rotate through an offsite cycle. The former would ideally keep a fairly long period (say a month) of daily (or more frequent!) backups and then tail off very fast, maybe not keeping anything older than 6 months. The offsites probably want fewer daily backups but a much slower tail-off; having to go to an offsite to get a year-old file back isn't likely to be an issue!


@senji-rsbackup senji-rsbackup commented Aug 31, 2014

Additionally, the current behaviour has the effect that when an offsite returns to being onsite it generally gets pruned down to "min-backups" quantity immediately, because everything is older than prune-age. I can't quite give a rational explanation of why this behaviour feels wrong, but it does.

Owner Author

@ewxrjk ewxrjk commented Oct 19, 2014

I was thinking of something along these lines:

prune-policy NAME: Name the pruning policy to use. Inherits between global/host/volume in the usual way.

prune-parameter NAME VALUE: Define a parameter for the chosen pruning policy. The name and value can be anything. Possible to set for devices as well as volumes. Volume parameters always beat device parameters with matching names.
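
The precedence rule for parameters could look something like the following sketch. The function name and the parameter names in the usage note are hypothetical; only the rule that volume parameters override device parameters of the same name comes from the proposal.

```python
def effective_parameters(device_params, volume_params):
    """Merge parameter dictionaries for one volume on one device.

    Volume parameters win over device parameters with the same name.
    """
    merged = dict(device_params)   # start from the device settings
    merged.update(volume_params)   # volume settings override on clash
    return merged
```

For instance, merging device parameters `{"max-age": "180", "keep-daily": "30"}` with volume parameters `{"max-age": "365"}` yields `max-age` of `365` and `keep-daily` of `30`.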

The existing pruning configuration would be re-expressed as one possible policy and set of parameters. One or more policies would be built in.

A policy name starting with / would be processed by executing the program of that name, with parameters and volume information encoded in the environment. The program should emit a list of backups to be pruned, with reasons. It may perform sqlite queries on the backup database to determine what to prune.
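
An external policy program under this proposal might be shaped roughly like the sketch below. The proposal does not specify the environment encoding or the output format, so the variable names `EXAMPLE_BACKUP_AGES` and `EXAMPLE_PARAM_MAX_AGE` and the `age:reason` output lines are purely hypothetical placeholders.

```python
import os


def select_prunable(environ):
    """Return (age_in_days, reason) pairs for backups to prune.

    Hypothetical encoding, invented for this sketch:
      EXAMPLE_BACKUP_AGES    space-separated backup ages in days
      EXAMPLE_PARAM_MAX_AGE  a policy parameter, in days
    """
    ages = [int(a) for a in environ.get("EXAMPLE_BACKUP_AGES", "").split()]
    max_age = int(environ.get("EXAMPLE_PARAM_MAX_AGE", "365"))
    return [
        (age, f"older than {max_age} days")
        for age in sorted(ages)
        if age > max_age
    ]


def main():
    # Emit one backup per line together with the reason for pruning it.
    for age, reason in select_prunable(os.environ):
        print(f"{age}:{reason}")


if __name__ == "__main__":
    main()
```

Keeping the selection logic in a function that takes the environment as an argument makes the policy easy to exercise without actually setting environment variables.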


@senji-rsbackup senji-rsbackup commented Nov 5, 2014

That design appears to suit my requirements.

@ewxrjk ewxrjk removed this from the 2.0 milestone Dec 28, 2014
@ewxrjk ewxrjk added this to the 3.0 milestone Apr 3, 2015
Owner Author

@ewxrjk ewxrjk commented Aug 31, 2015

https://github.com/ewxrjk/rsbackup/tree/pruning-7 contains a just-written implementation. The degree of testing can be seen in the new test scripts.
It's slightly different from the design proposed above in at least the following ways:

  • The policy is invoked for each backup in turn, rather than being invoked once to get a list of backups to be pruned. However, it is provided a list of unpruned backup ages so it can still make decisions based on global information.
  • There's no support for per-device parameters (yet), although the device name is exposed to the policy.
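
The per-backup invocation described in the first bullet can be pictured with a small driver loop. This is a sketch only: the function names, the oldest-first visiting order, and the choice to update the unpruned list as backups are selected are assumptions, not details taken from the branch.

```python
def run_per_backup(ages, should_prune):
    """Ask the policy about each backup in turn, oldest first.

    should_prune(age, unpruned) sees the ages of all backups not yet
    pruned, so it can still take global constraints into account.
    """
    unpruned = sorted(ages, reverse=True)  # oldest first
    pruned = []
    for age in list(unpruned):  # iterate over a copy while mutating
        if should_prune(age, unpruned):
            unpruned.remove(age)
            pruned.append(age)
    return pruned


def age_limit_policy(age, unpruned, prune_age=30, min_backups=2):
    """Example policy: prune old backups but never go below min_backups."""
    return age > prune_age and len(unpruned) > min_backups
```

Here the `unpruned` argument is what lets a per-backup decision respect a floor on the number of remaining backups.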

Feedback welcome.

Owner Author

@ewxrjk ewxrjk commented Sep 13, 2015

The same branch has evolved a bit:

  • pruning policies are now given all the backups on a volume at once and must identify the backups to be pruned
  • a new 'decay' policy has been added, along the lines of the "non-contiguous" policy from the description

I consider this basically done, but feedback is still welcome.

@ewxrjk ewxrjk self-assigned this Sep 13, 2015
@ewxrjk ewxrjk closed this in de72fdd Sep 28, 2015