Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The automatic parent selection of cmd_backup is not smart enough. #856

Closed
middelink opened this issue Mar 5, 2017 · 4 comments
Closed
Labels
type: feature enhancement improving existing features

Comments

@middelink
Copy link
Member

middelink commented Mar 5, 2017

Currently the code sets the current hostname as default. This is then used in FindLatestSnapshot to discover the latest snapshot for that host.

However, what if I make 2 snapshots for a host, because I have different retention policies for them? E.g. each night the host makes backups of /etc and of /home. The next night backup runs for /etc again and picks the /home backup as parent, as it is the most recent one. Consequently the /home backup picks the just made /etc as parent...

So none of this would be especially worrisome as restic would still dedup files at file and blob level, but as you (@fd0) explained so eloquently in in #652, picking the wrong parentID will result in lots of compute cycles as it cannot determine if the file has changed upfront.

So, I think the way forward would be to create a FindOptimalParentSnapshot(repo, tags, paths, host) function, which uses host as a hard constraint, but the tag and paths ones as soft constraints. Bonus points if it would notice supersets. If I created a backup of /etc,/home and later on do a backup of just /home, it could pick the /etc,/home backup as it's parent. (unless there is a more recent single /home of course). Then of course making a new backup of /etc,/home becomes hairy. what do we pick, the most recent one of /home or the most recent one of /etc,/home ? My bet is on the later as they will have the most overlap.

@sanmai
Copy link

sanmai commented Mar 5, 2017

You can set your own hostname for each and every backup, if that helps.

@fd0
Copy link
Member

fd0 commented Mar 5, 2017

Good idea for improving parent snapshot detection!

Unfortunately I think you've got a tiny bit wrong:

However, what if I make 2 snapshots for a host, because I have different retention policies for them? E.g. each night the host makes backups of /etc and of /home. The next night backup runs for /etc again and picks the /home backup as parent, as it is the most recent one. Consequently the /home backup picks the just made /etc as parent...

Suppose I have a backup of /home and a separate, later backup of /etc on the same host. On the next backup run of /home, restic will find the first snapshot (which just contains /home):

Is there a bug that I'm not aware of?

@fd0 fd0 added type: feature enhancement improving existing features state: need feedback waiting for feedback, e.g. from the submitter labels Mar 5, 2017
@middelink
Copy link
Member Author

middelink commented Mar 5, 2017

Aww, shoot. I completely missed that. <ponders> ...but there was something wrong with the code when I went over it yesterday...

Ah. its SamePath() which threw me off. Say I have 2 existing backups, /,/boot and a newer backup of / (of the same host; who knows why ^^). Then when I make a new backup of /,/home, the one with / will be it's parent.

SamePaths() is not really checking if the paths are exactly the same, but rather a subset. (expected="/", actual="/,/boot") will return true. And since FindLatestSnapshot picks the most recent matching backup. Bingo, problem.

PS. Why does that function not take a Snapshot receiver, like HasTags? All 4 use-cases take sn.Paths as first argument

@fd0
Copy link
Member

fd0 commented Mar 5, 2017

Oh, interesting. SamePaths() does actually only test whether actual is a subset of expected. That's... unexpected (from the function's name).

And we can certainly make SamePaths a function of Snapshot.

@fd0 fd0 removed the state: need feedback waiting for feedback, e.g. from the submitter label Mar 5, 2017
@fd0 fd0 closed this as completed in #857 Mar 6, 2017
@fd0 fd0 mentioned this issue Jun 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature enhancement improving existing features
Projects
None yet
Development

No branches or pull requests

3 participants