check -files optimization? #266

Open
Ossssip opened this issue Nov 17, 2017 · 3 comments

@Ossssip

commented Nov 17, 2017

Currently, if duplicacy is given the check -files command, it

> will download chunks and compute file hashes in memory, to make sure that all hashes match.

My impression (just from execution times) is that this is done completely independently for each snapshot, i.e. if I have two snapshots with exactly the same files (or with just a few new chunks), it will download and check the entire chunk set for each snapshot. Am I right?
I am wondering whether, when checking several snapshots, it is possible to check only the altered pieces of the backup for each subsequent snapshot. That would reduce the data transfer overhead and execution time significantly.

@gilbertchen

Owner

commented Nov 18, 2017

Right, the -files option checks snapshots independently and doesn't skip files that have already been verified. It shouldn't be hard to create a map to store verified files and skip a file if it can be found in this map.
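
A minimal sketch of that idea in Go, with hypothetical File and downloadAndHash stand-ins rather than duplicacy's actual types:

```go
package main

import "fmt"

// File is a hypothetical stand-in for duplicacy's real file entry type.
type File struct {
	Path string
	Hash string // content hash recorded in the snapshot
}

// downloadAndHash would reassemble the file from its chunks and hash it;
// stubbed out here so the sketch stays self-contained.
func downloadAndHash(f File) string { return f.Hash }

// checkFiles verifies each distinct file content once, skipping files
// whose hash is already in the verified map.
func checkFiles(snapshots [][]File) {
	verified := make(map[string]bool) // content hash -> already verified
	for _, files := range snapshots {
		for _, f := range files {
			if verified[f.Hash] {
				continue // identical content was checked for an earlier snapshot
			}
			if downloadAndHash(f) == f.Hash {
				verified[f.Hash] = true
			} else {
				fmt.Printf("file %s has a mismatched hash\n", f.Path)
			}
		}
	}
}

func main() {
	a := []File{{Path: "a.txt", Hash: "h1"}}
	checkFiles([][]File{a, a}) // the second snapshot downloads nothing
}
```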

I also think we need a -chunks option which would verify each chunk rather than individual files.
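
A -chunks mode could then verify each chunk exactly once, however many snapshots reference it. A rough sketch, again with made-up helpers (duplicacy's real chunk IDs come from a keyed hash; plain SHA-256 is used here only to keep the example self-contained):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Snapshot is a hypothetical stand-in for duplicacy's real snapshot type.
type Snapshot struct {
	ChunkHashes []string
}

// downloadChunk is stubbed so the example runs; it would fetch from storage.
func downloadChunk(hash string) []byte { return []byte("hello") }

// checkChunks verifies every chunk referenced by any snapshot exactly once,
// no matter how many snapshots share it.
func checkChunks(snapshots []Snapshot) {
	seen := make(map[string]bool)
	for _, s := range snapshots {
		for _, hash := range s.ChunkHashes {
			if seen[hash] {
				continue // already verified for an earlier snapshot
			}
			seen[hash] = true
			sum := sha256.Sum256(downloadChunk(hash))
			if hex.EncodeToString(sum[:]) != hash {
				fmt.Printf("chunk %s is corrupted\n", hash)
			}
		}
	}
}

func main() {
	sum := sha256.Sum256([]byte("hello"))
	id := hex.EncodeToString(sum[:])
	shared := Snapshot{ChunkHashes: []string{id}}
	checkChunks([]Snapshot{shared, shared}) // the chunk is downloaded once
}
```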

@thrnz

commented Nov 19, 2017

The ability to run partial checks with the -files option that resume where the previous check left off would also be handy. This would be useful for large backup sets on remote destinations, where a full check could then be split into smaller jobs spread over several weeks or months.

Perhaps something similar to what HashBackup does, though I'm not sure how this would work with old revisions/chunks being pruned between checks:

> Incremental checking, for example, --inc 1d/30d, means that selftest is run every day, perhaps via a cron job, and the entire backup should be checked over a 30-day period. Each day, 1/30th of the backup is checked. The -v3, -v4, and -v5 options control the check level, and each level has its own incremental check schedule. For huge backups, it may be necessary to spread checking out over a quarter or even longer. The schedule can be changed at any time by using a different time specification.
>
> You may also specify a download limit with incremental checking. For example, --inc 1d/30d,500MB means to limit the check to 500MB of backup data. This is useful with -v4 when cache-size-limit is set. In this case, archives may have to be downloaded. Many storage services have free daily allowances for downloads, but charge for going over them. Adding a download limit ensures that incremental selftest doesn't go over the free allowance. The download limit is always honored, even if it causes a complete cycle to take longer than specified.
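
For what it's worth, a resumable check along those lines could persist the set of verified chunks between runs and stop each run once a byte budget is spent. A rough Go sketch with made-up types (ChunkInfo, CheckState) and a plain SHA-256 stand-in for the real chunk hash; none of this is duplicacy's actual API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChunkInfo is a hypothetical record of one chunk's ID and size.
type ChunkInfo struct {
	Hash string
	Size int64
}

// CheckState would be persisted between runs (e.g. as a JSON file) so
// each run resumes where the previous one left off.
type CheckState struct {
	Verified map[string]bool
}

// downloadChunk is stubbed so the example runs; it would fetch from storage.
func downloadChunk(hash string) []byte { return []byte("hello") }

// incrementalCheck verifies the next slice of unverified chunks, stopping
// once the download budget (in bytes) is spent, in the spirit of
// HashBackup's --inc 1d/30d,500MB.
func incrementalCheck(chunks []ChunkInfo, state *CheckState, budget int64) {
	var downloaded int64
	for _, c := range chunks {
		if state.Verified[c.Hash] {
			continue // checked in an earlier run
		}
		if downloaded+c.Size > budget {
			break // honor the limit; the next run resumes here
		}
		data := downloadChunk(c.Hash)
		downloaded += int64(len(data))
		sum := sha256.Sum256(data)
		if hex.EncodeToString(sum[:]) != c.Hash {
			fmt.Printf("chunk %s is corrupted\n", c.Hash)
		}
		state.Verified[c.Hash] = true
	}
}

func main() {
	sum := sha256.Sum256([]byte("hello"))
	id := hex.EncodeToString(sum[:])
	state := &CheckState{Verified: make(map[string]bool)}
	incrementalCheck([]ChunkInfo{{Hash: id, Size: 5}}, state, 500<<20)
}
```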

@TheBestPessimist

Contributor

commented Jun 12, 2019

Duplicate: #477
