Write sparse files in restorer #2601

Closed · wants to merge 4 commits

Conversation

@greatroar (Contributor) commented Feb 26, 2020

What is the purpose of this change? What does it change?

The capacity to write sparse files has been added to the restorer. Files aren't necessarily as sparse as they can be, since the code for detecting runs of zeros isn't exhaustive, but long runs of zeros have a good chance of being detected.

In a short test, I restored a pair of disk images and found that one took up only half the disk space it used to, while the other was 13% smaller. Restoring it took the same wall-clock time as with the current master.

See the commit message for some benchmark results.

Was the change discussed in an issue or in the forum before?

Closes #79. Closes #2378, which tried to do the same thing in a different way. I redid the work since the restorer was just rewritten and I came up with a different heuristic for detecting zero blocks.

Checklist

  • I have read the Contribution Guidelines
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

@greatroar force-pushed the sparsefiles branch 3 times, most recently from 5b80924 to b1f1e4f on February 26, 2020 at 21:40
This writes files by using (*os.File).Truncate, which resolves to the
truncate system call on Unix.

Compared to the naive implementation,

	func allZero(p []byte) bool {
		for _, b := range p {
			if b != 0 {
				return false
			}
		}
		return true
	}

the optimized allZero is about 10× faster:

name       old time/op    new time/op     delta
AllZero-8    1.09ms ± 1%     0.09ms ± 1%    -92.10%  (p=0.000 n=10+10)

name       old speed      new speed       delta
AllZero-8  3.84GB/s ± 1%  48.59GB/s ± 1%  +1166.51%  (p=0.000 n=10+10)
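
For a concrete idea of how such a check can be sped up, here is a minimal sketch of one common approach that compares eight bytes at a time; it is not necessarily the implementation used in this PR.

	package sparse // hypothetical package, for illustration only

	import "encoding/binary"

	// allZeroWords reports whether p contains only zero bytes. It checks
	// eight bytes at a time and falls back to a byte loop for the tail.
	func allZeroWords(p []byte) bool {
		for len(p) >= 8 {
			if binary.LittleEndian.Uint64(p) != 0 {
				return false
			}
			p = p[8:]
		}
		for _, b := range p {
			if b != 0 {
				return false
			}
		}
		return true
	}
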
greatroar added 2 commits February 28, 2020 17:44
The all-zero check has been replaced with skipping zero prefixes.
Despite the check not being complete, restored disk images are slightly
more sparse than before.
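
For illustration, skipping a zero prefix can look roughly like the sketch below; the helper names are made up and the PR's actual code may differ. A chunk that is entirely zero is not written at all, which only works if the file is later extended to its full size (for example with Truncate).

	import "os"

	// zeroPrefixLen returns the number of leading zero bytes in p.
	func zeroPrefixLen(p []byte) int {
		for i, b := range p {
			if b != 0 {
				return i
			}
		}
		return len(p)
	}

	// writeSkippingZeroPrefix writes p at offset off, starting the write
	// after any leading zeros so that range can stay a hole.
	func writeSkippingZeroPrefix(f *os.File, p []byte, off int64) error {
		n := zeroPrefixLen(p)
		if n == len(p) {
			return nil // entirely zero: nothing to write
		}
		_, err := f.WriteAt(p[n:], off+int64(n))
		return err
	}
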
@greatroar
Contributor Author

I just read @cdhowie's comment over at https://forum.restic.net/t/sparse-file-support/1264/17 and I disagree with it. That "the presence of SEEK_HOLE would indicate that some programs likely use this functionality" is true, but an ordinary application should use it only as an optimization. Any application that relies on the holes in a file being at specific locations is so tied to a particular filesystem that it cannot work with a general-purpose backup tool anyway, unless the disk image is backed up instead of the files.

Case in point: if the application's requirement is anything other than that the file is completely stored, with all the zeros written out, then it won't work with the current version of restic.
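
For readers unfamiliar with SEEK_HOLE: it lets a program ask the filesystem where the holes in a file are. A small Linux-only sketch using golang.org/x/sys/unix (not part of this PR; the file name is invented) that lists a file's holes:

	package main

	import (
		"fmt"
		"io"
		"log"
		"os"

		"golang.org/x/sys/unix"
	)

	func main() {
		f, err := os.Open("test.img")
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()

		size, err := f.Seek(0, io.SeekEnd)
		if err != nil {
			log.Fatal(err)
		}

		fd := int(f.Fd())
		for off := int64(0); off < size; {
			hole, err := unix.Seek(fd, off, unix.SEEK_HOLE)
			if err != nil {
				log.Fatal(err)
			}
			data, err := unix.Seek(fd, hole, unix.SEEK_DATA)
			if err != nil {
				data = size // the hole runs to the end of the file
			}
			if data > hole {
				fmt.Printf("hole: bytes %d..%d\n", hole, data)
			}
			off = data
		}
	}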

@greatroar
Contributor Author

Also, I just double-checked that restoring an all-zeros file works with Linux writing to an NTFS partition. It does. The file is not sparse, but it has the right size and is all zeros.

@ifedorenko
Contributor

FWIW, I am not in favour of the proposed implementation. I believe restic should preserve the original file's "sparseness" (which, unfortunately, requires a repository format change). For example, files may be intentionally preallocated with zeros to "reserve" space on the filesystem, to avoid running out of space when later writing files with actual data.

@greatroar
Contributor Author

That's a good point. Since a format change isn't likely to happen anytime soon, would this implementation + a flag to turn it on/off be an option as a stopgap measure?

@ifedorenko
Contributor

This is something for @fd0 to decide, but personally I would prefer a proper solution. A repository format change to support sparse files does not have to break compatibility with older clients. Even if the format change does break older restic versions, that may be acceptable too, provided it is guarded by a feature flag.

@dionorgua
Contributor

As far as I understand the current code, this fragments the filesystem even more.
I agree with @greatroar that nobody should rely on exact hole positions, so it should probably be OK to restore holes at approximate positions (at least until proper hole information is stored in the repository).

At the same time, I don't like the idea of holes that aren't aligned to the filesystem block size. I also don't like the optimization from the latest commit (the one where early stopping avoids a zero-length write); I think it's bad even for VM usage, because it creates a 'block' of zeros inside what should be a hole.

Here is an example with a 16 GB sparse file (without a single byte of data):

% truncate -s 16G ~/tmp/test.img
% sudo filefrag -e ~/tmp/test.img                      
Filesystem type is: ef53
File size of /home/dion/tmp/test.img is 17179869184 (4194304 blocks of 4096 bytes)
/home/dion/tmp/test.img: 0 extents found

Backup-restore:

% cd ~/tmp; restic backup -q test.img
% /tmp/restic/restic restore latest --target ~/tmp/restore

And restored file:

% sudo filefrag -e ~/tmp/restore/test.img |head -n 20
Filesystem type is: ef53
File size of /home/dion/tmp/restore/test.img is 17179869184 (4194304 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:      127..     127:   19212415..  19212415:      1:        127:
   1:      255..     255:   19212543..  19212543:      1:            
   2:      383..     383:   19212671..  19212671:      1:            
   3:      511..     511:   19212799..  19212799:      1:            
   4:      639..     639:   19212927..  19212927:      1:            
   5:      767..     767:   19213055..  19213055:      1:            
   6:      895..     895:   19213183..  19213183:      1:            
   7:     1023..    1023:   19213311..  19213311:      1:            
   8:     1151..    1151:   19213439..  19213439:      1:
... 32K lines ...

So I got about 32 thousand small 'holes' instead of one big hole. It even takes ~8 seconds here to dump this listing...

So I think we should at least be able to restore multiple adjacent zero blobs as one hole.
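
A sketch of that suggestion, assuming chunks arrive in file order (helper names invented): skip every all-zero blob while advancing the write offset, so adjacent zero blobs collapse into a single hole, and truncate at the end so a trailing hole still counts towards the file size.

	import "os"

	// isZero reports whether p contains only zero bytes.
	func isZero(p []byte) bool {
		for _, b := range p {
			if b != 0 {
				return false
			}
		}
		return true
	}

	// writeCoalescingZeros writes chunks back to back, skipping all-zero
	// chunks entirely; consecutive skipped chunks become one hole.
	func writeCoalescingZeros(f *os.File, chunks [][]byte) error {
		var off int64
		for _, c := range chunks {
			if !isZero(c) {
				if _, err := f.WriteAt(c, off); err != nil {
					return err
				}
			}
			off += int64(len(c))
		}
		// Extend the file so a trailing hole reaches the full size.
		return f.Truncate(off)
	}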

@greatroar
Contributor Author

greatroar commented May 26, 2020

When I repeat this, I also get a more fragmented file: 24258 extents versus 25. However, the 8s figure doesn't say much out of context. In my run, restore speed is a factor of 100 better than with a dense restore: 2.09s vs. 3m20.56s. To simulate what happens when we actually use the file, I ran sha256sum:

$ time sha256sum /tmp/restore-dense/tmp/test.img 
07d217ebccc55480b7afa191674ec5da87f2d14efbc04dbc7e40efe345f16776  /tmp/restore-dense/tmp/test.img

real	1m17.382s
user	1m13.928s
sys	0m1.920s
$ time sha256sum /tmp/restore-sparse/tmp/test.img 
07d217ebccc55480b7afa191674ec5da87f2d14efbc04dbc7e40efe345f16776  /tmp/restore-sparse/tmp/test.img

real	1m17.524s
user	1m13.449s
sys	0m4.072s

The system time is up by 2×, but the wall-clock time is the same. I then copied the files to /dev/null:

# First the original, as a baseline.
$ dd if=/tmp/test.img of=/dev/null bs=4096
4194304+0 records in
4194304+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 6.19832 s, 2.8 GB/s
$ dd if=/tmp/restore-dense/tmp/test.img of=/dev/null bs=4096
4194304+0 records in
4194304+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 190.28 s, 90.3 MB/s
$ dd if=/tmp/restore-sparse/tmp/test.img of=/dev/null bs=4096
4194304+0 records in
4194304+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 5.89015 s, 2.9 GB/s

That's a factor of 32 faster and on par with the completely sparse original file, despite the additional fragmentation.

This is on Linux/amd64 with an ext4 fs on an old spinning disk.

All this is not to say that there's no room for improvement. I can try to restore the Truncate dance.
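
Assuming the "Truncate dance" means giving the file its final length up front and then only filling in the non-zero parts (the actual commits may do this differently), a minimal sketch:

	import "os"

	// createSparse creates the destination file and immediately truncates
	// it to its final size, so any byte range that is never written stays
	// a hole on filesystems that support sparse files.
	func createSparse(path string, size int64) (*os.File, error) {
		f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0o600)
		if err != nil {
			return nil, err
		}
		if err := f.Truncate(size); err != nil {
			f.Close()
			return nil, err
		}
		return f, nil
	}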

@dionorgua
Contributor

I think it's better to wait for @fd0's feedback, to understand the chances of this being merged in any form.

This is a new take on the write-by-Truncate idea from f4f8b50.
@greatroar
Contributor Author

I already did it. Restore speed is unchanged here, so I think the 8s you were observing was just the time to fetch the backup from the repo.

@MichaelEischer
Member

I agree with @ifedorenko that the proper solution would be to have restic backup and restore the original sparseness of a file.

Unless I'm mistaken, it's possible to add new fields to the JSON data structures without breaking backwards compatibility as long as these fields are optional. That way we could add a list of the holes in a file to the JSON node of a sparse file. To keep this backwards compatible, restic would still have to create the usual blob list. However, it should be possible to optimize this process a bit by using the information about (large) sparse sections to add all zero blobs for these without actually reading these sections.

An older restic version then could safely ignore the sparseness information. Known sparse sections of a file upfront would also allow the restorer to preallocate space on disk to reduce fragmentation.
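
To make that concrete, a hypothetical (and deliberately simplified, not restic's real schema) node with an optional hole list could look like this; an older client that doesn't know the field would ignore it and still restore from the usual blob list:

	// Field names below are invented for illustration only.
	type sparseNode struct {
		Name    string   `json:"name"`
		Size    uint64   `json:"size"`
		Content []string `json:"content"`         // the usual list of blob IDs
		Holes   []span   `json:"holes,omitempty"` // optional sparse regions
	}

	type span struct {
		Offset uint64 `json:"offset"`
		Length uint64 `json:"length"`
	}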

Regarding the current implementation: the WriteAt method of the partialFile is racy, as there's no synchronization and multiple WriteAt calls for the same file can be active at the same time. Besides, it would be much cleaner to truncate the file to its final size right from the start and then later just fill in the relevant file parts. Only removing 0-prefixes seems kind of arbitrary: why not also skip a 0-suffix, or a large stretch of zeros within a chunk, or just all-zero chunks?

@greatroar
Contributor Author

WriteAt should not need synchronization:

Clients of WriteAt can execute parallel WriteAt calls on the same destination if the ranges do not overlap.
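
A tiny standalone example (offsets and contents invented) of that contract: concurrent, non-overlapping WriteAt calls on one file need no extra locking.

	package main

	import (
		"bytes"
		"log"
		"os"
		"sync"
	)

	func main() {
		f, err := os.Create("out.bin")
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()

		chunks := []struct {
			off  int64
			data []byte
		}{
			{0, bytes.Repeat([]byte{'a'}, 4096)},
			{4096, bytes.Repeat([]byte{'b'}, 4096)},
			{8192, bytes.Repeat([]byte{'c'}, 4096)},
		}

		var wg sync.WaitGroup
		for _, c := range chunks {
			wg.Add(1)
			go func(off int64, p []byte) {
				defer wg.Done()
				if _, err := f.WriteAt(p, off); err != nil {
					log.Println(err)
				}
			}(c.off, c.data)
		}
		wg.Wait()
	}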

Regarding the prefix-only heuristic, it's a speed hack, but if we want to preserve sparsity information, the whole point is of course moot. I'll close this PR.
