archive and restore sparse files #79
At least on Linux, the description in the tar manual is outdated: the part "in order to determine if the file is sparse tar has to read it before trying to archive it" is no longer true. The lseek() flag SEEK_HOLE allows finding holes without reading the whole file. Using this from golang would involve some low-level non-portable code, but I guess for files with very large holes it could be an important optimization.
Thanks for the hint. I have some more optimizations in mind that may only work on Linux. In general, I think it's okay to implement those, as long as the other platforms are not degraded in functionality.
Long runs of zeros will be compressed perfectly anyway.
Correct, detecting holes doesn't help reduce the backup size. It can still improve performance, as reading large amounts of zeroes and then compressing them takes longer than just skipping over them.
There is a technical tradeoff here: is it worth complicating the code for something that won't matter 99 percent of the time? I also think actually restoring backups is far rarer than storing them. I don't mind either way, I just want to point out that adding more code hurts reliability as a whole.
We had some relevant discussion here: #117

Documentation for glibc's lseek() - used when detecting sparse ranges
Documentation for glibc's fallocate() - used when making sparse ranges

Supporting sparse files would allow us to save space even if the original file was not sparse, or if the filesystem did not support sparse ranges. I suspect there are many cases where files are not correctly made sparse, so this could potentially yield some surprising results upon restoration of a snapshot.

Here's an example call to fallocate():

```c
#include <errno.h>
#include <fcntl.h>

int result;
/* FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE,
 * and fallocate() reports failures via errno, not its return value. */
result = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, current_offset, length_of_zeroes);
if (result == -1) {
    switch (errno) {
    case ENOSYS:     break; /* stop trying to use fallocate() for this kernel / program invocation */
    case EOPNOTSUPP: break; /* stop trying to use fallocate() for this file/filesystem */
    default:         break;
    }
}
```

Making sparse files only makes sense for kernels / filesystems that support them.

Don't know how relevant this is, but for Windows: StackOverflow discussion on sparse files in NTFS and MSDN documentation on sparse files in NTFS.

Side-note: It would be interesting to investigate how many consecutive null blocks are required before a noticeable performance boost is achieved.
@andrewchambers file-backed VM images and similar are quite common.
+1 Backing up VM images is my primary use-case for finding a block-level deduplicating tool.
It is important to be able to restore sparse files correctly. I've experienced backup tools not doing that, causing havoc on restore. I don't recall all the details now, but I believe it was related to a backup of /var/log, where /var/log/lastlog was a sparse file of 265GB. Of course lastlog doesn't use all that space in the vast majority of cases. But the restore of /var/log on an 80GB partition caused a nice explosion, as the restore insisted on writing every single 0 in that file.
Does restic support sparse files today?
No, it does not. |
Just some additional information: core dump files are also sparse files (although I am not sure anyone wants to back up/restore these). Being able to restore a sparse file exactly as it was backed up seems mandatory from our point of view.
Description from the GNU tar manual: https://www.gnu.org/software/tar/manual/html_node/sparse.html