
Revisit sparse file creation on Windows #4245

Open
ncw opened this issue May 18, 2020 · 4 comments

Comments

@ncw
Member

ncw commented May 18, 2020

I was assuming that if you are writing at 4 points in the file then you'll be making 4 fragments, but it seems I was wrong about that!

Yeah, not on Windows, unless you're dealing with sparse files. For normal files, Windows just "extends" the initialized portion of the file to that point, which can take forever if you're writing at offset 25GB and none of the file has been initialized (regardless of whether it has been allocated).

I thought I'd share a couple of experiments to illustrate what happens in reality with sparse files. Here's what happens on Windows. Note that you can see the fragmentation via fsutil file queryExtents, but there's also contig -a which is similar.

C:\>fsutil file createNew temp 0 && fsutil sparse setFlag temp 1 && fsutil file setEOF temp 134217728 && (echo.>>temp) && fsutil file setEOF temp 268435456 && (echo.>>temp) && fsutil file queryExtents temp && del temp
File C:\temp is created
File C:\temp eof set
File C:\temp eof set
VCN: 0x0        Clusters: 0x8000     LCN: 0xffffffffffffffff
VCN: 0x8000     Clusters: 0x10       LCN: 0xa4b91e
VCN: 0x8010     Clusters: 0x7ff0     LCN: 0xffffffffffffffff
VCN: 0x10000    Clusters: 0x10       LCN: 0xa4cb52

LCN is the logical cluster number (i.e. the block number relative to the beginning of the volume), and VCN is the virtual cluster number (the block number relative to the beginning of the file).

Notice that there are 0x7ff0 clusters between the two allocated extents virtually, but only 0xa4cb52 - (0xa4b91e + 0x10) = 0x1224 clusters allocated to the file on the volume. This means the file will never end up contiguous. Extents can even end up out-of-order as a result.
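As a sanity check, the cluster arithmetic above can be reproduced in a few lines of Python (extent numbers taken from the fsutil output):

```python
# Extent data from the `fsutil file queryExtents` output above.
lcn_first, run_len = 0xA4B91E, 0x10   # first allocated extent and its length
lcn_second = 0xA4CB52                 # start of the second allocated extent
vcn_gap = 0x7FF0                      # virtual clusters between the extents

on_disk_gap = lcn_second - (lcn_first + run_len)
print(hex(on_disk_gap))               # → 0x1224

# The file needs 0x7ff0 clusters of room between its extents, but only
# 0x1224 clusters of disk separate them, so it can never become contiguous.
print(on_disk_gap < vcn_gap)          # → True
```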

Here's what the equivalent would look like on Linux:

$ fallocate -l 134217728 temp && fallocate -p -l 134217728 temp && (echo "">>temp) && fallocate -o 134217729 -l 134217728 temp && fallocate -p -o 134217729 -l 134217728 temp && (echo "">>temp) && sudo hdparm --fibmap temp && rm -f temp

temp:
 filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
   134217728  195430400  195430407          8
   268435456  196298752  196298759          8

It's a similar story here. There are (196298752 - 195430407) * 512 = 444592640 bytes between the two FS blocks on disk (hdparm reports LBAs in 512-byte sectors, per its own output), but only 268435456 - (134217728 + 1) = 134217727 bytes between them in the file. Again, the file cannot be contiguous.
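The same check in code (hdparm's LBAs are 512-byte sectors, per its own output):

```python
# LBA values from the `hdparm --fibmap` output above, in 512-byte sectors.
SECTOR = 512
end_lba_first = 195430407
begin_lba_second = 196298752

gap_on_disk = (begin_lba_second - end_lba_first) * SECTOR
gap_in_file = 268435456 - (134217728 + 1)
print(gap_on_disk)                # → 444592640
print(gap_in_file)                # → 134217727

# Far more disk separates the extents than file bytes separate them,
# so the extents can never be made contiguous in place.
print(gap_on_disk > gap_in_file)  # → True
```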

So we conclude sparse files have fragmentation problems.

Now, the nice thing about fallocate (at least on ext4) is that it seems to support blocks that are allocated yet uninitialized. And the system will return zeros if you try to read such blocks. I didn't use this feature, because I used "hole punching" to mimic the NTFS behavior. But you are using this feature, so you should be fine on Linux.
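A minimal way to see that behavior from Python (Linux; os.posix_fallocate is a thin wrapper over the same allocation machinery):

```python
import os
import tempfile

# Demo of allocate-without-initialize on ext4 and most modern filesystems:
# posix_fallocate reserves blocks for the whole range up front, and reads
# of the never-written region come back as zeros rather than stale data.
fd, path = tempfile.mkstemp()
try:
    os.posix_fallocate(fd, 0, 1 << 20)     # reserve 1 MiB at offset 0
    print(os.fstat(fd).st_size)            # → 1048576
    data = os.pread(fd, 4096, 512 * 1024)  # read mid-file, never written
    print(data == b"\x00" * 4096)          # → True
finally:
    os.close(fd)
    os.unlink(path)
```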

NTFS doesn't quite support this feature though. What it does support is something Windows calls a "valid data length", which is the length of the file (starting from offset 0) which is assumed to contain initialized data on the disk. On Windows, the "fallocate" method of aria2c directly sets the valid data length (SetFileValidData), which is like setting the file length while taking whatever is on the disk as the file contents.
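A hedged sketch of what calling SetFileValidData might look like from Python via ctypes (Windows-only; the Win32 calls are real, but this exact ordering, and the elevation requirement, follow the comment above and are untested here — on other platforms the function just reports that it skipped):

```python
import ctypes
import sys

def preallocate_with_valid_data(path, size):
    """Set the file length first, then mark the whole range as valid data so
    NTFS skips zero-fill initialization. Requires SeManageVolumePrivilege,
    i.e. an elevated process."""
    if sys.platform != "win32":
        print("skipped: not Windows")
        return False
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateFileW.restype = ctypes.c_void_p
    GENERIC_WRITE, OPEN_ALWAYS, FILE_BEGIN = 0x40000000, 4, 0
    raw = k32.CreateFileW(path, GENERIC_WRITE, 0, None, OPEN_ALWAYS, 0, None)
    if raw is None or raw == ctypes.c_void_p(-1).value:  # INVALID_HANDLE_VALUE
        return False
    h = ctypes.c_void_p(raw)
    try:
        pos = ctypes.c_longlong()
        ok = (k32.SetFilePointerEx(h, ctypes.c_longlong(size),
                                   ctypes.byref(pos), FILE_BEGIN)
              and k32.SetEndOfFile(h)
              # Marks [0, size) as initialized without writing zeros; this
              # can expose stale disk contents, hence the privilege check.
              and k32.SetFileValidData(h, ctypes.c_longlong(size)))
        return bool(ok)
    finally:
        k32.CloseHandle(h)

r = preallocate_with_valid_data("big.tmp", 1 << 20)
print(r)  # False unless run elevated on Windows
```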

What this means for you is that you have the following options:

  • To minimize fragmentation and initialization time, you want to call SetFileValidData, and I think this needs to be after setting FileAllocationInformation rather than before (but I'm not sure). However, this requires SeManageVolumePrivilege, since it can leak underlying disk contents. I think you need to be an administrator and also request this privilege explicitly. The easiest way is to call RtlAdjustPrivilege(28 /*SeManageVolumePrivilege*/, TRUE, FALSE, &wasEnabled) from ntdll.dll at program startup time. If the call succeeds (returns zero), then you know you can call SetFileValidData and don't need to do anything else. If it fails, and the user has passed --file-allocation=falloc, then I suggest aborting: they've probably forgotten to use "Run As Administrator", and you want to let them know. If they haven't specified anything, however, then you probably want to be "smart" and try the other options below.

  • You can just keep going and make the user wait for initialization during download. It's what download programs typically do. It avoids fragmentation as much as possible and it works on all file systems. But the waiting time can be prohibitive for a huge file. If you do this, you can also consider temporarily lowering the I/O priority to IoPriorityLow, and restoring it to IoPriorityNormal before actually downloading the file. Or you can provide a flag to the user to do this manually if they're interested. This hopefully ensures other activity doesn't grind to a halt while the file is getting initialized. But I have not tested this, so I'm not sure.

  • You can use the buffering technique I mentioned earlier (with, say, 64MB or 256MB or even 1GB chunks). This allows for a bit of fragmentation, but with very large fragments, so it's unlikely to be a problem for anybody. I think this is the smartest solution if SetFileValidData fails, but I can't speak to the charge/ban issue. You'll need to figure out (possibly on a case-by-case basis) if this solution makes sense for the given remote. I suspect if your chunks are large (say, ≥ 256MB?) it shouldn't be a problem, but I don't really know.
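One way to realize that buffering idea (the names and structure here are ours, not rclone's): hold out-of-order chunks in memory and only ever write sequentially, so the filesystem never has to initialize a gap.

```python
import io

class SequentialChunkWriter:
    """Buffer out-of-order writes and flush them to the file strictly in
    order, so the OS never has to zero-fill a hole between writes."""
    def __init__(self, f):
        self.f = f
        self.next_offset = 0
        self.pending = {}  # offset -> bytes held until writable in order

    def write_at(self, offset, data):
        self.pending[offset] = data
        # Flush every chunk now contiguous with what's already on disk.
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.f.seek(self.next_offset)
            self.f.write(chunk)
            self.next_offset += len(chunk)

buf = io.BytesIO()                 # stand-in for the real output file
w = SequentialChunkWriter(buf)
w.write_at(4, b"world")            # arrives early: held in memory
w.write_at(0, b"hell")             # fills the gap: both chunks flush in order
print(buf.getvalue())              # → b'hellworld'
```

In a real downloader the chunks would be the 64MB–1GB pieces mentioned above, capping both memory use and fragment count.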

Hope this clarified everything you were asking about!

Originally posted by @mehrdadn in #2469 (comment)

@ncw ncw added this to the v1.53 milestone May 18, 2020
@ncw ncw modified the milestones: v1.53, v1.54 Sep 5, 2020
@ncw ncw modified the milestones: v1.54, v1.55 Feb 3, 2021
@ncw ncw modified the milestones: v1.55, v1.56 Apr 3, 2021
@klunky

klunky commented Apr 22, 2021

I think I'm hitting this problem - high loading times on my disk when rclone is clearing the cache or doing any background activity... are there any tests I can run with a new version?

@klunky

klunky commented Apr 24, 2021

@ncw do you think I can help in any way?

@ncw
Member Author

ncw commented Apr 25, 2021

@klunky - can you post on the rclone forum about this with more details - I think that will be the best way to help you.

@ncw ncw modified the milestones: v1.56, v1.57 Jul 20, 2021
@ncw ncw modified the milestones: v1.57, Soon Nov 1, 2021
@ncw
Member Author

ncw commented Sep 11, 2023

Note that for v1.64 we've implemented the third of the suggestions above:

  • You can use the buffering technique I mentioned earlier (with, say, 64MB or 256MB or even 1GB chunks). This allows for a bit of fragmentation, but with very large fragments, so it's unlikely to be a problem for anybody. I think this is the smartest solution if SetFileValidData fails, but I can't speak to the charge/ban issue. You'll need to figure out (possibly on a case-by-case basis) if this solution makes sense for the given remote. I suspect if your chunks are large (say, ≥ 256MB?) it shouldn't be a problem, but I don't really know.

And I think this should improve the situation a lot on Windows, but I haven't extensively performance tested it.
