
Support fast space allocation? #45

Open

Lekensteyn opened this issue Aug 19, 2016 · 7 comments

Comments

@Lekensteyn

Lekensteyn commented Aug 19, 2016

I recently bought an IODD-2541 USB disk enclosure, which can expose files as virtual disk devices. It supports FAT32/exFAT and NTFS, but since NTFS sounds like overkill and is less well supported on Linux, I decided to use exFAT.

Due to the design, files must be contiguously allocated. When I ran truncate -s 20G disk.vhd, it took a while before all bytes were written to the SSD. This can probably be explained by the requirement to initialize the extended part of the file with zeroes (\0).

Proposal: a method to request allocation of some sectors without writing zeroes to the new blocks.

I am fully aware of the security issues of exposing possibly deleted data, but am willing to risk that for creating "empty" disk images. The underlying SSD does not hold confidential data, and the test images are just that: test images.

Possible mechanism 1:
The Linux-specific fallocate(2) function combined with the FALLOC_FL_NO_HIDE_STALE mode could probably be used here. It was originally proposed in April 2012 (see https://lwn.net/Articles/492959/, https://lwn.net/Articles/492920/) and is apparently still in use via out-of-tree patches in production, according to the ext4 maintainer, writing in September 2015:

However, this patch is in
active use in practically every single data center kernel for Google,
and it's in use in at least one other very large publically traded
company that uses cluster file systems such as Hadoopfs. And if
someone wants a copy of the FALLOC_FL_NO_HIDE_STALE patch for ext4,
I'm happy to give it to them.
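
For illustration, here is a minimal sketch of how a userspace program might use this, assuming the out-of-tree patch were applied. The flag value below is taken from the 2012 LWN proposal and is not in mainline kernel headers, so the #define is an assumption:

/* Sketch only: requires the out-of-tree FALLOC_FL_NO_HIDE_STALE patch.
 * The flag value 0x04 matches the 2012 proposal; it is NOT defined in
 * mainline kernel headers. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef FALLOC_FL_NO_HIDE_STALE
#define FALLOC_FL_NO_HIDE_STALE 0x04    /* value from the 2012 patch */
#endif

int main(void)
{
    int fd = open("disk.vhd", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Allocate 20 GiB without zeroing the new blocks; the file may
     * then expose stale data from previously freed clusters. */
    if (fallocate(fd, FALLOC_FL_NO_HIDE_STALE, 0, 20LL << 30) != 0)
        perror("fallocate");
    close(fd);
    return 0;
}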

Unfortunately the FUSE layer rejects such flags, so more work would be needed:

static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
                loff_t length)
{
    ...
    if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
        return -EOPNOTSUPP;

Possible mechanism 2:
Introduce an ioctl that could preallocate some blocks (restricting it to callers with CAP_SYS_RAWIO).
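
Purely hypothetical sketch of what such an interface might look like; neither the ioctl number nor the struct exists in this project or in the kernel, they only illustrate the idea:

/* Hypothetical only: EXFAT_IOC_PREALLOC and struct exfat_prealloc
 * do not exist anywhere; this is an illustration of mechanism 2. */
#include <linux/ioctl.h>
#include <linux/types.h>

struct exfat_prealloc {
    __u64 offset;   /* byte offset at which allocation starts */
    __u64 length;   /* number of bytes to allocate, left unzeroed */
};

#define EXFAT_IOC_PREALLOC _IOW('E', 1, struct exfat_prealloc)

/* The handler would gate the operation on the caller's privileges:
 *
 *     if (!capable(CAP_SYS_RAWIO))
 *         return -EPERM;
 */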

Possible mechanism 3:
Introduce a separate utility that can allocate such a file on an offline (unmounted) image.

Hopefully I have shown enough research and made the intent clear. Until such a method exists, I have to waste some SSD write cycles and wait somewhat longer.

@relan
Owner

relan commented Aug 23, 2016

Proposal: a method to request allocation of some sectors without writing zeroes to the new blocks.

Well, it looks like the infrastructure (kernel, FUSE and programs) just isn't ready for such a feature.

Possible mechanism 4:
Implement the FUSE 2.9.1 fallocate operation and add an optional mount parameter that disables data zeroing for this particular call (sketched below). Some programs may break, though.
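
A rough sketch of mechanism 4 against the high-level FUSE 2.9 API; the "nozero" mount option and the allocation step are made up for illustration, not part of fuse-exfat:

/* Sketch of mechanism 4, assuming the high-level FUSE 2.9 API.
 * The "nozero" mount option and the allocation logic are hypothetical. */
#define FUSE_USE_VERSION 29
#include <fuse.h>
#include <errno.h>

static int fuse_exfat_fallocate(const char *path, int mode,
        off_t offset, off_t length, struct fuse_file_info *fi)
{
    (void) path;
    (void) fi;
    if (mode != 0)              /* only plain preallocation */
        return -EOPNOTSUPP;
    /* Real implementation: extend the cluster chain to cover
     * offset + length; with "-o nozero" skip writing zeroes,
     * leaving stale data readable. */
    return -ENOSYS;             /* placeholder for that logic */
}

static struct fuse_operations fuse_exfat_ops = {
    .fallocate = fuse_exfat_fallocate,
    /* ... the existing operations ... */
};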

Possible mechanism 5:
Zero data using the discard (trim) command when the device supports it (and when discard actually sets flash memory blocks to 0). Hopefully that will be faster.
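
A sketch of mechanism 5, assuming the file system sits on a raw block device opened as dev_fd. BLKDISCARD and BLKDISCARDZEROES are real Linux ioctls, but only devices that report discard-zeroes-data make this safe:

/* Sketch of mechanism 5: "zero" a range via discard, but only when
 * the device guarantees that discarded blocks read back as zeroes. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

static int discard_zero_range(int dev_fd, uint64_t offset, uint64_t length)
{
    unsigned int zeroes = 0;
    uint64_t range[2] = { offset, length };

    if (ioctl(dev_fd, BLKDISCARDZEROES, &zeroes) != 0 || !zeroes)
        return -1;  /* discard does not reliably zero on this device */
    return ioctl(dev_fd, BLKDISCARD, range);
}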

Until such a method exists, I have to waste some SSD write cycles and wait somewhat longer.

If the SSD controller supports data compression, it hardly writes anything to flash in this case.

@Lekensteyn
Author

Option (4) would probably be the easiest to add and use, but it violates the interface contract and may break programs.

Option (5) is better than nothing, but I am not sure whether it works through the USB disk enclosure.
SSDs with data compression solve the problem on the wrong layer; I believe only (some?) SandForce SSDs do this hack.

@moneytoo

moneytoo commented Nov 2, 2016

I intended to run a Samba share off an exFAT-formatted external drive, but this issue prevents me from doing that.

After I connect to the Samba share from a Windows client and attempt to copy a large file there (onto the exFAT drive), it starts by truncating the file first. Because the truncate is slow (it goes over USB 2), the Windows client times out (and reports an error) if the actual transfer doesn't start within 20 seconds.
Transferring the file from macOS is fine. A Windows-hosted share on exFAT also works fine, which I don't get: is it an SMB version thing or the truncate implementation in Windows?

https://bugzilla.samba.org/show_bug.cgi?id=3583
http://www.gossamer-threads.com/lists/linux/kernel/683607

@relan
Owner

relan commented Nov 3, 2016

is it an SMB version thing or the truncate implementation in Windows?

It could be both.

A possible solution that comes to my mind:

  1. On truncate, set size to the desired value and leave valid_size intact. Do not initialize the allocated blocks.
  2. On a write beyond valid_size, adjust it accordingly, initializing the blocks between the old and new valid_size if they were not overwritten.
  3. On read, return zeros for blocks beyond valid_size.

The FS should remain in a consistent state throughout these operations while avoiding extra initialization of blocks that will soon be overwritten anyway (see the sketch below). But this would be quite a complex change.
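
A minimal sketch of steps 1 and 3, using a hypothetical in-memory node carrying both fields (exFAT stores a data length and a valid data length per file); this is not fuse-exfat's actual structure:

/* Sketch of the scheme above; struct node is hypothetical. */
#include <stdint.h>
#include <string.h>

struct node {
    uint64_t size;          /* data_length in the directory entry */
    uint64_t valid_size;    /* valid_data_length: bytes really written */
};

/* Step 1: truncate grows size but leaves valid_size alone,
 * so no blocks need to be initialized. */
static void truncate_fast(struct node *n, uint64_t new_size)
{
    n->size = new_size;
    if (n->valid_size > new_size)
        n->valid_size = new_size;   /* shrinking keeps the invariant */
}

/* Step 3: anything read beyond valid_size comes back as zeroes. */
static void mask_tail(const struct node *n, char *buf,
        uint64_t offset, size_t len)
{
    uint64_t valid = n->valid_size > offset ? n->valid_size - offset : 0;
    if (valid < len)
        memset(buf + valid, 0, len - valid);
}

Step 2 would do the symmetric thing on writes: zero-fill any gap between the old valid_size and the write offset, then advance valid_size past the written range.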

@JsBergbau

Is there anything new on this topic? The problem still exists and makes exFAT unusable as the underlying file system.

@JsBergbau

bump...
Problem still exists :(

@josephernest

josephernest commented Dec 7, 2020

Same problem here!

After I connect to the Samba share from a Windows client and attempt to copy a large file there (onto the exFAT drive), it starts by truncating the file first. Because the truncate is slow (it goes over USB 2), the Windows client times out (and reports an error) if the actual transfer doesn't start within 20 seconds.

Tested with both exfat-fuse and Linux Kernel 5.4's exfat (non-fuse).

So when sending 2 GB from Windows Explorer to a Linux+Samba+exFAT computer, 4 GB are actually written:

  • 2 GB of null bytes for the initial truncate
  • 2 GB for the actual data

It doubles the transfer time.

Anyone have an idea?
