Massive slowdown in buffered IO performance from incorrectly reported fstat st_blksize #3898
Just rebooted my Linux workstation to Windows (19H1 insider updates) and it does correctly report the block size. Interested in what block size others see from stat /
Running 18845 (20H1), stat reports 512.
@Gabrielcarvfer - You mention fstat is being called, but I don't see what the file descriptor is. LxFs? DrvFs? ProcFs?
Looking at this a bit, it looks like we will fall back to 512 for LxFs and DrvFs if FileFsSectorSizeInformation is not supported by the device. @SvenGroot - Do you know why 512 was chosen instead of 4096?
@benhillis - @f948lan said that; I've asked you guys to take a look at it. :) Checking both the glibc 2.28 code and the strace, I can confirm that it is working properly on 18845, at least for normal files. Writing to the terminal is more complicated, and the code pointed to by @f948lan is related to that; it was supposed to adjust the buffer size to the tty buffer size. The fstat called is the following:

The weird part comes after that, but I'm pretty sure it is a glibc problem. The write calls are being split somewhere, respecting neither the st_blksize (512 bytes), nor the 4096-byte fs block size, nor the 8192-byte BUFSIZ constant. Whatever size of string I try to write to the terminal (bigger than 512 bytes) gets split into a (size-512)-byte write and another 512-byte write. As an example, for a 2048-byte string, I got the following:

Even worse, if I call a flush or append a std::endl, another 512-byte write is issued, which is pretty weird, as writes to std::cout were supposed to always be flushed, so those could be ignored.
Sorry, should have mentioned that.
Yes, I saw this too (but to file). It seems to be a side effect of the libc implementation. I didn't explore in detail, but it always happens when fwrite is called with more than bufsize bytes: it seems to fill the buffer, write that (512 bytes), then write the rest of the data as a second write call (except for the very first fwrite on a stream, which gets written as a single write).
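A minimal sketch of a repro for this split, assuming a 512-byte stdio buffer and fully buffered output (e.g. redirected to a file, matching the "but to file" observation above); this is a hypothetical test program, not the original poster's:

```c
/* Hypothetical repro sketch: with st_blksize = 512, stdio gives the
   stream a 512-byte buffer, and large fwrites get split into a
   buffer-sized write plus a second write for the rest. */
#include <stdio.h>
#include <string.h>

int main (void)
{
  char buf[2048];
  memset (buf, 'x', sizeof buf);

  /* Per the comment above, the very first fwrite on a stream may go
     out as a single write, so issue one to get the buffer in play. */
  fwrite (buf, 1, sizeof buf, stdout);

  /* This one should show the split: the 512-byte buffer is filled
     and flushed (write of 512), then the remaining 1536 bytes go
     out in a separate write. */
  fwrite (buf, 1, sizeof buf, stdout);

  fflush (stdout);
  return 0;
}
```

Running it as strace -e trace=write ./a.out > /tmp/out should make the individual write sizes visible.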
Hold on. Look at the stack: _IO_file_xsputn points to _IO_new_file_xsputn in https://github.com/lattera/glibc/blob/master/libio/fileops.c. Edit:
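For context, here is a simplified paraphrase of the splitting logic in glibc's _IO_new_file_xsputn (libio/fileops.c); line-buffering and the handling of the final sub-block remainder are omitted, and the exact code differs between glibc versions:

```c
/* Simplified paraphrase of glibc's _IO_new_file_xsputn (libio/fileops.c),
   the function where large fwrite/ostream writes get split around the
   stdio buffer. Not the verbatim source. */
size_t
_IO_new_file_xsputn (FILE *f, const void *data, size_t n)
{
  const char *s = data;
  size_t to_do = n;

  /* 1. Copy as much as fits into the remaining buffer space. */
  size_t count = 0;
  if (f->_IO_write_end > f->_IO_write_ptr)
    count = f->_IO_write_end - f->_IO_write_ptr;
  if (count > to_do)
    count = to_do;
  memcpy (f->_IO_write_ptr, s, count);
  f->_IO_write_ptr += count;
  s += count;
  to_do -= count;

  if (to_do > 0)
    {
      /* 2. Flush the (now full) buffer; with a 512-byte buffer this
         is the write(..., 512) seen in the trace. */
      if (_IO_OVERFLOW (f, EOF) == EOF)
        return n - to_do;

      /* 3. Write whole buffer-sized blocks directly, bypassing the
         buffer; any sub-block tail stays buffered. */
      size_t block_size = f->_IO_buf_end - f->_IO_buf_base;
      size_t do_write = to_do - (block_size >= 128 ? to_do % block_size : 0);
      if (do_write)
        to_do -= new_do_write (f, s, do_write);
    }

  return n - to_do;
}
```

This matches the observed pattern: with a 512-byte buffer, a 2048-byte write becomes a 512-byte flush followed by a 1536-byte direct write.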
This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request. Thank you!
Your Windows build number:
What you're doing and what's happening:
Investigating the cause of image processing routines running much slower in WSL than in a VM on the same host.
Investigating glibc, the buffer size is set by _IO_file_doallocate, specifically in the code sketched below, where st.st_blksize is the result of an fstat system call and is reported as 512.
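For reference, a paraphrase of the relevant code in glibc's libio/filedoalloc.c (this sketch condenses it, and details vary across glibc versions):

```c
/* Paraphrase of glibc's _IO_file_doallocate (libio/filedoalloc.c),
   which allocates a FILE's buffer on first use. */
static int
_IO_file_doallocate (FILE *fp)
{
  size_t size = BUFSIZ;               /* default: 8192 bytes */
  struct stat64 st;

  if (fp->_fileno >= 0 && _IO_SYSSTAT (fp, &st) >= 0)
    {
      if (S_ISCHR (st.st_mode) && local_isatty (fp->_fileno))
        fp->_flags |= _IO_LINE_BUF;   /* ttys get line buffering */
#if defined _STATBUF_ST_BLKSIZE
      if (st.st_blksize > 0)
        size = st.st_blksize;         /* 512 under WSL -> tiny buffer */
#endif
    }

  char *p = malloc (size);
  if (p == NULL)
    return EOF;
  _IO_setb (fp, p, p + size, 1);      /* install the buffer */
  return 1;
}
```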
The same can be seen much more easily just by running stat / and looking at the IO Block value it reports.
What's wrong / what should be happening instead:
The underlying filesystem is using 4096-byte blocks, as confirmed by PowerShell:
So fstat should return st_blksize as 4096.
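A minimal sketch for checking this directly (a hypothetical helper, not from the original report):

```c
/* Hypothetical check program: print the st_blksize that fstat
   reports for a path. Expected 4096 on a 4096-byte-cluster
   filesystem; WSL reports 512 here. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main (int argc, char **argv)
{
  const char *path = argc > 1 ? argv[1] : "/";

  int fd = open (path, O_RDONLY);
  if (fd < 0) { perror ("open"); return 1; }

  struct stat st;
  if (fstat (fd, &st) < 0) { perror ("fstat"); return 1; }

  printf ("%s: st_blksize = %ld\n", path, (long) st.st_blksize);
  close (fd);
  return 0;
}
```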
Effects
As noted above, glibc relies on this value to set the default buffer size for buffered IO, which is used by many applications. In some cases this can result in 8x more system calls (4096 / 512) for read and write operations, which carry a heavy penalty under WSL.
To prove the impact, I rebuilt glibc with the code block above commented out, which makes the buffer fall back to the default value BUFSIZ (8192).
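The change was along these lines (a sketch against the paraphrase above; the exact lines depend on the glibc version):

```c
/* Sketch of the experiment: in _IO_file_doallocate, disable the
   st_blksize override so that size keeps its BUFSIZ (8192) default.
   Not the exact patch from the report. */
#if defined _STATBUF_ST_BLKSIZE
      /* if (st.st_blksize > 0)
           size = st.st_blksize; */
#endif
```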
In one particular example, rasterising an image with Ghostscript takes 7 minutes in default WSL, but this drops to 3m30s with the patched glibc!
Given the ubiquitous use of libc buffered IO, it seems likely that many things are affected by this.
I've seen it widely noted that WSL currently suffers on IO performance, and slow system calls seem to be a key reason for that. Clearly this is compounded when 8x more calls are generated than necessary.