
Reduce system call overhead of `copyFile` by increasing the buffer size to 128 KiB #69

Merged
merged 1 commit into haskell:master on Feb 10, 2017

3 participants
@nh2 (Member) commented Feb 8, 2017

128 KiB is what coreutils `cp` uses, based on a benchmark provided here:
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ioblksize.h?id=c0a79542fb5c2c22cf0a250db94af6f8581ca342#n23

The benchmarks report speedups of 5x to 30x depending on the hardware.

Commit: Reduce system call overhead of `copyFile` by increasing the buffer size to 128 KiB

@nh2 (Member) commented Feb 8, 2017

I'd still like somebody to independently verify the gains.

@nh2 (Member) commented Feb 8, 2017

Also, there's something odd going on with write() syscalls and Handle buffers:

write(3, "", 0)                         = 0 <0.000004>
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000050>
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000023>
write(3, "", 0)                         = 0 <0.000005>
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000050>
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000022>

This happens as soon as you set bufferSize > 8095.

With 8095 or smaller, it looks right:

read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8096) = 8096 <0.000005>
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8095) = 8095 <0.000011>
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8096) = 8096 <0.000005>
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8095) = 8095 <0.000009>

@Rufflewind (Member) commented Feb 8, 2017

> Also, there's something odd going on with write() syscalls and Handle buffers:

Have you tried disabling buffering, perhaps?
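
Concretely, the suggestion amounts to something like the following (a minimal sketch using System.IO; the file name is just a placeholder):

import System.IO

-- Turn off the Handle's own buffer before bulk writes, so data passes
-- straight to the fd instead of going through the default handle buffer.
main :: IO ()
main =
  withBinaryFile "testfile2" WriteMode $ \h -> do
    hSetBuffering h NoBuffering
    -- alternatives: BlockBuffering Nothing, BlockBuffering (Just 131072)
    hPutStr h "payload"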

@erikd (Member) commented Feb 8, 2017

> Have you tried disabling buffering, perhaps?

I can't seem to figure out whether hGetBuf uses FILE* streams or raw file descriptors underneath. The former are buffered and the latter are not.

@nh2 (Member) commented Feb 8, 2017

> Have you tried disabling buffering, perhaps?

@Rufflewind Yes, I had, to no avail; I tried NoBuffering, BlockBuffering Nothing, and BlockBuffering (Just 100000) -- no difference.

@nh2 (Member) commented Feb 8, 2017

What I find especially disturbing is that the write("") comes before the proper write.

@nh2 (Member) commented Feb 8, 2017

This leads me to suspect that it's happening in wantWritableHandle / checkWritableHandle.

@nh2 (Member) commented Feb 8, 2017

Minimal example:

import Foreign.Marshal.Alloc (allocaBytes)
import System.IO

main :: IO ()
main =
  withBinaryFile "testfile2" WriteMode $ \hTo -> do
    let bufferSize = 8096 -- the write("") is gone when we use 8095 here
    allocaBytes bufferSize $ \buffer ->
      hPutBuf hTo buffer bufferSize

In strace -f (the evil write("") set off by blank lines for emphasis):

open("testfile2", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 <0.000007>
fstat(3, {st_mode=S_IFREG|0664, st_size=8096, ...}) = 0 <0.000005>
ftruncate(3, 0)                         = 0 <0.000027>
ioctl(3, TCGETS, 0x7fff86ea7f60)        = -1 ENOTTY (Inappropriate ioctl for device) <0.000006>

write(3, "", 0)                         = 0 <0.000004>

write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8096) = 8096 <0.000017>
close(3)                                = 0 <0.000030>

@nh2 (Member) commented Feb 8, 2017

OK, making some progress here. The 8096 comes from https://hackage.haskell.org/package/base-4.9.1.0/docs/src/GHC.IO.FD.html#dEFAULT_FD_BUFFER_SIZE:

instance BufferedIO FD where
  newBuffer _dev state = newByteBuffer dEFAULT_FD_BUFFER_SIZE state

-- We used to use System.Posix.Internals.dEFAULT_BUFFER_SIZE, which is
-- taken from the value of BUFSIZ on the current platform.  This value
-- varies too much though: it is 512 on Windows, 1024 on OS X and 8192
-- on Linux.  So let's just use a decent size on every platform:
dEFAULT_FD_BUFFER_SIZE :: Int
dEFAULT_FD_BUFFER_SIZE = 8096

Does anybody know why 8096 was chosen? It's none of the numbers mentioned in that comment, and it's not a power of 2.

@nh2 (Member) commented Feb 8, 2017

OK, it's all clear now: it is indeed a typo in the number. I will post an update shortly.

The write("")s come from bufWrite, where a > should be a >= in if (size - w > count).

There's also an unrelated -- TODO: we should do a non-blocking flush here in there that should probably be fixed so as not to violate the non-blocking guarantees of hPutBuf'.
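
To make the off-by-one concrete, here is a paraphrased sketch of the branch in question (variable roles follow base's bufWrite in GHC.IO.Handle.Text, but this is an illustration, not the verbatim source):

-- size  : capacity of the Handle's write buffer (8096 by default)
-- w     : bytes already queued in that buffer
-- count : bytes the caller passed to hPutBuf
decide :: Int -> Int -> Int -> String
decide size w count
  | size - w > count -- nh2's point: this should be >=
      = "copy the data into the Handle buffer"
  | otherwise
      -- A request that exactly fills the remaining space falls through
      -- to here; flushing a still-empty buffer first is what shows up
      -- as write(fd, "", 0) in strace.
      = "flush the buffer (even when empty), then write the chunk directly"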

@nh2 (Member) commented Feb 9, 2017

OK, I've written this up as GHC bugs:

and filed a third, slightly related one.

@nh2 (Member) commented Feb 9, 2017

This PR can now return to its original purpose: increasing the copyFile buffer size to 128 KiB.

@nh2 (Member) commented Feb 9, 2017

I've done some quick time-based benchmarking with buffer sizes between 1024 and 131072 bytes.

Preparing a 1 GB file:

dd if=/dev/zero of=testfile bs=1M count=1000

Code:

main = copyFile "testfile" "testfile2"

(This ran on a version of base that has the bug "hPutBuf issues unnecessary empty write syscalls for large writes" fixed -- to benchmark it without having to recompile base, I used this file to generate the measurements below.)

Because of the 8096 handle-buffer bug, I give times for both 1024 and 8096/8 = 1012 (the latter will be more efficient until that bug is fixed, for bufferSize < 8096; for larger buffer sizes I don't give multiples of 1012, because it makes no difference there as the Handle buffer is not used in that case).

Running with GHC 8.0.2 on Stackage nightly-2017-01-25, using `ghc --make -O copytest.hs && time ./copytest`.

bufferSize = 1012:

0.18user 0.87system 0:02.69elapsed 39%CPU (0avgtext+0avgdata 4352maxresident)k
0inputs+2048000outputs (0major+314minor)pagefaults 0swaps

0.16user 0.92system 0:02.62elapsed 41%CPU (0avgtext+0avgdata 4572maxresident)k
0inputs+2048000outputs (0major+320minor)pagefaults 0swaps

0.20user 0.86system 0:02.80elapsed 38%CPU (0avgtext+0avgdata 4360maxresident)k
0inputs+2048000outputs (0major+315minor)pagefaults 0swaps

0.19user 0.85system 0:02.61elapsed 40%CPU (0avgtext+0avgdata 4532maxresident)k
0inputs+2048000outputs (0major+319minor)pagefaults 0swaps

0.18user 0.98system 0:02.65elapsed 43%CPU (0avgtext+0avgdata 4472maxresident)k
0inputs+2048000outputs (0major+317minor)pagefaults 0swaps

bufferSize = 1024:

0.20user 0.86system 0:02.82elapsed 37%CPU (0avgtext+0avgdata 4420maxresident)k
0inputs+2048000outputs (0major+314minor)pagefaults 0swaps

0.22user 0.73system 0:02.68elapsed 35%CPU (0avgtext+0avgdata 4468maxresident)k
0inputs+2048000outputs (0major+320minor)pagefaults 0swaps

0.19user 0.88system 0:02.67elapsed 40%CPU (0avgtext+0avgdata 4576maxresident)k
0inputs+2048000outputs (0major+318minor)pagefaults 0swaps

0.17user 0.78system 0:02.60elapsed 36%CPU (0avgtext+0avgdata 4496maxresident)k
0inputs+2048000outputs (0major+318minor)pagefaults 0swaps

0.20user 0.79system 0:02.64elapsed 37%CPU (0avgtext+0avgdata 4400maxresident)k
0inputs+2048000outputs (0major+318minor)pagefaults 0swaps

bufferSize = 1024 * 8:

0.09user 0.83system 0:02.50elapsed 37%CPU (0avgtext+0avgdata 4396maxresident)k
0inputs+2048000outputs (0major+315minor)pagefaults 0swaps

0.08user 0.84system 0:02.97elapsed 31%CPU (0avgtext+0avgdata 4508maxresident)k
0inputs+2048000outputs (0major+320minor)pagefaults 0swaps

0.12user 0.78system 0:02.64elapsed 34%CPU (0avgtext+0avgdata 4408maxresident)k
0inputs+2048000outputs (0major+319minor)pagefaults 0swaps

0.12user 0.78system 0:02.44elapsed 36%CPU (0avgtext+0avgdata 4504maxresident)k
0inputs+2048000outputs (0major+321minor)pagefaults 0swaps

0.10user 0.77system 0:02.47elapsed 35%CPU (0avgtext+0avgdata 4544maxresident)k
0inputs+2048000outputs (0major+321minor)pagefaults 0swaps

bufferSize = 1024 * 128:

0.00user 0.78system 0:02.33elapsed 33%CPU (0avgtext+0avgdata 4656maxresident)k
0inputs+2048000outputs (0major+349minor)pagefaults 0swaps

0.00user 0.82system 0:02.33elapsed 35%CPU (0avgtext+0avgdata 4528maxresident)k
0inputs+2048000outputs (0major+349minor)pagefaults 0swaps

0.00user 0.76system 0:02.36elapsed 32%CPU (0avgtext+0avgdata 4524maxresident)k
0inputs+2048000outputs (0major+348minor)pagefaults 0swaps

0.01user 0.74system 0:02.33elapsed 32%CPU (0avgtext+0avgdata 4548maxresident)k
0inputs+2048000outputs (0major+345minor)pagefaults 0swaps

0.00user 0.71system 0:02.27elapsed 31%CPU (0avgtext+0avgdata 4520maxresident)k
0inputs+2048000outputs (0major+346minor)pagefaults 0swaps

As you can see, user CPU time went from ~0.20 seconds to ~0.01 seconds, and system CPU time also improved slightly.

These measurements support the conclusion that bufferSize = 1024 * 128 indeed improves performance.


@nh2 (Member) commented Feb 9, 2017

I should also mention that this change makes copyFile as fast as cp on a ramdisk, whereas before it was twice as slow:

Before: 0.20user 0.38system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 4472maxresident)k
After:  0.00user 0.35system 0:00.35elapsed 99%CPU (0avgtext+0avgdata 4708maxresident)k
cp:     0.00user 0.33system 0:00.33elapsed 99%CPU (0avgtext+0avgdata 2568maxresident)k

The only thing that's faster is sendfile():

  • sendfile in Haskell: 0.00user 0.28system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 3528maxresident)k
  • probably even slightly faster in C
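
For reference, sendfile(2) can be reached from Haskell with a small FFI binding. A rough Linux-only sketch under stated assumptions: copyViaSendfile is a hypothetical helper written for illustration, not the code benchmarked above, and it uses the unix package's System.Posix.IO API as of 2017.

{-# LANGUAGE ForeignFunctionInterface #-}

import Control.Monad (when)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Ptr (Ptr, nullPtr)
import System.Posix.Files (fileSize, getFileStatus)
import System.Posix.IO (OpenMode (ReadOnly), closeFd, createFile, defaultFileFlags, openFd)
import System.Posix.Types (COff (..), CSsize (..), Fd (..))

-- ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
foreign import ccall unsafe "sendfile"
  c_sendfile :: CInt -> CInt -> Ptr COff -> CSize -> IO CSsize

copyViaSendfile :: FilePath -> FilePath -> IO ()
copyViaSendfile from to = do
  size <- fromIntegral . fileSize <$> getFileStatus from
  inFd@(Fd cIn) <- openFd from ReadOnly Nothing defaultFileFlags
  outFd@(Fd cOut) <- createFile to 0o644
  -- sendfile may copy fewer bytes than asked for, so loop until done.
  -- With a NULL offset pointer it advances the in_fd file offset itself.
  let loop sent = when (sent < size) $ do
        n <- c_sendfile cOut cIn nullPtr (fromIntegral (size - sent))
        when (n <= 0) $ ioError (userError "sendfile failed")
        loop (sent + fromIntegral n)
  loop (0 :: Integer)
  closeFd inFd
  closeFd outFd
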
@nh2 (Member) commented Feb 9, 2017

To answer the other question:

> I can't seem to figure out whether hGetBuf uses FILE* streams or raw file descriptors underneath.

@erikd It doesn't use FILE*. Handles are inspired by libc's standard I/O streams, but they don't use them; instead, they implement their own similar buffering logic directly on top of file descriptors.
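
To illustrate the layering (a hedged sketch; the file name is a placeholder and the use of the unix package's System.Posix.IO is an assumption, not anything from this PR):

import System.Posix.IO (OpenMode (WriteOnly), closeFd, defaultFileFlags, fdWrite, openFd)

-- A raw Fd is unbuffered: every fdWrite is one write(2) syscall.
-- A Handle (System.IO) layers its own byte buffer on top of an Fd,
-- in Haskell, without going through libc's FILE*.
main :: IO ()
main = do
  fd <- openFd "testfile3" WriteOnly (Just 0o644) defaultFileFlags
  _ <- fdWrite fd "one syscall per call" -- no user-space buffering
  closeFd fd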

@Rufflewind (Member) commented Feb 10, 2017

> I'd still like somebody to independently verify the gains.

Tested this on tmpfs (ramdisk) and SSD on my Linux machine, and there was a speedup of ~60%. The results fluctuate a lot on my machine. Generally I expect it to differ from OS to OS, depending on how well the OS caches the I/O.

In any case, I don't doubt that this will be a speed increase in the majority of cases, given how large I/O caches have become these days. Will merge shortly.

> Preparing a 1 GB file:

It's probably better to use /dev/urandom rather than /dev/zero, to avoid predictability.

> The only thing that's faster is sendfile():

In the future, it'd be nice to use sendfile on platforms that support it.

Rufflewind added a commit to Rufflewind/directory that referenced this pull request Feb 10, 2017

@Rufflewind Rufflewind merged commit 51a0f72 into haskell:master Feb 10, 2017

2 checks passed:

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

@Rufflewind (Member) commented Feb 10, 2017

Thanks for the PR and the benchmarks!

nh2 added a commit to nh2/copier that referenced this pull request Feb 14, 2017
