The current behaviour is that sinkHandle calls hFlush after every chunk is written. This can easily lead to a performance bottleneck (see the benchmark below). In the case of using sinkFileCautious, this is particularly galling as there is no need to flush every intermediate chunk.
Removing the hFlush call results in a 4x faster process on my machine. Benchmark (create a file called input.txt before running it):
import qualified Data.ByteString as B
import qualified Data.Conduit.Combinators as CC
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import qualified Data.Conduit as C
import Data.Conduit ((.|))
import Data.Maybe (maybe)
import Data.Time
import Control.Monad.IO.Class
import System.IO (hClose, openBinaryTempFile)
import qualified System.IO as IO

-- Same as sinkHandle, but without the hFlush after every chunk.
fastSinkHandle h = C.awaitForever (liftIO . B.hPut h)

-- Acquire the handle, run the sink, and close the handle when done.
fastSinkIOHandle alloc = C.bracketP alloc IO.hClose fastSinkHandle

fastSinkFile fp = fastSinkIOHandle (IO.openBinaryFile fp IO.WriteMode)

main = do
    start <- getCurrentTime
    -- Baseline: the library's sinkFile, which flushes after each chunk.
    C.runConduitRes $
        CB.sourceFile "input.txt"
            .| CB.lines
            .| CC.unlinesAscii
            .| CB.sinkFile "output.txt"
    mid <- getCurrentTime
    -- Same pipeline, writing through the non-flushing sink.
    C.runConduitRes $
        CB.sourceFile "input.txt"
            .| CB.lines
            .| CC.unlinesAscii
            .| fastSinkFile "output.txt"
    end <- getCurrentTime
    putStrLn
        ("Data.Conduit.sinkFile: "++(show $ diffUTCTime mid start)++"\n"++
         "fastSinkFile: "++(show $ diffUTCTime end mid)++"\n")
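If a caller does want the data pushed out of the Haskell-side buffer while the handle stays open, one option is to flush once after the last chunk instead of after every chunk. The following is only a sketch of that idea, not the library's implementation: `sinkHandleFlushAtEnd` and `roundTrip` are hypothetical names, and the type signatures assume the `ConduitT` name from conduit 1.3.

```haskell
import           Control.Monad.IO.Class (MonadIO, liftIO)
import qualified Data.ByteString        as B
import           Data.Conduit           (ConduitT, awaitForever, runConduit, (.|))
import qualified Data.Conduit.List      as CL
import           System.Directory       (getTemporaryDirectory, removeFile)
import qualified System.IO              as IO

-- | Hypothetical helper (not part of conduit): write each chunk as it
-- arrives and call hFlush exactly once, after upstream is exhausted.
-- hClose also flushes, so the final hFlush only matters when the caller
-- keeps the handle open afterwards.
sinkHandleFlushAtEnd :: MonadIO m => IO.Handle -> ConduitT B.ByteString o m ()
sinkHandleFlushAtEnd h = do
  awaitForever (liftIO . B.hPut h)
  liftIO (IO.hFlush h)

-- | Hypothetical helper for trying the sink out: stream the chunks into
-- a temporary file, read the file back, and delete it.
roundTrip :: [B.ByteString] -> IO B.ByteString
roundTrip chunks = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- IO.openBinaryTempFile tmpDir "flush-at-end.bin"
  runConduit (CL.sourceList chunks .| sinkHandleFlushAtEnd h)
  IO.hClose h
  contents <- B.readFile path
  removeFile path
  pure contents
```

Calling `roundTrip` with a list of chunks should return their concatenation, which is a quick way to check that dropping the per-chunk flush does not lose data.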
I tend to agree. The change was made in b37505f. Someone (me) stupidly made the change without explaining why it was done. I think it's OK to undo it, PR welcome (but please note the change in the changelog).
luispedro added a commit to luispedro/conduit that referenced this issue on Sep 4, 2017:
This bug fix in particular should make ngless faster:
snoyberg/conduit#322
as it was originally discovered while debugging bad performance in some
ngless workflows.