New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`Data.Conduit.Binary.sinkHandle` should not flush after every chunk #322

Closed
luispedro opened this Issue Sep 4, 2017 · 1 comment

Comments

Projects
None yet
2 participants
@luispedro
Contributor

luispedro commented Sep 4, 2017

The current behaviour is that sinkHandle calls hFlush after every chunk is written. This can easily lead to a performance bottleneck (see the benchmark below). In the case of using sinkFileCautious, this is particularly galling as there is no need to flush every intermediate chunk.

Removing the hFlush call results in a 4x faster process on my machine. Benchmark (create a file called input.txt before running it):

import qualified Data.ByteString as B

import qualified Data.Conduit.Combinators as CC
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import qualified Data.Conduit as C
import           Data.Conduit ((.|))
import Data.Maybe (maybe)
import Data.Time
import Control.Monad.IO.Class 

import System.IO (hClose, openBinaryTempFile)
import qualified System.IO as IO


fastSinkHandle h = C.awaitForever (liftIO . B.hPut h)
fastSinkIOHandle alloc = C.bracketP alloc IO.hClose fastSinkHandle
fastSinkFile fp = fastSinkIOHandle (IO.openBinaryFile fp IO.WriteMode)

main = do
    start <- getCurrentTime
    C.runConduitRes $
        CB.sourceFile "input.txt"
            .| CB.lines
            .| CC.unlinesAscii
            .| CB.sinkFile "output.txt"
    mid <- getCurrentTime
    C.runConduitRes $
        CB.sourceFile "input.txt"
            .| CB.lines
            .| CC.unlinesAscii
            .| fastSinkFile "output.txt"
    end <- getCurrentTime
    putStrLn
        ("Data.Conduit.sinkFile: "++(show $ diffUTCTime mid start)++"\n"++
         "fastSinkFile:          "++(show $ diffUTCTime end   mid)++"\n")
@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Sep 4, 2017

Owner

I tend to agree. The change was made in b37505f. Someone (me) stupidly made the change without explaining why it was done. I think it's OK to undo it, PR welcome (but please note the change in the changelog).

Owner

snoyberg commented Sep 4, 2017

I tend to agree. The change was made in b37505f. Someone (me) stupidly made the change without explaining why it was done. I think it's OK to undo it, PR welcome (but please note the change in the changelog).

luispedro added a commit to luispedro/conduit that referenced this issue Sep 4, 2017

Speed up sinkHandle by not flushing after output
The previous behaviour could easily lead to severe performance
degradation.

fixes #322

@snoyberg snoyberg closed this in #324 Sep 5, 2017

luispedro added a commit to ngless-toolkit/ngless that referenced this issue Sep 11, 2017

BLD Update stackage version
This bug fix in particular should make ngless faster:

snoyberg/conduit#322

as it was originally discovered while debugging bad performance in some
ngless workflows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment