Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data.Conduit.Binary.sinkHandle should not flush after every chunk #322

Closed
luispedro opened this issue Sep 4, 2017 · 1 comment · Fixed by #324
Closed

Data.Conduit.Binary.sinkHandle should not flush after every chunk #322

luispedro opened this issue Sep 4, 2017 · 1 comment · Fixed by #324

Comments

@luispedro
Copy link
Contributor

The current behaviour is that sinkHandle calls hFlush after every chunk is written. This can easily lead to a performance bottleneck (see the benchmark below). In the case of using sinkFileCautious, this is particularly galling as there is no need to flush every intermediate chunk.

Removing the hFlush call results in a 4x faster process on my machine. Benchmark (create a file called input.txt before running it):

import qualified Data.ByteString as B

import qualified Data.Conduit.Combinators as CC
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import qualified Data.Conduit as C
import           Data.Conduit ((.|))
import Data.Maybe (maybe)
import Data.Time
import Control.Monad.IO.Class 

import System.IO (hClose, openBinaryTempFile)
import qualified System.IO as IO


fastSinkHandle h = C.awaitForever (liftIO . B.hPut h)
fastSinkIOHandle alloc = C.bracketP alloc IO.hClose fastSinkHandle
fastSinkFile fp = fastSinkIOHandle (IO.openBinaryFile fp IO.WriteMode)

main = do
    start <- getCurrentTime
    C.runConduitRes $
        CB.sourceFile "input.txt"
            .| CB.lines
            .| CC.unlinesAscii
            .| CB.sinkFile "output.txt"
    mid <- getCurrentTime
    C.runConduitRes $
        CB.sourceFile "input.txt"
            .| CB.lines
            .| CC.unlinesAscii
            .| fastSinkFile "output.txt"
    end <- getCurrentTime
    putStrLn
        ("Data.Conduit.sinkFile: "++(show $ diffUTCTime mid start)++"\n"++
         "fastSinkFile:          "++(show $ diffUTCTime end   mid)++"\n")
@snoyberg
Copy link
Owner

snoyberg commented Sep 4, 2017

I tend to agree. The change was made in b37505f. Someone (me) stupidly made the change without explaining why it was done. I think it's OK to undo it, PR welcome (but please note the change in the changelog).

luispedro added a commit to luispedro/conduit that referenced this issue Sep 4, 2017
The previous behaviour could easily lead to severe performance
degradation.

fixes snoyberg#322
luispedro added a commit to ngless-toolkit/ngless that referenced this issue Sep 11, 2017
This bug fix in particular should make ngless faster:

snoyberg/conduit#322

as it was originally discovered while debugging bad performance in some
ngless workflows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants