New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear `getByteString` #76

Merged
merged 2 commits into from May 28, 2015

Conversation

Projects
None yet
2 participants
@bitonic
Contributor

bitonic commented May 27, 2015

getByteString exhibited quadratic behaviour -- a classic case of bad "appending". This commit fixes it.

@bitonic

This comment has been minimized.

Show comment
Hide comment
@bitonic

bitonic May 27, 2015

Contributor

Note that this is probably better done with a Builder -- but it's getting late and I wanted to make sure to submit this :).

Contributor

bitonic commented May 27, 2015

Note that this is probably better done with a Builder -- but it's getting late and I wanted to make sure to submit this :).

Remove quadratic behaviour in `ensureN`.
Chains of `B.append`s were being created by repeated calls to
`demandInput`.

Try the following program, which writes and read 100MB, to appreciate
the difference:

```
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import Data.Binary (encode, decode)
import Data.Char (ord)

main :: IO ()
main = do

  let inBs = BS.replicate 100000000 (fromIntegral $ ord 'a')
  BSL.writeFile "bs.bin" (encode inBs)
  putStrLn "writing done"

  bin <- BSL.readFile "bs.bin"
  -- This takes around 30 seconds and causes more than 10GB to be
  -- allocated.
  let outBs = decode bin
  print $ inBs == outBs
```
@kolmodin

This comment has been minimized.

Show comment
Hide comment
@kolmodin

kolmodin May 28, 2015

Owner

Hey! Thanks for the fix. Indeed, there's a problem with ensureN if the chunks are small and a lot of data is requested.
It's essentially the same issue as #73 which was reported a little while ago. I've been working on a fix for that bug, but not quite ready to submit it.
Your patch fixes the problem, but makes ensureN recursive which we want to avoid since it no longer will get inlined.

I'm not sure why a Builder would be better to use, all you need is to concat some ByteStrings.

Owner

kolmodin commented May 28, 2015

Hey! Thanks for the fix. Indeed, there's a problem with ensureN if the chunks are small and a lot of data is requested.
It's essentially the same issue as #73 which was reported a little while ago. I've been working on a fix for that bug, but not quite ready to submit it.
Your patch fixes the problem, but makes ensureN recursive which we want to avoid since it no longer will get inlined.

I'm not sure why a Builder would be better to use, all you need is to concat some ByteStrings.

@bitonic

This comment has been minimized.

Show comment
Hide comment
@bitonic

bitonic May 28, 2015

Contributor

Wasn't the previous ensureN recursive too?

Contributor

bitonic commented May 28, 2015

Wasn't the previous ensureN recursive too?

@kolmodin

This comment has been minimized.

Show comment
Hide comment
@kolmodin

kolmodin May 28, 2015

Owner

Part of it was inlineable. That part is used whenever there is enough input (99% of the time).
The fallback when there isn't enough input was recursive.

Owner

kolmodin commented May 28, 2015

Part of it was inlineable. That part is used whenever there is enough input (99% of the time).
The fallback when there isn't enough input was recursive.

Make `ensureN` more inliner-friendly...
...by making the common case (when the input is big enough) non-recursive.
@bitonic

This comment has been minimized.

Show comment
Hide comment
@bitonic

bitonic May 28, 2015

Contributor

I pushed a commit that makes ensureN inlinable in the way you mention.

Contributor

bitonic commented May 28, 2015

I pushed a commit that makes ensureN inlinable in the way you mention.

@kolmodin

This comment has been minimized.

Show comment
Hide comment
@kolmodin

kolmodin May 28, 2015

Owner

Thanks! I'll have a look later today.

Owner

kolmodin commented May 28, 2015

Thanks! I'll have a look later today.

@kolmodin kolmodin merged commit f947333 into kolmodin:master May 28, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

hvr added a commit to ghc/ghc that referenced this pull request Jun 1, 2015

Update binary submodule to 0.7.5.0 release
Quoting the changelog, this pulls in the following fixes:

binary-0.7.5.0
--------------

- Fix performance bug that was noticable when you get a big strict ByteString
  and the input to the decoder consists of many small chunks.
    - kolmodin/binary#73
    - kolmodin/binary#76
- Fix memory leak when decoding Double and Float.
    - Commit 497a181c083fa9faf7fa3aa64d1d8deb9ac76ecb
- We now require QuickCheck >= 2.8. Remove our version of arbitrarySizedNatural.

binary-0.7.4.0
--------------

- Some invalid UTF-8 strings caused an exception when decoded. Those errors will
  now now fail in the Get monad instead. See issue 70.
  Patch contributed by @ttuegel.

hvr added a commit to ghc/ghc that referenced this pull request Jun 9, 2015

Update binary submodule to 0.7.5.0 release
Quoting the changelog, this pulls in the following fixes:

binary-0.7.5.0
--------------

- Fix performance bug that was noticable when you get a big strict ByteString
  and the input to the decoder consists of many small chunks.
    - kolmodin/binary#73
    - kolmodin/binary#76
- Fix memory leak when decoding Double and Float.
    - Commit 497a181c083fa9faf7fa3aa64d1d8deb9ac76ecb
- We now require QuickCheck >= 2.8. Remove our version of arbitrarySizedNatural.

binary-0.7.4.0
--------------

- Some invalid UTF-8 strings caused an exception when decoded. Those errors will
  now now fail in the Get monad instead. See issue 70.
  Patch contributed by @ttuegel.

(cherry picked from commit 7dd0ea7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment