Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazier unlines #477

Merged
merged 5 commits into from
Jan 30, 2022
Merged

Lazier unlines #477

merged 5 commits into from
Jan 30, 2022

Conversation

noughtmare
Copy link
Contributor

Fixes #476

Benchmark results:

before:

  lazy-unlines: OK (0.30s)
    547  μs ±  34 μs, 3.2 MB allocated,  23 KB copied,  38 MB peak memory

after:

  lazy-unlines: OK (0.20s)
    354  μs ±  24 μs, 2.0 MB allocated,  23 KB copied,  38 MB peak memory

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a test for the laziness property or at least a comment that points out this requirement for the implementation.

@noughtmare
Copy link
Contributor Author

Should I also change the strict version? It won't benefit from the strictness changes and a similar benchmark doesn't show very significant changes, but it does make sense to implement both functions similarly and this new implementation is simpler.

@noughtmare
Copy link
Contributor Author

@sjakobi I've added a test that checks the laziness of unlines. It fails with the old implementation.

@clyring
Copy link
Member

clyring commented Jan 30, 2022

It's not directly related to the motivation for this patch, but does performance of lazy unlines further improve if concat is also rewritten as a foldr, so that the List.concatMap can potentially fuse with it? And are there any drawbacks to doing so?

@sjakobi
Copy link
Member

sjakobi commented Jan 30, 2022

@noughtmare What's the performance impact on the strict version?

@clyring Good question. I think it would be best to address it separately from this PR though, to keep the size of this one more manageable and to avoid a long-standing branch that might acquire merge conflicts. Feel free to record the idea on the issue tracker.

@noughtmare
Copy link
Contributor Author

noughtmare commented Jan 30, 2022

What's the performance impact on the strict version?

Before:

strict: OK (2.44s)
  292  μs ± 7.3 μs, 876 KB allocated, 5.7 KB copied,  40 MB peak memory

After:

strict: OK (0.59s)
  279  μs ± 6.8 μs, 739 KB allocated, 5.3 KB copied,  38 MB peak memory

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@sjakobi sjakobi requested a review from Bodigrim January 30, 2022 19:46
unlines [] = empty
unlines ss = concat (List.intersperse nl ss) `append` nl -- half as much space
where nl = singleton '\n'
unlines = concat . List.concatMap (\x -> [x, singleton '\n'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better than before, because it avoids copying the full output twice (with and without final nl).
Ideally unlines should be implemented in a style similar to intercalate, avoiding concat (which is quite ineffective):

bytestring/Data/ByteString.hs

Lines 1209 to 1227 in 5efd6b5

intercalate :: ByteString -> [ByteString] -> ByteString
intercalate _ [] = mempty
intercalate _ [x] = x -- This branch exists for laziness, not speed
intercalate (BS fSepPtr sepLen) (BS fhPtr hLen : t) =
unsafeCreate totalLen $ \dstPtr0 ->
unsafeWithForeignPtr fSepPtr $ \sepPtr -> do
unsafeWithForeignPtr fhPtr $ \hPtr ->
memcpy dstPtr0 hPtr hLen
let go _ [] = pure ()
go dstPtr (BS fChunkPtr chunkLen : chunks) = do
memcpy dstPtr sepPtr sepLen
let destPtr' = dstPtr `plusPtr` sepLen
unsafeWithForeignPtr fChunkPtr $ \chunkPtr ->
memcpy destPtr' chunkPtr chunkLen
go (destPtr' `plusPtr` chunkLen) chunks
go (dstPtr0 `plusPtr` hLen) t
where
totalLen = List.foldl' (\acc chunk -> acc +! sepLen +! length chunk) hLen t
(+!) = checkedAdd "intercalate"

@Bodigrim Bodigrim added this to the 0.11.3.0 milestone Jan 30, 2022
@Bodigrim Bodigrim merged commit 0badf97 into haskell:master Jan 30, 2022
@Bodigrim
Copy link
Contributor

Thanks @noughtmare!

Bodigrim pushed a commit to Bodigrim/bytestring that referenced this pull request Jan 30, 2022
* Add benchmark for lazy unlines

* Make unlines lazier

* Add strict unlines benchmark

* Simplify strict unlines

* Test laziness of unlines
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lazy unlines is too strict
4 participants