Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite intercalate in a more direct way #459

Merged
merged 3 commits into from
Jan 13, 2022
Merged

Conversation

ethercrow
Copy link
Contributor

One thing that seems suspicious was that there's supposedly efficient intercalateWithByte, but it only works on two strings.

This PR takes the approach from intercalateWithByte to intercalate. Unfortunately there are not any benchmarks for either function so I don't know where the performance claims of intercalateWithByte come from.

@Bodigrim
Copy link
Contributor

Bodigrim commented Jan 3, 2022

A related, but more general question is whether concat implementation makes sense in modern environment:

concat :: [ByteString] -> ByteString
concat = \bss0 -> goLen0 bss0 bss0
-- The idea here is we first do a pass over the input list to determine:
--
-- 1. is a copy necessary? e.g. @concat []@, @concat [mempty, "hello"]@,
-- and @concat ["hello", mempty, mempty]@ can all be handled without
-- copying.
-- 2. if a copy is necessary, how large is the result going to be?
--
-- If a copy is necessary then we create a buffer of the appropriate size
-- and do another pass over the input list, copying the chunks into the
-- buffer. Also, since foreign calls aren't entirely free we skip over
-- empty chunks while copying.
--
-- We pass the original [ByteString] (bss0) through as an argument through
-- goLen0, goLen1, and goLen since we will need it again in goCopy. Passing
-- it as an explicit argument avoids capturing it in these functions'
-- closures which would result in unnecessary closure allocation.
where
-- It's still possible that the result is empty
goLen0 _ [] = mempty
goLen0 bss0 (BS _ 0 :bss) = goLen0 bss0 bss
goLen0 bss0 (bs :bss) = goLen1 bss0 bs bss
-- It's still possible that the result is a single chunk
goLen1 _ bs [] = bs
goLen1 bss0 bs (BS _ 0 :bss) = goLen1 bss0 bs bss
goLen1 bss0 bs (BS _ len:bss) = goLen bss0 (checkedAdd "concat" len' len) bss
where BS _ len' = bs
-- General case, just find the total length we'll need
goLen bss0 !total (BS _ len:bss) = goLen bss0 total' bss
where total' = checkedAdd "concat" total len
goLen bss0 total [] =
unsafeCreate total $ \ptr -> goCopy bss0 ptr
-- Copy the data
goCopy [] !_ = return ()
goCopy (BS _ 0 :bss) !ptr = goCopy bss ptr
goCopy (BS fp len:bss) !ptr = do
unsafeWithForeignPtr fp $ \p -> memcpy ptr p len
goCopy bss (ptr `plusPtr` len)
{-# NOINLINE concat #-}

I suspect that the reasoning to thread bss0 through all recursive calls is invalid since join points' advent. Also filtering out empty elements strikes me as a very ad-hoc optimization, which is more likely to hurt than to help in a general case.

@@ -1213,7 +1213,22 @@ groupBy k xs = case uncons xs of
-- 'ByteString's and concatenates the list after interspersing the first
-- argument between each element of the list.
intercalate :: ByteString -> [ByteString] -> ByteString
intercalate s = concat . List.intersperse s
intercalate _ [] = mempty
intercalate (BS f_sep_ptr sep_len) (BS f_h_ptr h_len : t) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camelCase please.

unsafeCreate total_len $ \dst_ptr0 ->
unsafeWithForeignPtr f_sep_ptr $ \sep_ptr -> do
unsafeWithForeignPtr f_h_ptr $ \h_ptr ->
memcpy dst_ptr0 h_ptr (fromIntegral h_len)
Copy link
Contributor

@Bodigrim Bodigrim Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please adorn fromIntegral with type applications, explaining what kind of conversion is going on, e. g., fromIntegral @Int @CInt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like quite a few of those were redundant in that file, I've removed them.

Data/ByteString.hs Outdated Show resolved Hide resolved
@ethercrow
Copy link
Contributor Author

Before / After

Screen Shot 2022-01-07 at 9 49 30 AM

@Bodigrim Bodigrim requested a review from sjakobi January 8, 2022 20:33
Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers!

@Bodigrim Bodigrim added this to the 0.11.3.0 milestone Jan 13, 2022
@Bodigrim Bodigrim merged commit cfed0e2 into haskell:master Jan 13, 2022
Bodigrim pushed a commit to Bodigrim/bytestring that referenced this pull request Jan 30, 2022
* Rewrite intercalate in a more direct way

* Remove redundant fromIntegrals

* Add intercalate benchmark
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants