
Make Data.ByteString.Lazy.Char8.lines less strict #562

Merged 1 commit on Dec 8, 2022


vdukhovni
Contributor

The current implementation of lines in Data.ByteString.Lazy.Char8 is too strict. When a "line" spans multiple chunks it traverses all the chunks to the first line boundary before constructing the list head.

For example, lines <$> getContents reading a large file with no line breaks does not make the first chunk of the (only) line available until the entire file is read into memory.

Now that Data.ByteString.break is optimised for the (== c) case, we can get efficient code for the common many-lines-per-chunk use case without being needlessly strict. Tests are added to make sure that the first chunk is available promptly, without looking further.
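A probe in the spirit of such a test might look like the sketch below. It is not the PR's actual test code. `firstChunkOfFirstLine` is a hypothetical name. The input's second chunk is an `error` thunk, so any `lines` implementation that looks past the first chunk before yielding the first line's head will crash.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8

-- Hypothetical probe: the first chunk of the input has no newline, and the
-- second chunk raises an error if it is ever forced. We ask only for the
-- first chunk of the first line produced by 'linesImpl'.
firstChunkOfFirstLine :: (L.ByteString -> [L.ByteString]) -> [S.ByteString]
firstChunkOfFirstLine linesImpl =
  case linesImpl input of
    (l : _) -> take 1 (L.toChunks l)
    []      -> []
  where
    input = L.fromChunks [S8.pack "no newline here", error "second chunk was forced"]

main :: IO ()
main = print (firstChunkOfFirstLine L8.lines)
```

This assumes a bytestring release that includes this change (the PR is milestoned for 0.11.4.0). On such a release, `main` should print the first chunk. With the older, stricter `lines`, it should instead hit the `error` hidden in the second chunk.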

@hs-viktor
Contributor

@Bodigrim Something went wrong early while downloading GHC 8.2 for Ubuntu, and that also cancelled all the other CI jobs. Perhaps it was a transient glitch, but it seems I have no permission to restart CI. Please make it go. A review would also be great.

Member

@clyring clyring left a comment


I retried CI without success. I don't have time to investigate that further right now.

Although it was (partially) documented, the old strictness behavior is surprising and I expect any potential performance benefit is negligible outside of the degenerate small-chunks case, so I'm happy to replace it. (Have you benchmarked to verify that any performance impact is negligible?)

Data/ByteString/Lazy/Char8.hs (outdated, resolved)
Data/ByteString/Lazy/Char8.hs (outdated, resolved)
Data/ByteString/Lazy/Char8.hs (outdated, resolved)
@vdukhovni vdukhovni force-pushed the lazy-lines branch 2 times, most recently from c79262c to 827df30 on December 7, 2022 15:50
@vdukhovni
Contributor Author

vdukhovni commented Dec 7, 2022

We don't have existing benchmarks for lines, but looking at the code path taken when the initial chunk holds multiple lines, I see no opportunity for performance degradation. If anything, the new code should be faster, because there are no revchunks, ...

The in-chunk loop boils down to:

lines (Chunk c0 cs0) = let l :| ls = lines1 c0 cs0 in l : ls
  where
    lines1 c cs
        | len > 1   = c1 <| lines1 (B.unsafeDrop 1 t0) cs
        ...
      where
        (h0, t0) = B.break (== 0x0a) c
        len      = B.length t0
        c1       = if B.null h0 then Empty else Chunk h0 Empty

This is essentially the same as the old code:

    loop0 c cs =
        case B.elemIndex (c2w '\n') c of
            ...
            Just n | n /= 0    -> Chunk (B.unsafeTake n c) Empty
                                : loop0 (B.unsafeDrop (n+1) c) cs
                   | otherwise -> Empty
                                : loop0 (B.unsafeTail c) cs

For strict ByteStrings, B.break (== c) is optimised via RULES to B.elemIndex ..., and the rest is the same. The case we optimise for (lines shorter than chunks) ultimately needs both h0 and t0, and the new code is simpler and cleaner.
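The claim that the new `B.break (== 0x0a)` split and the old `B.elemIndex`-based split compute the same thing can be checked directly. The sketch below spells out the elemIndex version as a hypothetical helper, `breakNewline`, next to the `break` form. It says nothing about which rewrite RULE actually fires, only that the two agree on results.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Unsafe as SU

-- Hypothetical helper: split a strict chunk at its first '\n' (byte 10)
-- using 'S.elemIndex', the way the old 'loop0' located line boundaries.
breakNewline :: S.ByteString -> (S.ByteString, S.ByteString)
breakNewline c = case S.elemIndex 10 c of
  Nothing -> (c, S.empty)
  Just n  -> (SU.unsafeTake n c, SU.unsafeDrop n c)  -- safe: n < S.length c

-- The same split expressed with 'S.break', as in the new 'lines1'.
breakViaBreak :: S.ByteString -> (S.ByteString, S.ByteString)
breakViaBreak = S.break (== 10)

main :: IO ()
main = mapM_ (print . breakNewline . S8.pack) ["abc", "ab\ncd", "\nx"]
```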

@vdukhovni vdukhovni force-pushed the lazy-lines branch 2 times, most recently from e5b2716 to 564ee0a on December 7, 2022 19:29
@Bodigrim
Contributor

Bodigrim commented Dec 7, 2022

@vdukhovni please rebase atop of #563

@vdukhovni
Contributor Author


Done.

@vdukhovni
Contributor Author


Rebasing on #563 should resolve the CI issues...

@vdukhovni vdukhovni force-pushed the lazy-lines branch 2 times, most recently from 7a10c98 to e191a74 on December 8, 2022 01:08
@vdukhovni
Contributor Author

Rebased on master, now that the CI fix is merged.

Data/ByteString/Lazy/Char8.hs (outdated, resolved)
Data/ByteString/Lazy/Char8.hs (resolved)
Data/ByteString/Lazy/Char8.hs (outdated, resolved)
@vdukhovni
Contributor Author

@clyring I think this is looking pretty clean now. Once there are no residual issues and you mark the PR approved, I'll squash and wait for any final reviews.

@vdukhovni
Contributor Author

vdukhovni commented Dec 8, 2022

I looked at the generated "core" output from GHC 9.2 (with -O), and indeed there are no unwanted thunks. In the expected case, where a chunk contains an internal newline, all the NonEmpty wrapping is optimised away to unboxed pairs. Only when the chunk is newline-free and we defer work to lazyRest do we construct a lazy, heap-allocated NonEmpty that is pattern-matched once forced.

The interface file has wrappers to unbox the arguments and call into the main loop:

041e6f5635fdf6c0bc3b7d530ee83971
  lines ::
    Data.ByteString.Lazy.Internal.ByteString
    -> [Data.ByteString.Lazy.Internal.ByteString]
  [HasNoCafRefs, LambdaFormInfo: LFReEntrant 1, Arity: 1,
   Strictness: <1L>,
   Unfolding: InlineRule (1, True, False)
              (\ (ds['Many] :: Data.ByteString.Lazy.Internal.ByteString) ->
               case ds of wild {
                 Data.ByteString.Lazy.Internal.Empty
                 -> GHC.Types.[] @Data.ByteString.Lazy.Internal.ByteString
                 Data.ByteString.Lazy.Internal.Chunk dt dt1 dt2 cs0
                 -> case lines_go
                           (Data.ByteString.Internal.Type.BS dt dt1 dt2)
                           cs0 of vx { GHC.Base.:| ipv ipv1 ->
                    GHC.Types.:
                      @Data.ByteString.Lazy.Internal.ByteString
                      ipv
                      ipv1 } })]
d322efb5d40daa160df73c9d2961ac90
  lines_go ::
    Data.ByteString.Internal.Type.ByteString
    -> Data.ByteString.Lazy.Internal.ByteString
    -> GHC.Base.NonEmpty Data.ByteString.Lazy.Internal.ByteString
  [HasNoCafRefs, LambdaFormInfo: LFReEntrant 2, Arity: 2,
   Strictness: <1P(L,L,L)><ML>, CPR: 1, Inline: [2],
   Unfolding: InlineRule (2, True, False)
              (\ (w['Many] :: Data.ByteString.Internal.Type.ByteString)
                 (w1['Many] :: Data.ByteString.Lazy.Internal.ByteString) ->
               case w of ww { Data.ByteString.Internal.Type.BS ww1 ww2 ww3 ->
               case $wgo ww1 ww2 ww3 w1 of ww4 { (#,#) ww5 ww6 ->
               GHC.Base.:| @Data.ByteString.Lazy.Internal.ByteString ww5 ww6 } })]

While the loop itself becomes:

-- RHS size: {terms: 3, types: 2, coercions: 0, joins: 0/0}
lvl66 :: NonEmpty ByteString
lvl66 = :| Empty []

Rec {
-- RHS size: {terms: 15, types: 16, coercions: 0, joins: 0/0}
lines_go :: ByteString -> ByteString -> NonEmpty ByteString
lines_go
  = \ (w :: ByteString) (w1 :: ByteString) ->
      case w of { BS ww1 ww2 ww3 ->
      case $wgo ww1 ww2 ww3 w1 of { (# ww5, ww6 #) -> :| ww5 ww6 }
      }

-- RHS size: {terms: 16, types: 17, coercions: 0, joins: 0/0}
lines :: ByteString -> [ByteString]
lines
  = \ (ds :: ByteString) ->
      case ds of {
        Empty -> [];
        Chunk dt dt1 dt2 cs0 ->
          case $wgo dt dt1 dt2 cs0 of { (# ww1, ww2 #) -> : ww1 ww2 }
      }

-- RHS size: {terms: 107, types: 88, coercions: 0, joins: 1/4}
$wgo
  :: Addr#
     -> ForeignPtrContents
     -> Int#
     -> ByteString
     -> (# ByteString, [ByteString] #)
$wgo
  = \ (ww :: Addr#)
      (ww1 :: ForeignPtrContents)
      (ww2 :: Int#)
      (w :: ByteString) ->
      case {__ffi_static_ccall_unsafe main:memchr :: Addr#
                                          -> Int32#
                                          -> Word#
                                          -> State# RealWorld
                                          -> (# State# RealWorld, Addr# #)}
             ww 10#32 (int2Word# ww2) realWorld#
      of
      { (# ds4, ds5 #) ->
      case eqAddr# ds5 __NULL of {
        __DEFAULT ->
          case touch# ww1 ds4 of { __DEFAULT ->
          let {
            dt :: Int#
            dt = minusAddr# ds5 ww } in
          join {
            $j :: ByteString -> (# ByteString, [ByteString] #)
            $j (c'
                  :: ByteString
                  Unf=OtherCon [])
              = case <# (+# dt 1#) ww2 of {
                  __DEFAULT -> (# c', lines w #);
                  1# ->
                    (# c',
                       let {
                         d :: Int#
                         d = +# dt 1# } in
                       case $wgo (plusAddr# ww d) ww1 (-# ww2 d) w of { (# ww4, ww5 #) ->
                       : ww4 ww5
                       } #)
                } } in
          case dt of wild {
            __DEFAULT -> jump $j (Chunk ww ww1 wild Empty);
            0# -> jump $j Empty
          }
          };
        1# ->
          case touch# ww1 ds4 of { __DEFAULT ->
          let {
            ds :: NonEmpty ByteString
            ds
              = case w of {
                  Empty -> lvl66;
                  Chunk dt dt1 dt2 cs' ->
                    case $wgo dt dt1 dt2 cs' of { (# ww4, ww5 #) -> :| ww4 ww5 }
                } } in
          (# Chunk ww ww1 ww2 (case ds of { :| l ls -> l }),
             case ds of { :| l ls -> ls } #)
          }
      }
      }
end Rec }

FWIW, the unboxed (n+1) is computed twice. It is easy to patch the code to avoid that too, but I think the micro-optimisation is not worth the cost in readability:

--- a/Data/ByteString/Lazy/Char8.hs
+++ b/Data/ByteString/Lazy/Char8.hs
@@ -882,7 +882,8 @@ lines (Chunk c0 cs0) = unNE $! go c0 cs0
     go :: S.ByteString -> ByteString -> NonEmpty ByteString
     go c cs = case B.elemIndex (c2w '\n') c of
         Just n
-            | n + 1 < B.length c -> consNE c' $ go (B.unsafeDrop (n+1) c) cs
+            | n1 <- n + 1
+            , n1 < B.length c -> consNE c' $ go (B.unsafeDrop n1 c) cs
               -- 'c' was a multi-line chunk
             | otherwise       -> c' :| lines cs
               -- 'c' was a single-line chunk
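Pulling the thread's pieces together, here is a self-contained sketch of the approach: an `elemIndex` scan, the `n1 <- n + 1` pattern guard, a `NonEmpty`-returning `go`, and a lazily deferred rest for newline-free chunks. `lazyLines` is hypothetical and is not the merged code. It works over `L.toChunks` lists rather than the internal `Chunk`/`Empty` constructors, and it uses `L.append` rather than building `Chunk` directly.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8
import Data.List.NonEmpty (NonEmpty (..))

-- Hypothetical re-implementation for illustration; not the library's code.
lazyLines :: L.ByteString -> [L.ByteString]
lazyLines lbs = case L.toChunks lbs of
  []       -> []
  (c : cs) -> let (l :| ls) = go c cs in l : ls
  where
    -- 'go' scans one strict chunk. Its result always has a first line,
    -- which is why it returns a NonEmpty rather than a list.
    go :: S.ByteString -> [S.ByteString] -> NonEmpty L.ByteString
    go c cs = case S.elemIndex 10 c of            -- 10 is '\n'
      Just n
        | n1 <- n + 1, n1 < S.length c ->
            -- Multi-line chunk: emit this line, keep scanning the same chunk.
            L.fromStrict (S.take n c) :| (let (l :| ls) = go (S.drop n1 c) cs in l : ls)
        | otherwise ->
            -- The newline is the chunk's last byte: the next line starts in 'cs'.
            L.fromStrict (S.take n c) :| lazyLines (L.fromChunks cs)
      Nothing ->
        -- No newline: 'c' becomes the line's first chunk right away, while
        -- the rest of the line (and all later lines) come from a lazy 'rest'.
        let rest = case cs of
              []         -> L.empty :| []
              (c' : cs') -> go c' cs'
            (l :| ls) = rest
        in L.append (L.fromStrict c) l :| ls

main :: IO ()
main = do
  -- The first line "abc" spans the first two chunks.
  let input = L.fromChunks (map S8.pack ["ab", "c\nd", "\n", "e"])
  print (lazyLines input)
  print (lazyLines input == L8.lines input)
```

Because `L.append` is lazy in its second argument, the first chunk of a newline-free line is available before `go` ever inspects the remaining chunks.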

Member

@clyring clyring left a comment


This looks good now. Just two aesthetic quibbles:

  • You may wish to remove Data.ByteString.break from the commit message.
  • The comment for consNE is now out-of-date.

The maintainers will generally squash when merging, so doing so yourself is not required.

I did also confirm with a toy benchmark (nf L8.lines (L.fromStrict $ stimes 32 loremIpsum)) that this patch does not regress performance in the typical many-more-lines-than-chunks situation. (...and 827df30 was about 60% slower, almost entirely because (<|) is terrible.)
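The benchmark's input shape can be reproduced without a benchmarking library. The sketch below assumes a hypothetical `loremIpsum` (its real text is not shown in the thread) made of three newline-terminated lines. `L.fromStrict` of a non-empty strict string yields exactly one chunk, so all lines live in a single chunk. That is the many-more-lines-than-chunks case the comment measures.

```haskell
import Data.Semigroup (stimes)
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8

-- Hypothetical stand-in for the benchmark's 'loremIpsum':
-- three newline-terminated lines.
loremIpsum :: S8.ByteString
loremIpsum = S8.pack "Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit,\nsed do eiusmod tempor.\n"

-- One strict string of 32 copies, wrapped as a single-chunk lazy ByteString.
benchInput :: L.ByteString
benchInput = L.fromStrict (stimes (32 :: Int) loremIpsum)

main :: IO ()
main = do
  print (length (L.toChunks benchInput))  -- number of chunks
  print (length (L8.lines benchInput))    -- number of lines
```

To time it as in the comment, one could hand `L8.lines` and `benchInput` to an `nf`-style benchmark from a harness such as tasty-bench or criterion; the harness is not shown here.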

The current implementation of `lines` in Data.ByteString.Lazy.Char8 is too
strict.  When a "line" spans multiple chunks it traverses all the chunks
to the first line boundary before constructing the list head.

For example, `lines <$> getContents` reading a large file with no line breaks
does not make the first chunk of the (only) line available until the entire
file is read into memory.
@vdukhovni
Contributor Author

vdukhovni commented Dec 8, 2022


Thanks for pitching in. I've updated the comments as suggested and squashed. I hope this is it.

Also, is it worth bothering with the hypothetical patch below (from #562 (comment))? Does your benchmark show any difference?

--- a/Data/ByteString/Lazy/Char8.hs
+++ b/Data/ByteString/Lazy/Char8.hs
@@ -882,7 +882,8 @@ lines (Chunk c0 cs0) = unNE $! go c0 cs0
     go :: S.ByteString -> ByteString -> NonEmpty ByteString
     go c cs = case B.elemIndex (c2w '\n') c of
         Just n
-            | n + 1 < B.length c -> consNE c' $ go (B.unsafeDrop (n+1) c) cs
+            | n1 <- n + 1
+            , n1 < B.length c -> consNE c' $ go (B.unsafeDrop n1 c) cs
               -- 'c' was a multi-line chunk
             | otherwise       -> c' :| lines cs
               -- 'c' was a single-line chunk

Comment on lines +85 to +88
prop_lines_empty_invariant =
  True === case LC.lines (LC.pack "\nfoo\n") of
    Empty : _ -> True
    _         -> False
Member


Could we also have a randomized test that uses invariant or checkInvariant to check the output of lines? I was expecting this test to exist already, but I couldn't find it.

Contributor Author


Could that be a separate PR, or would you care to propose the test? It would take me some scarce cycles to page in (or acquire) the know-how to construct randomised tests for this; I don't use the facilities in question often enough in real life...

Member


Yeah, this can be done separately. To generate the test input, I'd simply use some arbitrary LazyByteStrings, interleaved with additional newlines.
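A deterministic sketch of such a check follows. It assumes that an exhaustive enumeration of small inputs is an acceptable stand-in for QuickCheck-generated ones. The names `splits`, `inputs`, `noEmptyChunks`, and `linesOk` are hypothetical, and `noEmptyChunks` only approximates the library's own `invariant` (no empty chunks). Every string of length up to 5 over `"a\n"` is tried in every possible chunking.

```haskell
import Control.Monad (replicateM)
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8

-- Every way to cut a String into non-empty pieces; each piece becomes a chunk.
splits :: String -> [[String]]
splits [] = [[]]
splits s  = [ pre : rest | k <- [1 .. length s]
                         , let (pre, post) = splitAt k s
                         , rest <- splits post ]

-- All strings of length <= 5 over the alphabet "a\n", in every chunking.
inputs :: [L.ByteString]
inputs = [ L.fromChunks (map S8.pack pieces)
         | n <- [0 .. 5], s <- replicateM n "a\n", pieces <- splits s ]

-- Hypothetical stand-in for the internal invariant: no empty chunks.
noEmptyChunks :: L.ByteString -> Bool
noEmptyChunks = all (not . S.null) . L.toChunks

-- Each line keeps the invariant and contains no '\n', and 'unlines'
-- reproduces the input up to a missing final newline.
linesOk :: L.ByteString -> Bool
linesOk lbs =
  all noEmptyChunks ls
    && not (any (L8.elem '\n') ls)
    && L8.unlines ls == withFinalNewline
  where
    ls = L8.lines lbs
    withFinalNewline
      | L.null lbs || L8.last lbs == '\n' = lbs
      | otherwise                          = L8.snoc lbs '\n'

main :: IO ()
main = print (length inputs, all linesOk inputs)
```

Enumerating every chunking deliberately exercises lines that span chunk boundaries, which is where the laziness change matters.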

Member


I have opened #564 to track this.

Member

@sjakobi sjakobi left a comment


Cheers!

@vdukhovni
Contributor Author

Are we waiting for @Bodigrim to review, or can this now be merged? By the way, when squashing I did inadvertently merge that n1 <- n + 1 patch I was asking about. I hope that's OK.

@Bodigrim Bodigrim added this to the 0.11.4.0 milestone Dec 8, 2022
@Bodigrim Bodigrim merged commit eb352a9 into haskell:master Dec 8, 2022
@Bodigrim
Contributor

Bodigrim commented Dec 8, 2022

Thanks @vdukhovni!

clyring pushed a commit that referenced this pull request Dec 29, 2022
The current implementation of `lines` in Data.ByteString.Lazy.Char8 is too
strict.  When a "line" spans multiple chunks it traverses all the chunks
to the first line boundary before constructing the list head.

For example, `lines <$> getContents` reading a large file with no line breaks
does not make the first chunk of the (only) line available until the entire
file is read into memory.

Co-authored-by: Viktor Dukhovni <ietf-dane@dukhovni.org>

(cherry picked from commit eb352a9)

5 participants