Make Data.ByteString.Lazy.Char8.lines less strict #562

vdukhovni · 2022-12-07T03:51:45Z

The current implementation of lines in Data.ByteString.Lazy.Char8 is too strict. When a "line" spans multiple chunks it traverses all the chunks to the first line boundary before constructing the list head.

For example, lines <$> getContents reading a large file with no line breaks does not make the first chunk of the (only) line available until the entire file is read into memory.

Now that Data.ByteString.break is optimised for the (== c) case, we can get efficient code for the common many-lines-per-chunk use case without being needlessly strict. Tests are added to make sure that the first chunk is available promptly, without looking further.
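To make the strictness issue concrete, here is a minimal sketch of a chunk-at-a-time line splitter. The names `lazyLines` and `demoFirstChunk` are hypothetical and introduced only for illustration; the code follows the general shape discussed in this thread but is not the merged implementation. The second chunk of the demo input is `undefined`, so the demo can only succeed if the first chunk of the first line is produced without inspecting later chunks.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as SC
import Data.ByteString.Lazy.Internal (ByteString (..))
import Data.ByteString.Unsafe (unsafeDrop, unsafeTake)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical stand-in for a lazy 'lines'; an assumption-laden sketch,
-- not the library's code.
lazyLines :: ByteString -> [ByteString]
lazyLines Empty = []
lazyLines (Chunk c0 cs0) = let l :| ls = go c0 cs0 in l : ls
  where
    go :: B.ByteString -> ByteString -> NonEmpty ByteString
    go c cs = case B.elemIndex 0x0a c of
      Just n
        -- More bytes follow the newline: keep splitting this chunk.
        | n + 1 < B.length c -> line :| NE.toList (go (unsafeDrop (n + 1) c) cs)
        -- The newline is the last byte: continue with the remaining chunks.
        | otherwise          -> line :| lazyLines cs
        where
          line = if n == 0 then Empty else Chunk (unsafeTake n c) Empty
      Nothing ->
        -- No newline in this chunk: emit it immediately and defer the rest.
        -- The let-bound pattern is lazy, so 'cs' is not forced here.
        let l :| ls = case cs of
              Empty        -> Empty :| []
              Chunk c' cs' -> go c' cs'
        in Chunk c l :| ls

-- The tail of the input is 'undefined'; only the first chunk is inspected.
demoFirstChunk :: Maybe B.ByteString
demoFirstChunk = case lazyLines (Chunk (SC.pack "abc") undefined) of
  Chunk c _ : _ -> Just c
  _             -> Nothing
```

Evaluating `demoFirstChunk` in GHCi should return the first chunk. A splitter that scans ahead to the first line boundary before building the list head would force the `undefined` tail instead.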

hs-viktor · 2022-12-07T05:09:37Z

@Bodigrim Something went wrong early in downloading GHC 8.2 for Ubuntu, which also cancelled all the other CI jobs. Perhaps a transient glitch, but it seems I have no permissions to kick the CI. Please make it go. A review would also be great.

clyring

I retried CI without success. I don't have time to investigate that further right now.

Although it was (partially) documented, the old strictness behavior is surprising and I expect any potential performance benefit is negligible outside of the degenerate small-chunks case, so I'm happy to replace it. (Have you benchmarked to verify that any performance impact is negligible?)

Data/ByteString/Lazy/Char8.hs

vdukhovni · 2022-12-07T16:03:24Z

We don't have existing benchmarks for lines, but looking at the code path taken when the initial chunk holds multiple lines I see no opportunity for performance degradation. If anything the new code should be faster, because no revchunks, ...

The in-chunk loop boils down to:

lines (Chunk c0 cs0) = let l :| ls = lines1 c0 cs0 in l : ls
  where
    lines1 c cs
        | len > 1   = c1 <| lines1 (B.unsafeDrop 1 t0) cs
        ...
      where
        (h0, t0) = B.break (== 0x0a) c
        len      = B.length t0
        c1       = if B.null h0 then Empty else Chunk h0 Empty

Which is essentially the same as the old:

    loop0 c cs =
        case B.elemIndex (c2w '\n') c of
            ...
            Just n | n /= 0    -> Chunk (B.unsafeTake n c) Empty
                                : loop0 (B.unsafeDrop (n+1) c) cs
                   | otherwise -> Empty
                                : loop0 (B.unsafeTail c) cs

Strict ByteStrings optimise B.break via RULEs to B.elemIndex ... and the rest is the same. The case we optimise for (lines shorter than chunks) wants both h0 and t0 to ultimately be produced, and the new code is simpler and cleaner.
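The claimed equivalence between the two in-chunk splits can be checked with a small sketch. The names `viaBreak`, `viaElemIndex`, `samples` and `agree` are hypothetical; the sketch only compares results and says nothing about which rewrite rules fire.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as SC

-- Split at the first newline using 'break', as in the new loop.
viaBreak :: B.ByteString -> (B.ByteString, B.ByteString)
viaBreak = B.break (== 0x0a)

-- Split at the first newline using 'elemIndex', as in the old loop.
viaElemIndex :: B.ByteString -> (B.ByteString, B.ByteString)
viaElemIndex c = case B.elemIndex 0x0a c of
  Nothing -> (c, B.empty)
  Just n  -> B.splitAt n c

-- A few hand-picked chunks, including edge cases around the newline.
samples :: [B.ByteString]
samples = map SC.pack ["", "\n", "abc", "abc\n", "\nabc", "a\nb\nc"]

-- True when both approaches produce the same (head, tail) pair everywhere.
agree :: Bool
agree = all (\c -> viaBreak c == viaElemIndex c) samples
```

In both versions the tail still starts with the newline, which is why the loops drop one extra byte before recursing.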

Bodigrim · 2022-12-07T20:40:47Z

@vdukhovni please rebase atop of #563

vdukhovni · 2022-12-07T20:49:18Z

> @vdukhovni please rebase atop of #563

Done.

vdukhovni · 2022-12-07T21:54:25Z

> I retried CI without success. I don't have time to investigate that further right now.
>
> Although it was (partially) documented, the old strictness behavior is surprising and I expect any potential performance benefit is negligible outside of the degenerate small-chunks case, so I'm happy to replace it. (Have you benchmarked to verify that any performance impact is negligible?)

Rebasing on #563 should resolve the CI issues...

vdukhovni · 2022-12-08T01:09:36Z

Rebased on master, now that the CI fix is merged.

Data/ByteString/Lazy/Char8.hs

vdukhovni · 2022-12-08T04:53:45Z

@clyring I think this is looking pretty clean now. Once there are no residual issues and you mark the PR approved, I'll squash and wait for any final reviews.

vdukhovni · 2022-12-08T06:50:45Z

I looked at the generated "core" output from GHC 9.2 (with -O), and indeed there are no unwanted thunks. In the expected case that a chunk has an internal newline, all the NonEmpty wrapping is optimised away to unboxed pairs. Only when the chunk is newline-free, and we defer work to lazyRest, do we construct a lazy heap-allocated NonEmpty that is pattern-matched once forced.

The interface file has wrappers to unbox the arguments and call into the main loop:

041e6f5635fdf6c0bc3b7d530ee83971
  lines ::
    Data.ByteString.Lazy.Internal.ByteString
    -> [Data.ByteString.Lazy.Internal.ByteString]
  [HasNoCafRefs, LambdaFormInfo: LFReEntrant 1, Arity: 1,
   Strictness: <1L>,
   Unfolding: InlineRule (1, True, False)
              (\ (ds['Many] :: Data.ByteString.Lazy.Internal.ByteString) ->
               case ds of wild {
                 Data.ByteString.Lazy.Internal.Empty
                 -> GHC.Types.[] @Data.ByteString.Lazy.Internal.ByteString
                 Data.ByteString.Lazy.Internal.Chunk dt dt1 dt2 cs0
                 -> case lines_go
                           (Data.ByteString.Internal.Type.BS dt dt1 dt2)
                           cs0 of vx { GHC.Base.:| ipv ipv1 ->
                    GHC.Types.:
                      @Data.ByteString.Lazy.Internal.ByteString
                      ipv
                      ipv1 } })]
d322efb5d40daa160df73c9d2961ac90
  lines_go ::
    Data.ByteString.Internal.Type.ByteString
    -> Data.ByteString.Lazy.Internal.ByteString
    -> GHC.Base.NonEmpty Data.ByteString.Lazy.Internal.ByteString
  [HasNoCafRefs, LambdaFormInfo: LFReEntrant 2, Arity: 2,
   Strictness: <1P(L,L,L)><ML>, CPR: 1, Inline: [2],
   Unfolding: InlineRule (2, True, False)
              (\ (w['Many] :: Data.ByteString.Internal.Type.ByteString)
                 (w1['Many] :: Data.ByteString.Lazy.Internal.ByteString) ->
               case w of ww { Data.ByteString.Internal.Type.BS ww1 ww2 ww3 ->
               case $wgo ww1 ww2 ww3 w1 of ww4 { (#,#) ww5 ww6 ->
               GHC.Base.:| @Data.ByteString.Lazy.Internal.ByteString ww5 ww6 } })]

While the loop itself becomes:

-- RHS size: {terms: 3, types: 2, coercions: 0, joins: 0/0}
lvl66 :: NonEmpty ByteString
lvl66 = :| Empty []

Rec {
-- RHS size: {terms: 15, types: 16, coercions: 0, joins: 0/0}
lines_go :: ByteString -> ByteString -> NonEmpty ByteString
lines_go
  = \ (w :: ByteString) (w1 :: ByteString) ->
      case w of { BS ww1 ww2 ww3 ->
      case $wgo ww1 ww2 ww3 w1 of { (# ww5, ww6 #) -> :| ww5 ww6 }
      }

-- RHS size: {terms: 16, types: 17, coercions: 0, joins: 0/0}
lines :: ByteString -> [ByteString]
lines
  = \ (ds :: ByteString) ->
      case ds of {
        Empty -> [];
        Chunk dt dt1 dt2 cs0 ->
          case $wgo dt dt1 dt2 cs0 of { (# ww1, ww2 #) -> : ww1 ww2 }
      }

-- RHS size: {terms: 107, types: 88, coercions: 0, joins: 1/4}
$wgo
  :: Addr#
     -> ForeignPtrContents
     -> Int#
     -> ByteString
     -> (# ByteString, [ByteString] #)
$wgo
  = \ (ww :: Addr#)
      (ww1 :: ForeignPtrContents)
      (ww2 :: Int#)
      (w :: ByteString) ->
      case {__ffi_static_ccall_unsafe main:memchr :: Addr#
                                          -> Int32#
                                          -> Word#
                                          -> State# RealWorld
                                          -> (# State# RealWorld, Addr# #)}
             ww 10#32 (int2Word# ww2) realWorld#
      of
      { (# ds4, ds5 #) ->
      case eqAddr# ds5 __NULL of {
        __DEFAULT ->
          case touch# ww1 ds4 of { __DEFAULT ->
          let {
            dt :: Int#
            dt = minusAddr# ds5 ww } in
          join {
            $j :: ByteString -> (# ByteString, [ByteString] #)
            $j (c'
                  :: ByteString
                  Unf=OtherCon [])
              = case <# (+# dt 1#) ww2 of {
                  __DEFAULT -> (# c', lines w #);
                  1# ->
                    (# c',
                       let {
                         d :: Int#
                         d = +# dt 1# } in
                       case $wgo (plusAddr# ww d) ww1 (-# ww2 d) w of { (# ww4, ww5 #) ->
                       : ww4 ww5
                       } #)
                } } in
          case dt of wild {
            __DEFAULT -> jump $j (Chunk ww ww1 wild Empty);
            0# -> jump $j Empty
          }
          };
        1# ->
          case touch# ww1 ds4 of { __DEFAULT ->
          let {
            ds :: NonEmpty ByteString
            ds
              = case w of {
                  Empty -> lvl66;
                  Chunk dt dt1 dt2 cs' ->
                    case $wgo dt dt1 dt2 cs' of { (# ww4, ww5 #) -> :| ww4 ww5 }
                } } in
          (# Chunk ww ww1 ww2 (case ds of { :| l ls -> l }),
             case ds of { :| l ls -> ls } #)
          }
      }
      }
end Rec }

FWIW, the unboxed (n+1) is computed twice. It is easy to patch the code to avoid that too, but I think the micro-optimisation is not worth the price in readability:

--- a/Data/ByteString/Lazy/Char8.hs
+++ b/Data/ByteString/Lazy/Char8.hs
@@ -882,7 +882,8 @@ lines (Chunk c0 cs0) = unNE $! go c0 cs0
     go :: S.ByteString -> ByteString -> NonEmpty ByteString
     go c cs = case B.elemIndex (c2w '\n') c of
         Just n
-            | n + 1 < B.length c -> consNE c' $ go (B.unsafeDrop (n+1) c) cs
+            | n1 <- n + 1
+            , n1 < B.length c -> consNE c' $ go (B.unsafeDrop n1 c) cs
               -- 'c' was a multi-line chunk
             | otherwise       -> c' :| lines cs
               -- 'c' was a single-line chunk

clyring

This looks good now. Just two aesthetic quibbles:

You may wish to remove Data.ByteString.break from the commit message.
The comment for consNE is now out-of-date.

The maintainers will generally squash when merging, so doing so yourself is not required.

I did also confirm with a toy benchmark (nf L8.lines (L.fromStrict $ stimes 32 loremIpsum)) that this patch does not regress performance in the typical many-more-lines-than-chunks situation. (...and 827df30 was about 60% slower, almost entirely because (<|) is terrible.)

vdukhovni · 2022-12-08T15:01:44Z

> This looks good now. Just two aesthetic quibbles:
>
> You may wish to remove Data.ByteString.break from the commit message.
>
> The comment for consNE is now out-of-date.

Thanks for pitching in. I've updated the comments as suggested and squashed. I hope this is it.

Also, is it worth bothering with the hypothetical patch below (from #562 (comment))? Does your benchmark show any difference?

--- a/Data/ByteString/Lazy/Char8.hs
+++ b/Data/ByteString/Lazy/Char8.hs
@@ -882,7 +882,8 @@ lines (Chunk c0 cs0) = unNE $! go c0 cs0
     go :: S.ByteString -> ByteString -> NonEmpty ByteString
     go c cs = case B.elemIndex (c2w '\n') c of
         Just n
-            | n + 1 < B.length c -> consNE c' $ go (B.unsafeDrop (n+1) c) cs
+            | n1 <- n + 1
+            , n1 < B.length c -> consNE c' $ go (B.unsafeDrop n1 c) cs
               -- 'c' was a multi-line chunk
             | otherwise       -> c' :| lines cs
               -- 'c' was a single-line chunk

sjakobi · 2022-12-08T15:28:00Z

tests/Properties.hs

+prop_lines_empty_invariant =
+     True === case LC.lines (LC.pack "\nfoo\n") of
+        Empty : _ -> True
+        _         -> False


Could we also have a randomized test that uses invariant or checkInvariant to check the output of lines? I was expecting this test to exist already, but I couldn't find it.

vdukhovni

Could that be a separate PR, or would you care to propose the test? It'd take me some scarce cycles to page in (or acquire) the know-how for constructing randomised tests for this; I don't use the facilities in question sufficiently often in real life...

sjakobi

Yeah, this can be done separately. To generate the test input, I'd simply use some arbitrary LazyByteStrings, interleaved with additional newlines.

I have opened #564 to track this.
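As a rough illustration of the check being asked for, the sketch below enumerates a small, fixed set of chunked inputs instead of generating them randomly; it is a deterministic stand-in that avoids a QuickCheck dependency, not the test tracked in #564. The names `pool`, `inputs`, `checkOne` and `allOk` are hypothetical. It uses `invariant` from Data.ByteString.Lazy.Internal and compares against the Prelude's `lines` on the unchunked text.

```haskell
import qualified Data.ByteString.Char8 as SC
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.ByteString.Lazy.Internal (invariant)

-- Small chunks with newlines at the start, middle and end.
pool :: [String]
pool = ["a", "\n", "ab\n", "\nb", "a\nb"]

-- Every sequence of zero to three chunks drawn from the pool.
inputs :: [[String]]
inputs = concatMap (\k -> sequence (replicate k pool)) [0 .. 3]

-- Each output line must satisfy the lazy ByteString invariant (no empty
-- chunks), and the result must match Prelude's 'lines' on the same text.
checkOne :: [String] -> Bool
checkOne chunks = all invariant ls && ls == map LC.pack (lines (concat chunks))
  where
    ls = LC.lines (LC.fromChunks (map SC.pack chunks))

allOk :: Bool
allOk = all checkOne inputs
```

A randomized version would presumably replace `inputs` with a generator of arbitrary lazy ByteStrings interleaved with newlines, as suggested above, while keeping the same per-input check.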

sjakobi

Cheers!

vdukhovni · 2022-12-08T18:37:03Z

Are we waiting for @Bodigrim to review, or can this now be merged? By the way, when squashing I did inadvertently merge that n1 <- n + 1 patch I was asking about. I hope that's OK.

Bodigrim · 2022-12-08T21:21:11Z

Thanks @vdukhovni!

The current implementation of `lines` in Data.ByteString.Lazy.Char8 is too strict. When a "line" spans multiple chunks it traverses all the chunks to the first line boundary before constructing the list head. For example, `lines <$> getContents` reading a large file with no line breaks does not make the first chunk of the (only) line available until the entire file is read into memory. Co-authored-by: Viktor Dukhovni <ietf-dane@dukhovni.org> (cherry picked from commit eb352a9)

clyring requested changes Dec 7, 2022

vdukhovni force-pushed the lazy-lines branch 2 times, most recently from c79262c to 827df30 Compare December 7, 2022 15:50

vdukhovni force-pushed the lazy-lines branch 2 times, most recently from e5b2716 to 564ee0a Compare December 7, 2022 19:29

vdukhovni force-pushed the lazy-lines branch from 564ee0a to fe3dd07 Compare December 7, 2022 20:48

vdukhovni requested a review from clyring December 7, 2022 21:52

vdukhovni force-pushed the lazy-lines branch 2 times, most recently from 7a10c98 to e191a74 Compare December 8, 2022 01:08

clyring reviewed Dec 8, 2022

clyring reviewed Dec 8, 2022

vdukhovni force-pushed the lazy-lines branch from 65aab91 to 451fe05 Compare December 8, 2022 15:00

clyring approved these changes Dec 8, 2022

sjakobi reviewed Dec 8, 2022

sjakobi approved these changes Dec 8, 2022

Bodigrim approved these changes Dec 8, 2022

Bodigrim added this to the 0.11.4.0 milestone Dec 8, 2022

Bodigrim merged commit eb352a9 into haskell:master Dec 8, 2022
