`lineSplit` inefficiency using ByteString `findIndices` #14

archaephyrryx · 2020-08-26T20:41:58Z

TL;DR: The per-byte cost of `findIndices` (the strict ByteString function) has a huge constant factor, which makes `lineSplit` inefficient.

While looking at examples of various Stream optimizations, I noticed a huge performance discrepancy between otherwise identical code using `lineSplit 1` versus `lines` (both from `Data.ByteString.Streaming.Char8`) when run over large input.

I checked the source of `lineSplit` and noticed that it calls the strict ByteString function `findIndices` to locate the nth newline character in a given byte sequence, so that it can capture everything up to and including that character. Even for the minimal case of `lineSplit 1`, this yields abysmal performance, because the per-byte cost of `findIndices` is ~20x that of `findIndex`.

Ideally, the call to `findIndices` should be replaced with a function that has lower per-byte overhead; the cost of `findIndices` itself is a bug against the bytestring package. (CR)
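For illustration only, here is a minimal sketch of locating the nth newline with `elemIndices` (the replacement named in the title of #15) instead of a predicate-based `findIndices` scan. The helper name `splitAfterNthNewline` is hypothetical and this is not the actual `lineSplit` code, which operates on a streaming ByteString rather than a single strict chunk.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper, not the library's implementation:
-- split a strict chunk just after its nth newline.
-- Assumption: if the chunk holds fewer than n newlines, the whole
-- chunk is returned as the first component.
splitAfterNthNewline :: Int -> B.ByteString -> (B.ByteString, B.ByteString)
splitAfterNthNewline n bs
  | n <= 0 = (B.empty, bs)
  | otherwise =
      -- elemIndices searches for one specific character, so it avoids
      -- applying an arbitrary predicate to every byte.
      case drop (n - 1) (B.elemIndices '\n' bs) of
        (i : _) -> B.splitAt (i + 1) bs  -- keep the newline in the prefix
        []      -> (bs, B.empty)

main :: IO ()
main = do
  let input = B.pack "a\nbc\ndef\n"
  print (splitAfterNthNewline 1 input)
  print (splitAfterNthNewline 2 input)
```

Because the index list is produced lazily, `drop (n - 1)` only forces the search as far as the nth newline.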

archaephyrryx · 2020-09-07T20:15:10Z

This has been resolved by #18

This was referenced Aug 26, 2020

Strict ByteString findIndices very costly haskell/bytestring#269

Closed

replaces findIndices with elemIndices #15

Closed

archaephyrryx closed this as completed Sep 7, 2020
