TL;DR: The per-byte cost of `findIndices` (the strict bytestring function) has a huge constant factor, which makes `lineSplit` inefficient.
While looking at examples of various Stream optimizations, I noticed a huge performance discrepancy between otherwise identical code using `lineSplit 1` and `lines` (both from `Data.ByteString.Streaming.Char8`) when run over large input.
I checked the source of `lineSplit` and noticed that it calls the (strict) ByteString function `findIndices` to locate the nth newline character in a given byte sequence, so that it can capture everything up to and including that character. However, even for the minimal example `lineSplit 1`, this yields abysmal performance, because the per-byte cost of `findIndices` is ~20x greater than that of `findIndex`.
Ideally, the call to `findIndices` should be replaced with a function that has lower per-byte overhead; the slowness of `findIndices` itself is a bug against the `bytestring` package. (CR)
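To make the suggestion concrete, here is a minimal sketch of the two ways to locate the nth newline in a strict chunk. Both helper names (`nthNewlineViaFindIndices`, `nthNewlineViaElemIndex`) are hypothetical and are not taken from the streaming library's source. The first mirrors the `findIndices`-based approach described above; the second is one possible replacement that calls `elemIndex` repeatedly, on the assumption that `elemIndex` has a much lower per-byte cost. I have not benchmarked this exact code.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: index of the n-th newline (1-based), found by
-- collecting newline indices with findIndices, as described above.
nthNewlineViaFindIndices :: Int -> B.ByteString -> Maybe Int
nthNewlineViaFindIndices n bs
  | n < 1 = Nothing
  | otherwise =
      case drop (n - 1) (B.findIndices (== '\n') bs) of
        (i : _) -> Just i
        []      -> Nothing

-- Hypothetical alternative: jump from newline to newline with elemIndex,
-- tracking the offset of the remaining suffix within the original chunk.
nthNewlineViaElemIndex :: Int -> B.ByteString -> Maybe Int
nthNewlineViaElemIndex n0 bs0
  | n0 < 1 = Nothing
  | otherwise = go n0 0 bs0
  where
    go n off bs =
      case B.elemIndex '\n' bs of
        Nothing -> Nothing
        Just i
          | n == 1    -> Just (off + i)
          | otherwise -> go (n - 1) (off + i + 1) (B.drop (i + 1) bs)

main :: IO ()
main = do
  let input = B.pack "ab\ncd\n\nef"
  -- Both approaches should agree on every n.
  print (map (\n -> nthNewlineViaFindIndices n input) [1 .. 4])
  print (map (\n -> nthNewlineViaElemIndex n input) [1 .. 4])
  -- each line prints [Just 2,Just 5,Just 6,Nothing]
```

Since `B.drop` on a strict ByteString is O(1), the second version scans each byte at most once, so any speed-up would come purely from the cheaper inner search.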