lineSplit inefficiency using ByteString findIndices #14

Closed
archaephyrryx opened this issue Aug 26, 2020 · 1 comment

archaephyrryx commented Aug 26, 2020

TL;DR: The per-byte cost of findIndices (a strict ByteString function) has a huge constant factor, which makes lineSplit inefficient.

When looking at examples of various optimizations of Streams, I happened to notice a huge discrepancy between the performance of the same code using lineSplit 1 versus lines (both from Data.ByteString.Streaming.Char8) when run over large input.

I checked the source of lineSplit and noticed that it calls the (strict) ByteString function findIndices to locate the nth newline character in a given byte sequence, so that it can capture everything up to and including that character. With lineSplit 1 as a minimal example, this yields abysmal performance, because the per-byte cost of findIndices is ~20x greater than that of findIndex.

The call to findIndices should ideally be replaced with a function that has lower per-byte overhead. The high cost of findIndices itself is a bug against the bytestring package. (CR)
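
To illustrate the kind of replacement suggested above, here is a minimal sketch, not the actual streaming-bytestring code. The names nthNewlineViaFindIndices, nthNewline, and splitAfterNthNewline are hypothetical helpers made up for this example. The first shows roughly what a findIndices-based lookup might look like (an assumption about its shape, not a copy of the library source). The second finds the same index with repeated elemIndex calls on O(1) slices, so it never builds a list of every newline position in the chunk.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical: locate the n-th newline (1-based) via findIndices.
-- Assumed shape only; not copied from the library.
nthNewlineViaFindIndices :: Int -> B.ByteString -> Maybe Int
nthNewlineViaFindIndices n bs
  | n <= 0    = Nothing
  | otherwise = case drop (n - 1) (B.findIndices (== '\n') bs) of
      (i : _) -> Just i
      []      -> Nothing

-- Hypothetical alternative: search for one newline at a time with
-- elemIndex, resuming just past the previous hit. B.drop is O(1),
-- so no bytes are copied between searches.
nthNewline :: Int -> B.ByteString -> Maybe Int
nthNewline n bs
  | n <= 0    = Nothing
  | otherwise = go n 0
  where
    go k off = case B.elemIndex '\n' (B.drop off bs) of
      Nothing -> Nothing
      Just i
        | k == 1    -> Just (off + i)
        | otherwise -> go (k - 1) (off + i + 1)

-- Split a chunk just after its n-th newline, keeping the newline in
-- the first part. Returns Nothing if the chunk has fewer than n newlines.
splitAfterNthNewline :: Int -> B.ByteString -> Maybe (B.ByteString, B.ByteString)
splitAfterNthNewline n bs =
  fmap (\i -> B.splitAt (i + 1) bs) (nthNewline n bs)
```

Both lookups should agree on every input. Whether the second is actually faster on a given bytestring version would need to be measured, since the ~20x figure above refers to findIndices versus findIndex and not to this sketch.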

archaephyrryx (Author) commented

This has been resolved by #18
