Deduplicate findIndexOrEnd by exporting it from Data.ByteString.Internal #337

noughtmare · 2020-12-19T12:39:52Z

Closes #334

I am open to other suggestions for the section name "Internal indexing".

sjakobi

> I am open to other suggestions for the section name "Internal indexing".

Just "Indexing" would also do, IMHO.

Does this PR improve any benchmark results? Feel free to add benchmarks if the current suite doesn't cover the issue from #334.

noughtmare · 2020-12-19T13:27:21Z

> Does this PR improve any benchmark results? Feel free to add benchmarks if the current suite doesn't cover the issue from #334.

There are no existing benchmarks that test the affected functions, as far as I'm aware. I could add a new one in this PR, but then we would only see the new "score". The affected functions are those in Data.ByteString.Lazy that use findIndexOrEnd: takeWhile, dropWhile, break, group and groupBy.
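To make the dependency concrete, here is a minimal sketch of what findIndexOrEnd computes and how a chunk-wise lazy takeWhile can be built on it. It is written against the public bytestring API only; `takeWhileSketch` is a hypothetical name, and the library's internal definitions may well differ.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- Index of the first byte satisfying the predicate, or the length of the
-- chunk if no byte does. Illustrative only: defined via the public
-- S.findIndex rather than the internal implementation.
findIndexOrEnd :: (Word8 -> Bool) -> S.ByteString -> Int
findIndexOrEnd p bs = fromMaybe (S.length bs) (S.findIndex p bs)

-- Hypothetical chunk-wise takeWhile: scan each strict chunk with
-- findIndexOrEnd and stop at the first chunk containing a failing byte.
takeWhileSketch :: (Word8 -> Bool) -> L.ByteString -> L.ByteString
takeWhileSketch p = L.fromChunks . go . L.toChunks
  where
    go [] = []
    go (c : cs)
      | n == S.length c = c : go cs   -- every byte matches, keep scanning
      | n == 0          = []          -- first byte fails, nothing more to take
      | otherwise       = [S.take n c] -- keep the matching prefix and stop
      where
        n = findIndexOrEnd (not . p) c
```

The same "index or end" result drives dropWhile and break in a similar way, which is why one shared helper affects all of them.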

I think group is an odd one, because all the others have a custom predicate argument. I think group could actually be improved by using a function that uses memchr internally, but that is probably something for another issue.

So, I am now thinking of adding extra benchmarks in BenchAll.hs for the takeWhile, dropWhile, break and groupBy functions.

The BenchAll.hs file is a bit strange. At the top it says "Benchmark all 'Builder' functions.", but I think other people have added benchmarks to this file that are not related to the Builder functions, so it might be okay to put these benchmarks there.

Also, I don't know what the best input would be for these benchmarks. For dropWhile, takeWhile and break I could just use a lazy ByteString that is a few chunks large where each chunk is filled with a single byte repeated 4k times so that the condition is never met. But groupBy might need some other kind of input.
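As a rough sketch of the input described above, the following builds a lazy ByteString out of a few equal chunks of one repeated byte. The chunk count (4), the chunk size (4096) and the fill byte (0) are assumptions chosen for illustration, as are the names `zeroes` and `fullScan`; they are not taken from the benchmarks in this PR.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Hypothetical benchmark input: 4 strict chunks of 4096 zero bytes each.
zeroes :: L.ByteString
zeroes = L.fromChunks (replicate 4 (S.replicate 4096 0))

-- The predicate holds for every byte, so the stopping condition is never
-- met and takeWhile has to traverse all chunks.
fullScan :: L.ByteString
fullScan = L.takeWhile (== 0) zeroes
```

With this input, takeWhile, dropWhile and break all walk every chunk to its end, so the measured time is dominated by the scanning loop rather than by early exits.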

Bodigrim · 2020-12-19T13:30:11Z

BenchAll.hs is kinda kitchen sink these days, please add benchmarks for dropWhile and takeWhile there.

Bodigrim · 2021-01-09T18:24:19Z

@noughtmare sorry to nudge, but I'd like to get this merged before the next release.

noughtmare · 2021-01-11T16:31:39Z

Sorry for the long wait. I'm thinking of reordering the history so that the benchmarks are added before the changes. That way, you can more easily benchmark before and after the changes. I'm also having trouble with naming things.

noughtmare · 2021-01-11T16:47:10Z

I've added some benchmarks. The results on my machine:

Benchmark        Old        New
takeWhile        46.15 μs   35.34 μs
dropWhile        49.15 μs   36.65 μs
break            45.95 μs   50.95 μs
group zeroes     6.874 μs   6.283 μs
group zero-one   247.0 μs   248.7 μs
groupBy (>=)     191.1 μs   187.1 μs
groupBy (>)      522.4 μs   508.4 μs

No spectacular changes, but almost everything improves a little. I think group zero-one is not all that much different, but I'm confused by the regression of break.

Bodigrim · 2021-01-11T18:42:45Z

On my machine benchmarks are:

findIndexOrEnd/takeWhile                 mean 36.32 μs  ( +- 1.579 μs  )
findIndexOrEnd/dropWhile                 mean 39.87 μs  ( +- 1.647 μs  )
findIndexOrEnd/break                     mean 39.18 μs  ( +- 1.237 μs  )
findIndexOrEnd/group zeroes              mean 4.920 μs  ( +- 138.9 ns  )
findIndexOrEnd/group zero-one            mean 232.8 μs  ( +- 19.99 μs  )
findIndexOrEnd/groupBy (>=)              mean 151.0 μs  ( +- 7.790 μs  )
findIndexOrEnd/groupBy (>)               mean 440.4 μs  ( +- 19.53 μs  )

vs.

findIndexOrEnd/takeWhile                 mean 29.68 μs  ( +- 728.2 ns  )
findIndexOrEnd/dropWhile                 mean 32.34 μs  ( +- 624.3 ns  )
findIndexOrEnd/break                     mean 32.33 μs  ( +- 488.3 ns  )
findIndexOrEnd/group zeroes              mean 4.733 μs  ( +- 78.60 ns  )
findIndexOrEnd/group zero-one            mean 190.5 μs  ( +- 6.970 μs  )
findIndexOrEnd/groupBy (>=)              mean 128.9 μs  ( +- 3.507 μs  )
findIndexOrEnd/groupBy (>)               mean 380.9 μs  ( +- 18.70 μs  )

So performance improves all across the board, and I do not observe any regression for break.

noughtmare · 2021-01-11T18:46:25Z

It occurred to me that these benchmarks will need to be changed if rewrite rules are added that would rewrite dropWhile (== x) to something that uses breakByte or spanByte like the strict module has. Maybe we should pre-emptively change the benchmark?

Bodigrim · 2021-01-11T18:52:03Z

@noughtmare yes, this is a good idea, let's replace dropWhile (== 0) by dropWhile even or something similar.
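A small sketch of why the swap is harmless for the measurement, assuming an all-zero input like the one discussed earlier (the names `zeroes`, `viaEq` and `viaEven` are hypothetical): both predicates hold for every byte, so both calls scan the whole input, but only the first has the `(== x)` shape that a byte-specific rewrite rule could match.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Hypothetical all-zero input; sizes are assumptions for illustration.
zeroes :: L.ByteString
zeroes = L.fromChunks (replicate 4 (S.replicate 4096 0))

-- Has the `(== x)` shape that a future rewrite rule could target.
viaEq :: L.ByteString
viaEq = L.dropWhile (== 0) zeroes

-- Opaque predicate: a rule keyed on equality with a single byte would
-- not apply, so the generic scanning path stays under test.
viaEven :: L.ByteString
viaEven = L.dropWhile even zeroes
```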

noughtmare · 2021-01-11T19:57:45Z

I have changed the benchmarks.

Bodigrim · 2021-01-11T20:18:59Z

Thanks!

sjakobi reviewed Dec 19, 2020

noughtmare force-pushed the master branch from dcacf41 to 3015685 January 11, 2021 16:43

Bodigrim approved these changes Jan 11, 2021

Bodigrim requested a review from sjakobi January 11, 2021 18:43

sjakobi approved these changes Jan 11, 2021

noughtmare added 2 commits January 11, 2021 20:43

Add benchmarks for lazy functions depending on findIndexOrEnd

a960c7d

Deduplicate findIndexOrEnd by exporting it from Data.ByteString.Internal

a003f39

noughtmare force-pushed the master branch from 3015685 to a003f39 January 11, 2021 19:45

Bodigrim added this to the 0.11.1.0 milestone Jan 11, 2021

Bodigrim merged commit eea70ff into haskell:master Jan 11, 2021
