Reimplement splits #349

treeowl · 2019-12-29T22:04:11Z

Using Data.List.inits to produce the initial segments saves
quite a bit of time.

Using `Data.List.inits` to produce the initial segments saves quite a bit of time.

moodmosaic

Using AutoBench, I've compared the time perf of the two implementations and the result looks good.

fast_splits is the proposed one, splits is the current one.

 ―― Test summary ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

  Programs       fast_splits, splits
  Data           Random, size range [0,5..100]
  Normalisation  nf
  QuickCheck     ✔
  GHC flags      n/a

 ―― Analysis ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

  fast_splits
    Size          0       5       10      15      20      25      30      35
                  40      45      50      55      60      65      70      75
                  80      85      90      95      100
    Time    (μs)  0.031   0.616   2.533   4.202   5.301   7.294   15.82   15.95
                  17.88   22.98   26.58   33.20   35.82   44.96   57.78   111.0
                  118.4   124.9   202.4   272.8   271.3
    Std dev (μs)  16.85
    Average variance introduced by outliers: 98% (severely inflated)

    Fits          y = 3.49e-16 + 1.80e-14x + 8.75e-13x² + 3.35e-11x³
                      + 2.56e-12x⁴
                  y = -7.62e-12 - 3.42e-10x - 1.16e-8x² + 3.96e-10x³
                  y = -3.87e-5 + 2.01e-10x + 2.82e-8x²

  splits
    Size          0       5       10      15      20      25      30      35
                  40      45      50      55      60      65      70      75
                  80      85      90      95      100
    Time    (μs)  0.016   0.847   2.865   5.787   7.992   11.59   15.40   20.79
                  33.53   41.96   40.39   57.93   70.99   77.91   145.4   213.9
                  222.8   152.8   426.5   278.9   296.1
    Std dev (μs)  34.35
    Average variance introduced by outliers: 98% (severely inflated)

    Fits          y = -2.90e-5 + 2.85e-10x + 3.65e-8x²
                  y = -5.65e-5 + 5.21e-7xlog₂(x)
                  y = 6.63e-12 + 3.08e-10x + 1.05e-8x² + 2.51e-10x³

 ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Unless @HuwCampbell or @jacobstanley has a different opinion, we can merge this after addressing the style nit comment above.

hedgehog/src/Hedgehog/Internal/Tree.hs

treeowl · 2020-01-03T15:14:50Z

I'm not into that style guide. Do what you like when you merge.

…

On Fri, Jan 3, 2020, 9:34 AM Nikos Baxevanis ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In hedgehog/src/Hedgehog/Internal/Tree.hs <#349 (comment)> : > @@ -320,12 +321,13 @@ mapMaybeT p m = -- ] -- splits :: [a] -> [([a], a, [a])] -splits = \case - [] -> - [] - x : xs -> - ([], x, xs) : - fmap (\(as, b, cs) -> (x : as, b, cs)) (splits xs) +-- This implementation is significantly more efficient than the others I've +-- tried when the lists are long, thanks to the carefully optimized +-- implementation of Data.List.inits. +splits = \xs -> go (List.inits xs) xs + where + go (front : fronts) (x : xs) = (front, x, xs) : go fronts xs + go _ _ = [] (If not, we can do that instead.) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#349?email_source=notifications&email_token=AAOOF7PYXMPW6VRHLWWC3ALQ35EH7A5CNFSM4KBDHYAKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCQTRO7Y#discussion_r362829257>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAOOF7I6FLWWLEA5QUI2UUDQ35EH7ANCNFSM4KBDHYAA> .

…into treeowl-splits

treeowl · 2020-01-03T22:43:07Z

@moodmosaic I don't think your benchmark has enough data to really support those regression curves. There's too much noise, and presumably heavy cache effects on the low end. It all looks good for my change, but I think you'll see more pronounced/reliable effects with larger list sizes. Moreover, a cubic fit (as shown in the graph) doesn't seem to have any obvious theoretical support; I'd expect the reality to be approximately piecewise quadratic.

moodmosaic · 2020-01-03T22:48:09Z

Yep, agreed, FYI it's tested with sizes up to 100, and the cubit as well as the NF were just the defaults. I just wanted to have something to refer to, if needed. Better from nothing, I suppose.

treeowl and others added 2 commits December 29, 2019 17:02

Reimplement splits

2951997

Using `Data.List.inits` to produce the initial segments saves quite a bit of time.

Some doc on the bounds of Size (hedgehogqa#38) (hedgehogqa#346)

e2a3c5c

moodmosaic approved these changes Jan 3, 2020

View reviewed changes

hedgehog/src/Hedgehog/Internal/Tree.hs Outdated Show resolved Hide resolved

Merge branch 'splits' of https://github.com/treeowl/haskell-hedgehog …

ca90940

…into treeowl-splits

Style nits

a50da80

moodmosaic merged commit b143363 into hedgehogqa:master Jan 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reimplement splits #349

Reimplement splits #349

treeowl commented Dec 29, 2019

moodmosaic left a comment

treeowl commented Jan 3, 2020 via email

treeowl commented Jan 3, 2020

moodmosaic commented Jan 3, 2020

Reimplement splits #349

Reimplement splits #349

Conversation

treeowl commented Dec 29, 2019

moodmosaic left a comment

Choose a reason for hiding this comment

treeowl commented Jan 3, 2020 via email

treeowl commented Jan 3, 2020

moodmosaic commented Jan 3, 2020