Optimize folds #273

Merged: 8 commits merged into haskell:master from optimize-folds, Sep 9, 2020

Conversation

@Bodigrim (Contributor) commented Aug 27, 2020

This is #151 rebased; see the explanations there. Closes #206. CC @sergv

@Bodigrim (Contributor Author)

Benchmarking proved too tempting. Here are the results, and I think they are hilarious! foldl' is roughly 5-10% faster and foldr' is 10-15% faster. Next, filter is 15-20% faster, mapAccumL, mapAccumR and scanl are 20-25% faster, while scanr is 25-30% faster. It would be fantastic to merge this before the 0.11.0.0 release.

Benchmark          Before     After    Speedup
----------------------------------------------
foldl'/96         328.1 ns   328.0 ns   0.03%
foldl'/192        694.2 ns   671.5 ns   3.27%
foldl'/384        1.398 μs   1.333 μs   4.65%
foldl'/768        2.786 μs   2.658 μs   4.59%
foldl'/1536       5.556 μs   5.323 μs   4.19%
foldl'/3072       11.08 μs   10.43 μs   5.87%
foldl'/6144       22.15 μs   20.94 μs   5.46%
foldl'/12288      44.73 μs   41.36 μs   7.53%
foldl'/24576      89.31 μs   82.52 μs   7.60%
foldl'/49152      180.7 μs   164.2 μs   9.13%

foldr'/96         386.3 ns   339.9 ns  12.01%
foldr'/192        742.6 ns   658.3 ns  11.35%
foldr'/384        1.452 μs   1.297 μs  10.67%
foldr'/768        2.879 μs   2.569 μs  10.77%
foldr'/1536       5.746 μs   5.123 μs  10.84%
foldr'/3072       11.54 μs   10.33 μs  10.49%
foldr'/6144       23.23 μs   20.32 μs  12.53%
foldr'/12288      46.08 μs   40.24 μs  12.67%
foldr'/24576      92.97 μs   80.99 μs  12.89%
foldr'/49152      187.0 μs   162.0 μs  13.37%

filter/96         421.7 ns   352.0 ns  16.53%
filter/192        807.4 ns   670.6 ns  16.94%
filter/384        1.587 μs   1.298 μs  18.21%
filter/768        3.146 μs   2.618 μs  16.78%
filter/1536       6.314 μs   5.136 μs  18.66%
filter/3072       12.45 μs   10.20 μs  18.07%
filter/6144       24.72 μs   20.28 μs  17.96%
filter/12288      49.51 μs   40.25 μs  18.70%
filter/24576      98.68 μs   80.45 μs  18.47%
filter/49152      201.1 μs   161.9 μs  19.49%

mapAccumL/96      528.3 ns   402.5 ns  23.81%
mapAccumL/192     1.023 μs   777.7 ns  23.98%
mapAccumL/384     2.021 μs   1.508 μs  25.38%
mapAccumL/768     4.004 μs   3.033 μs  24.25%
mapAccumL/1536    8.027 μs   6.032 μs  24.85%
mapAccumL/3072    16.02 μs   11.97 μs  25.28%
mapAccumL/6144    32.08 μs   23.72 μs  26.06%
mapAccumL/12288   64.24 μs   47.95 μs  25.36%
mapAccumL/24576   125.7 μs   95.53 μs  24.00%
mapAccumL/49152   258.2 μs   195.4 μs  24.32%

mapAccumR/96      512.7 ns   397.0 ns  22.57%
mapAccumR/192     989.2 ns   761.9 ns  22.98%
mapAccumR/384     1.937 μs   1.506 μs  22.25%
mapAccumR/768     3.951 μs   2.995 μs  24.20%
mapAccumR/1536    7.999 μs   5.944 μs  25.69%
mapAccumR/3072    15.54 μs   11.76 μs  24.32%
mapAccumR/6144    31.44 μs   23.33 μs  25.80%
mapAccumR/12288   61.30 μs   46.41 μs  24.29%
mapAccumR/24576   120.9 μs   93.71 μs  22.49%
mapAccumR/49152   246.2 μs   194.0 μs  21.20%

scanl/96          477.3 ns   373.0 ns  21.85%
scanl/192         925.4 ns   702.9 ns  24.04%
scanl/384         1.830 μs   1.389 μs  24.10%
scanl/768         3.557 μs   2.760 μs  22.41%
scanl/1536        7.205 μs   5.453 μs  24.32%
scanl/3072        14.24 μs   11.06 μs  22.33%
scanl/6144        28.79 μs   22.24 μs  22.75%
scanl/12288       56.75 μs   43.64 μs  23.10%
scanl/24576       115.1 μs   86.57 μs  24.79%
scanl/49152       239.4 μs   181.1 μs  24.35%

scanr/96          469.5 ns   342.8 ns  26.99%
scanr/192         915.0 ns   653.5 ns  28.58%
scanr/384         1.797 μs   1.285 μs  28.49%
scanr/768         3.606 μs   2.539 μs  29.59%
scanr/1536        7.133 μs   5.110 μs  29.36%
scanr/3072        14.17 μs   10.08 μs  28.86%
scanr/6144        28.16 μs   20.22 μs  28.20%
scanr/12288       57.23 μs   40.76 μs  28.78%
scanr/24576       112.5 μs   81.62 μs  27.45%
scanr/49152       235.4 μs   165.6 μs  29.65%

@chessai (Member) commented Aug 28, 2020 via email

@sjakobi (Member) left a comment

Thanks for picking this up, @Bodigrim! :)

The speed-ups are amazing!

IIUC all these changes do is lift static arguments out of the inner worker functions. Isn't this something that GHC ought to be able to do automatically?! Is this a compiler bug?

I noticed that you didn't add benchmarks for the following functions:

  • foldl
  • foldr
  • any
  • all
  • spanByte
  • splitWith
  • zipWith'
  • findIndexOrEnd

No idea whether any of these need additional scrutiny though.

Regarding correctness, I assume that all the changed functions have reasonable test coverage, but I haven't checked.

BTW, how did you produce the table of benchmark results, @Bodigrim? This is a very nice format.

@Bodigrim (Contributor Author)

> Isn't this something that GHC ought to be able to do automatically?!

I do not expect GHC to be able to do this automatically, certainly not when the constant arguments are the trailing ones. The closest optimization I know of is call-pattern specialization, but it does not cover our case.
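For reference, call-pattern specialization (SpecConstr) specializes a recursive function for the constructors appearing at its call sites; it does nothing for an argument that is merely passed along unchanged. A minimal illustration (a hypothetical example, not code from this PR):

-- 'go' is always called with a Just constructor in its first argument,
-- so SpecConstr can generate a specialised worker that takes the field
-- directly. It would not help an argument that is simply the same on
-- every call, which is the situation in this PR.
sumTo :: Int -> Int
sumTo n = go (Just 0) n
  where
    go Nothing  _ = 0
    go (Just a) i
      | i == 0    = a
      | otherwise = go (Just (a + i)) (i - 1)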

One can check the Core generated for these functions:

{-# LANGUAGE BangPatterns #-}

module Lib (countFromTo, countFromTo') where

countFromTo :: Int -> Int -> Int
countFromTo !from !to = go 0 from to
  where
    go !acc !f !t
      | f == t = acc
      | otherwise = go (acc + 1) (f + 1) t

countFromTo' :: Int -> Int -> Int
countFromTo' !from !to = go 0 from
  where
    go !acc !f
      | f == to = acc
      | otherwise = go (acc + 1) (f + 1)

The inner loop of countFromTo becomes

$wgo :: Int# -> Int# -> Int# -> Int#
$wgo
  = \ (ww_s1xz :: Int#) (ww1_s1xD :: Int#) (ww2_s1xH :: Int#) ->
      case ==# ww1_s1xD ww2_s1xH of {
        __DEFAULT -> $wgo (+# ww_s1xz 1#) (+# ww1_s1xD 1#) ww2_s1xH;
        1# -> ww_s1xz
      }

Notice that ww2_s1xH is still passed around on every iteration; no magic optimization happens.

> I noticed that you didn't add benchmarks for the following functions:

It does not make much sense to benchmark the lazy foldl and foldr. The other functions require a careful choice of input strings to avoid short-circuiting, and several variants of predicates: should we benchmark any with the predicate (== 99), or (/= 99), or odd, etc.? I think the provided benchmarks are representative enough to show that it is certainly beneficial to pass fewer arguments recursively. (I can add more benchmarks if someone insists on additional scrutiny, of course.)
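For illustration, such a benchmark might look as follows (a sketch against the criterion API; the predicates and the all-zero inputs are exactly the arbitrary choices in question):

import Criterion.Main (bench, bgroup, defaultMain, nf)
import qualified Data.ByteString as B

main :: IO ()
main = defaultMain
  [ bgroup ("any/" ++ show n)
      [ bench "(== 99)" $ nf (B.any (== 99)) s  -- never short-circuits on all-zero input
      , bench "(/= 99)" $ nf (B.any (/= 99)) s  -- short-circuits on the very first byte
      , bench "odd"     $ nf (B.any odd) s
      ]
  | n <- [96, 192, 384, 768]
  , let s = B.replicate n 0
  ]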

> Regarding correctness, I assume that all the changed functions have reasonable test coverage, but I haven't checked.

There are tests, but I still encourage reviewers to verify the changes closely.

> BTW, how did you produce the table of benchmark results, @Bodigrim? This is a very nice format.

By hand :)

@Bodigrim (Contributor Author) commented Sep 5, 2020

Anyone else willing to take a look please? @hsyl20 @vdukhovni @fumieval

Several review threads on Data/ByteString.hs were resolved during review. The thread below concerns the zipWith' implementation:
y <- peekByteOff p2 n
pokeByteOff r n (f x y)
zipWith_ (n+1) p1 p2 r
go p1 p2 = zipWith_ 0
A Contributor commented:

A general question: under what conditions is it actually better to avoid passing fixed arguments on the stack in recursive calls, and instead to allocate a closure around the fixed arguments, avoiding the copy into each stack frame?

I must admit I don't really understand what the compiler does here, but if the new version has to perform closure allocations that the old version does not, then perhaps the cost of carrying the extra arguments around is worth the ultimate reduction in GC pressure.

If the benchmarks only measure speed, but don't measure GC pressure, we may not be doing the right thing here (and elsewhere...). The benchmarks would have to run long enough to take the GC overhead into account...
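One way to sanity-check the allocation concern empirically (a sketch, not part of this PR; it assumes the program is run with +RTS -T so that GHC.Stats is populated):

import Control.Monad (forM_)
import qualified Data.ByteString as B
import GHC.Stats (allocated_bytes, getRTSStats)

-- Build with -O2 and run with: ./Main +RTS -T
main :: IO ()
main = do
  let strings = [B.replicate 16 (fromIntegral i) | i <- [1 :: Int .. 100000]]
  before <- allocated_bytes <$> getRTSStats
  forM_ strings $ \s -> B.foldl' (+) 0 s `seq` pure ()
  after <- allocated_bytes <$> getRTSStats
  print (after - before)  -- total bytes allocated by the loop (list construction included)

Comparing this number for the old and new definitions of foldl' would show whether the SAT'd version allocates more on short strings.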

@Bodigrim (Contributor Author) replied:

I think it is always preferable to avoid passing fixed arguments (unless it is a particularly degenerate case). An additional closure (if any) is allocated only once per call, which is not a big deal at all, but the additional arguments are passed once per element of the string (49152 times per call for the longest benchmarked inputs).

A Contributor replied:

Closure allocation is, however, much more expensive than pushing a parameter onto the stack, so for sufficiently short strings the savings may be small, and the ultimate GC costs non-negligible if one processes a lot of short strings, or even a lot of strings, period. Sometimes it is better to pay the price as you go than to accumulate a lot of garbage to clean up later, and stall.

Mind you, I'd like to hear from @hsyl20 et al. on what the actual tradeoffs are here. I don't yet have a clear enough sense of when the various costs and benefits come into play...

A Contributor replied:

Yes, closure allocation can be costly. I have a patch for containers that removes this kind of closure for lookup in IntMap.

In this specific zipWith' case, we don't allocate any closure, as far as I can tell from looking at the STG: create and zipWith_ are inlined and transformed into join points.

@maoe (Member) commented Sep 7, 2020

>> Isn't this something that GHC ought to be able to do automatically?!

> I do not expect GHC to be able to do this automatically, certainly not when the constant arguments are the trailing ones. The closest optimization I know of is call-pattern specialization, but it does not cover our case.

This optimization is called the Static Argument Transformation (SAT). It is implemented in GHC, but it is only applied to functions that have two or more static arguments.

There are performance measurements and some discussion in Compilation by transformation for non-strict functional languages by André Santos. As you can see in the thesis, SAT is not always a win.

EDIT: Fixed the broken link to the thesis.
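For readers unfamiliar with the transformation: SAT rewrites a recursive function so that arguments that never change across the recursive calls are captured once instead of being passed on every call. The textbook illustration is map:

-- Before SAT: f is passed, unchanged, on every recursive call.
mapPlain :: (a -> b) -> [a] -> [b]
mapPlain f []     = []
mapPlain f (x:xs) = f x : mapPlain f xs

-- After SAT: f is a free variable of the local worker, so each iteration
-- passes one argument fewer, and mapSAT can inline at call sites,
-- specialising the loop to the concrete f.
mapSAT :: (a -> b) -> [a] -> [b]
mapSAT f = go
  where
    go []     = []
    go (x:xs) = f x : go xs

The possible downside discussed in the thesis is that the worker may need a closure to capture the static arguments, which is where the allocation concern above comes from.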

@vdukhovni (Contributor)

>>> Isn't this something that GHC ought to be able to do automatically?!

>> I do not expect GHC to be able to do this automatically, certainly not when the constant arguments are the trailing ones. The closest optimization I know of is call-pattern specialization, but it does not cover our case.

> This optimization is called the Static Argument Transformation (SAT). It is implemented in GHC, but it is only applied to functions that have two or more static arguments.

> There are performance measurements and some discussion in Compilation by transformation for non-strict functional languages by André Santos. As you can see in the thesis, SAT is not always a win.

Thanks (though the above link no longer works, I was able to find another copy). It raises in more detail some of the concerns that I was only able to guess at. I don't know what has actually happened in this regard in GHC in the ~20 years since that work, and would be interested in learning more. Bottom line: I'd like to have a "better theory" for this PR, so that we fully understand the costs and benefits (unless there somehow is no cost and it is all "benefit").

@Bodigrim (Contributor Author) commented Sep 7, 2020

OK, so according to Santos' thesis, the only downside of SAT is a suspected increase in heap allocations when we run a function many times on short inputs.

> I don't know what has actually happened in this regard in GHC in the ~20 years since that work, and would be interested in learning more.

Join point optimization comes to the rescue. For example, foldl' (\acc x -> acc + fromIntegral x) 0 s, where foldl' is defined as in this PR, compiles to

foo :: ByteString -> Int
foo
  = \ (w_svf0 :: ByteString) ->
      case w_svf0 of { BS ww1_svf3 ww2_svf4 ww3_svf5 ->
      let {
        end_svdU :: Addr#
        end_svdU = plusAddr# ww1_svf3 ww3_svf5 } in
      join {
        exit_Xj :: Int# -> State# RealWorld -> Int
        exit_Xj (ww4_sveT :: Int#) (w1_sveQ :: State# RealWorld)
          = case touch# ww2_svf4 w1_sveQ of { __DEFAULT -> I# ww4_sveT } } in
      joinrec {
        $wgo_sveZ :: Int# -> Addr# -> State# RealWorld -> Int
        $wgo_sveZ (ww4_sveT :: Int#)
                  (ww5_sveX :: Addr#)
                  (w1_sveQ :: State# RealWorld)
          = case eqAddr# ww5_sveX end_svdU of {
              __DEFAULT ->
                case readWord8OffAddr# ww5_sveX 0# w1_sveQ of
                { (# ipv1_a6d7, ipv2_a6d8 #) ->
                jump $wgo_sveZ
                  (+# ww4_sveT (word2Int# ipv2_a6d8))
                  (plusAddr# ww5_sveX 1#)
                  ipv1_a6d7
                };
              1# -> jump exit_Xj ww4_sveT w1_sveQ
            }; } in
      jump $wgo_sveZ 0# ww1_svf3 realWorld#
      }

Notice the join and joinrec bindings instead of lets, and the jumps instead of function calls.
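For context: a join point is a local binding that is only ever tail-called, so GHC compiles it to a labelled block and each call to a jump, with no closure allocation. At the source level the situation looks roughly like this (an illustrative sketch, not code from this PR):

-- 'k' occurs only in tail position in both branches, so GHC can turn the
-- let into a join point: no closure is allocated for k, and both calls
-- compile to jumps into the same block.
clamp :: Int -> Int
clamp x =
  let k y = 2 * y + 1
  in if x > 100 then k 100 else k x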

@Bodigrim (Contributor Author) commented Sep 7, 2020

And here are benchmarks for short strings. The new version is again consistently faster:

Benchmark          Before     After   
------------------------------------
foldl'/1          12.22 ns  9.253 ns 
foldl'/2          16.06 ns  12.39 ns 
foldl'/4          23.30 ns  18.45 ns 
foldl'/8          42.77 ns  32.32 ns 
foldl'/16         78.52 ns  61.57 ns 
foldl'/32         136.0 ns  110.7 ns 
foldl'/64         253.2 ns  205.0 ns 
foldr'/1          12.42 ns  9.727 ns 
foldr'/2          15.99 ns  12.77 ns 
foldr'/4          23.85 ns  18.95 ns 
foldr'/8          43.01 ns  33.50 ns 
foldr'/16         78.72 ns  61.53 ns 
foldr'/32         134.2 ns  112.5 ns 
foldr'/64         251.8 ns  214.1 ns 
mapAccumL/1       24.08 ns  19.07 ns 
mapAccumL/2       30.70 ns  22.93 ns 
mapAccumL/4       41.47 ns  31.18 ns 
mapAccumL/8       67.68 ns  46.85 ns 
mapAccumL/16      107.2 ns  83.84 ns 
mapAccumL/32      188.0 ns  144.7 ns 
mapAccumL/64      343.3 ns  268.8 ns 
mapAccumR/1       24.67 ns  20.64 ns 
mapAccumR/2       30.21 ns  24.52 ns 
mapAccumR/4       41.00 ns  31.47 ns 
mapAccumR/8       66.84 ns  47.55 ns 
mapAccumR/16      106.2 ns  85.31 ns 
mapAccumR/32      186.8 ns  145.1 ns 
mapAccumR/64      337.9 ns  270.4 ns 
scanl/1           23.33 ns  19.55 ns 
scanl/2           28.32 ns  23.14 ns 
scanl/4           38.07 ns  30.99 ns 
scanl/8           62.14 ns  48.54 ns 
scanl/16          99.18 ns  82.06 ns 
scanl/32          174.0 ns  139.0 ns 
scanl/64          316.2 ns  251.5 ns 
scanr/1           23.21 ns  20.04 ns 
scanr/2           28.15 ns  23.43 ns 
scanr/4           37.48 ns  30.66 ns 
scanr/8           55.57 ns  50.68 ns 
scanr/16          98.10 ns  78.62 ns 
scanr/32          170.6 ns  131.7 ns 
scanr/64          311.5 ns  239.3 ns 
filter/1          30.60 ns  29.08 ns 
filter/2          35.00 ns  33.74 ns 
filter/4          42.83 ns  39.42 ns 
filter/8          58.78 ns  52.38 ns 
filter/16         87.78 ns  78.67 ns 
filter/32         150.2 ns  131.2 ns 
filter/64         283.7 ns  243.8 ns 

@vdukhovni (Contributor)

> OK, so according to Santos' thesis, the only downside of SAT is a suspected increase in heap allocations when we run a function many times on short inputs.

>> I don't know what has actually happened in this regard in GHC in the ~20 years since that work, and would be interested in learning more.

> Join point optimization comes to the rescue. For example, foldl' (\acc x -> acc + fromIntegral x) 0 s, where foldl' is defined as in this PR, compiles to

Thanks for that, some clarity is emerging. The only remaining question for me is whether all of the modified functions fall into this case, or whether for some reason some slip through the cracks.

@Bodigrim (Contributor Author) commented Sep 7, 2020

> The only remaining question for me is whether all of the modified functions fall into this case, or whether for some reason some slip through the cracks.

  • Changes to foldl and foldr save one static argument and do not introduce additional closures.
  • For foldl', foldr', any, all, spanByte, filter, and findIndexOrEnd, all introduced closures are join points (and we save one static argument).
  • We save two static arguments in mapAccumL, mapAccumR, scanl, scanr, and zipWith'. So, as explained in Santos's thesis, the SAT optimization is certainly beneficial (I am not sure why GHC decided not to apply it automatically).
  • Finally, for splitWith we save one static argument and all additional closures are join points. Both versions of the function allocate one letrec closure, but previously it had 7 arguments and now only 4.
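To make the shape of these changes concrete, here is an illustrative before/after sketch of a pointer-based byte fold (a simplified example, not the actual patch; in the real foldl' only one argument is saved):

{-# LANGUAGE BangPatterns #-}

import Data.Word (Word8)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek)

-- Before: the static arguments f and end are threaded through every call.
foldPtrBefore :: (a -> Word8 -> a) -> a -> Ptr Word8 -> Ptr Word8 -> IO a
foldPtrBefore f0 z0 p0 end0 = go f0 z0 p0 end0
  where
    go f !z !p end
      | p == end  = pure z
      | otherwise = do
          w <- peek p
          go f (f z w) (p `plusPtr` 1) end

-- After: f and end are free variables of go, so every recursive call
-- passes two arguments fewer.
foldPtrAfter :: (a -> Word8 -> a) -> a -> Ptr Word8 -> Ptr Word8 -> IO a
foldPtrAfter f z0 p0 end = go z0 p0
  where
    go !z !p
      | p == end  = pure z
      | otherwise = do
          w <- peek p
          go (f z w) (p `plusPtr` 1)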

@vdukhovni (Contributor)

>> The only remaining question for me is whether all of the modified functions fall into this case, or whether for some reason some slip through the cracks.

>   • Changes to foldl and foldr save one static argument and do not introduce additional closures.
>   • For foldl', foldr', any, all, spanByte, filter, and findIndexOrEnd, all introduced closures are join points (and we save one static argument).
>   • We save two static arguments in mapAccumL, mapAccumR, scanl, scanr, and zipWith'. So, as explained in Santos's thesis, the SAT optimization is certainly beneficial (I am not sure why GHC decided not to apply it automatically).
>   • Finally, for splitWith we save one static argument and all additional closures are join points. Both versions of the function allocate one letrec closure, but previously it had 7 arguments and now only 4.

Cool. I'm sold. Much appreciated.

@sjakobi (Member) commented Sep 8, 2020

Thanks for the background info and investigation, everyone! :)

Ping @hsyl20!

@vdukhovni (Contributor)

> Thanks for the background info and investigation, everyone! :)

> Ping @hsyl20!

Indeed, it would now perhaps be interesting to know whether the compiler has enough smarts to notice that these values are just passed around unmodified (even at the ends of the argument list), and could or should perhaps perform the same optimisation automatically in the cases where it already knows it can use join points, so that there is no associated closure-allocation cost.

@hsyl20 (Contributor) left a comment

Thanks a lot for this work! LGTM

@Bodigrim (Contributor Author) commented Sep 9, 2020

All credit goes to @sergv. Thanks, @sergv!

@Bodigrim Bodigrim merged commit 35796e7 into haskell:master Sep 9, 2020
@Bodigrim Bodigrim deleted the optimize-folds branch September 9, 2020 17:25
@Bodigrim Bodigrim added this to the 0.11.0.0 milestone Sep 9, 2020
@treeowl commented Sep 27, 2020

Do these all/any rules actually fire? RULES matching on operator sections (especially of the (== a) variety) smell flaky, and matching on class methods (== as opposed to eqWord8) has, at least historically, been troublesome.

@Bodigrim (Contributor Author)

@treeowl They do.

{-# OPTIONS_GHC -ddump-rule-firings #-}

module AnyZero (anyZero) where

import Prelude hiding (any)
import Data.ByteString (ByteString, any)

anyZero :: ByteString -> Bool
anyZero = any (== 0)

{-
Rule fired: Class op fromInteger (BUILTIN)
Rule fired: integerToWord (BUILTIN)
Rule fired: narrow8Word# (BUILTIN)
Rule fired: Class op == (BUILTIN)
Rule fired: ByteString specialise any (== x) (Data.ByteString)

==================== Tidy Core ====================
...
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
anyZero :: ByteString -> Bool
anyZero = anyByte anyZero1
-}
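The rule that fires above has roughly the following shape (a paraphrase; the exact text lives in Data/ByteString.hs). anyByte is the internal single-byte specialisation that the Tidy Core output ends up calling:

{-# RULES
"ByteString specialise any (== x)" forall x.
    any (== x) = anyByte x
  #-}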

@treeowl commented Sep 27, 2020

Nice surprise!

@treeowl commented Sep 27, 2020

I remember rules on class methods not only not firing, but also somehow screwing up other optimizations. I guess they fixed it.

@chessai (Member) commented Sep 27, 2020 via email

@Bodigrim (Contributor Author)

Cf. #70.

Successfully merging this pull request may close these issues: Optimization potential in the folding functions (#206)