foldl' doens't inline on basic example #329

Boarders · 2020-11-25T23:29:17Z

I have the following code:

countBracketsBA :: ByteArray -> Int
countBracketsBA = foldlBA addBA 0
  where
    addBA :: Int -> Word8 -> Int
    addBA acc 40 = acc + 1
    addBA acc 41 = acc - 1
    addBA acc _  = acc

foldlBA :: forall a b. (Prim a) => (b -> a -> b) -> b -> ByteArray -> b
foldlBA f z arr = go 0 z
  where
    go i acc
      | i < maxI  = go (i + 1) (f acc (indexByteArray arr i)) 
      | otherwise = acc
    maxI = sizeofByteArray arr `quot` sizeOf (undefined :: a)

countBrackets :: ByteString -> Int
countBrackets = ByteString.foldl' addBS 0
  where
    addBS :: Int -> Word8 -> Int
    addBS acc 40 = acc + 1
    addBS acc 41 = acc - 1
    addBS acc _  = acc

I then run this with the following benchmark:

n :: Int
n = 100000

input :: [Word8]
input = replicate n 40 <> replicate n 41

getInputs :: IO (ByteArray.ByteArray, ByteString.ByteString)
getInputs =
  do
    let ba = ByteArray.byteArrayFromList input
    let bs = ByteString.pack input
    pure (ba, bs)

main :: IO ()
main = do
  defaultMain . pure $
    env getInputs $ \ ~(bytes, bs) ->
      bgroup "bracket count"
        [ bench "BA" $ nf countBracketsBA bytes
        , bench "BS" $ nf countBrackets   bs       
        ]

This gives the following result:

benchmarking bracket count/BA
time                 259.7 μs   (257.4 μs .. 261.8 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 258.9 μs   (257.5 μs .. 260.7 μs)
std dev              5.351 μs   (3.942 μs .. 8.183 μs)
variance introduced by outliers: 14% (moderately inflated)

benchmarking bracket count/BS
time                 1.302 ms   (1.265 ms .. 1.339 ms)
                     0.996 R²   (0.995 R² .. 0.999 R²)
mean                 1.263 ms   (1.252 ms .. 1.279 ms)
std dev              45.81 μs   (32.79 μs .. 63.27 μs)
variance introduced by outliers: 25% (moderately inflated)

Lyxia (in the fp slack) informed me that this is because foldl' was not being inlined and indeed if I add a definition in the same module (due to Lyxia):

myfoldl' :: (a -> Word8 -> a) -> a -> ByteString -> a
myfoldl' f v (PS fp off len) =
      accursedUnutterablePerformIO $ withForeignPtr fp $ \p ->
        go v (p `plusPtr` off) (p `plusPtr` (off+len))
    where
      -- tail recursive; traverses array left to right
      go !z !p !q | p == q    = return z
                  | otherwise = do x <- peek p
                                   go (f z x) (p `plusPtr` 1) q

then I get good performance:

benchmarking bracket count/BS inline
time                 177.8 μs   (176.4 μs .. 179.8 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 178.8 μs   (177.6 μs .. 182.9 μs)
std dev              6.526 μs   (2.579 μs .. 12.89 μs)
variance introduced by outliers: 34% (moderately inflated)

I'm not quite sure if this is an issue with GHC or bytestring but it doesn't seem ideal to have such behavior. This was tested with bytestring-0.10.10.1 and ghc-8.8.4 so I don't know if the same issue appears in the latest release nor the latest GHC.

The text was updated successfully, but these errors were encountered:

Bodigrim · 2020-11-25T23:34:10Z

Probably similar to #23. A PR rewriting fold{l,r}{,'} to the form foldl' f v = \(BS fp len) -> ... would be appreciated.

Boarders · 2020-11-25T23:41:30Z

I'll check that gives the desired performance and then give it a go.

Boarders · 2020-11-26T00:45:47Z

I found that the eta-expansion does indeed work but surprisingly the best performing version resulted from simply removing the INLINE pragma, I wonder if any experimentation has been done with that idea in bytestring?

Bodigrim · 2020-11-26T00:56:34Z

I have not experimented much with inlining in bytestring. It would be interesting to compare Core output for countBrackets.

{-# OPTIONS_GHC -ddump-simpl -dsuppress-all -dno-suppress-type-signatures #-}

Bodigrim added the pr-welcome label Dec 7, 2020

Boarders mentioned this issue Jan 13, 2021

Convert folds to take two arguments #345

Merged

Bodigrim linked a pull request Jan 13, 2021 that will close this issue

Convert folds to take two arguments #345

Merged

Bodigrim mentioned this issue Feb 20, 2021

Straighten folds and scans. #364

Merged

6 tasks

Bodigrim closed this as completed in #345 May 19, 2021

Bodigrim added this to the 0.11.2.0 milestone Aug 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

foldl' doens't inline on basic example #329

foldl' doens't inline on basic example #329

Boarders commented Nov 25, 2020

Bodigrim commented Nov 25, 2020

Boarders commented Nov 25, 2020

Boarders commented Nov 26, 2020

Bodigrim commented Nov 26, 2020

foldl' doens't inline on basic example #329

foldl' doens't inline on basic example #329

Comments

Boarders commented Nov 25, 2020

Bodigrim commented Nov 25, 2020

Boarders commented Nov 25, 2020

Boarders commented Nov 26, 2020

Bodigrim commented Nov 26, 2020