[All About Strictness - 12 Sep 2017 Michael Snoyman](https://www.fpcomplete.com/blog/2017/09/all-about-strictness)

# Bang!

Bang patterns are one way to force Haskell to be more strict in its evaluation.

In [1]:
{-# LANGUAGE BangPatterns #-}
add :: Int -> Int -> Int
add !x !y = x + y

main :: IO ()
main = do
  let !five = add (1 + 1) (1 + 2)
      !seven = add (1 + 2) (1 + 3) -- not used but evaluated immediately, no thunk here

  putStrLn $ "Five: " ++ show five
main

Five: 5

Bang patterns are just **syntactic sugar** for something else. In this case, that something else is the ***seq function***, which has this type:

`seq :: a -> b -> b`

In [2]:
:t seq

You could implement this type signature yourself, of course, by just ignoring the `a` value:

In [3]:
badseq :: a -> b -> b
badseq a b = b

However, seq **uses primitive operations from GHC itself** to ensure that, ***when b is evaluated, a is evaluated too***.
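To make the difference between `badseq` and the real `seq` visible, here is a small sketch. The `explodes` helper is my own (hypothetical) name, not part of any library: it evaluates a value to WHNF and reports whether an exception came out.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Ignores its first argument entirely, so nothing gets forced.
badseq :: a -> b -> b
badseq _ b = b

-- Hypothetical helper: True if evaluating the value to WHNF throws.
explodes :: a -> IO Bool
explodes x = either onErr (const False) <$> try (evaluate x)
  where
    onErr :: SomeException -> Bool
    onErr _ = True

main :: IO ()
main = do
  viaBadseq <- explodes (undefined `badseq` "ok")
  viaSeq    <- explodes (undefined `seq` "ok")
  print (viaBadseq, viaSeq) -- (False,True)
```

Only the real `seq` touches its first argument, so only that version hits `undefined`.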

In [4]:
add :: Int -> Int -> Int
add x y =
  let part1 = seq x part2
      part2 = seq y answer
      answer = x + y
   in part1   -- m: part1 refers transitively to answer
-- Or more idiomatically
add x y = x `seq` y `seq` x + y

add 3 7

10

In [5]:
main :: IO ()
main = do
  let five = add (1 + 1) (1 + 2)
      seven = add (1 + 2) (1 + 3)

  -- seq forces five and seven to WHNF, then yields the putStrLn action, which main runs as usual
  five `seq` seven `seq` putStrLn ("Five: " ++ show five)
main  

Five: 5

# Tracing evaluation

In [6]:
import Debug.Trace

add :: Int -> Int -> Int
add x y = x + y

main :: IO ()
main = do
  let five = trace "five" (add (1 + 1) (1 + 2))
      seven = trace "seven" (add (1 + 2) (1 + 3))

  putStrLn $ "Five: " ++ show five
--Five: five
--5


In [7]:
{-# LANGUAGE BangPatterns #-}
import Debug.Trace

add :: Int -> Int -> Int
add x y = x + y

main :: IO ()
main = do
  let !five = trace "five" (add (1 + 1) (1 + 2))
      !seven = trace "seven" (add (1 + 2) (1 + 3))

  putStrLn $ "Five: " ++ show five
  
  five `seq` seven `seq` putStrLn ("Five: " ++ show five)
  
--seven
--five
--Five: 5
--Five: 5

With bang patterns, the order of the trace output may be different than you expect. On my system, for example, seven prints before five. That's because GHC retains the right to rearrange order of evaluation in these cases.

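By contrast, if you use ``five `seq` seven `seq` putStrLn ("Five: " ++ show five)``, it should **come out in the same order** each time: first five, then seven, then "Five: 5". (Strictly speaking, `seq` only promises that both of its arguments are evaluated before it returns a value, not which one goes first; `pseq` from the parallel package is the function that guarantees order.) This gives **a bit of a lie** to my claim that bang patterns are always a **simple translation to seqs**.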

However, as long as your expressions are **truly pure**, you will be **unable to observe the difference** between the two.

In [8]:
{-# LANGUAGE BangPatterns #-}
import Debug.Trace

add :: Int -> Int -> Int
add x y = x + y

main :: IO ()
main = do
  let five = trace "five" (add (1 + 1) (1 + 2))
  let seven = trace "seven" (add (1 + 2) (1 + 3))
  five `seq` seven `seq` putStrLn ("Five: " ++ show five)
--five
--seven
--Five: 5

1. Haskell is lazy by default
1. You can use bang patterns and seq to make things strict
1. By contrast, in strict languages, you can **use closures to make things lazy**
1. You can **see if a function is strict** in its **arguments** by **passing in bottom (undefined)** and seeing if it explodes in your face
1. The **trace function** can help you see this as well
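The "pass in bottom" test from the list above can be sketched like this. `lazyConst`, `strictConst` and `isStrictInSecond` are names I made up for the illustration, and only the second argument is probed.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (SomeException, evaluate, try)

-- Never looks at its second argument.
lazyConst :: Int -> Int -> Int
lazyConst x _ = x

-- The bang pattern forces the second argument before returning.
strictConst :: Int -> Int -> Int
strictConst x !y = x

-- Hypothetical helper: feed undefined as the second argument and
-- report whether evaluating the result blows up.
isStrictInSecond :: (Int -> Int -> Int) -> IO Bool
isStrictInSecond f = either onErr (const False) <$> try (evaluate (f 1 undefined))
  where
    onErr :: SomeException -> Bool
    onErr _ = True

main :: IO ()
main = do
  isStrictInSecond lazyConst >>= print   -- False
  isStrictInSecond strictConst >>= print -- True
```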

### Implement an average function

We'll use a helper datatype, called `RunningTotal`, to capture both the cumulative sum and the number of elements we've seen so far.

In [9]:
data RunningTotal = RunningTotal
  { sum :: Int
  , count :: Int
  }

printAverage :: RunningTotal -> IO ()
printAverage (RunningTotal sum count)
  | count == 0 = error "Need at least one value!"
  | otherwise = print (fromIntegral sum / fromIntegral count :: Double)

-- | A fold would be nicer... we'll see that later
printListAverage :: [Int] -> IO ()
printListAverage =
  go (RunningTotal 0 0)
  where
    go rt [] = printAverage rt
    go (RunningTotal sum count) (x:xs) =
      let rt = RunningTotal (sum + x) (count + 1)
       in go rt xs

main :: IO ()
main = printListAverage [1..1000000]

main

500000.5

We're going to run this with runtime statistics turned on so we can look at memory usage:

`$ stack ghc average.hs && ./average +RTS -s`

Lo and behold, our memory usage is through the roof! The obvious fix is to force evaluation of the newly constructed `rt` (for example with a bang pattern, `let !rt = ...`) before recursing back into `go`. Unfortunately, this results in exactly the same memory usage as we had before!
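As a sketch of that attempted fix: the version below is my own reconstruction, using a pure `averageLeaky` function and positional fields instead of `printListAverage` and the record fields (both are assumptions, made so the example is self-contained). The bang sits on `rt` itself.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Positional fields (an assumption) to avoid clashing with Prelude's sum.
data RunningTotal = RunningTotal Int Int

-- Hypothetical pure variant of printListAverage with the attempted fix.
averageLeaky :: [Int] -> Double
averageLeaky = go (RunningTotal 0 0)
  where
    go (RunningTotal s c) []
      | c == 0 = error "Need at least one value!"
      | otherwise = fromIntegral s / fromIntegral c
    go (RunningTotal s c) (x:xs) =
      let !rt = RunningTotal (s + x) (c + 1) -- forces rt, nothing more
       in go rt xs

main :: IO ()
main = print (averageLeaky [1..1000000]) -- 500000.5
```

The answer is still correct; it's only the memory behaviour that doesn't improve. Why the bang on `rt` isn't enough is the subject of the next section.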

```
[1 of 1] Compiling Main             ( average.hs, average.o )
  Linking average ...
  500000.5
     258,654,528 bytes allocated in the heap
     339,889,944 bytes copied during GC
      95,096,512 bytes maximum residency (9 sample(s))
       1,148,312 bytes maximum slop
             164 MB total memory in use (0 MB lost due to fragmentation)
```

## Weak Head Normal Form
There's a [great Stack Overflow answer](https://stackoverflow.com/a/6889335/369198) on this topic.

In [10]:
main = putStrLn $ undefined `seq` "Hello World"
main

It will print an error about undefined, since it will try to evaluate undefined before it will evaluate "Hello World", and because **putStrLn** is ***strict in its argument***.

In [11]:
main = putStrLn $ Just undefined `seq` "Hello World"
main

Hello World

It turns out that when we talk about forcing evaluation with **seq**, we're only talking about **evaluating to weak head normal form (WHNF)**. For most data types, this means ***unwrapping one layer of constructor***.

`Just undefined`

In the case of `Just undefined`, it means that we unwrap the Just data constructor, but **don't touch the `undefined`** within it. 

With a standard data constructor, the impact of using seq is the same as ***pattern matching the outermost constructor***.

If you want to **monomorphise**, for example, you can implement a function of type 

`seqMaybe :: Maybe a -> b -> b` 

and use it in the main example above.

In [12]:
seqMaybe :: Maybe a -> b -> b
seqMaybe Nothing b = b
seqMaybe (Just _) b = b

main :: IO ()
main = do
  putStrLn $ Just undefined `seqMaybe` "Hello World"
  --putStrLn $ undefined `seqMaybe` "Goodbye!"
  
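  putStrLn $ error `seq` "Hello" -- error without arguments: any function applied to too few arguments is automatically in WHNF
  putStrLn $ (\x -> undefined) `seq` "World" -- a lambda is already in WHNF; its body is not evaluated until it is applied
  --putStrLn $ error "foo" `seq` "Goodbye!" -- throws: error is fully applied to its arguments, so it's no longer a function, it's a value
  --putStrLn $ undefined 5 `seq` "Hello" -- throws: forcing the application forces undefined
  --putStrLn $ (error "foo" :: Int -> Double) `seq` "Hello" -- throws: function-typed, but still an application of error
  --putStrLn $ (error "foo" :: Void -> ()) `seq` "Hello" -- throws for the same reason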
  
data Void  
:t undefined 5  
:t error
main  

Hello World
Hello
World

Instead of putting the bangs on the `RunningTotal` value, I'm putting them **on the values within the constructor**, forcing them to be evaluated on each iteration of the loop.

We're no longer accumulating a huge chain of thunks, and our maximum residency drops to 44kb. (Total allocations, though, are still up around 192mb. We need to play around with other optimizations outside the scope of this post to deal with the total allocations, so we're going to ignore this value for the rest of the examples.)

In [13]:
go rt [] = printAverage rt
go (RunningTotal !sum !count) (x:xs) =   -- <=== !sum !count
  let rt = RunningTotal (sum + x) (count + 1)
   in go rt xs

Alternative approach:

This version **forces evaluation** of the new sum and count **before constructing the new RunningTotal value**. I like this version a bit more, as it's forcing evaluation at the correct point: when creating the value, ***instead of on the next iteration of the loop when destructing it***.

Moral of the story: make sure you're evaluating the thing you actually need to evaluate, ***not just its container***!

In [14]:
-- alternative
go rt [] = printAverage rt
go (RunningTotal sum count) (x:xs) =
  let !sum' = sum + x
      !count' = count + 1
      rt = RunningTotal sum' count'
   in go rt xs
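For reference, here is the alternative `go` dropped into a complete program. This is a minimal sketch: the pure `averageStrict` name and the positional fields are my own choices, not from the original post.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Positional fields (an assumption) to avoid clashing with Prelude's sum.
data RunningTotal = RunningTotal Int Int

-- Hypothetical pure variant: the new sum and count are forced
-- before they are wrapped in the constructor.
averageStrict :: [Int] -> Double
averageStrict = go (RunningTotal 0 0)
  where
    go (RunningTotal s c) []
      | c == 0 = error "Need at least one value!"
      | otherwise = fromIntegral s / fromIntegral c
    go (RunningTotal s c) (x:xs) =
      let !s' = s + x
          !c' = c + 1
       in go (RunningTotal s' c') xs

main :: IO ()
main = print (averageStrict [1..1000000]) -- 500000.5
```

The fields themselves are still lazy here; it's only the way `go` builds each new value that keeps thunks from piling up.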

## deepseq

Sometimes we want to fully evaluate a value down to normal form (NF), meaning all thunks inside it have been evaluated.

There's a semi-standard (meaning it ships with GHC) library to handle this: deepseq. It works by providing an **`NFData`** type class that defines how to reduce a value to normal form (via the **`rnf`** method).
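A quick way to see WHNF versus NF side by side is to hide an `undefined` inside a `Just`. This is a sketch; `explodes` is a hypothetical helper of mine that reports whether evaluating a value to WHNF throws.

```haskell
import Control.DeepSeq (deepseq)
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical helper: True if evaluating the value to WHNF throws.
explodes :: a -> IO Bool
explodes x = either onErr (const False) <$> try (evaluate x)
  where
    onErr :: SomeException -> Bool
    onErr _ = True

main :: IO ()
main = do
  let boxed = Just (undefined :: Int)
  whnfOnly <- explodes (boxed `seq` ())     -- stops at the Just constructor
  fullNF   <- explodes (boxed `deepseq` ()) -- digs into the Int inside
  print (whnfOnly, fullNF) -- (False,True)
```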

In [15]:
import Control.DeepSeq

instance NFData RunningTotal where
  rnf (RunningTotal sum count) = sum `deepseq` count `deepseq` ()
  
printListAverage :: [Int] -> IO ()
printListAverage =
  go (RunningTotal 0 0)
  where
    go rt [] = printAverage rt
    go (RunningTotal sum count) (x:xs) =
      let rt = RunningTotal (sum + x) (count + 1)
       in rt `deepseq` go rt xs  -- <=================

In [17]:
-- or with the default (Generic-based) implementation
{-# LANGUAGE DeriveGeneric #-}
import Control.DeepSeq (NFData)
import GHC.Generics (Generic)

data RunningTotal = RunningTotal
  { sum :: Int
  , count :: Int
  }
  deriving Generic
instance NFData RunningTotal

We can use this not only to avoid space leaks (as we're doing here), but also **to avoid accidentally including exceptions inside thunks within a value**. For an example of that, check out the tryAnyDeep function from the safe-exceptions library.
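To illustrate the exceptions-inside-thunks problem without pulling in safe-exceptions, here is a rough sketch using only base and deepseq. `tryDeepSketch` is a made-up name and is not the library's implementation; in particular it catches every `SomeException`, which a careful library would not do for asynchronous exceptions.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical helper: run the action, then fully evaluate its result
-- inside the try, so exceptions hiding in thunks surface here.
tryDeepSketch :: NFData a => IO a -> IO (Either SomeException a)
tryDeepSketch action = try (action >>= evaluate . force)

main :: IO ()
main = do
  let xs = [1, 2, div 1 0 :: Int] -- the third element is a thunk that throws
  -- evaluate only goes to WHNF (the first cons cell), so this "succeeds"
  shallow <- try (evaluate xs) :: IO (Either SomeException [Int])
  deep <- tryDeepSketch (pure xs)
  putStrLn (either (const "shallow: caught") (const "shallow: escaped") shallow)
  putStrLn (either (const "deep: caught") (const "deep: escaped") deep)
```

The shallow version hands back a `Right` with the exception still lurking inside the list, while the deep version reports it as a `Left`.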

## Strict data