### Chapter 8: Writing Fast Code

The rest of this chapter covers important aspects of writing fast code. We will walk through several ways to sum the first $n$ whole numbers, looking for the fastest. In each case, we will write a function (called `sum1`, `sum2`, ...), test its overall speed, and discuss why some approaches are faster than others. We will use the `@time` macro, which reports how long an expression takes to run.
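
One caveat when reading `@time` output: the first call to a function also includes the time Julia spends compiling it, so it is usually worth timing a second call. The short sketch below illustrates this with a throwaway function (`halve` is a hypothetical name used only for this illustration, not part of the chapter's code).

```julia
# `halve` is a hypothetical helper used only to illustrate @time.
halve(x) = x / 2

@time halve(10)   # first call: the reported time includes compilation
@time halve(10)   # second call: only the execution itself is measured
```

The exact numbers depend on the machine and Julia version, so none are shown here.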

#### Sum Function 1
Consider first a for loop:

In [1]:
function sum1(n::Int)
  local arr = collect(1:n)
  local sum = 0
  for i in arr
    sum += i
  end
  sum
end

sum1 (generic function with 1 method)

This function takes a positive integer `n`, creates an array containing `1` through `n`, and then sums the elements of the array with a for loop.  We do some tests with this using the `@time` macro:

In [2]:
@time sum1(1_000_000)

  0.001074 seconds (2 allocations: 7.629 MiB)


500000500000

In [3]:
@time sum1(10_000_000)

  0.081637 seconds (2 allocations: 76.294 MiB, 78.27% gc time)


50000005000000

In [4]:
@time sum1(100_000_000)

  0.320986 seconds (2 allocations: 762.939 MiB, 3.18% gc time)


5000000050000000

In [6]:
@time sum1(1_000_000_000)

  6.382928 seconds (2 allocations: 7.451 GiB, 7.76% gc time)


500000000500000000

Notice that `@time` also reports, in parentheses, the number of allocations and the total memory allocated.  The first call built an array of 1 million 64-bit integers at 8 bytes each, which is 8,000,000 bytes, or about 7.6 MiB.

The call with $n$ 100 times larger created an array of about 763 MiB, which is not insignificant.  In short, allocating memory is expensive.
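
The memory figure can be checked directly. This is a small sketch, assuming a 64-bit machine where `collect(1:n)` produces a `Vector{Int64}`:

```julia
n = 1_000_000
arr = collect(1:n)        # the same array that sum1 builds

sizeof(Int64)             # 8 bytes per element
sizeof(arr)               # 8000000 bytes for the array data
sizeof(arr) / 2^20        # 7.62939453125, the "7.629 MiB" reported by @time
```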

In [7]:
function sum2(n::Integer)
  local sum = 0
  for i=1:n
    sum+=i
  end
  sum
end

sum2 (generic function with 1 method)

This function doesn't use an array, since we don't really need one.  Let's see what happens:

In [8]:
@time sum2(100_000_000)

  0.000000 seconds


5000000050000000

In [9]:
@time sum2(1_000_000_000)

  0.000000 seconds


500000000500000000

In [10]:
@time sum2(10_000_000_000)

  0.000000 seconds


-5340232216128654848

Why is this one faster than `sum1`? What's going on with the last one?

In [11]:
function sum3(n::Int)
    local sum = big(0)
    for i=1:n
        sum+=i
    end
    sum
end

sum3 (generic function with 1 method)

If you said "overflow" in the above section, you win a prize, although I don't have a prize to give. :( 

Generally, if overflow is a problem, we can switch to `BigInt`s, as in `sum3` above.
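
To see the overflow concretely, the sketch below computes the true sum for `n = 10_000_000_000` in a wider integer type and then truncates it to 64 bits. The variable names are chosen for this illustration only; the point is that `Int64` arithmetic wraps around silently instead of raising an error.

```julia
typemax(Int64)                          # 9223372036854775807
typemax(Int64) + 1 == typemin(Int64)    # true: Int64 arithmetic wraps around

n = 10_000_000_000
exact = Int128(n) * (n + 1) ÷ 2         # 50000000005000000000, too big for Int64
exact > typemax(Int64)                  # true

wrapped = exact % Int64                 # keep only the low 64 bits
wrapped                                 # -5340232216128654848, as returned by sum2
```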

In [12]:
@time sum3(1_000_000)

  0.114697 seconds (3.00 M allocations: 45.776 MiB)


500000500000

In [14]:
@time sum3(10_000_000)

  1.598673 seconds (30.00 M allocations: 457.764 MiB, 17.72% gc time)


50000005000000

In [13]:
@time sum3(100_000_000)

 17.001321 seconds (300.00 M allocations: 4.470 GiB, 20.31% gc time)


5000000050000000

We aren't going to have overflow problems, but you should notice that operations with `BigInt`s are much slower: the timings above show 3 allocations per loop iteration, and `sum3(100_000_000)` took about 17 seconds.

#### Exercise
Write a function similar to `sum3`, but use `Int128` for the result (that is, initialize the local `sum` variable to an `Int128` zero). Call this `sum4` and time it compared to both `sum2` and `sum3`. 

Let's try using the `reduce` function:

In [15]:
function sum5(n::Int)
  reduce(+,1:big(n))
end

sum5 (generic function with 1 method)

In [16]:
@time sum5(1_000_000)

  0.405620 seconds (5.00 M allocations: 91.553 MiB, 23.12% gc time)


500000500000

In [17]:
@time sum5(10_000_000)

  3.003192 seconds (50.00 M allocations: 915.528 MiB, 18.37% gc time)


50000005000000

Note that this is slower than `sum3`, by roughly a factor of two to four in these timings.  Let's try the built-in `sum` function:

In [18]:
@time sum(1:big(10)^6)

  0.000333 seconds (41 allocations: 816 bytes)


500000500000

In [19]:
@time sum(1:big(10)^20)

  0.000037 seconds (42 allocations: 904 bytes)


5000000000000000000050000000000000000000

In [20]:
@time sum(1:big(10)^40)

  0.000011 seconds (42 allocations: 992 bytes)


50000000000000000000000000000000000000005000000000000000000000000000000000000000

In [21]:
@time sum(1:big(10)^100)

  0.000015 seconds (42 allocations: 1.227 KiB)


50000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000005000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

What's going on?
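
One explanation for these timings is that Julia's `sum` has a specialized method for ranges: it does not loop over the elements but computes the total arithmetically, so the cost barely depends on $n$. The sketch below uses the classic formula $n(n+1)/2$ to show the idea. It is not the code in Base, and `gauss_sum` is a hypothetical name. The compiler may well apply a similar simplification to the loop in `sum2`, which would explain its `0.000000 seconds` timings.

```julia
# Closed-form sum of 1 + 2 + ... + n; a sketch, not Base's implementation.
gauss_sum(n::Integer) = n * (n + 1) ÷ 2

gauss_sum(1_000_000)                          # 500000500000
gauss_sum(big(10)^20) == sum(1:big(10)^20)    # true
```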

### Summary of fast code:

* Stick with `Int64` if possible.  In the timings above it was far faster than `BigInt`.
* Don't create an array unless you need to.  Allocating memory is a slow process.
* Use the built-in methods whenever possible.  They have been optimized, and Julia is often super smart about some operations.

### 7.7: Computing Fibonacci Numbers

Let's look at the Fibonacci numbers.  If $f_1=1,f_2=1$, then 
$$f_n=f_{n-1}+f_{n-2}\qquad\text{for $n\geq3$}$$

Let's write a Fibonacci function that is recursive:

In [22]:
function fibonacci(n::Integer)
    if n <= 2
        return 1
    else
        return fibonacci(n-1) + fibonacci(n-2)
    end
end

fibonacci (generic function with 1 method)

The first 10 can be found in the following way:

In [23]:
map(fibonacci,1:10)

10-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13
 21
 34
 55

A more compact alternative, using the ternary operator, is:

In [24]:
fibonacci(n::Int) = n<=2 ? 1 : fibonacci(n-1) + fibonacci(n-2)

fibonacci (generic function with 2 methods)

This seems reasonable, but look at what happens when we compute the 40th one:

In [25]:
@time fibonacci(40)

  0.452290 seconds


102334155

In [26]:
@time fibonacci(41)

  0.738618 seconds


165580141

In [27]:
@time fibonacci(42)

  1.186162 seconds


267914296

In [28]:
@time fibonacci(43)

  2.375551 seconds


433494437

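This isn't looking good. I'm sure (without trying) that we can't find the 100th one. Why is this so slow?

We are going to see how many function evaluations are made.  Consider the adapted Fibonacci code below, which keeps a running tally in the global variable `num_evals`, adding 1 at each base case and 2 at each recursive step. This is a rough tally rather than an exact call count, but it grows at the same rate: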

In [29]:
function fibonacciEval(n::Integer)
  global num_evals
  if n==1 || n==2
    num_evals +=1
    return 1
  else
    num_evals += 2
    return fibonacciEval(n-1) + fibonacciEval(n-2)
  end
end

fibonacciEval (generic function with 1 method)

In [30]:
num_evals=0
fibonacciEval(5)
num_evals

13

In [31]:
num_evals=0
fibonacciEval(20)
num_evals

20293

In [33]:
num_evals=0
fibonacciEval(42)
num_evals

803742886
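
The tally in `fibonacciEval` adds 2 at every recursive step and 1 at every base case, so it comes to $3f_n-2$. An exact count of function calls can be obtained without a global variable by returning the count alongside the value. The sketch below does that (`fib_calls` is a hypothetical name, not part of the original code). The exact count is $2f_n-1$, which grows just as fast as the Fibonacci numbers themselves.

```julia
# Returns a tuple (f_n, number of calls made to compute it).
function fib_calls(n::Integer)
    n <= 2 && return (1, 1)            # base case: one call, value 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return (a + b, calls_a + calls_b + 1)   # +1 counts this call itself
end

fib_calls(5)    # (5, 9)
fib_calls(20)   # (6765, 13529)
```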

Consider the following Fibonacci function based on a for loop:

In [34]:
function fibonacci2(n)
  local x,y = (1,1)
  for i = 1:n-1
    x,y = (y, x+y)
  end
  x
end

fibonacci2 (generic function with 1 method)

In [35]:
@time fibonacci2(50)

  0.000000 seconds


12586269025

In [36]:
@time fibonacci2(100)

  0.000000 seconds


3736710778780434371
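
A word of caution about the last result: `fibonacci2(100)` returned quickly, but the value is wrong. The 100th Fibonacci number is 354224848179261915075, which does not fit in an `Int64` (the largest that fits is $f_{92}$), so the loop overflowed silently. A minimal sketch of a fix is to start the loop with `BigInt` values. The name `fibonacci_big` is an assumption made for this illustration.

```julia
# Same loop as fibonacci2, but seeded with BigInt values to avoid overflow.
function fibonacci_big(n::Integer)
    x, y = big(1), big(1)
    for i in 1:n-1
        x, y = y, x + y
    end
    x
end

fibonacci_big(50)    # 12586269025, matches fibonacci2(50)
fibonacci_big(100)   # 354224848179261915075
```

The loop is still only 99 additions, so the `BigInt` overhead is negligible here, unlike in `sum3`.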

#### Summary of Recursive Functions
- Writing recursive functions is often easy, especially when the underlying definition is itself recursive. 
- However, a short recursive function is not necessarily fast. 