# Problem 1: Continued Fractions

In this problem, you will write a macro, analogous to the `@evalpoly` macro in Base (discussed in lecture 4), that computes a truncated [continued-fraction expansion](https://en.wikipedia.org/wiki/Continued_fraction).  That is,

```jl
@cf x a0 a1 a2 a3 a4
```

will compute

$$
\frac{1}{x + \frac{a_0}{x + \frac{a_1}{x + \frac{a_2}{x + \frac{a_3}{x + a_4}}}}}
$$

Note that `x` can be a real or complex number or any other type supporting `+` and `/`.

Ideally, your implementation should completely inline the computation, like `@evalpoly` and `Base.@horner`.  It may perform some algebraic transformation on the expression to simplify it first (e.g. to reduce the number of divisions).   You don't need to worry about roundoff errors (i.e. you can do algebraic transformations that may slightly change the rounding errors in floating-point arithmetic).
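
As a small illustration of the kind of algebraic transformation meant here, the two-coefficient case `@cf x a0 a1` can be rewritten as a single ratio of polynomials, which trades two divisions for one. This is only a worked example for the shortest nontrivial case, not a prescribed approach:

$$
\frac{1}{x + \frac{a_0}{x + a_1}} = \frac{x + a_1}{x(x + a_1) + a_0} = \frac{x + a_1}{x^2 + a_1 x + a_0}
$$

Both the numerator and the denominator are polynomials in $x$, so each could in principle be evaluated by Horner's rule.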

To start you off, here is a sample implementation that just does a function call (with the most obvious expression) and is rather slow:

In [3]:
function cf(x, a...)
    z = inv(one(x)) # initialize z this way for type stability
    for i = length(a):-1:1
        z = x + a[i] / z
    end
    return inv(z)
end

cf (generic function with 1 method)

In [4]:
macro cf(x, a...)
    # escape the arguments so that they are evaluated in the caller's scope
    Expr(:call, :cf, esc(x), map(esc, a)...)
end

@cf (macro with 1 method)
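
To see what "just does a function call" means, one can inspect the expression that the macro generates. The sketch below restates the sample definitions so that it runs on its own (the arguments are escaped here so that the macro also works on local variables), and it assumes a Julia version that provides `@macroexpand` (0.6 or later):

```jl
# Sample definitions, restated so that this snippet is self-contained.
function cf(x, a...)
    z = inv(one(x))
    for i = length(a):-1:1
        z = x + a[i] / z
    end
    return inv(z)
end

macro cf(x, a...)
    Expr(:call, :cf, esc(x), map(esc, a)...)
end

# Inspect the code produced by the macro without running it.
ex = @macroexpand @cf 3 4 5 6
println(ex)       # a single call to cf with the four literal arguments
println(ex.head)  # call
```

The expansion is one call expression, so all of the arithmetic still happens inside `cf` at run time; nothing has been inlined yet.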

In [5]:
@cf 3 4 5 6

0.24242424242424243
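
As a hand check of the value above, `@cf 3 4 5 6` corresponds to $x = 3$ with coefficients $4, 5, 6$, evaluated from the innermost fraction outward:

$$
\frac{1}{3 + \frac{4}{3 + \frac{5}{3 + 6}}} = \frac{1}{3 + \frac{4}{32/9}} = \frac{1}{3 + \frac{9}{8}} = \frac{8}{33} \approx 0.2424
$$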

In [6]:
@cf 3//4 5 6 7 # exact rational arithmetic

756//3047

To see that we are doing the right thing, it would be nice to evaluate the expression symbolically and see it nicely formatted.  Fortunately, we can do just this with the [SymPy](https://github.com/JuliaPy/SymPy.jl) package.  (Do `Pkg.add("SymPy")` if you have not installed it yet.)

In [7]:
using SymPy

@cf Sym(:x) 3 4 5 6

        1        
-----------------
          3      
x + -------------
            4    
    x + ---------
              5  
        x + -----
            x + 6

Note that `Sym(:x)` creates a "symbolic" expression $x$, so that subsequent computations on it produce new symbolic expressions, and the final symbolic expression is rendered nicely in the notebook.   See the SymPy documentation for more on this cool package; under the hood, it uses the powerful [Python SymPy](http://www.sympy.org/en/index.html) module for symbolic algebra.

# Problem 2: implementing broadcast

The `broadcast(f, args...)` function in Julia is very powerful: it takes a function `f` and applies it "elementwise" across its arguments, but "expands" (or "broadcasts") lower-dimensional arguments to match higher-dimensional ones.  Implementing such a function efficiently is rather tricky, however.
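
A few small calls to the built-in `broadcast` may help make the "expansion" concrete. The variable names `r1`, `r2`, and `r3` are introduced only for this illustration:

```jl
# A scalar is "expanded" to match the length of the vector argument:
r1 = broadcast(+, 1, [10, 20, 30])          # [11, 21, 31]

# Two vectors of equal length are combined element by element:
r2 = broadcast(*, [1, 2, 3], [10, 20, 30])  # [10, 40, 90]

# A 3-element vector combined with a 1×2 row matrix expands to a 3×2 matrix:
r3 = broadcast(+, [1, 2, 3], [10 20])       # [11 21; 12 22; 13 23]
```

Only the first two kinds of call (numbers and vectors) are relevant to the restricted version in this problem.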

In this problem, you will implement your *own* `broadcast` function, which only works for *numbers and vectors of numbers*.   We will start with a slow (but working) implementation and you will try to make it faster.

(No fair calling Julia's built-in `broadcast`, though!)

In [8]:
# handle the no-argument and all-number special cases so that we
# don't need to deal with them in the general version below:
mybroadcast(f::Function) = f()
mybroadcast(f::Function, args::Number...) = f(args...)

# like broadcast, but only works for numbers and vectors of numbers
function mybroadcast(f::Function, args::Union{Number,AbstractVector}...)
    assert(!isempty(args)) # empty case should be handled above
    
    # compute the length and type of the result:
    n = -1
    T = Bool
    for a in args
        if isa(a, AbstractVector)
            if n == -1
                n = length(a)
            elseif n != length(a)
                throw(DimensionMismatch())
            end
            T = promote_type(T, eltype(a))
        else
            T = promote_type(T, typeof(a))
        end
    end
    assert(n >= 0) # should have been at least one vector arg
    result = Array{T}(n)
    
    for i = 1:n
        result[i] = f(map(a -> isa(a, AbstractVector) ? a[i] : a, args)...)
    end
    
    return result
end

mybroadcast (generic function with 3 methods)

In [9]:
mybroadcast(+, 1, 1:10)

10-element Array{Int64,1}:
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11

In [10]:
mybroadcast(+, 1, 1:3, [10,100,1000])

3-element Array{Int64,1}:
   12
  103
 1004

Hooray, it seems to work!

Actually, it's not quite right, because it assumes that the `result` type can be determined purely from the types of the *arguments*, when in fact it also depends on the *function* `f`:

In [11]:
broadcast(sqrt, 1:3)

3-element Array{Float64,1}:
 1.0    
 1.41421
 1.73205

In [12]:
mybroadcast(sqrt, 1:3)

LoadError: InexactError()

It gives an `InexactError` because it thinks that `result` is an `Array{Int,1}`, matching `eltype(1:3)`, when in fact the `sqrt` function produces a floating-point result from an integer argument.  A floating-point result like `sqrt(2)` cannot be stored in an array of integers, so an exception is thrown.
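
The failure can be reproduced in isolation by storing floating-point values into an integer array. This is a minimal sketch; the names `v` and `err` are chosen only for this example:

```jl
v = zeros(Int, 2)   # integer storage, like the `result` array above

v[1] = sqrt(4)      # fine: 2.0 converts exactly to the integer 2

# sqrt(2) has no exact integer representation, so the implicit
# conversion performed by the assignment throws an InexactError.
err = try
    v[2] = sqrt(2)
    nothing
catch e
    e
end
println(typeof(err))  # InexactError
```

The second slot of `v` is left untouched because the conversion fails before anything is stored.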

Handling this kind of "type computation" properly is *very* tricky.   For the problem set, we will punt: **we will only use `mybroadcast` with functions whose output type can be computed simply from the arguments as above**.

Now, let's time it against the built-in `broadcast` function:

In [13]:
using BenchmarkTools

"""
Like `@benchmark`, but returns only the minimum time in ns.
"""
macro benchtime(args...)
    b = Expr(:macrocall, Symbol("@benchmark"), map(esc, args)...)
    :(time(minimum($b)))
end

@benchtime

In [14]:
using Base.Test # gives us the handy @test macro

In [15]:
x = rand(10000)
@test broadcast(+, x, 1) == mybroadcast(+, x, 1)
@benchtime(mybroadcast(+, $x, 1)) / @benchtime(broadcast(+, $x, 1))

1029.19816700611

Holy cow!  Our `mybroadcast` function is **1000×** slower than `broadcast`.  But one *must* be able to do better — the `broadcast` function itself is written in Julia, with no special help from the compiler.  Can you?

Hint: the key trick is to make sure that (a) the compiler can figure out things like `T` at *compile-time* and (b) all of those `if` statements are decided at *compile-time*.
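
One way to read the hint, shown here on a toy scale and not as a full solution: let method dispatch choose between the scalar and vector cases from the argument *types*, and compute the result type from those types as well. The helper names `elem` and `resulttype` are hypothetical, invented for this sketch:

```jl
# Hypothetical helpers, named here only for illustration.
elem(a::Number, i) = a             # scalars contribute the same value for every i
elem(a::AbstractVector, i) = a[i]  # vectors contribute their i-th element

# Element type of the result under the simplifying assumption stated above
# (it is determined by promoting the argument element types). The eltype of
# a Number is its own type, so no branching on the kind of argument is needed.
resulttype(args...) = promote_type(map(eltype, args)...)

args = (1, 1:3, [10.0, 100.0, 1000.0])
T = resulttype(args...)            # Float64
col2 = map(a -> elem(a, 2), args)  # (1, 2, 100.0)
```

Whether the compiler actually resolves all of this at compile time inside a complete `mybroadcast` is something to verify by benchmarking, for example with the `@benchtime` macro defined above.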

# Problem 3: Linear algebra

Huge gains in performance can be achieved for *sparse* matrices: matrices that are mostly zero.

However, what about matrices that are mostly some *other* value?  For example, suppose you have a matrix that is mostly 1's, like:

$$
A = \begin{pmatrix} 
        3 & 0 & 1 & 1 & 1 & \cdots \\
        0 & 3 & 0 & 1 & 1 & \cdots \\
        1 & 0 & 3 & 0 & 1 & \cdots \\
        \ddots & \ddots & \ddots & \ddots & \ddots & \cdots \\
        \cdots & 1 & 1 & 1 & 0 & 3
    \end{pmatrix}
$$

Can you solve $Ax = b$ quickly?  That is, speed up the `mysolve` function:

In [16]:
function mymatrix{T}(::Type{T}, m::Int)
    A = ones(T, m, m)
    for i = 1:m
        A[i,i] = 3
    end
    for i = 1:m-1
        A[i,i+1] = A[i+1,i] = 0
    end
    return A
end
      
mymatrix(m::Integer) = mymatrix(Int, Int(m))

# solve Ax = b, returning x, for the "mostly ones" matrix A above
function mysolve{T<:Number}(b::AbstractVector{T})
    m = length(b)
    A = mymatrix(float(T), m)
    return A \ b
end

mysolve (generic function with 1 method)

Let's just check that the matrix looks like what we expect, and that it is invertible:

In [17]:
mymatrix(6)

6×6 Array{Int64,2}:
 3  0  1  1  1  1
 0  3  0  1  1  1
 1  0  3  0  1  1
 1  1  0  3  0  1
 1  1  1  0  3  0
 1  1  1  1  0  3

In [18]:
eigvals(mymatrix(6))

6-element Array{Float64,1}:
 0.75302
 1.42423
 2.44504
 3.18728
 3.80194
 6.38849

Yup, all positive eigenvalues in fact! (Can you prove this?)

`mysolve` will be relatively slow, dominated by the $O(m^3)$ operations for the `\`:

In [19]:
b = rand(1000)
bench = @benchmark mysolve($b)

BenchmarkTools.Trial: 
  samples:          302
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  15.27 mb
  allocs estimate:  12
  minimum time:     13.92 ms (0.00% GC)
  median time:      15.52 ms (10.18% GC)
  mean time:        16.56 ms (8.67% GC)
  maximum time:     24.06 ms (12.33% GC)

If you think about it, this is actually pretty fast.  Gaussian elimination requires $2m^3/3 + O(m^2)$ flops, so let's compute the flop rate in gigaflops that we are getting:

In [20]:
length(b)^3 * 2/3 / time(minimum(bench))

47.90813901792147
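
Spelled out with the numbers from the benchmark above (where $m = 1000$ and `time` reports nanoseconds), the computation is approximately:

$$
\frac{\tfrac{2}{3} m^3}{t_{\min}} = \frac{\tfrac{2}{3} \times 1000^3 \ \text{flops}}{13.92 \times 10^{6} \ \text{ns}} \approx 47.9 \ \text{flops/ns} = 47.9 \ \text{Gflops}
$$

The small difference from the printed value comes from using the rounded minimum time of 13.92 ms.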

So, we are getting **almost 50 Gflops** on my **2.5 GHz** laptop.  Modern linear-algebra libraries (here, LAPACK and OpenBLAS) are amazing.  But you can do better!

Hint: focus on the $O(m^3)$ operations for the solve first, not the $O(m^2)$ operations to construct the matrix.  See if you can write $A$ as "sparse + nice".