Almost all of the libraries for building neural networks (TensorFlow, Theano, Torch, etc.) use automatic differentiation (AD) in one way or another. AD is useful well beyond neural networks. It computes exact derivatives (up to floating-point rounding) mechanically, without deriving them by hand and without the truncation error of finite differences. Conceptually, it breaks a computation into a graph of elementary operations and applies the chain rule along that graph. This can happen in forward mode (from the inputs towards the output) or in reverse mode (from the output back towards the inputs). Let's look at a Julia implementation of each mode that uses operator overloading to compute first-order derivatives. I'll use the same example as Colah's blog [here](http://colah.github.io/posts/2015-08-Backprop/), and I highly recommend reading that post first. Since it already explains the theory very well, I'll focus mainly on the implementation. This is not the best-performing AD code, but I think it is the simplest way to get your head around the concept.
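For reference, here is that example worked by hand. Colah writes the inputs as a and b. The code below calls them x and y and evaluates at x = 2, y = 1. With intermediate nodes c and d, the derivative with respect to y is a sum over the two paths from y to e:

$$
\begin{aligned}
c &= x + y, \qquad d = y + 1, \qquad e = c\,d = (x + y)(y + 1) = 6 \\
\frac{\partial e}{\partial x} &= \frac{\partial e}{\partial c}\,\frac{\partial c}{\partial x} = d \cdot 1 = y + 1 = 2 \\
\frac{\partial e}{\partial y} &= \frac{\partial e}{\partial c}\,\frac{\partial c}{\partial y} + \frac{\partial e}{\partial d}\,\frac{\partial d}{\partial y} = d + c = (y + 1) + (x + y) = 5
\end{aligned}
$$

Both modes below should reproduce these two values.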

```julia
# Base operators must be imported before we can add methods for our own types.
import Base: +, *
```

<h2 class="section-heading">Forward Mode</h2>

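```julia
# Forward-mode AD value (a dual number): the primal value plus its derivative
# along the chosen seed direction.
struct ADFwd
    value::Float64
    derivative::Float64

    ADFwd(val::Float64) = new(val, 0.0)                # constant or unseeded input
    ADFwd(val::Float64, der::Float64) = new(val, der)  # seeded input
end
```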

```julia
# Sum rule: (x + y)' = x' + y'
function adf_add(x::ADFwd, y::ADFwd)
    return ADFwd(x.value + y.value, x.derivative + y.derivative)
end
+(x::ADFwd, y::ADFwd) = adf_add(x, y)

# Product rule: (x * y)' = x' * y + x * y'
function adf_mul(x::ADFwd, y::ADFwd)
    return ADFwd(x.value * y.value, y.value * x.derivative + x.value * y.derivative)
end
*(x::ADFwd, y::ADFwd) = adf_mul(x, y)
```

```julia
# e = (x + y) * (y + 1); the constant 1 carries derivative 0.
function testForwardMode(x::ADFwd, y::ADFwd)
    (x + y) * (y + ADFwd(1.0))
end
```

Now let's get the partial derivative of testForwardMode with respect to x. Forward mode computes the product of the Jacobian with a seed vector, so we seed it with the unit vector pointing along the x axis: we pass 1 as the derivative when creating the ADFwd for x, and 0 for the others.
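Since the seed is just a direction, it does not have to be a unit vector. Below is a minimal, self-contained sketch. It assumes a hypothetical Dual type that mirrors ADFwd (the field names val and dot are illustrative). It also assumes a hypothetical helper jvp that evaluates the same function with an arbitrary seed (dx, dy). The result is the directional derivative ∂e/∂x·dx + ∂e/∂y·dy, i.e. the Jacobian-vector product.

```julia
# Hypothetical minimal dual number, equivalent in spirit to ADFwd:
# `val` is the primal value, `dot` the derivative along the seed direction.
struct Dual
    val::Float64
    dot::Float64
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.dot + b.dot)
# Product rule: (ab)' = a'b + ab'
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.dot * b.val + a.val * b.dot)

# Same function as testForwardMode; the constant 1 has derivative 0.
g(x::Dual, y::Dual) = (x + y) * (y + Dual(1.0, 0.0))

# Evaluate g at (x, y) seeded with the direction (dx, dy); the `dot` of the
# result is the Jacobian-vector product J * [dx, dy].
jvp(x, y, dx, dy) = g(Dual(x, dx), Dual(y, dy)).dot

println(jvp(2.0, 1.0, 1.0, 0.0))  # prints 2.0  (∂e/∂x)
println(jvp(2.0, 1.0, 0.0, 1.0))  # prints 5.0  (∂e/∂y)
println(jvp(2.0, 1.0, 1.0, 1.0))  # prints 7.0  (∂e/∂x + ∂e/∂y)
```

Seeding (1, 1) returns the sum of both partials in a single pass, but it cannot separate them. Getting each partial on its own still takes one pass per input.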

```julia
# Seed x with derivative 1 and y with derivative 0.
xFwd = ADFwd(2.0, 1.0)
yFwd = ADFwd(1.0)

xFwdDer = testForwardMode(xFwd, yFwd)

@show xFwd.derivative     # 1.0 (the seed)
@show yFwd.derivative     # 0.0
@show xFwdDer.derivative  # 2.0 = ∂e/∂x = y + 1
```

Let's do the same for the derivative with respect to y, this time seeding y with 1 and x with 0.

```julia
# Seed y with derivative 1 and x with derivative 0.
xFwd = ADFwd(2.0)
yFwd = ADFwd(1.0, 1.0)

yFwdDer = testForwardMode(xFwd, yFwd)

@show xFwd.derivative     # 0.0
@show yFwd.derivative     # 1.0 (the seed)
@show yFwdDer.derivative  # 5.0 = ∂e/∂y = (y + 1) + (x + y)
```
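As the two runs above show, forward mode gives one partial derivative per pass. A full gradient of a function with n inputs therefore costs n passes. Below is a minimal sketch of that loop. It assumes a hypothetical FwdNum type (equivalent to ADFwd) and a hypothetical helper forward_gradient that takes the function and the evaluation point as a vector.

```julia
# Hypothetical minimal dual type, equivalent to ADFwd.
struct FwdNum
    val::Float64
    der::Float64
end
Base.:+(a::FwdNum, b::FwdNum) = FwdNum(a.val + b.val, a.der + b.der)
Base.:*(a::FwdNum, b::FwdNum) = FwdNum(a.val * b.val, a.der * b.val + a.val * b.der)

# The example function written over a vector of inputs: (x + y) * (y + 1).
example_fn(v::Vector{FwdNum}) = (v[1] + v[2]) * (v[2] + FwdNum(1.0, 0.0))

# Forward-mode gradient: one forward pass per input, seeding the i-th unit vector.
function forward_gradient(f, point::Vector{Float64})
    n = length(point)
    grad = zeros(n)
    for i in 1:n
        # Derivative 1 for input i, 0 for every other input.
        seeded = [FwdNum(point[j], j == i ? 1.0 : 0.0) for j in 1:n]
        grad[i] = f(seeded).der
    end
    return grad
end

println(forward_gradient(example_fn, [2.0, 1.0]))  # prints [2.0, 5.0]
```

This per-input cost is what reverse mode avoids: one backward sweep yields the derivative of a single output with respect to every input.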

<h2 class="section-heading">Reverse Mode</h2>

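```julia
# Reverse-mode AD node: its value, the accumulated adjoint (∂output/∂node),
# the function that pushes that adjoint to its parents, and the parents
# it was computed from.
mutable struct ADRev
    value::Float64
    derivative::Float64
    derivativeOp::Function
    parents::Vector{ADRev}

    ADRev(val::Float64) = new(val, 0.0, ad_constD, ADRev[])
    ADRev(val::Float64, der::Float64) = new(val, der, ad_constD, ADRev[])
end

# Leaves (inputs and constants) have no parents, so there is nothing to propagate.
function ad_constD(prevDerivative::Float64, adNodes::Vector{ADRev})
    return nothing
end
```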

```julia
# Build the sum node and record how to propagate its adjoint.
function adr_add(x::ADRev, y::ADRev)
    result = ADRev(x.value + y.value)
    result.derivativeOp = adr_addD
    push!(result.parents, x)
    push!(result.parents, y)
    return result
end
# ∂(x + y)/∂x = ∂(x + y)/∂y = 1, so both parents receive the incoming adjoint.
function adr_addD(prevDerivative::Float64, adNodes::Vector{ADRev})
    adNodes[1].derivative += prevDerivative * 1.0
    adNodes[2].derivative += prevDerivative * 1.0
    return nothing
end
+(x::ADRev, y::ADRev) = adr_add(x, y)
```

```julia
# Build the product node and record how to propagate its adjoint.
function adr_mul(x::ADRev, y::ADRev)
    result = ADRev(x.value * y.value)
    result.derivativeOp = adr_mulD
    push!(result.parents, x)
    push!(result.parents, y)
    return result
end
# ∂(x * y)/∂x = y and ∂(x * y)/∂y = x.
function adr_mulD(prevDerivative::Float64, adNodes::Vector{ADRev})
    adNodes[1].derivative += prevDerivative * adNodes[2].value
    adNodes[2].derivative += prevDerivative * adNodes[1].value
    return nothing
end
*(x::ADRev, y::ADRev) = adr_mul(x, y)
```

```julia
xRev = ADRev(2.0)
yRev = ADRev(1.0)

# e = (x + y) * (y + 1), built as a graph of ADRev nodes.
function f(x::ADRev, y::ADRev)
    (x + y) * (y + ADRev(1.0))
end
```

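```julia
# Reverse sweep: seed the output adjoint with 1, then call each node's
# derivativeOp exactly once, in reverse topological order, so every node has
# received the contributions from all of its consumers before it propagates.
function backprop(graph::ADRev)
    order = ADRev[]
    visited = IdDict{ADRev,Bool}()
    # Depth-first search that appends a node only after all of its parents.
    function visit(node::ADRev)
        haskey(visited, node) && return
        visited[node] = true
        for parent in node.parents
            visit(parent)
        end
        push!(order, node)
    end
    visit(graph)

    graph.derivative = 1.0  # ∂output/∂output = 1
    for node in reverse(order)
        node.derivativeOp(node.derivative, node.parents)
    end
    return graph
end
```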
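When an intermediate value feeds into more than one operation, the order of the reverse sweep matters. A node must propagate its derivative only once, and only after every node that consumes it has added its contribution. Take g = (x + y)·(x + y). A walk that re-processes the shared node x + y every time it reaches it would push that node's adjoint to x and y twice, reporting 12 instead of 6 here. Below is a minimal, self-contained sketch of a sweep in reverse topological order. It assumes a hypothetical Node type that stores the local partial derivatives with respect to its parents (instead of ADRev's derivativeOp function) and a hypothetical reverse_sweep! function.

```julia
# Hypothetical minimal reverse-mode node (illustrative names, not ADRev).
mutable struct Node
    value::Float64
    adjoint::Float64
    parents::Vector{Node}
    local_grads::Vector{Float64}  # ∂(this node)/∂(each parent)
end
Node(v::Float64) = Node(v, 0.0, Node[], Float64[])

Base.:+(a::Node, b::Node) = Node(a.value + b.value, 0.0, [a, b], [1.0, 1.0])
Base.:*(a::Node, b::Node) = Node(a.value * b.value, 0.0, [a, b], [b.value, a.value])

# Process every node once, in reverse topological order.
function reverse_sweep!(out::Node)
    order = Node[]
    seen = IdDict{Node,Nothing}()
    function visit(n::Node)
        haskey(seen, n) && return
        seen[n] = nothing
        foreach(visit, n.parents)
        push!(order, n)  # parents end up before n
    end
    visit(out)
    out.adjoint = 1.0
    for n in reverse(order)
        for (p, g) in zip(n.parents, n.local_grads)
            p.adjoint += n.adjoint * g  # chain rule, summed over all paths
        end
    end
    return out
end

x = Node(2.0); y = Node(1.0)
s = x + y               # shared intermediate node
reverse_sweep!(s * s)   # g = (x + y)^2
println((x.adjoint, y.adjoint))  # prints (6.0, 6.0), i.e. 2(x + y) for each
```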

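```julia
# Derivatives accumulate across calls, so re-create xRev and yRev before re-running.
fRev = backprop(f(xRev, yRev))

@show xRev.derivative  # 2.0 = ∂e/∂x
@show yRev.derivative  # 5.0 = ∂e/∂y
```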
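A quick way to sanity-check AD results is to compare them against finite differences. Below is a minimal sketch. It assumes a plain Float64 copy of the example function (f_plain, a hypothetical name) and a central-difference step h = 1e-6, chosen only for illustration.

```julia
# Plain Float64 version of the example function: (x + y) * (y + 1).
f_plain(x, y) = (x + y) * (y + 1)

# Central-difference estimate of both partial derivatives.
function central_diff_grad(f, x, y; h = 1e-6)
    dfdx = (f(x + h, y) - f(x - h, y)) / (2h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2h)
    return (dfdx, dfdy)
end

println(central_diff_grad(f_plain, 2.0, 1.0))  # approximately (2.0, 5.0)
```

Because the example function is quadratic, central differences are exact here apart from rounding error. For general functions they carry an O(h²) truncation error, which AD does not have.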

<h2 class="section-heading">References</h2>

- [Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation)
- [Calculus on Computational Graphs: Backpropagation](http://colah.github.io/posts/2015-08-Backprop/)
- [ALGORITHMIC/AUTOMATIC DIFFERENTIATION](http://blog.tombowles.me.uk/2014/09/10/ad-algorithmicautomatic-differentiation/)
- [Efficient Calculation of Derivatives using Automatic Differentiation](https://www.duo.uio.no/bitstream/handle/10852/41535/Kjelseth-Master.pdf?sequence=9)