In [1]:
import tensorflow as tf

# Automatic Differentiation
Gradients play a critical role in machine learning, and most modern machine learning frameworks provide tools for computing them automatically, which removes the need to work out derivatives by hand. This notebook demonstrates three approaches to computing gradients: [numerical differentiation](https://en.wikipedia.org/wiki/Numerical_differentiation), [symbolic differentiation](https://en.wikipedia.org/wiki/Computer_algebra) and [TensorFlow gradients](https://www.tensorflow.org/api_docs/python/tf/gradients).

## Deriving formulas by hand 
Our goal is to compute a first-order and a second-order partial derivative of the simple [polynomial](https://en.wikipedia.org/wiki/Polynomial) $f = 2ab^2 + 3a^2b$. Specifically, we want to evaluate the polynomial and its derivatives $\frac{\partial f}{\partial a}$ and $\frac{\partial^2 f}{\partial a \partial b}$ at $a = 2$ and $b = 3$. The first step is to work out the derivatives by hand so that we can test our automated implementations. We will apply the following basic calculus rules:

**Sum Rule**

$
\begin{align}
f &= g + h \\
\frac{\partial f}{\partial x} &= \frac{\partial g}{\partial x} + \frac{\partial h}{\partial x}
\end{align}
$

**Product Rule**

$
\begin{align}
f &= gh \\
\frac{\partial f}{\partial x} &= g\frac{\partial h}{\partial x} + h\frac{\partial g}{\partial x}
\end{align}
$

**Power Rule**

$
\begin{align}
f &= g^{n} \\
\frac{\partial f}{\partial x} &= ng^{n-1}\frac{\partial g}{\partial x}
\end{align}
$

Applying the above rules, we get:

$
\begin{align}
f &= 2ab^2 + 3a^2b \\
\frac{\partial f}{\partial a} &= 2b^2 + 6ab \\
\frac{\partial^2 f}{\partial a \partial b} &= 4b + 6a \\
\end{align}
$
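To make each step explicit: when differentiating with respect to $a$, $b$ is treated as a constant, so only the power rule on $a$ is needed and the remaining factors are carried along as constant multipliers. The mixed partial then differentiates that result with respect to $b$, holding $a$ constant:

$
\begin{align}
\frac{\partial}{\partial a}\left(2ab^2\right) &= 2b^2 \cdot 1 \cdot a^{0} = 2b^2 \\
\frac{\partial}{\partial a}\left(3a^2b\right) &= 3b \cdot 2a^{1} = 6ab \\
\frac{\partial}{\partial b}\left(2b^2 + 6ab\right) &= 2 \cdot 2b^{1} + 6a \cdot 1 = 4b + 6a
\end{align}
$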


Now that we have worked out the derivatives, let's evaluate them at $a = 2$ and $b = 3$ so that we have reference values for testing our automated approaches:

In [2]:
def py_f(a, b):
    return 2 * a * (b ** 2) + 3 * (a ** 2) * b

def py_f_da(a, b):
    return 2 * (b ** 2) + 6 * a * b

def py_f_da_db(a, b):
    return 4 * b + 6 * a

py_f(a=2, b=3), py_f_da(a=2, b=3), py_f_da_db(a=2, b=3)

(72, 54, 24)

## Numerical differentiation
Our first approach is [numerical differentiation](https://en.wikipedia.org/wiki/Numerical_differentiation), which is inspired by the limit definition of the derivative:

$
\begin{align}
\frac{\partial f}{\partial x} = \lim _{h\to 0}{\frac {f(x+h)-f(x)}{h}}
\end{align}
$

According to this definition, we can approximate the derivative by evaluating the function at $x$ and at $x+h$ for a small $h$ and calculating the slope between the two points. This makes sense because the derivative is the slope of the curve at $x$. Our formula will be slightly more complicated because we are computing a partial derivative, so we only apply the $h$ offset to the variable we are differentiating with respect to.
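As a quick single-variable illustration (a sketch, not part of the original notebook; `forward_difference` and `square` are hypothetical names), the slope between $x$ and $x+h$ for $g(x) = x^2$ at $x = 3$ approximates $g'(3) = 6$. For this $g$ the forward difference works out algebraically to $2x + h$, so the error shrinks in proportion to $h$:

```python
def forward_difference(g, x, h=1e-4):
    # Slope of the secant line through (x, g(x)) and (x + h, g(x + h))
    return (g(x + h) - g(x)) / h

def square(x):
    return x ** 2

for h in (1e-1, 1e-2, 1e-4):
    # The exact derivative is 6; for x**2 the forward difference is about 6 + h
    print(h, forward_difference(square, 3.0, h))
```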

In [3]:
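def numerical_partial(f, darg, h=.0001):
    # Return a function that approximates the partial derivative of f
    # with respect to the keyword argument named darg
    def df(**kwargs):
        dkwargs = kwargs.copy()
        dkwargs[darg] += h  # offset only the variable being differentiated
        # Forward difference: slope between the offset point and the original point
        return (f(**dkwargs) - f(**kwargs)) / h
    return df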

The `numerical_partial` function returns a new function that calculates the partial derivative of `f` with respect to `darg`. Because the result is itself a function, we can apply `numerical_partial` twice to get the second-order partial:

In [4]:
n_f_da = numerical_partial(py_f, 'a')
n_f_da_db = numerical_partial(n_f_da, 'b')

Now we are ready to evaluate our partial derivatives. They are very close to, but not exactly equal to, the exact partials we found earlier:

In [5]:
py_f(a=2, b=3), n_f_da(a=2, b=3), n_f_da_db(a=2, b=3)

(72, 54.00090000009072, 24.000500786769408)
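The small errors in the fourth decimal place come from the one-sided (forward) difference, whose error shrinks roughly in proportion to $h$. A common refinement, not used in this notebook, is the central difference $\frac{f(x+h)-f(x-h)}{2h}$, whose error shrinks roughly in proportion to $h^2$. Below is a minimal sketch that mirrors `numerical_partial`; the name `central_partial` is hypothetical. Because $f$ is quadratic in each variable, its results should land much closer to the exact values 54 and 24 than the forward-difference results above:

```python
def central_partial(f, darg, h=1e-4):
    """Return a function approximating the partial derivative of f with respect
    to darg using a central difference (hypothetical variant of numerical_partial)."""
    def df(**kwargs):
        plus = dict(kwargs)
        minus = dict(kwargs)
        plus[darg] += h   # step forward in darg only
        minus[darg] -= h  # step backward in darg only
        return (f(**plus) - f(**minus)) / (2 * h)
    return df

def py_f(a, b):
    return 2 * a * (b ** 2) + 3 * (a ** 2) * b

c_f_da = central_partial(py_f, 'a')
c_f_da_db = central_partial(c_f_da, 'b')
print(c_f_da(a=2, b=3), c_f_da_db(a=2, b=3))
```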

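Although numerical differentiation is simple and relatively easy to implement, it only approximates the derivative (note the small errors in the results above), and it is computationally expensive because the underlying function `f` must be evaluated multiple times for every derivative. This gets worse for higher-order partial derivatives, since each nested application of `numerical_partial` doubles the number of evaluations.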
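A quick way to see the cost is to count calls. The sketch below wraps `py_f` in a hypothetical `CallCounter` class (added only for illustration) and reuses the notebook's `numerical_partial` definition; the nested second-order partial ends up calling `py_f` four times, and a $k$-th order nested partial would need $2^k$ calls:

```python
class CallCounter:
    """Wrap a function and count how many times it is called (illustrative helper)."""
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, **kwargs):
        self.calls += 1
        return self.f(**kwargs)

def numerical_partial(f, darg, h=.0001):
    def df(**kwargs):
        dkwargs = kwargs.copy()
        dkwargs[darg] += h
        return (f(**dkwargs) - f(**kwargs)) / h
    return df

def py_f(a, b):
    return 2 * a * (b ** 2) + 3 * (a ** 2) * b

counted_f = CallCounter(py_f)
counted_f_da_db = numerical_partial(numerical_partial(counted_f, 'a'), 'b')
counted_f_da_db(a=2, b=3)
print(counted_f.calls)  # 4
```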

## Symbolic differentiation
Our second approach is [symbolic differentiation](https://en.wikipedia.org/wiki/Computer_algebra). This approach represents the function `f` as an [expression](https://en.wikipedia.org/wiki/Expression_(computer_science)) and implements two functions: `sevaluate`, which calculates the value of the expression for a particular set of variable values, and `sderivative`, which transforms the expression into a new expression for its derivative using the rules of calculus. We will implement a very simple, highly restricted expression format and functions that only work with simple polynomials.

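We will represent a polynomial as a `tuple` of terms, where each term is itself a `tuple` containing a coefficient followed by `(variable, exponent)` pairs. For example, our polynomial $f = 2ab^2 + 3a^2b$ is represented as: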

In [6]:
s_f = ((2, ('a', 1), ('b', 2)), (3, ('a', 2), ('b', 1)))
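Nested tuples are hard to read at a glance, so here is a small hypothetical helper, `sformat`, that renders an expression in this format back into conventional notation. It is a sketch for inspecting expressions only and is not used by the rest of the notebook:

```python
def sformat(expression):
    """Render a nested-tuple polynomial as a readable string (hypothetical helper)."""
    terms = []
    for coeff, *variables in expression:
        factors = [str(coeff)]
        for variable, exponent in variables:
            if exponent == 0:
                continue  # x^0 == 1, so the factor can be dropped
            factors.append(variable if exponent == 1 else f"{variable}^{exponent}")
        terms.append("*".join(factors))
    return " + ".join(terms) if terms else "0"

s_f = ((2, ('a', 1), ('b', 2)), (3, ('a', 2), ('b', 1)))
print(sformat(s_f))  # 2*a*b^2 + 3*a^2*b
```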

This format is a very limited toy representation, chosen only because it is simple and gets the job done for this example. Now that we have a format, let's implement `sevaluate` and check whether it calculates the value of this function at $a=2$ and $b=3$:

In [7]:
def sevaluate(expression, variable_values):
    total = 0
    for term in expression:
        coeff, *variables = term
        term_total = coeff
        for variable, exponent in variables:
            variable_value = variable_values[variable]
            term_total *= variable_value ** exponent
        total += term_total
    return total

In [8]:
sevaluate(s_f, {'a': 2, 'b':3})

72
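Tracing `sevaluate` by hand at $a = 2$ and $b = 3$: each term multiplies its coefficient by every variable raised to its exponent, and the term totals are summed:

$
\begin{align}
2 \cdot 2^1 \cdot 3^2 + 3 \cdot 2^2 \cdot 3^1 &= 36 + 36 = 72
\end{align}
$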

Our evaluate function appears to work. Next we need a way to convert an expression into another expression that represents its partial derivative, which is what `sderivative` does. Again, this function is just a toy and can only handle the very limited case of a simple polynomial:

In [9]:
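def sderivative(expression, dvariable):
    dexpression = []
    for term in expression:
        coeff, *variables = term
        dterm = []
        contains_dvariable = False
        for variable, exponent in variables:
            if variable == dvariable:
                contains_dvariable = True
                # Power rule: multiply by the exponent, then decrement it
                coeff *= exponent
                exponent -= 1
            dterm.append((variable, exponent))
        # A term without dvariable (or whose coefficient becomes 0) is constant
        # with respect to dvariable, so its derivative vanishes
        if contains_dvariable and coeff != 0:
            dexpression.append((coeff, *dterm))
    return tuple(dexpression)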

Let's use our new method to compute the expression corresponding to both of the partial derivatives we are interested in:

In [10]:
s_f_da = sderivative(s_f, 'a')
s_f_da

((2, ('a', 0), ('b', 2)), (6, ('a', 1), ('b', 1)))
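Each term of this result follows directly from the power rule. For the second term, differentiating with respect to $a$ multiplies the coefficient by the exponent of $a$ and lowers that exponent by one, while the factor $b^1$ is carried along unchanged:

$
\begin{align}
\frac{\partial}{\partial a}\left(3a^2b^1\right) &= (3 \cdot 2)\,a^{2-1}b^1 = 6a^1b^1
\end{align}
$

which is the tuple `(6, ('a', 1), ('b', 1))`. The first term similarly becomes $2a^0b^2 = 2b^2$; the toy format keeps the `('a', 0)` factor rather than simplifying it away.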

In [11]:
s_f_da_db = sderivative(s_f_da, 'b')
s_f_da_db

((4, ('a', 0), ('b', 1)), (6, ('a', 1), ('b', 0)))

The above expressions look correct. Let's evaluate them and check that we get the three values we expect:

In [12]:
sevaluate(s_f, {'a': 2, 'b': 3}), sevaluate(s_f_da, {'a': 2, 'b': 3}), sevaluate(s_f_da_db, {'a': 2, 'b': 3})

(72, 54, 24)

Our toy system for computing symbolic derivatives worked and gave us exact answers without requiring us to work out the derivatives by hand. Our final example shows how TensorFlow can compute exact gradients automatically by representing the function as a computational graph, a similar (but vastly more sophisticated) structure.

## TensorFlow
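In our final example we will use TensorFlow to represent our function `f` as a tensor and use the `tf.gradients` function to generate gradient tensors. This example uses the TensorFlow 1.x graph API (`tf.placeholder` and `tf.Session`); under TensorFlow 2.x these are available as `tf.compat.v1.placeholder` and `tf.compat.v1.Session` and require eager execution to be disabled. The first step is to create a tensor for our polynomial: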

In [13]:
# Scalar float32 placeholders; their values are supplied when the graph is run
a = tf.placeholder(shape=(), dtype=tf.float32, name='a')
b = tf.placeholder(shape=(), dtype=tf.float32, name='b')
t_f = 2 * a * (b ** 2) + 3 * (a ** 2) * b

Note that TensorFlow uses operator overloading to build a computation graph: `t_f` is a handle to the tensor at the output of that graph, and no values have been calculated yet. Now let's use `tf.gradients` to get our partial derivatives:
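To see how operator overloading can build a graph instead of computing numbers, here is a toy sketch in plain Python. The `Node` class and `wrap` helper are hypothetical and far simpler than TensorFlow's graph; the variables are named `node_a` and `node_b` so they do not clash with the placeholders above. Values are only produced when the graph is evaluated with a dictionary of inputs, loosely mirroring `feed_dict`:

```python
class Node:
    """A toy graph node: an operation name plus its input nodes (illustration only)."""
    def __init__(self, op, inputs=(), name=None, value=None):
        self.op, self.inputs, self.name, self.value = op, inputs, name, value

    # Each overloaded operator records a new node instead of computing a value
    def __add__(self, other):
        return Node('add', (self, wrap(other)))

    def __radd__(self, other):
        return Node('add', (wrap(other), self))

    def __mul__(self, other):
        return Node('mul', (self, wrap(other)))

    def __rmul__(self, other):
        return Node('mul', (wrap(other), self))

    def __pow__(self, exponent):
        return Node('pow', (self, wrap(exponent)))

    def evaluate(self, feed):
        # Recursively compute this node's value from its inputs
        if self.op == 'placeholder':
            return feed[self.name]
        if self.op == 'const':
            return self.value
        x, y = (node.evaluate(feed) for node in self.inputs)
        if self.op == 'add':
            return x + y
        if self.op == 'mul':
            return x * y
        return x ** y  # 'pow'

def wrap(x):
    # Turn plain numbers into constant nodes so they can join the graph
    return x if isinstance(x, Node) else Node('const', value=x)

node_a = Node('placeholder', name='a')
node_b = Node('placeholder', name='b')
node_f = 2 * node_a * (node_b ** 2) + 3 * (node_a ** 2) * node_b  # nothing computed yet
print(node_f.evaluate({'a': 2, 'b': 3}))  # 72
```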

In [14]:
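# tf.gradients returns a list with one gradient per tensor in xs; we pass a single tensor
t_f_da = tf.gradients(t_f, a)[0]
# Differentiating the gradient tensor again gives the second-order partial
t_f_da_db = tf.gradients(t_f_da, b)[0]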

Finally we are ready to calculate our values:

In [15]:
with tf.Session() as sess:
    print(sess.run((t_f, t_f_da, t_f_da_db), feed_dict={a:2, b:3}))

(72.0, 54.0, 24.0)
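`tf.gradients` relies on reverse-mode automatic differentiation (backpropagation): it walks the graph from the output back to the inputs, applying the chain rule at each operation. The following is a minimal scalar sketch of that idea in plain Python, not how TensorFlow implements it; the `Var` class and its `backward` method are hypothetical names, and the variables are named `var_a` and `var_b` to avoid clashing with the placeholders. It computes first-order partials only; TensorFlow obtains the second-order partial by differentiating the gradient tensor again, as in the cell above. The value it reports for $\frac{\partial f}{\partial b} = 4ab + 3a^2 = 36$ can be checked by hand:

```python
class Var:
    """A scalar value that records how it was computed (toy reverse-mode AD)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __pow__(self, n):
        # Power rule for a constant exponent n
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self):
        # Topological order: each node appears after every node it depends on
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local

var_a = Var(2.0)
var_b = Var(3.0)
var_f = 2 * var_a * (var_b ** 2) + 3 * (var_a ** 2) * var_b
var_f.backward()
print(var_f.value, var_a.grad, var_b.grad)  # 72.0 54.0 36.0
```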


## Conclusion
This notebook demonstrated three approaches to computing derivatives automatically, a core feature of most machine learning frameworks. By computing gradients automatically, modern machine learning frameworks make it much simpler to experiment with different activation functions and neural network topologies without working out the potentially complex math needed to compute gradients for backpropagation.