In [2]:
%pylab inline
%config InlineBackend.figure_format = 'retina'
from ipywidgets import interact

Populating the interactive namespace from numpy and matplotlib


# A simple example of a rudimentary autodifferentiation package
I used some advanced Python features in this notebook: classes, inheritance, and generators. I have commented the code throughout and included some additional examples (see Extra notes at the end).

**Warning: this is just an experimental example. In practice, we will use TensorFlow for tasks that involve automatic differentiation.**

In [3]:
class Autodiff_Node(object): 
    ## A class is a recipe for creating objects (with methods and attributes).
    ## This is called a 'base class', which is like a boilerplate recipe that 
    ## many other classes will use as a starting point, each making specific 
    ## changes.


    ## All methods (unless otherwise specified) must have as their first 
    ## argument a variable called `self`, which is a reference to the object 
    ## itself (not a copy). Hence, one can access any method or attribute of 
    ## the object through the `self` variable.
    def __init__(self, parents): 
        """Parameters:
        ---------------
        `parents` a list of `Autodiff_Node` objects corresponding to the graph
            parents."""
        ## initializer gets called once when you create (or instantiate) an 
        ## object
        self._set_parents(parents)
        self._output_data = None
    def _set_parents(self, parents):
        self.parents = parents
        return None
    def set_output_data(self, y):
        self._output_data = y
        return None
    def get_output_data(self):
        return self._output_data
    ## a static method just means it doesn't depend on the data in `self`, so 
    ## `self` does not need to be an argument
    @staticmethod 
    def function(x): 
        """Given input `x` return output `y`"""
        ## this is just a place holder (or template) to be used to create 
        ## specific types of Node objects
        raise NotImplementedError
    ## a static method just means it doesn't depend on the data in `self`, so 
    ## `self` does not need to be an argument
    @staticmethod
    def backpropagation_function(x, y, output_gradient): 
        """
        Parameters:
        --------------------
        `x` is the input variable(s): a list of tensors one for each input from 
            a graph parent.
        `y` is the output variable(s): a list of tensors one for each output to 
            a graph child.
        `output_gradient` is the gradient (list of partial derivatives) of a 
            scalar function with respect to one or more output variables.
        
        Returns:
        --------------------
        `input_gradient` is the gradient (list of partial derivatives) of a 
            scalar function with respect to one or more input variables."""
        ## this is just a place holder (or template) to be used to create 
        ## specific types of Node objects
        raise NotImplementedError
    def eval(self):
        """Evaluate the output of the node, moving from necessary inputs 
        through the DAG in the forward direction."""
        ## recursively call eval for each node until input variables are reached
        x = [node.eval() for node in self.parents]
        return self.function(x)
    def _eval_and_save_output(self):
        ## this is a stateful approach and should be used with care. This method 
        ## will alter one of the attributes. This can lead to confusing and 
        ## hard-to-diagnose bugs. It is best to avoid doing this whenever possible.

        ## recursively call eval for each node until inputs are reached
        x = [node._eval_and_save_output() for node in self.parents]
        y = self.function(x)
        ## internal data, or state, is modified here. Specifically the 
        ## `self._output_data` attribute.
        self.set_output_data(y) 
        return y
    def _get_gradient(self, output_gradient):
        ## This is a helper function to assemble the gradients, moving backward 
        ## through the DAG. We must call `_eval_and_save_output()` before 
        ## using this method
        x = [node.get_output_data() for node in self.parents]
        ## We use internal state here, which assumes that 
        ## `_eval_and_save_output()` was called before using this method
        y = self.get_output_data() 
        input_gradient = self.backpropagation_function(x, y, output_gradient)
        ## We use recursion combined with generators (see examples at the end of 
        ## this notebook)
        for node, sub_gradient in zip(self.parents, input_gradient):
            ## recursive call to the same method attached to the parent nodes
            for inner_gradient in node._get_gradient(sub_gradient): 
                yield inner_gradient
    def compute_gradient(self): 
        """Assumes the node has scalar output"""
        ## computing gradients is very simple with the `Autodiff_Node` class

        ## the dangerous stateful call must precede the gradient calculation
        self._eval_and_save_output() 
        ## the input is always simply `1.0` because partial_L/partial_L = 1
        return [g for g in self._get_gradient(1.)] 
    def __add__(self, b):
        ## You can define the "+" operator (and other operators)
        a = self
        return Add(a, b)

In [4]:
class Add(Autodiff_Node):
    """Add two input nodes"""
    ## this defines a node type specifically for addition. It 'inherits' all 
    ## of the methods and attributes from its base class, `Autodiff_Node`. Think
    ## of these as default methods. Any methods that are redefined here are used 
    ## instead of the default methods from the base class
    def __init__(self, a, b):
        ## initializer gets called once when you create (or instantiate) an 
        ## object
        parents = [a, b]
        super().__init__(parents) ## calls `__init__` method of the base class
    ## a static method just means it doesn't depend on the data in `self`, so 
    ## `self` does not need to be an argument
    @staticmethod
    def function(x):
        a = x[0]
        b = x[1]
        return a + b
    @staticmethod
    def backpropagation_function(x, y, output_gradient):
        input_gradient = [output_gradient*1, output_gradient*1]
        return input_gradient

class Multiply(Autodiff_Node):
    """Multiply two input nodes"""
    def __init__(self, a, b):
        parents = [a, b]
        super().__init__(parents)
    @staticmethod
    def function(x):
        a = x[0]
        b = x[1]
        return a*b
    @staticmethod
    def backpropagation_function(x, y, output_gradient):
        a = x[0]
        b = x[1]
        input_gradient = [output_gradient*b, output_gradient*a]
        return input_gradient

class Tanh(Autodiff_Node):
    """Apply the `tanh` function to an input node"""
    def __init__(self, x):
        parents = [x]
        super().__init__(parents)
    @staticmethod
    def function(x):
        return np.tanh(x[0])
    @staticmethod
    def backpropagation_function(x, y, output_gradient):
        dydx = 1./np.cosh(x[0])**2
        input_gradient = [output_gradient*dydx]
        return input_gradient

class Input_Variable(Autodiff_Node):
    """Input Variables have a specific fixed value. Use these to hold parameters 
    and variables. Gradient of a node with a scalar output will be a list of 
    partial derivatives with respect to these Input Variables.
    
    Parameters:
    ---------------
    `value` the numerical value of the variable (scalar in this example)."""
    def __init__(self, value):
        self.value = value
        parents = []
        super().__init__(parents)
    def function(self, x):
        ## not a static method: it needs `self` to read the stored value
        return self.value
    @staticmethod
    def backpropagation_function(x, y, output_gradient):
        input_gradient = output_gradient
        return input_gradient
    def eval(self): 
        ## this overrides the default `eval` method defined in `Autodiff_Node`
        ## base class
        return self.value
    def _eval_and_save_output(self): ## another override
        self.set_output_data(self.value)
        return self.value
    def _get_gradient(self, output_gradient): ## another override
        yield output_gradient

# Simple example

In [7]:
w = Input_Variable(1.2)
u = Input_Variable(2.)
b = Input_Variable(-3.)

s1 = Multiply(w, u)
s2 = Add(s1, b)
# s2 = s1 + b

L = Tanh(s2)

L.eval()

-0.5370495669980354

In [8]:
L.compute_gradient()

[1.4231555251744454, 0.8538933151046673, 0.7115777625872227]

In [9]:
## exact gradient for comparison
_g1 = 1./cosh(w.eval()*u.eval() + b.eval())**2
print('gradient (w, u, b):', _g1*u.eval(), _g1*w.eval(), _g1)

gradient (w, u, b): 1.4231555251744454 0.8538933151046673 0.7115777625872227
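
A second, independent check is to estimate the same partial derivatives numerically. The following is a minimal sketch using central finite differences; the helper names `loss` and `finite_difference_gradient` are hypothetical and are not part of the classes defined above.

```python
import numpy as np

def loss(w, u, b):
    """Same scalar function as the graph above: tanh(w*u + b)."""
    return np.tanh(w * u + b)

def finite_difference_gradient(f, args, h=1e-6):
    """Central-difference estimate of the partial derivatives of `f`
    with respect to each entry of `args` (hypothetical helper)."""
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += h    # step forward in the i-th argument only
        minus[i] -= h   # step backward in the i-th argument only
        grads.append((f(*plus) - f(*minus)) / (2 * h))
    return grads

fd_grad = finite_difference_gradient(loss, [1.2, 2.0, -3.0])
print('finite-difference gradient (w, u, b):', fd_grad)
```

The estimate should agree with the values above to several digits, but not to machine precision, since finite differences carry truncation and round-off error.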


### We can evaluate at any of the nodes

In [10]:
s2.eval()

-0.6000000000000001

In [11]:
s2.compute_gradient()

[2.0, 1.2, 1.0]

# What happens when an input variable is an input into more than one node?

In [5]:
w = Input_Variable(1.2)
u = Input_Variable(2.)
b = Input_Variable(-10.)

s1 = Multiply(w, u)
s2 = Add(s1, b)

s3 = Multiply(s2, u)

L = Tanh(s3)

L.eval()

-0.9999999999998745

In [6]:
L.compute_gradient() ## four outputs but we only have three input variables

[1.0036163614810117e-12,
 6.02169816888607e-13,
 5.018081807405058e-13,
 -1.906871086813922e-12]

### The resulting gradient has two components for the same variable
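Because `u` feeds into both `s1` and `s3`, the output contains one entry for each path from `L` back to `u` (the second and fourth entries here). By the multivariable chain rule, the partial derivative with respect to `u` is the sum of those entries, so we need to sum all of the elements of the output that correspond to the same input variable.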
-------------------------------------------------------------
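
One way to do the summation is to have the backward pass yield each input variable together with its gradient contribution and then accumulate the contributions per variable. The following is a minimal self-contained sketch of that idea, not a patch to the classes above: `Var`, `Op`, `multiply`, `add`, `tanh`, and `summed_gradient` are hypothetical names, and for brevity the forward values are recomputed rather than saved.

```python
from collections import defaultdict
import numpy as np

class Var:
    """Hypothetical stand-in for `Input_Variable`."""
    def __init__(self, value):
        self.value = value
    def forward(self):
        return self.value
    def backward(self, output_gradient):
        # yield the variable itself alongside its gradient contribution
        yield self, output_gradient

class Op:
    """Hypothetical stand-in for an operation node such as `Multiply`."""
    def __init__(self, function, partials, *parents):
        self.function = function   # maps parent values to the output value
        self.partials = partials   # maps parent values to local derivatives
        self.parents = parents
    def forward(self):
        return self.function(*(p.forward() for p in self.parents))
    def backward(self, output_gradient):
        x = [p.forward() for p in self.parents]
        for parent, dydx in zip(self.parents, self.partials(*x)):
            yield from parent.backward(output_gradient * dydx)

def multiply(a, b):
    return Op(lambda a, b: a * b, lambda a, b: (b, a), a, b)

def add(a, b):
    return Op(lambda a, b: a + b, lambda a, b: (1.0, 1.0), a, b)

def tanh(a):
    return Op(np.tanh, lambda a: (1.0 / np.cosh(a) ** 2,), a)

def summed_gradient(node, variables):
    """Sum the contributions from every path, one total per variable."""
    totals = defaultdict(float)
    for var, contribution in node.backward(1.0):
        totals[id(var)] += contribution
    return [totals[id(var)] for var in variables]

w, u, b = Var(1.2), Var(2.0), Var(-10.0)
L = tanh(multiply(add(multiply(w, u), b), u))
grad = summed_gradient(L, [w, u, b])
print(grad)  # three entries, one per input variable
```

Keying the totals by `id(var)` identifies a variable by object identity, so two distinct variables that happen to hold the same value are kept separate.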


# Generalizations:
  1. We could generalize this to account for multidimensional array-valued inputs and outputs (this might work in the current form for some operations). For example, suppose we want to compute a matrix-vector product. We wouldn't want to create a gigantic graph with all the little individual additions and multiplications. It would be far more efficient to define a new node type. We could call it `Matrix_Vector_Product`, for example.
  2. We could also generalize so that we can use `if` statements, and possibly even loops.
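
As a rough sketch of the first generalization, the two functions below mirror the `function` and `backpropagation_function` signatures used above for a matrix-vector product $y = Av$. The function names are hypothetical, the inputs are assumed to be NumPy arrays, and the gradient formulas are the standard ones: $\partial L/\partial A = g\,v^T$ and $\partial L/\partial v = A^T g$, where $g = \partial L/\partial y$.

```python
import numpy as np

def matvec_function(x):
    """Forward pass: `x` is the list [A, v]; returns the product A v."""
    A, v = x
    return A @ v

def matvec_backpropagation_function(x, y, output_gradient):
    """Backward pass: `output_gradient` is dL/dy, one entry per output."""
    A, v = x
    grad_A = np.outer(output_gradient, v)   # dL/dA[i, j] = g[i] * v[j]
    grad_v = A.T @ output_gradient          # dL/dv[j] = sum_i A[i, j] * g[i]
    return [grad_A, grad_v]

# Check against central finite differences for L = sum(tanh(A v))
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
v = rng.normal(size=2)

def scalar_loss(A, v):
    return np.sum(np.tanh(matvec_function([A, v])))

y = matvec_function([A, v])
output_gradient = 1.0 / np.cosh(y) ** 2     # dL/dy for L = sum(tanh(y))
grad_A, grad_v = matvec_backpropagation_function([A, v], y, output_gradient)

h = 1e-6
fd_grad_v = np.array([
    (scalar_loss(A, v + h * e) - scalar_loss(A, v - h * e)) / (2 * h)
    for e in np.eye(2)
])
print(np.allclose(grad_v, fd_grad_v, atol=1e-6))
```

The whole product is a single node here, so the graph stays small regardless of the matrix size.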

# Extra notes:

In [None]:
## a little list comprehension example 
[1 for j in arange(4)]

[1, 1, 1, 1]

In [None]:
## a little list comprehension example 
[j for j in arange(4)]

[0, 1, 2, 3]

In [None]:
## a little nested list comprehension example 
[[i for i in arange(j)] for j in arange(4)]

[[], [0], [0, 1], [0, 1, 2]]

In [None]:
## an advanced generator example using `yield` statements
def reverse_arange(n):
    for i in arange(n):
        yield n - 1 - i
[val for val in reverse_arange(3)]

[2, 1, 0]

In [None]:
## an advanced generator example using `yield` statements
def i1(n, m):
    for i in arange(n):
        for j in i2(m):
            yield j
def i2(n):
    for i in arange(n):
        yield i

[val for val in i1(3, 2)]

[0, 1, 0, 1, 0, 1]

In [None]:
[val for val in i1(2, 3)]

[0, 1, 2, 0, 1, 2]

In [None]:
## an advanced generator example using `yield` statements
def i1(n, m):
    for i in arange(n):
        for j in i2(m):
            yield j
def i2(n):
    for i in arange(n):
        for j in i3():
            yield j
def i3():
    yield 5
    
[val for val in i1(3, 2)]

[5, 5, 5, 5, 5, 5]

In [None]:
## an advanced generator example using `yield` statements
def i1(n, m):
    for i in arange(n):
        for j in i2(m):
            yield j
def i2(n):
    for i in arange(n):
        for j in i3():
            yield j
def i3():
    yield 5
    yield 3

    
[val for val in i1(3, 2)]

[5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3]
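
The nested pattern `for j in inner(): yield j` used in the examples above (and in `_get_gradient`) can also be written with Python's `yield from` statement. The short sketch below, with hypothetical function names, shows that the two forms produce the same sequence.

```python
def inner():
    yield 5
    yield 3

def outer_with_loop(n):
    # explicit inner loop that re-yields every value
    for _ in range(n):
        for value in inner():
            yield value

def outer_with_yield_from(n):
    # `yield from` delegates to the inner generator directly
    for _ in range(n):
        yield from inner()

print(list(outer_with_loop(2)))
print(list(outer_with_yield_from(2)))
```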

In [None]:
x = 1. + 5j
?x

In [None]:
def complex_add(x, y):
    return (x.real + y.real, x.imag + y.imag)

# Further Reading:
You can read more about how automatic differentiation is implemented in the Stan library here: https://arxiv.org/abs/1509.07164

# Dual numbers
Dual numbers are a bit like complex numbers. Recall, a complex number can be written as 
$$ z = x + iy,$$
where $i$ is the imaginary unit. Using $i^2 = -1$, one can work out the rules for arithmetic operations over complex numbers. For example, 
$$z_1\cdot z_2 = (x_1 + iy_1)(x_2 + iy_2) = x_1x_2 - y_1y_2  + i(x_1y_2 + y_1x_2).$$

Dual numbers can be written as
$$\xi = y + \epsilon y',$$
where $y\in\mathbb{R}$, $y'\in\mathbb{R}$, and $\epsilon$ is an infinitesimal unit. Using $ \epsilon^2 = 0 $, we can define arithmetic operations for these numbers. For example,
$$\xi_1\cdot \xi_2 = (y_1 + \epsilon y_1')(y_2 + \epsilon y_2') = y_1y_2 + \epsilon(y_1y_2' + y_1'y_2).$$
Notice that the second term on the right-hand side is suggestive of the product rule. Likewise,
$$\xi^2 = \left(y + \epsilon y'\right)^2 = y^2 + \epsilon(2yy').$$

Numerically, one can implement dual numbers in much the same way as complex numbers. We simply staple together two floating point numbers, one for each part. We can then define operators for arithmetic operations involving dual numbers.
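
As an illustration, here is a minimal sketch of such an implementation. The class name `Dual`, its attribute names, and the `tanh` helper are hypothetical choices for this example; only addition and multiplication are defined. Seeding the derivative part of one input with 1 makes the derivative part of the result equal to the partial derivative with respect to that input.

```python
import math

class Dual:
    """A dual number `value + eps * deriv`, where eps**2 == 0."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # treat plain numbers as dual numbers with zero derivative part
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # (y1 + eps y1')(y2 + eps y2') = y1 y2 + eps (y1 y2' + y1' y2)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)
    __rmul__ = __mul__

    def __repr__(self):
        return f"Dual({self.value}, {self.deriv})"

def tanh(xi):
    """tanh extended to dual numbers using d/dy tanh(y) = 1 - tanh(y)**2."""
    t = math.tanh(xi.value)
    return Dual(t, (1.0 - t * t) * xi.deriv)

# Same function as the simple example: L = tanh(w*u + b)
w = Dual(1.2, 1.0)   # derivative part 1: differentiate with respect to w
u = Dual(2.0)
b = Dual(-3.0)
L = tanh(w * u + b)
print(L.value, L.deriv)   # function value and dL/dw
```

Each evaluation gives the derivative with respect to one seeded input, so obtaining the full gradient this way takes one pass per input variable.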