In [None]:
# default_exp core

# Implementing autograd from scratch

> This will have both code and documentation in it

In [None]:
#hide
from nbdev.showdoc import *
%load_ext autoreload
%autoreload 2
from IPython.core.debugger import set_trace
from IPython.display import Markdown as md


In [None]:
import numpy as np
import numpy as onp
import autograd 
path_assets = './assets/'

### Prerequisites

* you know what backpropagation is and roughly how it works
* you know what automatic differentiation (autodiff) is

### What are we trying to find?


We need a function to differentiate. It can take any number of input variables: one, two, three, or more. If our function is called $f$, we want to find $\partial f/\partial v$, the derivative of $f$ with respect to one of these variables $v$.

Here we define a function of two variables, $x$ and $y$: $$f(x, y) = \log(x^2) + y^2 + xy$$

In [None]:
def f(x,y):    return np.log(x**2) + y**2 + x*y

The aim is to create a function `grad` that takes `f` as input and returns a new function that computes the derivative of `f` with respect to one of its variables. For our function above, `grad` can give us either $\partial f/\partial x$ or $\partial f/\partial y$, depending on which variable we ask it to differentiate with respect to. We'll define `grad` with hard-coded dummy values for now.

How do we choose this variable? Not by name (telling the function 'x' or 'y'), but by position. We give `grad` a number, `argnum`, and say, "Differentiate `f` with respect to its `argnum`th argument." `argnum` is an index into the argument list of `f`, so `argnum=0` selects `x` and `argnum=1` selects `y`.

`grad` has a nested structure: it returns a function `gradfun`, which in turn returns the gradient. Notice the use of `*args` and `**kwargs` below. The inner function `gradfun` needs them because a) it doesn't know how many arguments `f` takes, and b) you can pass those arguments either positionally or by keyword. Positional arguments are collected in the tuple `args`, and keyword arguments are collected in the dict `kwargs`.
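To see where positional and keyword arguments end up, here is a small standalone sketch. It also shows one way (my assumption, not something the rest of this notebook relies on) to make `gradfun` robust to how it is called: `inspect.signature(f).bind(...)` maps both calling styles onto `f`'s parameters in declaration order, so the `argnum`th value can be picked out reliably. The helper names `show_args` and `positional_values` are hypothetical.

```python
import inspect
import numpy as np

def f(x, y):
    return np.log(x**2) + y**2 + x*y

def show_args(*args, **kwargs):
    # Positional arguments arrive as a tuple, keyword arguments as a dict
    return args, kwargs

def positional_values(fun, *args, **kwargs):
    # Bind the call to fun's signature; bound.arguments is ordered by the
    # parameters of fun, so the values come out in the same order however
    # the call was written
    bound = inspect.signature(fun).bind(*args, **kwargs)
    bound.apply_defaults()
    return tuple(bound.arguments.values())

print(show_args(1, 2))                   # ((1, 2), {})
print(show_args(x=13, y=4))              # ((), {'x': 13, 'y': 4})
print(positional_values(f, y=4, x=13))   # (13, 4)
```

With a helper like this, `gradfun` could take `positional_values(f, *args, **kwargs)[argnum]` as the variable to differentiate, instead of depending on the order of `kwargs.values()`.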

In [None]:
def grad(f, argnum = 0): 
    """Returns a function that finds the gradient"""
    def gradfun(*args, **kwargs):
        """Returns the actual gradient """
        print("Args", args)
        print("Kwargs", kwargs)
        if len(args) != 0: x, y = args
        if len(kwargs) != 0: x, y = kwargs['x'], kwargs['y']  # look up by name, not by dict order
        # Dummy values. Returns correct gradient only for our function f above. 
        # Use these values until we calculate the true ones using autodiff. 
        #### remove this code once true code written
        if   argnum == 0: return 2/x + y    # df/dx, since d/dx log(x^2) = 2/x
        elif argnum == 1: return 2*y + x    # df/dy
        #### 
        # true autograd code goes here 
        ####
    return gradfun
# example usage
dfdx = grad(f, argnum = 0)
dfdy = grad(f, argnum = 1)
print("dfdx", dfdx(1,2))      # call gradient without keywords: values go into *args in gradfun 
print("dfdy", dfdy(x=13,y=4)) # call gradient with keywords: values go into **kwargs in gradfun 

Args (1, 2)
Kwargs {}
dfdx 4.0
Args ()
Kwargs {'x': 13, 'y': 4}
dfdy 21


So that's our goal. We are trying to build this `grad` function properly, following the structure defined above. 
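Before building the real thing, it helps to have a way to check any gradient we compute. A standard trick, independent of autodiff, is the central finite difference $\frac{f(v+\epsilon) - f(v-\epsilon)}{2\epsilon}$. The sketch below (`numerical_grad` is a hypothetical helper, not part of the final code) follows the same `argnum` convention as `grad`. For our $f$, $\partial f/\partial x = 2/x + y$ and $\partial f/\partial y = 2y + x$, so at the points used above we expect roughly 4 and 21.

```python
import numpy as np

def f(x, y):
    return np.log(x**2) + y**2 + x*y

def numerical_grad(fun, argnum=0, eps=1e-6):
    """Central-difference estimate of d fun / d args[argnum]."""
    def gradfun(*args):
        plus, minus = list(args), list(args)
        plus[argnum] += eps    # nudge only the chosen argument up...
        minus[argnum] -= eps   # ...and down
        return (fun(*plus) - fun(*minus)) / (2 * eps)
    return gradfun

print(numerical_grad(f, argnum=0)(1.0, 2.0))    # close to 2/x + y = 4
print(numerical_grad(f, argnum=1)(13.0, 4.0))   # close to 2y + x = 21
```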

### Building a computation graph 

Say you had some expression, like $ (4 \times 5) + 2 - 4$. I'm sure you know the answer to this, but how would a computer work it out? 

You may remember the order of operations used to evaluate such expressions; I learnt the acronym [BODMAS](https://www.mathsisfun.com/operation-order-bodmas.html) to remember it. Python has its own order of operations too, governed by its rules of [operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence). Python breaks an expression down and evaluates it in this order. You can view this breakdown as a computation graph, where each node is an operation (×3, +5, log) and where the graph has no cycles: it is a directed acyclic graph (DAG).
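Python's standard `ast` module lets us look at this breakdown directly. The sketch below parses $(4 \times 5) + 2 - 4$ into a tree of `BinOp` nodes and evaluates it bottom-up, recording each operation as it goes; the recorded steps correspond to the nodes of the computation graph. It assumes Python 3.8+, where number literals parse as `ast.Constant`, and the `evaluate` helper is illustrative only.

```python
import ast
import operator

# Map AST operator node types to the Python functions they stand for
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(node, trace):
    """Evaluate an arithmetic AST bottom-up, appending each operation to trace."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, trace)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left, trace)     # children are evaluated first
        right = evaluate(node.right, trace)
        result = OPS[type(node.op)](left, right)
        trace.append((type(node.op).__name__, left, right, result))
        return result
    raise TypeError("unsupported node: " + ast.dump(node))

tree = ast.parse("(4 * 5) + 2 - 4", mode="eval")
trace = []
result = evaluate(tree, trace)
print(result)   # 18
print(trace)    # [('Mult', 4, 5, 20), ('Add', 20, 2, 22), ('Sub', 22, 4, 18)]
```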

The computation graph is central to the backpropagation algorithm. Because each node of the graph is a simple operation (like +, ×, log), its derivative with respect to its inputs is easy to work out. This derivative is called the node's local gradient. Starting at the output node, we multiply local gradients together using the chain rule, working backwards until we reach a leaf (input) node, and summing the contributions when a node feeds into several others. The result at that leaf is the derivative of the output with respect to that input.

Now we need a way to build the computation graph, and the trick for this is quite clever. Typically your functions are built from numpy functions. What we will do is create our own wrapped version of the numpy package and use it instead of the original. The wrapped version works exactly like the original, except that when it executes a function, it also adds to a computation graph for us.

The easiest way to show this is by an example. Let's say we had the following function: 
$$ \operatorname{logistic}(z) = \frac{1}{1 + \exp(-z)} $$

We would implement this in code like this:

In [None]:
def logistic(z): return 1 / (1 + np.exp(-z))


Numpy overloads Python's arithmetic operators for arrays, meaning that $+, \times, /$ applied to a `numpy.ndarray` are dispatched to numpy's own functions `np.add`, `np.multiply`, `np.true_divide` and so on. It does this by defining the special methods `__add__`, `__mul__`, `__truediv__` (and their reflected versions such as `__radd__`) on the `numpy.ndarray` class. The effect is that `logistic(z)` gets transformed into something like this:
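Here is a small sketch of that dispatch mechanism, using a hypothetical toy class `Traced` instead of numpy. When Python evaluates `1 + t` and the `int` on the left doesn't know how to add a `Traced`, it falls back to the reflected method `t.__radd__(1)`; the same happens for `/` via `__rtruediv__`. The last lines check that numpy's operators give the same results as the corresponding numpy functions.

```python
import numpy as np

CALLS = []  # records which special method Python dispatched to

def raw(x):
    """Unwrap a Traced value; leave anything else unchanged."""
    return x.value if isinstance(x, Traced) else x

class Traced:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):        # self + other
        CALLS.append("__add__")
        return Traced(self.value + raw(other))

    def __radd__(self, other):       # other + self, when other can't handle Traced
        CALLS.append("__radd__")
        return Traced(raw(other) + self.value)

    def __truediv__(self, other):    # self / other
        CALLS.append("__truediv__")
        return Traced(self.value / raw(other))

    def __rtruediv__(self, other):   # other / self, when other can't handle Traced
        CALLS.append("__rtruediv__")
        return Traced(raw(other) / self.value)

t = Traced(3.0)
result = 1 / (1 + t)
print(result.value, CALLS)   # 0.25 ['__radd__', '__rtruediv__']

a = np.array([1.0, 2.0])
print(np.array_equal(a + 1, np.add(a, 1)),
      np.array_equal(1 / a, np.true_divide(1, a)))   # True True
```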

In [None]:
def logistic2(z): return np.reciprocal(np.add(1, np.exp(np.negative(z))))

Let's apply the order of operations, in other words the order in which Python breaks down this expression. This has the same effect as creating an intermediate variable after each operation. Each intermediate variable stores the result of applying a simple operation, like $+, \times, /$, to the previous one. These simple operations are called **primitives**, and they will be important later. Let's call the intermediate variables $t_1, t_2, t_3, \dots$, the input to the function $z$, and the final value $y$.

In [None]:
def logistic3(z): 
    t1 = np.negative(z) 
    t2 = np.exp(t1)
    t3 = np.add(1, t2)
    y = np.reciprocal(t3) 
    return y

Next we want to turn `logistic3` into a computation graph, with nodes and links between them. Here's what this graph looks like.

<img src="assets/node_tree.svg" width="240"/>

We will eventually define a class called `Node`, and each node in the graph will be an instance of this class. We define the links between nodes with the `parents` attribute. Leaf nodes have no parents. The leaf nodes above are $1$ and $z$.

Below is a representation of the computation graph, now using Nodes. The numbers in `value` are the values of the intermediate variables. The function was given $z=1.5$ as input and outputs $y=0.818$.

<img src="assets/node_tree.png" width="800"/>

Let's confirm we get the same answer. 

In [None]:
np.round(logistic3(1.5),3)

0.818
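With the intermediate values in hand, we can already do backpropagation by hand. The sketch below redoes the forward pass of `logistic3` at $z=1.5$, then walks backwards multiplying local gradients: $\frac{dy}{dt_3} = -1/t_3^2$, $\frac{dt_3}{dt_2} = 1$, $\frac{dt_2}{dt_1} = e^{t_1}$, $\frac{dt_1}{dz} = -1$. The result should match the closed form $y(1-y)$ for the derivative of the logistic function. Variable names like `g_t3` (the gradient of $y$ with respect to $t_3$) are my own.

```python
import numpy as np

z = 1.5
# Forward pass: same intermediate variables as logistic3
t1 = np.negative(z)
t2 = np.exp(t1)
t3 = np.add(1, t2)
y = np.reciprocal(t3)

# Backward pass: start from dy/dy = 1 and multiply by one local gradient per node
g_y = 1.0
g_t3 = g_y * (-1.0 / t3**2)   # y = 1/t3      ->  dy/dt3  = -1/t3^2
g_t2 = g_t3 * 1.0             # t3 = 1 + t2   ->  dt3/dt2 = 1
g_t1 = g_t2 * np.exp(t1)      # t2 = exp(t1)  ->  dt2/dt1 = exp(t1)
g_z = g_t1 * -1.0             # t1 = -z       ->  dt1/dz  = -1

print(g_z, y * (1 - y))       # both approximately 0.1491
```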

#### Constructing the Node class

Now we can construct a first version of the `Node` class. Each Node needs at least a `value`, a function (`fun`) and `parents`. We store `fun` and `value` together in a tuple called `recipe`.

We will also create a method called `initialize_root` that starts off the graph. The root of the tree has no parents, its function is just the identity function, and it has no value.

In [None]:
class Node:
    """A node in a computation graph."""
    def __init__(self, value, fun, parents):
        self.parents = parents
        self.recipe = (fun, value)
        # my additions
        self._value = value
        self._fun = fun 

    def initialize_root(self):
        self.parents = []
        self.recipe = (lambda x: x, None)

Now that we have the `Node` class, we could build a computation graph manually if we wanted to. (We don't create a Node for $1$ or other constants, only for the input and the intermediate variables.)

In [None]:
import numpy as np
val_z = 1.5 
z = Node(val_z, None, [])
val_t1 = np.negative(val_z)
t1 = Node(val_t1,np.negative, [z])
val_t2 = np.exp(val_t1)
t2 = Node(val_t2, np.exp, [t1])
val_t3 = np.add(val_t2, 1)
t3 = Node(val_t3, np.add, [t2])
val_y = np.reciprocal(val_t3)
y = Node(val_y, np.reciprocal, [t3])
print(round(y._value,3)) # same answer as before
del np

0.818
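The `parents` links are what make the graph useful: starting from `y` we can follow them back to the input, visiting nodes in the reverse of the order in which they were computed, which is the order backpropagation needs. Here is a standalone sketch; it repeats a minimal `Node` so that it runs on its own, and `walk_back` is a hypothetical helper.

```python
import numpy as np

class Node:
    """Minimal stand-in for the notebook's Node: value, producing function, parents."""
    def __init__(self, value, fun, parents):
        self._value = value
        self._fun = fun
        self.parents = parents

# Rebuild the same logistic graph as above, for z = 1.5
z = Node(1.5, None, [])
t1 = Node(np.negative(z._value), np.negative, [z])
t2 = Node(np.exp(t1._value), np.exp, [t1])
t3 = Node(np.add(t2._value, 1), np.add, [t2])
y = Node(np.reciprocal(t3._value), np.reciprocal, [t3])

def walk_back(node):
    """Follow parents links from node to the leaves, returning function names."""
    names, stack = [], [node]
    while stack:
        current = stack.pop()
        names.append(current._fun.__name__ if current._fun is not None else "input")
        stack.extend(current.parents)
    return names

print(walk_back(y))   # ['reciprocal', 'add', 'exp', 'negative', 'input']
```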


Creating the computational graph this way is both manual and clunky. Time now to build it automatically. 

### Creating a new version of Numpy

It would be great if numpy created a node for each intermediate variable and added it to our graph, but it doesn't. So it's time to make our own version of numpy that traces the flow of execution and builds the computation graph.

First we'll delete the reference to our current numpy and import it under new names, `_np` and `onp`.

In [None]:
try:
    del np
except NameError:  # np may already have been deleted by an earlier cell
    pass
import numpy as _np, numpy as onp

Throughout this section, I'll refer to the original numpy as `onp`, and the version we are building as `anp`. 

There are a few things we have to do:

* create new versions of `onp` functions. For example, we need a function `anp.add` that does everything `onp.add` does, but also adds a node to the computation graph when called. Same with `anp.multiply`, `anp.divide` etc.
* overload operators like $+, \times, /$ so that they use the `anp` versions `anp.add`, `anp.multiply`, `anp.divide`, by defining methods like `__add__`, `__mul__`, `__truediv__`.

Each numpy operation is wrapped in a *primitive*. 
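To make the first bullet concrete, here is a minimal sketch of the wrapping idea under one simplifying assumption: instead of building `Node`s, each wrapped function just appends `(name, inputs, output)` to a global list, a "tape". `traced`, `TAPE`, and the `SimpleNamespace` stand-in for `anp` are all hypothetical; the `primitive` wrapper later in this notebook records nodes and parents instead.

```python
from types import SimpleNamespace
import numpy as onp

TAPE = []  # hypothetical global record of every wrapped call

def traced(f_raw):
    """Wrap an original numpy function so that each call is logged to TAPE."""
    def f_wrapped(*args, **kwargs):
        ans = f_raw(*args, **kwargs)          # do exactly what onp does...
        TAPE.append((f_raw.__name__, args, ans))  # ...and also record the call
        return ans
    f_wrapped.__name__ = f_raw.__name__
    return f_wrapped

# Stand-in for `anp`: the same function names, each one wrapped
anp = SimpleNamespace(**{name: traced(getattr(onp, name))
                         for name in ["negative", "exp", "add", "reciprocal"]})

def logistic(z):
    return anp.reciprocal(anp.add(1, anp.exp(anp.negative(z))))

y = logistic(1.5)
for name, inputs, out in TAPE:
    print(name, round(float(out), 3))
# negative -1.5
# exp 0.223
# add 1.223
# reciprocal 0.818
```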


In [None]:
# Non-differentiable functions (duplicates removed)
nograd_functions = [
    _np.ndim, _np.shape, _np.iscomplexobj, _np.result_type, _np.zeros_like,
    _np.ones_like, _np.floor, _np.ceil, _np.round, _np.rint, _np.around,
    _np.fix, _np.trunc, _np.all, _np.any, _np.argmax, _np.argmin,
    _np.argpartition, _np.argsort, _np.argwhere, _np.nonzero, _np.flatnonzero,
    _np.count_nonzero, _np.searchsorted, _np.sign,
    _np.floor_divide, _np.logical_and, _np.logical_or, _np.logical_not,
    _np.logical_xor, _np.isfinite, _np.isinf, _np.isnan, _np.isneginf,
    _np.isposinf, _np.allclose, _np.isclose, _np.array_equal, _np.array_equiv,
    _np.greater, _np.greater_equal, _np.less, _np.less_equal, _np.equal,
    _np.not_equal, _np.iscomplex, _np.size, _np.isscalar, _np.isreal
]

In [None]:
import types
unchanged_types = {float, int, type(None), type}
# `_np.int` was only an alias for the builtin `int` and was removed in NumPy 1.24
int_types = {int, _np.int8, _np.int16, _np.int32, _np.int64, _np.integer}
function_types = {_np.ufunc, types.FunctionType, types.BuiltinFunctionType}


#### Utility functions 

These helpers are used throughout the rest of the notebook.

In [None]:
def subvals(x, ivs):
    """
    Convert an iterable x to a tuple, replacing x[i] with v for every (i, v) pair in ivs.
    ivs is a list of (index, value) pairs, hence the name.

    For example:
    x = [0,1,2,3,4,5,6]
    ivs = [(0,'s1'), (4, 's2'), (5, 's3')]
    subvals(x, ivs)
    # gives ('s1', 1, 2, 3, 's2', 's3', 6)
    """
    x_ = list(x)
    for i, v in ivs:
        x_[i] = v
    return tuple(x_)

def subval(x, i, v):
    """Like subvals, but for a single (i, v) replacement."""
    x_ = list(x)
    x_[i] = v
    return tuple(x_)

# True if x is a Box (box_types is defined along with the Box class below).
isbox  = lambda x: type(x) in box_types  # almost 3X faster than isinstance(x, Box)
# Get the raw value out of a (possibly nested) Box.
getval = lambda x: getval(x._value) if isbox(x) else x

def toposort(end_node):
    """Yield end_node and everything it depends on, each node before its parents."""
    # First pass: count how many children each node has
    child_counts = {}
    stack = [end_node]
    while stack:
        node = stack.pop()
        if node in child_counts:
            child_counts[node] += 1
        else:
            child_counts[node] = 1
            stack.extend(node.parents)

    # Second pass: yield a node only once all of its children have been yielded
    childless_nodes = [end_node]
    while childless_nodes:
        node = childless_nodes.pop()
        yield node
        for parent in node.parents:
            if child_counts[parent] == 1:
                childless_nodes.append(parent)
            else:
                child_counts[parent] -= 1

def wraps(fun, namestr="{fun}", docstr="{doc}", **kwargs):
    """Copy a (formatted) name and docstring from fun onto the decorated function."""
    def _wraps(f):
        try:
            f.__name__ = namestr.format(fun=get_name(fun), **kwargs)
            f.__doc__ = docstr.format(fun=get_name(fun), doc=get_doc(fun), **kwargs)
        finally:
            return f
    return _wraps

def wrap_nary_f(fun, op, argnum):
    namestr = "{op}_of_{fun}_wrt_argnum_{argnum}"
    docstr = """\
    {op} of function {fun} with respect to argument number {argnum}. Takes the
    same arguments as {fun} but returns the {op}.
    """
    return wraps(fun, namestr, docstr, op=get_name(op), argnum=argnum)

get_name = lambda f: getattr(f, '__name__', '[unknown name]')
get_doc  = lambda f: getattr(f, '__doc__' , '')
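`toposort` is the least obvious of these helpers. It yields every node that `end_node` depends on, and it yields a node only after all of the nodes that use it (its children) have been yielded. That is the order a backward pass needs: a node's gradient is complete only once every child has contributed to it. Below is a standalone check on a small "diamond" graph where `x` feeds both `a` and `b`, which then feed `y`. The class `N` is a hypothetical stand-in for `Node`, and `toposort` is copied from above.

```python
class N:
    """Hypothetical minimal node: just a name and a list of parents."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

def toposort(end_node):
    # First pass: count how many children each node has
    child_counts = {}
    stack = [end_node]
    while stack:
        node = stack.pop()
        if node in child_counts:
            child_counts[node] += 1
        else:
            child_counts[node] = 1
            stack.extend(node.parents)
    # Second pass: yield a node only once all of its children have been yielded
    childless_nodes = [end_node]
    while childless_nodes:
        node = childless_nodes.pop()
        yield node
        for parent in node.parents:
            if child_counts[parent] == 1:
                childless_nodes.append(parent)
            else:
                child_counts[parent] -= 1

x = N("x")
a = N("a", [x])
b = N("b", [x])
y = N("y", [a, b])

nodes = list(toposort(y))
print([n.name for n in nodes])   # ['y', 'b', 'a', 'x']
```

Note that `x` appears only once and only at the end, even though two paths lead to it.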

### Boxes 

A Box wraps a value together with the Node that records how that value was computed. The Autograd package has several types of Box; Autodidact has just `Box` and `ArrayBox`, so those are what we will use here.

In [None]:
class Box:
    # These belong to all members of class Box
    type_mappings = {} 
    types = set()
    
    def __init__(self, value, node): 
        self._value = value
        self._node = node 
        
    
    
    @classmethod
    def register(cls, value_type): 
        """cls is Box if called by a Box object, ArrayBox if called with ArrayBox etc"""
        Box.types.add(cls)
        Box.type_mappings[value_type] = cls
        #Box.type_mappings[cls] = cls
    
box_type_mappings = Box.type_mappings
box_types = Box.types

#### Functions with boxes 

In [None]:
def new_box(value, node):
    """Box an unboxed value.
    Args:
      value: unboxed value
      node: Node corresponding to this boxed value.
    Returns:
      Boxed value.
    """
    try:
        return box_type_mappings[type(value)](value, node)
    except KeyError:
        raise TypeError("Can't differentiate w.r.t. type {}".format(type(value)))

In [None]:
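def find_top_boxed_args(args):
    """Return (index, box) pairs for the arguments that are Boxes."""
    # Simplified: the commented-out version below also tracks a trace_id, so that
    # only boxes from the innermost trace are returned.
    top_boxes = []
    for i, o in enumerate(args):
        if isbox(o):
            top_boxes.append((i, o))
    
#     top_trace_id = -1
#     top_boxes = []
#     for argnum, arg in enumerate(args):
#         if isbox(arg):
#             if arg._trace_id > top_trace_id:
#                 top_boxes = [(argnum, arg)]
#                 top_trace_id = arg._trace_id
#             elif arg._trace_id == top_trace_id:
#                 top_boxes.append((argnum, arg))
    return top_boxes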

#### Primitives 

In [None]:
def primitive(f_raw):
    """Wraps a function so that its gradient (vjp) can be specified and its
    invocation can be recorded."""
    def f_wrapped(*args, **kwargs):
        boxed_args = find_top_boxed_args(args)
        if boxed_args:
            # Replace each boxed argument with its raw value.
            argvals = subvals(args, [(argnum, box._value) for argnum, box in boxed_args])
            # Get nodes for each boxed argument.
            parents = tuple(box._node for _, box in boxed_args)
            # Compute the result on the unboxed values. argvals normally contains
            # no Boxes now, so this recursive call falls through to f_raw.
            ans = f_wrapped(*argvals, **kwargs)
            # Create a new node recording how ans was computed
            node = Node(ans, f_wrapped, parents)
            return new_box(ans, node)
        else:
            return f_raw(*args, **kwargs)
    return f_wrapped
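Putting the pieces together, here is a standalone sketch of what `primitive` buys us once a type is registered: wrapping four numpy functions is enough for the logistic expression to build its own graph, with no manual `Node` calls. It simplifies the notebook's classes (no `ArrayBox`, no trace ids, `isinstance` instead of `isbox`) and registers `float` and `numpy.float64` directly on `Box`; those choices are mine, for brevity.

```python
import numpy as onp

class Node:
    """Records a value, the primitive that produced it, and the parent nodes."""
    def __init__(self, value, fun, parents):
        self._value = value
        self._fun = fun
        self.parents = parents

class Box:
    type_mappings = {}  # raw value type -> Box class used to wrap it

    def __init__(self, value, node):
        self._value = value
        self._node = node

    @classmethod
    def register(cls, value_type):
        Box.type_mappings[value_type] = cls

def new_box(value, node):
    try:
        return Box.type_mappings[type(value)](value, node)
    except KeyError:
        raise TypeError("Can't differentiate w.r.t. type {}".format(type(value)))

def primitive(f_raw):
    def f_wrapped(*args, **kwargs):
        boxed = [(i, a) for i, a in enumerate(args) if isinstance(a, Box)]
        if not boxed:
            return f_raw(*args, **kwargs)     # nothing to trace
        argvals = list(args)
        for i, box in boxed:
            argvals[i] = box._value           # unbox before calling numpy
        ans = f_raw(*argvals, **kwargs)
        parents = [box._node for _, box in boxed]
        return new_box(ans, Node(ans, f_wrapped, parents))
    f_wrapped.__name__ = f_raw.__name__
    return f_wrapped

# Register the scalar types that show up in this example
Box.register(float)
Box.register(onp.float64)

negative, exp = primitive(onp.negative), primitive(onp.exp)
add, reciprocal = primitive(onp.add), primitive(onp.reciprocal)

z = Box(1.5, Node(1.5, None, []))
y = reciprocal(add(1, exp(negative(z))))

# Follow parent links back from the output node
names, node = [], y._node
while node.parents:
    names.append(node._fun.__name__)
    node = node.parents[0]
print(round(float(y._value), 3), names)   # 0.818 ['reciprocal', 'add', 'exp', 'negative']
```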


Let's play with this for a bit 

In [None]:
box_z = Box(z._value, z)
box_t1 = Box(t1._value, t1)
box_t2 = Box(t2._value, t2)
box_t3 = Box(t3._value, t3)
box_y = Box(y._value, y)
box_y.register('string_type')

In [None]:
new_add = primitive(onp.add)

In [None]:
new_add(box_z,box_t1)



Nothing interesting so far. 

How do we create a box? We could use the constructor, but there is also the function `new_box` defined above.

`new_box` won't work yet, because we haven't registered `float` (or any other numeric type) in `type_mappings`. Registering a type there is what lets us differentiate with respect to values of that type.

In [None]:
try: new_box(1.5, z)
except TypeError: print("doesn't work yet")

doesn't work yet
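Here is a standalone sketch of how registration changes that. It repeats minimal versions of `Box`, `new_box` and a bare `ArrayBox` subclass, with `None` in place of a real `Node` for brevity. Because `register` is a classmethod, the mapping points at whichever class it was called on, so calling `ArrayBox.register(float)` afterwards makes floats come back as `ArrayBox` instead.

```python
class Box:
    type_mappings = {}  # raw value type -> Box class used to wrap it

    def __init__(self, value, node):
        self._value = value
        self._node = node

    @classmethod
    def register(cls, value_type):
        # cls is whichever class register was called on (Box, ArrayBox, ...)
        Box.type_mappings[value_type] = cls

class ArrayBox(Box):
    pass

def new_box(value, node):
    try:
        return Box.type_mappings[type(value)](value, node)
    except KeyError:
        raise TypeError("Can't differentiate w.r.t. type {}".format(type(value)))

try:
    new_box(1.5, None)
except TypeError as err:
    print("before registering:", err)   # before registering: Can't differentiate w.r.t. type <class 'float'>

Box.register(float)
print(type(new_box(1.5, None)).__name__)   # Box

ArrayBox.register(float)
print(type(new_box(1.5, None)).__name__)   # ArrayBox
```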


#### The ArrayBox 

An ArrayBox behaves like a *numpy.ndarray*, except that its operators will not call functions from the original numpy `onp`, but rather the new functions we create in `anp`. It inherits from `Box`.

The ArrayBox is where we define all our operations. When I streamline things later, I'll try to condense Box and ArrayBox into one class; I don't think a simple prototype needs two.

We haven't defined any of the `anp` functions yet, so let's use the `onp` ones for now and see if we can get something working.

In [None]:
class ArrayBox(Box): 
    def __bool__(self):  return bool(self._value)
    def __add__(self, other): return onp.add(self, other)
    def __sub__(self, other): return onp.subtract(self, other)
    def __mul__(self,other): return onp.multiply(self, other)

In [None]:
nondiff_methods = [
    'all',
    'any',
    'argmax',
    'argmin',
    'argpartition',
    'argsort',
    'nonzero',
    'searchsorted',
    'round']

diff_methods = [
    'clip',
    'compress',
    'cumprod',
    'cumsum',
    'diagonal',
    'max',
    'mean',
    'min',
    'prod',
    'ptp',
    'ravel',
    'repeat',
    'reshape',
    'squeeze',
    'std',
    'sum',
    'swapaxes',
    'take',
    'trace',
    'transpose',
    'var']
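# NOTE: `anp` (our wrapped numpy) isn't defined until a later cell, so this loop
# raises a NameError if the cells are run in order.
for method_name in nondiff_methods + diff_methods:
    setattr(ArrayBox, method_name, anp.__dict__[method_name])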

In [None]:
box_y = ArrayBox(y._value, y)
box_z = ArrayBox(z._value, z)
box_y + 1

RecursionError: maximum recursion depth exceeded while calling a Python object

In [None]:
class Base: 

    def __init__(self,value): self._value = value   
        
    def __bool__(self):
        return bool(self._value)
    __nonzero__ = __bool__

In [None]:
class Test(Base): 
    
    def __add__(self,other): return onp.add(self,other)

In [None]:
t1,t2 = Test(1), Test(2)

In [None]:
t1 + t2

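RecursionError: maximum recursion depth exceeded while calling a Python object

Both cells fail for the same reason. `onp.add` doesn't know what a `Test` (or an `ArrayBox`) is, so it treats the object as a generic Python object and adds it using Python's own `+`. That calls `__add__`, which calls `onp.add` again, and so on until Python's recursion limit is hit. To break the loop, the raw values have to be unboxed before they are handed to numpy, which is what the `primitive` wrapper does with `subvals`.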
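The recursion goes away if the raw values are unwrapped before numpy sees them. In this minimal sketch, `Unwrapping` (a hypothetical name) is like `Test`, but its `__add__` pulls out the `_value`s first and only then calls `onp.add`, so numpy works on plain numbers and never calls back into `__add__`.

```python
import numpy as onp

class Unwrapping:
    """Toy variant of Test whose __add__ unwraps operands before calling numpy."""
    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        # numpy only ever sees raw values, so it never re-enters __add__
        other_val = other._value if isinstance(other, Unwrapping) else other
        return Unwrapping(onp.add(self._value, other_val))

    # addition is commutative, so `1 + box` can reuse __add__
    __radd__ = __add__

print((Unwrapping(1) + Unwrapping(2))._value)   # 3
print((1 + Unwrapping(2))._value)               # 3
```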

In [None]:
import numpy as anp  # placeholder: plain numpy stands in for our wrapped `anp` for now
class ArrayBox(Box):
    """Box for np.ndarray.
    Anything you can do with an np.ndarray, you can do with an ArrayBox.
    """
    #@primitive
    #def __getitem__(A, idx): return A[idx]

    # Constants w.r.t float data just pass though
    shape = property(lambda self: self._value.shape)
    ndim  = property(lambda self: self._value.ndim)
    size  = property(lambda self: self._value.size)
    dtype = property(lambda self: self._value.dtype)
    T = property(lambda self: anp.transpose(self))
    def __len__(self): return len(self._value)
    def astype(self, *args, **kwargs): return anp._astype(self, *args, **kwargs)

    def __neg__(self): return anp.negative(self)
    def __add__(self, other): return anp.add(     self, other)
    def __sub__(self, other): return anp.subtract(self, other)
    def __mul__(self, other): return anp.multiply(self, other)
    def __pow__(self, other): return anp.power   (self, other)
    def __div__(self, other): return anp.divide(  self, other)
    def __mod__(self, other): return anp.mod(     self, other)
    def __truediv__(self, other): return anp.true_divide(self, other)
    def __matmul__(self, other): return anp.matmul(self, other)
    def __radd__(self, other): return anp.add(     other, self)
    def __rsub__(self, other): return anp.subtract(other, self)
    def __rmul__(self, other): return anp.multiply(other, self)
    def __rpow__(self, other): return anp.power(   other, self)
    def __rdiv__(self, other): return anp.divide(  other, self)
    def __rmod__(self, other): return anp.mod(     other, self)
    def __rtruediv__(self, other): return anp.true_divide(other, self)
    def __rmatmul__(self, other): return anp.matmul(other, self)
    def __eq__(self, other): return anp.equal(self, other)
    def __ne__(self, other): return anp.not_equal(self, other)
    def __gt__(self, other): return anp.greater(self, other)
    def __ge__(self, other): return anp.greater_equal(self, other)
    def __lt__(self, other): return anp.less(self, other)
    def __le__(self, other): return anp.less_equal(self, other)
    def __abs__(self): return anp.abs(self)
    def __hash__(self): return id(self)

In [None]:
# Register ArrayBox as the type to use when boxing np.ndarray and scalar values.
# (`np` was deleted earlier, so we use `onp` here.)
ArrayBox.register(onp.ndarray)
for type_ in [float, onp.float64, onp.float32, onp.float16,
              complex, onp.complex64, onp.complex128]:
    ArrayBox.register(type_)


#### Primitives 

In [None]:
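# NOTE: this fuller version expects pieces we haven't written yet: a
# find_top_boxed_args that returns (boxed_args, trace_id), a Node that also
# takes argvals, kwargs and argnums, and new_box(value, trace_id, node).
def primitive(f_raw):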
    """Wraps a function so that its gradient (vjp) can be specified and its
    invocation can be recorded."""
    @wraps(f_raw)
    def f_wrapped(*args, **kwargs):
        # Fetch boxed arguments with largest trace_id.  This ensures that the
        # computational graph being constructed only consists of other nodes
        # from the same call to trace().
        boxed_args, trace_id = find_top_boxed_args(args)
        if boxed_args:
            # Replace some elements of args with corresponding unboxed values.
            argvals = subvals(args, [(argnum, box._value) for argnum, box in boxed_args])
            # Get nodes for each boxed argument.
            parents = tuple(box._node for _, box in boxed_args)

            # Get argument indices for each boxed argument.
            argnums = tuple(argnum for argnum, _ in boxed_args)

            # Calculate result of applying original numpy function.
            #
            # Note that we use a recursive call here in order to also augment
            # outer calls to trace() with lower trace_ids. See TraceStack's
            # docstring for details.
            ans = f_wrapped(*argvals, **kwargs)

            # Create a new node
            node = Node(ans, f_wrapped, argvals, kwargs, argnums, parents)
            return new_box(ans, trace_id, node)
        else:
            return f_raw(*args, **kwargs)
    return f_wrapped

### Wrapping Nodes in Boxes 

Boxes mark the variables we are differentiating with respect to: only operations on boxed values get recorded in the computation graph.

### Making VJPs

### Backwards pass

### Resources

I used the following resources to put this document together. 

Lecture slides by Roger Grosse: https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf

In [None]:
#hide
from nbdev.export import notebook2script
notebook2script()