In [None]:
# default_exp core

# Implementing autograd from scratch

> This will have both code and documentation in it

In [None]:
#hide
from nbdev.showdoc import *
%load_ext autoreload
%autoreload 2
from IPython.core.debugger import set_trace

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [None]:
import numpy as np
import autograd

### What are we trying to find?


We start with a function we want to differentiate. It can take any number of variables: one, two, three, or more. If the function is called $f$, we want to find $df/dv$, where $v$ is one of these variables.

Here we define a function of two variables, $x$ and $y$: $$f(x, y) = \log(x^2) + y^2 + xy$$

In [None]:
def f(x, y):
    return np.log(x**2) + y**2 + x*y
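
For reference, differentiating $f$ by hand (using the chain rule on the $\log$ term) gives the two partial derivatives that a correct `grad` should reproduce:

$$\frac{\partial f}{\partial x} = \frac{2x}{x^2} + y = \frac{2}{x} + y, \qquad \frac{\partial f}{\partial y} = 2y + x$$

At the point $(x, y) = (1, 2)$, for example, these evaluate to $4$ and $5$.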

The aim is to create a function `grad` that takes `f` as an input and returns a function that computes the gradient of `f` with respect to one of its variables. So for our function above, `grad` can give us either $df/dx$ or $df/dy$, depending on which variable we tell it to differentiate with respect to. Let's define `grad`, giving it dummy values for now.

How do we choose this variable? Not by name: we don't tell the function 'x' or 'y'. Instead, we give it a number (call it `argnum`) and say, "Differentiate `f` with respect to the `argnum`th input." `f` has an argument list, and `argnum` is the index into that list of the variable we are interested in. Indexing starts at zero, so for `f(x, y)`, `argnum = 0` selects `x` and `argnum = 1` selects `y`.

`grad` has a nested structure: it returns a function `gradfun` that in turn returns the gradient. Notice the use of `*args` and `**kwargs` below. This is needed in the inner function `gradfun` because a) it doesn't know how many arguments `f` takes, and b) you can pass those arguments either with or without keywords. Arguments passed without keywords are stored in `*args`, and arguments passed with keywords are stored in `**kwargs`.
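Since the inputs can arrive in either `*args` or `**kwargs`, looking up the `argnum`th one takes a little care. The sketch below shows one possible way to do it with the standard library's `inspect.signature`; the helper name `pick_argument` and the stand-in function `f` are hypothetical and only for illustration.

```python
import inspect

def f(x, y):
    # Stand-in with the same argument list as the f above
    return x + 10 * y

def pick_argument(fun, argnum, *args, **kwargs):
    """Return the value of fun's argnum-th parameter, however it was passed."""
    # bind() matches the call's values to fun's parameter names, in parameter order
    bound = inspect.signature(fun).bind(*args, **kwargs)
    return list(bound.arguments.values())[argnum]

by_position = pick_argument(f, 1, 1, 2)        # y passed positionally
by_keyword = pick_argument(f, 1, x=13, y=4)    # y passed by keyword
```

Both calls pick out `y`, which is the variable `argnum = 1` refers to.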

In [None]:
def grad(f, argnum = 0): 
    """Returns a function that finds the gradient of `f` wrt its `argnum`th argument"""
    def gradfun(*args, **kwargs):
        """Returns the actual gradient"""
        #set_trace()
        print("Args", args)
        print("Kwargs", kwargs)
        # Dummy values. Returns correct gradient only for our function f above. 
        # Use these values until we calculate the true ones using autodiff. 
        #### remove this code once true code written
        # Bind the inputs to x and y, whether they arrived in *args or **kwargs
        x, y = (lambda x, y: (x, y))(*args, **kwargs)
        if   argnum == 0: return 2/x + y  # df/dx, since d/dx log(x**2) = 2/x
        elif argnum == 1: return 2*y + x  # df/dy
        #### 
        # true autograd code goes here 
        ####
    return gradfun
# example usage
dfdx = grad(f, argnum = 0)
dfdy = grad(f, argnum = 1)
print("dfdx", dfdx(1,2))      # call gradient w/out keywords, values go into *args    in gradfun 
print("dfdy", dfdy(x=13,y=4)) # call gradient with  keywords, values go into **kwargs in gradfun 

Args (1, 2)
Kwargs {}
dfdx 4.0
Args ()
Kwargs {'x': 13, 'y': 4}
dfdy 21


So that's our goal. We are trying to build this `grad` function properly, following the structure defined above. 
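Until the real version exists, one way to sanity-check any `grad` is to compare it against a finite-difference approximation. The sketch below is not autodiff: it simply nudges the `argnum`th input up and down and measures the change in the output. The name `numerical_grad` and the step size `eps` are assumptions made for this example, and `math.log` stands in for `np.log` to keep it dependency-free.

```python
import math

def f(x, y):
    return math.log(x**2) + y**2 + x*y

def numerical_grad(fun, argnum=0, eps=1e-6):
    """Approximate d(fun)/d(argnum-th input) with central differences."""
    def gradfun(*args):
        up, down = list(args), list(args)
        up[argnum] = args[argnum] + eps      # nudge the chosen input up
        down[argnum] = args[argnum] - eps    # and down
        return (fun(*up) - fun(*down)) / (2 * eps)
    return gradfun

approx_dfdx = numerical_grad(f, argnum=0)(1.0, 2.0)  # should be close to 2/x + y
approx_dfdy = numerical_grad(f, argnum=1)(1.0, 2.0)  # should be close to 2*y + x
```

This has the same nested shape as `grad`, but it only handles positional arguments and is only approximate, which is part of the motivation for autodiff.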

### Building a computation graph 

Say you had some expression, like $(4 \times 5) + 2 - 4$. I'm sure you know the answer to this, but how would a computer work it out?

You may remember the order of operations used to work out these expressions; I learnt the acronym [BODMAS](https://www.mathsisfun.com/operation-order-bodmas.html) for it. Python has its own order of operations too, governed by its [operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence) rules.
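To see the order Python itself chooses, we can ask the standard library's `ast` module to parse the expression into a tree and then evaluate that tree one operation at a time. This is only an illustrative sketch; the names `OPS`, `evaluate` and `steps` are made up for this example, and it handles nothing beyond `+`, `-` and `*` on constants.

```python
import ast
import operator

# Map the AST operator types we support to the functions that implement them
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def evaluate(node, steps):
    """Evaluate an arithmetic AST node, recording each operation as it happens."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left, steps)    # inputs are worked out first
        right = evaluate(node.right, steps)
        result = OPS[type(node.op)](left, right)
        steps.append((type(node.op).__name__, left, right, result))
        return result
    raise ValueError("unsupported expression")

tree = ast.parse("(4 * 5) + 2 - 4", mode="eval")
steps = []
answer = evaluate(tree.body, steps)
for step in steps:
    print(step)
```

Each recorded step takes the results of earlier steps as its inputs, which is the structure a computation graph captures.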

In [None]:
Autodifferentiation relies on the computation graph of 

The computation graph is made up of Nodes.
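As a rough picture of what a Node might hold, here is a minimal sketch that builds the graph for $(4 \times 5) + 2 - 4$ by hand. The fields `value`, `fun` and `parents`, and the helper `trace_back`, are assumptions made for illustration, not a fixed design.

```python
class Node:
    """One step of a computation: its value, the operation that produced it, and its inputs."""
    def __init__(self, value, fun=None, parents=()):
        self.value = value
        self.fun = fun                  # name of the operation; None for a plain input
        self.parents = tuple(parents)   # the Nodes this one was computed from

# Build the graph for (4 * 5) + 2 - 4 by hand
four, five, two = Node(4), Node(5), Node(2)
prod = Node(four.value * five.value, "mul", (four, five))
total = Node(prod.value + two.value, "add", (prod, two))
result = Node(total.value - 4, "sub", (total, Node(4)))

def trace_back(node):
    """Walk from the output towards the inputs, collecting operation names."""
    names, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.fun is not None:
            names.append(current.fun)
        stack.extend(current.parents)
    return names

ops_from_output = trace_back(result)
```

Walking from `result` back to the inputs visits the operations in the reverse of the order in which they were computed.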

### Wrapping Nodes in Boxes 

Boxes are used to indicate the variable you are differentiating with respect to.
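One way to picture this: wrap only the `argnum`th input in a Box, run the function as usual, and let the Box record every operation it takes part in. The sketch below is a hypothetical, simplified illustration (it supports only `+` and `*`, and keeps a flat `history` tuple rather than real Nodes); the names `Box`, `trace` and `g` are assumptions made for this example.

```python
class Box:
    """Wraps a value and records the operations applied to it."""
    def __init__(self, value, history=()):
        self.value = value
        self.history = tuple(history)

    def _combine(self, other, name, fn):
        # The other operand may or may not be boxed
        other_value = other.value if isinstance(other, Box) else other
        other_history = other.history if isinstance(other, Box) else ()
        return Box(fn(self.value, other_value), self.history + other_history + (name,))

    def __add__(self, other):
        return self._combine(other, "add", lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, "mul", lambda a, b: a * b)

    # + and * are commutative, so the reflected versions can reuse the same code
    __radd__ = __add__
    __rmul__ = __mul__

def trace(fun, argnum, *args):
    """Call fun with only its argnum-th argument wrapped in a Box."""
    boxed = list(args)
    boxed[argnum] = Box(args[argnum])
    return fun(*boxed)

def g(x, y):
    return x * y + y * y

out_x = trace(g, 0, 3, 4)   # only x is boxed, so y * y is not recorded
out_y = trace(g, 1, 3, 4)   # only y is boxed, so both products are recorded
```

The output value is the same either way, but the recorded history differs: only operations that involve the boxed variable are tracked.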

### Making VJPs

### Backwards pass

In [None]:
#hide
from nbdev.export import notebook2script
notebook2script()

Converted 00_core.ipynb.
Converted index.ipynb.
