# Milestone 1

## Introduction

Being able to calculate derivatives is crucial for optimization, probabilistic inference, physical modeling, and much more. However, real-world functions are often complex, and calculating their derivatives by hand can be very challenging. Our automatic differentiation (AD) software computes the derivative of any function built from supported elementary operations by breaking the function down into those elementary functions and applying the chain rule (see **Background** for more details). The software can calculate derivatives of arbitrary order, accurate to machine precision. It has many applications, such as sensitivity analysis, numerical methods, and machine learning.

## Background

Automatic differentiation is possible because complicated functions can be represented as compositions of **elementary functions**, such as addition, multiplication, the exponential function, and trigonometric functions. In other words, $f(x)$ can be written as $g_{n}(g_{n-1}(g_{n-2}(\cdots g_{1}(x)\cdots)))$, where each $g_i$ is an elementary function.

The **chain rule** is then applied to calculate the function's derivative. Recall that, by the chain rule, the derivative of the composite function $h\left(u\left(t\right)\right)$ is $\dfrac{\partial h}{\partial t} = \dfrac{\partial h}{\partial u}\dfrac{\partial u}{\partial t}.$

For example, suppose we want to compute $f^{\prime}\left(\dfrac{\pi}{16}\right)$ for the function
$$f\left(x\right) = x - \exp\left(-2\sin^{2}\left(4x\right)\right).$$

The evaluation trace below shows how the function is broken down into a sequence of elementary operations. The table also tracks the derivative of each intermediate value, where $\dot{x}_{i}$ denotes the derivative of $x_{i}$ with respect to $x$.

| Trace    | Elementary Operation | Derivative | $\left(x_{i}, \dot{x}_{i}\right)$ at $x = \frac{\pi}{16}$ |
| :------: | :----------------------:               | :------------------------------: | :------------------------------: |
| $x_{1}$  | $\dfrac{\pi}{16}$                      | $1$                | $\left(\dfrac{\pi}{16}, 1\right)$ |
| $x_{2}$  | $4x_{1}$                               | $4\dot{x}_{1}$                 | $\left(\dfrac{\pi}{4}, 4\right)$ |
| $x_{3}$  | $\sin\left(x_{2}\right)$               | $\cos\left(x_{2}\right)\dot{x}_{2}$            | $\left(\dfrac{\sqrt{2}}{2}, 2\sqrt{2}\right)$ |
| $x_{4}$  | $x_{3}^{2}$                            | $2x_{3}\dot{x}_{3}$                   | $\left(\dfrac{1}{2}, 4\right)$ |
| $x_{5}$  | $-2x_{4}$                              | $-2\dot{x}_{4}$ | $\left(-1, -8\right)$ |
| $x_{6}$  | $\exp\left(x_{5}\right)$               | $\exp\left(x_{5}\right)\dot{x}_{5}$ | $\left(\dfrac{1}{e}, - \dfrac{8}{e}\right)$ |
| $x_{7}$  | $-x_{6}$                               | $-\dot{x}_{6}$                  | $\left(-\dfrac{1}{e}, \dfrac{8}{e}\right)$ |
| $x_{8}$  | $x_{1} + x_{7}$                        | $\dot{x}_{1} + \dot{x}_{7}$ | $\left(\dfrac{\pi}{16} - \dfrac{1}{e}, 1 + \dfrac{8}{e}\right)$ |

Therefore, $f^{\prime}\left(\dfrac{\pi}{16}\right) = 1 + \dfrac{8}{e} \approx 3.9430355$.
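The evaluation trace translates almost line for line into code. The sketch below is a hand-written illustration of forward-mode AD for this one function only, not the package's implementation: a hypothetical helper `trace_f` carries a (value, derivative) pair through each step of the trace, seeding $\dot{x}_1 = 1$.

```python
import math


def trace_f(a):
    """Forward-mode evaluation trace for f(x) = x - exp(-2 sin^2(4x)).

    Each step carries a (value, derivative) pair, one per row of the trace table.
    """
    x1, d1 = a, 1.0                             # seed: dx/dx = 1
    x2, d2 = 4 * x1, 4 * d1                     # x2 = 4 x1
    x3, d3 = math.sin(x2), math.cos(x2) * d2    # x3 = sin(x2)
    x4, d4 = x3 ** 2, 2 * x3 * d3               # x4 = x3^2
    x5, d5 = -2 * x4, -2 * d4                   # x5 = -2 x4
    x6, d6 = math.exp(x5), math.exp(x5) * d5    # x6 = exp(x5)
    x7, d7 = -x6, -d6                           # x7 = -x6
    x8, d8 = x1 + x7, d1 + d7                   # x8 = x1 + x7
    return x8, d8


value, derivative = trace_f(math.pi / 16)
print(value, derivative)
```

Each line applies the chain rule once, using only the derivative of one elementary operation and the derivative already computed for its input.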

The evaluation trace can be visualized with the **computational graph** below. Each node with an incoming edge (arrow) represents an elementary function applied to the node at the edge's tail.

![Computational graph of the evaluation trace](fig/graph1.png)

**Dual numbers** let us compute the derivative of each elementary function without symbolic differentiation. A dual number has a real part and a dual part, $z = x + \epsilon x^{\prime}$, where $\epsilon \neq 0$ but $\epsilon^2 = 0$. If we evaluate a function $f$ at $x + \epsilon x^{\prime}$, its Taylor expansion gives $f(x + \epsilon x^{\prime}) = f(x) + \epsilon f^{\prime}(x) x^{\prime}$: every term of order 2 or higher in $\epsilon$ vanishes because $\epsilon^2 = 0$. The real part therefore holds the function value and the dual part holds the derivative (scaled by $x^{\prime}$, which is set to 1 to obtain $f^{\prime}(x)$).
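As a minimal sketch of this idea, assuming a hypothetical `Dual` class that supports only the operations this example needs (addition, subtraction, multiplication, `sin`, and `exp`), seeding the dual part of $x$ with 1 reproduces the derivative from the evaluation trace without any symbolic differentiation:

```python
import math


class Dual:
    """Minimal dual number z = real + eps * dual, with eps**2 == 0 (illustrative only)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.real, -self.dual)

    def __sub__(self, other):
        return self + (-other)

    def __rsub__(self, other):
        return (-self) + other

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + eps a')(b + eps b') = ab + eps (a b' + a' b); the eps**2 term drops out
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__


def sin(z):
    # sin(a + eps a') = sin(a) + eps cos(a) a'
    return Dual(math.sin(z.real), math.cos(z.real) * z.dual)


def exp(z):
    # exp(a + eps a') = exp(a) + eps exp(a) a'
    return Dual(math.exp(z.real), math.exp(z.real) * z.dual)


# Seed the dual part with 1 so the dual part of f is df/dx
x = Dual(math.pi / 16, 1.0)
s = sin(4 * x)
f = x - exp(-2 * s * s)
print(f.real, f.dual)  # dual part equals 1 + 8/e
```

Each method encodes one elementary derivative rule; composing them applies the chain rule automatically, which is exactly what the trace table does by hand.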

## Software organization

* The final package will be structured as follows:

    - `autodiffpy/`
        - `autodiffpy/`
            - `__init__.py`
            - `autodiff.py`
            - `dualnumber.py`
        - `tests/`
            - `autodiff_test.py`
            - `dualnumber_test.py`
        - `Examples/`
            - `benchmarkdata.txt`
            - `sin-cos.py`
            - `linearfunction.py`
        - `Docs/`
            - `Tutorial.ipynb`
            - `update.md`
        - `README.md`
        - `setup.py`
        - `requirements.txt`
        - `LICENSE`


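* There will be three main modules in our library:
    - `autodiff` module
        - `Autodiff` class, which overloads the arithmetic operators:
            1. `__add__`
            2. `__sub__`
            3. `__mul__`
            4. `__truediv__`
        - Reflected (reverse) operators, which Python calls when the left operand is not an `Autodiff` instance (e.g., `2 * x`):
            1. `__radd__`
            2. `__rsub__`
            3. `__rmul__`
            4. `__rtruediv__`
        - `jacobian` function
    - `dualnumber` module
        - `Dualnumber` class: functions to perform math operations on dual numbers
    - `autodiff_math` module: elementary functions (trigonometric, logarithmic, exponential, and power functions)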

             
* Test suite: our test suite will run on Travis CI, and code coverage will be reported through Coveralls.
     
* Distribution: we will distribute our package through PyPI, so users can install it with `pip install autodiffpy`.


## Implementation

### Modules

#### autodiff

The **autodiff** module will contain two main components: the *Autodiff* class and the *jacobian* function.

The *Autodiff* class will allow users to create variables and combine them into an equation. The class will then perform automatic differentiation on that equation by (1) calculating the numerical value of the equation, and (2) calculating the numerical values of the equation's derivatives with respect to those variables. The class will be able to calculate derivatives of the equation up to any order, starting from the first derivative.


To start this process, the user will first initialize each variable of the desired equation separately, as its own instance of the *Autodiff* class. Each instance will accept the following inputs:

* name [string, required]: The name the user would like to use for this variable.
* val [float or numpy array, required]: The numerical value (or array of values) to assign to this variable.
* order [integer, optional; default=1]: The highest order to which the equation's derivatives will be calculated.

This initialization will create the following attributes for each instance:
* name [string]: This stores the given name of the variable.
* dualnum [Dualnumber]: This stores the variable's value in its real part and its derivatives in its dual part.
* order [integer]: This stores the input order of the variable.

The user will then be able to combine these variables into an equation using mathematical operations. Doing so will return a new instance of the *Autodiff* class, which will have the following output attributes relevant to the user:


* val [float or numpy array]: This returns the real part of `dualnum`, which is the value of the equation.
* der [dictionary]: This returns the derivatives stored in the dual part of `dualnum`, organized by order. For example, `der["1"]` maps each variable name to a first-order partial derivative, and `der["2"]` maps name pairs such as `"xy"` to second-order partial derivatives.

From this returned instance of the *Autodiff* class, the user will therefore have numerical values (or matrices of values) for both the equation and its derivatives, up to the order specified at initialization.


Under the hood, the *Autodiff* class will override Python's special ("dunder") methods for the elementary operations (`__add__`, `__sub__`, `__mul__`, `__truediv__`, etc.) and the reflected operations (`__radd__`, `__rsub__`, `__rmul__`, `__rtruediv__`, etc.). Users do not call these methods directly; Python invokes them whenever an operator such as `+` or `*` is applied to an *Autodiff* instance. Each overridden method will pass the operation to the *Dualnumber* class in the **dualnumber** module, which will use dual numbers to calculate the derivatives with respect to each unique variable name in the `der` dictionary. (See the description of the *Dualnumber* class for more details.) Each overridden method will then return a new *Autodiff* instance whose attributes hold the updated value (or matrix of values) of the equation and of its derivatives.
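To illustrate the operator-overloading pattern described above, here is a simplified, first-order-only sketch. The class name `FirstOrderAD` and its `_lift`/`_combine` helpers are hypothetical, and, unlike the planned design, the class stores partial derivatives directly in a `der` dictionary keyed by variable name instead of delegating to a separate *Dualnumber* class. Each overloaded operator applies the corresponding derivative rule (sum, product, quotient, or power rule) to every variable's partial derivative.

```python
class FirstOrderAD:
    """Hypothetical first-order forward-mode AD value: a value plus a dict of partials."""

    def __init__(self, name=None, val=0.0, der=None):
        self.name = name
        self.val = val
        # A new named variable seeds its own partial derivative with 1
        self.der = dict(der) if der is not None else ({name: 1.0} if name else {})

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants with no partial derivatives
        return other if isinstance(other, FirstOrderAD) else FirstOrderAD(val=other)

    @staticmethod
    def _combine(da, db, ca, cb):
        # ca * da + cb * db over the union of variable names
        keys = set(da) | set(db)
        return {k: ca * da.get(k, 0.0) + cb * db.get(k, 0.0) for k in keys}

    def __add__(self, other):
        o = self._lift(other)
        return FirstOrderAD(val=self.val + o.val,
                            der=self._combine(self.der, o.der, 1.0, 1.0))

    def __radd__(self, other):
        return self + other

    def __sub__(self, other):
        o = self._lift(other)
        return FirstOrderAD(val=self.val - o.val,
                            der=self._combine(self.der, o.der, 1.0, -1.0))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule: d(uv) = v du + u dv
        return FirstOrderAD(val=self.val * o.val,
                            der=self._combine(self.der, o.der, o.val, self.val))

    def __rmul__(self, other):
        return self * other

    def __truediv__(self, other):
        o = self._lift(other)
        # Quotient rule: d(u/v) = du / v - u dv / v**2
        return FirstOrderAD(val=self.val / o.val,
                            der=self._combine(self.der, o.der,
                                              1.0 / o.val, -self.val / o.val ** 2))

    def __rtruediv__(self, other):
        return self._lift(other) / self

    def __pow__(self, n):
        # Power rule for a constant exponent n: d(u**n) = n u**(n-1) du
        return FirstOrderAD(val=self.val ** n,
                            der={k: n * self.val ** (n - 1) * d for k, d in self.der.items()})


x = FirstOrderAD(name="x", val=3.0)
y = FirstOrderAD(name="y", val=-4.5)
f = x**2 + y - x/y
print(f.val, f.der)
```

Converting plain numbers with `_lift` is what lets the reflected methods (`__radd__`, `__rtruediv__`, and so on) reuse the forward implementations. Higher-order derivatives would need a richer representation than this single dictionary.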


The example below demonstrates how the user will interact with our *Autodiff* class in our software:

```python
>>> # Import the Autodiff class
>>> from autodiffpy.autodiff import Autodiff as AD
>>> # Create variable instances of the class
>>> x = AD(name="x", val=3, order=2)
>>> y = AD(name="y", val=-4.5, order=2)
>>> # Define the equation to evaluate
>>> f = x**2 + y - x/y
>>> # Output the results (values are rounded here; real output won't be)
>>> print(f.val)  # Numerical value of the equation
5.1667
>>> print(f.der["1"])  # First-order partial derivatives
{'x': 6.2222, 'y': 1.1481}
>>> print(f.der["2"])  # Second-order partial derivatives
{'xx': 2, 'xy': 0.0494, 'yx': 0.0494, 'yy': 0.0658}
```
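Analytically, for $f = x^2 + y - x/y$ we have $f_x = 2x - 1/y$, $f_y = 1 + x/y^2$, $f_{xx} = 2$, $f_{xy} = 1/y^2$, and $f_{yy} = -2x/y^3$. To check the expected values at $(3, -4.5)$ independently of any AD code, the sketch below approximates them with central finite differences. This is a numerical approximation technique, not automatic differentiation, and the step size `h` is an arbitrary choice for this check.

```python
def f(x, y):
    return x**2 + y - x / y


x0, y0, h = 3.0, -4.5, 1e-3

# Central differences for the first-order partials
fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

# Central differences for the second-order partials
fxx = (f(x0 + h, y0) - 2 * f(x0, y0) + f(x0 - h, y0)) / h**2
fyy = (f(x0, y0 + h) - 2 * f(x0, y0) + f(x0, y0 - h)) / h**2
fxy = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
       - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4 * h**2)

print(round(f(x0, y0), 4), round(fx, 4), round(fy, 4))
print(round(fxx, 4), round(fxy, 4), round(fyy, 4))
```

Unlike AD, these estimates carry truncation and rounding error, which is why they are only a sanity check here.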

The *jacobian* function will return the derivatives of an *Autodiff* instance as a numpy array, organized in any desired variable sequence and at any calculated derivative order. It will accept the following inputs:
* ad [*Autodiff* instance, required]: The *Autodiff* object whose derivatives are to be returned.
* order [integer, optional; default=1]: The order of the desired derivatives.
* sequence [list of strings, optional]: The order of the variables (by name) along each axis of the array. If this input is not given, the variables will appear in an unspecified order.

Calling the *jacobian* function will return a numpy array containing the desired derivatives: a 1-D array for first-order derivatives and a 2-D array for second-order derivatives.


The example below, which continues from the previous one, demonstrates the *jacobian* function:

```python
>>> # Import the jacobian function
>>> from autodiffpy.autodiff import jacobian as jac
>>> # Return the previously calculated derivatives as numpy arrays (values rounded here)
>>> jac(ad=f, order=1, sequence=["y", "x"])
[1.1481, 6.2222]  # returned as a numpy array
>>> jac(ad=f, order=2, sequence=["x", "y"])
[[2, 0.0494], [0.0494, 0.0658]]  # returned as a numpy array
```
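One way the *jacobian* behavior could be implemented is sketched below as a hypothetical standalone helper, `jacobian_from_der`, that works directly on a derivative dictionary shaped like the `der` attribute described earlier (keys `"1"`, `"2"`, ... mapping concatenated variable names to values). It assumes single-character variable names so that keys such as `"xy"` are unambiguous, and it sorts names alphabetically when no `sequence` is given. The values used are rounded derivatives of $f = x^2 + y - x/y$ at $(3, -4.5)$.

```python
import itertools

import numpy as np


def jacobian_from_der(der, order=1, sequence=None):
    """Arrange all derivatives of one order into an array (hypothetical helper).

    Assumes der[str(order)] maps concatenated single-character variable
    names (e.g. "xy") to derivative values.
    """
    entries = der[str(order)]
    if sequence is None:
        # Recover variable names from the first-order keys; sort for a stable layout
        sequence = sorted(der["1"])
    n = len(sequence)
    out = np.empty((n,) * order)  # 1-D for order 1, 2-D for order 2, ...
    for idx in itertools.product(range(n), repeat=order):
        key = "".join(sequence[i] for i in idx)
        out[idx] = entries[key]
    return out


der = {
    "1": {"x": 6.2222, "y": 1.1481},
    "2": {"xx": 2.0, "xy": 0.0494, "yx": 0.0494, "yy": 0.0658},
}
print(jacobian_from_der(der, order=1, sequence=["y", "x"]))
print(jacobian_from_der(der, order=2, sequence=["x", "y"]))
```

Building keys with `itertools.product` keeps the same code path for every order, so third-order derivatives would simply produce an `n x n x n` array.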

#### autodiff_math

Mathematical operations other than simple arithmetic are provided in the **autodiff_math** module. Users must import this module to apply these operations to an *Autodiff* instance, because numpy and Python's standard `math` module cannot handle *Autodiff* instances. These operations include trigonometric, logarithmic, exponential, and power functions. Each operation returns a new *Autodiff* instance with a properly updated value and updated derivatives.

An example of the user interface is below:

```python
from autodiffpy.autodiff import Autodiff
from autodiffpy import autodiff_math as adm

x = Autodiff(name="x", val=5)  # create a new Autodiff instance
f = adm.log(x)  # new Autodiff instance holding the value and derivative of log(x)
```

The autodiff_math module handles the log function as follows:

```python
import numpy as np

from autodiffpy.autodiff import Autodiff


def log(ad):
    """Return a new Autodiff instance holding log(ad) and its first-order derivatives."""
    if not isinstance(ad, Autodiff):
        raise TypeError("input should be an Autodiff instance")
    if np.any(ad.val <= 0):  # the logarithm is only defined for positive values
        raise ValueError("cannot evaluate the log of a nonpositive number")
    # Chain rule: d log(u) = du / u; build a new dict so ad.der is not modified
    new_der = {key: ad.der[key] / ad.val for key in ad.der}
    return Autodiff(name=ad.name, val=np.log(ad.val), der=new_der)
```
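The other functions in **autodiff_math** can follow the same chain-rule pattern as `log`: compute the new value, then scale every partial derivative by the outer function's derivative. The sketch below uses a hypothetical stand-in class `Var` (a value plus a dictionary of first-order partials) instead of the real *Autodiff* class so that it runs on its own; `sin`, `cos`, `exp`, `power`, and the `_apply` helper here are illustrative, not the module's final API.

```python
import numpy as np


class Var:
    """Hypothetical stand-in for an Autodiff instance: a value plus first-order partials."""

    def __init__(self, val, der):
        self.val = val
        self.der = dict(der)


def _apply(ad, value, slope):
    # Chain rule: every partial of the input is scaled by the outer derivative
    return Var(value, {k: slope * d for k, d in ad.der.items()})


def sin(ad):
    return _apply(ad, np.sin(ad.val), np.cos(ad.val))


def cos(ad):
    return _apply(ad, np.cos(ad.val), -np.sin(ad.val))


def exp(ad):
    return _apply(ad, np.exp(ad.val), np.exp(ad.val))


def power(ad, n):
    # Constant exponent n: d(u**n)/du = n * u**(n - 1)
    return _apply(ad, ad.val ** n, n * ad.val ** (n - 1))


x = Var(0.5, {"x": 1.0})
g = exp(sin(x))  # g = exp(sin(x)), so dg/dx = cos(x) * exp(sin(x))
print(g.val, g.der["x"])
```

Because each function only needs its own value and slope, adding a new elementary function amounts to one short call to the shared helper.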

### External Dependencies

* NumPy: used to build the Jacobian arrays and to compute sin, cos, log, etc. within the *Dualnumber* methods.