# CS207 Milestone2

## Introduction

The premise of this project rests on the question "Why are derivatives important?". Derivatives describe the instantaneous rate of change and are a building block in the mathematics behind optimization problems. Since real-world problems are continuous and seldom neatly segmented into discrete blocks, we need derivatives to help us understand the patterns that we see occurring in the real world. Applications range from medicine to rocket science. The process of finding a derivative is called differentiation and is a fundamental operation in calculus.

While we learn differentiation by hand, more complex equations require us to use computers to compute the results. Automatic Differentiation is a way for computers to break down the steps required to compute the derivative of a function. Every program, however complex, ultimately executes a sequence of elementary arithmetic operations and elementary functions, and the derivatives of these can be combined via the chain rule to differentiate the whole computation. Our software library takes such composite functions and computes their derivatives to machine precision via Automatic Differentiation.

Another method that computers use to compute derivatives is Symbolic Differentiation. Symbolic Differentiation requires complete knowledge of all control flow in the program, and the resulting expressions can grow exponentially large, which slows the system down on complex workloads. For this reason, we focus on Automatic Differentiation in this project.

We are building a software library to allow users to utilize Automatic Differentiation methods on machine calculations in order to guarantee the precision and accuracy of their calculations (this is particularly useful in scientific computing scenarios). 


## Background
#### What is AD?
As mentioned in the introduction, computers compute elementary arithmetic operations and functions extremely well. When a computer approximates a derivative numerically, however, it must pick a step size for the limit operation, and that step can end up too big or too small. A step that is too big gives a large truncation error, while a step that is too small amplifies floating-point round-off error, so the gap between the approximation and the true value can be significant. In most computing scenarios, and especially scientific ones, accuracy is non-negotiable. Therefore, we turn to automatic differentiation. AD applies the chain rule, computing the derivative of each sub-expression in turn and combining them to obtain the final derivative, which avoids the inaccuracy of numerical approximation.

#### Why AD?

There are three kinds of differentiation: Numerical Differentiation, Symbolic Differentiation, and Automatic Differentiation. 

_Numerical Differentiation_ estimates the derivative of a mathematical function from values of the function, using finite difference approximations. This approach requires many function evaluations for a high-dimensional function, since each input dimension needs its own perturbation, and therefore scales poorly to more complex problems.
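
The sensitivity of finite differences to the step size can be illustrated with a small experiment. This is only a sketch: the helper name `forward_difference`, the test function `sin`, and the three step sizes are choices made here for illustration and are not part of our package.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the forward difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)  # true derivative of sin(x) at x = 1

# Illustrative step sizes: too large, moderate, and too small.
errors = {}
for h in (1e-1, 1e-8, 1e-15):
    approx = forward_difference(math.sin, 1.0, h)
    errors[h] = abs(approx - exact)
    print(f"h = {h:.0e}  absolute error = {errors[h]:.2e}")
```

The large step suffers from truncation error and the tiny step from floating-point round-off, so the moderate step gives the smallest error of the three.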

*Symbolic Differentiation* utilizes the product rule and the chain rule to restructure the original function into a simplified expression that can be handled by the computer for more efficient calculations. Symbolic Differentiation requires access to and transformation of source code. It can be expensive to build and performs redundant computation.

*Automatic Differentiation* also uses the chain rule but does so by breaking components of the function down into elementary functions that can be computed with high accuracy and speed allowing for better utilization of computing resources in application. This method is simple to implement and verify. It is also easy to use while producing accurate results at the level that Symbolic Differentiation produces.

#### Derivatives
The derivative of a function measures the sensitivity of the function output with respect to its input value. The derivative is often described as the 'instantaneous rate of change'.


$$f'(a)=\lim _{h\to 0}{\frac {f(a+h)-f(a)}{h}}$$

#### Chain Rule

The chain rule allows us to break down the computation of the derivative of a function composed of two or more functions, as can be seen in the example below. It is a fundamental building block towards understanding how Automatic Differentiation works.

$${\frac  {dz}{dx}}={\frac  {dz}{dy}}\cdot {\frac  {dy}{dx}}$$
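
As a short worked instance (the functions here are chosen only for illustration), take $z = \sin(y)$ with $y = x^2$. Applying the rule gives:

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = \cos(y)\cdot 2x = 2x\cos(x^2)$$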

#### How does AD work?
An example of how AD works is provided below using the chain rule:

\begin{aligned}
y=f(g(h(x)))=f(g(h(w_{0})))=f(g(w_{1}))=f(w_{2})=w_{3}
\end{aligned}

\begin{aligned}
w_{0}=x\\
w_{1}=h(w_{0})\\
w_{2}=g(w_{1})\\
w_{3}=f(w_{2})=y\\
\end{aligned}

$$
\frac{\partial{y}}{\partial{x}} = \frac{\partial{y}}{\partial{w_{2}}} \cdot \frac{\partial{w_{2}}}{\partial{w_{1}}} \cdot \frac{\partial{w_{1}}}{ \partial{x}}
$$

Forward mode applies the chain rule from the inside to the outside, while reverse mode applies it from the outside to the inside. In the case above:

Forward mode calculates: $\frac{\partial{w_{i}}}{\partial{x}} = \frac{\partial{w_{i}}}{\partial{w_{i-1}}}\cdot \frac{\partial{w_{i-1}}}{\partial{x}}$ and $w_3 = y$

Reverse mode calculates: $\frac{\partial{y}}{\partial{w_{i}}} = \frac{\partial{y}}{\partial{w_{i+1}}} \cdot \frac{\partial{w_{i+1}}}{\partial{w_{i}}}$ and $w_{0} = x$


#### Forward and Reverse Mode

_Forward Mode_

Forward mode applies the chain rule to each basic operation in a forward trace - at every stage it evaluates the operator as well as its gradient in lockstep.
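
The lockstep evaluation can be sketched with a tiny value-and-derivative pair, often called a dual number. The class name `Dual` and the helper `sin` below are hypothetical and written only for this illustration; they are not the API of our package.

```python
import math

class Dual:
    """Hypothetical pair (value, derivative with respect to the input x)."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def sin(u):
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

x = Dual(2.0, 1.0)          # seed the input with dx/dx = 1
y = 3 * x * x + sin(x)      # y = 3x^2 + sin(x)
print(y.val, y.der)         # value and derivative are produced together
```

Each operation updates the value and the derivative at the same time, so no separate differentiation pass is needed.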

_Reverse Mode_

There are two phases in reverse mode: the forward phase and the backward phase. During the forward phase, all intermediate variables are evaluated and the results are stored in memory. In the backward phase, the chain rule is utilized to propagate back the derivatives.
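
The two phases can be sketched as follows. This is a minimal illustration under our own assumptions: the `Node` class and the helpers `mul`, `add`, `sin`, and `backward` are hypothetical names, and reverse mode is not part of this milestone's package.

```python
import math

class Node:
    """Hypothetical record of one intermediate variable in the trace."""
    def __init__(self, val, parents=()):
        self.val = val          # stored during the forward phase
        self.parents = parents  # pairs of (parent node, local partial derivative)
        self.grad = 0.0         # filled in during the backward phase

def mul(a, b):
    return Node(a.val * b.val, ((a, b.val), (b, a.val)))

def add(a, b):
    return Node(a.val + b.val, ((a, 1.0), (b, 1.0)))

def sin(a):
    return Node(math.sin(a.val), ((a, math.cos(a.val)),))

def backward(output):
    """Backward phase: propagate d(output)/d(node) from the output to the inputs."""
    order, seen = [], set()

    def visit(node):
        # Depth-first ordering so that parents appear before their consumers.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk consumers before parents, accumulating chain-rule contributions.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

# Forward phase: evaluate and store y = x1 * x2 + sin(x1).
x1, x2 = Node(2.0), Node(3.0)
y = add(mul(x1, x2), sin(x1))

backward(y)
print(x1.grad, x2.grad)  # dy/dx1 = x2 + cos(x1), dy/dx2 = x1
```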

#### Elementary Functions

Elementary functions are defined as functions of one variable that are built from a finite number of sums, products, and/or compositions of rational functions, sin, cos, exp, and their inverse functions. We've listed a sample of elementary functions and their corresponding derivatives in the table below.


|$$f(x)$$ |	$$f'(x)$$
| :- |:-|
|$$c$$|	$0$|
|$$x$$ |$1$|
|$$x^n$$|$$nx^{n-1}$$|
|$$\frac{1}{x}$$|$$-\frac{1}{x^2}$$|
|$$e^x$$|$$e^x$$|
|$$\log_a x$$|$$\frac{1}{x \ln a}$$|
|$$\ln x$$|$$\frac{1}{x}$$|
|$$\sin(x)$$|$$\cos(x)$$|
|$$\cos(x)$$|$$-\sin(x)$$|
|$$\tan(x)$$|$$\frac{1}{\cos^2x}$$|

#### Jacobians 
The Jacobian matrix of a vector-valued function is the matrix of all of its first-order partial derivatives, as can be seen in the matrix representation below. If the Jacobian matrix is square, its determinant is known as the Jacobian determinant.
<img src="./pictures/Figure4.png">

#### Computational Graph Representation

The computational graph representation allows us to visualize how the computer is breaking down the calculations into the fundamental steps we described above. For example, in lecture, we took the following equation and broke it down into the respective steps for calculation. The graph can be viewed as a flowchart and is easier for the viewer to understand how the elementary functions operate on each variable in the process.

Example Equation:
$$f(x) = 6\exp(x^2)+\cos(2x)$$


Computational Graph:
<img src="./pictures/Figure3.png">


Evaluation Table:

| Trace | Elementary Function | Current Value | Elementary Function Derivative | Derivative w.r.t. $x$ |
| :- | :- | :- | :- | :- |
| $x_{1}$ | $x$ | $x$ | $\dot{x}_{1}$ | $1$ |
| $x_{2}$ | $x_{1}^{2}$ | $x^{2}$ | $2x_{1}\dot{x}_{1}$ | $2x$ |
| $x_{3}$ | $\exp(x_{2})$ | $\exp(x^{2})$ | $\exp(x_{2})\dot{x}_{2}$ | $2x\exp(x^{2})$ |
| $x_{4}$ | $6x_{3}$ | $6\exp(x^{2})$ | $6\dot{x}_{3}$ | $12x\exp(x^{2})$ |
| $x_{5}$ | $2x_{1}$ | $2x$ | $2\dot{x}_{1}$ | $2$ |
| $x_{6}$ | $\cos(x_{5})$ | $\cos(2x)$ | $-\sin(x_{5})\dot{x}_{5}$ | $-2\sin(2x)$ |
| $x_{7}$ | $x_{4}+x_{6}$ | $6\exp(x^{2})+\cos(2x)$ | $\dot{x}_{4}+\dot{x}_{6}$ | $12x\exp(x^{2})-2\sin(2x)$ |
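
The trace for the example equation $f(x) = 6\exp(x^2)+\cos(2x)$ can be replayed numerically, one elementary step at a time. The sketch below is only an illustration: the evaluation point `x = 1.5` is an arbitrary choice, and each step is stored as a plain `(value, derivative)` tuple rather than with our package.

```python
import math

x = 1.5  # arbitrary evaluation point chosen for illustration

# Each step stores (value, derivative with respect to x).
x1 = (x, 1.0)                                      # input, seeded with dx/dx = 1
x2 = (x1[0] ** 2, 2 * x1[0] * x1[1])               # x1^2
x3 = (math.exp(x2[0]), math.exp(x2[0]) * x2[1])    # exp(x2)
x4 = (6 * x3[0], 6 * x3[1])                        # 6 * x3
x5 = (2 * x1[0], 2 * x1[1])                        # 2 * x1
x6 = (math.cos(x5[0]), -math.sin(x5[0]) * x5[1])   # cos(x5)
x7 = (x4[0] + x6[0], x4[1] + x6[1])                # x4 + x6 = f(x)

# Derivative obtained by hand, for comparison with the traced result.
closed_form = 12 * x * math.exp(x ** 2) - 2 * math.sin(2 * x)
print(x7[1], closed_form)
```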




## How to Use *AutoDiff*


#### Installation via Github (for developers and users)
Users are able to install our package from Github with the following commands:

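```bash
# Clone the repository and move into it so that requirements.txt is in the working directory
git clone https://github.com/StanAndyJohn/cs207-FinalProject.git
cd cs207-FinalProject
```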
Create a virtual environment and call it `env`.
```bash
virtualenv env
```

Activate the virtual environment and install the dependencies.
```bash
source env/bin/activate
pip install -r requirements.txt
```

Open a Python interpreter on the virtual environment and import the module

```python
>>> import autoDiff.forward_mode as ad
```

#### Introduction to basic usage of the package

After successful installation, the user will first import our package.
```python
>>> import autoDiff.forward_mode as ad
```
We have the following options provided:

##### Scalar functions of scalar values
Goal: gradient of the expression $f(x) = \alpha x + 6$, here with $\alpha = 7$.
Input:  a variable x and then the symbolic expression for `f`.
```python
>>> x = ad.Variable(7)
>>> f = 7 * x + 6
```
Special function: sin,cos,exp,etc.
```python
>>> f = 7 * ad.func.sin(x) + 6
```
Goal: evaluate the gradients of f with respect to x.
```python
>>> print(f.val, f.der)
```
`f.val` returns the value of f, and `f.der` returns the gradient of f with respect to x.

Goal: second derivatives of f with respect to x
```python
>>> print(f.der2)
```
f.der2 returns second derivative of f with respect to x.

##### Scalar functions of vectors - Type 1 (not supported in this milestone)
Goal: gradient of the expression $f(x_1,x_2) = x_1 x_2 + x_1$. 
Input: two variables `x1` and `x2` and the symbolic expression for `f`.
```python
>>> x1 = ad.Variable(2,name='x1')
>>> x2 = ad.Variable(3,name='x2')
>>> f = x1 * x2 + x1
```
Goal: values and gradients of f with respect to x1 and x2
```python
>>> print(f.val, f.der)
```
f.val returns dictionaries of values of f 
f.der returns dictionaries of gradients of f with respect to x1 and x2.

Goal: second derivatives of f with respect to x1 and x2
```python
>>> print(f.der2)
```

`f.der2` will then contain the second derivatives of f with respect to x1 and x2, i.e., $\frac{\partial^2 f}{\partial x_1^2}$, $\frac{\partial^2 f}{\partial x_2^2}$, $\frac{\partial^2 f}{\partial x_1 \partial x_2}$ and $\frac{\partial^2 f}{\partial x_2 \partial x_1}$, as a dictionary with keys `'x1x1'`, `'x2x2'`, `'x1x2'` and `'x2x1'` respectively.
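
For reference, the second derivatives of this particular $f(x_1,x_2) = x_1 x_2 + x_1$ can be worked out by hand. This is a hand derivation for checking results, not output produced by the package:

$$\frac{\partial^2 f}{\partial x_1^2} = 0, \qquad \frac{\partial^2 f}{\partial x_2^2} = 0, \qquad \frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1} = 1$$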

##### Scalar functions of vectors - Type 2 (not supported in this milestone)
Goal: gradient of the expression $f(x_1, x_2) = (x_1 - x_2)^2$ where $x_1$ and $x_2$ are vectors themselves. 

Input: two variables `x1` and `x2` and the symbolic expression for `f`.
```python
>>> x1 = ad.Variable([2, 3, 4], name='x1')
>>> x2 = ad.Variable([3, 2, 1], name='x2')
>>> f = (x1 - x2)**2
```
Goal: values and gradients of f with respect to $x_1$ and $x_2$
```python
>>> print(f.val, f.der, f.der2)
```

##### Vector functions of vectors
Goal: gradients of the system of functions 
$$f_1 = x_1 x_2 + x_1$$
$$f_2 = \frac{x_1}{x_2}$$

i.e.
$$\mathbf{f}(x_1,x_2)=(f_1(x_1,x_2),f_2(x_1,x_2))$$
Input: two variables `x1` and `x2` and the symbolic expression for `f`.
```python
>>> x1 = ad.Variable(3, name = 'x1')
>>> x2 = ad.Variable(2, name = 'x2')
>>> f1 = x1 * x2 + x1
>>> f2 = x1 / x2
```
Goal:  the gradients of f with respect to x1 and x2
```python
>>> print(f1.val, f2.val, f1.der, f2.der)
```
The Jacobian is $\mathbf{J}(\mathbf{f}) = (f_1', f_2')$, i.e. `(f1.der, f2.der)`.
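
As a hand-derived check for this system (not output from the package), the Jacobian at the input values $x_1 = 3$, $x_2 = 2$ used above is:

$$\mathbf{J}(\mathbf{f}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{bmatrix} = \begin{bmatrix} x_2 + 1 & x_1 \\ \frac{1}{x_2} & -\frac{x_1}{x_2^2} \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ \frac{1}{2} & -\frac{3}{4} \end{bmatrix}$$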

Goal: second derivatives (Hessian matrix)
```python
>>> print(f1.der2, f2.der2)
```

## Software Organization 
###### Discuss how you plan on organizing your software package.

- What will the directory structure look like?
```bash
├── autoDiff
│   ├── element_func.py
│   ├── forward_mode.py
│   └── operator.py
├── demos
│   ├── demo1.py
│   └── demo2.py
├── tests
│   ├── __init__.py
│   ├── test_element_func.py
│   └── test_operators.py
├── docs
│   ├── pictures.ipynb
│   ├── milestone1.ipynb
│   └── milestone2.ipynb
├── .codecov.yml
├── .travis.yml
├── LICENSE.md
├── README.md
└── requirements.txt
```
- What modules do you plan on including? What is their basic functionality?

   - \_\_init__.py: initializes the package
   - AutoDiff.py: implements basic data structure and algorithms of the forward mode of automatic differentiation and operator overloading methods. element_func.py is imported in AutoDiff.py so that the elementary functions can be called within this parent class.
   - element_func.py: implements the elementary functions that complete the calculations for forward mode operation.


- Where will your test suite live? Will you use TravisCI? CodeCov?

   Our test suite will be under the tests folder. We plan to have 2 test files, one for AD, the other for additional features. TravisCI and CodeCov will be used in our project. 
   
   We have put all of our test files in the directory cs207-FinalProject/tests/ and have set up Travis CI to validate the build for each pull request. Codecov has also been set up to report how many lines of code are covered by the tests.
    
    
- How will you distribute your package (e.g. PyPI)?

   We will distribute our package on PyPI. More information regarding how to use our package is discussed in the How To Use section. 
   
   
- How will you package your software? Will you use a framework? If so, which one and why? If not, why not?
   
   We will closely follow the instructions on https://packaging.python.org/tutorials/packaging-projects/ to package our software. As of now, we have decided not to use a framework because our software consists only of a few Python modules and supporting files that do not depend on other frameworks. Python's standard native packaging should be sufficient for our software.
   
   
- Other considerations?

  If time allows, we are thinking of building a user-friendly UI for our software, using a Python web framework such as Django or Flask.



## Implementation
###### Discuss how you plan on implementing the forward mode of automatic differentiation.

- What are the core data structures?
   - dictionary: used to keep track of the partial derivatives
   
   
- What classes will you implement? What method and name attributes will your classes have?

##### Elementary Functions

We've included the elementary functions listed below in element_func.py: <br>

- sin(x)<br> 
- sinh(x)<br> 
- arcsin(x)<br> 
- cos(x)<br> 
- cosh(x)<br> 
- arccos(x)<br> 
- tan(x)<br> 
- tanh(x)<br> 
- arctan(x)<br> 
- exp(x)<br> 
- log(x)<br>
- sqrt(x)<br>

These methods take an input variable object or scalar and return a new variable class object or scalar after the operation has been completed using the chain rule.
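
The scalar-or-variable behaviour described above could look roughly like the sketch below. The `Variable` stand-in and this `sin` are hypothetical and simplified for illustration; the actual signatures in element_func.py may differ.

```python
import math
from numbers import Number

class Variable:
    """Hypothetical stand-in: a value plus a dictionary of partial derivatives."""
    def __init__(self, val, der=None, name="x"):
        self.val = val
        self.der = {name: 1.0} if der is None else der

def sin(x):
    """Return sin(x); propagate derivatives when x is a Variable."""
    if isinstance(x, Number):
        # Plain scalars carry no derivative information.
        return math.sin(x)
    # Chain rule: d sin(u) = cos(u) * du for every tracked variable.
    der = {key: math.cos(x.val) * d for key, d in x.der.items()}
    return Variable(math.sin(x.val), der)

print(sin(0.0))                       # scalar in, scalar out
v = sin(Variable(0.0, name="x"))      # Variable in, Variable out
print(v.val, v.der)
```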

| Classes | Description | Attributes | Methods         
| :- |:------------- | :- | :-
|Variable|  an auto-differentiation class with the overloaded operators | der: dictionary of derivatives | Operator Overloading Methods: \_\_add__, \_\_radd__, \_\_sub__, \_\_rsub__, \_\_mul__, \_\_rmul__, \_\_pow__, \_\_rpow__, \_\_truediv__, \_\_rtruediv__, \_\_neg__
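
To make the role of the derivative dictionary concrete, here is a minimal sketch of how two of the overloaded operators might maintain it. It is an illustration under our own assumptions (the helper `_lift` and the constructor arguments are hypothetical), not the actual implementation of the `Variable` class.

```python
class Variable:
    """Hypothetical sketch of a forward-mode variable with dictionary derivatives."""
    def __init__(self, val, name=None, der=None):
        self.val = val
        if der is not None:
            self.der = der
        else:
            # A named input has derivative 1 with respect to itself; constants have none.
            self.der = {name: 1.0} if name is not None else {}

    @staticmethod
    def _lift(other):
        # Wrap plain numbers as constants so both operands share one code path.
        return other if isinstance(other, Variable) else Variable(other)

    def __add__(self, other):
        other = self._lift(other)
        keys = set(self.der) | set(other.der)
        der = {k: self.der.get(k, 0.0) + other.der.get(k, 0.0) for k in keys}
        return Variable(self.val + other.val, der=der)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        keys = set(self.der) | set(other.der)
        # Product rule applied separately to each tracked input variable.
        der = {k: self.der.get(k, 0.0) * other.val + self.val * other.der.get(k, 0.0)
               for k in keys}
        return Variable(self.val * other.val, der=der)

    __rmul__ = __mul__

x1 = Variable(2.0, name="x1")
x2 = Variable(3.0, name="x2")
f = x1 * x2 + x1
print(f.val, f.der)  # partial derivatives keyed by variable name
```

Keying the dictionary by variable name lets expressions that mix several inputs merge their partial derivatives without fixing an input order in advance.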

- What external dependencies will you rely on?
   
   - numpy: ~1.17.x
   

- How will you deal with elementary functions like sin, sqrt, log, and exp (and all the others)?
  
  We will implement these elementary functions in element_func.py which will be imported in our Variable class in autoDiff.py. <br>
  
- What aspects have you not implemented yet? What else do you plan on implementing?
 
  We have not implemented the Jacobian matrix in milestone2 but will include that in our final deliverable.


### Future Features

Additional features that we are working on adding include the following:

##### Second Derivative (Hessian)
In addition to using automatic differentiation for the first derivatives, we plan on building the functionality for the user to compute second derivatives. We will be overloading existing operators and elementary functions in order to accommodate the new functionality for second derivatives.
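
One possible way to carry second derivatives through overloaded operators is sketched below for the scalar case. The class name `Var2` and its attributes are hypothetical and chosen for this illustration; the planned implementation may be organized differently.

```python
import math

class Var2:
    """Hypothetical scalar variable carrying value, first and second derivative."""
    def __init__(self, val, der=0.0, der2=0.0):
        self.val, self.der, self.der2 = val, der, der2

    def __add__(self, other):
        other = other if isinstance(other, Var2) else Var2(other)
        return Var2(self.val + other.val,
                    self.der + other.der,
                    self.der2 + other.der2)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var2) else Var2(other)
        # (uv)' = u'v + uv'   and   (uv)'' = u''v + 2u'v' + uv''
        return Var2(self.val * other.val,
                    self.der * other.val + self.val * other.der,
                    self.der2 * other.val + 2 * self.der * other.der
                    + self.val * other.der2)

    __rmul__ = __mul__

def sin(u):
    # (sin u)' = cos(u) u'   and   (sin u)'' = cos(u) u'' - sin(u) (u')^2
    return Var2(math.sin(u.val),
                math.cos(u.val) * u.der,
                math.cos(u.val) * u.der2 - math.sin(u.val) * u.der ** 2)

x = Var2(7.0, der=1.0)      # seed: dx/dx = 1, d2x/dx2 = 0
f = 7 * sin(x) + 6
print(f.val, f.der, f.der2)
```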

##### Newton's Optimizer
Newton's method is an iterative approach to finding the roots of a differentiable function and can be used in a wide variety of optimization problems. Given a starting point, we construct a quadratic approximation to the objective function that matches the first and second derivative values at that point. After that, we minimize the approximate quadratic function instead of the original objective function. This step is repeated iteratively until the optimization objective is reached.
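
The iteration can be sketched for a one-dimensional objective as follows. The function name `newton_minimize`, its parameters, and the example objective are assumptions made for illustration, and the derivatives are supplied by hand here, whereas the planned feature would obtain them through automatic differentiation.

```python
def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Hypothetical 1-D Newton optimizer: minimize the local quadratic model each step."""
    x = x0
    for _ in range(max_iter):
        # The quadratic model at x has its stationary point at x - f'(x) / f''(x).
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example objective f(x) = x^4 - 3x^2 + 2 with hand-written derivatives.
first = lambda x: 4 * x ** 3 - 6 * x      # f'(x)
second = lambda x: 12 * x ** 2 - 6        # f''(x)

x_min = newton_minimize(first, second, x0=2.0)
print(x_min)  # converges towards the local minimizer sqrt(1.5)
```

Note that the update divides by the second derivative, so the sketch assumes it stays nonzero (and positive for a minimum) along the iteration.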

##### Newton Solver
Newton's Method mentioned above will be implemented programmatically in order to solve the optimization problem.

We define function $\mathbf{F_0}$ with a guessed solution at $\mathbf{x_0}$. Once that has been set up, we will use Newton's method to find $\mathbf{x_1}$ after which we can then update the function to $\mathbf{F_1}$. This is repeated until $\mathbf{F_n}(\mathbf{x_n})$ is close to zero.
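
For a scalar function, the repeat-until-close-to-zero loop described above might be sketched like this. The name `newton_solve`, the tolerance, and the example function are hypothetical, and the derivative is written by hand rather than computed with our package.

```python
import math

def newton_solve(f, df, x0, tol=1e-12, max_iter=50):
    """Hypothetical scalar Newton solver: iterate until f(x) is close to zero."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        # Linearize f at x and move to the root of the tangent line.
        x -= fx / df(x)
    return x

# Example: solve x^2 - 2 = 0 starting from the guess x0 = 1.
root = newton_solve(lambda x: x ** 2 - 2, lambda x: 2 * x, x0=1.0)
print(root, math.sqrt(2))
```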


### Future Work

Additional work that we will be completing includes the following:

##### User installation via ```pip```
Users will be able to install our package via `pip` with the following commands:

Create a virtual environment and call it `env`.
```bash
virtualenv env
```

Activate the virtual environment and install the package.
```bash
source env/bin/activate
pip install AutoDiff-StanAndyJohn
```

Open a Python interpreter on the virtual environment and import the module
```python
>>> import autoDiff.forward_mode as ad
```