# Milestone 2 Documentation

Group 24: Jessica A Wijaya, Shujian Zhu, Malik Wagih, William Palmer


# Introduction

Evaluation of derivatives is integral to many machine learning methods. Two main methods have traditionally been used for this purpose: symbolic and numerical differentiation. Symbolic differentiation, though conceptually straightforward, requires complex expression manipulation in a computer algebra system, which makes it costly to evaluate; it is also limited to closed-form input expressions. Numerical differentiation, on the other hand, approximates the derivative of a function using a small step size h; though simpler and faster than symbolic differentiation, it suffers from stability issues caused by round-off and truncation errors.
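
The trade-off between truncation and round-off error can be seen in a minimal sketch. The helper name `forward_difference` and the test function are illustrative choices and are not part of the package:

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the forward-difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

# Illustrative function: f(x) = ln(x) + 3x, whose exact derivative is 1/x + 3.
f = lambda x: math.log(x) + 3 * x
exact = 1 / 2.0 + 3

errors = {}
for h in (1e-1, 1e-5, 1e-15):
    errors[h] = abs(forward_difference(f, 2.0, h) - exact)
    print(f"h = {h:.0e}  absolute error = {errors[h]:.3e}")
```

A large h is dominated by truncation error and a very small h by round-off error, so the error first shrinks and then grows again as h decreases.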

To address the weaknesses of both methods, automatic differentiation (AD) was introduced. Since then, it has been applied in areas such as engineering design optimization, structural mechanics, and atmospheric sciences, and its application to machine learning has popularised it further. Given the important role AD plays in many scientific fields, we introduce a Python package that provides user-friendly methods for performing forward-mode AD. Our package supports the evaluation of first derivatives of user-defined functions at a given input value.

# Background

All numerical computation can be seen as a combination of elementary operations for which the derivatives are known. The derivative of the overall composition can be found by combining the derivatives of the elementary operations through the chain rule. Such elementary operations include arithmetic operations (addition, subtraction, multiplication, division), sign switch, and functions such as the exponential, the logarithm, and the trigonometric functions (e.g. $\sin(x)$, $\cos(x)$). Traces of these elementary operations can be represented by a trace table or a computational graph. Trace tables were originally used to trace the values of variables as each line of code is executed. As an example of this flow, Table 1 shows the evaluation trace of the elementary operations in the computation $f(x_1) = \ln(x_1) + 3x_1$ evaluated at $x_1 = c$, and Figure 1 gives a graphical representation of $f(x_1)$ in terms of its elementary operations.



### Table 1

| Trace       | Elementary function | Current Function Value | Function Derivative |
| ------------- |:-------------:|:-------------:|:-------------:|
| X1      | X1            |  c             | 1  |
| X2      | ln(X1)            | ln(c)      | 1/c |
| X3      | 3 * X1            |  3c        | 3   |
| X4      | X2 + X3             | ln(c) + 3c | 1/c + 3 |


### Figure 1
![Figure 1](sample_trace_graph.png)
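
The rows of Table 1 can also be written out step by step in code. This is a sketch only: the function name `trace_f` is hypothetical, and the evaluation point c = 2 is an arbitrary choice.

```python
import math

def trace_f(c):
    """Return the (value, derivative) pairs of the trace of f(x1) = ln(x1) + 3*x1 at x1 = c."""
    x1, dx1 = c, 1.0                    # input variable, seeded with derivative 1
    x2, dx2 = math.log(x1), dx1 / x1    # X2 = ln(X1)
    x3, dx3 = 3 * x1, 3 * dx1           # X3 = 3 * X1
    x4, dx4 = x2 + x3, dx2 + dx3        # X4 = X2 + X3
    return [(x1, dx1), (x2, dx2), (x3, dx3), (x4, dx4)]

rows = trace_f(2.0)
for name, (val, der) in zip(("X1", "X2", "X3", "X4"), rows):
    print(f"{name}: value = {val:.4f}, derivative = {der:.4f}")
```

The last row holds the function value ln(c) + 3c and the derivative 1/c + 3.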



#### Forward Mode

The forward mode of AD starts from the input value and computes the derivatives of the intermediate variables with respect to the input. Applying the chain rule to each elementary operation in the forward primal trace generates the corresponding derivative trace, which yields the derivative in the final variable. Forward mode AD can also be viewed as evaluating a function using dual numbers, which are defined as $a+b\epsilon$, where $a, b \in \mathbf{R}$ and $\epsilon$ is a nilpotent number such that $\epsilon^2 = 0$ and $\epsilon \neq 0$. It can be shown that the coefficient of $\epsilon$ after evaluating a function is exactly the derivative of that function, and that this property carries over to compositions through the chain rule.
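
The dual-number view can be illustrated with a minimal sketch. The `Dual` class below is hypothetical (it is not the class used in our package) and supports only addition and multiplication:

```python
class Dual:
    """Hypothetical dual number real + eps_coeff * epsilon, with epsilon**2 == 0."""

    def __init__(self, real, eps_coeff=0.0):
        self.real = real
        self.eps_coeff = eps_coeff

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.eps_coeff + other.eps_coeff)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because eps**2 = 0
        return Dual(self.real * other.real,
                    self.real * other.eps_coeff + self.eps_coeff * other.real)

    __rmul__ = __mul__

x = Dual(2.0, 1.0)           # evaluate at x = 2 with seed 1
y = 3 * x * x + x            # f(x) = 3x^2 + x
print(y.real, y.eps_coeff)   # 14.0 13.0
```

The real part is f(2) = 14 and the coefficient of epsilon is f'(2) = 6 * 2 + 1 = 13.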

Forward mode can be viewed as the computation of a Jacobian-vector product. Consider $F : R^n → R^m$ with Jacobian $J = DF(x) ∈ R^{m×n}$:
![Figure 2](JacobianMatrix.png)

One sweep of the <span style="color:blue"> forward mode </span> computes the Jacobian-vector product <span style="color:blue"> $J \hat{x} $ </span>, where $ \hat{x} $ is a column vector of seeds; choosing $ \hat{x} $ as a unit vector yields one column of the Jacobian.

In comparison, one sweep of the <span style="color:red"> reverse mode </span> computes the vector-Jacobian product <span style="color:red"> $\overline{y} J $ </span>, where $ \overline{y} $ is a row vector of seeds; choosing $ \overline{y} $ as a unit vector yields one row of the Jacobian.

This is why the <span style="color:blue"> forward mode </span> is very efficient for functions $F : R → R^m$, while the <span style="color:red"> reverse mode </span> is best suited to functions $G : R^m → R$. Although the cost of a single sweep is comparable for the two modes, the reverse mode requires access to the intermediate variables and therefore needs more memory, while the forward mode does not carry this overhead.
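
As a small worked example (the function is chosen purely for illustration), take $F(x_1, x_2) = (x_1 x_2, \sin x_1)$, so that $n = m = 2$:

$$
J = \begin{pmatrix} x_2 & x_1 \\ \cos x_1 & 0 \end{pmatrix}, \qquad
J \hat{x} = J \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} x_2 \\ \cos x_1 \end{pmatrix}, \qquad
\overline{y} J = \begin{pmatrix} 1 & 0 \end{pmatrix} J = \begin{pmatrix} x_2 & x_1 \end{pmatrix}
$$

With the seed $\hat{x} = (1, 0)^T$, one forward sweep returns the first column of $J$; with the seed $\overline{y} = (1, 0)$, one reverse sweep returns the first row.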

# How to use the package

At this point, no virtual environment is required, and the package is not yet available as an installable download. All the user needs to do is download the file AutoDiff.py and import it into a script as shown below (ignore the first two lines, which only add the folder containing AutoDiff.py to the module search path).

In [18]:
import sys
sys.path.insert(0, '/Users/jessica/Documents/CS207/cs207-FinalProject/ADclass')

from AutoDiff import *

To compute the value and the derivative of a function, the user first creates an instance of an AD object with the following inputs:
- **function of interest (string)**: the function must be passed as a string in a form that Python can evaluate (e.g. using '\*' for multiplication and '\*\*' for power). For example, the function 3x<sup>2</sup> must be passed as '3\*x\*\*2'.
- **the variable (string)**: the variable with respect to which the derivative is calculated, passed as a string, e.g. 'x' or 'y'. This variable is normally expected to appear in the function passed above (otherwise the function is just a constant, with a derivative of 0).
- **value to be evaluated at (float)**: the package computes the function value and the derivative at the point where the variable equals this value.

For example, if we are trying to evaluate the value and derivative of **f(x) = 3x<sup>2</sup>** at **x = 1**, then the inputs are '3\*x\*\*2', 'x', and 1. The user then can create the AD object instance as below:

In [21]:
my_AD = AD('3*x**2', 'x', 1)

To get the value of the function and its derivative, the user can access the .val and .der attributes respectively, as in the example below:

In [22]:
print("Function Value:", my_AD.val)
print("Derivative:", my_AD.der)

Function Value: 3
Derivative: 6


# Software Organization

In the future, we plan to distribute the package through PyPI.

The core auto-differentiation code is in the file AutoDiff.py, which contains the classes AD and AutoDiff1 (see the section below for details on how these classes are implemented and used).

The folder tests/ will contain all the files needed for testing, including the file test.py, which contains the code used to test the core auto-differentiation class. We are using Travis CI and Codecov to run the tests and measure coverage. The badges will be included in the README.md file to display the build status and coverage.

In the next iteration, we will also implement an additional functional feature, which will be contained in the folder extension. This extended module can be an additional class developed to be used by the user for finding the roots of a given function, or an animated illustration of the forward mode autodifferentiation displayed in a webpage, etc. At this point, we have not decided what this additional feature is, but the structure of the module will follow the structure below.

The package will have the following directory structure:
- setup.cfg (to be implemented later)
- setup.py (to be implemented later)
- LICENSE.txt (to be implemented later)
- README.md
- ADclass/
    - \__init__.py (to be implemented later)
    - AutoDiff.py
    - requirements.txt
- test/
    - .travis.yml
    - test.py
- extension/ (to be decided and implemented later)
    - optimization.py
    - rootfinder.py
    - visualization/
        - index.html
        - js/ main.js
        - css/ style.css
        - ...


The package is not on PyPI yet, but we plan to distribute it through PyPI in the future, so the directory structure above reflects what it will look like once all of those pieces are ready for distribution. For now, the user can download the standalone Python file directly and import it into a script, as demonstrated in the section above.


# Description of current implementation


The core class is 'AutoDiff1', whose main object attributes are (1) the function value and (2) the derivative value, both stored as a numeric data type (integer/float). The values of these attributes can be obtained through .val and .der respectively.

We created the additional class 'AD' as a helper class that is initialized by the user: when the user creates an instance of the AD class and passes the inputs (function string, variable(s), and value(s) to be evaluated at), the AD object initialization will (1) evaluate the string that the user passed in (using Python's built-in eval function), and (2) create instance(s) of the AutoDiff1 object.

Every variable that is passed by the user and is contained in the function given by the user is initialized as an AutoDiff1 object, and every operation performed on this object creates another AutoDiff1 object with an updated function value and derivative. This is done by overriding the dunder methods, including the basic operations for addition, multiplication, subtraction, and division (e.g. \__add__, \__radd__, \__mul__, \__rmul__, etc.). Our implementation of these methods handles two types of input: AutoDiff1 object(s) and scalars. We also override the dunder methods for power and negation. For the elementary functions, we implemented the relevant functions (such as $exp(x)$, $sin(x)$, $cos(x)$, etc.) and evaluate their values using the math module of the Python Standard Library.
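
The operator-overloading idea can be sketched as follows. The class `Var` is a hypothetical stand-in, not the actual AutoDiff1 code; it omits division and assumes a constant scalar exponent in `__pow__`:

```python
class Var:
    """Hypothetical stand-in for a forward-mode object with .val and .der attributes."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        if isinstance(other, Var):
            return Var(self.val + other.val, self.der + other.der)
        return Var(self.val + other, self.der)      # adding a scalar leaves the derivative unchanged

    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, Var):                  # product rule
            return Var(self.val * other.val,
                       self.der * other.val + self.val * other.der)
        return Var(self.val * other, self.der * other)

    __rmul__ = __mul__

    def __neg__(self):
        return Var(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-other)

    def __rsub__(self, other):
        return (-self) + other

    def __pow__(self, n):
        # power rule, assuming n is a constant scalar exponent
        return Var(self.val ** n, n * self.val ** (n - 1) * self.der)

x = Var(1.0)
f = 3 * x ** 2 - x + 5
print(f.val, f.der)   # 7.0 5.0
```

Each operation returns a new object, so the value and the derivative are propagated together through the whole expression.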

Because we developed the elementary functions (such as $exp(x)$, $sin(x)$, $cos(x)$, etc.) as AutoDiff1 class methods, they need to be called as 'x.sin()' instead of 'sin(x)'. To solve this issue, we created helper functions that are called when the user-defined function contains $sin(x)$, $cos(x)$, $exp(x)$, etc.; these functions then call the AutoDiff1 class methods, where all the main computation takes place.
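
A minimal sketch of this wrapper pattern, combined with evaluating a function string, is shown below. The names `Var` and `evaluate` are hypothetical and only `sin` and `exp` are covered; the actual AD and AutoDiff1 classes may differ in detail:

```python
import math

class Var:
    """Hypothetical stand-in for a forward-mode object with .val and .der attributes."""

    def __init__(self, val, der=1.0):
        self.val, self.der = val, der

    def __add__(self, other):
        if isinstance(other, Var):
            return Var(self.val + other.val, self.der + other.der)
        return Var(self.val + other, self.der)

    __radd__ = __add__

    def sin(self):
        # chain rule: (sin u)' = cos(u) * u'
        return Var(math.sin(self.val), math.cos(self.val) * self.der)

    def exp(self):
        # chain rule: (exp u)' = exp(u) * u'
        return Var(math.exp(self.val), math.exp(self.val) * self.der)

def sin(x):
    """Module-level wrapper so that 'sin(x)' can appear in a function string."""
    return x.sin() if isinstance(x, Var) else math.sin(x)

def exp(x):
    """Module-level wrapper so that 'exp(x)' can appear in a function string."""
    return x.exp() if isinstance(x, Var) else math.exp(x)

def evaluate(func_str, var_name, value):
    """Evaluate func_str with var_name bound to a seeded Var (eval: trusted input only)."""
    namespace = {"sin": sin, "exp": exp, var_name: Var(value)}
    return eval(func_str, {"__builtins__": {}}, namespace)

result = evaluate("sin(x) + exp(x)", "x", 0.0)
print(result.val, result.der)   # 1.0 2.0
```

Since eval executes arbitrary Python, a design like this should only be used with function strings from a trusted source.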

In addition, we created functions that handle the power rule, product rule, and quotient rule, as helper functions for more complicated operations.
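
These rules could be written as small standalone helpers along the following lines. The function names and signatures are assumptions made for illustration, each returning a (value, derivative) pair:

```python
def product_rule(u, du, v, dv):
    """Value and derivative of u * v."""
    return u * v, du * v + u * dv

def quotient_rule(u, du, v, dv):
    """Value and derivative of u / v (v must be non-zero)."""
    return u / v, (du * v - u * dv) / (v * v)

def power_rule(u, du, n):
    """Value and derivative of u ** n for a constant exponent n."""
    return u ** n, n * u ** (n - 1) * du

print(product_rule(3.0, 1.0, 3.0, 1.0))   # x * x at x = 3
print(quotient_rule(1.0, 0.0, 2.0, 1.0))  # 1 / x at x = 2
print(power_rule(2.0, 1.0, 3))            # x ** 3 at x = 2
```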

At this point, there are no external dependencies. We only need to import the math module, which comes with the Python Standard Library.

In the next iteration, we are going to complete the implementation of the elementary functions that have not been included yet. We will continue to use the math module of the Python Standard Library to evaluate these functions. In addition, we are going to make our code more robust so that it handles vector inputs and multiple-variable inputs. For vector inputs, we plan to use a for-loop that iterates over the length of the vector, with each iteration evaluating the values for one element. For multiple-variable inputs, we plan to create a library that stores the values evaluated for each variable. When the user asks for the derivative, the function will return a list (whose length equals the number of input variables), and each element of that list will represent the derivative of the function with respect to one particular variable.
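
One possible shape for the planned multi-variable support is sketched below. This is an assumption about a future design rather than existing code: the class `MultiVar`, the function `gradient`, and the use of a dict keyed by variable name are all hypothetical, and only addition and multiplication are covered.

```python
class MultiVar:
    """Hypothetical value carrying a dict of partial derivatives keyed by variable name."""

    def __init__(self, val, partials):
        self.val = val
        self.partials = partials

    @staticmethod
    def _lift(other):
        # Treat a plain scalar as a constant with no partial derivatives.
        return other if isinstance(other, MultiVar) else MultiVar(other, {})

    def __add__(self, other):
        other = self._lift(other)
        names = set(self.partials) | set(other.partials)
        return MultiVar(self.val + other.val,
                        {n: self.partials.get(n, 0.0) + other.partials.get(n, 0.0)
                         for n in names})

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        names = set(self.partials) | set(other.partials)
        return MultiVar(self.val * other.val,
                        {n: self.partials.get(n, 0.0) * other.val
                            + self.val * other.partials.get(n, 0.0)
                         for n in names})

    __rmul__ = __mul__

def gradient(func, names, values):
    """Return the value of func and a list of partial derivatives, one per variable."""
    variables = [MultiVar(v, {n: 1.0}) for n, v in zip(names, values)]
    out = func(*variables)
    return out.val, [out.partials.get(n, 0.0) for n in names]

f = lambda x, y: x * y + 3 * x
val, grad = gradient(f, ["x", "y"], [2.0, 5.0])
print(val, grad)   # 16.0 [8.0, 2.0]

# Vector input: loop over the evaluation points, one evaluation per element.
points = [(1.0, 1.0), (2.0, 5.0)]
results = [gradient(f, ["x", "y"], p) for p in points]
```

The returned list has one entry per input variable, in the order in which the variable names were given.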

# Future Features