# Milestone 1
#### Group members: Robin Robinson, Anna Midgley, Nora Hallqvist, Sebastian Weisshaar

# Introduction 

There are four main approaches to computing derivatives: manual, numerical, symbolic, and automatic differentiation. Manual differentiation means deriving the derivative expression by hand and then coding it. It is time-consuming and error-prone, especially as systems become more complex. Numerical differentiation approximates the derivative with finite differences, using values of the original function evaluated at nearby points. It is easy to implement, but it suffers from truncation and round-off errors. Symbolic differentiation uses packages such as Mathematica to produce the exact derivative expression as its output. It addresses the weaknesses of manual and numerical differentiation; however, it frequently results in “expression swell”, the phenomenon where the representation of the derivative is much larger than the representation of the original function. Symbolic differentiation does not exploit the fact that many sub-expressions are shared across the different derivative expressions, which makes its calculation inefficient. Symbolic and manual differentiation also require the model to be defined as a closed-form expression, which limits the expressivity of the model because it cannot use branches or loops. The fourth technique, automatic differentiation (AD), computes the derivative exactly (up to floating-point precision) by breaking the expression down into elementary arithmetic operations and elementary functions and applying the chain rule to the derivatives of these operations.
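
To make the truncation and round-off trade-off of numerical differentiation concrete, the short sketch below uses only Python's standard `math` module to approximate the derivative of $\sin$ at $x = 1$ with a forward difference and compares it with the exact value $\cos(1)$. The function name `forward_difference` and the chosen step sizes are illustrative assumptions. For large $h$ the truncation error, which shrinks roughly linearly with $h$, dominates; for very small $h$ the round-off error from subtracting two nearly equal numbers takes over, so the error grows again.

```Python
import math


def forward_difference(f, x, h):
    """Forward finite-difference approximation of f'(x) with step size h."""
    return (f(x + h) - f(x)) / h


x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)

for h in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15):
    approx = forward_difference(math.sin, x, h)
    print(f"h = {h:.0e}   error = {abs(approx - exact):.2e}")
```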

# Background

## **Automatic Differentiation**

The key idea of automatic differentiation is to decompose a calculation into elementary functions and to evaluate the derivative of the overall function by combining the derivatives of these elementary functions using the chain rule.


## **Chain Rule**

The **chain rule** is a method to differentiate composite functions. It is especially useful in automatic differentiation when extended to the multivariate case.

For example, suppose we have a multivariate function $f(u_{1}(t),u_{2}(t))$ and we want to evaluate the derivative of $f$ with respect to $t$. Then we have the following expression: 

$$ \frac{d f(u_{1}(t),u_{2}(t))}{d t} = \frac{\partial f}{\partial u_{1}}\frac{d u_{1}}{d t} + \frac{\partial f}{\partial u_{2}}\frac{d u_{2}}{d t}$$
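
As a quick sanity check of this formula, the sketch below picks illustrative inner functions $u_{1}(t) = \sin t$ and $u_{2}(t) = t^{2}$ with $f(u_{1}, u_{2}) = u_{1}u_{2}$ (these choices are assumptions, not part of the text), evaluates the two-term chain rule, and compares the result against a central finite difference.

```Python
import math


def u1(t):
    return math.sin(t)


def u2(t):
    return t ** 2


def f_of_t(t):
    # f(u1, u2) = u1 * u2, composed with u1(t) and u2(t)
    return u1(t) * u2(t)


def chain_rule_derivative(t):
    # partial derivatives of the outer function f(u1, u2) = u1 * u2
    df_du1 = u2(t)
    df_du2 = u1(t)
    # derivatives of the inner functions with respect to t
    du1_dt = math.cos(t)
    du2_dt = 2 * t
    return df_du1 * du1_dt + df_du2 * du2_dt


t, h = 0.7, 1e-6
# independent check with a central finite difference
central = (f_of_t(t + h) - f_of_t(t - h)) / (2 * h)
print(chain_rule_derivative(t), central)
```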

The above can also be represented in vector notation. Replacing $t$ with $x \in \mathbb{R}^{m}$, the chain rule can be written more elegantly in terms of the gradient operator $\nabla$ with respect to the vector $x$.

$$\nabla_{x}f = \frac{\partial f}{\partial u_{1}}\nabla_{x}u_{1}  + \frac{\partial f}{\partial u_{2}}\nabla_{x}u_{2} $$

where $f = f(u_{1}(x_{1},...,x_{m}),u_{2}(x_{1},...,x_{m}))$

Automatic differentiation commonly involves several dependent variables (i.e. intermediate steps), so we need to generalize the chain rule to support a function $f$ of $n$ other functions, i.e. $f(u_{1}(x_{1},\dots,x_{m}),u_{2}(x_{1},\dots,x_{m}),\dots,u_{n}(x_{1},\dots,x_{m}))$.

Now the gradient of $f$ is instead given by: 

$$\nabla_{x}f = \sum_{i=1}^{n} \frac{\partial f}{\partial u_{i}(x)} \nabla_{x}u_{i}(x)$$

where $x \in \mathbb{R}^{m}$.
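
The sum can be evaluated as a vector-matrix product: stack the partials $\partial f/\partial u_{i}$ into a length-$n$ vector and the gradients $\nabla_{x}u_{i}$ into the rows of an $n \times m$ matrix. The sketch below assumes an illustrative example with $m = 2$, $n = 3$, $u_{1} = x_{1}x_{2}$, $u_{2} = \sin x_{1}$, $u_{3} = e^{x_{2}}$ and $f = u_{1} + u_{2}u_{3}$ (none of these come from the text), and checks the result against the gradient derived by hand.

```Python
import numpy as np


def gradient_via_chain_rule(x):
    x1, x2 = x
    # inner functions u_i(x) and their gradients (n = 3, m = 2)
    u = np.array([x1 * x2, np.sin(x1), np.exp(x2)])
    grad_u = np.array([
        [x2, x1],            # gradient of u1 = x1 * x2
        [np.cos(x1), 0.0],   # gradient of u2 = sin(x1)
        [0.0, np.exp(x2)],   # gradient of u3 = exp(x2)
    ])
    # outer function f(u) = u1 + u2 * u3 and its partials df/du_i
    df_du = np.array([1.0, u[2], u[1]])
    # sum_i (df/du_i) * grad_x u_i, written as a vector-matrix product
    return df_du @ grad_u


x = np.array([0.5, -1.2])
# gradient of f = x1*x2 + sin(x1)*exp(x2), derived by hand
expected = np.array([x[1] + np.cos(x[0]) * np.exp(x[1]),
                     x[0] + np.sin(x[0]) * np.exp(x[1])])
print(gradient_via_chain_rule(x), expected)
```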


## **Computational Graph**

Automatic differentiation can be represented as a graph structure. The function is first decomposed into its primitive operations (e.g. binary arithmetic operations and unary functions). The computational graph is then constructed by representing each elementary operation as a node and connecting it, via directed edges, to the nodes whose values it takes as inputs. 

By convention, the nodes are represented by the following variables:  
- Input variables: $v_{i-m} = x_{i}$, $i = 1,\dots,m$
- Intermediate variables: $v_{i}$, $i = 1,\dots,n$
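
As an illustration of this convention, the graph of $f(x_{1}, x_{2}) = \ln(x_{1}) + \sin(x_{1} + x_{2})$ (the function used in the usage example later in this document) has inputs $v_{-1} = x_{1}$, $v_{0} = x_{2}$ and intermediates $v_{1} = \ln v_{-1}$, $v_{2} = v_{-1} + v_{0}$, $v_{3} = \sin v_{2}$, $v_{4} = v_{1} + v_{3}$. The sketch below stores this graph as a plain list of nodes, each naming its operation and its parents, so the edges are implied by the parent references. The list-of-tuples layout and the names `GRAPH`, `OPS` and `evaluate` are hypothetical, not the package's design.

```Python
import math

# Hypothetical layout: each node is (name, operation, parent names).
# Inputs follow v_{i-m} = x_i, so with m = 2: "v-1" = x1 and "v0" = x2.
GRAPH = [
    ("v1", "ln", ("v-1",)),
    ("v2", "add", ("v-1", "v0")),
    ("v3", "sin", ("v2",)),
    ("v4", "add", ("v1", "v3")),
]

OPS = {
    "ln": math.log,
    "sin": math.sin,
    "add": lambda a, b: a + b,
}


def evaluate(graph, x):
    """Evaluate every node; the list is assumed to be in topological order."""
    values = {"v-1": x[0], "v0": x[1]}
    for name, op, parents in graph:
        values[name] = OPS[op](*(values[p] for p in parents))
    return values


values = evaluate(GRAPH, [1.0, 1.0])
print(values["v4"])  # f(1, 1) = ln(1) + sin(2)
```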


## **Evaluation Trace**

The evaluation trace records the value of each intermediate variable $v_{i}$ at a given point. In forward mode it is paired with a tangent trace, which records the directional derivative of each intermediate variable $v_{i}$ along a chosen seed direction $p$. The derivative of the original function is obtained by combining the derivatives of the intermediate results using the chain rule.
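
For the function $f(x_{1}, x_{2}) = \ln(x_{1}) + \sin(x_{1} + x_{2})$, the sketch below writes out both traces by hand at $x = (1, 1)$ with seed $p = (1, 0)$: each intermediate value $v_{i}$ is paired with its tangent, obtained from the elementary derivative rule and the tangents of its parents. The function name `forward_trace` is illustrative. With this seed the result is $\partial f / \partial x_{1} = 1/x_{1} + \cos(x_{1} + x_{2})$, i.e. $1 + \cos 2$.

```Python
import math


def forward_trace(x, p):
    """Evaluation trace (v) and tangent trace (dv) for f(x) = ln(x1) + sin(x1 + x2)."""
    # inputs v_{-1} = x1 and v_0 = x2, with tangents taken from the seed p
    v_m1, dv_m1 = x[0], p[0]
    v_0, dv_0 = x[1], p[1]
    # v1 = ln(v_{-1})
    v1, dv1 = math.log(v_m1), dv_m1 / v_m1
    # v2 = v_{-1} + v_0
    v2, dv2 = v_m1 + v_0, dv_m1 + dv_0
    # v3 = sin(v2)
    v3, dv3 = math.sin(v2), math.cos(v2) * dv2
    # v4 = v1 + v3 is the output
    v4, dv4 = v1 + v3, dv1 + dv3
    return v4, dv4


value, directional_derivative = forward_trace([1.0, 1.0], [1.0, 0.0])
```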






# How to Use *autodiff*

The name of our package is *autodiff*. It provides the user with methods to perform automatic differentiation of both univariate and multivariate functions. We envision that the user will import the autodiff and functions modules and use the imported unary functions to define the function to be evaluated. This function can be either scalar- or vector-valued. The user then defines an input, a seed, and a mode, and these parameters are used to instantiate the AutoDiff class. The input is the point at which the function and its derivative are evaluated. The seed is the direction in which the directional derivative is computed, commonly denoted $p$, and it must have the same shape as the input vector. The mode selects forward or reverse mode. 

An example of how we envision the user would interact with the package is demonstrated below.

**Pseudo Code**


```Python
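import autodiff as AD
from functions import sin, cos, ln, exp, tanh  # ... and the other unary functions

# f(x) = ln(x_1) + sin(x_1 + x_2)
f = lambda x: ln(x[0]) + sin(x[0] + x[1])
input_point = [1, 1]   # point at which f and its derivative are evaluated
seed = [1, 0]          # direction p of the directional derivative
diff_f = AD.AutoDiff(f, input_point, seed, mode='Forward')
deriv = diff_f.derivative()       # returns the derivative value
value = diff_f.function_value()   # returns f evaluated at input_point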
```






# Organizational Structure  

### Directory Structure

*Note that the directory structure is subject to change depending on how we decide to implement reverse mode. The alternative would be to create two subpackages, one for forward mode and another for reverse mode.*

|-- src/ <br>
|&emsp;|-- Autodiff/ <br>
|&emsp;|&emsp;|-- \_\_init__.py <br>
|&emsp;|&emsp;|-- autodiff.py <br>
|&emsp;|&emsp;|-- forward.py <br>
|&emsp;|&emsp;|-- reverse.py <br>
|&emsp;|&emsp;\\-- functions.py <br>
|&emsp;|-- demo <br>
|&emsp;\\-- tests <br>
|-- Docs/ <br>
|&emsp;\\-- milestone1.ipynb <br>
|-- LICENSE <br>
|-- pyproject.toml <br>
\\-- README.md <br>


### Modules
- **\_\_init__.py**: Imports all of the necessary modules

- **autodiff.py**: Contains the AutoDiff class.
    - The AutoDiff class is the main interface between the user and the package. The class takes in a function, an input vector, a seed vector, and a mode (forward or reverse). It lets the user compute both the derivative and the function value. 

- **forward.py**: Contains the DualNumber class
    - The DualNumber class is a data structure that stores a real part and a dual part. The class also overloads the elementary operations, including addition, multiplication, division, power, and negation. 
    - Example of overloading the (+) operator:
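    ``` Python
    def __add__(self, other):
        if isinstance(other, DualNumber):
            # (a + b*eps) + (c + d*eps) = (a + c) + (b + d)*eps
            return DualNumber(self.real + other.real, self.dual + other.dual)
        # adding a real constant only shifts the real part
        return DualNumber(self.real + other, self.dual)

    def __radd__(self, other):
        # addition is commutative, so other + self == self + other
        return self.__add__(other)
    ```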

- **reverse.py**: Contains classes including nodes and computational graph
    - This idea will be further developed in the next milestone.
    
- **functions.py**: Contains defined unary functions for dual numbers
    - Example functions:
    ``` Python
    import numpy as np

    def log(dual_num):
        # d/dx ln(u) = u' / u
        real = np.log(dual_num.real)
        dual = dual_num.dual / dual_num.real
        return DualNumber(real, dual)
    ```
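
To show how these pieces could fit together, here is a minimal, self-contained sketch of a DualNumber class with the operators listed in the Implementation table, plus `log` and `sin` in the style of functions.py. It assumes a real-valued constant exponent in `__pow__` and treats plain numbers as constants with a zero dual part; the helper `_lift` is a hypothetical name, and the whole block is an illustration rather than the final implementation.

```Python
import numpy as np


class DualNumber:
    """Minimal dual number real + dual * eps, with eps**2 = 0 (illustrative sketch)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    @staticmethod
    def _lift(other):
        # treat plain numbers as constants, i.e. with a zero dual part
        return other if isinstance(other, DualNumber) else DualNumber(other, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return DualNumber(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = self._lift(other)
        # product rule: (a + b eps)(c + d eps) = ac + (ad + bc) eps
        return DualNumber(self.real * other.real,
                          self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__  # multiplication is commutative

    def __truediv__(self, other):
        other = self._lift(other)
        # quotient rule: the dual part is (b c - a d) / c**2
        return DualNumber(self.real / other.real,
                          (self.dual * other.real - self.real * other.dual)
                          / other.real ** 2)

    def __rtruediv__(self, other):
        # other / self, where other is a plain number
        return self._lift(other) / self

    def __pow__(self, n):
        # power rule, assuming n is a real constant
        return DualNumber(self.real ** n, n * self.real ** (n - 1) * self.dual)

    def __neg__(self):
        return DualNumber(-self.real, -self.dual)


def log(x):
    # d/dx ln(u) = u' / u
    return DualNumber(np.log(x.real), x.dual / x.real)


def sin(x):
    # d/dx sin(u) = cos(u) u'
    return DualNumber(np.sin(x.real), np.cos(x.real) * x.dual)


# usage: f(x1, x2) = ln(x1) + sin(x1 + x2) at (1, 1) with seed p = (1, 0)
x1 = DualNumber(1.0, 1.0)
x2 = DualNumber(1.0, 0.0)
y = log(x1) + sin(x1 + x2)  # y.real is sin(2), y.dual is 1 + cos(2)
```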

### Demo and Test Suite
Both the test and demo suites will live in the source code (src) directory. The test folder will contain Python files that test the AutoDiff class. The demo folder will contain example usages of the package. 
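
As one possible shape for these tests, the sketch below assumes a pytest-style layout (test functions named `test_*` containing bare `assert` statements) and a hypothetical helper `check_derivative` that compares a derivative against a central finite difference. In the package, `df` would be produced by the AutoDiff class; here `math.cos` stands in for it so the example is self-contained.

```Python
import math


def check_derivative(f, df, x, h=1e-6, tol=1e-6):
    """Return True if df(x) agrees with a central finite difference of f at x."""
    numerical = (f(x + h) - f(x - h)) / (2 * h)
    # relative tolerance for large derivatives, absolute tolerance near zero
    return abs(df(x) - numerical) <= tol * max(1.0, abs(numerical))


def test_sin_derivative():
    # stand-in: math.cos plays the role of a derivative computed by AutoDiff
    for x in (-2.0, 0.0, 0.5, 3.0):
        assert check_derivative(math.sin, math.cos, x)
```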

### Package Distribution
The package will be distributed through PyPI following PEP 517/518. We will add a pyproject.toml file to the project, which lets us build the package with a PEP 517-compliant build tool and upload it to PyPI. Other developers can then install it with `pip install`. 


# Implementation

The table below lists the classes needed to implement forward mode, with their respective methods and attributes. The core data structure for forward mode is DualNumber. We will implement DualNumber first, followed by AutoDiff, and then move on to the extension of the project. Forward mode does not need an explicit computational graph, because the dual numbers carry the derivative information through each operation. However, if we implement reverse mode as an extension of the project, we will need a Graph class to track the nodes of the computational graph. 

| **Class**      | **AutoDiff**               | **DualNumber**                                                                                       |
|----------------|----------------------------|------------------------------------------------------------------------------------------------------|
| **Attributes** | f, input, seed, mode       | real, dual                                                                                           |
| **Methods**    | function_value, derivative | \_\_add__, \_\_radd__, \_\_mul__, \_\_rmul__, \_\_truediv__, \_\_rtruediv__, \_\_pow__, \_\_neg__ |

<br>

As mentioned earlier, for dual numbers we will overload the binary arithmetic operations. Unary functions such as `sin` or `log` will be defined for the dual number data structure in the functions.py file. To perform the forward pass of reverse mode, we will need similar overloading of the basic arithmetic operations in the Node class, along with corresponding unary functions. A different approach would be to use a graph for both forward and reverse mode, which would reduce the redundancy in the overloaded functions. This will be investigated at a later stage. 
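
For the reverse-mode extension, one minimal way the Node class could work is sketched below. Each hypothetical `Node` overloads `+` and `*`, records its parents together with the local partial derivatives during the forward pass, and a `backward` function then accumulates adjoints in reverse topological order. This is an assumption for illustration only: it supports operations between `Node` objects (no plain-number constants), and the names `Node`, `backward` and `adjoint` are not taken from the project.

```Python
import math


class Node:
    """Hypothetical reverse-mode node: a value, its parents, and local partials."""

    def __init__(self, value, parents=()):
        self.value = value
        # each entry is (parent_node, partial derivative of this node w.r.t. parent)
        self.parents = parents
        # accumulates d(output)/d(this node) during the reverse pass
        self.adjoint = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def sin(node):
    return Node(math.sin(node.value), ((node, math.cos(node.value)),))


def log(node):
    return Node(math.log(node.value), ((node, 1.0 / node.value),))


def backward(output):
    """Propagate adjoints from output to all ancestors in reverse topological order."""
    order, visited = [], set()

    def visit(node):
        # depth-first post-order: every node is appended after its parents
        if id(node) not in visited:
            visited.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.adjoint = 1.0
    for node in reversed(order):
        for parent, local_partial in node.parents:
            parent.adjoint += local_partial * node.adjoint


# usage: f(x1, x2) = ln(x1) + sin(x1 + x2) at (1, 1)
x1, x2 = Node(1.0), Node(1.0)
y = log(x1) + sin(x1 + x2)
backward(y)  # x1.adjoint == 1 + cos(2), x2.adjoint == cos(2)
```

A single reverse pass yields the full gradient, which is why reverse mode is attractive for functions with many inputs and few outputs.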

We would like our package to compute derivatives of both scalar- and vector-valued functions of scalar or vector inputs. To support this, we impose two requirements on how the user defines the function. First, the function must take a single input whose length equals the number of parameters ($m$). Second, the function passed to AutoDiff must return a list or array whose length equals the number of outputs ($n$), with one entry per output component.    

For example, consider the following function with $m = 2$ and $n = 3$:

$\textbf{f}(\textbf{x}) = \begin{bmatrix}
x_{1} + x_{2}\\
\sin(x_{1}) \\
\cos(x_{1}x_{2})
\end{bmatrix}$

```Python
from functions import sin, cos

def f(x):
    return [x[0] + x[1], sin(x[0]), cos(x[0] * x[1])]
```
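
Because forward mode computes one directional derivative $J\,p$ per pass, one way to assemble the full $n \times m$ Jacobian of this example is to run $m$ passes, seeding with each unit vector $e_{j}$ to obtain column $j$. In the sketch below the tangent propagation is written out by hand so the block is self-contained; in the package the DualNumber class would do this automatically. The names `f_and_directional_derivative` and `jacobian` are hypothetical.

```Python
import math


def f_and_directional_derivative(x, p):
    """Return f(x) and the directional derivative J(x) @ p for the example f.

    The tangent of each component is propagated by hand, mirroring what
    forward mode would do automatically.
    """
    x1, x2 = x
    p1, p2 = p
    prod, dprod = x1 * x2, p1 * x2 + x1 * p2  # v = x1 * x2 (product rule)
    value = [x1 + x2, math.sin(x1), math.cos(prod)]
    tangent = [p1 + p2, math.cos(x1) * p1, -math.sin(prod) * dprod]
    return value, tangent


def jacobian(x):
    """Build the n-by-m Jacobian column by column, one forward pass per unit seed."""
    m = len(x)
    columns = []
    for j in range(m):
        seed = [1.0 if i == j else 0.0 for i in range(m)]
        _, column = f_and_directional_derivative(x, seed)
        columns.append(column)
    # transpose the list of columns into a list of rows
    return [list(row) for row in zip(*columns)]


J = jacobian([0.5, 2.0])
```

For reference, the exact Jacobian is $\begin{bmatrix} 1 & 1 \\ \cos x_{1} & 0 \\ -x_{2}\sin(x_{1}x_{2}) & -x_{1}\sin(x_{1}x_{2}) \end{bmatrix}$, which the columns produced by the sketch reproduce.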


We will depend on the NumPy library and no other libraries at this stage. NumPy will be used to implement the unary functions so that they are evaluated efficiently. We intentionally avoid other dependencies to keep the package's requirements lightweight. 

# Licensing


We chose the GNU General Public License for our project. Our decision was based on weighing the interest of the public in free code against the interest of developers in developing proprietary code. 

An important consideration in this decision is that automatic differentiation has already been implemented in many other open source projects, so any developer has access to many different automatic differentiation packages. A developer interested in writing proprietary software (software whose source code is not available) therefore has plenty of other options. Hence, the interest of the public in open-access code outweighs the interest of such developers. 

This is also connected to the fact that we do not intend to register any patents related to our project, simply because the techniques used are not novel. Under the GNU General Public License, anyone who distributes software that incorporates our package must release that software under the GPL as well and make its source code available. This is the copyleft spirit of the license, and we want to contribute to the free software community by developing our project under a copyleft license. 

For these reasons we selected the GNU General Public License. A LICENSE file is included in the root to inform the user and a COPYING file contains the entire license. 
