# Introduction

[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) is a method for numerically finding the derivative of a function at a given point. It can be used to find derivatives of complex functions where computing the symbolic derivative can be impossible or computationally costly.

Automatic differentiation is more accurate than other numerical differentiation methods such as the [finite difference method](https://en.wikipedia.org/wiki/Finite_difference_method). The finite difference method attempts to find the derivative of a function at a given point by adding a small perturbation ($\epsilon$):

$$ \frac{\partial f}{\partial x} \approx \frac{f(x+\epsilon) - f(x)}{\epsilon} $$

When the perturbation is too large, the estimate for the derivative is not accurate. When the perturbation is too small, it starts to amplify floating point errors. 
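The trade-off can be seen with a small experiment. The following is a minimal sketch, not part of the package: the helper name `forward_difference` and the test function $\sin(x)$ at $x = 1$ are chosen here only for illustration.

```python
import math

def forward_difference(f, x, eps):
    """Forward finite-difference estimate of f'(x) with perturbation eps."""
    return (f(x + eps) - f(x)) / eps

x = 1.0
exact = math.cos(x)  # analytic derivative of sin at x

# Absolute error for a large, a moderate, and a very small perturbation.
errors = {
    eps: abs(forward_difference(math.sin, x, eps) - exact)
    for eps in (1e-1, 1e-8, 1e-15)
}

for eps, err in errors.items():
    print(f"eps = {eps:g}: absolute error = {err:.2e}")
```

The moderate perturbation should give the smallest error of the three; the large one suffers from truncation error and the tiny one from floating point round-off.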

Automatic differentiation achieves high accuracy while avoiding amplified floating point errors by (1) breaking down the function into a sequence of elementary functions (e.g., sin, cos, log, and exp), (2) calculating the exact derivatives of these elementary functions, and (3) combining them using the [chain rule](https://en.wikipedia.org/wiki/Chain_rule), the [product rule](https://en.wikipedia.org/wiki/Product_rule), and simple mathematical operations (such as addition and multiplication).

Given its speed and precision, automatic differentiation is popular in computational science, where it has numerous applications. This software package is an implementation of automatic differentiation in Python.

---

# Background

## Motivation

Automatic differentiation allows us to compute the true analytic derivative of a function to machine precision. At a high level, this is done by breaking down complex functions into their elementary components and propagating their derivatives via the chain rule. Other methods of finding derivatives include symbolic differentiation and finite differences. Symbolic differentiation is also accurate to machine precision but is more computationally costly, and its implementation is more complex. Finite differences are easier and less costly to implement but can quickly lead to floating point errors. As such, automatic differentiation is a more precise and lightweight methodology to use.

## Some Calculus

The [product rule](https://en.wikipedia.org/wiki/Product_rule) is used to find the derivative of the product of two or more functions. In its simplest form, if $f$ and $g$ are functions, the derivative of their product is given by the following equation: 

$$ [f(x)g(x)]' = f'(x)g(x)+f(x)g'(x) $$

The chain rule is used for computing the derivative of the composition of two or more functions. In its simplest form, if $f$ and $g$ are functions, the derivative of their composition is given by the following equation:

$$ [f(g(x))]' = f'(g(x)) \cdot g'(x) $$
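As a quick worked example that uses both rules (the function $h$ is chosen here purely for illustration), take $h(x) = x^2 \sin(3x)$. The product rule splits the derivative into two terms, and the chain rule handles the inner function $3x$:

$$ h'(x) = 2x \sin(3x) + x^2 \cdot \cos(3x) \cdot 3 = 2x\sin(3x) + 3x^2\cos(3x) $$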

## Modes of Automatic Differentiation

The automatic differentiation method can be implemented in two ways depending on how the chain rule is utilized. Consider the formulation:

$$ f(x) = g_3(g_2(g_1(x))) $$

Using the chain rule, this function's derivative at point $x = a$ is computed as:

$$ f'(a) = g_3'(g_2(g_1(a))) \cdot g_2'(g_1(a)) \cdot g_1'(a) $$

The forward mode of automatic differentiation accumulates this product from the right: it first calculates $g_1'(a)$, then $g_2'(g_1(a))$, then $g_3'(g_2(g_1(a)))$, evaluating each intermediate value alongside its derivative.

The reverse mode of automatic differentiation accumulates the product from the left: after a forward pass that evaluates and stores the intermediate values $g_1(a)$ and $g_2(g_1(a))$, it first calculates $g_3'(g_2(g_1(a)))$, then $g_2'(g_1(a))$, then $g_1'(a)$.

In our implementation, we will focus on the forward mode. 
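The difference between the two accumulation orders can be sketched in a few lines. This is an illustrative sketch only: the chain $f(x) = e^{\sin(x^2)}$, the `stages` list, and the function names `forward_mode` and `reverse_mode` are assumptions made for this example and are not part of the package.

```python
import math

# Illustrative chain f(x) = g3(g2(g1(x))) = exp(sin(x**2)).
# Each stage is a (function, derivative) pair; these names are hypothetical.
stages = [
    (lambda x: x ** 2, lambda x: 2.0 * x),  # g1
    (math.sin, math.cos),                   # g2
    (math.exp, math.exp),                   # g3
]

def forward_mode(stages, a):
    """Accumulate the chain-rule product from the right (g1' first)."""
    value, deriv = a, 1.0
    for g, dg in stages:
        deriv = dg(value) * deriv  # local derivative at the current input
        value = g(value)
    return value, deriv

def reverse_mode(stages, a):
    """Accumulate the chain-rule product from the left (g3' first)."""
    inputs, value = [], a
    for g, _ in stages:            # forward pass: store the input of every stage
        inputs.append(value)
        value = g(value)
    adjoint = 1.0
    for (_, dg), x in zip(reversed(stages), reversed(inputs)):
        adjoint = adjoint * dg(x)  # backward pass: outermost derivative first
    return value, adjoint

a = 1.5
print(forward_mode(stages, a))
print(reverse_mode(stages, a))
```

Both functions return the same value and (up to rounding) the same derivative; the reverse variant has to store the intermediate inputs before it can start multiplying.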

## Forward Mode

A useful tool associated with the forward mode is the computational trace. Using the computational trace, we can list the steps required to go from the input values (the point at which the derivative is evaluated) to the value of the function and its derivative.

Consider the following function:

$$ f(x) = \sin(e^{2x}) $$

Say we want to evaluate the derivative of this function at $x = 5$. The steps for calculating the derivative using forward mode are given in the following table:

| Trace | Elementary Function | Function Value | Derivative | Derivative Value |
| :------: | :----------: | :-------: | :---------: | :--------: |
| $x_{1}$ | $x$ | $5$ | $\dot{x}_{1}$ | $1$ |
| $x_{2}$ | $2x_{1}$ | $10$ | $2\dot{x}_{1}$ | $2$ |
| $x_{3}$ | $e^{x_{2}}$ | $e^{10}$ | $e^{x_{2}}\dot{x}_{2}$ | $2e^{10}$ |
| $x_{4}$ | $\sin(x_{3})$ | $\sin(e^{10})$ | $\cos(x_{3})\dot{x}_{3}$ | $2e^{10}\cos(e^{10})$ |

We get the required derivative value $2e^{10}\cos(e^{10})$.
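The trace can be reproduced step by step in plain Python. This is a minimal sketch that mirrors the rows of the table; the variable names `x1` to `x4` and `dx1` to `dx4` are chosen here to match the trace and are not taken from the package.

```python
import math

# Each line computes one row of the trace: (value, derivative).
x1, dx1 = 5.0, 1.0                                  # input and seed derivative
x2, dx2 = 2.0 * x1, 2.0 * dx1                       # 2 * x1
x3, dx3 = math.exp(x2), math.exp(x2) * dx2          # exp(x2)
x4, dx4 = math.sin(x3), math.cos(x3) * dx3          # sin(x3)

# Closed-form derivative from the last row of the table, for comparison.
closed_form = 2.0 * math.exp(10.0) * math.cos(math.exp(10.0))

print("f(5)  =", x4)
print("f'(5) =", dx4)
print("matches closed form:", math.isclose(dx4, closed_form))
```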

## Dual Numbers

We utilize [dual numbers](https://en.wikipedia.org/wiki/Dual_number) in our implementation of the forward mode of automatic differentiation. Dual numbers are an extension to real numbers (similar to complex numbers). Dual numbers introduce a new element (typically represented by $\epsilon$) with the useful property $\epsilon^2 = 0$.  Using dual numbers and [Taylor series](https://en.wikipedia.org/wiki/Taylor_series), we can find the derivative of a function quickly. 

Say we have a function $f$ and we want to find its derivative at point $x = a$. We will set $x = a + b\epsilon$ and find the Taylor expansion:

$$ f(a+b\epsilon) = \sum_{n=0}^{\infty} \frac{(b\epsilon)^nf^{(n)}(a)}{n!} = f(a) + b\epsilon f'(a) $$

All higher-order terms are equal to $0$ because of the dual number property $\epsilon^2 = 0$. The coefficient of $\epsilon$ is $b f'(a)$, so setting $b = 1$ gives the derivative of $f$ at $x = a$ directly from the second term in the Taylor expansion.
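To make the arithmetic concrete, here is a minimal sketch of a dual number type that supports only addition and multiplication. The class name `Dual`, its fields `real` and `dual`, and the helper `_lift` are hypothetical names for this illustration and are not the package's `AutoDiff` class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """Number of the form real + dual * eps, with eps**2 == 0."""
    real: float
    dual: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = _lift(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because eps**2 = 0
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__  # multiplication is commutative

def _lift(x):
    """Treat a plain scalar as a dual number with zero eps-part."""
    return x if isinstance(x, Dual) else Dual(float(x))

def f(x):
    return x * x * x + 2 * x  # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

# Evaluate at a = 3 with b = 1, so the eps-coefficient is f'(3).
result = f(Dual(3.0, 1.0))
print(result.real, result.dual)  # 33.0 29.0
```

The derivative falls out of ordinary arithmetic on the pair, with no perturbation size to tune.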

---

# How To Use

## Approach
In thinking through how to use our software package, we considered the following:

#### Who are our users?
We decided to target somewhat tech-savvy users who would be comfortable with a command line application as opposed to a more approachable and user-friendly GUI. Any single user should be able to install the package easily onto their machine without requiring broader deployment. 

#### Where should it run? 
We expect our package to be installed and run on any desktop devices through the command line. It should work regardless of the operating system as long as the user has Python on their machine (our assumption is that most users will have Python pre-installed on their Mac or Linux machines or will be able to easily download it otherwise).

## Packaging Choice
We plan to use pip as the package manager. We considered some other options such as conda install but ultimately chose pip because Python developers are already familiar with it and we did not want the installation process to be a barrier to using our program. We plan to register the name of our package on PyPI and upload source distributions via the setup.py file. To ensure that our package can be downloaded regardless of the user’s Python version, we plan to use pip instead of pip3. 

## Installation via GitHub

To install our package from GitHub, a user should enter the following in the command line:

1. Create a new virtual environment (using Conda):

`conda create --name test`

2. Activate virtual environment:

`conda activate test`

3. Clone our source code:

`git clone https://github.com/make-AD-ifference/cs207-FinalProject.git`

4. Change current directory to the cloned repo:

`cd cs207-FinalProject/`

5. Install dependencies:

`pip install -r requirements.txt`

6. Start up Python and import AutoDiff class:

`python`

`>>> from autodiff.autodiff import AutoDiff`

`>>> x = AutoDiff(2,3)`

`>>> f = x**2`

`>>> f`

`AutoDiff(4,12)`

**[See Implementation section below for further details on using AutoDiff]**

7. Finally, quit Python and deactivate environment:

`>>> exit()`

`conda deactivate`


## Installation via pip (to be implemented)

To install our package, a user should enter the following in the command line:

`pip install make_ad_ifference` 
(assuming "make_ad_ifference" is our root directory name)

Then, from within Python:

`import autodiff`
("autodiff" is a placeholder for the name of our main sub directory that will contain all our classes)

Users will then have access to all the functions that are part of the sub directory. 

**[We have not released our package on PyPI yet as per instructions for Milestone 2. We will update and finalize this section once the package is released on PyPI.]**


---


# Software Organization


## Directory Structure

The directory structure will look like:

`make-AD-ifference/`

>`setup.py`

>`.gitignore`

>`.travis.yml`

>`requirements.txt`

>`coverage.txt`

>`README.md`

>`LICENSE`

>`__init__.py`

> `demo.py`

>`autodiff/`

>>`autodiff.py`

>`docs/`

>>`milestone1.ipynb`

>>`milestone2.ipynb`

>`tests/`

>>`test_newton.py`

>>`test_autodiff.py`

>`scratch/`

>>`dual.py`

>>`test_dual.py`

## Modules
- `autodiff.py`: contains the autodifferentiation class `AutoDiff`, which implements Forward Mode AD. This module will serve as our custom library that allows users to evaluate functions and their derivatives for each input value. The class also includes custom methods that our program will support. Each method returns the value and derivative of the specified function. See the implementation section for more details. 

- `test_autodiff.py`: contains our test suite for the `autodiff.py` module. It currently includes tests for single input and scalar functions but will be expanded in the future when our program supports more complex functions such as multi-variable functions.

- `test_newton.py`: contains our script for testing the use of our `AutoDiff` class for finding the roots of a function using Newton's root-finding method.

- `demo.py`: contains a demonstration of Newton's root-finding algorithm using both dunder method operations and non-built-in operations. Examples used are $x^2$ and $\sin(x) + \cos(x)$.


## Testing

All testing will be done using TravisCI, and evaluation of the tests will be done using CodeCov. Our main test suite `test_autodiff.py` currently lives in the `tests/` folder and any additional testing suites will be added to the same folder. 

---


# Implementation


## Class: AutoDiff

We will use a simple one-class structure to implement forward mode autodifferentiation. The AutoDiff class will keep track of the trace value and the derivative of functions for each input variable. It will have two attributes, `val` and `der`, which represent the value and the derivative respectively. We will have a set of elementary functions that can easily be expanded. These elementary functions will serve as building blocks for users to assemble the function they wish to evaluate.

We’ll provide users with an understandable interface that conveys which elementary functions our software supports. If a user enters a function that is a combination of any of these, our program will be able to handle the input. Otherwise, we will return a descriptive error and allow users to enter a new function. We will expand this class in the future to include additional methods and support for more complex functions e.g. multi-variable functions.

Elementary functions defined in the AutoDiff class:
- Ln (natural log)
- Log 
- Exp
- Sin
- Cos
- Tan
- Sqrt

Elementary mathematical operations supported (note that the commutative properties of operations are preserved):
- Addition
- Subtraction
- Multiplication
- Division
- Power


## More Details about Attributes & Methods
We have overloaded the methods in class `AutoDiff` to give the user flexibility in how functions are entered. Overloaded functions will support elementary operations between two `AutoDiff` objects or 1 `AutoDiff` object and 1 scalar. This way, we also preserve the commutative nature of functions where needed. As an example, the user may input f = 2x + 3 or f = 3 + 2x. Regardless of the order, the program will be able to return the correct value and derivative. 
Each `AutoDiff` object will have as attributes the value and the derivative, calculated using trace variables and elementary operations, for a given input value. 

More on the custom operations and functions supported in class `AutoDiff`:
- `__init__`:  constructor of class `AutoDiff`. Initializes an `AutoDiff` object. Takes in an input value `val` at which to evaluate the function value and derivative and, as in `AutoDiff(2,3)`, an initial derivative; `self.der` is otherwise set initially to 1.
- `__add__`: overloaded addition function. Supports adding two `AutoDiff` objects or 1 `AutoDiff` object and 1 scalar.
- `__str__`: returns a string representation of the `AutoDiff` object.
- `__repr__`: returns a string representation of the `AutoDiff` object, e.g. `AutoDiff(4,12)`.
- `__radd__`: overloaded addition function. Supports addition of two `AutoDiff` objects or 1 `AutoDiff` object and 1 scalar regardless of input order. Ensures the commutative property of addition is preserved.
- `__sub__`: overloaded subtraction function. Supports subtraction between two `AutoDiff` objects or 1 `AutoDiff` object and 1 scalar.
- `__rsub__`: Supports subtraction of the form scalar - `AutoDiff` instead of `AutoDiff` - scalar.
- `__mul__`: overloaded multiplication function. Supports multiplying two `AutoDiff` objects or 1 `AutoDiff` object and 1 scalar.
- `__rmul__`: overloaded multiplication function. Supports multiplying two `AutoDiff` objects or 1 `AutoDiff` object and 1 scalar regardless of input order. Ensures the commutative property of multiplication is preserved.
- `__truediv__`: overloaded division function. Supports dividing an `AutoDiff` object by another `AutoDiff` object or an `AutoDiff` object by a scalar.
- `__rtruediv__`: Supports division of the form scalar / `AutoDiff` instead of `AutoDiff` / scalar.
- `__pow__`: overloaded power function. Supports an `AutoDiff` object to the power of another `AutoDiff` object or an `AutoDiff` object to the power of a scalar.
- `__neg__`: returns negated `AutoDiff` object.
- `sin`: returns sine of `AutoDiff` object.
- `cos`: returns cosine of `AutoDiff` object.
- `tan`: returns tangent of `AutoDiff` object.
- `ln`: returns natural log of `AutoDiff` object.
- `log`: returns log of `AutoDiff` object with base `base` as second input.
- `exp`: returns exponential of `AutoDiff` object.
- `sqrt`: returns square root of `AutoDiff` object.
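The overloading pattern described in this list can be sketched as follows. This is a simplified stand-in written for illustration, not the package's source: it covers only a few of the listed methods, assumes a default derivative of 1, restricts `__pow__` to scalar exponents, and uses the standard `math` module instead of NumPy to stay dependency-free.

```python
import math

class AutoDiff:
    """Illustrative stand-in: a (value, derivative) pair for forward mode."""

    def __init__(self, val, der=1):
        self.val = val
        self.der = der

    def __repr__(self):
        return f"AutoDiff({self.val},{self.der})"

    def __add__(self, other):
        try:
            return AutoDiff(self.val + other.val, self.der + other.der)
        except AttributeError:  # other is a scalar constant
            return AutoDiff(self.val + other, self.der)

    __radd__ = __add__  # scalar + AutoDiff gives the same result

    def __mul__(self, other):
        try:  # product rule
            return AutoDiff(self.val * other.val,
                            self.der * other.val + self.val * other.der)
        except AttributeError:  # scaling by a constant
            return AutoDiff(self.val * other, self.der * other)

    __rmul__ = __mul__  # scalar * AutoDiff gives the same result

    def __rsub__(self, other):
        # scalar - AutoDiff: subtraction is not commutative, so the sign flips
        return AutoDiff(other - self.val, -self.der)

    def __pow__(self, n):
        # power rule, scalar exponent only in this sketch
        return AutoDiff(self.val ** n, n * self.val ** (n - 1) * self.der)

    def sin(self):
        return AutoDiff(math.sin(self.val), math.cos(self.val) * self.der)

x = AutoDiff(2, 3)
print(x ** 2)        # AutoDiff(4,12)
print(3 + 2 * x)     # AutoDiff(7,6)
```

Aliasing the reflected methods to the forward ones works only for commutative operations; `__rsub__` (and, by the same reasoning, `__rtruediv__`) needs its own formula.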

## Using AutoDiff

- Begin by initializing an AutoDiff object with a given value and derivative:

`x = AutoDiff(2,3)`

- Then define the function in the following manner:

`f = x**2`

`f = AutoDiff.sin(x)`

`f = AutoDiff.log(x,2)`

- The function's value can then be accessed as `f.val`

- The function's derivative can then be accessed as `f.der`

**[We will further develop this section after we implement handling of vectors.]**

## External Dependencies
- `NumPy`: In order to run our program, users will have to import the NumPy library as follows:
`import numpy as np`. The NumPy library provides support for the elementary mathematical functions and operations handled by our program. As such, we have included `numpy` in the `requirements.txt` file.

## Efficiency

We will have to consider things such as memory accesses, which can greatly speed up or slow down the code. We will also have to consider numerical precision, although this shouldn’t be an issue with our hard-coding of functions and derivatives. Another possible consideration is memory overhead. Efficient storage of the functions and evaluations is crucial to making the code usable.

---


# Future Work

## Future Implementations

In the future our package will be extended to handle vector valued functions.
Generally, we will be able to handle functions of the form

$$ f: \mathbb{R}^m \to \mathbb{R}^n $$

for arbitrary $m$ and $n$.

In order to accomplish this we will need the Jacobian matrix

$$\mathbf{J} = \left[\begin{matrix} \frac{\partial f_1}{\partial x_1} & \cdots  & \frac{\partial f_1}{\partial x_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m}  \end{matrix} \right]$$

which has one row per output $f_1, \dots, f_n$ and one column per input $x_1, \dots, x_m$.

Our current implementation handles functions of the form 

$$ f: \mathbb{R} \to \mathbb{R}^n .$$
This needs to be expanded to multiple inputs, which can most likely be implemented by modifying each function in the `AutoDiff` class to return partial derivatives with respect to the inputs as vectors, rather than just scalars.

For example the adding function would be modified from this:


`def __add__(self, other):`

>`try:`

>>`new_val = self.val + other.val`

>>`new_der = self.der + other.der`

>`except AttributeError:`

>>`new_val = self.val + other`

>>`new_der = self.der`

> `return AutoDiff(new_val, new_der)`

to be something along the lines of:

`def __add__(self, other):`

>`try:`

>>`new_val = self.val + other.val`

>>`if isinstance(other, AutoDiff):`

>>>`new_der = [self.der, other.der]`

>>`else:`

>>>`new_der = self.der + other.der`

>`except AttributeError:`

>>`new_val = self.val + other`

>>`new_der = self.der`

> `return AutoDiff(new_val, new_der)`

Note: this specific implementation has not been tested yet, but this is the idea our first try will be based on.
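One alternative way to carry partial derivatives is to seed every input variable with a one-hot gradient and combine gradients element-wise. The sketch below is only an assumption about how this could look: the class `AutoDiffVec` and the helper `make_variables` are hypothetical names, and plain Python lists are used in place of NumPy arrays to keep the example self-contained.

```python
class AutoDiffVec:
    """Hypothetical multi-input variant: `der` holds one partial per input."""

    def __init__(self, val, der):
        self.val = val
        self.der = list(der)

    def __add__(self, other):
        if isinstance(other, AutoDiffVec):
            # sum rule applied to each partial derivative
            return AutoDiffVec(self.val + other.val,
                               [a + b for a, b in zip(self.der, other.der)])
        return AutoDiffVec(self.val + other, self.der)  # scalar constant

    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, AutoDiffVec):
            # product rule applied to each partial derivative
            return AutoDiffVec(self.val * other.val,
                               [a * other.val + self.val * b
                                for a, b in zip(self.der, other.der)])
        return AutoDiffVec(self.val * other, [a * other for a in self.der])

    __rmul__ = __mul__

def make_variables(values):
    """Create one variable per input, each seeded with a one-hot gradient."""
    n = len(values)
    return [AutoDiffVec(v, [1.0 if i == j else 0.0 for j in range(n)])
            for i, v in enumerate(values)]

x, y = make_variables([2.0, 3.0])
f = x * y + 3 * x + y   # df/dx = y + 3, df/dy = x + 1
print(f.val, f.der)     # 15.0 [6.0, 3.0]
```

With this layout, each output of a vector-valued function contributes one row of the Jacobian.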

## Extensions

Possible future extensions are to implement the reverse mode, or a root finding and optimization suite.
The root finding suite would employ a variety of derivative based root-finding methods such as Newton's method, fixed-point iteration, Laguerre's method, etc.
Adding a root finding suite is relatively straightforward.
We would add an additional directory within `autodiff/` called `root_finding/` that will contain scripts for running each method.
The optimization suite will be inside the `autodiff/optimization/` directory.
Possible algorithms to be implemented are gradient descent, stochastic gradient descent, constrained optimization, Newton's method, etc.
Each method will get its own script. 
For example, Newton's method would live in `make-AD-ifference/autodiff/root_finding/newton.py`.
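
As a rough idea of what such a script could contain, here is a minimal sketch of Newton's method driven by value and derivative pairs. The function name `newton`, its parameters, and the `ValDer` stand-in are assumptions for this example; in the package, the pair would presumably come from an `AutoDiff` object exposing `val` and `der`.

```python
import math
from collections import namedtuple

# Stand-in for an AutoDiff result: any object exposing `val` and `der` would do.
ValDer = namedtuple("ValDer", ["val", "der"])

def newton(f, x0, tol=1e-10, max_iter=50):
    """Find a root of f starting from x0; f(x) must return a (val, der) pair."""
    x = x0
    for _ in range(max_iter):
        out = f(x)
        if abs(out.val) < tol:
            return x
        if out.der == 0:
            raise ZeroDivisionError("zero derivative; Newton step undefined")
        x = x - out.val / out.der  # Newton update
    raise RuntimeError("Newton's method did not converge")

def f(x):
    # sin(x) + cos(x); value and derivative are written by hand for this sketch
    return ValDer(math.sin(x) + math.cos(x), math.cos(x) - math.sin(x))

root = newton(f, x0=2.0)
print(root)  # close to 3*pi/4
```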

The reverse mode implementation is trickier and not finalized at this point.
The plan would be to specify in the `AutoDiff` object which mode, forward or reverse, we want to compute, potentially defaulting to forward mode with the option to specify reverse mode.
This is because the different modes are utilized for different situations, and automatically computing one defeats the purpose of having the other.
Automatic calling of one mode or the other can also lead to inefficiency.
In order to actually compute the reverse mode, we will need to hold on to the evaluation trace and derivatives, which can potentially be accomplished by using a flag in each function to return the previous derivatives and values as well as current derivatives and values.
This would look something like

`def exp(self, reverse=False):`

> `new_val = np.exp(self.val)`

> `new_der = np.exp(self.val) * self.der`

>`if reverse:`

>> `self.previous.append((new_val, new_der))`

> `return AutoDiff(new_val, new_der)`

where `self.previous` is a list containing the derivatives and values of the evaluation trace.
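
For comparison, one common way to organize reverse mode is to store, for every intermediate result, its parents together with the local derivatives, and then sweep backwards from the output. The sketch below illustrates that general idea only; the `Node` class, the module-level `exp`, and `backward` are hypothetical names and do not reflect a finalized design for this package.

```python
import math

class Node:
    """Hypothetical reverse-mode node: a value plus (parent, local derivative) pairs."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents
        self.grad = 0.0  # adjoint, filled in by backward()

    def __add__(self, other):
        return Node(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.val * other.val, ((self, other.val), (other, self.val)))

def exp(x):
    v = math.exp(x.val)
    return Node(v, ((x, v),))  # d/dx exp(x) = exp(x)

def backward(output):
    """Propagate adjoints from the output back to every input."""
    order, seen = [], set()

    def visit(node):  # depth-first post-order gives a topological ordering
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # output first, inputs last
        for parent, local in node.parents:
            parent.grad += local * node.grad

x, y = Node(1.0), Node(2.0)
f = exp(x * y) + x   # df/dx = y*exp(xy) + 1, df/dy = x*exp(xy)
backward(f)
print(x.grad, y.grad)
```

A single backward sweep yields the partial derivatives with respect to all inputs, which is the situation where reverse mode tends to pay off.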

---

# Resources

- David Sondak, *CS 207 Lectures* (https://harvard-iacs.github.io/2019-CS207/lectures/)
- Philipp Hoffmann, *A Hitchhiker’s Guide to Automatic Differentiation* (https://doi.org/10.1007/s11075-015-0067-6)
- Richard D. Neidinger, *Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming* (https://www.neidinger.net/SIAMRev74362.pdf)
- Baydin et al., *Automatic Differentiation in Machine Learning: a Survey* (https://arxiv.org/pdf/1502.05767.pdf)