# Introduction

[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) is a method for numerically finding the derivative of a function at a given point. It can be used to find derivatives of complicated functions for which computing the symbolic derivative can be impossible or computationally costly.

Automatic differentiation is more accurate than other numerical differentiation methods such as the [finite difference method](https://en.wikipedia.org/wiki/Finite_difference_method). The finite difference method attempts to find the derivative of a function at a given point by adding a small perturbation ($\epsilon$):


$$ \frac{\partial f}{\partial x} \approx \frac{f(x+\epsilon) - f(x)}{\epsilon} $$

When the perturbation is too large, the estimate for the derivative is not accurate. When the perturbation is too small, it starts to amplify floating point errors. 
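
The trade-off can be seen in a small experiment. The sketch below is only an illustration: the test function $\sin(x)$, the point $x = 1$, and the three step sizes are arbitrary choices and are not part of our package.

```python
import math

def forward_difference(f, x, eps):
    """Approximate f'(x) with the forward finite difference."""
    return (f(x + eps) - f(x)) / eps

# Illustrative test function: f(x) = sin(x), whose exact derivative is cos(x).
x0 = 1.0
exact = math.cos(x0)

errors = {}
for eps in (1e-1, 1e-8, 1e-15):
    approx = forward_difference(math.sin, x0, eps)
    errors[eps] = abs(approx - exact)
    print(f"eps = {eps:.0e}  absolute error = {errors[eps]:.2e}")
```

The largest step suffers from truncation error, the smallest step from floating point round-off, and the intermediate step gives the smallest error of the three.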

Automatic differentiation achieves high accuracy while avoiding amplified floating point errors by (1) breaking down the function into a sequence of elementary functions (e.g., sin, cos, log, and exp), (2) calculating the exact derivatives of these elementary functions, and (3) combining them using the [chain rule](https://en.wikipedia.org/wiki/Chain_rule), the [product rule](https://en.wikipedia.org/wiki/Product_rule), and simple mathematical operations (such as addition and multiplication).

Given its speed and precision, automatic differentiation is popular within the field of computational science, where it has numerous applications. This software package is an implementation of automatic differentiation in Python.

\

### References
- David Sondak, *CS 207 Lectures* (https://harvard-iacs.github.io/2019-CS207/lectures/)
- Philipp Hoffmann, *A Hitchhiker’s Guide to Automatic Differentiation* (https://doi.org/10.1007/s11075-015-0067-6)
- Richard D. Neidinger, *Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming* (https://www.neidinger.net/SIAMRev74362.pdf)
- Baydin, et al., *Automatic Differentiation in Machine Learning: a Survey* (https://arxiv.org/pdf/1502.05767.pdf)

\

---

# Background

## Some Calculus

The [product rule](https://en.wikipedia.org/wiki/Product_rule) is used to find the derivative of the product of two or more functions. In its simplest form, if $f$ and $g$ are functions, the derivative of their product is given by the following equation: 

$$ [f(x)g(x)]' = f'(x)g(x)+f(x)g'(x) $$

The chain rule is used for computing the derivative of the composition of two or more functions. In its simplest form, if $f$ and $g$ are functions, the derivative of their composition is given by the following equation:

$$ [f(g(x))]' = f'(g(x)) \cdot g'(x) $$

\

## Modes of Automatic Differentiation

The automatic differentiation method can be implemented in two ways depending on how the chain rule is utilized. Consider the formulation:

$$ f(x) = g_3(g_2(g_1(x))) $$

Using the chain rule, this function's derivative at point $x = a$ is computed as:

$$ f'(a) = g_3'(g_2(g_1(a))) \cdot g_2'(g_1(a)) \cdot g_1'(a) $$

The forward mode of automatic differentiation propagates the derivative from the right, starting at the innermost function: it first calculates $g_1'(a)$, then $g_2'(g_1(a))$, then $g_3'(g_2(g_1(a)))$.

The reverse mode of automatic differentiation propagates the derivative from the left, starting at the outermost function: it first calculates $g_3'(g_2(g_1(a)))$, then $g_2'(g_1(a))$, then $g_1'(a)$. This requires the intermediate values $g_1(a)$ and $g_2(g_1(a))$ to be computed and stored beforehand.
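
The two accumulation orders can be sketched as follows. This is a minimal illustration, not our implementation: the concrete choices $g_1(x) = x^2$, $g_2 = \sin$, $g_3 = \exp$ and the names `forward_mode` and `reverse_mode` are assumptions made only for this example.

```python
import math

# Hypothetical elementary functions g1, g2, g3 and their derivatives.
g = [lambda x: x ** 2, math.sin, math.exp]
dg = [lambda x: 2 * x, math.cos, math.exp]

def forward_mode(a):
    """Accumulate the derivative from the innermost function outward."""
    value, deriv = a, 1.0
    for g_i, dg_i in zip(g, dg):
        deriv = dg_i(value) * deriv  # chain rule at the current intermediate value
        value = g_i(value)
    return value, deriv

def reverse_mode(a):
    """Store all intermediate values, then accumulate from the outermost function inward."""
    values = [a]
    for g_i in g:
        values.append(g_i(values[-1]))
    deriv = 1.0
    # Pair each derivative with the input its function received, outermost first.
    for dg_i, v in zip(reversed(dg), reversed(values[:-1])):
        deriv = deriv * dg_i(v)
    return values[-1], deriv

print(forward_mode(1.0))
print(reverse_mode(1.0))
```

Both orders multiply the same three factors, so for a scalar function they agree up to floating point rounding. The difference is that forward mode carries the derivative along with the value in a single pass, while reverse mode needs the stored intermediate values.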

In our implementation, we will focus on the forward mode. 

\

## Forward Mode

A useful tool associated with the forward mode is the computational trace. Using the computational trace, we can list the steps required to go from the input value (the point at which the derivative is evaluated) to the output value of the function.

Consider the following function:

$$ f(x) = \sin(e^{2x}) $$

Say we want to evaluate the derivative of this function at $x = 5$. The steps for calculating the derivative using forward mode are given in the following table:

| Trace | Elementary Function | Function Value | Derivative | Derivative Value |
| :------: | :----------: | :-------: | :---------: | :--------: |
| $x_{1}$ | $x$ | $5$ | $\dot{x}_{1}$ | $1$ |
| $x_{2}$ | $2x_{1}$ | $10$ | $2\dot{x}_{1}$ | $2$ |
| $x_{3}$ | $e^{x_{2}}$ | $e^{10}$ | $e^{x_{2}}\dot{x}_{2}$ | $2e^{10}$ |
| $x_{4}$ | $\sin(x_{3})$ | $\sin(e^{10})$ | $\cos(x_{3})\dot{x}_{3}$ | $2e^{10}\cos(e^{10})$ |

We get the required derivative value $2e^{10}\cos(e^{10})$.
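
The same trace can be written out by hand in a few lines of Python. This is only a sketch that mirrors the rows of the table; the variable names `x1` to `x4` and `dx1` to `dx4` are chosen here for illustration.

```python
import math

# Each line carries a (value, derivative) pair, one per row of the trace.
x1, dx1 = 5.0, 1.0                          # x
x2, dx2 = 2 * x1, 2 * dx1                   # 2 * x1
x3, dx3 = math.exp(x2), math.exp(x2) * dx2  # e^(x2)
x4, dx4 = math.sin(x3), math.cos(x3) * dx3  # sin(x3)

print("f(5)  =", x4)
print("f'(5) =", dx4)
```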

\

## Dual Numbers

We utilize [dual numbers](https://en.wikipedia.org/wiki/Dual_number) in our implementation of the forward mode of automatic differentiation. Dual numbers are an extension of the real numbers (similar to complex numbers). Dual numbers introduce a new nonzero element (typically represented by $\epsilon$) with the useful property $\epsilon^2 = 0$. Using dual numbers and [Taylor series](https://en.wikipedia.org/wiki/Taylor_series), we can find the derivative of a function quickly.

Say we have a function $f$ and we want to find its derivative at point $x = a$. We will set $x = a + b\epsilon$ and find the Taylor expansion:

$$ f(a+b\epsilon) = \sum_{n=0}^{\infty} \frac{(b\epsilon)^nf^{(n)}(a)}{n!} = f(a) + b\epsilon f'(a) $$

All higher-order terms are equal to $0$ because of the dual number property $\epsilon^2 = 0$. For function $f$, we get its derivative at point $x = a$ directly from the second term in the Taylor expansion: with $b = 1$, the coefficient of $\epsilon$ is exactly $f'(a)$.
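
As a short worked example (the function $f(x) = x^2$ is chosen here purely for illustration), expanding the square directly gives the same result as the Taylor expansion:

$$ f(a+b\epsilon) = (a+b\epsilon)^2 = a^2 + 2ab\epsilon + b^2\epsilon^2 = a^2 + 2ab\epsilon $$

With $b = 1$, the real part is $f(a) = a^2$ and the coefficient of $\epsilon$ is $f'(a) = 2a$.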

\

### References
- David Sondak, *CS 207 Lectures* (https://harvard-iacs.github.io/2019-CS207/lectures/)
- Philipp Hoffmann, *A Hitchhiker’s Guide to Automatic Differentiation* (https://doi.org/10.1007/s11075-015-0067-6)

\

---

# How To Use

## Approach
In thinking through how to use our software package, we considered the following:

#### Who are our users?
We decided to target somewhat tech-savvy users who would be comfortable with a command line application as opposed to a more approachable and user-friendly GUI. Any single user should be able to install the package easily onto their machine without requiring broader deployment. 

#### Where should it run?
We expect our package to be installed and run on any desktop device through the command line. It should work regardless of the operating system as long as the user has Python on their machine (our assumption is that most users will have Python pre-installed on their Mac or Linux machines or will be able to easily download it otherwise).

\

## Packaging choice
We plan to use pip as the package manager. We considered other options such as conda but ultimately chose pip because Python developers are already familiar with it and we did not want the installation process to be a barrier to using our program. We plan to register the name of our package on PyPI and upload source distributions built with the setup.py file. To ensure that our package can be downloaded regardless of the user’s Python version, we plan to use pip instead of pip3.
 
\

## User How-To
To install our package, a user should enter the following in the command line:

`pip install make_ad_ifference` 
(assuming "make_ad_ifference" is our root directory name and the name under which the package is registered)

Then, in a Python session or script, the user should import the package:

`import home_directory` 
("home_directory" is a placeholder for the name of our main subdirectory that will contain all our classes)

Users will then have access to all the functions that are part of the subdirectory.

\

---


# Software Organization

We will use a three-class structure, with separate classes for (1) dual numbers, (2) functions, and (3) auto-differentiation. We have decided to use separate classes because we feel each is sufficiently different from the others to warrant its own class.

\

## Class: Dual Numbers

Our dual numbers class is fairly straightforward. It will perform standard operations on dual numbers such as addition and multiplication. The necessary operator methods, such as `__add__`, will be overloaded. An instance will be created from a real part and a dual part.
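
A minimal sketch of what such a class could look like is given below. It is an assumption-laden illustration rather than our final design: the class name `Dual`, the attribute names `real` and `dual`, and the helper `_lift` are hypothetical, and only addition and multiplication are shown.

```python
class Dual:
    """Dual number real + dual * eps, with eps**2 == 0 (illustrative sketch)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    @staticmethod
    def _lift(other):
        # Treat plain numbers as dual numbers with a zero dual part.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 == 0
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__

    def __repr__(self):
        return f"Dual({self.real}, {self.dual})"

# Evaluate f(x) = x^2 + x at x = 3; a dual part of 1 seeds the derivative.
x = Dual(3.0, 1.0)
y = x * x + x
print(y)  # Dual(12.0, 7.0): f(3) = 12 and f'(3) = 7
```

Because the product rule is built into `__mul__`, the derivative is carried along with the value without any symbolic manipulation.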

\

## Class: Functions

The function class will be built on modules in order to make function evaluation easier. We will have a set of elementary functions we are comfortable working with that can easily be expanded. The user will be able to create a function from these predefined functions. For elementary functions such as sin and exp, we will store their derivatives for accurate evaluation. This class is helpful because it will allow us to essentially black-box the function and derivative evaluation for auto-differentiation. Here the functions will be stored as a NumPy array for fast access and low memory overhead.

We’ll provide users with an understandable interface that conveys which elementary functions our software supports. If a user enters a function that is a combination of any of these, our program will be able to handle the input. Otherwise, we will return a descriptive error to the user and allow them to enter a new function.

Elementary functions defined in the functions class:
- Log (natural log)
- Log2 (log with base 2)
- Log10 (log with base 10)
- Exp
- Sin
- Cos
- Tan

Elementary mathematical operations supported:
- Addition
- Subtraction
- Multiplication
- Division
- Power
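
One way to store each elementary function together with its derivative is sketched below. This is only an illustration under stated assumptions: the registry name `ELEMENTARY`, the helper `apply_elementary`, and the use of a plain dictionary (rather than the NumPy array mentioned above) are hypothetical choices made to keep the example short.

```python
import math

# Hypothetical registry: name -> (function, derivative of the function).
ELEMENTARY = {
    "log":   (math.log,   lambda x: 1.0 / x),
    "log2":  (math.log2,  lambda x: 1.0 / (x * math.log(2))),
    "log10": (math.log10, lambda x: 1.0 / (x * math.log(10))),
    "exp":   (math.exp,   math.exp),
    "sin":   (math.sin,   math.cos),
    "cos":   (math.cos,   lambda x: -math.sin(x)),
    "tan":   (math.tan,   lambda x: 1.0 / math.cos(x) ** 2),
}

def apply_elementary(name, value, deriv):
    """Apply a named elementary function to a (value, derivative) pair."""
    if name not in ELEMENTARY:
        raise ValueError(
            f"unsupported function {name!r}; supported: {sorted(ELEMENTARY)}"
        )
    func, dfunc = ELEMENTARY[name]
    # Chain rule: d/dx func(u) = func'(u) * u'
    return func(value), dfunc(value) * deriv

# sin(exp(x)) at x = 0, seeded with derivative 1.
value, deriv = apply_elementary("sin", *apply_elementary("exp", 0.0, 1.0))
print(value, deriv)
```

Requesting a function outside the registry raises a `ValueError` that lists the supported names, which is one possible form of the descriptive error mentioned above.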

\

## Class: Auto-differentiation

The last class is the auto-differentiation ("AD") class itself. It will be instantiated with a function object. This class will implement the forward mode, the computational graph, and the evaluation table. These three items will be computed automatically upon instantiation. The visualization functionality is used primarily for debugging and is not intended to be in-depth, interactive, or necessarily very pretty.

\

## Directory Structure

The directory structure will look like:

- `home_directory/`
    - `setup.py`
    - `README.md`
    - `LICENSE`
    - `__init__.py`
    - `functions/`
        - `functions.py`
        - `functions_and_derivatives.py`
    - `autodiff/`
        - `autodiff.py`
    - `dual/`
        - `dual.py`
    - `test/`
        - `test.py`

\

## Testing

All tests will be run automatically using Travis CI, and test coverage will be reported using Codecov. We will also not be using a framework because we would like to gain a full understanding of the intricacies of package creation.

All tests we formulate will be stored in the test.py file.

\

## Efficiency

We will have to consider memory access patterns, which can greatly speed up or slow down the code. We will also have to consider numerical precision, although this should not be an issue because the elementary functions and their derivatives are hard-coded. Another possible consideration is memory overhead. Efficient storage of the functions and evaluations is crucial to making the code usable.

\

---


# Implementation

Our core data structures will be NumPy arrays for function evaluation, dictionaries for computational graph storage, and a pandas DataFrame for evaluation table storage. Each of these is chosen for its efficient lookup and memory management.

An example is seen below:

$f(x) = x^2 + x$

This function can be broken down into the elementary operations
`[(lambda x: x, None), (lambda x: x**2, (0,)), (lambda x, y: x + y, (0, 1))]`, where the value at each index is `(function, input nodes)` and the input nodes are given as indices into the list.
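
A list of this kind can be evaluated node by node. The sketch below assumes that each entry is a pair of a function and a tuple of input node indices, with `None` marking the input node; the name `evaluate` is hypothetical.

```python
# Each entry is (function, indices of input nodes); the input node has no inputs.
nodes = [
    (lambda x: x, None),            # node 0: the input x
    (lambda x: x ** 2, (0,)),       # node 1: x^2, computed from node 0
    (lambda u, v: u + v, (0, 1)),   # node 2: x + x^2, computed from nodes 0 and 1
]

def evaluate(nodes, x):
    """Evaluate every node in order and return the list of node values."""
    values = []
    for func, inputs in nodes:
        if inputs is None:
            values.append(func(x))
        else:
            values.append(func(*(values[i] for i in inputs)))
    return values

print(evaluate(nodes, 3.0))  # [3.0, 9.0, 12.0]
```

Since every node only refers to nodes that appear earlier in the list, a single left-to-right pass is enough.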

The computational graph can be seen below:
![graph](https://drive.google.com/uc?id=1ViYJw_DBL8DpDTr0kz3oatsJkarHaxWR)

This graph will be stored in a dictionary as:

`graph = {0: [(None, None)], 1: [(0, None)], 2: [(1, None), (3, POW)], 4: [(1, None), (3, +)]}`

In general, the structure will be:

`{key: [(input node 1, operation), (input node 2, operation)]}`

Lastly, we will have an evaluation table stored as a pandas DataFrame.
For the above example, the table would be:
![evaluation_trace](https://drive.google.com/uc?id=1oXAXaZohHruYydPYGDRCssdLhETPj0cR)

Each column of the table shown here is a column in the dataframe, and each node number is the index of the row.
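
The rows of such a table could be produced as in the sketch below. The column names (`function`, `value`, `derivative`) and the helper `evaluation_rows` are assumptions made for illustration and may not match the columns in the figure; the example uses plain dictionaries so that it runs without any third-party dependency.

```python
def evaluation_rows(x):
    """Forward-mode trace of f(x) = x^2 + x as a list of row dictionaries."""
    v0, d0 = x, 1.0                 # node 0: x
    v1, d1 = v0 ** 2, 2 * v0 * d0   # node 1: x^2
    v2, d2 = v0 + v1, d0 + d1       # node 2: x + x^2
    return [
        {"node": 0, "function": "x", "value": v0, "derivative": d0},
        {"node": 1, "function": "x0 ** 2", "value": v1, "derivative": d1},
        {"node": 2, "function": "x0 + x1", "value": v2, "derivative": d2},
    ]

rows = evaluation_rows(3.0)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows).set_index("node")` to obtain a DataFrame indexed by node number.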

\

---