# Introduction

*text here*

# Background

#### Part 1: Chain Rule 

At the heart of automatic differentiation is the chain rule, which lets us decompose the derivative of a complicated function into derivatives of elementary functions whose explicit forms we know.

We first introduce the case of a 1-D input and then generalize it to multidimensional inputs.

1-D input: Suppose we have a function $ h(u(t)) $ and we want to compute the derivative of $ h $ with respect to $ t $. This derivative is given by

$$
\begin{align}
\frac{dh}{dt} = \frac{\partial h}{\partial u} \frac{du}{dt}\\
\end{align}
$$
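As a quick sanity check of the 1-D chain rule, the sketch below picks an illustrative pair of functions, $ u(t) = t^2 $ and $ h(u) = \sin(u) $ (both chosen here for the example, not taken from the package), and compares the chain-rule derivative against a central finite difference.

```python
import math


def u(t):
    # Inner function u(t) = t^2 (an assumption made for this example)
    return t ** 2


def h(u_val):
    # Outer function h(u) = sin(u) (an assumption made for this example)
    return math.sin(u_val)


def dh_dt(t):
    # Chain rule: dh/dt = (dh/du) * (du/dt)
    dh_du = math.cos(u(t))
    du_dt = 2 * t
    return dh_du * du_dt


def central_difference(f, t, eps=1e-6):
    # Numerical approximation of df/dt, used only as a cross-check
    return (f(t + eps) - f(t - eps)) / (2 * eps)


t0 = 1.5
analytic = dh_dt(t0)
numeric = central_difference(lambda t: h(u(t)), t0)
print(analytic, numeric)
```

The two printed values should agree to several decimal places; the finite difference is only an approximation, whereas the chain-rule result is exact up to floating-point rounding.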

Before introducing vector inputs, let's first take a look at the gradient operator $ \nabla $.

For $ y\colon \mathbb {R} ^{n} \to \mathbb {R} $, the gradient $ \nabla y \colon \mathbb {R} ^{n} \to \mathbb {R} ^{n}$ is defined at the point $ x = (x_1, ..., x_n) $ in n-dimensional space as the vector of partial derivatives

$$
\begin{align}
\nabla y(x) =
\begin{bmatrix}
{\frac {\partial y}{\partial x_{1}}}(x)
\\
\vdots 
\\
{\frac {\partial y}{\partial x_{n}}}(x)
\end{bmatrix}
\end{align}
$$

Multidimensional (or vector) inputs: Suppose we have a function $ h(y_1(x), ..., y_n(x)) $, where $ x \in \mathbb{R}^m $ is a vector, and we want to compute the gradient of $ h $ with respect to $ x $. This gradient is given by:

$$
\begin{align}
\nabla_x h = \sum_{i=1}^n \frac{\partial h}{\partial y_i} \nabla y_i(x)\\
\end{align}
$$
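The multidimensional chain rule can be checked on a small example. The sketch below assumes $ h(y_1, y_2) = y_1 y_2 $ with $ y_1(x) = x_1 + x_2 $ and $ y_2(x) = x_1 x_2 $; these functions and the name `grad_h` are illustrative choices, not part of the package.

```python
def grad_h(x1, x2):
    # Intermediate functions y_1(x) and y_2(x)
    y1, y2 = x1 + x2, x1 * x2
    # Partial derivatives of h(y1, y2) = y1 * y2 with respect to y1 and y2
    dh_dy = [y2, y1]
    # Gradients of y_1 and y_2 with respect to x = (x1, x2)
    grad_y = [[1.0, 1.0], [x2, x1]]
    # Chain rule: sum over i of (dh/dy_i) * grad y_i(x), component by component
    return [sum(dh_dy[i] * grad_y[i][k] for i in range(2)) for k in range(2)]


print(grad_h(2.0, 3.0))  # [21.0, 16.0]
```

Expanding $ h = x_1^2 x_2 + x_1 x_2^2 $ by hand gives $ \partial h / \partial x_1 = 2 x_1 x_2 + x_2^2 $ and $ \partial h / \partial x_2 = x_1^2 + 2 x_1 x_2 $, which match the printed values at $ x = (2, 3) $.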

#### Part 2: Evaluation (Forward) Trace
Definition: Suppose $ x = \begin{bmatrix} {x_1} \\ \vdots \\ {x_m} \end{bmatrix} $. We define $ v_{k - m} = x_k $ for $ k = 1, 2, ..., m $ in the evaluation trace.

Motivation: The evaluation trace breaks a function into a sequence of elementary unary or binary operations and records each intermediate result as $ v_{k-m} $ for $ k > m $.
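To make the indexing concrete, here is a minimal sketch of an evaluation trace for the illustrative function $ f(x_1, x_2) = x_1 x_2 + \sin(x_1) $, so $ m = 2 $. The function and the name `evaluation_trace` are assumptions made for this example.

```python
import math


def evaluation_trace(x1, x2):
    # Keys are the trace indices j in v_j; with m = 2 the inputs sit at -1 and 0
    v = {}
    v[-1] = x1               # v_{1-m} = x_1
    v[0] = x2                # v_{2-m} = x_2
    v[1] = v[-1] * v[0]      # binary operation: multiplication
    v[2] = math.sin(v[-1])   # unary operation: sine
    v[3] = v[1] + v[2]       # binary operation: addition, equals f(x1, x2)
    return v


trace = evaluation_trace(2.0, 3.0)
for j in sorted(trace):
    print(f"v_{j} = {trace[j]}")
```

Each line of the trace applies exactly one elementary operation to values computed earlier, which is what makes the derivative of each step easy to write down.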

#### Part 3: Computation (Forward) Graph

We associate each $ v_{k-m} $ with a node in a graph, which visualizes the partial ordering of the operations.
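One possible way to represent such a graph in code is a dictionary from each node to its parents. The sketch below does this for the illustrative function $ f(x_1, x_2) = x_1 x_2 + \sin(x_1) $, whose trace is $ v_{-1} = x_1 $, $ v_0 = x_2 $, $ v_1 = v_{-1} v_0 $, $ v_2 = \sin(v_{-1}) $, $ v_3 = v_1 + v_2 $. The data layout is an assumption for illustration, not the package's internal representation.

```python
# Map each node index j (for v_j) to the indices of the nodes it is computed from
parents = {-1: [], 0: [], 1: [-1, 0], 2: [-1], 3: [1, 2]}

# Invert the mapping to get, for each node, the nodes that consume it
children = {node: [] for node in parents}
for node, node_parents in parents.items():
    for parent in node_parents:
        children[parent].append(node)

# The partial ordering: every parent must be evaluated before its child
is_valid_order = all(
    parent < node for node, node_parents in parents.items() for parent in node_parents
)

print(children)
print(is_valid_order)
```

Because every edge points from a lower index to a higher one, evaluating the nodes in increasing index order respects the partial ordering.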

#### Part 4: Computing the derivative

Let's return to the gradient $ \nabla $.

Definition of the directional derivative: we project the gradient from before onto the direction of a vector $ p $:

$$ D_p v_j = (\nabla v_j)^T p = (\sum_{i < j} \frac{\partial{v_j}} {\partial{v_i}} \nabla v_i)^T p = \sum_{i < j} \frac{\partial{v_j}} {\partial{v_i}} (\nabla v_i)^T p = \sum_{i < j} \frac{\partial{v_j}} {\partial{v_i}} D_p v_i$$ 

Higher dimension: We recursively apply the same technique introduced above to each entry of the vector-valued function $ f $.

Two takeaways:

1) We can compute the directional derivative $ D_p v_j $ with knowledge of only $ v_i $ and $ D_p v_i $ for $ i < j $.

2) Once all the child nodes of a node have been evaluated, that parent node is no longer needed. There is no need to store the full graph of $ (v_j, D_p v_j) $ pairs.
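The recursion for $ D_p v_j $ can be sketched by carrying each value together with its directional derivative. The example below assumes the illustrative function $ f(x_1, x_2) = x_1 x_2 + \sin(x_1) $; the function name `forward_mode` and the variable names are hypothetical.

```python
import math


def forward_mode(x, p):
    # Each line computes a (v_j, D_p v_j) pair from earlier pairs only
    v_m1, d_m1 = x[0], p[0]                                   # v_{-1} = x_1
    v_0, d_0 = x[1], p[1]                                     # v_0 = x_2
    v_1, d_1 = v_m1 * v_0, v_0 * d_m1 + v_m1 * d_0            # product rule
    v_2, d_2 = math.sin(v_m1), math.cos(v_m1) * d_m1          # derivative of sine
    v_3, d_3 = v_1 + v_2, d_1 + d_2                           # sum rule
    return v_3, d_3


x = (2.0, 3.0)
# Seeding p with a unit vector extracts one partial derivative per pass
value, df_dx1 = forward_mode(x, (1.0, 0.0))
_, df_dx2 = forward_mode(x, (0.0, 1.0))
print(value, df_dx1, df_dx2)
```

With the seed $ p = (1, 0) $ the pass returns $ \partial f / \partial x_1 = x_2 + \cos(x_1) $, and with $ p = (0, 1) $ it returns $ \partial f / \partial x_2 = x_1 $. Only local variables are kept, which reflects the second takeaway: no graph needs to be stored.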

#### Part 5: Dual Number

Definition: We define a dual number $ z_j = v_j + D_p v_j \epsilon $ such that $ \epsilon^2 = 0 $ and $ \epsilon \neq 0 $, where $ v_j $ corresponds to the primal trace and $ D_p v_j $ corresponds to the tangent trace.

$ f(z_j) = f(v_j + D_p v_j \epsilon) = f(v_j) + f'(v_j) D_p v_j \epsilon $ using a Taylor series expansion

All higher-order terms vanish because $ \epsilon^2 = 0 $ by definition.
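Written out in full, and assuming $ f $ is smooth enough for the Taylor series about $ v_j $ to exist, the expansion reads as follows, with the shorthand $ d = D_p v_j $ introduced here for brevity:

$$
\begin{align}
f(v_j + d \epsilon) &= f(v_j) + f'(v_j) \, d \epsilon + \frac{f''(v_j)}{2!} \, d^2 \epsilon^2 + \frac{f'''(v_j)}{3!} \, d^3 \epsilon^3 + \cdots \\
&= f(v_j) + f'(v_j) \, d \epsilon
\end{align}
$$

Every term from the second order onward contains a factor of $ \epsilon^2 $ and is therefore zero.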

Advantage: Operations on dual numbers retain the form of the Taylor expansion, so the real part carries the value and the dual part carries the derivative.

Consider the following example:
$$
\begin{align}
z_1 &= a_1 + b_1 \epsilon \\ 
z_2 &= a_2 + b_2 \epsilon \\
z_1 + z_2 &= (a_1 + a_2) + (b_1 + b_2) \epsilon \\
z_1 z_2 &= a_1 a_2 + (a_1 b_2 + a_2 b_1) \epsilon \\
\end{align}
$$

We substitute $ a_1 = u $, $ b_1 = u' $, $ a_2 = v $, and $ b_2 = v' $, which recovers the sum and product rules of differentiation in the coefficient of $ \epsilon $:
$$
\begin{align}
z_1 + z_2 &= (u + v) + (u' + v') \epsilon \\
z_1 z_2 &= u v + (u v' + u' v) \epsilon \\
\end{align}
$$
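The addition and multiplication rules above translate directly into operator overloading. The following is a minimal sketch, not the package's implementation: the class name `Dual`, its fields `value` and `deriv`, and the helpers `_as_dual` and `sin` are all hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    value: float        # real part a
    deriv: float = 0.0  # dual part b, the coefficient of epsilon

    def __add__(self, other):
        other = _as_dual(other)
        # (a1 + b1 eps) + (a2 + b2 eps) = (a1 + a2) + (b1 + b2) eps
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _as_dual(other)
        # (a1 + b1 eps)(a2 + b2 eps) = a1 a2 + (a1 b2 + a2 b1) eps
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def _as_dual(number):
    # Treat plain numbers as constants, that is, dual numbers with zero dual part
    return number if isinstance(number, Dual) else Dual(float(number))


def sin(z):
    # f(a + b eps) = f(a) + f'(a) b eps, applied to f = sin
    return Dual(math.sin(z.value), math.cos(z.value) * z.deriv)


z1 = Dual(2.0, 1.0)
z2 = Dual(3.0, 0.5)
total = z1 + z2      # Dual(value=5.0, deriv=1.5)
product = z1 * z2    # Dual(value=6.0, deriv=4.0)

# Differentiate f(x) = x * x + sin(x) at x = 2 by seeding the dual part with 1
x = Dual(2.0, 1.0)
y = x * x + sin(x)
print(total, product, y)
```

Seeding the dual part with 1 makes `y.deriv` equal to $ f'(2) = 2 \cdot 2 + \cos(2) $, while `y.value` holds $ f(2) $; both come out of a single evaluation of the expression.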

# How to Use $\textit{PackageName}$

*text here*

# Software Organization

*text here*

# Implementation

*text here*

# Licensing

*text here*