```julia
ϵ = 1e-10rand()
@show ϵ
@show (1+ϵ)
# ϵ = 8.22018698287886e-11
# 1 + ϵ = 1.0000000000822018

ϵ = eps(Float64)
@show ϵ
@show (1+ϵ)
# ϵ = 2.220446049250313e-16
# 1 + ϵ = 1.0000000000000002
```

## Finite Differences

$$f'(x) = \lim_{\epsilon \rightarrow 0} \frac{f(x+\epsilon)-f(x)}{\epsilon}$$



Finite differencing follows directly from this definition by choosing a small
$\epsilon$. However, choosing a good $\epsilon$ is very difficult. If $\epsilon$
is too large, then there is truncation error, since this definition is asymptotic. However,
if $\epsilon$ is too small, you incur roundoff error.
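
The trade-off can be seen in a small experiment. This is only a sketch: `forward_diff` is a hypothetical helper name, and `sin` at `x = 1.0` is an arbitrary test case chosen because its exact derivative `cos(1.0)` is known.

```julia
# Forward-difference approximation of f'(x) with step ϵ
forward_diff(f, x, ϵ) = (f(x + ϵ) - f(x)) / ϵ

x = 1.0
exact = cos(x)  # exact derivative of sin at x

# Large ϵ: truncation error dominates. Tiny ϵ: roundoff dominates.
for ϵ in (1e-1, 1e-4, 1e-8, 1e-12, 1e-16)
    approx = forward_diff(sin, x, ϵ)
    println("ϵ = ", ϵ, "   error = ", abs(approx - exact))
end
```

For `ϵ = 1e-16` the sum `x + ϵ` rounds back to `x`, so the difference quotient is exactly zero and the derivative information is lost entirely.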

## Differencing in a Different Dimension: Complex Step Differentiation

$$f(x+ih) = f(x) + f'(x)ih + \mathcal{O}(h^2)$$
$$if'(x) = \frac{f(x+ih) - f(x)}{h} + \mathcal{O}(h)$$
$$f'(x) = \frac{Im(f(x+ih))}{h} + \mathcal{O}(h)$$
The last step takes the imaginary part of both sides, using $Im(f(x)) = 0$ since $f(x)$ is real valued.



But to understand the computational advantage, recall that $x$ is purely real, and thus $x+ih$ is a complex number in which $h$ never directly interacts with $x$: a complex number is a two-dimensional number whose two parts are kept separate.

Thus there is no numerical cancellation by using a small value of h, and thus, due to the relative precision of floating point numbers, both the real and imaginary parts will be computed to (approximately) 16 digits of accuracy for any choice of h.
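
A minimal sketch of this, assuming the function accepts complex arguments. The names `complex_step`, `testfun`, and `dtestfun` are hypothetical, and the test function is arbitrary.

```julia
# Complex-step approximation of f'(x): Im(f(x + ih)) / h
complex_step(f, x, h) = imag(f(complex(x, h))) / h

testfun(x) = exp(x) * sin(x)               # arbitrary analytic test function
dtestfun(x) = exp(x) * (sin(x) + cos(x))   # its exact derivative

x = 1.0
# Unlike finite differences, shrinking h does not cause cancellation
for h in (1e-4, 1e-8, 1e-20, 1e-100)
    println("h = ", h, "   error = ", abs(complex_step(testfun, x, h) - dtestfun(x)))
end
```

Because no subtraction of nearly equal numbers occurs, `h` can be taken far below machine epsilon without losing accuracy.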

### Derivatives measure sensitivity

$$f(x + \epsilon) = f(x) + f'(x) \epsilon + o(\epsilon).$$
We will ignore higher-order terms; formally, we set $\epsilon^2 = 0$:

$$f(x + \epsilon) \rightsquigarrow f(x) + \epsilon f'(x)$$
$$f(x + \beta \epsilon) \rightsquigarrow f(x) + \epsilon \beta f'(x)$$

Note: the derivative of $x$ is 1, so the input itself enters as $x + 1 \cdot \epsilon$.

## Dual Numbers

Thus, to extend the idea of complex step differentiation beyond complex analytic
functions, we define a new number type, the *dual number*. 

A dual number is a multidimensional number where the sensitivity of the function is propagated
along the dual portion.

Here we will now start to use $\epsilon$ as a dimensional signifier:

$$f(x + \epsilon) \rightsquigarrow f(x) + \epsilon f'(x)$$
$$g(x + \epsilon) \rightsquigarrow g(x) + \epsilon g'(x)$$

$$f(x + \epsilon) \pm g(x + \epsilon) = [f(x) \pm g(x)] + \epsilon[f'(x) \pm g'(x)]$$

$$f(x + \epsilon) \cdot g(x + \epsilon) = [f(x) \cdot g(x)] + \epsilon[f(x) \cdot g'(x) + g(x) \cdot f'(x) ]$$
$$f(x + \epsilon) / g(x + \epsilon) = \left[\frac{f(x)}{g(x)}\right] + \epsilon\left[\frac{g(x) \cdot f'(x) - f(x) \cdot g'(x)}{g(x)^2}\right]$$
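
These four rules translate almost line by line into code. The following is a minimal sketch, not a full implementation: the type name `Dual` and its field names `val` and `der` are assumptions made for illustration, and only `Float64` components are supported.

```julia
# A minimal dual number: `val` holds f(x), `der` holds the coefficient of ϵ, f'(x)
struct Dual
    val::Float64
    der::Float64
end

# Sum and difference rules
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)

# Product rule
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.val * b.der + b.val * a.der)

# Quotient rule
Base.:/(a::Dual, b::Dual) =
    Dual(a.val / b.val, (b.val * a.der - a.val * b.der) / b.val^2)

x = Dual(3.0, 1.0)   # the input x = 3, seeded with derivative 1
square = x * x       # value 9, derivative 2x = 6
recip = x / (x * x)  # value 1/x = 1/3, derivative -1/x^2 = -1/9
```

Seeding the dual part with `1.0` marks `x` as the variable being differentiated; a constant would carry `0.0` there instead.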

## Chain rule

$$f(g(x + \epsilon)) = f(g(x) + \epsilon g'(x)) = f(g(x)) + \epsilon f'(g(x)) g'(x) $$

Note that the chain rule has been explicitly encoded in the derivative part.
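
As an illustration, each primitive only needs to know its own derivative, and composition then takes care of the rest. This sketch assumes a small `Dual` type with fields `val` and `der`; the helper `derivative` and the test function `composite` are hypothetical names.

```julia
struct Dual
    val::Float64
    der::Float64
end

Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.val * b.der + b.val * a.der)

# Each primitive applies f(a + ϵb) = f(a) + ϵ f'(a) b
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)
Base.exp(a::Dual) = Dual(exp(a.val), exp(a.val) * a.der)

# Seed the dual part with 1 (the derivative of x) and read off f'(x)
derivative(f, x) = f(Dual(x, 1.0)).der

composite(x) = exp(sin(x * x))

# Matches the hand-derived chain rule: exp(sin(x^2)) * cos(x^2) * 2x
println(derivative(composite, 2.0))
println(exp(sin(4.0)) * cos(4.0) * 4.0)
```

No rule for `composite` itself was written; its derivative falls out of the rules for `*`, `sin`, and `exp`.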

Now we are in a position to differentiate arbitrary code that is composed of these operations.

Are we ready to differentiate arbitrary code? Not quite yet!

If you also encode rules for higher-order primitives and for conditionals, then you can differentiate arbitrary code.

## Here comes the coding part

### Higher dimensions

$$f: \mathbb{R}^n \to \mathbb{R}$$

$$f(x + \epsilon) = f(x) + \nabla f(x) \cdot \epsilon + o(\epsilon)$$

where $x \in \mathbb{R}^n$ and $\nabla f(x)$ is the **gradient** of $f$ at $x$, 
i.e. the vector of partial derivatives in each direction.


We can think of these as $\epsilon$ perturbations in different directions, which satisfy $\epsilon_i^2 = \epsilon_i \epsilon_j = 0$, and we will call $\epsilon$ the vector of all perturbations.
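
One way to sketch this in code is to carry a vector of coefficients, one per $\epsilon_i$, alongside the value. The names `GradDual`, `partials`, `dual_gradient`, and `quadratic` below are hypothetical, and only `+` and `*` are implemented.

```julia
# Dual number with one perturbation per input direction
struct GradDual
    val::Float64
    partials::Vector{Float64}   # coefficients of ϵ_1, …, ϵ_n
end

Base.:+(a::GradDual, b::GradDual) = GradDual(a.val + b.val, a.partials .+ b.partials)

# Product rule; all ϵ_i ϵ_j terms are dropped
Base.:*(a::GradDual, b::GradDual) =
    GradDual(a.val * b.val, a.val .* b.partials .+ b.val .* a.partials)

# Seed input i with the unit perturbation ϵ_i, evaluate once, read off the gradient
function dual_gradient(f, x::Vector{Float64})
    n = length(x)
    inputs = [GradDual(x[i], [j == i ? 1.0 : 0.0 for j in 1:n]) for i in 1:n]
    return f(inputs).partials
end

quadratic(v) = v[1] * v[1] + v[1] * v[2]   # gradient is [2x₁ + x₂, x₁]
println(dual_gradient(quadratic, [2.0, 3.0]))
```

A single evaluation of `quadratic` yields all partial derivatives at once, at the cost of carrying a length-$n$ vector through every operation.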

## Finite Differences