## Raison d'être
This notebook explains why `(A+A')*x` can be used to verify the result of `ForwardDiff.gradient(x->x'A*x,x)` in `./Automatic Differentiation in 10 Minutes.ipynb`.

In [2]:
A = randn(3,3)

3×3 Array{Float64,2}:
 -0.31633   -1.85252    0.262466
 -0.119067   0.673018  -0.719999
 -0.820102  -0.898565  -1.71801

In [3]:
x = randn(3)

3-element Array{Float64,1}:
 -0.7187265639441195
 -0.4072388215076498
  0.4596722352439819

In [1]:
using ForwardDiff

In [4]:
ForwardDiff.gradient(x->x'A*x,x)

3-element Array{Float64,1}:
  1.001287888379385
  0.12486790174782009
 -0.5195084166634044

In [5]:
(A+A')*x

3-element Array{Float64,1}:
  1.0012878883793852
  0.12486790174781992
 -0.5195084166634043
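
The two outputs above agree except in the last digit or two, which is ordinary floating-point rounding. As a small sketch (not part of the original notebook), the comparison can be made programmatically with `≈` (`isapprox`) instead of by eye. The names `g_ad` and `g_formula` are my own, and the block draws fresh random `A` and `x`.

```julia
using ForwardDiff, LinearAlgebra

A = randn(3, 3)
x = randn(3)

# Gradient from automatic differentiation
g_ad = ForwardDiff.gradient(x -> x' * A * x, x)

# Gradient from the closed-form expression derived in this notebook
g_formula = (A + A') * x

# `≈` compares up to a relative tolerance, so rounding differences are ignored
g_ad ≈ g_formula
```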

### `size` of an array in Julia
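Since I am still relatively new to Julia, let's first recall how Julia handles array **shapes/sizes**, in particular the difference between a vector of size `(3,)` and a matrix of size `(3, 1)`.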

In [2]:
size(A)

(3, 3)

In [4]:
size(x)

(3,)

In [5]:
A*x

3-element Array{Float64,1}:
  3.1010046540491953
 -2.961991502064001
 -2.9322573933725877

In [6]:
size(A*x)

(3,)

In [7]:
size(x')

(1, 3)

In [8]:
size(x'*A')

(1, 3)

In [9]:
A*reshape(x,3,1)

3×1 Array{Float64,2}:
  3.101004654049196
 -2.961991502064001
 -2.9322573933725877

In [10]:
size(A*reshape(x,3,1))

(3, 1)

In [11]:
x'*x

2.290914008018504

In [15]:
x'*reshape(x,3,1)

1×1 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
 2.290914008018504

### Differential
$$
\begin{align*}
  f:\quad &\mathbb{R}^n \to \mathbb{R} \\
       &x \mapsto x^{T}Ax
\end{align*}
$$


For all $x, h \in \mathbb{R}^{n}$, we have
$$
\begin{align*}
  f(x+h) &= (x+h)^{T} A (x+h)                      \\
         &= x^{T}Ax + h^{T}Ax + x^{T}Ah + h^{T}Ah  \\
         &= f(x) + x^{T}A^{T}h + x^{T}Ah + h^{T}Ah  \\
         &= f(x) + x^{T}(A^{T} + A)\,h + h^{T}Ah\,.  \\
\end{align*}
$$

Since the map $h \mapsto x^{T}(A^{T} + A)\,h$ is linear, it is the differential of $f$ at $x$ provided the remainder satisfies $h^{T}Ah = o(\lVert h \rVert)$ as $h \to 0$. If we can prove that, then we have $\nabla f(x) =
\left(x^{T}(A^{T} + A)\right)^{T} = (A + A^{T})x.$
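
The expansion of $f(x+h)$ can also be checked numerically. The following is a small sketch with random data; the helper `f` and the names `linear_part` and `remainder` are my own. After subtracting $f(x)$ and the linear term from $f(x+h)$, what is left should be the quadratic term $h^{T}Ah$ up to rounding.

```julia
using LinearAlgebra

f(A, x) = x' * A * x

A = randn(3, 3)
x = randn(3)
h = randn(3)

# Linear term x'(A' + A)h, written as a dot product with the claimed gradient
linear_part = dot((A + A') * x, h)

# What remains of f(x + h) after removing f(x) and the linear term
remainder = f(A, x + h) - f(A, x) - linear_part

# The leftover matches h'Ah up to floating-point rounding
isapprox(remainder, f(A, h); atol = 1e-10)
```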

The proof below uses the concept of a matrix norm; readers whose memory of matrix norms has gotten a little rusty may refer to one of the following sources:
- [my own maths notes](https://github.com/phunc20/maths/blob/master/definitions/matrices/norm/motivation.pdf)
- [Wikipedia is also a good one for this matter](https://en.wikipedia.org/wiki/Matrix_norm)

Ok, we are to prove that $h^{T}Ah = o(\lVert h \rVert)$. By definition, this means that for every $\epsilon > 0$ there exists $\delta > 0$
s.t. $| h^T A h| < \epsilon \lVert h \rVert$ whenever $0 < \lVert h \rVert < \delta\,.$ So, let $\epsilon > 0$ be arbitrary.

We have
$$
  | h^T A h| \underbrace{\le}_{|v\,\cdot\, w| \,\le\, \lVert v \rVert \lVert w \rVert \,\forall\, v,\, w\, \in\, \mathbb{R}^{n} }
  \lVert h \rVert \lVert Ah \rVert
  \overbrace{\le}^{{\lVert A \rVert} {\lVert h \rVert} \,\ge\, \lVert Ah \rVert} {\lVert h \rVert} {\lVert A \rVert} {\lVert h \rVert}\,.
$$

If $\lVert A \rVert \neq 0$, then it suffices to take $\delta = \frac{\epsilon}{\lVert A \rVert}$. Indeed, for $h \neq 0$,
$$
  \lVert h \rVert < \frac{\epsilon}{\lVert A \rVert} \implies {\lVert h \rVert} {\lVert A \rVert} {\lVert h \rVert} < \epsilon {\lVert h \rVert} \implies  | h^T A h | < \epsilon {\lVert h \rVert}\,.
$$

If $\lVert A \rVert = 0$, then $A = 0$ and $| h^T A h | = 0 < \epsilon \lVert h \rVert$ for every $h \neq 0$, so any $\delta > 0$ works.
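
As a sanity check of the bound and of the choice of $\delta$, here is a minimal numerical sketch. It assumes the matrix norm is the one induced by the Euclidean vector norm, which is what `opnorm(A)` computes by default in `LinearAlgebra`. The value of `ε` and the scaling of `h` are arbitrary choices for illustration.

```julia
using LinearAlgebra

A = randn(3, 3)
normA = opnorm(A)           # induced 2-norm of A (assumed choice of matrix norm)

ε = 1e-3
δ = ε / normA               # the δ chosen in the proof

# A random direction, rescaled so that 0 < norm(h) = δ/2 < δ
h = randn(3)
h = h / norm(h) * (δ / 2)

# Cauchy-Schwarz followed by the operator-norm inequality
abs(h' * A * h) ≤ normA * norm(h)^2

# Conclusion of the proof: the remainder is below ε * norm(h)
abs(h' * A * h) < ε * norm(h)
```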