# Chapter 5 Vector Calculus
* **Finding good parameters can be phrased as an optimization problem**
![Screen%20Shot%202020-11-20%20at%206.17.20%20AM.png](attachment:Screen%20Shot%202020-11-20%20at%206.17.20%20AM.png)
* **A function $f$ is a quantity that relates two quantities to each other. In this book, these quantities are typically inputs $ \pmb{x} \in \mathbb{R}^D $ and targets (function values) $f(\pmb{x}) $, which we assume are real-valued if not stated otherwise. Here $\mathbb{R}^{D}$ is the *domain* of $f$, and the function values $f(\pmb{x})$ are the *image/codomain* of $f$.
$$ f: \mathbb{R}^{D} \to \mathbb{R} \\
\pmb{x} \mapsto f(\pmb{x}) $$
where the first line specifies that $f$ is a mapping from $\mathbb{R}^{D} $ to $\mathbb{R}$ and the second line specifies the explicit assignment of an input $\pmb{x}$ to a function value $f(\pmb{x})$. A function $f$ assigns every input $\pmb{x}$ to exactly one function value $f(\pmb{x})$.**
* **Dot Product: the function $ f(\pmb{x}) = \pmb{x}^{T} \pmb{x} , \pmb{x} \in \mathbb{R}^{2} $ can be specified as 
$$ f: \mathbb{R}^{2} \to \mathbb{R} \\
\pmb{x} \mapsto x_1^2 + x_2^2 $$**

## 5.1 Differentiation of Univariate Functions
* **Difference Quotient. 
$$ \frac{\delta y}{\delta x} := \frac{f (x + \delta x) - f(x)}{ \delta x}$$
computes the slope of the secant line through two points on the graph of $f$.**

    * **The difference quotient can also be considered the average slope of $f$ between $x$ and $x + \delta x $ if we assume $f$ to be a linear function.**
    * **In the limit for $\delta x \to 0$, we obtain the tangent of $f$ at $x$, if $f$ is differentiable. The slope of the tangent is then the derivative of $f$ at $x$.**

* **Derivative. For $h > 0 $, the derivative of $f$ at $x$ is defined as the limit 
$$ \frac{d f}{d x} := \lim_{h\to 0}\frac{f(x + h) - f(x)}{h} $$**
    * **The derivative of $f$ points in the direction of steepest ascent of $f$.**
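The limit in the definition can be observed numerically. The following is a minimal sketch, assuming $f = \sin$ as an illustrative function (not taken from the notes) and a hypothetical helper `difference_quotient`; it shows the secant slope approaching the known derivative $\cos(x)$ as $\delta x$ shrinks.

```python
import math

def difference_quotient(f, x, dx):
    """Slope of the secant line through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx

f = math.sin           # illustrative choice of function (an assumption)
x = 1.0
exact = math.cos(x)    # known derivative of sin at x

# Shrinking dx moves the secant slope toward the slope of the tangent.
errors = []
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(f, x, dx)
    errors.append(abs(approx - exact))
    print(f"dx={dx:g}  quotient={approx:.6f}  error={errors[-1]:.2e}")
```

For this forward quotient the error shrinks roughly in proportion to $\delta x$. Very small step sizes eventually run into floating-point round-off, so the step is not pushed all the way to zero in practice.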


### 5.1.1 Taylor Series
* **The *Taylor polynomial* of degree $n$ of $ f: \mathbb{R}\to\mathbb{R} $ at $x_0$ is defined as
$$ T_n (x) := \sum_{k=0}^{n}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^{k} $$
where $ f^{(k)} (x_0)$ is the $k$th derivative of $f$ at $x_0$ (which we assume exists) and $ \frac{f^{(k)} (x_0)}{k!}$ are the coefficients of the polynomial.**
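* **For a smooth function $ f \in \mathcal{C}^{\infty}, f: \mathbb{R} \to \mathbb{R}$, the *Taylor series* of $f$ at $x_0$ is defined as
$$ T_{\infty}(x) = \sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x-x_0)^{k} $$**
* **For $x_0 = 0$, we obtain the *Maclaurin series* as a special instance of the Taylor series. If $f(x) = T_{\infty} (x)$, then $f$ is called *analytic*.**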
    * **In general, a Taylor polynomial of degree *n* is an approximation of a function, which does not need to be a polynomial.**
    * **The Taylor polynomial is similar to $f$ in a neighborhood around $x_0$. However, a Taylor polynomial of degree $n$ is an exact representation of a polynomial $f$ of degree $k \leq n $ since all derivatives $ f^{(i)}, i > k$ vanish.**
$$ \cos(x) = \sum_{k=0}^{\infty}(-1)^{k} \frac{1}{(2k)!} x^{2k}$$
$$ \sin(x) = \sum_{k=0}^{\infty} (-1)^{k} \frac{1}{(2k + 1)!}x^{2k+1}$$
* **A Taylor series is a special case of a power series
$$ f(x)= \sum_{k=0}^{\infty} a_k (x - c)^{k} $$
where $a_k$ are coefficients and $c$ is a constant.**
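The Maclaurin series for $\sin$ and $\cos$ given above can be truncated to obtain Taylor polynomials around $x_0 = 0$. The sketch below uses hypothetical helper names `maclaurin_sin` and `maclaurin_cos`, with an arbitrarily chosen evaluation point, and compares the partial sums against the library functions to show how the approximation improves with more terms.

```python
import math

def maclaurin_sin(x, n_terms):
    """Partial sum of sin(x) = sum_k (-1)^k x^(2k+1) / (2k+1)!."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))

def maclaurin_cos(x, n_terms):
    """Partial sum of cos(x) = sum_k (-1)^k x^(2k) / (2k)!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(n_terms))

x = 0.5  # evaluation point near the expansion point x_0 = 0 (illustrative)
for n_terms in (1, 2, 3, 5):
    err_sin = abs(maclaurin_sin(x, n_terms) - math.sin(x))
    err_cos = abs(maclaurin_cos(x, n_terms) - math.cos(x))
    print(f"terms={n_terms}  sin error={err_sin:.2e}  cos error={err_cos:.2e}")
```

The approximation is best close to the expansion point; for inputs far from $0$, more terms are needed to reach the same accuracy.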

### 5.1.2 Differentiation Rules
**Product rule:**<br>
$$ (f(x)g(x))' =f'(x) g(x) + f(x)g'(x)$$
**Quotient Rule**<br>
$$ \left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x)-f(x)g'(x)}{(g(x))^2}$$
**Sum Rule**<br>
$$(f(x)+g(x))'=f'(x)+g'(x)$$
**Chain Rule**<br>
$$ (g(f(x)))'=(g\circ f)'(x) = g'(f(x))f'(x)$$
**Here, $g \circ f$ denotes function composition $ x \mapsto f(x) \mapsto g(f(x))$**
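As a small check of the chain rule, take $h(x) = (2x+1)^4$, written as $g(f(x))$ with $f(x) = 2x+1$ and $g(u) = u^4$, so that $h'(x) = 4(2x+1)^3 \cdot 2 = 8(2x+1)^3$. The sketch below (the function names and the central-difference helper are illustrative assumptions) compares the chain-rule result with a numerical estimate.

```python
def f(x):
    """Inner function f(x) = 2x + 1."""
    return 2 * x + 1

def f_prime(x):
    return 2.0

def g(u):
    """Outer function g(u) = u^4."""
    return u ** 4

def g_prime(u):
    return 4 * u ** 3

def chain_rule_derivative(x):
    """(g o f)'(x) = g'(f(x)) * f'(x)."""
    return g_prime(f(x)) * f_prime(x)

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, used here only as a numerical check."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.5
analytic = chain_rule_derivative(x)                  # 8 * (2 * 0.5 + 1)^3 = 64
numeric = central_difference(lambda t: g(f(t)), x)
print(analytic, numeric)
```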

## 5.2 Partial Differentiation and Gradients
* **The generalization of the derivative to functions of several variables is the *gradient***
    * **We find the gradient of the function $f$ with respect to $\pmb{x}$ by *varying one variable at a time* and keeping the others constant. The gradient is then the collection of these *partial derivatives*.**
* **Partial Derivative. For a function $f: \mathbb{R}^{n} \to \mathbb{R}, \pmb{x} \mapsto f(\pmb{x}), \pmb{x} \in \mathbb{R}^{n} $ of $n$ variables $x_1, \dots, x_n$, we define the *partial derivatives* as 
$$ \frac{\partial f}{\partial x_1} = \lim_{h\to 0} \frac{f(x_1 + h, x_2,\dots, x_n) - f(\pmb{x})}{h}$$
$$ \vdots $$
$$ \frac{\partial f}{\partial x_n} = \lim_{h\to 0} \frac{f(x_1, \dots, x_{n-1}, x_n + h) - f(\pmb{x})}{h}$$
and collect them in the row vector
$$\nabla_{x} f = \mathrm{grad} f = \frac{d f}{d \pmb{x}} = \left[\frac{\partial f(\pmb{x})}{\partial x_1} \quad \frac{\partial f(\pmb{x})}{\partial x_2} \quad \dots \quad \frac{\partial f(\pmb{x})}{\partial x_n}\right] \in \mathbb{R}^{1 \times n} $$**
**where $n$ is the number of variables and $1$ is the dimension of the image/range/codomain of $f$.**
    * **The row vector is called the *gradient* of $f$ or the *Jacobian* and is the generalization of the derivative**
* **Here are the general product rule, sum rule, and chain rule:**
* **Product Rule:**
$$ \frac{\partial}{\partial\pmb{x}}(f(\pmb{x})g(\pmb{x})) = \frac{\partial f}{\partial \pmb{x}} g(\pmb{x}) + f(\pmb{x}) \frac{\partial g}{\partial \pmb{x}} $$
* **Sum Rule**
$$ \frac{\partial}{\partial \pmb{x}}(f(\pmb{x})+g(\pmb{x})) = \frac{\partial f}{\partial\pmb{x}}+\frac{\partial g}{\partial\pmb{x}} $$
* **Chain Rule**
$$ \frac{\partial}{\partial \pmb{x}}(g \circ f)(\pmb{x})= \frac{\partial}{\partial \pmb{x}}(g(f(\pmb{x})))= \frac{\partial g}{\partial f} \frac{\partial f}{\partial \pmb{x}} $$
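The "vary one variable at a time" recipe for partial derivatives translates directly into code. The following sketch assumes the example function $f(x_1, x_2) = x_1^2 x_2 + x_1 x_2^3$, whose gradient is $[2x_1x_2 + x_2^3 \quad x_1^2 + 3x_1x_2^2]$, and a hypothetical helper `numerical_gradient`. It uses a symmetric difference quotient instead of the one-sided limit in the definition, which is a common numerical choice rather than something stated in the notes.

```python
def f(x):
    """Example scalar function of two variables (illustrative choice)."""
    x1, x2 = x
    return x1 ** 2 * x2 + x1 * x2 ** 3

def analytic_gradient(x):
    """Partial derivatives of f computed by hand, collected in a row vector."""
    x1, x2 = x
    return [2 * x1 * x2 + x2 ** 3, x1 ** 2 + 3 * x1 * x2 ** 2]

def numerical_gradient(func, x, h=1e-6):
    """Approximate each partial derivative by perturbing one coordinate only."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h       # vary x_i, keep all other variables constant
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

x = [1.0, 2.0]
print(analytic_gradient(x))          # [12.0, 13.0]
print(numerical_gradient(f, x))      # close to the analytic values
```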

### 5.2.2 Chain Rule
* **Consider a function $ f: \mathbb{R}^{2} \to \mathbb{R} $ of two variables $ x_1, x_2$, where $x_1(t)$ and $x_2(t)$ are themselves functions of $t$. To compute the gradient of $f$ with respect to $t$, we apply the chain rule:
$$ \frac{d f}{d t} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \end{bmatrix}
\begin{bmatrix}
\frac{\partial x_1(t)}{\partial t} \\
\frac{\partial x_2(t)}{\partial t}
\end{bmatrix}
= \frac{\partial f}{\partial x_1} \frac{\partial x_1}{\partial t} + \frac{\partial f}{\partial x_2} \frac{\partial x_2}{\partial t} $$
where $d$ denotes the gradient and $\partial$ partial derivatives.**

* **If $f(x_1, x_2)$ is a function of $x_1$ and $x_2$, where $x_1(s,t) $ and $x_2(s,t)$ are themselves functions of two variables $s$ and $t$, the chain rule yields the partial derivatives**
![Screen%20Shot%202020-11-20%20at%205.53.00%20PM.png](attachment:Screen%20Shot%202020-11-20%20at%205.53.00%20PM.png)
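For the single-parameter case, the chain rule is a product of a row vector and a column vector. As an illustration, assume $f(x_1, x_2) = x_1^2 + 2x_2$ with $x_1 = \sin t$ and $x_2 = \cos t$, so that $\frac{df}{dt} = 2\sin t \cos t - 2\sin t$. The sketch below computes this product explicitly and checks it against a numerical derivative; all function names are hypothetical.

```python
import math

def f(x1, x2):
    return x1 ** 2 + 2 * x2

def x1(t):
    return math.sin(t)

def x2(t):
    return math.cos(t)

def df_dt_chain(t):
    """df/dt as (row vector of df/dx_i) times (column vector of dx_i/dt)."""
    grad_f = [2 * x1(t), 2.0]                 # [df/dx1, df/dx2]
    dx_dt = [math.cos(t), -math.sin(t)]       # [dx1/dt, dx2/dt]
    return sum(a * b for a, b in zip(grad_f, dx_dt))

def df_dt_numeric(t, h=1e-6):
    """Symmetric difference quotient of t -> f(x1(t), x2(t)), as a check."""
    return (f(x1(t + h), x2(t + h)) - f(x1(t - h), x2(t - h))) / (2 * h)

t = 0.7
closed_form = 2 * math.sin(t) * (math.cos(t) - 1)
print(df_dt_chain(t), closed_form, df_dt_numeric(t))
```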


## 5.3 Gradients of Vector-Valued Functions
* **For a function $\pmb{f}: \mathbb{R}^{n} \to \mathbb{R}^{m}$ and a vector $ \pmb{x} = [x_1, \dots, x_n]^T \in \mathbb{R}^{n} $, the corresponding vector of function values is given as
$$ \pmb{f}(\pmb{x}) = 
\begin {bmatrix}
f_1(\pmb{x}) \\
\vdots \\
f_m(\pmb{x})\\
\end {bmatrix} \in \mathbb{R}^ m $$
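Writing the vector-valued function in this way allows us to view a vector-valued function $\pmb{f}: \mathbb{R}^{n} \to \mathbb{R}^{m}$ as a vector of functions $[f_1, \dots, f_m]^T$, $f_i: \mathbb{R}^{n} \to \mathbb{R}$.**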
* **Therefore, the partial derivative of a vector-valued function $ \pmb{f}: \mathbb{R}^{n} \to \mathbb{R}^{m} $ with respect to $ x_i \in \mathbb{R}, i = 1, \dots, n$ is given as the vector
$$ \frac{\partial \pmb{f}}{\partial x_i}
= 
\begin {bmatrix}
\frac{\partial f_1}{\partial x_i} \\
\vdots\\
\frac{ \partial f_m}{ \partial x_i} \\
\end {bmatrix}
=
\begin {bmatrix}
\lim_{h\to 0}
\frac{f_1(x_1,\dots, x_{i-1}, x_{i}+h, x_{i+1},\dots, x_n)-f_1(\pmb{x})}{h}\\
\vdots \\
\lim_{h\to 0}
\frac{f_m(x_1, \dots, x_{i-1}, x_{i}+h, x_{i+1}, \dots, x_n) -f_m(\pmb{x})}{h}
\end {bmatrix}
\in 
\mathbb{R}^{m}
$$**

* **The gradient of $\pmb{f}$ with respect to a vector is the row vector of the partial derivatives. Every partial derivative $\frac{\partial \pmb{f}}{\partial x_i}$ is itself a column vector. Therefore, we obtain the gradient of $\pmb{f}: \mathbb{R}^{n} \to \mathbb{R}^{m}$ with respect to $\pmb{x} \in \mathbb{R}^{n} $ by collecting these partial derivatives:**
$$ \frac{\mathrm{d} \pmb{f}(\pmb{x})}{\mathrm{d} \pmb{x}} =
\begin{bmatrix}
\frac{\partial \pmb{f}(\pmb{x})}{\partial x_1} & \cdots & \frac{\partial \pmb{f}(\pmb{x})}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f_1(\pmb{x})}{\partial x_1} & \cdots & \frac{\partial f_1(\pmb{x})}{\partial x_n} \\
\vdots & & \vdots \\
\frac{\partial f_m(\pmb{x})}{\partial x_1} & \cdots & \frac{\partial f_m(\pmb{x})}{\partial x_n}
\end{bmatrix}
\in \mathbb{R}^{m \times n}
$$

* **The collection of all first-order partial derivatives of a vector-valued function $\pmb{f}: \mathbb{R}^{n} \to \mathbb{R}^{m}$ is called the *Jacobian*. The Jacobian $\pmb{J}$ is an $ m \times n$ matrix.**
![Screen%20Shot%202020-11-20%20at%207.11.25%20PM.png](attachment:Screen%20Shot%202020-11-20%20at%207.11.25%20PM.png)
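To make the $m \times n$ shape concrete, the following sketch builds a Jacobian column by column, one partial derivative $\frac{\partial \pmb{f}}{\partial x_j}$ at a time. The function $\pmb{f}: \mathbb{R}^2 \to \mathbb{R}^3$ and the helper `numerical_jacobian` are illustrative assumptions, and the symmetric difference quotient is used only as a numerical approximation of the limit.

```python
import math

def f(x):
    """Example vector-valued function f: R^2 -> R^3 (illustrative choice)."""
    x1, x2 = x
    return [x1 ** 2 * x2, 5 * x1 + math.sin(x2), x1 * x2]

def analytic_jacobian(x):
    """Row i holds the partial derivatives of f_i; column j belongs to x_j."""
    x1, x2 = x
    return [[2 * x1 * x2, x1 ** 2],
            [5.0, math.cos(x2)],
            [x2, x1]]

def numerical_jacobian(func, x, h=1e-6):
    """Fill the m x n Jacobian one column (one input variable) at a time."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

x = [1.0, 2.0]
J_num = numerical_jacobian(f, x)
J_ana = analytic_jacobian(x)
print(len(J_num), "x", len(J_num[0]))   # 3 x 2, i.e. m x n
```

Each column here is exactly one of the partial-derivative column vectors $\frac{\partial \pmb{f}}{\partial x_j} \in \mathbb{R}^m$.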

* **Special Case: A function $ f : \mathbb{R}^{n} \to \mathbb{R}^{1} $, which maps a vector $ \pmb{x} \in \mathbb{R}^{n}$ onto a scalar, possesses a Jacobian that is a row vector (a matrix of dimension $ 1 \times n$)**
$$\nabla_{x} f = \mathrm{grad} f = \frac{d f}{d \pmb{x}} = \left[\frac{\partial f(\pmb{x})}{\partial x_1} \quad \dots \quad \frac{\partial f(\pmb{x})}{\partial x_n}\right] \in \mathbb{R}^{1 \times n} $$
* **We use the *numerator layout* of the derivative, i.e., the derivative $ \frac{\mathrm{d} \pmb{f}}{\mathrm{d} \pmb{x}} $ of $\pmb{f} \in \mathbb{R}^{m} $ with respect to $\pmb{x} \in \mathbb{R}^{n} $ is an $m \times n$ matrix, where the elements of $\pmb{f}$ define the rows and elements of $\pmb{x}$ define the columns of the corresponding Jacobian. The *denominator layout* is the transpose of the numerator layout.**

* **Given two vectors $ \pmb{b}_1 = [1, 0]^{T} $, $ \pmb{b}_2 = [0, 1]^{T} $ as the sides of the unit square, the area of this square is given by**
![Screen%20Shot%202020-11-21%20at%206.43.05%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%206.43.05%20AM.png)
**If we transform the unit square into a parallelogram with the sides $ \pmb{c}_1 = [-2, 1]^{T}, \pmb{c}_2 = [1, 1]^{T} $, its area is given as the absolute value of the determinant**
![Screen%20Shot%202020-11-21%20at%206.45.17%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%206.45.17%20AM.png)
* **Approach 1. We first identify both $ \{ \pmb{b}_1, \pmb{b}_2 \} $ and $ \{ \pmb{c}_1, \pmb{c}_2 \} $ as bases of $ \mathbb{R}^{2} $. We effectively perform a change of basis from $ (\pmb{b}_1, \pmb{b}_2) $ to $ (\pmb{c}_1, \pmb{c}_2) $ and we are looking for a transformation matrix that implements the change. We identify the desired basis change matrix as**
$$ \pmb{J} =
\begin{bmatrix}
-2 & 1 \\
1 & 1
\end{bmatrix}
$$
* **For nonlinear transformations, we take a more general approach using partial derivatives. We consider a function $\pmb{f}: \mathbb{R}^{2} \to \mathbb{R}^{2}$ that performs a variable transformation: $\pmb{f}$ maps the coordinate representation of any vector $ \pmb{x} \in \mathbb{R}^{2} $ with respect to $ ( \pmb{b}_1, \pmb{b}_2)$ onto the coordinate representation $ \pmb{y} \in \mathbb{R}^{2} $ with respect to $( \pmb{c}_1, \pmb{c}_2)$. We need to identify the mapping so that we can compute how an area (or volume) changes when it is being transformed by $\pmb{f} $.**
* **If the coordinate transformation is linear, the *Jacobian* recovers exactly the basis change matrix.**
* **If the coordinate transformation is nonlinear, the *Jacobian* approximates this nonlinear transformation locally with a linear one.**
    * **The absolute value of the *Jacobian determinant* $ \lvert \det(\pmb{J}) \rvert $ is the factor by which areas or volumes are scaled when coordinates are transformed.**
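The unit-square example can be checked with a few lines of arithmetic. In the sketch below, `det2` and `apply` are hypothetical helpers for $2 \times 2$ matrices; the code confirms that $\pmb{J}$ maps $\pmb{b}_1, \pmb{b}_2$ to $\pmb{c}_1, \pmb{c}_2$ and that the area is scaled by $\lvert \det(\pmb{J}) \rvert$.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return a * d - b * c

def apply(m, v):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

J = [[-2, 1],
     [1, 1]]
b1, b2 = [1, 0], [0, 1]

c1, c2 = apply(J, b1), apply(J, b2)   # columns of J: [-2, 1] and [1, 1]
unit_area = abs(det2([[b1[0], b2[0]], [b1[1], b2[1]]]))   # 1
area_scale = abs(det2(J))             # |(-2)(1) - (1)(1)| = 3

print(c1, c2, unit_area, area_scale)
```

The parallelogram spanned by $\pmb{c}_1, \pmb{c}_2$ therefore has three times the area of the unit square.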

* **If $f: \mathbb{R} \to \mathbb{R} $, the gradient is simply a scalar. For $ f: \mathbb{R}^{D} \to \mathbb{R} $, the gradient is a $ 1 \times D$ row vector. For $ \pmb{f}: \mathbb{R} \to \mathbb{R}^{E} $, the gradient is an $E \times 1$ column vector, and for $\pmb{f}: \mathbb{R}^{D} \to \mathbb{R}^{E} $, the gradient is an $E \times D $ matrix.**
![Screen%20Shot%202020-11-21%20at%207.13.39%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%207.13.39%20AM.png)
![Screen%20Shot%202020-11-21%20at%207.16.08%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%207.16.08%20AM.png)

![Screen%20Shot%202020-11-21%20at%207.41.14%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%207.41.14%20AM.png)

## 5.4 Gradients of Matrices
* **We can think of a tensor as a multidimensional array that collects partial derivatives. For example, if we compute the gradient of an $ m \times n $ matrix $\pmb{A} $ with respect to a $ p \times q $ matrix $\pmb{B} $, the resulting Jacobian would be $ (m \times n) \times (p \times q)$, i.e., a four-dimensional tensor $\pmb{J}$, whose entries are given as $ J_{ijkl} = \frac{\partial A_{ij}}{\partial B_{kl}}$.**
* **Since matrices represent linear mappings, we can exploit the fact that there is a vector-space isomorphism (linear, invertible mapping) between the space $ \mathbb{R}^{m \times n} $ of $ m \times n $ matrices and the space $ \mathbb{R}^{mn} $ of $mn$-dimensional vectors. We can therefore reshape the matrices into vectors of lengths $mn$ and $pq$, respectively, which yields a Jacobian of size $mn \times pq$.**
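The two views (four-dimensional tensor versus flattened Jacobian) can be compared on a small case. The sketch below assumes the mapping $\pmb{A} = \pmb{C}\pmb{B}$ with a fixed matrix $\pmb{C}$, chosen purely for illustration and not taken from the notes, for which $\frac{\partial A_{ij}}{\partial B_{kl}} = C_{ik}$ if $j = l$ and $0$ otherwise. The helper names and the row-major flattening order are also assumptions.

```python
def gradient_tensor(C, p, q):
    """J[i][j][k][l] = dA_ij / dB_kl for A = C B, with C of shape m x p, B of shape p x q."""
    m = len(C)
    return [[[[C[i][k] if j == l else 0.0 for l in range(q)]
              for k in range(p)]
             for j in range(q)]
            for i in range(m)]

def flatten_to_matrix(J):
    """Reshape a tensor of shape (m, n, p, q) into an (m*n) x (p*q) matrix, row-major."""
    m, n = len(J), len(J[0])
    p, q = len(J[0][0]), len(J[0][0][0])
    return [[J[i][j][k][l] for k in range(p) for l in range(q)]
            for i in range(m) for j in range(n)]

C = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]            # m = 3, p = 2
p, q = 2, 2                 # B is 2 x 2, so A = C B is 3 x 2

J = gradient_tensor(C, p, q)       # shape (3, 2, 2, 2)
J_flat = flatten_to_matrix(J)      # shape (3*2) x (2*2) = 6 x 4

print(len(J_flat), "x", len(J_flat[0]))
for row in J_flat:
    print(row)
```

Both objects hold the same numbers; the flattened form is often more convenient because ordinary matrix operations such as the chain rule as a matrix product apply to it directly.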
![Screen%20Shot%202020-11-21%20at%207.43.06%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%207.43.06%20AM.png)

![Screen%20Shot%202020-11-21%20at%208.05.53%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%208.05.53%20AM.png)

![Screen%20Shot%202020-11-21%20at%208.06.22%20AM.png](attachment:Screen%20Shot%202020-11-21%20at%208.06.22%20AM.png)