# Matrix differentiation

### The Jacobian matrix

Let $\psi:\mathbb{R}^n \rightarrow \mathbb{R}^m$ be a differentiable function, written componentwise as:

<br>

$$  
\psi(\boldsymbol{x}) 
=
\begin{bmatrix}
\psi_1(\boldsymbol{x}) \\
\vdots \\
\psi_m(\boldsymbol{x})
\end{bmatrix}
=
\begin{bmatrix}
\psi_1(x_1, \dots, x_n) \\
\vdots \\
\psi_m(x_1, \dots, x_n)
\end{bmatrix}
$$

<br>

The *Jacobian* of $\psi$ is the $m \times n$ matrix of its first-order partial derivatives:

<br>

$$
J_{\psi} = \frac{\partial \psi}{\partial \boldsymbol{x}} 
=
\begin{bmatrix}
\frac{\partial \psi_1}{\partial x_1} &\frac{\partial \psi_1}{\partial x_2} &\dots &\frac{\partial \psi_1}{\partial x_n} \\
\frac{\partial \psi_2}{\partial x_1} &\frac{\partial \psi_2}{\partial x_2} &\dots &\frac{\partial \psi_2}{\partial x_n} \\
\vdots &\vdots &\ddots &\vdots \\
\frac{\partial \psi_m}{\partial x_1} &\frac{\partial \psi_m}{\partial x_2} &\dots &\frac{\partial \psi_m}{\partial x_n}
\end{bmatrix}
$$

<br>
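
As a rough numerical check of this definition, the sketch below approximates each entry $\frac{\partial \psi_i}{\partial x_j}$ with central finite differences and compares the result with a Jacobian derived by hand. The example map `psi`, the helper `numerical_jacobian`, the step size `h`, and the evaluation point `x0` are illustrative assumptions, not part of the definition.

```python
def numerical_jacobian(psi, x, h=1e-6):
    """Approximate the m x n Jacobian of psi at x with central differences."""
    m = len(psi(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = psi(x_plus)
        f_minus = psi(x_minus)
        for i in range(m):
            # Entry (i, j) approximates d psi_i / d x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def psi(x):
    # Hypothetical example map from R^2 to R^3 (n = 2, m = 3).
    x1, x2 = x
    return [x1 * x2, x1 ** 2, 3.0 * x2]


def psi_jacobian(x):
    # Jacobian of the example, derived by hand: row i holds the partials of psi_i.
    x1, x2 = x
    return [[x2, x1], [2.0 * x1, 0.0], [0.0, 3.0]]


x0 = [1.5, -2.0]
J_num = numerical_jacobian(psi, x0)
J_exact = psi_jacobian(x0)
max_error = max(
    abs(J_num[i][j] - J_exact[i][j]) for i in range(3) for j in range(2)
)
print(len(J_num), len(J_num[0]))  # number of rows (m) and columns (n)
print(max_error)                  # small, limited by floating-point rounding
```

Row $i$ of the result collects the partial derivatives of $\psi_i$, and column $j$ collects the derivatives with respect to $x_j$, which matches the $m \times n$ layout above.

<br>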

Notice that if $\psi:\mathbb{R} \rightarrow \mathbb{R}^m$ (i.e., $n=1$), then the Jacobian is an $m \times 1$ matrix, i.e., a column vector.

<br>

$$
J_{\psi} = \frac{\partial \psi}{\partial x} 
=
\begin{bmatrix}
\frac{\partial \psi_1}{\partial x} \\
\frac{\partial \psi_2}{\partial x} \\
\vdots \\
\frac{\partial \psi_m}{\partial x}
\end{bmatrix}
$$

<br>

On the other hand, if $\psi:\mathbb{R}^n \rightarrow \mathbb{R}$ (i.e., $m=1$), then the Jacobian is a $1 \times n$ matrix, i.e., a row vector.

<br>

$$
J_{\psi} = \frac{\partial \psi}{\partial \boldsymbol{x}} 
=
\begin{bmatrix}
\frac{\partial \psi}{\partial x_1} &\frac{\partial \psi}{\partial x_2} &\dots &\frac{\partial \psi}{\partial x_n}
\end{bmatrix}
$$

<br>

When $\psi:\mathbb{R}^n \rightarrow \mathbb{R}$, the transpose of the row vector $J_{\psi}$ is called the *gradient* of $\psi$ and denoted by $\nabla \psi$.

<br>
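
The relation between the $1 \times n$ Jacobian and the gradient can be illustrated with a small sketch. It assumes a hypothetical scalar function `psi` of three variables and a finite-difference helper `numerical_jacobian`; neither comes from the text above, and the step size `h` is an arbitrary choice.

```python
def numerical_jacobian(psi, x, h=1e-6):
    """Approximate the m x n Jacobian of psi at x with central differences."""
    m = len(psi(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = psi(x_plus)
        f_minus = psi(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def psi(x):
    # Hypothetical scalar function R^3 -> R, returned as a length-1 list (m = 1).
    x1, x2, x3 = x
    return [x1 ** 2 + 3.0 * x2 - x1 * x3]


x0 = [1.0, 2.0, 3.0]
J = numerical_jacobian(psi, x0)           # 1 x 3 row vector
gradient = [[J[0][j]] for j in range(3)]  # transpose: 3 x 1 column vector

# Gradient derived by hand: [2*x1 - x3, 3, -x1] evaluated at x0.
exact = [2.0 * x0[0] - x0[2], 3.0, -x0[0]]
max_error = max(abs(gradient[j][0] - exact[j]) for j in range(3))
print(J)
print(gradient)
print(max_error)
```

The Jacobian `J` has one row and three columns, while `gradient` stores the same numbers as a column.

<br>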


### The Jacobian of the dot product

### The Jacobian of a linear form

Let $\boldsymbol{y} = A \boldsymbol{x}$ be a linear form with $\boldsymbol{y} \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ and $\boldsymbol{x} \in \mathbb{R}^n$, or equivalently:

<br>

$$  
\boldsymbol{y}
= 
A\boldsymbol{x} 
=
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 +\dots + a_{mn}x_n 
\end{bmatrix}
=
\begin{bmatrix}
y_1(x_1, \dots, x_n) \\
\vdots \\
y_m(x_1, \dots, x_n)
\end{bmatrix}
$$

<br>

From the definition of the Jacobian, $J_{ij} = \frac{\partial y_i}{\partial x_j}$. Since $y_i = \sum_{k=1}^{n} a_{ik}x_k$, each entry is $\frac{\partial y_i}{\partial x_j} = a_{ij}$, and hence:

<br>

$$
\frac{\partial A\boldsymbol{x}}{\partial \boldsymbol{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} &\frac{\partial y_1}{\partial x_2} &\dots &\frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} &\frac{\partial y_2}{\partial x_2} &\dots &\frac{\partial y_2}{\partial x_n} \\
\vdots &\vdots &\ddots &\vdots \\
\frac{\partial y_m}{\partial x_1} &\frac{\partial y_m}{\partial x_2} &\dots &\frac{\partial y_m}{\partial x_n}
\end{bmatrix}
= A
$$

<br>

**Takeaway:** $\frac{\partial }{\partial \boldsymbol{x}} A \boldsymbol{x} = A$.

<br>
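
The takeaway can be sanity-checked numerically. The sketch below differentiates $\boldsymbol{x} \mapsto A\boldsymbol{x}$ with central finite differences and compares the result with $A$. The matrix `A`, the point `x0`, and the helpers `matvec` and `numerical_jacobian` are made-up illustrations.

```python
def numerical_jacobian(psi, x, h=1e-6):
    """Approximate the m x n Jacobian of psi at x with central differences."""
    m = len(psi(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = psi(x_plus)
        f_minus = psi(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matvec(A, x):
    # y_i = sum_j a_ij * x_j
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical 2 x 3 matrix (m = 2, n = 3) and evaluation point.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x0 = [0.3, -1.2, 2.0]

J = numerical_jacobian(lambda x: matvec(A, x), x0)
max_error = max(abs(J[i][j] - A[i][j]) for i in range(2) for j in range(3))
print(J)
print(max_error)  # small: the Jacobian of A x reproduces A up to rounding
```

Because the map is linear, the result does not depend on the choice of `x0`.

<br>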

### The Jacobian of Af(**x**)

Let $\boldsymbol{y}(\boldsymbol{x}) = A f(\boldsymbol{x})$, where $A \in \mathbb{R}^{m \times n}$ is a constant matrix and $f:\mathbb{R}^l \rightarrow \mathbb{R}^n$ is differentiable, or equivalently:

<br>

$$  
Af(\boldsymbol{x})
=
\begin{bmatrix}
a_{11} &a_{12} & \dots  & a_{1n} \\
a_{21} &a_{22} & \dots  & a_{2n} \\
\vdots &\vdots &\ddots & \vdots \\
a_{m1} &a_{m2} & \dots  & a_{mn} 
\end{bmatrix}
\begin{bmatrix}
f_1(x_1, \dots, x_l) \\ 
f_2(x_1, \dots, x_l) \\ 
\vdots \\
f_n(x_1, \dots, x_l)
\end{bmatrix}
=
\begin{bmatrix}
\sum_{i=1}^{n} a_{1i}f_i \\ 
\sum_{i=1}^{n} a_{2i}f_i \\
\vdots \\
\sum_{i=1}^{n} a_{mi}f_i
\end{bmatrix}
=
\begin{bmatrix}
y_1(x_1, \dots, x_l) \\
\vdots \\
y_m(x_1, \dots, x_l)
\end{bmatrix}
$$

<br>

From the definition of the Jacobian matrix, and since $A$ does not depend on $\boldsymbol{x}$:

<br>

$$
J(\boldsymbol{y}(\boldsymbol{x}) ) = \frac{\partial \boldsymbol{y}(\boldsymbol{x}) }{\partial \boldsymbol{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} &\frac{\partial y_1}{\partial x_2} &\dots &\frac{\partial y_1}{\partial x_l} \\
\frac{\partial y_2}{\partial x_1} &\frac{\partial y_2}{\partial x_2} &\dots &\frac{\partial y_2}{\partial x_l} \\
\vdots &\vdots &\ddots &\vdots \\
\frac{\partial y_m}{\partial x_1} &\frac{\partial y_m}{\partial x_2} &\dots &\frac{\partial y_m}{\partial x_l}
\end{bmatrix}
=
\begin{bmatrix}
\sum_{i=1}^{n} a_{1i}\frac{\partial f_i}{\partial x_1} &\sum_{i=1}^{n} a_{1i}\frac{\partial f_i}{\partial x_2} &\dots &\sum_{i=1}^{n} a_{1i}\frac{\partial f_i}{\partial x_l} \\
\sum_{i=1}^{n} a_{2i}\frac{\partial f_i}{\partial x_1} &\sum_{i=1}^{n} a_{2i}\frac{\partial f_i}{\partial x_2} &\dots &\sum_{i=1}^{n} a_{2i}\frac{\partial f_i}{\partial x_l} \\
\vdots &\vdots &\ddots &\vdots \\
\sum_{i=1}^{n} a_{mi}\frac{\partial f_i}{\partial x_1} &\sum_{i=1}^{n} a_{mi}\frac{\partial f_i}{\partial x_2} &\dots &\sum_{i=1}^{n} a_{mi}\frac{\partial f_i}{\partial x_l}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} &a_{12} & \dots  & a_{1n} \\
a_{21} &a_{22} & \dots  & a_{2n} \\
\vdots &\vdots &\ddots & \vdots \\
a_{m1} &a_{m2} & \dots  & a_{mn} 
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} &\frac{\partial f_1}{\partial x_2} &\dots &\frac{\partial f_1}{\partial x_l} \\
\frac{\partial f_2}{\partial x_1} &\frac{\partial f_2}{\partial x_2} &\dots &\frac{\partial f_2}{\partial x_l} \\
\vdots &\vdots &\ddots &\vdots \\
\frac{\partial f_n}{\partial x_1} &\frac{\partial f_n}{\partial x_2} &\dots &\frac{\partial f_n}{\partial x_l}
\end{bmatrix}
= A J(f(\boldsymbol{x})) = A \frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} \in \mathbb{R}^{m \times l}
$$

<br>

**Takeaway:** $\frac{\partial }{\partial \boldsymbol{x}} A f(\boldsymbol{x}) = A \frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}$.

<br>
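
A small numerical sketch can illustrate this identity and the shapes involved. It assumes a hypothetical $f:\mathbb{R}^2 \rightarrow \mathbb{R}^3$ (so $l = 2$, $n = 3$) with a hand-derived Jacobian `f_jacobian`, and a made-up $2 \times 3$ matrix `A` (so $m = 2$). The helpers `matvec`, `matmul`, and `numerical_jacobian` are illustrative as well.

```python
import math


def numerical_jacobian(psi, x, h=1e-6):
    """Approximate the Jacobian of psi at x with central differences."""
    m = len(psi(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = psi(x_plus)
        f_minus = psi(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    # (A B)_ij = sum_k a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def f(x):
    # Hypothetical nonlinear map R^2 -> R^3.
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]


def f_jacobian(x):
    # 3 x 2 Jacobian of f, derived by hand.
    x1, x2 = x
    return [[x2, x1], [math.cos(x1), 0.0], [0.0, 2.0 * x2]]


A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]
x0 = [0.7, -1.3]

lhs = numerical_jacobian(lambda x: matvec(A, f(x)), x0)  # Jacobian of A f(x)
rhs = matmul(A, f_jacobian(x0))                          # A times Jacobian of f
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(len(lhs), len(lhs[0]))  # m rows and l columns
print(max_diff)               # small, up to finite-difference error
```

Both sides are $m \times l$ matrices, here $2 \times 2$, which is consistent with multiplying the $m \times n$ matrix $A$ by the $n \times l$ Jacobian of $f$.

<br>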