# Matrix Differentiation Rules

It is often convenient to express derivatives in matrix form, because the weights of a neural network can then be updated with vectorized operations. Vectorized operations in libraries such as `numpy` are typically much faster than the equivalent Python loops. In this notebook, we summarize a few useful rules for matrix differentiation.

:::{#def-m-from-vec-n}
Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ be a function that maps an $n$-dimensional vector to an $m$-dimensional vector. Then the derivative of $f$ with respect to the vector $x \in \mathbb{R}^{n}$ is an $m \times n$ matrix, called the Jacobian matrix of $f$, given by

$$
\frac{\partial f}{\partial x} = \begin{pmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{pmatrix}
$$
:::
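A quick way to check the $m \times n$ shape convention is to approximate the Jacobian numerically. The sketch below uses central finite differences; `numerical_jacobian` is a hypothetical helper written for illustration (not a `numpy` function), and it assumes $x$ is a 1-D array. Column $j$ of the result collects $\partial f_i / \partial x_j$ for all $i$, exactly as in the definition.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Hypothetical helper: approximate the m x n Jacobian of f: R^n -> R^m at a 1-D x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    m, n = fx.size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # Column j holds the partial derivatives df_i/dx_j for every output i
        diff = np.asarray(f(x + step), dtype=float) - np.asarray(f(x - step), dtype=float)
        J[:, j] = diff.ravel() / (2 * eps)
    return J

# For a linear map f(x) = A x, the Jacobian is exactly A, with shape m x n.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # m = 2 outputs, n = 3 inputs
x0 = np.array([0.5, -1.0, 2.0])
J = numerical_jacobian(lambda x: A @ x, x0)
print(J.shape)            # (2, 3)
print(np.allclose(J, A))  # True
```

Finite differences are only an approximation, but for checking shapes and hand-derived formulas they are usually sufficient.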

:::{#def-matrix-to-scalar}

Let $f: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$ be a function that maps a matrix to a scalar (e.g. a loss function). Then the derivative of $f$ with respect to a matrix $W$ is a matrix of the same shape as $W$ and is given by

$$
\frac{\partial f}{\partial W} = \begin{pmatrix}
\frac{\partial f}{\partial W_{11}} & \cdots & \frac{\partial f}{\partial W_{1n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial f}{\partial W_{m1}} & \cdots & \frac{\partial f}{\partial W_{mn}}
\end{pmatrix}
$$
:::
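As an illustration (an example chosen here, not taken from the text above), consider $f(W) = a^\top W b$ for fixed vectors $a \in \mathbb{R}^{m}$ and $b \in \mathbb{R}^{n}$. Since $f = \sum_{i,j} a_i W_{ij} b_j$, each entry is $\partial f / \partial W_{ij} = a_i b_j$, so $\partial f / \partial W = a b^\top$, an $m \times n$ matrix like $W$. The sketch below checks this numerically; `numerical_grad_matrix` is a hypothetical helper, not a library function.

```python
import numpy as np

def numerical_grad_matrix(f, W, eps=1e-6):
    """Hypothetical helper: approximate df/dW for scalar-valued f; result has W's shape."""
    W = np.asarray(W, dtype=float)
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        # Entry (i, j) of the gradient is the partial derivative df/dW_ij
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 3, 4
W = rng.normal(size=(m, n))
a = rng.normal(size=m)
b = rng.normal(size=n)

f = lambda M: a @ M @ b  # scalar a^T M b
G = numerical_grad_matrix(f, W)
print(G.shape)                          # (3, 4), same as W
print(np.allclose(G, np.outer(a, b)))   # True: df/dW = a b^T
```

In vectorized code, the closed form `np.outer(a, b)` replaces the double loop over entries, which is the practical payoff of writing the derivative in matrix form.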

As an example, let's look at a function $f: \mathbb{R}^{1} \rightarrow \mathbb{R}^{2}$ that maps a scalar to a 2-dimensional vector. The function is defined as

$$
f(x) = \begin{pmatrix} x^2 \\ x^3 \end{pmatrix}
$$

Here $m = 2$ and $n = 1$, so the Jacobian matrix of $f$ is a $2 \times 1$ column vector:

$$
\frac{\partial f}{\partial x} = \begin{pmatrix}
\frac{\partial f_1}{\partial x} \\
\frac{\partial f_2}{\partial x}
\end{pmatrix} = \begin{pmatrix}
2x \\
3x^2
\end{pmatrix}
$$
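Following the $m \times n$ convention of the definition, the Jacobian of this example has one row per output ($f_1$, $f_2$) and one column for the single input $x$. A minimal numerical check at the arbitrarily chosen point $x = 2$, where the analytic result is $(4, 12)^\top$, might look like this (the function names are illustrative):

```python
import numpy as np

def f(x):
    # x is a length-1 array holding the scalar input
    return np.array([x[0] ** 2, x[0] ** 3])

def jacobian_analytic(x):
    # 2 x 1 matrix: rows index f_1, f_2; the single column indexes x
    return np.array([[2 * x[0]], [3 * x[0] ** 2]])

x0 = np.array([2.0])
eps = 1e-6
# Central difference along the only input direction gives the single column
J_num = ((f(x0 + eps) - f(x0 - eps)) / (2 * eps)).reshape(2, 1)
J = jacobian_analytic(x0)
print(J.shape)                # (2, 1)
print(np.allclose(J_num, J))  # True
```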

