# Matrix Calculus

Below are the basic concepts of matrix calculus, which extends the concepts of differential and integral calculus to higher-dimensional spaces.

## Gradient 

Consider a multivariate function $f$ that takes multiple inputs (arranged in a matrix $A\in\mathbb{R}^{m\times n}$) and returns a scalar output $s\in\mathbb{R}$, so that $f:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}$.

The gradient of the function $f$ with respect to its input  $A\in\mathbb{R}^{m\times n}$ is the matrix of partial derivatives defined as:

$$\nabla_{A}f\left(A\right)=\begin{bmatrix}\frac{\partial f\left(A\right)}{\partial A_{1,1}} & \frac{\partial f\left(A\right)}{\partial A_{1,2}} & \ldots & \frac{\partial f\left(A\right)}{\partial A_{1,n}}\\
\frac{\partial f\left(A\right)}{\partial A_{2,1}} & \frac{\partial f\left(A\right)}{\partial A_{2,2}} & \ldots & \frac{\partial f\left(A\right)}{\partial A_{2,n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f\left(A\right)}{\partial A_{m,1}} & \frac{\partial f\left(A\right)}{\partial A_{m,2}} & \ldots & \frac{\partial f\left(A\right)}{\partial A_{m,n}}
\end{bmatrix}\in\mathbb{R}^{m\times n}$$

In compact notation, each entry is given by:

$$\left(\nabla_{A}f\left(A\right)\right)_{i,j}=\frac{\partial f\left(A\right)}{\partial A_{i,j}}$$

In particular, when the input is a vector $\vec{x}\in\mathbb{R}^{n}$, the gradient is the column vector:

$$\nabla_{\vec{x}}f\left(\vec{x}\right)=\begin{bmatrix}\frac{\partial f\left(\vec{x}\right)}{\partial x_{1}}\\
\vdots\\
\frac{\partial f\left(\vec{x}\right)}{\partial x_{n}}
\end{bmatrix}.$$
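The entrywise definition above can be approximated numerically with central differences. The following is a minimal sketch, not a production implementation: the helper `numerical_gradient` and the test function `sum_of_squares` are illustrative names that do not come from the text, and the step size `h` is an assumption.

```python
import numpy as np

def numerical_gradient(f, A, h=1e-5):
    """Approximate the gradient of a scalar function f at A by central differences.

    The result has the same shape as A; entry (i, j) approximates df/dA[i, j].
    """
    A = np.asarray(A, dtype=float)
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        step = np.zeros_like(A)
        step[idx] = h  # perturb a single entry of the input
        grad[idx] = (f(A + step) - f(A - step)) / (2 * h)
    return grad

# Illustrative test function: the sum of squared entries, whose gradient is 2 A.
def sum_of_squares(A):
    return np.sum(A ** 2)

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
grad_A = numerical_gradient(sum_of_squares, A)

print(grad_A.shape)                # (2, 3), the same shape as A
print(np.allclose(grad_A, 2 * A))  # comparison with the analytical gradient 2 A
```

Because the helper iterates over `A.shape`, the same function also handles a vector input of shape `(n,)`.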

In [None]:
#Exercise 1

It is important to note that the gradient **is only defined if the function returns a scalar**. This means, for example, that it is not possible to take the gradient of $A\,\vec{x}$, since the result of that matrix-vector product is a vector and not a scalar.

The gradient is also a linear operator, just like the partial derivative of a multivariable function, so it satisfies the properties of superposition (additivity) and homogeneity:

* $\nabla_{\vec{x}}\left(f\left(\vec{x}\right)+g\left(\vec{x}\right)\right)=\nabla_{\vec{x}}f\left(\vec{x}\right)+\nabla_{\vec{x}}g\left(\vec{x}\right)$

* For a scalar $s\in\mathbb{R}$, $\nabla_{\vec{x}}\left(s\,f\left(\vec{x}\right)\right)=s\,\nabla_{\vec{x}}f\left(\vec{x}\right)$
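Both properties can be checked numerically. The sketch below assumes two arbitrary example functions `f` and `g`, a scalar `s`, and a finite-difference helper `numerical_gradient`; none of these names come from the text, and they are chosen only for illustration.

```python
import numpy as np

def numerical_gradient(func, x, h=1e-5):
    """Central-difference approximation of the gradient of a scalar function."""
    x = np.asarray(x, dtype=float)
    return np.array([(func(x + h * e) - func(x - h * e)) / (2 * h)
                     for e in np.eye(x.size)])

# Two arbitrary scalar-valued example functions of a vector.
def f(x):
    return np.sum(x ** 2)

def g(x):
    return np.sum(np.sin(x))

s = 3.0
x = np.array([0.5, -1.0, 2.0])

# Superposition: the gradient of a sum equals the sum of the gradients.
grad_of_sum = numerical_gradient(lambda v: f(v) + g(v), x)
additive = np.allclose(grad_of_sum,
                       numerical_gradient(f, x) + numerical_gradient(g, x))

# Homogeneity: a scalar factor can be pulled out of the gradient.
grad_of_scaled = numerical_gradient(lambda v: s * f(v), x)
homogeneous = np.allclose(grad_of_scaled, s * numerical_gradient(f, x))

print(additive, homogeneous)
```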

In [None]:
#Exercise 2

An example of a multivariate function with a vector input is the function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$

$$f\left(\vec{z}\right)=\vec{z}^{T}\vec{z}=\sum_{i=1}^{n}z_{i}^{2}$$

which computes the dot product $\vec{z}\cdot\vec{z}$ of its input vector $\vec{z}=\begin{bmatrix}z_{1}\\
\vdots\\
z_{n}
\end{bmatrix}$ with itself.

Examining each of the $n$ partial derivatives $\frac{\partial f\left(\vec{z}\right)}{\partial z_{k}}$ (you can ignore the fact that the input is given as a vector and treat $f$ like any multivariable function), you have:

$\require{cancel}$

$$\frac{\partial f\left(\vec{z}\right)}{\partial z_{k}}=\cancelto{0}{\frac{\partial}{\partial z_{k}}z_{1}^{2}}+\cancelto{0}{\frac{\partial}{\partial z_{k}}z_{2}^{2}}+\ldots+\cancelto{2\,z_{k}}{\frac{\partial}{\partial z_{k}}z_{k}^{2}}+\ldots+\cancelto{0}{\frac{\partial}{\partial z_{k}}z_{n}^{2}}=2\,z_{k}.$$


Thus the gradient vector is given by:

$$\nabla_{\vec{z}}f\left(\vec{z}\right)=\begin{bmatrix}\frac{\partial f\left(\vec{z}\right)}{\partial z_{1}}\\
\vdots\\
\frac{\partial f\left(\vec{z}\right)}{\partial z_{n}}
\end{bmatrix}=\begin{bmatrix}2\,z_{1}\\
\vdots\\
2\,z_{n}
\end{bmatrix}=2\,\vec{z}.$$

This is the vector analogue of the derivative of a quadratic function of a single variable, $\frac{d}{dz}z^{2}=2\,z$:

$$\nabla_{\vec{z}}f\left(\vec{z}\right)=\nabla_{\vec{z}}\left(\vec{z}^{T}\vec{z}\right)=2\,\vec{z}.$$

In [1]:
import matplotlib.pyplot as plt
import numpy as np

# Sample f(x, y) = x^2 + y^2, the dot product of the 2-D vector (x, y) with itself.
h = 0.01  # grid spacing, reused below as the finite-difference step
X = np.arange(0, 2, h)
Y = np.arange(0, 2, h)
X, Y = np.meshgrid(X, Y)
Z = X**2 + Y**2

# Analytical gradient: (df/dx, df/dy) = (2x, 2y).
Xag = 2 * X
Yag = 2 * Y

# Numerical gradient. Rows of Z vary with y and columns with x,
# so np.gradient returns the derivative along y first.
gy, gx = np.gradient(Z, h)

# Create one figure with 3-D axes per surface.
fig = plt.figure()
fig2 = plt.figure()
fig3 = plt.figure()
ax = fig.add_subplot(projection='3d')
ax2 = fig2.add_subplot(projection='3d')
ax3 = fig3.add_subplot(projection='3d')

# Surface of f.
surf = ax.plot_surface(X, Y, Z)
ax.set_zlim(0, 5)

# Analytical df/dx.
surf2 = ax2.plot_surface(X, Y, Xag)
ax2.set_zlim(0, 5)

# Numerical df/dx, which should match the analytical surface.
surf3 = ax3.plot_surface(X, Y, gx)
ax3.set_zlim(0, 5)

What happens if the input of the function is first multiplied by a matrix $A\in\mathbb{R}^{m\times n}$, that is, what is $\nabla f\left(A\,\vec{x}\right)$ with $\vec{x}\in\mathbb{R}^{n}$?

Here $\nabla f\left(A\,\vec{x}\right)$ must be interpreted as the gradient of $f$ with respect to its argument $\vec{z}$, evaluated at the point $\vec{z}=A\,\vec{x}$. It is not the gradient with respect to $\vec{x}$, which would additionally require the chain rule. The gradient is then given by:
$$\nabla f\left(A\,\vec{x}\right)=\nabla_{\vec{z}}\left(\vec{z}^{T}\vec{z}\right)\Big|_{\vec{z}=A\,\vec{x}}=2\left(A\,\vec{x}\right)=2\,A\,\vec{x}\in\mathbb{R}^{m}$$
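The result above lives in $\mathbb{R}^{m}$ because it is the gradient with respect to $\vec{z}$ evaluated at $\vec{z}=A\,\vec{x}$. Differentiating the composite function with respect to $\vec{x}$ instead gives, by the chain rule, $2\,A^{T}A\,\vec{x}\in\mathbb{R}^{n}$. The sketch below contrasts the two under assumed random example data; the variable names and the finite-difference step are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))  # m = 3, n = 2 (arbitrary example sizes)
x = rng.normal(size=2)

def f(z):
    """f(z) = z^T z."""
    return z @ z

# Gradient of f with respect to z, evaluated at z = A x (a vector in R^m).
grad_z_at_Ax = 2 * (A @ x)

# Gradient of the composite x -> f(A x) with respect to x (a vector in R^n).
grad_x = 2 * A.T @ (A @ x)

# Central-difference check of the gradient with respect to x.
h = 1e-5
numeric_grad_x = np.array([(f(A @ (x + h * e)) - f(A @ (x - h * e))) / (2 * h)
                           for e in np.eye(x.size)])

print(grad_z_at_Ax.shape, grad_x.shape)     # (3,) (2,)
print(np.allclose(numeric_grad_x, grad_x))
```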

Consider next a linear function that receives a vector $\vec{x}\in\mathbb{R}^{n}$ as input, with a known (constant) vector $\vec{b}\in\mathbb{R}^{n}$:

$$f\left(\vec{x}\right)=\vec{b}^{T}\vec{x}=\sum_{i=1}^{n}b_{i}x_{i}$$

Its partial derivative with respect to $x_{k}$ is then given by:

$$\frac{\partial f\left(\vec{x}\right)}{\partial x_{k}}=\frac{\partial}{\partial x_{k}}\sum_{i=1}^{n}b_{i}x_{i}=b_{k} \enspace .$$

Therefore:
$$\nabla_{\vec{x}}\left(\vec{b}^{T}\vec{x}\right)=\vec{b}$$
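As a quick sanity check, the gradient of the linear function can be approximated by central differences and compared with $\vec{b}$. The vectors `b` and `x` below are arbitrary example values and not part of the text.

```python
import numpy as np

b = np.array([2.0, -1.0, 0.5])  # known vector (example values)
x = np.array([1.0, 3.0, -2.0])  # point at which the gradient is evaluated

def linear(v):
    """f(v) = b^T v."""
    return b @ v

h = 1e-5
numeric_grad = np.array([(linear(x + h * e) - linear(x - h * e)) / (2 * h)
                         for e in np.eye(x.size)])

print(np.allclose(numeric_grad, b))
```

The gradient does not depend on the evaluation point `x`, which is consistent with $\nabla_{\vec{x}}\left(\vec{b}^{T}\vec{x}\right)=\vec{b}$ being constant.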

In [None]:
#Exercise 4

Now consider the quadratic form with a square matrix $A\in\mathbb{R}^{n\times n}$ (which, as already seen, results in a scalar):

$$f\left(\vec{x}\right)=\vec{x}^{T}\,A\,\vec{x}=\vec{x}^{T}\,\begin{bmatrix}- & \vec{a}_{1,:}^{T} & -\\
- & \vec{a}_{2,:}^{T} & -\\
 & \vdots\\
- & \vec{a}_{n,:}^{T} & -
\end{bmatrix}\,\begin{bmatrix}x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}=\begin{bmatrix}x_{1} & \ldots & x_{n}\end{bmatrix}\,\begin{bmatrix}\vec{a}_{1,:}^{T}\:\vec{x}\\
\vec{a}_{2,:}^{T}\:\vec{x}\\
\vdots\\
\vec{a}_{n,:}^{T}\:\vec{x}
\end{bmatrix}=\sum_{i=1}^{n}\sum_{j=1}^{n}A_{i,j}x_{i}x_{j}$$

To calculate the partial derivative $\frac{\partial f\left(\vec{x}\right)}{\partial x_{k}}$ for each component $x_{k}$ of the input vector $\vec{x}$, split the double sum into four groups of terms: those in which neither the row index $i$ nor the column index $j$ equals $k$, those in which only the column index equals $k$, those in which only the row index equals $k$, and the single term in which both indices equal $k$:

$$\frac{\partial f\left(\vec{x}\right)}{\partial x_{k}}=\frac{\partial}{\partial x_{k}}\sum_{i=1}^{n}\sum_{j=1}^{n}A_{i,j}x_{i}x_{j}$$

$$\Rightarrow\frac{\partial f\left(\vec{x}\right)}{\partial x_{k}}=\frac{\partial}{\partial x_{k}}\left[\sum_{i\neq k}^{n}\sum_{j\neq k}^{n}A_{i,j}x_{i}x_{j}+\sum_{i\neq k}^{n}A_{i,k}x_{i}x_{k}+\sum_{j\neq k}^{n}A_{k,j}x_{k}x_{j}+A_{k,k}x_{k}^{2}\right]$$

$$\Rightarrow\frac{\partial f\left(\vec{x}\right)}{\partial x_{k}}=\sum_{i\neq k}^{n}A_{i,k}x_{i}+\sum_{j\neq k}^{n}A_{k,j}x_{j}+2A_{k,k}x_{k}=\sum_{i=1}^{n}A_{i,k}x_{i}+\sum_{j=1}^{n}A_{k,j}x_{j}$$

For a general $A$, this is the $k$-th entry of $\left(A^{T}+A\right)\vec{x}$. Assuming that the matrix $A$ of the quadratic form is symmetric, that is, $A=A^{T}\Rightarrow A_{i,j}=A_{j,i}$, we have:

$$\Rightarrow\frac{\partial f\left(\vec{x}\right)}{\partial x_{k}}=\sum_{i=1}^{n}A_{i,k}x_{i}+\sum_{j=1}^{n}A_{k,j}x_{j}=2\sum_{i=1}^{n}A_{k,i}x_{i}.$$

Hence the gradient of the quadratic form with symmetric $A$ is given by:

$$\nabla_{\vec{x}}\left(\vec{x}^{T}\,A\,\vec{x}\right)=2\,A\,\vec{x}.$$
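The role of the symmetry assumption can be illustrated numerically. The sketch below uses assumed random matrices `M` (not symmetric) and `S` (symmetric), plus a finite-difference helper; all names are illustrative. For the symmetric matrix the gradient matches $2\,A\,\vec{x}$, while for the general matrix it matches the sum of the two terms obtained before symmetry was imposed, $\left(A^{T}+A\right)\vec{x}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.normal(size=(n, n))  # general matrix, not symmetric
S = M + M.T                  # symmetric matrix built from M
x = rng.normal(size=n)

def quadratic_form(A, v):
    """f(v) = v^T A v."""
    return v @ A @ v

def numerical_gradient(A, v, h=1e-5):
    """Central-difference gradient of the quadratic form with respect to v."""
    return np.array([(quadratic_form(A, v + h * e)
                      - quadratic_form(A, v - h * e)) / (2 * h)
                     for e in np.eye(v.size)])

# Symmetric case: the gradient is 2 A x.
symmetric_ok = np.allclose(numerical_gradient(S, x), 2 * S @ x)

# General case: the gradient is (A^T + A) x.
general_ok = np.allclose(numerical_gradient(M, x), (M.T + M) @ x)

print(symmetric_ok, general_ok)
```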

In [None]:
#Exercise 5

In summary, the following gradients were derived:

* $\nabla_{\vec{x}}\left(\vec{x}^{T}\vec{x}\right)=2\,\vec{x}$

* $\nabla_{\vec{z}}\left(\vec{z}^{T}\vec{z}\right)\Big|_{\vec{z}=A\,\vec{x}}=2\,A\,\vec{x}$ (gradient with respect to $\vec{z}$, evaluated at $\vec{z}=A\,\vec{x}$)

* $\nabla_{\vec{x}}\left(\vec{b}^{T}\vec{x}\right)=\vec{b}$

* $\nabla_{\vec{x}}\left(\vec{x}^{T}\,A\,\vec{x}\right)=2\,A\,\vec{x}$ for symmetric $A$