
## The determinant

We saw in [2.8](https://hadrienj.github.io/posts/Deep-Learning-Book-Series-2.8-Singular-Value-Decomposition/) that a matrix can be seen as a linear transformation of the space. The determinant of a matrix ${A}$ is a number corresponding to the *multiplicative change* in volume (area, in 2D) you get when you transform your space with this matrix (see a comment by Pete L. Clark in [this SE question](https://math.stackexchange.com/questions/668/whats-an-intuitive-way-to-think-about-the-determinant)). A negative determinant means that there is a change in orientation (and not just a rescaling and/or a rotation). As outlined by Nykamp DQ on [Math Insight](https://mathinsight.org/determinant_linear_transformation), a change in orientation in 2D means, for instance, that we lift the plane out of its two dimensions, flip it over, and put it back into the initial 2D space. Here is an example distinguishing between a positive and a negative determinant:

<img src="https://github.com/rahiakela/mathematics-for-machine-learning/blob/main/deep-learning-book-maths/images/positive-negative-determinant.png?raw=1" width="400" alt="Comparison of positive and negative determinant" title="Comparison of the effect of positive and negative determinants">
<em>The determinant of a matrix can tell you a lot of things about the transformation associated with this matrix</em>

You can see that the second transformation cannot be obtained through rotation and rescaling alone. Thus the sign of the determinant tells you about the nature of the transformation associated with the matrix!
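To make the sign concrete, here is a small sketch using `np.linalg.det` on three 2×2 matrices chosen for illustration (they are not the matrices from the figure above): a rotation by 90°, a rescaling, and a reflection across the x-axis. Only the reflection, which flips orientation, should have a negative determinant.

```python
import numpy as np

# Illustrative matrices (not taken from the figure above)
rotation = np.array([[0, -1],
                     [1, 0]])      # rotation by 90 degrees
rescaling = np.array([[2, 0],
                      [0, 3]])     # stretch x by 2 and y by 3
reflection = np.array([[1, 0],
                       [0, -1]])   # mirror across the x-axis

for name, M in [('rotation', rotation),
                ('rescaling', rescaling),
                ('reflection', reflection)]:
    d = np.linalg.det(M)
    # A negative determinant signals a change in orientation
    orientation = 'flipped' if d < 0 else 'preserved'
    print(f'{name}: det = {d:.1f}, orientation {orientation}')
```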

In addition, the determinant also gives you the *amount* of transformation. If you take the *n*-dimensional unit cube and apply the matrix ${A}$ to it, the absolute value of the determinant corresponds to the volume of the transformed figure (its area, in 2D). You might believe me more easily after the following example.
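Before the worked example, a quick numerical check of this claim, as a sketch: map the four corners of the unit square through a 2×2 matrix (chosen here for illustration), compute the area of the resulting parallelogram with the shoelace formula, and compare it with the absolute value of `np.linalg.det`. The helper `polygon_area` is our own, not a NumPy function.

```python
import numpy as np

def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Corners of the unit square, listed counterclockwise (one corner per row)
unit_square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])

# Illustrative transformation matrix, det = 2*3 - 1*(-1) = 7
M = np.array([[2, 1],
              [-1, 3]])

# Apply M to every corner: each row v becomes M @ v
transformed = unit_square @ M.T

area = polygon_area(transformed)
det = np.linalg.det(M)
print(area, abs(det))
```

The transformed corners are $(0, 0)$, $(2, -1)$, $(3, 2)$ and $(1, 3)$, and both numbers come out as $7$ (up to floating-point rounding for the determinant).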

### Example 1.

To calculate the area of the shapes, we will use simple squares in 2 dimensions. The unit square is spanned by the two unit vectors $i$ and $j$, so its area is the product of their lengths.

<img src="https://github.com/rahiakela/mathematics-for-machine-learning/blob/main/deep-learning-book-maths/images/unit-square-area.png?raw=1" width="300" alt="Illustration of the unit square area and the unit vectors in two dimensions" title="The unit square area">
<em>The unit square area</em>

Both $i$ and $j$ have length $1$, so the area of the unit square is $1 \times 1 = 1$.

Let's start by creating both vectors in Python:
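A minimal sketch, assuming only NumPy: define $i$ and $j$ as arrays, place them side by side as the columns of a matrix (which gives the 2×2 identity), and check that the absolute value of its determinant, hence the area of the unit square, is 1.

```python
import numpy as np

# The two unit vectors spanning the unit square
i = np.array([1, 0])
j = np.array([0, 1])

# Stack them as columns: the result is the 2x2 identity matrix
unit_vectors = np.column_stack([i, j])

# The area spanned by i and j is the absolute value of the determinant
area = abs(np.linalg.det(unit_vectors))
print(unit_vectors)
print(area)
```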

## Setup

```python
import numpy as np

from PIL import Image

import matplotlib.pyplot as plt
import seaborn as sns
```

```python
# Plot parameters
sns.set()
%matplotlib inline
plt.rcParams['figure.figsize'] = (4, 4)
plt.rcParams['xtick.major.size'] = 0
plt.rcParams['ytick.major.size'] = 0
# Print floats in fixed-point notation so that tiny rounding errors
# (e.g. from inverse matrices in dot products) are displayed as 0
# See https://stackoverflow.com/questions/24537791/numpy-matrix-inversion-rounding-errors
np.set_printoptions(suppress=True)
```


```python
def plot_vectors(vecs, cols, alpha=1):
    """
    Plot a set of vectors.

    Parameters
    ----------
    vecs : array-like
        Coordinates of the vectors to plot. Each vector is in an array. For
        instance: [[1, 3], [2, 2]] can be used to plot 2 vectors.
    cols : array-like
        Colors of the vectors. For instance: ['red', 'blue'] will display the
        first vector in red and the second in blue.
    alpha : float or list of float
        Opacity of the vectors. A list gives one opacity per vector.

    Returns
    -------
    fig : instance of matplotlib.figure.Figure
        The figure of the vectors
    """
    # Draw the x and y axes in grey behind the vectors
    plt.axvline(x=0, color='#A9A9A9', zorder=0)
    plt.axhline(y=0, color='#A9A9A9', zorder=0)

    for i in range(len(vecs)):
        # Use a per-vector opacity if a list was given
        if isinstance(alpha, list):
            alpha_i = alpha[i]
        else:
            alpha_i = alpha
        # Each arrow starts at the origin (0, 0) and ends at the vector coordinates
        x = np.concatenate([[0, 0], vecs[i]])
        plt.quiver([x[0]], [x[1]], [x[2]], [x[3]],
                   angles='xy', scale_units='xy', scale=1, color=cols[i],
                   alpha=alpha_i)
    return plt.gcf()
```

## The trace

The trace of a square matrix is the sum of all the values on its main diagonal, $\mathrm{Tr}({A}) = \sum_i A_{i,i}$. For example, with:

$$
{A}=
\begin{bmatrix}
    2 & 9 & 8 \\\\
    4 & 7 & 1 \\\\
    8 & 2 & 5
\end{bmatrix}
$$

$$
\mathrm{Tr}({A}) = 2 + 7 + 5 = 14
$$

NumPy provides the function `np.trace()` to calculate it:

```python
A = np.array([
    [2, 9, 8],
    [4, 7, 1],
    [8, 2, 5]
])
A
```

```
array([[2, 9, 8],
       [4, 7, 1],
       [8, 2, 5]])
```

```python
A_trace = np.trace(A)
A_trace
```

```
14
```
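To connect the definition with `np.trace`, here is a sketch that sums the diagonal explicitly. The function name `trace_by_hand` is ours, for illustration; `np.diag(A).sum()` is a shorter NumPy equivalent.

```python
import numpy as np

def trace_by_hand(M):
    """Sum of the main-diagonal elements of a square matrix."""
    n_rows, n_cols = M.shape
    if n_rows != n_cols:
        raise ValueError('the trace is defined for square matrices only')
    total = 0
    for k in range(n_rows):
        total += M[k, k]
    return total

A = np.array([
    [2, 9, 8],
    [4, 7, 1],
    [8, 2, 5]
])

print(trace_by_hand(A), np.diag(A).sum(), np.trace(A))
```

All three give $14$. Note that `np.trace` itself does not reject non-square arrays: it sums the elements `A[k, k]` that exist. The explicit check above follows the mathematical definition instead.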

Goodfellow et al. explain that the trace can be used to express the Frobenius norm of a matrix (see [2.5](https://hadrienj.github.io/posts/Deep-Learning-Book-Series-2.5-Norms/)). The Frobenius norm is the matrix equivalent of the $L^2$ norm for vectors. It is defined by:

$$
||{{A}}||_F=\sqrt{\sum_{i,j}A^2_{i,j}}
$$

In words: square every element, sum the squares, and take the square root of the result. This norm can also be calculated with:

$$
||{{A}}||_F=\sqrt{\mathrm{Tr}({AA}^T)}
$$
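The second formula follows from the first in one step: the diagonal element $i$ of ${AA}^T$ is the dot product of row $i$ of ${A}$ with itself, so summing the diagonal sums the squares of every element:

$$
\mathrm{Tr}({AA}^T) = \sum_i ({AA}^T)_{i,i} = \sum_i \sum_j A_{i,j} (A^T)_{j,i} = \sum_i \sum_j A_{i,j}^2
$$

Taking the square root of both sides gives $||{A}||_F$.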

We can check this. The first formula (the square root of the sum of squares) is what `np.linalg.norm()` computes by default for a 2-D array:

```python
np.linalg.norm(A)
```

```
17.549928774784245
```

The Frobenius norm of ${A}$ is 17.549928774784245.

Using the trace formula gives the same result:

```python
np.sqrt(np.trace(A.dot(A.T)))
```

```
17.549928774784245
```
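For completeness, the definition itself can also be written directly with NumPy. This sketch compares it with the two computations above; the sum of the squared elements of ${A}$ is $308$, so all three values are $\sqrt{308}$.

```python
import numpy as np

A = np.array([
    [2, 9, 8],
    [4, 7, 1],
    [8, 2, 5]
])

# Definition: square every element, sum them, take the square root
frobenius_def = np.sqrt(np.sum(A ** 2))

# Built-in norm (Frobenius by default for a 2-D array)
frobenius_np = np.linalg.norm(A)

# Trace formula
frobenius_trace = np.sqrt(np.trace(A @ A.T))

print(frobenius_def, frobenius_np, frobenius_trace)
```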

Since transposing a matrix doesn't change its diagonal, the trace of a matrix is equal to the trace of its transpose:

$$
\mathrm{Tr}({A})=\mathrm{Tr}({A}^T)
$$

```python
assert np.trace(A) == np.trace(A.T)
```

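## Trace of a product

The trace of a product of matrices does not change when the factors are permuted cyclically:

$$
\mathrm{Tr}({ABC}) = \mathrm{Tr}({CAB}) = \mathrm{Tr}({BCA})
$$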
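One way to see why, starting with two matrices: the $(i, i)$ element of ${AB}$ is $\sum_j A_{i,j}B_{j,i}$, and swapping the order of the two sums gives the trace of ${BA}$:

$$
\mathrm{Tr}({AB}) = \sum_i \sum_j A_{i,j} B_{j,i} = \sum_j \sum_i B_{j,i} A_{i,j} = \mathrm{Tr}({BA})
$$

Applying this to the two factors ${AB}$ and ${C}$ gives $\mathrm{Tr}(({AB}){C}) = \mathrm{Tr}({C}({AB}))$, that is $\mathrm{Tr}({ABC}) = \mathrm{Tr}({CAB})$. Applying it to ${A}$ and ${BC}$ gives $\mathrm{Tr}({A}({BC})) = \mathrm{Tr}(({BC}){A})$, that is $\mathrm{Tr}({ABC}) = \mathrm{Tr}({BCA})$.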

### Example 1: Trace property

Let's see an example of this property.

$$
{A}=
\begin{bmatrix}
    4 & 12 \\\\
    7 & 6
\end{bmatrix}
$$

$$
{B}=
\begin{bmatrix}
    1 & -3 \\\\
    4 & 3
\end{bmatrix}
$$

$$
{C}=
\begin{bmatrix}
    6 & 6 \\\\
    2 & 5
\end{bmatrix}
$$

```python
A = np.array([
    [4, 12],
    [7, 6]
])

B = np.array([
    [1, -3],
    [4, 3]
])

C = np.array([
    [6, 6],
    [2, 5]
])
```

```python
np.trace(A.dot(B).dot(C))
```

```
531
```

```python
np.trace(C.dot(A).dot(B))
```

```
531
```

```python
np.trace(B.dot(C).dot(A))
```

```
531
```

$$
{ABC}=
\begin{bmatrix}
    360 & 432 \\\\
    180 & 171
\end{bmatrix}
$$

$$
{CAB}=
\begin{bmatrix}
    498 & 126 \\\\
    259 & 33
\end{bmatrix}
$$

$$
{BCA}=
\begin{bmatrix}
    -63 & -54 \\\\
    393 & 594
\end{bmatrix}
$$

$$
\mathrm{Tr}({ABC}) = \mathrm{Tr}({CAB}) = \mathrm{Tr}({BCA}) = 531
$$

```python
assert np.trace(A.dot(B).dot(C)) == np.trace(C.dot(A).dot(B)) == np.trace(B.dot(C).dot(A)) == 531
```
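Two further checks, as a sketch. First, the property holds for *cyclic* permutations only: swapping ${B}$ and ${C}$ (the non-cyclic order ${ACB}$) changes the trace for the matrices above. Second, the matrices do not need to be square as long as every product is defined. The random integer matrices `P`, `Q` and `R` and their shapes are our own choice for illustration; note that the three products even have different sizes.

```python
import numpy as np

A = np.array([[4, 12], [7, 6]])
B = np.array([[1, -3], [4, 3]])
C = np.array([[6, 6], [2, 5]])

# Non-cyclic order: the trace is different
print(np.trace(A @ C @ B))  # 438, not 531

# Non-square factors with compatible shapes (illustrative random integers)
rng = np.random.default_rng(0)
P = rng.integers(-5, 6, size=(2, 3))
Q = rng.integers(-5, 6, size=(3, 4))
R = rng.integers(-5, 6, size=(4, 2))

# PQR is 2x2, RPQ is 4x4 and QRP is 3x3, yet the three traces agree
print((P @ Q @ R).shape, (R @ P @ Q).shape, (Q @ R @ P).shape)
print(np.trace(P @ Q @ R), np.trace(R @ P @ Q), np.trace(Q @ R @ P))
```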

## References

[Trace (linear algebra) - Wikipedia](https://en.wikipedia.org/wiki/Trace_(linear_algebra))

[Numpy Trace operator](https://docs.scipy.org/doc/numpy/reference/generated/numpy.trace.html)