# 2.7. Inverses and Determinants

More on matrix-vector multiplication as being an operator that sends $\vec x$ to $A \vec x$, and what various $A$ do to $\vec x$.

Other topics:
- Invertible/singular
- Determinant


Orthogonal matrix: $Q^T Q = I$, so $Q^T = Q^{-1}$, and by definition, $Q$ is invertible.

### List of Equivalent Properties for Invertible Matrices

## Special Matrices

To wrap up this section, we'll outline a few types of matrices one should know about. Each of these types of matrices has unique properties that make it useful to us. It may take until Chapter 5 (on Eigenvalues and Eigenvectors) to see the full spectrum of their uses, but it's good to know about them now.



### Symmetric Matrices

:::{admonition} Definition: Symmetric Matrix

A matrix $S$ is **symmetric** if $S = S^T$.

Note that only square matrices can be symmetric; the transpose of an $n \times d$ matrix is a $d \times n$ matrix, and in order for a matrix to be equal to its transpose, we must have $n = d$.
:::

In the style of MIT's Gilbert Strang, I've used $S$ throughout this section to refer to symmetric matrices.

Just so we see one, here's an example of a symmetric matrix:

$$S = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$$

Symmetric matrices will appear with increasing regularity from here on out, as we start to move back to the study of linear regression. One key symmetric matrix that appears quite frequently is the matrix $X^T X$, where $X$ is **any** $n \times d$ matrix. That's right: even if $X$ isn't square, $X^T X$ is symmetric! We can verify that this is true by using the properties of the transpose:

$$(X^T X)^T = X^T (X^T)^T = X^T X$$

Since $X^T$ is $d \times n$ and $X$ is $n \times d$, $X^T X$ is $d \times d$. This also tells us that $XX^T$ is an $n \times n$ matrix, and it is also symmetric, as can be verified using the logic above.
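
To make this concrete, here's a minimal NumPy sketch that checks both claims numerically. The $4 \times 2$ matrix `X` is a made-up example (it doesn't come from any dataset in this book), and the names `XtX` and `XXt` are just labels I've picked for $X^TX$ and $XX^T$.

```python
import numpy as np

# A hypothetical 4 x 2 (non-square) matrix, chosen only for illustration.
X = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 4.0],
              [5.0, 0.5]])

XtX = X.T @ X   # (d x n)(n x d) -> d x d, here 2 x 2
XXt = X @ X.T   # (n x d)(d x n) -> n x n, here 4 x 4

# A matrix is symmetric exactly when it equals its own transpose.
XtX_is_symmetric = np.allclose(XtX, XtX.T)
XXt_is_symmetric = np.allclose(XXt, XXt.T)

print(XtX.shape, XtX_is_symmetric)
print(XXt.shape, XXt_is_symmetric)
```

A numerical check on one matrix isn't a proof, of course; the transpose argument above is what guarantees this for every $X$.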

What is $X^T X$? Let's denote the $j$th column of $X$ as $X_{:, j}$. (This is a bit of notational abuse, but I want to make sure to not use $\vec x_i$ to denote the $i$th column of $X$, as we've gotten used to referring to the $i$th observation of a dataset as $\vec x_i$.)

Then, we can write $X^T X$ as:

$$\begin{align*} X^TX &= \begin{bmatrix} - X_{:, 1}^T - \\ - X_{:, 2}^T - \\ \vdots \\ - X_{:, d}^T - \end{bmatrix} \begin{bmatrix} \mid & \mid & & \mid \\  X_{:, 1} &  X_{:, 2} & \ldots &  X_{:, d} \\ \mid & \mid & & \mid \end{bmatrix} \\ &= \begin{bmatrix}  X_{:, 1} \cdot  X_{:, 1} &  X_{:, 1} \cdot  X_{:, 2} & \ldots &  X_{:, 1} \cdot  X_{:, d} \\  X_{:, 2} \cdot  X_{:, 1} &  X_{:, 2} \cdot  X_{:, 2} & \ldots &  X_{:, 2} \cdot  X_{:, d} \\ \vdots & \vdots & \ddots & \vdots \\  X_{:, d} \cdot  X_{:, 1} &  X_{:, d} \cdot  X_{:, 2} & \ldots &  X_{:, d} \cdot  X_{:, d} \end{bmatrix}\end{align*}$$

This is a $d \times d$ matrix, where the entry at position $(i, j)$ (and $(j, i)$) is the dot product of the $i$th column of $X$ and the $j$th column of $X$. This matrix is sometimes called the **Gram matrix** of $X$.
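
If you'd like to see the "entries are dot products of columns" description in action, the sketch below builds the Gram matrix one entry at a time and compares it to `X.T @ X`. The matrix `X` here is hypothetical, and note that NumPy indexes columns starting from 0, while the text counts from 1.

```python
import numpy as np

# A hypothetical 4 x 3 matrix, chosen only for illustration.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0],
              [3.0, 4.0, 1.0],
              [5.0, 0.5, -2.0]])
n, d = X.shape

# Fill in entry (i, j) with the dot product of column i and column j of X.
gram_by_dots = np.empty((d, d))
for i in range(d):
    for j in range(d):
        gram_by_dots[i, j] = np.dot(X[:, i], X[:, j])

# The same matrix, computed with a single matrix multiplication.
gram_by_matmul = X.T @ X

entries_match = np.allclose(gram_by_dots, gram_by_matmul)
print(entries_match)
```

In practice you'd always use `X.T @ X`; the double loop is only there to mirror the entry-by-entry definition.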

Another use case for symmetric matrices is to define quadratic forms. A **quadratic form** is a polynomial in which all terms are of degree two.

- With one variable: $q(x) = ax^2$.
- With two variables: $q(x, y) = ax^2 + bxy + cy^2$.
- With three variables: $q(x, y, z) = ax^2 + bxy + cy^2 + dyz + ez^2 + fxz$.

I've used $x$, $y$, and $z$ to denote the variables above just to make the pattern clear, but in general, I'd use $x_1$, $x_2$, and so on.

It turns out that mean squared error, $R_\text{sq}$, can be expressed as a quadratic form – so we'd better study how they work. In terms of matrices and vectors, a quadratic form is a function of the form $$q(\vec x) = \vec x^T S \vec x$$ where $S$ is a symmetric matrix. For example, consider the symmetric matrix:

$$S = \begin{bmatrix} 2 & 3 \\ 3 & 7 \end{bmatrix}$$

If $\vec x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, then:

$$\begin{align*} q(\vec x) &= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 3 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ &= \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 + 7x_2 \end{bmatrix} \\ &= 2x_1^2 + 3x_1x_2 + 3x_2x_1 + 7x_2^2 \\ &= 2x_1^2 + 6x_1x_2 + 7x_2^2 \end{align*}$$

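Notice how we can read the coefficients of the quadratic form from the symmetric matrix $S$: the diagonal entries $2$ and $7$ are the coefficients of $x_1^2$ and $x_2^2$, and the two off-diagonal $3$s add up to the coefficient $6$ of $x_1 x_2$.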
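
As a sanity check on the expansion above, here's a small NumPy sketch that evaluates the quadratic form two ways: once as $\vec x^T S \vec x$ and once with the expanded polynomial. The function names `q_matrix` and `q_polynomial` and the test input are my own choices for illustration.

```python
import numpy as np

S = np.array([[2.0, 3.0],
              [3.0, 7.0]])

def q_matrix(x):
    """Evaluate the quadratic form x^T S x using matrix-vector products."""
    return x @ S @ x

def q_polynomial(x):
    """Evaluate the expanded form 2*x1^2 + 6*x1*x2 + 7*x2^2."""
    x1, x2 = x
    return 2 * x1**2 + 6 * x1 * x2 + 7 * x2**2

# An arbitrary input vector, chosen only for illustration.
x = np.array([1.0, -2.0])

print(q_matrix(x), q_polynomial(x))  # both evaluate to 18.0
```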

:::{tip} Activity
:class: dropdown

Consider the $5 \times 2$ matrix $X$, defined below.

$$X = \begin{bmatrix} 1 & -2 \\ -1 & 3 \\ 2 & 0 \\ 0 & -1 \\ 3 & 2 \end{bmatrix}$$

Compute $X^TX + \frac{1}{2} I$. We'll use matrices of the form $X^TX + \lambda I$ in Chapter 5.

Something about the Rayleigh quotient.
:::

### Diagonal Matrices

:::{admonition} Definition: Diagonal Matrix

A square matrix $D$ is **diagonal** if all entries off the diagonal are zero. In other words, $D_{ij} = 0$ for all $i \neq j$.

$$D = \begin{bmatrix} d_1 & 0 & \ldots & 0 \\ 0 & d_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & d_n \end{bmatrix}$$

:::

The identity matrix is a diagonal matrix. As another example, here's a $4 \times 4$ diagonal matrix:

$$D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -9 & 0 \\ 0 & 0 & 0 & \frac{1}{2} \end{bmatrix}$$

Diagonal matrices become useful when studying powers of matrices, like $A^2$ or $A^3$, which we'll need to do in Chapter 5 when we learn about how Google's search algorithm works. If $A$ is an arbitrary $n \times n$ matrix, then computing $A^2 = AA$ requires us to compute a full matrix multiplication. But, if $D$ is diagonal, then $D^2$ is just a diagonal matrix with each diagonal entry squared:

$$D^2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 81 & 0 \\ 0 & 0 & 0 & \frac{1}{4} \end{bmatrix}$$
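
Here's a minimal NumPy sketch of that shortcut, using the $4 \times 4$ matrix $D$ from above. It compares the "full" matrix product against simply raising the diagonal entries to a power; the variable names are mine.

```python
import numpy as np

# The diagonal entries of the 4 x 4 matrix D from the text.
d = np.array([1.0, 0.0, -9.0, 0.5])
D = np.diag(d)  # builds the full 4 x 4 diagonal matrix

# The long way: a full matrix multiplication.
D_squared_full = D @ D

# The shortcut: square each diagonal entry, then rebuild the matrix.
D_squared_shortcut = np.diag(d ** 2)

# The same idea works for higher powers, like D^3.
D_cubed_full = np.linalg.matrix_power(D, 3)
D_cubed_shortcut = np.diag(d ** 3)

print(np.allclose(D_squared_full, D_squared_shortcut))
print(np.allclose(D_cubed_full, D_cubed_shortcut))
```

The shortcut only touches $n$ numbers, whereas the full product works with all $n^2$ entries of each matrix.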

Unless otherwise specified, diagonal matrices are assumed to be square. But we'll occasionally see non-square diagonal matrices, like the one below:

$$D = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -9 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

This is a diagonal $5 \times 3$ matrix. All entries $D_{ij}$ where $i \neq j$ are zero. These will come up in Chapter 5 when we study the singular value decomposition, which will unlock a powerful unsupervised machine learning technique. (Aren't you excited for it?)

Related to the idea of a diagonal matrix is that of an upper triangular matrix. A square matrix $U$ is **upper triangular** if all entries below the diagonal are zero. In other words, $U_{ij} = 0$ for all $i > j$.

$$U = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$$

Similarly, a square matrix $L$ is **lower triangular** if all entries above the diagonal are zero, i.e. $L_{ij} = 0$ for all $i < j$.

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}$$
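
If you ever want to check these definitions in code, one possible approach (a sketch, not the only way) is to compare a matrix to NumPy's `np.triu` and `np.tril`, which keep the upper and lower triangles of a matrix, respectively, and zero out everything else.

```python
import numpy as np

# The matrices U and L from the text.
U = np.array([[1, 2, 3],
              [0, 4, 5],
              [0, 0, 6]])
L = np.array([[1, 0, 0],
              [2, 3, 0],
              [4, 5, 6]])

# U is upper triangular if zeroing out the entries below the diagonal changes nothing.
U_is_upper = np.array_equal(U, np.triu(U))

# L is lower triangular if zeroing out the entries above the diagonal changes nothing.
L_is_lower = np.array_equal(L, np.tril(L))

# L is not upper triangular, since it has nonzero entries below the diagonal.
L_is_upper = np.array_equal(L, np.triu(L))

print(U_is_upper, L_is_lower, L_is_upper)
```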

In more traditional linear algebra courses, quite a bit of time is spent on upper and lower triangular matrices, and how they can be used to solve systems of linear equations. One often even studies the **LU decomposition**, which is a factorization of a matrix into a lower triangular matrix and an upper triangular matrix, i.e. $A = LU$. We will spend quite some time on matrix factorizations, but I've chosen to omit the $LU$ decomposition in favor of other factorizations that are more useful for machine learning.

### Orthogonal Matrices

Consider the vectors $\vec q_1$, $\vec q_2$ and $\vec q_3$ defined below.

$$
\vec q_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \vec q_2 = \begin{bmatrix} 0 \\ \frac{6}{\sqrt{45}} \\ \frac{-3}{\sqrt{45}} \end{bmatrix}, \quad \vec q_3 = \begin{bmatrix} 0 \\ \frac{2}{\sqrt{20}} \\ \frac{4}{\sqrt{20}} \end{bmatrix}
$$

TODO draw a picture of the vectors in 3D space.

You should notice two things:

1. All three vectors are unit vectors, i.e. $\lVert \vec q_1 \rVert = \lVert \vec q_2 \rVert = \lVert \vec q_3 \rVert = 1$. (This is the role of the $\sqrt{45}$ and $\sqrt{20}$ in the denominators.)
2. All three vectors are orthogonal to each other, as each pair of vectors has a dot product of 0. Verify this yourself before moving on.

In other words, we have:

$$\vec q_i \cdot \vec q_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

The vectors $\vec q_1$, $\vec q_2$, and $\vec q_3$ are together called **orthonormal**, which is just another way of saying that they are orthogonal unit vectors.
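
Once you've verified both properties by hand, you can double-check your work with a short NumPy sketch like the one below. The dictionary `pairwise_dots` is just a convenience I've introduced to hold the three pairwise dot products, keyed by the (1-based) indices of the vectors involved.

```python
import numpy as np

q1 = np.array([1.0, 0.0, 0.0])
q2 = np.array([0.0, 6.0, -3.0]) / np.sqrt(45)
q3 = np.array([0.0, 2.0, 4.0]) / np.sqrt(20)
vectors = [q1, q2, q3]

# Property 1: each vector should have length 1.
norms = [np.linalg.norm(q) for q in vectors]

# Property 2: each pair of distinct vectors should have a dot product of 0.
pairwise_dots = {
    (i + 1, j + 1): np.dot(vectors[i], vectors[j])
    for i in range(3)
    for j in range(i + 1, 3)
}

print(norms)
print(pairwise_dots)
```

Because of floating-point rounding, the printed values may be off from exactly 1 or 0 by a tiny amount, which is why comparisons like `np.isclose` are safer than `==` here.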

Suppose we define $Q$ to be the matrix whose columns are $\vec q_1$, $\vec q_2$, and $\vec q_3$.

$$Q = \begin{bmatrix} \mid & \mid & \mid \\ \vec q_1 & \vec q_2 & \vec q_3 \\ \mid & \mid & \mid \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{6}{\sqrt{45}} & \frac{2}{\sqrt{20}} \\ 0 & \frac{-3}{\sqrt{45}} & \frac{4}{\sqrt{20}} \end{bmatrix}$$

$Q$ has some magnificent properties. Watch what happens when we compute $Q^T Q$, which is a symmetric matrix containing the dot products of the columns of $Q$:

$$Q^T Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{6}{\sqrt{45}} & \frac{-3}{\sqrt{45}} \\ 0 & \frac{2}{\sqrt{20}} & \frac{4}{\sqrt{20}} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{6}{\sqrt{45}} & \frac{2}{\sqrt{20}} \\ 0 & \frac{-3}{\sqrt{45}} & \frac{4}{\sqrt{20}} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I$$

This is the identity matrix!
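
The same computation can be sketched in NumPy. I've used `np.column_stack` to place $\vec q_1$, $\vec q_2$, and $\vec q_3$ side by side as the columns of `Q`, and `np.allclose` to compare against the identity, since floating-point arithmetic rarely produces exact 0s and 1s.

```python
import numpy as np

q1 = np.array([1.0, 0.0, 0.0])
q2 = np.array([0.0, 6.0, -3.0]) / np.sqrt(45)
q3 = np.array([0.0, 2.0, 4.0]) / np.sqrt(20)

# Stack the three vectors as the columns of Q.
Q = np.column_stack([q1, q2, q3])

# Entry (i, j) of Q^T Q is the dot product of columns i and j of Q.
QtQ = Q.T @ Q

QtQ_is_identity = np.allclose(QtQ, np.eye(3))
print(np.round(QtQ, 10))
print(QtQ_is_identity)
```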

:::{admonition} Definition: Orthogonal Matrix

A square matrix $Q$ is **orthogonal** if $Q^T Q = QQ^T = I$.

Such a matrix $Q$ has orthonormal columns (and also, orthonormal rows!).
:::

$Q$ is the letter typically used to denote orthogonal matrices.

You should notice above that I added in an additional equality, $QQ^T = I$. This implies that if $Q$'s columns are orthonormal, then its rows are also orthonormal. To fully justify this, we need the idea of an **invertible** matrix, which this section develops. The short answer is that the inverse of a **square** matrix $A$ is the matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$. For a square matrix, a one-sided inverse is automatically a two-sided inverse, so since $Q^TQ = I$, we have $Q^T = Q^{-1}$, and therefore $QQ^T = I$ too. Part of the challenge is that not all square matrices are invertible – but fortunately, $Q$ is. (Notice that its columns are linearly independent!)
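
For the specific $Q$ from earlier, we can at least confirm these claims numerically. The sketch below uses `np.linalg.inv` to compute the inverse directly and compares it to the transpose; it's a check on one example, not a substitute for the general argument.

```python
import numpy as np

# The same orthogonal matrix Q from the text, built column by column.
Q = np.column_stack([
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 6.0, -3.0]) / np.sqrt(45),
    np.array([0.0, 2.0, 4.0]) / np.sqrt(20),
])

# Compute the inverse numerically and compare it to the transpose.
Q_inv = np.linalg.inv(Q)
inverse_equals_transpose = np.allclose(Q_inv, Q.T)

# Q Q^T holds the dot products of the ROWS of Q, so if it equals I,
# the rows are orthonormal too.
rows_are_orthonormal = np.allclose(Q @ Q.T, np.eye(3))

print(inverse_equals_transpose, rows_are_orthonormal)
```

This is one reason orthogonal matrices are so pleasant to work with: inverting them costs nothing more than a transpose.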

Orthogonal matrices are useful in a variety of ways. One of the most important is that they preserve lengths of vectors. If $Q$ is orthogonal and $\vec v$ is a vector, then:

$$\lVert Q \vec v \rVert = \lVert \vec v \rVert$$

This is because:

$$\begin{align*} \lVert Q \vec v \rVert^2 &= (Q \vec v)^T (Q \vec v) \\ &= \vec v^T Q^T Q \vec v \\ &= \vec v^T I \vec v \\ &= \vec v^T \vec v \\ &= \lVert \vec v \rVert^2 \end{align*}$$
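
Here's a quick numerical illustration of this property, again using the $Q$ from earlier. The vector `v` is arbitrary; I picked it only for illustration, and you should see the same behavior with any other vector in $\mathbb{R}^3$.

```python
import numpy as np

# The same orthogonal matrix Q from the text, built column by column.
Q = np.column_stack([
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 6.0, -3.0]) / np.sqrt(45),
    np.array([0.0, 2.0, 4.0]) / np.sqrt(20),
])

# An arbitrary vector, chosen only for illustration.
v = np.array([3.0, -1.0, 2.0])

length_before = np.linalg.norm(v)
length_after = np.linalg.norm(Q @ v)

# Q changes the direction of v, but not its length.
print(Q @ v)
print(length_before, length_after)
```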

We'll continue to see orthogonal matrices through Chapter 5, when we discover the singular value decomposition.

:::{tip} Activity
:class: dropdown
I defined an orthogonal matrix as a **square** matrix whose columns are ortho**normal**, not just orthogonal. Why did we need to require that $Q$ be square, or that the columns of $Q$ are unit vectors? Let's investigate.
:::