In [1]:
import sympy
from sympy import Matrix, Rational, sqrt, symbols, zeros
import numpy as np
%matplotlib notebook
import matplotlib.pyplot as plt

# Linear algebra

## Session 08: 

## Gerhard Jäger

### June 22, 2022

## Dimensions of the row space of $R$

Now let us focus on the row space of $R$, which is the column space of 

$$
R^T = \left[\begin{array}{r}
1 & 0 & 0\\
2 & 0 & 0\\
0 & 1 & 0\\
-2 & -1 & 0\\
0 & 3 & 0
\end{array}\right]
$$

By the construction of reduced row echelon forms, each row of $R$, i.e. each column of $R^T$, either contains a pivot or is all-zero. By the same construction, each pivot column of $R$, i.e. the corresponding row of $R^T$, contains a $1$ in one position and $0$ in all others, and these $1$s lie in different columns of $R^T$. In what follows, the pivot columns of $R^T$ are the columns that contain a pivot of $R$.

Two observations follow from this:

- The pivot columns are linearly independent.
- The pivot columns form a basis of $C(R^T)$.

Since the number of pivot columns equals $r$, the rank of $R$, **the row space of $R$ has $r$ dimensions**.


When we bring $R^T$ into reduced row echelon form, pivot columns remain pivot columns:


$$
\begin{aligned}
R^T &= \left[\begin{array}{r}
1 & 0 & 0\\
2 & 0 & 0\\
0 & 1 & 0\\
-2 & -1 & 0\\
0 & 3 & 0
\end{array}\right]\\
\mathrm{rref}(R^T) &= \left[\begin{array}{r}
1 & 0 & 0\\
0 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 0\\
0 & 0 & 0
\end{array}\right]
\end{aligned}
$$

Hence the number of dimensions of $N(R^T)$, the left nullspace of $R$, is $m-r$.
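
The dimension counts for $R$ can be checked with SymPy. The following is a minimal sketch; it assumes that $R$ is the transpose of the matrix $R^T$ displayed above, and the variable names are chosen only for illustration.

```python
from sympy import Matrix

# R is read off from the R^T shown above (rows of R = columns of R^T).
R = Matrix([
    [1, 2, 0, -2, 0],
    [0, 0, 1, -1, 3],
    [0, 0, 0, 0, 0],
])
m, n = R.shape
r = R.rank()

row_space_dim = R.T.rank()            # dimension of C(R^T)
left_null_dim = len(R.T.nullspace())  # dimension of N(R^T)

print("rank:", r)
print("row space dimension:", row_space_dim)
print("left nullspace dimension:", left_null_dim, "and m - r:", m - r)
```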


### Generalizing to all matrices

We found that for an $m\times n$ matrix $R$ in reduced row echelon form with rank $r$:

- the row space of $R$ and the column space of $R$ both have $r$ dimensions
- the null space of $R$ has $n-r$ dimensions
- the left null space of $R$ has $m-r$ dimensions

Next it will be shown that this holds not just for matrices in reduced row echelon form, but for all $m\times n$ matrices.

Let 

$$
A = \left[\begin{array}{r}
-1 & -2 & -2 & 4 & -6\\
-2 & -4 & - \frac{7}{2} & \frac{15}{2} & - \frac{21}{2}\\
2 & 4 & 3 & -7 & 9
\end{array}\right]
$$

When performing Gauss-Jordan elimination, we multiply $A$ repeatedly with matrices from the left.





These matrices have one of two forms:

- elimination matrix, i.e. a matrix with 
    - $1$s at the diagonal,
    - one non-zero entry off the diagonal, and
    - $0$s everywhere else
- an invertible diagonal matrix, i.e., a matrix with 
    - non-zero entries along the diagonal and 
    - $0$ everywhere else

In [2]:
A = Matrix([
    [-1, -2, -2, 4, -6],
    [-2, -4, -Rational(7,2), Rational(15,2), -Rational(21,2)],
    [2,4,3,-7,9]
])
A

Matrix([
[-1, -2,   -2,    4,    -6],
[-2, -4, -7/2, 15/2, -21/2],
[ 2,  4,    3,   -7,     9]])

In [3]:
E1 = Matrix([
    [1, 0, 0],
    [-2, 1, 0],
    [0, 0, 1]
])
E1

Matrix([
[ 1, 0, 0],
[-2, 1, 0],
[ 0, 0, 1]])

In [4]:
B = E1 * A
B

Matrix([
[-1, -2,  -2,    4,  -6],
[ 0,  0, 1/2, -1/2, 3/2],
[ 2,  4,   3,   -7,   9]])

In [5]:
E2 = Matrix([
    [1, 0, 0],
    [0, 1, 0],
    [2, 0, 1]
])
E2

Matrix([
[1, 0, 0],
[0, 1, 0],
[2, 0, 1]])

In [6]:
B = E2 * B
B

Matrix([
[-1, -2,  -2,    4,  -6],
[ 0,  0, 1/2, -1/2, 3/2],
[ 0,  0,  -1,    1,  -3]])

In [7]:
E3 = Matrix([
    [1, 0, 0],
    [0, 1, 0],
    [0, 2, 1]
])
E3

Matrix([
[1, 0, 0],
[0, 1, 0],
[0, 2, 1]])

In [8]:
B = E3 * B
B

Matrix([
[-1, -2,  -2,    4,  -6],
[ 0,  0, 1/2, -1/2, 3/2],
[ 0,  0,   0,    0,   0]])

In [9]:
E4 = Matrix([
    [1, 4, 0],
    [0, 1, 0],
    [0, 0, 1]
])
E4

Matrix([
[1, 4, 0],
[0, 1, 0],
[0, 0, 1]])

In [10]:
B = E4 * B
B

Matrix([
[-1, -2,   0,    2,   0],
[ 0,  0, 1/2, -1/2, 3/2],
[ 0,  0,   0,    0,   0]])

In [11]:
D = Matrix([
    [-1, 0, 0],
    [0, 2, 0],
    [0, 0, 1]
])
D

Matrix([
[-1, 0, 0],
[ 0, 2, 0],
[ 0, 0, 1]])

In [12]:
B = D * B
B

Matrix([
[1, 2, 0, -2, 0],
[0, 0, 1, -1, 3],
[0, 0, 0,  0, 0]])

In [13]:
A.rref()[0]

Matrix([
[1, 2, 0, -2, 0],
[0, 0, 1, -1, 3],
[0, 0, 0,  0, 0]])
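
The individual steps can also be collected into a single matrix. The sketch below multiplies the elimination matrices and the diagonal matrix in the order in which they were applied; the name `U` for the product is an assumption made here for illustration.

```python
from sympy import Matrix, Rational

A = Matrix([
    [-1, -2, -2, 4, -6],
    [-2, -4, -Rational(7, 2), Rational(15, 2), -Rational(21, 2)],
    [2, 4, 3, -7, 9],
])
E1 = Matrix([[1, 0, 0], [-2, 1, 0], [0, 0, 1]])
E2 = Matrix([[1, 0, 0], [0, 1, 0], [2, 0, 1]])
E3 = Matrix([[1, 0, 0], [0, 1, 0], [0, 2, 1]])
E4 = Matrix([[1, 4, 0], [0, 1, 0], [0, 0, 1]])
D = Matrix([[-1, 0, 0], [0, 2, 0], [0, 0, 1]])

# The step applied first stands rightmost in the product.
U = D * E4 * E3 * E2 * E1

print(U * A == A.rref()[0])  # U turns A into its reduced row echelon form
print(U.det() != 0)          # U is invertible
```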

**Observations**:

- Elimination matrices and invertible diagonal matrices are invertible.
- Let $U$ be an invertible matrix. If a subset of the columns of $B$ forms a basis of $C(B)$, then the corresponding columns of $UB$ form a basis of $C(UB)$.


The first statement should be obvious. To invert an elimination matrix, you only have to replace its non-zero off-diagonal entry with its negation.


In [14]:
E1

Matrix([
[ 1, 0, 0],
[-2, 1, 0],
[ 0, 0, 1]])

In [15]:
E1.inv()

Matrix([
[1, 0, 0],
[2, 1, 0],
[0, 0, 1]])

To invert an invertible diagonal matrix, you replace each diagonal entry with its inverse.

In [16]:
D

Matrix([
[-1, 0, 0],
[ 0, 2, 0],
[ 0, 0, 1]])

In [17]:
D.inv()

Matrix([
[-1,   0, 0],
[ 0, 1/2, 0],
[ 0,   0, 1]])

Now consider the second statement. Recall the running example

$$
A = \left[\begin{array}{r}
-1 & -2 & -2 & 4 & -6\\
-2 & -4 & - \frac{7}{2} & \frac{15}{2} & - \frac{21}{2}\\
2 & 4 & 3 & -7 & 9
\end{array}\right]
$$


Here $\{\mathbf a_1, \mathbf a_3\}$ form a basis for the column space.


In [18]:
D = A[:, [0,2]]
D

Matrix([
[-1,   -2],
[-2, -7/2],
[ 2,    3]])

In [19]:
D.solve(A)

Matrix([
[1, 2, 0, -2, 0],
[0, 0, 1, -1, 3]])

In [20]:
# D x = 0 has only the zero solution, so the columns of D are independent
D.solve(zeros(3, 1))

Matrix([
[0],
[0]])

We get from $A$ to $\begin{bmatrix}\mathbf a_1, \mathbf a_3\end{bmatrix}$ by multiplying $A$ with

$$
\begin{aligned}
W &= \begin{bmatrix}
1 & 0\\
0 & 0\\
0 & 1\\
0 & 0\\
0 & 0
\end{bmatrix}\\
AW &= D
\end{aligned}
$$

In general, $W$ is an $n\times k$ matrix, $k\leq n$, with exactly one $1$ per column, at most one $1$ per row, and $0$ everywhere else. Let us call such matrices *subset matrices*.
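
A subset matrix can be built directly from the column indices that should be kept. The helper `subset_matrix` below is a hypothetical name introduced for this sketch; it is not part of SymPy.

```python
from sympy import Matrix, Rational, zeros

def subset_matrix(n, indices):
    """Return the n x k matrix whose j-th column has a 1 in row indices[j]."""
    W = zeros(n, len(indices))
    for j, i in enumerate(indices):
        W[i, j] = 1
    return W

A = Matrix([
    [-1, -2, -2, 4, -6],
    [-2, -4, -Rational(7, 2), Rational(15, 2), -Rational(21, 2)],
    [2, 4, 3, -7, 9],
])

# Select the first and third column (0-based indices 0 and 2).
W = subset_matrix(A.cols, [0, 2])
print(W)
print(A * W == A[:, [0, 2]])
```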

Now let us assume:

- $U$ is invertible
- the columns of $BW$ form a basis for $C(B)$

We need to show that the columns of $UBW$ form a basis for $C(UB)$.

First we need to show that the columns of $UBW$ are linearly independent, i.e., $\mathbf 0$ is the only solution for

$$
UBW\mathbf x = \mathbf 0
$$

Suppose it is otherwise, i.e., there is a solution $\mathbf x \neq \mathbf 0$.

Since $U$ is invertible:

$$
\begin{aligned}
U^{-1}UBW\mathbf x &= \mathbf 0\\
BW\mathbf x &= \mathbf 0
\end{aligned}
$$

This contradicts the assumption that the columns of $BW$ form a basis for $C(B)$, since the vectors of a basis are linearly independent.




By assumption, the columns of $BW$ form a basis for $C(B)$. This means that each column vector of $B$ is a linear combination of the column vectors of $BW$. This amounts to saying that there is a matrix $X$ such that

$$
BWX = B
$$

It remains to show that the columns of $UBW$ span $C(UB)$, i.e., that there is a matrix $Y$ with

$$
UBWY = UB
$$

For $Y=X$, this follows directly from the assumption, since $UBWX = U(BWX) = UB$.

Taking everything together, it follows that **the columns of $A$ with the same indices as the pivot columns of rref($A$)** form a basis of $C(A)$.

Therefore $C(A)$ has the same number of dimensions as $C(\mathrm{rref}(A))$, which equals the rank of $A$.
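
For the running example, this can be checked as follows. The sketch relies on the fact that `rref()` returns the 0-based indices of the pivot columns as its second component; the variable names are illustrative.

```python
from sympy import Matrix, Rational

A = Matrix([
    [-1, -2, -2, 4, -6],
    [-2, -4, -Rational(7, 2), Rational(15, 2), -Rational(21, 2)],
    [2, 4, 3, -7, 9],
])

R, pivots = A.rref()        # pivots: indices of the pivot columns of rref(A)
basis = A[:, list(pivots)]  # the columns of A with the same indices

# Independent: the selected columns have full column rank.
independent = basis.rank() == len(pivots)
# Spanning: appending all columns of A does not increase the rank.
spanning = basis.row_join(A).rank() == basis.rank()

print(pivots, independent, spanning)
```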

Furthermore, if $U$ is invertible, $U^T$ is also invertible. Therefore, applying Gauss-Jordan elimination does not change the dimensionality of the row space either. Since matrices in reduced row echelon form have the same number of dimensions for the row space and the column space, this also applies to all matrices.





Finally, let $U$ be invertible. If

$$
A\mathbf x = \mathbf 0,
$$

then 
$$
\begin{aligned}
UA\mathbf x &= U\mathbf 0 = \mathbf 0
\end{aligned}
$$

If 
$$
UA\mathbf x = \mathbf 0,
$$

then

$$
\begin{aligned}
U^{-1}UA\mathbf x &=  U^{-1}\mathbf 0\\
A\mathbf x &= \mathbf 0
\end{aligned}
$$

This entails that $A$ and $\mathrm{rref}(A)$ have the same nullspace, and therefore their nullspaces have the same number of dimensions.


### Summary

Let $A$ be an $m\times n$ matrix.

- The column space $C(A)$ and the row space $C(A^T)$ both have dimension $r$ (the rank of $A$).
- The nullspace $N(A)$ has dimension $n-r$.
- The left nullspace $N(A^T)$ has dimension $m-r$.
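
The summary can be illustrated with the running example, where $m=3$, $n=5$ and $r=2$. The function `subspace_dimensions` is a hypothetical helper written for this sketch; it only counts the basis vectors that SymPy returns for each subspace.

```python
from sympy import Matrix, Rational

def subspace_dimensions(M):
    """Return the dimensions of C(M), C(M^T), N(M) and N(M^T)."""
    return {
        "column space": len(M.columnspace()),
        "row space": len(M.rowspace()),
        "nullspace": len(M.nullspace()),
        "left nullspace": len(M.T.nullspace()),
    }

A = Matrix([
    [-1, -2, -2, 4, -6],
    [-2, -4, -Rational(7, 2), Rational(15, 2), -Rational(21, 2)],
    [2, 4, 3, -7, 9],
])
m, n = A.shape
r = A.rank()
dims = subspace_dimensions(A)
print(dims)
```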

## Orthogonality

Recall: vectors $\mathbf v$ and $\mathbf w$ are **orthogonal** if and only if

$$
\mathbf v^T \mathbf w = 0
$$

#### Examples

- $\begin{bmatrix}1\\1 \\0\end{bmatrix}$, $\begin{bmatrix}0\\0 \\1\end{bmatrix}$


- $\begin{bmatrix}1\\1 \end{bmatrix}$, $\begin{bmatrix}2\\-2\end{bmatrix}$


- $\begin{bmatrix}1\\1 \\ 2\end{bmatrix}$, $\begin{bmatrix}-2\\-2\\2\end{bmatrix}$

- ..
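
The three example pairs can be checked by computing their dot products. This is a small sketch in plain Python; the helper `dot` is defined here only for illustration.

```python
def dot(v, w):
    """Dot product of two vectors given as equally long tuples."""
    return sum(a * b for a, b in zip(v, w))

pairs = [
    ((1, 1, 0), (0, 0, 1)),
    ((1, 1), (2, -2)),
    ((1, 1, 2), (-2, -2, 2)),
]

products = [dot(v, w) for v, w in pairs]
print(products)  # a pair is orthogonal exactly if its entry is 0
```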

### Orthogonal spaces

Two vector spaces $\mathbf V$ and $\mathbf W$ are orthogonal if and only if

$$
\forall \mathbf v\in \mathbf V, \mathbf w \in \mathbf W: \mathbf v^T\mathbf w = 0
$$

**Examples**

$$
\begin{aligned}
\mathbf V &= \{\begin{bmatrix}x\\0\end{bmatrix}: x \in \mathbb R\}\\[1em]
\mathbf W &= \{\begin{bmatrix}0\\y\end{bmatrix}: y \in \mathbb R\}\\
\end{aligned}
$$

These are of course the $x$-axis and $y$-axis of a 2d-space.

$$
\begin{aligned}
\mathbf V &= \{\begin{bmatrix}x\\y\\0\end{bmatrix}: x,y \in \mathbb R\}\\[1em]
\mathbf W &= \{\begin{bmatrix}0\\0\\z\end{bmatrix}: z \in \mathbb R\}\\
\end{aligned}
$$

These are the $xy$-plane and the $z$-axis of a 3d-space.

$$
\begin{aligned}
\mathbf V &= \mathrm{span}(
\begin{bmatrix}
1\\
-1\\
0
\end{bmatrix},
\begin{bmatrix}
1\\
1\\
1
\end{bmatrix}
)\\[1em]
\mathbf W &= \mathrm{span}(\begin{bmatrix}-1\\-1\\2\end{bmatrix})\\
\end{aligned}
$$

How do we know whether $\mathbf V$ and $\mathbf W$ are orthogonal?

**Observation** Let $V$ and $W$ be two sets of vectors $\subseteq \mathbb R^n$. 

$\mathrm{span}(V)$ is orthogonal to $\mathrm{span}(W)$ if and only if for all $\mathbf v\in V, \mathbf w \in W$: $\mathbf v$ and $\mathbf w$ are orthogonal.

*Proof*

Suppose $\mathrm{span}(V)$ is orthogonal to $\mathrm{span}(W)$. If $\mathbf v\in V$, then $\mathbf v\in\mathrm{span}(V)$, and likewise for $\mathbf w$. Hence $\mathbf v$ and $\mathbf w$ are orthogonal.

Now suppose for all $\mathbf v\in V, \mathbf w \in W$: $\mathbf v$ and $\mathbf w$ are orthogonal. Let $\mathbf x\in\mathrm{span}(V)$ and $\mathbf y\in\mathrm{span}(W)$.

Then $\mathbf x = \sum_i r_i\mathbf v_i$ and $\mathbf y = \sum_j s_j\mathbf w_j$ for some $r_1,\ldots,r_{|V|}, s_1,\ldots,s_{|W|}\in \mathbb R$.

$$
\begin{aligned}
\mathbf x^T\mathbf y &= (\sum_i r_i\mathbf v_i)^T(\sum_j s_j\mathbf w_j)\\
        &= \sum_i (r_i\mathbf v_i)^T(\sum_j s_j\mathbf w_j)\\
        &= \sum_i \sum_j(r_i\mathbf v_i)^T( s_j\mathbf w_j)\\
        &= \sum_i \sum_jr_i\mathbf v_i^T( s_j\mathbf w_j)\\
        &= \sum_i \sum_jr_is_j\mathbf v_i^T\mathbf w_j\\
        &= \sum_i \sum_jr_is_j\cdot 0\\
        &= 0\\
\end{aligned}
$$

$\dashv$
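
Applied to the third example above, it suffices to check the spanning vectors pairwise. The sketch below does this and additionally confirms the result for arbitrary linear combinations, using symbolic coefficients `r1`, `r2`, `s1` that are introduced here only for illustration.

```python
from sympy import Matrix, symbols, expand

v1 = Matrix([1, -1, 0])
v2 = Matrix([1, 1, 1])
w1 = Matrix([-1, -1, 2])

# Pairwise check of the spanning vectors.
generators_orthogonal = all(v.dot(w1) == 0 for v in (v1, v2))

# Arbitrary elements of span(V) and span(W).
r1, r2, s1 = symbols("r1 r2 s1")
x = r1 * v1 + r2 * v2
y = s1 * w1
product = expand(x.dot(y))

print(generators_orthogonal, product)
```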

**Observation**

Let $A$ be an $m\times n$ matrix. Then

- the column space $C(A)$ is orthogonal to the left null space $N(A^T)$, and
- the row space $C(A^T)$ is orthogonal to the null space $N(A)$.

*Proof*

The column space of $A$ is $\mathrm{span}(\{\mathbf a_i|1\leq i \leq n\})$. If $\mathbf x$ is in the left null space of $A$, this means that

$$
A^T \mathbf x = \mathbf 0
$$

It follows that 

$$
\forall i:\mathbf a_i^T\mathbf x = 0
$$

Due to the previous observation, it follows that $C(A)$ is orthogonal to $N(A^T)$. 

The proof of the second statement is analogous.

$\dashv$
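
Both statements can be checked for the matrix $A$ used earlier in this session. The following sketch compares every column of $A$ with a basis of the left null space, and every row of $A$ with a basis of the null space; by the observation on spans, this is sufficient.

```python
from sympy import Matrix, Rational

A = Matrix([
    [-1, -2, -2, 4, -6],
    [-2, -4, -Rational(7, 2), Rational(15, 2), -Rational(21, 2)],
    [2, 4, 3, -7, 9],
])

columns = [A[:, j] for j in range(A.cols)]
rows = [A[i, :].T for i in range(A.rows)]  # rows written as column vectors

left_null_basis = A.T.nullspace()  # basis of N(A^T)
null_basis = A.nullspace()         # basis of N(A)

col_vs_left_null = all(c.dot(x) == 0 for c in columns for x in left_null_basis)
row_vs_null = all(r.dot(x) == 0 for r in rows for x in null_basis)

print(col_vs_left_null, row_vs_null)
```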

## Orthogonal projections

Suppose we have two vectors $\mathbf a$ and $\mathbf b$. We want to find the *orthogonal projection of $\mathbf a$ onto the line through $\mathbf b$*. This is a vector $\mathbf p$ with the properties:

- $\mathbf p = x\mathbf b$ for some scalar $x$ ($\mathbf p$ lies on the line defined by $\mathbf b$)
- $\mathbf a - \mathbf p$ is orthogonal to $\mathbf b$

Here is how we find $\mathbf p$:

$$
\begin{aligned}
(\mathbf a - x\mathbf b)^T\mathbf b &= 0\\
(\mathbf a^T - x\mathbf b^T)\mathbf b &= 0\\
\mathbf a^T\mathbf b - x\mathbf b^T\mathbf b &= 0\\
\mathbf a^T\mathbf b &= x\mathbf b^T\mathbf b\\
x &= \frac{\mathbf a^T\mathbf b}{\mathbf b^T\mathbf b}\\
\mathbf p &= \frac{\mathbf a^T\mathbf b}{\mathbf b^T\mathbf b}\mathbf b\\
\end{aligned}
$$

- $\mathbf p$ is called the *projection of $\mathbf a$ onto the line through $\mathbf b$*.
- $\mathbf e = \mathbf a - \mathbf p$ is called the *error*.
- $\mathbf p$ is the point on the line through $\mathbf b$ which is closest to $\mathbf a$, i.e., the point which minimizes the error.
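
A small worked example of the formula for $\mathbf p$. The vectors `a` and `b` below are arbitrary values chosen for illustration, and `project_onto_line` is a hypothetical helper name.

```python
from sympy import Matrix

def project_onto_line(a, b):
    """Orthogonal projection of a onto the line through b."""
    x = a.dot(b) / b.dot(b)  # the scalar x from the derivation
    return x * b

a = Matrix([1, 2, 2])
b = Matrix([1, 1, 1])

p = project_onto_line(a, b)
e = a - p  # the error

print(p.T, e.T)
print(e.dot(b))  # the error is orthogonal to b
```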

## Orthogonal projections

Now suppose we have a matrix $A$ and a vector $\mathbf b$, and we want to find the *orthogonal projection of  $\mathbf b$ onto the* ***column space*** of $A$.

In other words, we want to find the point $\mathbf p$ which

- is in the column space of $A$, and
- minimizes the error $\mathbf b-\mathbf p$.



- assumptions:

$$
\begin{aligned}
A\mathbf x &= \mathbf p\\
\mathbf p + \mathbf e &= \mathbf b\\
A^T\mathbf e &= \mathbf 0
\end{aligned}
$$

- finding the solution

Let us assume that the columns of $A$ are independent. (If this is not the case, we can replace $A$ by a matrix whose columns form a basis of $C(A)$.)

**Observation** $(A^TA)$ is invertible if and only if the columns of $A$ are independent.

*Proof*


Suppose $(A^TA)$ is invertible, and let $A\mathbf x = \mathbf 0$. Then it follows

$$
\begin{aligned}
A^TA\mathbf x &= A^T\mathbf 0\\
A^TA\mathbf x &= \mathbf 0\\
\mathbf x &= (A^TA)^{-1}\mathbf 0\\
&= \mathbf 0
\end{aligned}
$$
This entails that the columns of $A$ are independent.

Now suppose the columns of $A$ are independent, and let $A^TA\mathbf x = \mathbf 0$. Multiplying with $\mathbf x^T$ from the left gives

$$
\begin{aligned}
\mathbf x^TA^TA\mathbf x &= \mathbf x^T\mathbf 0\\
(A\mathbf x)^T(A\mathbf x) &= 0
\end{aligned}
$$

The dot product of $A\mathbf x$ with itself is the sum of the squares of its entries, so it can only be $0$ if

$$
A\mathbf x = \mathbf 0
$$

Since the columns of $A$ are independent, it follows that $\mathbf x = \mathbf 0$.

So the nullspace of the square matrix $A^TA$ contains only $\mathbf 0$, i.e., it has $0$ dimensions. Therefore $A^TA$ has full rank, its reduced row echelon form is $\mathbf I$, and $A^TA$ is invertible.

$\dashv$
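
The observation can be illustrated with two small matrices. Both matrices below are made-up examples: `A_indep` has independent columns, while the second column of `A_dep` is twice the first.

```python
from sympy import Matrix

A_indep = Matrix([[1, 0], [1, 1], [1, 2]])
A_dep = Matrix([[1, 2], [1, 2], [1, 2]])

gram_indep = A_indep.T * A_indep
gram_dep = A_dep.T * A_dep

# A square matrix is invertible exactly if its determinant is non-zero.
print(A_indep.rank(), gram_indep.det())
print(A_dep.rank(), gram_dep.det())
```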

- deriving the solution:

$$
\begin{aligned}
A\mathbf x &= \mathbf p\\
\mathbf p + \mathbf e &= \mathbf b\\
A^T\mathbf e &= \mathbf 0\\
A^T\mathbf b &= A^T\mathbf p + A^T\mathbf e\\
A^T\mathbf b &= A^T\mathbf p\\
&= A^T A\mathbf x\\
\mathbf x &= (A^T A)^{-1}A^T\mathbf b\\
\mathbf p &= A(A^T A)^{-1}A^T\mathbf b\\
\end{aligned}
$$
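
The final formula can be tried out on a small example. The matrix `A` and the vector `b` below are values chosen for illustration and do not come from the running example; the sketch assumes that the columns of `A` are independent, so that $A^TA$ can be inverted.

```python
from sympy import Matrix

A = Matrix([[1, 0], [1, 1], [1, 2]])
b = Matrix([6, 0, 0])

x = (A.T * A).inv() * A.T * b  # coefficients of the projection
p = A * x                      # projection of b onto C(A)
e = b - p                      # error

print(x.T, p.T, e.T)
print((A.T * e).T)  # the error is orthogonal to every column of A
```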
