In [1]:
import sympy
from sympy import Matrix, Rational, sqrt, symbols, zeros, simplify, exp
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
%matplotlib notebook


# Mathematics for Machine Learning
    
## Session 11: Defective matrices
    
## Gerhard Jäger
    
### November 26, 2024

## Defective matrices

**Defective Matrices** are square matrices that lack a complete set of linearly independent eigenvectors.

- **Characteristics**:
  - Cannot be diagonalized.
  - The geometric multiplicity of at least one eigenvalue is less than its algebraic multiplicity.
- **Implications**:
  - Standard techniques like diagonalization are not directly applicable.
  - Requires alternative methods for analysis, like Jordan canonical form.

### Algebraic vs Geometric Multiplicity

The concept of a defective matrix is rooted in the difference between algebraic and geometric multiplicities of eigenvalues.

- **Algebraic Multiplicity**: The number of times an eigenvalue appears as a root of the characteristic polynomial.
- **Geometric Multiplicity**: The number of linearly independent eigenvectors associated with an eigenvalue (= the dimensionality of the corresponding eigenspace).

**Defective Matrix Criterion**: A matrix is defective if at least one of its eigenvalues has a geometric multiplicity that is smaller than its algebraic multiplicity.


### Diagonalization and Defective Matrices

Diagonalization is a key concept in understanding defective matrices.

- **Diagonalizable Matrix**: Can be expressed as $$P^{-1}AP = \Lambda,$$ where $ A $ is the matrix, the columns of $ P $ are linearly independent eigenvectors of $ A $, and $ \Lambda $ is a diagonal matrix of eigenvalues.
- **Defective Matrix**: Lacks a sufficient number of linearly independent eigenvectors to form an invertible matrix $ P $, making diagonalization impossible.

**Result**: Alternative methods, like *Jordan canonical form*, are needed to analyze defective matrices.


### Example of a Defective Matrix

Consider the matrix $$ A = \begin{pmatrix} 5 & 4 \\ -1 & 1 \end{pmatrix} $$.

- **Characteristic Polynomial**: 

$$ (5 - \lambda)(1 - \lambda) + 4 = \lambda^2 - 6\lambda + 9 = (\lambda - 3)^2$$ 


leads to a single eigenvalue $\lambda = 3$ with algebraic multiplicity 2.
- **Eigenvectors**: Solving $$(A - 3I)\mathbf v = 0$$ yields only one linearly independent eigenvector, $\begin{pmatrix}-2\\1\end{pmatrix}$.
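
The two multiplicities in this example can be checked mechanically. The following is a minimal sketch that relies on SymPy's `eigenvects()`, which returns triples of eigenvalue, algebraic multiplicity, and a basis of the eigenspace. The helper names `multiplicities` and `is_defective` are made up for illustration and are not part of SymPy.

```python
from sympy import Matrix, eye

def multiplicities(M):
    """Map each eigenvalue to (algebraic, geometric) multiplicity."""
    return {lam: (alg, len(basis)) for lam, alg, basis in M.eigenvects()}

def is_defective(M):
    """True if some eigenvalue has fewer eigenvectors than its algebraic multiplicity."""
    return any(geo < alg for alg, geo in multiplicities(M).values())

A = Matrix([[5, 4], [-1, 1]])
print(multiplicities(A))          # eigenvalue 3: algebraic 2, geometric 1
print(is_defective(A))            # True
print(is_defective(2 * eye(2)))   # False: repeated eigenvalue, but two eigenvectors
print(A.is_diagonalizable())      # False
```

A repeated eigenvalue alone does not make a matrix defective, as the scaled identity matrix shows; what matters is the missing eigenvector.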



### Working with Defective Matrices

Handling defective matrices requires alternative approaches:

- **Jordan Canonical Form**: A way to bring matrices to a nearly diagonal form, useful for defective matrices.
- **Applications**:
  - In fields where diagonalization is crucial but not directly applicable.
  - In studying systems with complex or repeated eigenvalues.




### Understanding Jordan Blocks

A **Jordan Block** is a key component of the Jordan Canonical Form.

- **Definition**: A square matrix with an eigenvalue repeated along the main diagonal, ones on the superdiagonal, and zeros elsewhere.
- **Example**:
  - A 3x3 Jordan block for eigenvalue $ \lambda $ looks like:
    $$
    \begin{pmatrix}
    \lambda & 1 & 0 \\
    0 & \lambda & 1 \\
    0 & 0 & \lambda
    \end{pmatrix}
    $$
    
- **Another example**:
    - a 5x5 matrix with a 2x2 and a 3x3 Jordan block.
    
    $$
    \begin{pmatrix}
    4 & 1 & 0 & 0 & 0\\
    0 & 4 & 0 & 0 & 0\\
    0 & 0 & -2 & 1 & 0\\
    0 & 0 & 0 & -2 & 1\\
    0 & 0 & 0 & 0 & -2\\
    \end{pmatrix}
    $$

**Significance**: Each Jordan block corresponds to an eigenvalue and its associated generalized eigenvectors.
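
As an illustration, the 5x5 example above can be assembled from its two blocks. This is a small sketch; the helper `jordan_block` is a hypothetical name written out by hand here, and `diag` is SymPy's block-diagonal constructor.

```python
from sympy import Matrix, diag, eye

def jordan_block(lam, size):
    """size x size block: lam on the diagonal, ones on the superdiagonal."""
    shift = Matrix(size, size, lambda i, j: 1 if j == i + 1 else 0)
    return lam * eye(size) + shift

# The 5x5 example: a 2x2 block for eigenvalue 4 and a 3x3 block for eigenvalue -2.
J5 = diag(jordan_block(4, 2), jordan_block(-2, 3))
print(J5)

# Each block contributes exactly one linearly independent eigenvector.
for lam in (4, -2):
    print(lam, len((J5 - lam * eye(5)).nullspace()))
```

Both eigenvalues therefore have geometric multiplicity 1, although their algebraic multiplicities are 2 and 3.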


### Generalized Eigenvectors

**Generalized Eigenvectors** are essential in constructing the Jordan Canonical Form.

- **Definition**: Vectors $\mathbf v$ that are not eigenvectors but satisfy $$(A - \lambda I)^k \mathbf v = 0$$ for some integer $k > 1$.
- **Role**: Used when a matrix does not have enough linearly independent eigenvectors (defective matrix).
- **Significance**: Help in forming the Jordan blocks where each block corresponds to an eigenvalue and its generalized eigenvectors.


### Example

$$
\begin{aligned}
A &= \begin{pmatrix}
5 & 4\\
-1 &1
\end{pmatrix}\\
\lambda &= 3\\
A-\lambda\mathbf I &= \begin{pmatrix}
2 & 4\\
-1 &-2
\end{pmatrix}\\
(A-\lambda\mathbf I)^2 &= \begin{pmatrix}
0 & 0\\
0 &0
\end{pmatrix}\\
\end{aligned}
$$

- eigenvector: $\begin{pmatrix}2\\-1\end{pmatrix}$
- generalized eigenvector: $\begin{pmatrix}1\\0\end{pmatrix}$
- Jordan Canonical Form:

$$
\begin{aligned}
J &= 
\begin{pmatrix}
3 &1\\
0 & 3
\end{pmatrix}
\end{aligned}
$$

- Factorization of $A$

$$
\begin{aligned}
A &= 
\begin{pmatrix}
2 &1\\
-1 & 0
\end{pmatrix}
\begin{pmatrix}
3 &1\\
0 & 3
\end{pmatrix}
\begin{pmatrix}
2 &1\\
-1 & 0
\end{pmatrix}^{-1}
\end{aligned}
$$


In [2]:
P = Matrix([
    [2, 1],
    [-1, 0]
])
P

Matrix([
[ 2, 1],
[-1, 0]])

In [3]:
J = Matrix([
    [3, 1],
    [0, 3]
])
J

Matrix([
[3, 1],
[0, 3]])

In [4]:
P * J * P.inv()

Matrix([
[ 5, 4],
[-1, 1]])

**However**

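- $\begin{pmatrix}0\\1\end{pmatrix}$ is also a null vector of $(A-\lambda\mathbf I)^2 = \begin{pmatrix}
0 & 0\\
0 &0
\end{pmatrix}$, and hence also a generalized eigenvector.

But pairing it with the eigenvector $\begin{pmatrix}2\\-1\end{pmatrix}$ does not reproduce $A$: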


$$
\begin{aligned}
A &\neq 
\begin{pmatrix}
2 &0\\
-1 & 1
\end{pmatrix}
\begin{pmatrix}
3 &1\\
0 & 3
\end{pmatrix}
\begin{pmatrix}
2 &0\\
-1 & 1
\end{pmatrix}^{-1} = \begin{pmatrix}4 & 2 \\ -\frac{1}{2} & 2\end{pmatrix}
\end{aligned}
$$
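
The underlying reason is that $AP = PJ$ with $J = \begin{pmatrix}3 & 1\\0 & 3\end{pmatrix}$ forces the columns $\mathbf p_1, \mathbf p_2$ of $P$ to satisfy $(A - 3\mathbf I)\mathbf p_2 = \mathbf p_1$. The sketch below checks this for both generalized eigenvectors: if the eigenvector is computed from the generalized eigenvector rather than chosen separately, the factorization works in both cases.

```python
from sympy import Matrix, eye

A = Matrix([[5, 4], [-1, 1]])
N = A - 3 * eye(2)
J = Matrix([[3, 1], [0, 3]])

# Matching pair: the first column is N times the second column.
for g in (Matrix([1, 0]), Matrix([0, 1])):
    P = Matrix.hstack(N * g, g)
    print(P.T, P * J * P.inv() == A)

# Mismatched pair from the text: eigenvector (2, -1) with generalized eigenvector (0, 1).
bad = Matrix.hstack(Matrix([2, -1]), Matrix([0, 1]))
print(bad * J * bad.inv())
```

For $\begin{pmatrix}0\\1\end{pmatrix}$ the matching eigenvector is $\begin{pmatrix}4\\-2\end{pmatrix}$, a multiple of $\begin{pmatrix}2\\-1\end{pmatrix}$ with a different scale.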


### Algorithm to factorize a defective matrix

1. Determine the eigenvalues and their algebraic and geometric multiplicities
2. For each eigenvalue $\lambda$:
    - For $k = 1, 2, \ldots$
        - determine the rank of $(A-\lambda \mathbf I)^k$
        - stop when the rank is identical for $k$ and $k+1$. This value of $k$ is $k_{\text{max}}$.
    - find a vector $\mathbf v$ such that
        $$
        \begin{aligned}
        (A-\lambda \mathbf I)^{k_\text{max}} \mathbf v &= \mathbf 0\\
        (A-\lambda \mathbf I)^{k_\text{max}-1} \mathbf v &\neq \mathbf 0\\
        \end{aligned}
        $$
    - create a sequence by repeatedly applying $A-\lambda \mathbf I$:
        $$
        \begin{aligned}
        \mathbf v_1 &= \mathbf v\\
        \mathbf v_2 &= (A-\lambda \mathbf I)\mathbf v_1\\
        &\vdots \\
        \mathbf v_{k_\text{max}} &= (A-\lambda \mathbf I)\mathbf v_{k_\text{max}-1}
        \end{aligned}
        $$
    - combine these vectors in reverse order, $\mathbf v_{k_\text{max}}, \ldots, \mathbf v_1$, as the columns of a matrix $V_\lambda$ (the eigenvector $\mathbf v_{k_\text{max}}$ comes first)
    - if the number of columns of $V_\lambda$ is smaller than the algebraic multiplicity of $\lambda$: repeat the search to obtain additional linearly independent columns
3. Combine the $V_\lambda$ matrices horizontally into a matrix $V$.
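
The steps above can be sketched as a small function. This is an illustration under a simplifying assumption, not a general implementation: it assumes that a single chain per eigenvalue already yields as many columns as the algebraic multiplicity, so the "repeat" step is never needed. The name `jordan_chain` is hypothetical.

```python
from sympy import Matrix, eye, zeros

def jordan_chain(A, lam):
    """Columns v_kmax, ..., v_1 for eigenvalue lam (assumes a single chain suffices)."""
    n = A.rows
    N = A - lam * eye(n)
    # Raise k until the rank of N**k stops changing; that k is k_max.
    k = 1
    while (N**k).rank() != (N**(k + 1)).rank():
        k += 1
    # Pick v with N**k * v = 0 but N**(k-1) * v != 0.
    v = next(w for w in (N**k).nullspace() if N**(k - 1) * w != zeros(n, 1))
    # Build v_1, v_2 = N v_1, ... and return them in reverse order.
    chain = [v]
    for _ in range(k - 1):
        chain.append(N * chain[-1])
    return Matrix.hstack(*reversed(chain))

A2 = Matrix([[5, 4], [-1, 1]])
V = jordan_chain(A2, 3)
print(V.inv() * A2 * V)   # Jordan block for eigenvalue 3
```

The function may pick a different starting vector than a manual computation would, so the resulting $V$ is not unique, but $V^{-1}AV$ has the same Jordan structure.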

#### Example

In [5]:
A = Matrix([
    [5, 4],
    [-1, 1]
])
A

Matrix([
[ 5, 4],
[-1, 1]])

In [6]:
A.eigenvals()

{3: 2}

The eigenvalue $\lambda=3$ has algebraic multiplicity 2.

In [7]:
(A - 3*sympy.eye(2)).rank()

1

In [8]:
((A - 3*sympy.eye(2))**2).rank()

0

In [9]:
((A - 3*sympy.eye(2))**3).rank()

0

We have $k_\text{max} = 2$. Now let us pick the following vector:

In [10]:
v = ((A - 3*sympy.eye(2))**2).nullspace()[1]
v

Matrix([
[0],
[1]])

`v` is a null-vector of $(A-\lambda \mathbf I)^2$

In [11]:
((A - 3*sympy.eye(2))**2) * v


Matrix([
[0],
[0]])

Check that `v` is not in the nullspace of $A-\lambda \mathbf I$

In [12]:
(A - 3*sympy.eye(2))*v

Matrix([
[ 4],
[-2]])

`v` is a proper generalized eigenvector of degree $k=2$.

The next (and final) vector in the sequence of generalized eigenvectors is

In [13]:
v1 = v
v2 = (A - 3*sympy.eye(2)) * v1
v2

Matrix([
[ 4],
[-2]])

The number of vectors in this sequence equals 2, which is the algebraic multiplicity of $\lambda$. So we can construct

In [14]:
V = Matrix.hstack(v2, v1)
V

Matrix([
[ 4, 0],
[-2, 1]])

In [15]:
J = V.inv() * A * V
J

Matrix([
[3, 1],
[0, 3]])

In [16]:
V * J * V.inv()

Matrix([
[ 5, 4],
[-1, 1]])

**Another example:**

In [17]:
A = Matrix([
    [5, 4, 2, 1],
    [0, 1, -1, -1],
    [-1, -1, 3, 0],
    [1, 1, -1, 2]
])
A

Matrix([
[ 5,  4,  2,  1],
[ 0,  1, -1, -1],
[-1, -1,  3,  0],
[ 1,  1, -1,  2]])

In [18]:
A.eigenvals()

{2: 1, 1: 1, 4: 2}

In [19]:
A.eigenvects()

[(1,
  1,
  [Matrix([
   [-1],
   [ 1],
   [ 0],
   [ 0]])]),
 (2,
  1,
  [Matrix([
   [ 1],
   [-1],
   [ 0],
   [ 1]])]),
 (4,
  2,
  [Matrix([
   [ 1],
   [ 0],
   [-1],
   [ 1]])])]

$A$ is defective because the eigenvalue 4 has algebraic multiplicity 2 but geometric multiplicity 1.




Let us consider $\lambda = 4$.

In [20]:
(A-4*sympy.eye(4)).rank()

3

In [21]:
((A-4*sympy.eye(4))**2).rank()

2

In [22]:
((A-4*sympy.eye(4))**3).rank()

2

Since $(A-\lambda \mathbf I)^2$ and $(A-\lambda \mathbf I)^3$ have the same rank, $k_\text{max} = 2$.

Next we find a null vector of $(A-\lambda \mathbf I)^2$ that is not a null vector of $A-\lambda \mathbf I$.

In [23]:
((A-4*sympy.eye(4))**2).nullspace()

[Matrix([
 [1],
 [0],
 [0],
 [0]]),
 Matrix([
 [ 0],
 [ 0],
 [-1],
 [ 1]])]

In [24]:
v = ((A-4*sympy.eye(4))**2).nullspace()[1]
v

Matrix([
[ 0],
[ 0],
[-1],
[ 1]])

In [25]:
v1 = v
v2 = (A - 4*sympy.eye(4)) * v1
v2

Matrix([
[-1],
[ 0],
[ 1],
[-1]])

We have two generalized eigenvectors, as many as the algebraic multiplicity of $\lambda = 4$.

In [26]:
V4 = Matrix.hstack(v2, v1)
V4

Matrix([
[-1,  0],
[ 0,  0],
[ 1, -1],
[-1,  1]])

Now we do the same for the eigenvalue $\lambda = 2$.

In [27]:
(A-2*sympy.eye(4)).rank()

3

In [28]:
((A-2*sympy.eye(4))**2).rank()

3

Here $k_\text{max}=1$, so $\mathbf v = \mathbf v_1$ is simply an eigenvector:

In [29]:
v1 = (A-2*sympy.eye(4)).nullspace()[0]
v1

Matrix([
[ 1],
[-1],
[ 0],
[ 1]])

In [30]:
V2 = v1

Finally we go through the procedure for $\lambda = 1$.

In [31]:
(A-sympy.eye(4)).rank()

3

In [32]:
((A-sympy.eye(4))**2).rank()

3

So again, $k_\text{max} = 1$. 

In [33]:
v = v1 = V1 = (A-sympy.eye(4)).nullspace()[0]
V1

Matrix([
[-1],
[ 1],
[ 0],
[ 0]])

In [34]:
V = Matrix.hstack(*[V4, V2, V1])
V

Matrix([
[-1,  0,  1, -1],
[ 0,  0, -1,  1],
[ 1, -1,  0,  0],
[-1,  1,  1,  0]])

In [35]:
J = V.inv() * A * V
J

Matrix([
[4, 1, 0, 0],
[0, 4, 0, 0],
[0, 0, 2, 0],
[0, 0, 0, 1]])

In [36]:
V * J * V.inv()

Matrix([
[ 5,  4,  2,  1],
[ 0,  1, -1, -1],
[-1, -1,  3,  0],
[ 1,  1, -1,  2]])

### Applications of Jordan Canonical Form

**Jordan Canonical Form** is used for:

- **Simplifying Matrix Powers and Polynomials**:
  - Easier computation of $ A^n $ for matrix powers.
- **Computing Matrix Exponential**:
  - Useful in solving systems of linear differential equations by simplifying $ e^{At} $.
- **Theoretical Analysis**:
  - Provides insights into the structure and behavior of linear transformations.

*Note*: The next slides detail the computation of matrix powers and exponentials using JCF.


### Computing Matrix Power using JCF

To compute $ A^n $ for a defective matrix $ A $:

There is a matrix $P$ such that each column is a generalized eigenvector of $A$, allowing the following decomposition:

1. **JCF Decomposition**:
   - Decompose $ A $ into its JCF, $$ A = PJP^{-1} $$.
2. **Compute Power of JCF**:
   - Compute $ J^n $, which is simpler due to the Jordan block structure.






3. **Transform Back**:
   - Compute $$ A^n = PJ^nP^{-1} $$.

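*Example*: For a Jordan block $$ J = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}, $$ the power is $$ J^n = \begin{pmatrix} \lambda^n & n\lambda^{n-1} \\ 0 & \lambda^n \end{pmatrix}, $$ i.e., $ J^n $ involves $ \lambda^n $ and a term that is linear in $ n $.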


In [37]:
l, n = sympy.symbols("l n")
J = Matrix([
    [l, 1],
    [0, l]
])
J

Matrix([
[l, 1],
[0, l]])

In [38]:
J**n

Matrix([
[l**n, l**(n - 1)*n],
[   0,         l**n]])
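
To connect this with the three steps above, the following sketch computes $A^5$ for the running example $A = \begin{pmatrix}5 & 4\\-1 & 1\end{pmatrix}$ via $PJ^nP^{-1}$ and compares it with direct multiplication. The helper `jordan_block_power` is a hypothetical name that simply encodes the closed form shown in the output above.

```python
from sympy import Matrix

A = Matrix([[5, 4], [-1, 1]])
P = Matrix([[2, 1], [-1, 0]])   # eigenvector and generalized eigenvector

def jordan_block_power(lam, n):
    """n-th power of the 2x2 Jordan block with eigenvalue lam."""
    return Matrix([[lam**n, n * lam**(n - 1)], [0, lam**n]])

n = 5
via_jcf = P * jordan_block_power(3, n) * P.inv()
print(via_jcf)
print(via_jcf == A**n)   # True
```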

### Computing Matrix Exponential using JCF

To compute the matrix exponential $ e^{At} $:

1. **JCF Decomposition**:
   - Decompose $ A $ into its JCF, $ A = PJP^{-1}$.
2. **Exponential of JCF**:
   - Compute $ e^{Jt} $, easier for Jordan blocks.
3. **Transform Back**:
   - Compute $ e^{At} = Pe^{Jt}P^{-1} $.

*Example*: For a Jordan block $$ J = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}, $$ $ e^{Jt} $ involves $ e^{\lambda t} $ and terms with $ t $:

$$
e^{Jt} = \begin{pmatrix}
e^{\lambda t} & te^{\lambda t}\\
0 & e^{\lambda t}
\end{pmatrix}
$$
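
The formula follows from writing $J = \lambda \mathbf I + N$ with $N^2 = 0$, so that $e^{Jt} = e^{\lambda t}(\mathbf I + tN)$. The sketch below uses this for the running example with $\lambda = 3$ and checks the two properties that make $e^{At}$ useful for differential equations, namely $\frac{d}{dt}e^{At} = Ae^{At}$ and $e^{A\cdot 0} = \mathbf I$. The variable names are chosen for illustration.

```python
from sympy import Matrix, exp, eye, simplify, symbols

t = symbols("t")
A = Matrix([[5, 4], [-1, 1]])
P = Matrix([[2, 1], [-1, 0]])
N = Matrix([[0, 1], [0, 0]])   # nilpotent part of the Jordan block, N**2 = 0

# e^{Jt} = e^{3t} (I + tN), because the series for e^{tN} stops after the linear term.
exp_Jt = exp(3 * t) * (eye(2) + t * N)
exp_At = P * exp_Jt * P.inv()

# x(t) = e^{At} x(0) solves x' = A x: check the derivative and the initial value.
residual = exp_At.diff(t) - A * exp_At
print(all(simplify(entry) == 0 for entry in residual))
print(exp_At.subs(t, 0) == eye(2))
```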


**Sympy**

In [39]:
A = Matrix([[5, 4], [-1, 1]])

A


Matrix([
[ 5, 4],
[-1, 1]])

In [40]:
P, J = A.jordan_form()
print("J: ")
J

J: 


Matrix([
[3, 1],
[0, 3]])

In [41]:
print("P: ")
P

P: 


Matrix([
[ 2, 1],
[-1, 0]])

In [42]:
P * J * P.inv()

Matrix([
[ 5, 4],
[-1, 1]])

In [43]:
t, k = symbols("t k")

In [44]:
J**k

Matrix([
[3**k, 3**(k - 1)*k],
[   0,         3**k]])

In [45]:
exp(t*J)

Matrix([
[exp(3*t), t*exp(3*t)],
[       0,   exp(3*t)]])

In [46]:
A = Matrix([
    [4, 0, 0, 1],
    [0, 4, 1, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 4]
])
A

Matrix([
[4, 0, 0, 1],
[0, 4, 1, 0],
[0, 0, 4, 0],
[0, 0, 0, 4]])

In [47]:
P, J = A.jordan_form()
print("J: ")
J

J: 


Matrix([
[4, 1, 0, 0],
[0, 4, 0, 0],
[0, 0, 4, 1],
[0, 0, 0, 4]])

In [48]:
print("P: ")
P

P: 


Matrix([
[0, 0, 1, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 0, 1]])

In [49]:
J**k

Matrix([
[4**k, 4**(k - 1)*k,    0,            0],
[   0,         4**k,    0,            0],
[   0,            0, 4**k, 4**(k - 1)*k],
[   0,            0,    0,         4**k]])

In [50]:
exp(t*J)

Matrix([
[exp(4*t), t*exp(4*t),        0,          0],
[       0,   exp(4*t),        0,          0],
[       0,          0, exp(4*t), t*exp(4*t)],
[       0,          0,        0,   exp(4*t)]])

In [51]:
A**k

Matrix([
[4**k,    0,            0, 4**(k - 1)*k],
[   0, 4**k, 4**(k - 1)*k,            0],
[   0,    0,         4**k,            0],
[   0,    0,            0,         4**k]])

## Eigenvalues and the determinant

- every square matrix $A$ can be factorized (allowing complex entries) as
$$
A = U D U^{-1}
$$

where $D$ is either a diagonal matrix or a matrix in Jordan normal form
- in any event:
    - $D$ is upper triangular
    - the diagonal entries of $D$ are the eigenvalues of $A$.
- recall that

    $$
    \begin{align}
    \det(XY) &= \det(X)\det(Y)\\
    \det(X^{-1}) &= \frac{1}{\det(X)}
    \end{align}
    $$
    
- it follows that

$$
\begin{align}
\det(A) &= \det(UDU^{-1})\\
&= \det(U)\det(D)\det(U^{-1})\\
&=\det(U)\det(D)\frac{1}{\det(U)}\\
&= \det(D)
\end{align}
$$

Since $D$ is triangular, its determinant is the product of the diagonal entries.

$\Rightarrow$ **The determinant of a matrix is the product of its eigenvalues (repeated according to algebraic multiplicity).**
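
A quick check of this statement on the defective $4\times 4$ matrix from the earlier example, as a minimal sketch. `eigenvals()` returns a dictionary from eigenvalues to algebraic multiplicities, so each eigenvalue enters the product as often as its multiplicity says.

```python
from math import prod
from sympy import Matrix

A = Matrix([
    [5, 4, 2, 1],
    [0, 1, -1, -1],
    [-1, -1, 3, 0],
    [1, 1, -1, 2],
])

eigen_product = prod(lam**mult for lam, mult in A.eigenvals().items())
print(A.det(), eigen_product)   # both equal 4 * 4 * 2 * 1 = 32
```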

## Trace of a matrix

The **trace** of a matrix is the sum of its diagonal entries.

$$
\text{trace}(A) = \sum_i a_{ii}
$$

**Theorem**
For all square matrices $A$, $B$ of the same size, $\text{trace}(AB) = \text{trace}(BA)$.

*Proof:*

$$
\begin{align}
\text{trace}(AB) &= \sum_i (AB)_{ii}\\
&=\sum_i\sum_j a_{ij}b_{ji}\\
&=\sum_j\sum_i b_{ji}a_{ij}\\
&=\sum_j(BA)_{jj}\\
&=\text{trace}(BA)
\end{align}
$$

Recall that each square matrix $A$ can be factorized into
$$
A = UDU^{-1},
$$
where the diagonal entries of $D$ are the eigenvalues of $A$.

It follows that

$$
\begin{align}
\text{trace}(A) &= \text{trace}(UDU^{-1})\\
&= \text{trace}(DU^{-1}U)\\
&= \text{trace}(D)\\
&=\sum_i \lambda_i
\end{align}
$$

$\Rightarrow$ **The trace of a matrix equals the sum of its eigenvalues** (repeated according to their algebraic multiplicity).
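
Both trace results can be verified on concrete matrices. In this sketch, `A` is the defective $4\times 4$ matrix from the earlier example and `B` is an arbitrary matrix chosen only so that `A` and `B` do not commute.

```python
from sympy import Matrix

A = Matrix([
    [5, 4, 2, 1],
    [0, 1, -1, -1],
    [-1, -1, 3, 0],
    [1, 1, -1, 2],
])
B = Matrix(4, 4, lambda i, j: i + 2 * j)   # arbitrary test matrix

# trace(AB) = trace(BA) although AB and BA differ.
print(A * B == B * A)                      # False
print((A * B).trace(), (B * A).trace())

# trace(A) equals the sum of the eigenvalues, counted with multiplicity.
eigen_sum = sum(lam * mult for lam, mult in A.eigenvals().items())
print(A.trace(), eigen_sum)                # both equal 4 + 4 + 2 + 1 = 11
```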

## Symmetric matrices

Symmetric matrices are much better behaved than square matrices in general.

Let $S$ be a symmetric matrix, i.e., $S^T = S$. Suppose $\lambda_1 \neq \lambda_2$ are two eigenvalues of $S$, and let $\mathbf v_1, \mathbf v_2$ be corresponding eigenvectors.

$$
\begin{aligned}
\mathbf v_1^T S \mathbf v_2 &= \lambda_2 \mathbf v_1^T\mathbf v_2\\
\mathbf v_1^T S \mathbf v_2 &= \mathbf v_1^T S^T \mathbf v_2\\
&=(\mathbf v_2^T S \mathbf v_1)^T\\
&= \lambda_1 (\mathbf v_2^T\mathbf v_1)^T\\
&= \lambda_1 \mathbf v_1^T\mathbf v_2\\
\lambda_2 \mathbf v_1^T\mathbf v_2 &= \lambda_1 \mathbf v_1^T\mathbf v_2\\
\end{aligned}
$$

Since we assumed that $\lambda_1 \neq \lambda_2$, it follows that

$$
\mathbf v_1^T\mathbf v_2 = 0
$$

In other words, the eigenvectors corresponding to different eigenvalues are **orthogonal** to each other.
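
A small numerical illustration, assuming two hand-picked $2\times 2$ matrices: for the symmetric matrix the eigenvectors are orthogonal, while for the non-symmetric one they are not. The helper `eigenvector_dots` is a hypothetical name.

```python
from itertools import combinations
from sympy import Matrix

def eigenvector_dots(M):
    """Dot products of eigenvectors belonging to different eigenvalues of M."""
    vectors = [basis[0] for _, _, basis in M.eigenvects()]
    return [v1.dot(v2) for v1, v2 in combinations(vectors, 2)]

S = Matrix([[1, 2], [2, -2]])   # symmetric, eigenvalues -3 and 2
T = Matrix([[1, 1], [0, 2]])    # not symmetric, eigenvalues 1 and 2

print(eigenvector_dots(S))      # [0]
print(eigenvector_dots(T))      # [1]
```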

**Theorem**

A real symmetric $n\times n$ matrix $S$ has only real eigenvalues. The sum of their algebraic multiplicities equals $n$.

*Proof*

As the proof requires complex numbers, it goes somewhat beyond the scope of this course. I will sketch it nevertheless.

- Each complex number can be written as $a + bi$, where $a$, $b$ are real numbers, and $i = \sqrt{-1}$ is the imaginary unit.
- The *complex conjugate* of a complex number $a + bi$ is $a - bi$. In other words, a complex number $x$ and its conjugate $\overline x$ have the same real part and imaginary parts of the same magnitude and opposite signs.

- Let $x = a + bi$. Then $x\cdot \overline{x} = (a+bi)(a-bi) = a^2-b^2i^2 = a^2+b^2$. In other words, the product of a complex number $x$ and its conjugate $\overline x$ is a non-negative real number, and it is only $0$ if $x=0$.

- The complex conjugate commutes with the standard operations ($x, y$ are complex numbers and $a$ is real):

$$
\begin{align}
\overline x + \overline y &= \overline{x+y}\\
\overline x \cdot \overline y &= \overline{x\cdot y}\\
(\overline{x})^a &= \overline{x^a}
\end{align}
$$



- Let 

$$\det(S-\lambda \boldsymbol I) = a_n\lambda^n + \cdots + a_1\lambda + a_0 = 0$$ 

be the characteristic equation of a real square matrix $S$, and let $\lambda$ be a solution. In other words, $\lambda$ is an eigenvalue of $S$. Since the coefficients $a_i$ are real, it then holds that

$$\det(S-\overline\lambda \boldsymbol I) = \overline{a_n\lambda^n + \cdots + a_1\lambda + a_0} = \overline 0 = 0$$ 

In other words, $\overline\lambda$ is also an eigenvalue of $S$.

- Let $\boldsymbol x$ be an eigenvector of $S$ with eigenvalue $\lambda$. Then $\overline{\boldsymbol x}$ is an eigenvector of $S$ with eigenvalue $\overline\lambda$, because $S\overline{\boldsymbol x} = \overline{S\boldsymbol x} = \overline{\lambda\boldsymbol x} = \overline\lambda\,\overline{\boldsymbol x}$.
- Suppose $\lambda \neq \overline\lambda$ for an eigenvalue $\lambda$ of a symmetric matrix $S$. Then the corresponding eigenvectors $\boldsymbol x$ and $\overline{\boldsymbol x}$ belong to different eigenvalues and are therefore orthogonal:

$$
\begin{align}
\boldsymbol{x}^T\overline{\boldsymbol{x}} &= 0\\
\sum_i x_i\cdot \overline{x_i} &= 0\\
x_i &= 0 \quad \text{for all } i
\end{align}
$$

This is a contradiction because eigenvectors are by definition different from $\boldsymbol 0$. It follows that $\lambda = \overline\lambda$, i.e., $\lambda$ is a real number.

$\dashv$
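
The theorem can be illustrated by comparing a symmetric matrix with a non-symmetric one. This is only a spot check on two hand-picked matrices, not a substitute for the proof; `is_real` is SymPy's assumption query on a number.

```python
from sympy import Matrix

S = Matrix([[1, 2], [2, -2]])   # symmetric
R = Matrix([[0, -1], [1, 0]])   # rotation by 90 degrees, not symmetric

print(S.eigenvals())            # eigenvalues -3 and 2
print(R.eigenvals())            # eigenvalues -I and I
print(all(lam.is_real for lam in S.eigenvals()))   # True
print(any(lam.is_real for lam in R.eigenvals()))   # False
```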

**Theorem**

A symmetric matrix is not defective.

*Proof*

Suppose $S$ is defective. This means there must be an eigenvalue $\lambda$ and a corresponding generalized eigenvector $\mathbf v$ such that

$$
\begin{aligned}
(S-\lambda \mathbf I)^2\mathbf v &= \mathbf 0\\
(S-\lambda \mathbf I)\mathbf v &\neq \mathbf 0\\
\end{aligned}
$$

From the first line it follows that

$$
\mathbf v^T (S-\lambda \mathbf I)^2\mathbf v =  0
$$

However, if $S$ is symmetric, so is $(S-\lambda \mathbf I)$. Therefore

$$
\begin{aligned}
\mathbf v^T (S-\lambda \mathbf I)^2\mathbf v &= 
\mathbf v^T (S-\lambda \mathbf I)(S-\lambda \mathbf I)\mathbf v\\
&= \mathbf v^T (S-\lambda \mathbf I)^T(S-\lambda \mathbf I)\mathbf v\\
&= ((S-\lambda \mathbf I)\mathbf v)^T(S-\lambda \mathbf I)\mathbf v\\
&= \|(S-\lambda \mathbf I)\mathbf v\|^2
\end{aligned}
$$

Since $(S-\lambda \mathbf I)\mathbf v \neq \mathbf 0$ by assumption, this squared norm is strictly positive, so $\mathbf v^T (S-\lambda \mathbf I)^2\mathbf v \neq 0$, which is a contradiction.

$\dashv$
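
As a spot check, the sketch below takes a symmetric matrix with a repeated eigenvalue and confirms that the geometric multiplicity matches the algebraic one, in contrast to the defective matrix used earlier. The matrix `S` is an assumed example.

```python
from sympy import Matrix

S = Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])   # symmetric, eigenvalue 1 is repeated
A = Matrix([[5, 4], [-1, 1]])                   # defective, for comparison

for lam, alg, basis in S.eigenvects():
    print(lam, alg, len(basis))   # algebraic and geometric multiplicity agree

print(S.is_diagonalizable())      # True
print(A.is_diagonalizable())      # False
```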

Furthermore, it is possible to choose multiple eigenvectors for the same eigenvalue in such a way that they are orthogonal to each other.

Suppose 

- $\lambda$ is an eigenvalue of a symmetric matrix $S$ and 
- the columns of the matrix $V$ are linearly independent eigenvectors of $S$ corresponding to $\lambda$.

First observe that each non-zero vector within the column space of $V$ is also an eigenvector of $S$ with eigenvalue $\lambda$.

$$
\begin{aligned}
V\mathbf x &=\mathbf b\\
SV &= \lambda V\\
S\mathbf b &= S V\mathbf x\\
&=\lambda V\mathbf x\\
&= \lambda \mathbf b
\end{aligned}
$$



We can transform the $n\times k$ matrix $V$ into an $n\times k$ matrix $U$ in such a way that

- $U$ and $V$ have the same column space, and
- the columns of $U$ are orthogonal to each other and each have length 1. In other words:

$$
U^T U = \mathbf I
$$

The columns of $U$ can be constructed one at a time (this is the Gram-Schmidt procedure):

1. $\mathbf u_1 = \frac{1}{\|\mathbf v_1\|}\mathbf v_1$
2. for $i = 2,\ldots, k$:
    - let $A = [\mathbf u_1 \cdots \mathbf u_{i-1}]$
    - $\mathbf x = \mathbf v_i - A(A^TA)^{-1}A^T \mathbf v_i$
    - $\mathbf u_i = \frac{1}{\|\mathbf x\|}\mathbf x$
3. $U = [\mathbf u_1 \cdots \mathbf u_k]$
    


In [52]:
V = Matrix([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, -1],
    [0, 0, 1]
])
V

Matrix([
[1, 2,  3],
[4, 5,  6],
[7, 8, -1],
[0, 0,  1]])

In [53]:
v1 = V[:,0]
v2 = V[:,1]
v3 = V[:,2]

In [54]:
u1 = sympy.simplify(v1/v1.norm())

U = u1
U

Matrix([
[  sqrt(66)/66],
[2*sqrt(66)/33],
[7*sqrt(66)/66],
[            0]])

In [55]:
u2 = v2 - U*(U.T * U).inv()*U.T * v2
u2 /= u2.norm()
u2

Matrix([
[3*sqrt(11)/11],
[  sqrt(11)/11],
[ -sqrt(11)/11],
[            0]])

In [56]:
U = Matrix.hstack(u1, u2)
U

Matrix([
[  sqrt(66)/66, 3*sqrt(11)/11],
[2*sqrt(66)/33,   sqrt(11)/11],
[7*sqrt(66)/66,  -sqrt(11)/11],
[            0,             0]])

In [57]:
u3 = v3 - U*(U.T * U).inv()*U.T * v3
u3 /= u3.norm()
u3

Matrix([
[-5*sqrt(159)/159],
[10*sqrt(159)/159],
[-5*sqrt(159)/159],
[    sqrt(159)/53]])

In [58]:
U = simplify(Matrix.hstack(U, u3))
U

Matrix([
[  sqrt(66)/66, 3*sqrt(11)/11, -5*sqrt(159)/159],
[2*sqrt(66)/33,   sqrt(11)/11, 10*sqrt(159)/159],
[7*sqrt(66)/66,  -sqrt(11)/11, -5*sqrt(159)/159],
[            0,             0,     sqrt(159)/53]])

In [59]:
U.T * U

Matrix([
[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
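
The cell-by-cell computation above can be wrapped into a loop. The following is a sketch of the same three steps; the function name `orthonormalize` is made up for illustration, and the input columns are assumed to be linearly independent.

```python
from sympy import Matrix, eye, simplify

def orthonormalize(V):
    """Return U with orthonormal columns spanning the column space of V."""
    cols = []
    for i in range(V.cols):
        x = V[:, i]
        if cols:
            Q = Matrix.hstack(*cols)
            # Subtract the projection of x onto the span of the previous columns.
            x = x - Q * (Q.T * Q).inv() * Q.T * x
        cols.append(simplify(x / x.norm()))
    return Matrix.hstack(*cols)

V = Matrix([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, -1],
    [0, 0, 1],
])
U = orthonormalize(V)
print(simplify(U.T * U))
```

Because the previous columns are already orthonormal, $Q^TQ = \mathbf I$ and the projection reduces to $QQ^T\mathbf x$; the full formula is kept here to mirror the steps in the list.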

## Orthogonal matrices

A matrix $Q$ is **orthogonal** iff 

- $Q$ is square
- $QQ^T = \mathbf I$

**Corollaries**

If $Q$ is orthogonal, then

- for each column $\mathbf q_i$ of Q, $\|\mathbf q_i\| = 1$
- if $i\neq j$, then $\mathbf q_i$ and $\mathbf q_j$ are orthogonal, i.e., $\mathbf q_i^T\mathbf q_j = 0$
- $Q^T = Q^{-1}$

It follows from what has been said above that the eigenvectors of each symmetric matrix can be chosen so that they form an orthogonal matrix. Therefore, $S$ can be diagonalized as

$$
S = Q\Lambda Q^T
$$

where $Q$ is orthogonal and $\Lambda$ is diagonal.

This is the famous **Spectral Theorem**.
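
A minimal sketch of the factorization for a small symmetric matrix. It assumes that all eigenvalues are distinct, so that normalizing the eigenvectors is enough; with repeated eigenvalues the eigenvectors within one eigenspace would first have to be orthonormalized as described above.

```python
from sympy import Matrix, diag, eye, simplify

S = Matrix([[1, 2], [2, -2]])   # symmetric, distinct eigenvalues -3 and 2

columns, eigenvalues = [], []
for lam, _, basis in S.eigenvects():
    v = basis[0]
    columns.append(v / v.norm())   # unit-length eigenvector
    eigenvalues.append(lam)

Q = Matrix.hstack(*columns)
Lam = diag(*eigenvalues)

print(simplify(Q * Q.T))           # identity matrix
print(simplify(Q * Lam * Q.T))     # reproduces S
```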