# HIGHLIGHTS of LINEAR ALGEBRA

## MATRIX-MATRIX MULTIPLICATION


**Inner Products** (rows times columns) produce each of the numbers in $AB = C$:

$$
\begin{bmatrix}
\cdot & \cdot & \cdot \\
a_{21} & a_{22} & a_{23} \\
\cdot & \cdot & \cdot
\end{bmatrix}
\begin{bmatrix}
\cdot & \cdot & b_{13} \\
\cdot & \cdot & b_{23} \\
\cdot & \cdot & b_{33}
\end{bmatrix}
=
\begin{bmatrix}
\cdot & \cdot & \cdot \\
\cdot & \cdot & c_{23} \\
\cdot & \cdot & \cdot
\end{bmatrix}
$$

**Row 2 of A and column 3 of B give $c_{23}$ in C.**

In [11]:
import numpy as np
import torch
import tensorflow as tf

# defining matrix A
A = np.array([[0,0,0],[4,5,6],[0,0,0]])

A

array([[0, 0, 0],
       [4, 5, 6],
       [0, 0, 0]])

In [12]:
#defining matrix B

B = np.array([[0,0,1],[0,0,2],[0,0,3]])

B

array([[0, 0, 1],
       [0, 0, 2],
       [0, 0, 3]])

# A * B does NOT give the matrix product: in NumPy it multiplies entry by entry.

In [13]:
c_not = A * B

c_not


array([[ 0,  0,  0],
       [ 0,  0, 12],
       [ 0,  0,  0]])

In [14]:
# Calculating c_2_3 by hand
# c_2_3 = a_2_1 * b_1_3 + a_2_2 * b_2_3 + a_2_3 * b_3_3

manual_c_2_3 = 4*1 + 5*2 + 6*3
manual_c_2_3

32

In [15]:
# Computing the full product C = AB with NumPy
# (np.dot on 2-D arrays is matrix multiplication; c_2_3 is the entry C[1, 2])

C = np.dot(A,B)
C

array([[ 0,  0,  0],
       [ 0,  0, 32],
       [ 0,  0,  0]])

That **Dot Product** $c_{23}$ = (row 2 of A) $\cdot$ (column 3 of B) is a sum of $a$'s times $b$'s:

$$
c_{23} = a_{21} b_{13} + a_{22} b_{23} + a_{23} b_{33} = \sum_{k=1}^{3} a_{2k} b_{k3}
$$

$$
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
$$

This is how we usually compute each number in $AB = C$.
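The formula $c_{ij} = \sum_k a_{ik} b_{kj}$ can be written out as three nested loops. This is only a sketch to make the indices concrete: the helper name `matmul_inner` is made up for this illustration, and in practice `A @ B` or `np.dot(A, B)` does the same job much faster.

```python
import numpy as np

A = np.array([[0, 0, 0], [4, 5, 6], [0, 0, 0]])
B = np.array([[0, 0, 1], [0, 0, 2], [0, 0, 3]])

def matmul_inner(A, B):
    """Hypothetical helper: compute C = AB one inner product at a time."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must match rows of B"
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            # inner product of row i of A with column j of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

C_inner = matmul_inner(A, B)
print(C_inner[1, 2])  # c_23 in the 1-based notation of the text
```

Python indices start at 0, so the entry called $c_{23}$ in the text is `C_inner[1, 2]` here.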

But there is another way.

The other way to multiply AB is **columns of A times rows of B**.

We need to see this!

I start with numbers to make two key points:

*One column* $u$ times *one row* $v^T$ produces a *matrix*.

Concentrate first on that piece of AB.

This matrix $uv^T$ is especially simple:


$$
\text{Outer product} \quad uv^T
=
\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}
\begin{bmatrix} 3 & 4 & 6 \end{bmatrix}
=
\begin{bmatrix}
6 & 8 & 12 \\
6 & 8 & 12 \\
3 & 4 & 6
\end{bmatrix}
= \text{rank one matrix}
$$


An m by 1 matrix (a column $u$) times a 1 by p matrix (a row $v^T$) gives an m by p matrix.

Notice what is special about the rank one matrix $uv^T$.




All columns of $uv^T$ are multiples of $u = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$.

All rows are multiples of $v^T = \begin{bmatrix} 3 & 4 & 6 \end{bmatrix}$.

The column space of $uv^T$ is one dimensional: *the line in the direction of* $u$.

The dimension of the column space (the number of independent columns) is the **rank of the matrix** - a key number. 

**All nonzero matrices $uv^T$ have rank one**.

They are the perfect building blocks for every matrix.
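The example above can be reproduced with NumPy as a quick check. This sketch uses `np.outer` for $uv^T$ and `np.linalg.matrix_rank` for the rank; the variable name `uvT` is chosen only for this illustration.

```python
import numpy as np

u = np.array([2, 2, 1])
v = np.array([3, 4, 6])

uvT = np.outer(u, v)   # 3 by 3 matrix with entries u[i] * v[j]
print(uvT)
print(np.linalg.matrix_rank(uvT))

# Column j of uv^T is v[j] times u, and row i is u[i] times v
for j in range(3):
    assert np.array_equal(uvT[:, j], v[j] * u)
for i in range(3):
    assert np.array_equal(uvT[i, :], u[i] * v)
```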

Notice also: **The row space of $uv^T$ is the line through $v$.**

By definition, the row space of any matrix $A$ is the column space $C(A^T)$ of its transpose $A^T$.

That way we stay with column vectors.

In the example, we transpose $uv^T$ (**exchange rows with columns**) to get the matrix $vu^T$:

$$

(uv^T)^T

=


\begin{bmatrix}

6 & 8 & 12  \\
6 & 8 & 12  \\
3 & 4 & 6  \\


\end{bmatrix}^T


=


\begin{bmatrix}

6 & 6 & 3  \\
8 & 8 & 4  \\
12 & 12 & 6  \\


\end{bmatrix}

=


\begin{bmatrix}

3  \\
4  \\
6  \\


\end{bmatrix}

*


\begin{bmatrix}

2 & 2 & 1  \\

\end{bmatrix}

=

vu^T

$$




We are seeing the clearest possible example of the first great theorem in linear algebra:

$$
\text{ROW RANK} = \text{COLUMN RANK}
$$

$$
r \text{ independent columns} \iff r \text{ independent rows}
$$

A nonzero matrix $uv^T$ has one independent column and one independent row.

All columns are multiples of $u$ and all rows are multiples of $v^T$.

The rank is $r = 1$ for this matrix.
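A numerical check is not a proof, but it can make the statement concrete. The sketch below first confirms $(uv^T)^T = vu^T$ for the example, then builds a 3 by 4 matrix `M` (an arbitrary illustration, not from the text) as a sum of two rank one pieces and compares the rank of `M` with the rank of its transpose.

```python
import numpy as np

u = np.array([2, 2, 1])
v = np.array([3, 4, 6])

# Transposing uv^T exchanges the roles of u and v
assert np.array_equal(np.outer(u, v).T, np.outer(v, u))

# Illustrative 3 by 4 matrix: a sum of two independent rank one pieces
M = np.outer([1, 0, 2], [1, 1, 0, 3]) + np.outer([0, 1, 1], [2, 0, 1, 1])

rank_columns = np.linalg.matrix_rank(M)    # dimension of the column space
rank_rows = np.linalg.matrix_rank(M.T)     # dimension of the row space
print(rank_columns, rank_rows)
```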

## $AB$ = SUM OF RANK ONE MATRICES

We turn to the full product $AB$, using columns of A times rows of B.

Let $a_1, a_2, \ldots, a_n$ be the $n$ columns of A. Then B must have $n$ rows $b_1^*, b_2^*, \ldots, b_n^*$. The matrix A can multiply the matrix B.

**Their product $AB$ is the sum of columns $a_k$ times rows $b_k^*$:**

$$
Column-Row\:Multiplication\:of\:Matrices
$$

$$

AB

=


\begin{bmatrix}

| &  & |  \\
a_1 & ... & a_n  \\
| &  & |  \\


\end{bmatrix}

*
\begin{bmatrix}

-- & b_1 ^* & --  \\
 & ... &   \\
-- & b_n ^* & --  \\


\end{bmatrix}

=

a_1*b_1 ^* + a_2*b_2 ^* + ... + a_n*b_n ^*

$$

$$

Sum\:of\:rank\:1\:matrices

$$

Here is a 2 by 2 example that shows the $n = 2$ pieces (column times row) and their sum $AB$:


$$

\begin{bmatrix}

1 & 0 \\
3 & 1 \\

\end{bmatrix}


*

\begin{bmatrix}

2 & 4 \\
0 & 5 \\

\end{bmatrix}

=

\begin{bmatrix}

1  \\
3  \\

\end{bmatrix}

*

\begin{bmatrix}

2 & 4 \\

\end{bmatrix}

+
\begin{bmatrix}

0  \\
1  \\

\end{bmatrix}
*
\begin{bmatrix}

0 & 5 \\

\end{bmatrix}

=


\begin{bmatrix}

2 & 4 \\
6 & 12 \\

\end{bmatrix}

+


\begin{bmatrix}

0 & 0 \\
0 & 5 \\

\end{bmatrix}

=


\begin{bmatrix}

2 & 4 \\
6 & 17 \\

\end{bmatrix}

$$


In [16]:
A_Sum = np.array([[1,0],[3,1]])
A_Sum

array([[1, 0],
       [3, 1]])

In [17]:
B_Sum = np.array([[2,4],[0,5]])
B_Sum

array([[2, 4],
       [0, 5]])

In [18]:
C_Sum= np.dot(A_Sum,B_Sum)
C_Sum

array([[ 2,  4],
       [ 6, 17]])

We can count the multiplications of number times number.

Four multiplications to get 2, 4, 6, 12. Four more to get 0, 0, 0, 5.

A total of $2^3 = 8$ multiplications.

There are always $n^3$ multiplications when A and B are n by n.

And there are $mnp$ multiplications when AB = (m by n) times (n by p): n rank one matrices, each of which is m by p.

$$
\text{rows times columns} \quad = \quad mp \text{ inner products, } n \text{ multiplications each} \quad = \quad mnp
$$

$$
\text{columns times rows} \quad = \quad n \text{ outer products, } mp \text{ multiplications each} \quad = \quad mnp
$$

When you look closely, they are exactly the same multiplications $a_{ik} b_{kj}$ in a different order.

Here is the algebra proof that each number $c_{ij}$ in $C = AB$ is the same by outer products as by inner products:

The $i, j$ entry of $a_k b_k^*$ is $a_{ik} b_{kj}$.

Add over $k$ to find $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ = (row $i$) $\cdot$ (column $j$).
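The two orderings can be compared directly by counting multiplications. In this sketch the function names and the test matrices are made up for illustration; the only difference between the two functions is which loop is outermost.

```python
import numpy as np

def multiply_by_inner_products(A, B):
    """Rows times columns: k is the innermost loop."""
    m, n = A.shape
    p = B.shape[1]
    C = np.zeros((m, p))
    count = 0
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
                count += 1
    return C, count

def multiply_by_outer_products(A, B):
    """Columns times rows: k is the outermost loop, one rank one piece per k."""
    m, n = A.shape
    p = B.shape[1]
    C = np.zeros((m, p))
    count = 0
    for k in range(n):
        for i in range(m):
            for j in range(p):
                C[i, j] += A[i, k] * B[k, j]
                count += 1
    return C, count

A = np.arange(6).reshape(2, 3)     # m = 2, n = 3
B = np.arange(12).reshape(3, 4)    # n = 3, p = 4

C1, count1 = multiply_by_inner_products(A, B)
C2, count2 = multiply_by_outer_products(A, B)
print(count1, count2)  # both equal m * n * p
```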

## INSIGHTS FROM COLUMN TIMES ROW

**Why is the outer product approach essential in data science?**

The short answer is:

*We are looking for the important part of a matrix A.*


We do not usually want the biggest number in A (though that could be important).

What we want more is the largest piece of A.

**And those pieces are rank one matrices $uv^T$.**

A dominant theme in applied linear algebra is:

$$
\text{Factor } A \text{ into } CR \text{ and look at the pieces } c_k r_k^* \text{ of } A = CR.
$$

Factoring A into CR is the reverse of multiplying CR = A.

Factoring takes longer, especially if the pieces involve eigenvalues or singular values.

But those numbers have inside information about the matrix A.

That information is not visible until you factor.

Here are five important factorizations, with the standard choice of letters (usually A) for the original product matrix and then for its factors.

$$
\text{1-)} \quad A = LU
$$

$$
\text{2-)} \quad A = QR
$$

$$
\text{3-)} \quad S = QΛQ^T
$$

$$
\text{4-)} \quad A = XΛX^{-1}
$$

$$
\text{5-)} \quad A = UΣV^T
$$



At this point, we simply list key words and properties for each of these factorizations.

1-) $A = LU$ comes from elimination.

Combinations of rows take A to U and U back to A. The matrix L is lower triangular and U is upper triangular.

2-) A = QR comes from orthogonalizing the columns $a_1$ to $a_n$ as in "Gram-Schmidt".

Q has orthonormal columns ($Q^TQ$=$I$) and R is upper triangular.

3-) $S$ = $QΛQ^T$ comes from the eigenvalues $λ_1$,...,$λ_n$ of a symmetric matrix $S$ = $S^T$.

Eigenvalues on the diagonal of Λ.

Orthonormal eigenvectors in the columns of Q.

4-) $A$ = $XΛX^{-1}$ is diagonalization when A is n by n with n independent eigenvectors.

Eigenvalues of A on the diagonal of Λ.

Eigenvectors of A in the columns of X.

5-) $A$ = $UΣV^T$ is the Singular Value Decomposition of any matrix A (square or not).

Singular values $σ_1$,...,$σ_r$ in Σ.

Orthonormal singular vectors in U and V.
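Four of these factorizations are available in `numpy.linalg`, so their listed properties can be checked numerically. This is a sketch on small matrices chosen only for illustration (`A`, `S`, `T` are not from the text). LU is left out because `numpy.linalg` has no LU routine; it would need another library such as SciPy.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])               # square, independent eigenvectors
S = np.array([[2.0, 1.0], [1.0, 2.0]])               # symmetric
T = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 by 2, not square

# 2-) A = QR: orthonormal columns in Q, upper triangular R
Q, R = np.linalg.qr(A)
qr_ok = (np.allclose(Q @ R, A)
         and np.allclose(Q.T @ Q, np.eye(2))
         and np.allclose(R, np.triu(R)))

# 3-) S = Q Λ Q^T: eigh is NumPy's routine for symmetric matrices
lam_S, Q_S = np.linalg.eigh(S)
spectral_ok = np.allclose(Q_S @ np.diag(lam_S) @ Q_S.T, S)

# 4-) A = X Λ X^{-1}: eigenvectors of A in the columns of X
lam_A, X = np.linalg.eig(A)
diag_ok = np.allclose(X @ np.diag(lam_A) @ np.linalg.inv(X), A)

# 5-) T = U Σ V^T: svd returns the singular values and V^T (not V)
U, sigma, Vt = np.linalg.svd(T, full_matrices=False)
svd_ok = np.allclose(U @ np.diag(sigma) @ Vt, T)

print(qr_ok, spectral_ok, diag_ok, svd_ok)
```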




Let me pick out a favorite (number 3) to illustrate the idea.

This special factorization $QΛQ^T$ starts with a symmetric matrix S.

That matrix has orthogonal unit eigenvectors $q_1$,...,$q_n$.

Those perpendicular eigenvectors (dot products = 0) go into the columns of Q.

S and Q are the kings and queens of linear algebra.

$$
\text{Symmetric matrix } S \qquad S^T = S \qquad \text{All } s_{ij} = s_{ji}
$$

$$
\text{Orthogonal matrix } Q \qquad Q^T = Q^{-1} \qquad \text{All } q_i \cdot q_j = \begin{cases} 0 & \text{for } i \neq j \\ 1 & \text{for } i = j \end{cases}
$$


The diagonal matrix $Λ$ contains real eigenvalues $λ_1$ to $λ_n$.

Every real symmetric matrix $S$ has n orthonormal eigenvectors $q_1$ to $q_n$.

When multiplied by S, the eigenvectors keep the same direction.

They are just rescaled by the number $λ$:

$$
\text{Eigenvector } q \text{ and eigenvalue } λ \qquad Sq = λq
$$

Finding $λ$ and $q$ is not easy for a big matrix.

But n pairs always exist when S is symmetric.
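For a small symmetric matrix the pairs are easy to inspect. The sketch below uses `np.linalg.eigh` on a 2 by 2 matrix picked for illustration and checks, one pair at a time, that multiplying by `S` only rescales each eigenvector.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order and unit eigenvectors as columns
lam, Q = np.linalg.eigh(S)

checks = []
for k in range(len(lam)):
    q = Q[:, k]
    # S q keeps the direction of q and rescales it by lam[k]
    checks.append(np.allclose(S @ q, lam[k] * q))
print(lam, checks)

# The eigenvectors are orthonormal: Q^T Q = I
orthonormal = np.allclose(Q.T @ Q, np.eye(2))
print(orthonormal)
```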

Our purpose here is to see how $SQ = QΛ$ comes column by column from $Sq = λq$:


$$
SQ
=
S
\begin{bmatrix}
| &  & | \\
q_1 & \cdots & q_n \\
| &  & |
\end{bmatrix}
=
\begin{bmatrix}
| &  & | \\
λ_1 q_1 & \cdots & λ_n q_n \\
| &  & |
\end{bmatrix}
=
\begin{bmatrix}
| &  & | \\
q_1 & \cdots & q_n \\
| &  & |
\end{bmatrix}
\begin{bmatrix}
λ_1 &  &  \\
 & \ddots &  \\
 &  & λ_n
\end{bmatrix}
=
QΛ
$$

Multiply $SQ = QΛ$ on the right by $Q^{-1} = Q^T$ to get $S = QΛQ^T$, a symmetric matrix.

Each eigenvalue $λ_k$ and each eigenvector $q_k$ contribute a rank one piece $λ_k q_k q_k ^T$ to S.

$$
\text{Rank one pieces} \qquad S = (QΛ)Q^T = (λ_1 q_1) q_1^T + (λ_2 q_2) q_2^T + \cdots + (λ_n q_n) q_n^T
$$

$$
\text{All symmetric} \qquad \text{The transpose of } q_i q_i^T \text{ is } q_i q_i^T
$$

Please notice that the columns of $QΛ$ are $(λ_1 q_1)$ to $(λ_n q_n)$.

When you multiply a matrix on the right by the diagonal matrix $Λ$, you multiply its columns by the $λ$'s.
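Both remarks can be checked numerically. This sketch takes a 3 by 3 symmetric matrix chosen for illustration, confirms that the columns of $QΛ$ are the rescaled eigenvectors, and rebuilds `S` from its rank one pieces $λ_k q_k q_k^T$.

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, Q = np.linalg.eigh(S)

# Right-multiplying by diag(lam) scales column k of Q by lam[k];
# broadcasting in Q * lam does the same column scaling
QLam = Q @ np.diag(lam)
assert np.allclose(QLam, Q * lam)
assert np.allclose(S @ Q, QLam)     # SQ = QΛ

# One symmetric rank one piece per eigenpair
pieces = [lam[k] * np.outer(Q[:, k], Q[:, k]) for k in range(len(lam))]
S_rebuilt = sum(pieces)
print(np.allclose(S_rebuilt, S))
```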

We close with a comment on the proof of this **Spectral Theorem $S$ = $QΛQ^T$**:

Every symmetric S has n real eigenvalues and n orthonormal eigenvectors.

In the eigenvalues and eigenvectors section, we will construct the eigenvalues as the roots of the nth degree polynomial $P_n(λ)$ = determinant of $S - λI$.

They are real numbers when $S$ = $S^T$.

The delicate part of the proof comes when an eigenvalue $λ_j$ is repeated: it is a double root or a root of multiplicity M, coming from a factor $(λ - λ_j)^M$.

In this case, we need to produce M independent eigenvectors.

The rank of $S - λ_jI$ must be n-M.

This is true when $S = S^T$. But it requires a proof.

Similarly, the Singular Value Decomposition $ A $ = $UΣV^T$ requires extra patience when a singular value σ is repeated M times in the diagonal matrix Σ.

Again, there are M pairs of singular vectors $v$ and $u$ with $Av$ = $σu$. Again this true statement requires proof.
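The relation $Av = σu$ can be checked pair by pair with `np.linalg.svd`. The 2 by 2 matrix below is an illustrative choice with distinct singular values, so it does not exercise the repeated case discussed above.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# svd returns U, the singular values, and V^T (rows of Vt are the v's)
U, sigma, Vt = np.linalg.svd(A)

pair_checks = []
for k in range(len(sigma)):
    u = U[:, k]
    v = Vt[k, :]
    pair_checks.append(np.allclose(A @ v, sigma[k] * u))   # Av = σu
print(sigma, pair_checks)

# A is also the sum of its rank one pieces σ_k u_k v_k^T
A_rebuilt = sum(sigma[k] * np.outer(U[:, k], Vt[k, :]) for k in range(len(sigma)))
print(np.allclose(A_rebuilt, A))
```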

**Notation for rows**:

We introduced the symbols $b_1 ^*$, ... , $b_n ^*$ for the rows of the second matrix in AB.

You might have expected $b_1 ^T$, ... , $b_n ^T$ and that was our original choice.

But this notation is not entirely clear: it seems to mean the transposes of the columns of B.

Since that right hand factor could be $U$ or $R$ or $Q^T$ or $X^{-1}$ or $V^T$, it is safer to say definitely:

**We want the rows of that matrix.**
