# Introduction to Linear Algebra for Statistical Analysis

## Why Linear Algebra?

Linear algebra is critical for:

1. Multiple regression
2. Advanced statistical methods
3. Data manipulation
4. Machine learning

It provides efficient ways of:

1. Expressing computations
2. Managing multi-dimensional data
3. Handling big data and executing computations

Linear algebra is probably the most important mathematical pre-requisite, and you will survive this :D 

## Vector

A vector is an **ordered** list of numbers or random variables. 

It can be written as:

| | |  |
| --- | --- | --- |
|$\begin{bmatrix} 1 \\ 2 \\ 3 \\ \end{bmatrix}$ | $\begin{pmatrix} 1 \\ 2 \\ 3 \\ \end{pmatrix}$ | $(1, 2, 3)$|

- The size of a vector is the number of elements in the list.
- Elements (or entries) are the items in the list.
- A vector of size $n$ is called an $n$-vector.
- A single number is called a scalar and has size 1.

## Vector notation

- We use symbols to denote a vector, such as $\vec{\beta}, k, p, \beta$, etc.
- The $i$th element of the vector $a$ is denoted as $a_i$. For instance, if $a = \begin{bmatrix} 3 \\ 1 \\ 5 \\ \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \end{bmatrix}$, then $a_2 = 1$.


What is the size of the following vector? 

What is the 3rd entry of the vector?

$$
\begin{bmatrix} 1.5 \\ 6 \\ 9 \\ -4 \\ 8 \end{bmatrix}
$$

## Vector Space 

An $n$-vector can be represented as a point in $n$-dimensional space, or as an arrow from the origin to that point.

For instance, a 2-vector $x=\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ can be visualized as the point $(x_1, x_2)$ in 2-dimensional space, or as an arrow starting from the origin $(0,0)$ and ending at the point $(x_1, x_2)$.

![vector](./img/vector.png)

## Vector Space 

An $n$-vector can be visualized in $n$-dimensional space.

For instance, a 3-vector $\begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$ can be represented as 

![vector](./img/vector_3d.webp)

## Use vector to represent data

We may use vectors to represent data. For example:

- One sample of multiple variables: representing color in (R, G, B), and red would be $[255 \ 0 \  0 ]$
- Multiple samples of one single variable: price of stock $\begin{bmatrix} 1.2 \\ 1.3 \\ 1.4 \\ 2 \\ 0.5 \\ -1\end{bmatrix}$
- Categorical variable: dummy coding variables
- Counts of categories: bag of words

## Vector operations

### Vector addition

Vectors of the same size can be added. For instance, two vectors $a = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$ and $b =  \begin{bmatrix} -1 \\ 4 \\ 7 \end{bmatrix}$ can be added by simply adding their corresponding elements:

$$
a + b = \begin{bmatrix} 2-1 \\ 3+4 \\ 5+7 \end{bmatrix} = \begin{bmatrix} 1 \\ 7 \\ 12 \end{bmatrix}
$$

Vector subtraction works in the same way.

Note that vectors of different sizes cannot be added or subtracted.
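
The element-wise rule above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes vectors are stored as lists; the function names `add` and `subtract` are illustrative and not part of the original notes.

```python
def add(a, b):
    """Return the element-wise sum of two vectors of the same size."""
    if len(a) != len(b):
        raise ValueError("vectors of different sizes cannot be added")
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Return the element-wise difference a - b of two same-size vectors."""
    if len(a) != len(b):
        raise ValueError("vectors of different sizes cannot be subtracted")
    return [x - y for x, y in zip(a, b)]


a = [2, 3, 5]
b = [-1, 4, 7]

total = add(a, b)            # [1, 7, 12], as in the worked example
difference = subtract(a, b)  # [3, -1, -2]
print(total, difference)
```

Calling `add([1, 2], [1, 2, 3])` raises a `ValueError`, which mirrors the rule that vectors of different sizes cannot be combined.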

### Vector addition rules

The vector addition rules are very similar to the addition rules of scalars.

1. $a + b = b + a$
2. $a + (b + c) = (a + b) + c$
3. $a + \vec{0} = \vec{0} + a = a$. Here $\vec{0}$ is a vector of the same size as $a$ with all elements being $0$
4. $a - a = \vec{0}$

### Vector addition in vector space

Vectors can be represented as displacements in $n$-dimensional space. Two vectors can be added as the sum of their displacements.

For instance, in the following case, $a+b$ can be represented as 

<div>
<img src="./img/vector_addition_1.png" width="500"/>
</div>

### Vector transpose

A vector can be a column vector $a = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$ or it can be a row vector $b = [2 \ 3\ 5]$.

When we do not specify the orientation of a vector, we assume it is a column vector.

The operation that transforms a vector from a column vector to a row vector or vice versa is called **transpose**. 

$$
a^T = b
$$

Row vectors are usually denoted as a transposed vector  $a^T = [2 \ 3\ 5]$, because we assume vectors are column vectors. 

### Vector scalar multiplication

A vector can be multiplied by a scalar (a single number).

To do so, multiply every element in the vector by the scalar.

$$
\begin{align}
a &= 2 \qquad b = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}
\\
ab &= \begin{bmatrix} 2 \times 2 \\ 2 \times 3 \\ 2 \times 5 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \\ 10 \end{bmatrix}
\end{align}
$$

## Rules of vector scalar multiplication

For vectors $a$ and $b$, and scalars $\beta$ and $\gamma$, the rules of multiplication are similar to the multiplication rules of scalars:

1. $(\beta \gamma) a = \beta(\gamma a)$
2. $(\beta + \gamma) a = \beta a + \gamma a$
3. $\beta(a+b) = \beta a + \beta b$
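
As a quick numerical check, the three rules can be tested on small integer vectors. This is a sketch in plain Python; the helper names `scale` and `add` and the chosen values are illustrative assumptions, and a check on one example is of course not a proof.

```python
def scale(c, v):
    """Multiply every element of the vector v by the scalar c."""
    return [c * x for x in v]


def add(a, b):
    """Element-wise sum of two vectors of the same size."""
    return [x + y for x, y in zip(a, b)]


# The worked example: the scalar 2 times the vector (2, 3, 5)
scaled = scale(2, [2, 3, 5])  # [4, 6, 10]

# Integer values keep the comparisons exact
a = [1, -2, 4]
b = [2, 3, 5]
beta, gamma = 3, -2

rule1 = scale(beta * gamma, a) == scale(beta, scale(gamma, a))
rule2 = scale(beta + gamma, a) == add(scale(beta, a), scale(gamma, a))
rule3 = scale(beta, add(a, b)) == add(scale(beta, a), scale(beta, b))
print(scaled, rule1, rule2, rule3)
```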

### Vector inner product

Two vectors of the same size $n$ can be multiplied using the **inner product**.

The inner product of the $n$-vector $a$ and the $n$-vector $b$ is denoted $\langle a, b \rangle$ and is defined as

$$
\langle a, b \rangle = a^T b = a_1 b_1 + a_2 b_2 + a_3 b_3 + \dots + a_n b_n = \sum_{i=1}^n a_ib_i
$$

For instance, the inner product of $a = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $b = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$ is

$$
\langle a, b \rangle = a^T b = 1 \cdot 2 + 2 \cdot 3 + 3 \cdot 5 = 23
$$

The inner product of a vector with itself is $a^Ta = \sum_{i=1}^n a_i^2$
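
The sum-of-products definition translates directly into code. The sketch below uses plain Python lists; the function name `inner` is an illustrative choice.

```python
def inner(a, b):
    """Inner product of two vectors of the same size."""
    if len(a) != len(b):
        raise ValueError("inner product needs vectors of the same size")
    return sum(x * y for x, y in zip(a, b))


a = [1, 2, 3]
b = [2, 3, 5]

ab = inner(a, b)  # 1*2 + 2*3 + 3*5 = 23
aa = inner(a, a)  # 1 + 4 + 9 = 14, the sum of squared elements
print(ab, aa)
```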

### Vector norm

The **norm** of a vector, $||a||$, can be considered as its length in $n$-dimensional space. It is also called the **magnitude** of the vector. Do not confuse it with the size of the vector, which is the number of elements.

$$
||a|| = \sqrt{a_1^2 + a_2^2 + ... + a_n^2} = \sqrt{a^Ta}
$$

For instance, the norm of a vector $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ is the length of the vector, $||x|| = \sqrt{x_1^2 + x_2^2}$.

![vector_norm](./img/vector.png)

### Vector distance

The Euclidean distance between two vectors $a$ and $b$ of the same size $n$ is 

$$\text{dist}(a,b) = ||a-b|| = \sqrt{\sum_{i=1}^n (a_i - b_i)^2}$$

If we consider two $2$-dimensional vectors $a$ and $b$ as two points in $2$-dimensional space, the distance between them is the length of the displacement between the two points, $a-b$ or $b-a$.

![vector_distance](./img/vector_distance.png)
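
The norm and the Euclidean distance can be sketched with the standard `math` module. The function names and example vectors here are illustrative assumptions.

```python
import math


def norm(v):
    """Euclidean norm: square root of the sum of squared elements."""
    return math.sqrt(sum(x * x for x in v))


def distance(a, b):
    """Euclidean distance: the norm of the displacement a - b."""
    if len(a) != len(b):
        raise ValueError("distance needs vectors of the same size")
    return norm([x - y for x, y in zip(a, b)])


length = norm([3, 4])          # sqrt(9 + 16) = 5.0

a = [1, 2]
b = [4, 6]
d_ab = distance(a, b)          # sqrt(9 + 16) = 5.0
d_ba = distance(b, a)          # same value, the order does not matter
print(length, d_ab, d_ba)
```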

## Vector inner product and angle

The inner product of two $n$-vectors $a$ and $b$ can be expressed as the product of the norms of the two vectors, $||a|| \, ||b||$, and the cosine of the angle between them, $\angle(a,b) = \theta$.

$$
a^T b = ||a|| \, ||b|| \cos(\theta)
$$

The angle between two vectors describes the relationship between them, as it measures one type of similarity. The smaller the angle between two vectors, the more similar their directions. We can use the **cosine similarity** to describe this.

$$
\text{cos-similarity}(a, b) = \cos(\theta) = \frac{a^Tb}{||a|| \, ||b||}
$$


![cosine](./img/cosinesimilarity.png)

## Special relationship between vectors

Depending on the angle between two vectors $a$ and $b$, there are some special relationships between the two vectors:

1. Vectors with the same direction, $\theta=0^{\circ}$

   $\cos(\theta) = 1$, thus $a^Tb = ||a|| \, ||b||$

2. Vectors that are orthogonal to each other, $\theta = 90^{\circ}$

   $\cos(\theta) = 0$, thus $a^Tb = 0$

3. Vectors that are in opposite directions, $\theta = 180^{\circ}$

   $\cos(\theta) = -1$, thus $a^Tb = -||a|| \, ||b||$

4. Vectors that are either in the same direction or in opposite directions are called **linearly dependent**: there exists a constant $\beta$ such that $a+\beta b = \vec{0}$. Otherwise the two vectors are **linearly independent**.
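
The cosine similarity and the special cases listed above can be checked numerically. This is a sketch in plain Python; the function name and the example vectors are illustrative. Because of floating-point rounding, the results for the same and opposite directions are only approximately $1$ and $-1$.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors of the same size."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot / (norm_a * norm_b)


a = [1, 2]
same_direction = cosine_similarity(a, [2, 4])   # approximately 1
orthogonal = cosine_similarity(a, [-2, 1])      # inner product is 0, so 0
opposite = cosine_similarity(a, [-1, -2])       # approximately -1
print(same_direction, orthogonal, opposite)
```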

## Vectors applied to data

1. A linear regression line can be expressed as the inner product of the vector of independent variables and the vector of regression coefficients (the intercept and slopes).
   $$
   \begin{align}
   Y_i &= \beta_0 \cdot 1 + \beta_1 X_{1,i} + \beta_2 X_{2, i} + \dots + \beta_p X_{p, i}\\
   Y_i &= \vec{X_i} ^T \vec{\beta}  \qquad \text{where $\vec{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}$ and $\vec{X_i} = \begin{bmatrix} 1 \\ X_{1,i} \\ X_{2,i} \\ \vdots \\ X_{p,i} \end{bmatrix}$}
   \end{align}
   $$
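
The regression line as an inner product can be sketched as follows. The coefficient values and the function name `predict` are hypothetical, chosen only to illustrate the formula with $p = 2$ predictors.

```python
def predict(x_i, beta):
    """Value on the regression line for one observation: X_i^T beta.

    A leading 1 is prepended to the predictors so that beta[0]
    acts as the intercept.
    """
    row = [1] + list(x_i)
    if len(row) != len(beta):
        raise ValueError("need one coefficient per predictor plus an intercept")
    return sum(x * b for x, b in zip(row, beta))


beta = [1.0, 2.0, -0.5]   # hypothetical beta_0, beta_1, beta_2
x_i = [3.0, 4.0]          # hypothetical X_{1,i}, X_{2,i}

y_i = predict(x_i, beta)  # 1*1.0 + 3*2.0 + 4*(-0.5) = 5.0
print(y_i)
```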



## Vectors applied to data

2. The correlation coefficient between two variables $X$ and $Y$ can be expressed as the cosine similarity of the centered data vectors $X' = X-\bar{X}$ and $Y'= Y-\bar{Y}$.
   $$
   \begin{align}
   X = \begin{bmatrix} X_1 \\ X_2 \\ ... \\ X_n \end{bmatrix} \ \ \ Y = \begin{bmatrix} Y_1 \\ Y_2 \\... \\ Y_n \end{bmatrix} \ \ \ X' = \begin{bmatrix} X_1-\bar{X} \\ X_2-\bar{X} \\ ... \\ X_n-\bar{X} \end{bmatrix} \ \ \ Y' = \begin{bmatrix} Y_1-\bar{Y} \\ Y_2-\bar{Y} \\ ... \\ Y_n-\bar{Y} \end{bmatrix}
   \\
   r(X, Y) = \frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum(X_i-\bar{X})^2}\sqrt{\sum(Y_i-\bar{Y})^2}} = \frac{(X-\bar{X})^T(Y-\bar{Y})}{||X-\bar{X}|| ||Y-\bar{Y}||} = cos(\angle(X', Y'))
   \end{align}
   $$

3. When the centered vectors of two variables are orthogonal, with $\cos(\theta)=0$, the correlation between them is 0. In other words, two mean-centered variables are uncorrelated when their data vectors are **orthogonal**. Note that linear independence alone does not imply zero correlation.
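
The link between correlation and cosine similarity can be sketched in plain Python. The function names and the data are illustrative, and the sketch assumes that neither variable is constant (otherwise the centered vector has norm 0 and the ratio is undefined).

```python
import math


def center(v):
    """Subtract the sample mean from every element."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def correlation(x, y):
    """Correlation as the cosine similarity of the centered vectors."""
    xc, yc = center(x), center(y)
    dot = sum(a * b for a, b in zip(xc, yc))
    norm_x = math.sqrt(sum(a * a for a in xc))
    norm_y = math.sqrt(sum(b * b for b in yc))
    return dot / (norm_x * norm_y)


# y = 2x + 1 is an exact linear function of x, so r is approximately 1
r_linear = correlation([1, 2, 3, 4], [3, 5, 7, 9])

# The centered vectors (-1, 0, 1) and (1, -2, 1) are orthogonal, so r = 0
r_zero = correlation([-1, 0, 1], [1, -2, 1])
print(r_linear, r_zero)
```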

## Exercise

For two random variables $X$ and $Y$ with a sample of size $n=3$, we write the two samples as the vectors $X$ and $Y$:

$$
X=\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \ \ \ \ \ \ \ \ \ \ Y=\begin{bmatrix} -1 \\ -2 \\ 0 \end{bmatrix}
$$

1. Draw these two vectors in a 3-dimensional space
2. Compute the linear combination of $X$ and $Y$ given by $Z = 0.5 X + 1Y$
3. Draw the new vector $Z$ in the 3-dimensional space
4. Compute the norm (length) of these two vectors $||X||$ and $||Y||$
5. Compute the inner product between these two vectors $X^T Y$
6. Compute the sum of these two vectors and the displacement between these two vectors $X+Y$ and $X-Y$
7. Compute the Euclidean distance between these two vectors $||X-Y||$
8. Compute the Cosine similarity between these two vectors $cos(\angle{(X,Y)}) = \frac{X^TY}{||X||||Y||}$
9. Compute the centered vector $X' = X-\bar{X}$ and $Y'=Y-\bar{Y}$
10. Compute the correlation coefficient between $X$ and $Y$ as $r(X, Y) = \frac{X'^T Y'}{||X'||||Y'||}$


# Matrix


<div>
<img src="./img/matrix_movie.avif" width="700"/>
</div>


## Matrix

A **matrix** is an array of numbers, such as

$$
\begin{bmatrix} 
4 & -1 & 3 & 2 \\
3 & 9 & 1 & 4 \\
1 & 2 & 0 & 5
\end{bmatrix}
$$

We usually use a bold uppercase letter to denote a matrix, such as $\boldsymbol{A}$, and a lowercase letter with indices, $a_{ij}$, to denote the entries of the matrix.

$$
A = \begin{bmatrix} 
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{bmatrix}
$$

## Matrix Order

A matrix $\boldsymbol{A}$ with $U$ rows and $V$ columns is said to have **order** $U \times V$. 

An element $a_{ij}$ refers to the element in the $i$th row and $j$th column. 

This can be denoted as 

$$
\boldsymbol{A} = \{a_{ij}\}_{U \times V}
$$

## Matrix, vectors, and scalars

A matrix of size $U \times V$ consists of $U$ rows and $V$ columns. These rows can be considered as row vectors, and the columns can be considered as column vectors.

A $U$-vector can be considered a matrix of size $U \times 1$, and its transpose can be considered a matrix of size $1 \times U$.

A scalar can be considered as a matrix of size $1 \times 1$.

So, vectors and scalars can be considered as matrices.

$$
\boldsymbol{A}_{3 \times 2} = \begin{bmatrix} 
a_{11} & a_{12} \\
a_{21} & a_{22} \\
a_{31} & a_{32}
\end{bmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x_{3 \times 1} = \begin{bmatrix} 
x_{11} \\
x_{21} \\
x_{31}
\end{bmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ a_{1 \times 1} = [a_{11}]
$$

## Special Matrix

There is a list of special matrices that are important to know.

1. Square matrix: Square matrices have the same number of rows and columns.
   $$
   \boldsymbol{X}_{3 \times 3} = \begin{bmatrix} 
    a_{11} & a_{12} & a_{13} \\
    a_{21} & a_{22} & a_{23} \\
    a_{31} & a_{32} & a_{33} \end{bmatrix}
   $$
2. Symmetric matrix: Symmetric matrices are square matrices whose entries satisfy $x_{ij} = x_{ji}$
   $$
   \boldsymbol{X}_{3 \times 3} = \begin{bmatrix} 
    1 & 4 & 5 \\
    4 & 2 & 7 \\
    5 & 7 & 3 \end{bmatrix}
   $$
3. Null matrix: Null matrices $\boldsymbol{\emptyset}$ are matrices whose entries are all 0.
   $$
   \boldsymbol{\emptyset}_{3 \times 4} = \begin{bmatrix} 
    0 & 0 & 0 & 0 \\
    0 & 0 & 0 & 0\\
    0 & 0 & 0 & 0\end{bmatrix}
   $$


## Special Matrix
4. $1$ matrix: Matrices $\boldsymbol{E}$ are matrices whose entries are all 1.
   $$
   \boldsymbol{E}_{3 \times 4} = \begin{bmatrix} 
    1 & 1 & 1 & 1 \\
    1 & 1 & 1 & 1\\
    1 & 1 & 1 & 1\end{bmatrix}
   $$
5. Diagonal matrix: Diagonal matrices $\boldsymbol{D}$ are square matrices with 0 in all of the off-diagonal entries.
   $$
   \boldsymbol{D}_{3 \times 3} = \begin{bmatrix} 
    1 & 0 & 0\\
    0 & 4 & 0\\
    0 & 0 & 7\end{bmatrix}
   $$
6. Identity matrix: Identity matrices $\boldsymbol{I}$ are diagonal matrices whose diagonal entries are all 1.
   $$
   \boldsymbol{I}_{3 \times 3} = \begin{bmatrix} 
    1 & 0 & 0\\
    0 & 1 & 0\\
    0 & 0 & 1\end{bmatrix}
   $$

## Matrix linear independence

A matrix $\boldsymbol{A}_{U \times V}$ can be considered as $V$ columns, each of which is a $U$-vector: $\vec{a}_1$, $\vec{a}_2$, ..., $\vec{a}_v$

$$
\boldsymbol{A}_{U \times V} = \begin{bmatrix} 
    a_{11} & a_{12} & ... & a_{1v} \\
    a_{21} & a_{22} & ... & a_{2v} \\
    ... & ... & ... & ... \\
    a_{u1} & a_{u2} & ... & a_{uv} \end{bmatrix} = \begin{bmatrix} 
    \vec{a}_1 & \vec{a}_2 & ... & \vec{a}_v \end{bmatrix} \ \ \ \ \ \text{where} \ \ \ \vec{a_j} = \begin{bmatrix} 
    a_{1j} \\ a_{2j} \\ ... \\ a_{uj} \end{bmatrix}
   $$

The set of column vectors of the matrix is called **linearly dependent** if there exist constants $\beta_1$, $\beta_2$, ..., $\beta_v$, not all equal to 0, such that

$$
\beta_1 \vec{a_1} + \beta_2 \vec{a_2} + ... + \beta_v \vec{a_v} = \vec{0}
$$

In other words, at least one column vector in the matrix can be written as a linear combination of the other column vectors.

Otherwise, the column vectors are called **linearly independent**.

A set of $v$ linearly independent vectors is also called a **basis** of the space they span.

## Orthogonal and orthonormal matrix
For a matrix with linearly independent columns, if every pair of column vectors is orthogonal, $\vec{a_i}^T \vec{a_j}=0$ for $i \ne j$, then we call this matrix an orthogonal matrix. (Many texts reserve the term orthogonal matrix for a square matrix whose columns also have norm 1.)

If, in addition, the norm of every column vector equals 1, $||\vec{a_i}|| = 1$, then we call this matrix an **orthonormal matrix**, and its columns form an **orthonormal basis**.

For instance, the columns of the matrix $\boldsymbol{T}$ form an orthonormal basis.

$$
\boldsymbol{T} = \begin{bmatrix} 
1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\
1/\sqrt{2} & -1/\sqrt{2} & 0\\
-1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6}\end{bmatrix}
$$
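
One way to check that the columns of $\boldsymbol{T}$ are orthonormal is to compute all pairwise inner products of the columns: each column with itself should give 1, and each pair of different columns should give 0. The sketch below does this in plain Python, comparing with a small tolerance because the square roots are floating-point values; the variable names are illustrative.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

T = [
    [1 / s3, 1 / s3, 1 / s3],
    [1 / s2, -1 / s2, 0.0],
    [-1 / s6, -1 / s6, 2 / s6],
]

# zip(*T) iterates over the columns of a matrix stored as a list of rows
columns = [list(col) for col in zip(*T)]


def inner(a, b):
    """Inner product of two vectors of the same size."""
    return sum(x * y for x, y in zip(a, b))


# Entry (i, j) is the inner product of column i and column j
gram = [[inner(ci, cj) for cj in columns] for ci in columns]

is_orthonormal = all(
    math.isclose(gram[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
    for i in range(3)
    for j in range(3)
)
print(is_orthonormal)
```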

## Representing Data with Matrix

Matrices are commonly used to represent data. For instance

- A dataset with $p$ variables and $N$ samples will be represented as an $N \times p$ matrix

![matrix](./img/data_matrix.ppm)

### Image dataset

![matrix](./img/image_matrix.png)

### Correlation matrix


<div>
<img src="./img/correlation_matrix.png" width="700"/>
</div>


## Matrix Operations

The operations we applied to vectors can also be applied to matrices.

We will walk through them one by one:

### Matrix transpose

The **transpose** of a matrix interchanges its rows and columns. For instance

$$
\boldsymbol{A}_{3 \times 2} = \begin{bmatrix} 
1 & 2 \\
3 & 4 \\
5 & 6 \end{bmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \boldsymbol{A}^T_{2\times 3} = \begin{bmatrix} 
1 & 3 & 5 \\
2 & 4 & 6 \end{bmatrix}
$$

The transpose of a matrix of size $U \times V$ will have size $V \times U$

$$
\boldsymbol{B}= \boldsymbol{A}^T \ \ \ \ \text{then} \ \ \ \ \ b_{ij} = a_{ji}
$$

If $\boldsymbol{A}$ is symmetric, then $\boldsymbol{A}^T = \boldsymbol{A}$, and vice versa
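
A minimal sketch of the transpose, assuming a matrix is stored as a list of rows. The function name is illustrative, and the symmetric matrix is the one from the Special Matrix section.

```python
def transpose(M):
    """Interchange rows and columns: entry (i, j) becomes entry (j, i)."""
    return [list(col) for col in zip(*M)]


A = [[1, 2],
     [3, 4],
     [5, 6]]                 # 3 x 2

A_T = transpose(A)           # 2 x 3: [[1, 3, 5], [2, 4, 6]]

S = [[1, 4, 5],
     [4, 2, 7],
     [5, 7, 3]]              # symmetric

is_symmetric = transpose(S) == S
print(A_T, is_symmetric)
```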

### Matrix trace

For a square matrix $\boldsymbol{A}_{U \times U}$, the **trace** of the matrix is the sum of its diagonal entries.

$$
tr(\boldsymbol{A}_{U \times U}) = \sum_{i=1}^U a_{ii}
$$

$$
tr(\begin{bmatrix} 
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9 \end{bmatrix}) = 1+5+9 = 15
$$
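
The trace is a one-line sum over the diagonal. A short sketch, with an illustrative function name:

```python
def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    if any(len(row) != len(M) for row in M):
        raise ValueError("the trace is only defined for square matrices")
    return sum(M[i][i] for i in range(len(M)))


t = trace([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])   # 1 + 5 + 9 = 15
print(t)
```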

## Matrix Addition

Matrix **addition** can be applied to matrices with the same order/size. If two matrices have different orders, they cannot be added.

To add two matrices, simply add the entries at the same position in both matrices.

$$
\boldsymbol{A}_{U \times V} + \boldsymbol{B}_{U \times V} = \boldsymbol{C}_{U \times V} \ \ \ \ \ \ \ a_{ij} + b_{ij} = c_{ij}
$$

$$
\begin{bmatrix} 
1 & 2 \\
3 & 4 \\
5 & 6 \end{bmatrix} + \begin{bmatrix} 
-1 & 5 \\
4 & 2 \\
9 & 0 \end{bmatrix} = \begin{bmatrix} 
1-1 & 2+5 \\
3+4 & 4+2 \\
5+9 & 6+0 \end{bmatrix} = \begin{bmatrix} 
0 & 7 \\
7 & 6 \\
14 & 6 \end{bmatrix}
$$
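
The same computation as a short sketch in plain Python, with matrices stored as lists of rows. The function name `mat_add` is illustrative.

```python
def mat_add(A, B):
    """Entry-wise sum of two matrices of the same order."""
    same_order = len(A) == len(B) and all(
        len(ra) == len(rb) for ra, rb in zip(A, B)
    )
    if not same_order:
        raise ValueError("matrices of different orders cannot be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[1, 2], [3, 4], [5, 6]]
B = [[-1, 5], [4, 2], [9, 0]]

C = mat_add(A, B)   # [[0, 7], [7, 6], [14, 6]]
print(C)
```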


## Matrix addition rules

For matrix addition, we will have 

1. $\boldsymbol{A} + \boldsymbol{B} = \boldsymbol{B} + \boldsymbol{A}$
2. $\boldsymbol{A} + (\boldsymbol{B} + \boldsymbol{C}) = (\boldsymbol{A} + \boldsymbol{B}) + \boldsymbol{C}$
3. $(\boldsymbol{A}^T)^T = \boldsymbol{A}$, $(\boldsymbol{A}+\boldsymbol{B})^T = \boldsymbol{A}^T + \boldsymbol{B}^T$

### Matrix scalar multiplication

Similar to vector scalar multiplication, matrix scalar multiplication multiplies each entry in the matrix by the scalar.

$$
c \boldsymbol{A}_{U \times V} = c\begin{bmatrix} 
a_{11} & ... & a_{1v} \\
... & ... & ... \\
a_{u1} & ... & a_{uv} \\ \end{bmatrix} = \begin{bmatrix} 
ca_{11} & ... & ca_{1v} \\
... & ... & ... \\
ca_{u1} & ... & ca_{uv} \\ \end{bmatrix}
$$

## Matrix multiplication

Two matrices $\boldsymbol{A}_{U \times V}$ and $\boldsymbol{B}_{V \times W}$ can be multiplied only when the number of columns of the left matrix equals the number of rows of the right matrix; otherwise they cannot be multiplied.

The result of the matrix multiplication (if multipliable) has as many rows as the left matrix and as many columns as the right matrix.

$$
\boldsymbol{A}_{U \times V} \boldsymbol{B}_{V \times W} = \boldsymbol{C}_{U \times W}
$$

$$
\boldsymbol{A}_{2 \times 3} \boldsymbol{B}_{3 \times 4} = \boldsymbol{C}_{2 \times 4} \ \ \ \ \ \ \ \text{Multipliable}
$$
$$
\boldsymbol{A}_{3 \times 5} \boldsymbol{B}_{3 \times 3} \ \ \ \ \ \ \ \text{Not Multipliable}
$$
$$
\boldsymbol{A}_{1 \times 10} \boldsymbol{B}_{10 \times 1} = \boldsymbol{C}_{1 \times 1} \ \ \ \ \ \ \ \text{Multipliable}
$$

## Matrix multiplication

Each entry $c_{ij}$ of the product matrix $\boldsymbol{C} = \boldsymbol{A}  \boldsymbol{B}$ is the inner product of the $i$th row of the left matrix and the $j$th column of the right matrix.

$$
c_{ij} = \vec{A_i^T} \vec{B_j} = \sum_{k=1}^V a_{ik}b_{kj}
$$

![multiply](./img/multiply_matrices.gif)
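
The row-by-column rule can be sketched as a nested loop over the entries of the result. This is a plain Python illustration with matrices stored as lists of rows; the function name and the example matrices are assumptions made for this sketch.

```python
def matmul(A, B):
    """Multiply a U x V matrix A by a V x W matrix B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of the left matrix must equal rows of the right matrix")
    # c_ij is the inner product of row i of A and column j of B
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]        # 3 x 2

C = matmul(A, B)     # 2 x 2: [[7, -1], [16, -1]]
print(C)
```

Calling `matmul(A, A)` raises a `ValueError`, because a $2 \times 3$ matrix cannot be multiplied by another $2 \times 3$ matrix.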

## Matrix multiplication practice

For the matrix multiplication $\boldsymbol{C} = \boldsymbol{A} \boldsymbol{B}$:

1. Is this multipliable?
2. What is the order/size of $\boldsymbol{C}$ ?
3. Calculate $\boldsymbol{C}$.

$$
\boldsymbol{A}_{3 \times 2} = \begin{bmatrix} 
1 & 4 \\
3 & 1 \\
-1 & 0 \end{bmatrix} \ \ \ \  \ \ \ \ \ \ \ \ \boldsymbol{B}_{2 \times 4} = \begin{bmatrix} 
-1 & 2 & 0 & 1\\
1 & 0 & 1 & 4 \end{bmatrix} 
$$

### Rules for matrix multiplication
1. $\boldsymbol{A}\boldsymbol{B} \neq \boldsymbol{B}\boldsymbol{A}$ in general
2. $\boldsymbol{A}(\boldsymbol{B}+ \boldsymbol{C}) = \boldsymbol{A}\boldsymbol{B}+\boldsymbol{A}\boldsymbol{C}$
3. $\boldsymbol{A}(\boldsymbol{B}\boldsymbol{C})=(\boldsymbol{A}\boldsymbol{B})\boldsymbol{C}$
4. $(\boldsymbol{A}\boldsymbol{B})^T = \boldsymbol{B}^T \boldsymbol{A}^T$,  $(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C})^T = \boldsymbol{C}^T \boldsymbol{B}^T\boldsymbol{A}^T$, ...
5. $\boldsymbol{A}\boldsymbol{I} = \boldsymbol{A}$ if multipliable, and similarly $\boldsymbol{I}\boldsymbol{A} = \boldsymbol{A}$ if multipliable
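
These rules can be spot-checked on small integer matrices. The sketch below is plain Python with illustrative helper names and example matrices; one example does not prove a rule, but it shows concretely that the order of multiplication matters.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(M):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*M)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
I = [[1, 0], [0, 1]]

AB = matmul(A, B)   # [[2, 1], [4, 3]]
BA = matmul(B, A)   # [[3, 4], [1, 2]]

not_commutative = AB != BA
associative = matmul(A, matmul(B, C)) == matmul(AB, C)
transpose_rule = transpose(AB) == matmul(transpose(B), transpose(A))
identity_rule = matmul(A, I) == A and matmul(I, A) == A
print(not_commutative, associative, transpose_rule, identity_rule)
```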

## Matrices as linear transformation

Any $U \times V$ matrix $\boldsymbol{A}$ can be considered as linearly transforming a $V \times 1$ vector $x$ to another $U \times 1$ vector $y$.

$$
y_{U \times 1} = \boldsymbol{A}_{U \times V} x_{V\times 1}
$$

For instance, suppose we want to transform a vector of samples of the random variable $X$, $x_{N \times 1}$, into the scalar sample mean $\bar{X}_{1 \times 1}$. We can do the following.

$$
[\frac{1}{N}\ ...\ \frac{1}{N}]_{1 \times N}
\begin{bmatrix} 
x_1 \\
... \\
x_N \end{bmatrix}_{N \times 1} = \frac{1}{N}\sum x_i = \bar{X}_{1\times 1}
$$

In this case, $[\frac{1}{N}\ ...\ \frac{1}{N}]_{1 \times N}$ is the matrix that applies the linear transformation from the vector $x_{N\times 1}$ to the scalar $\bar{X}_{1\times 1}$.
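
The sample mean as a $1 \times N$ matrix times an $N \times 1$ vector can be sketched as follows. The data values and variable names are illustrative.

```python
x = [2.0, 4.0, 6.0, 8.0]          # hypothetical sample of size N = 4
N = len(x)

# The 1 x N "averaging" matrix, stored as a single row
averaging_row = [1 / N] * N

# A 1 x N matrix times an N x 1 vector is a single inner product
mean = sum(w * xi for w, xi in zip(averaging_row, x))   # 5.0
print(mean)
```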

## Quadratic Forms

Given a symmetric matrix $\boldsymbol{A}_{U \times U}$ and a $U$ vector $x$, a **quadratic form** is defined as

$$
\begin{align}
x^T \boldsymbol{A} x &= \sum_{i=1}^{U}\sum_{j=1}^U a_{ij} x_ix_j
\\
&= a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 + ... + a_{UU}x_U^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + ... + 2a_{(U-1)U}x_{U-1}x_U
\end{align}
$$

If $x^T \boldsymbol{A} x > 0$ for any $x$ except the 0 vector, $\boldsymbol{A}$ is said to be **positive definite**.

If $x^T \boldsymbol{A} x \ge 0$ for any $x$, $\boldsymbol{A}$ is said to be **positive semi-definite**.
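
The double sum in the definition can be evaluated directly. The sketch below uses a hypothetical symmetric $2 \times 2$ matrix; for this particular matrix the quadratic form equals $(x_1 + x_2)^2 + x_1^2 + 2x_2^2$, so it is positive for every nonzero $x$.

```python
def quadratic_form(A, x):
    """Evaluate x^T A x as the double sum of a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


A = [[2, 1],
     [1, 3]]   # hypothetical symmetric matrix

q = quadratic_form(A, [1, -1])
# Expanded form: a11*x1^2 + a22*x2^2 + 2*a12*x1*x2 = 2 + 3 - 2 = 3

# A few nonzero test vectors; each value is positive for this matrix
samples = [quadratic_form(A, x) for x in ([1, 0], [0, 1], [1, 1], [-2, 1])]
print(q, samples)
```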

## Matrix Inverse

The matrix inverse is one of the most important operations on a matrix.

What is an inverse?

The inverse of a nonzero number $x$ is denoted as $x^{-1}$, which is simply $x^{-1} = \frac{1}{x}$, so that $xx^{-1}=1$.

What is the inverse $\boldsymbol{A}^{-1}$ of a matrix $\boldsymbol{A}$? It must satisfy $\boldsymbol{A}^{-1}\boldsymbol{A}=\boldsymbol{I}$ and $\boldsymbol{A}\boldsymbol{A}^{-1}=\boldsymbol{I}$, where $\boldsymbol{I}$ is the identity matrix. Note that non-square matrices do not have an inverse.

For instance

$$
\boldsymbol{A} = \begin{bmatrix} 
1 & 3 \\
2 & 1 \end{bmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \boldsymbol{A}^{-1} = \begin{bmatrix} 
-1/5 & 3/5 \\
2/5 & -1/5 \end{bmatrix}
$$

Now verify that $\boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{I}$ and $\boldsymbol{A}^{-1}\boldsymbol{A} = \boldsymbol{I}$.

## Matrix Inverse

Sometimes, $\boldsymbol{A}^{-1}$ may not exist. When this is the case, the matrix $\boldsymbol{A}$ is said to be **singular** or **non-invertible**.

If $\boldsymbol{A}^{-1}$ exists, then the matrix $\boldsymbol{A}$ is said to be **invertible** or **non-singular**.

How do we determine if a matrix is invertible or not?

We need to calculate the **determinant** of the matrix, $det(\boldsymbol{A})$ or $|\boldsymbol{A}|$.

If $|\boldsymbol{A}| = 0$ then $\boldsymbol{A}$ is non-invertible, and if $|\boldsymbol{A}| \neq 0$ then $\boldsymbol{A}$ is invertible.

## Matrix determinant

Each square matrix $\boldsymbol{A}_{U \times U}$ has a determinant $det(\boldsymbol{A})$.

How do we calculate the determinant?

It is simple for a $2 \times 2$ matrix. 

$$
\boldsymbol{A} = \begin{bmatrix} 
a & b \\
c & d \end{bmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ det(\boldsymbol{A}) = ad - bc
$$

Now compute the determinant of the following matrix, and determine if this matrix is invertible or not

$$
\boldsymbol{A} = \begin{bmatrix} 
1 & -2 \\
2 & -4 \end{bmatrix} 
$$

## Matrix determinant for a 3x3 matrix

For a 3x3 matrix, it is a little bit more complex. 


$$
\boldsymbol{A} = \begin{bmatrix} 
a & b & c \\
d & e & f \\
g & h & i
\end{bmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ det(\boldsymbol{A}) = (aei+bfg+cdh) - (ceg+afh+bdi)
$$
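
The $2 \times 2$ and $3 \times 3$ determinant formulas can be sketched in plain Python. The function names and the example matrices are illustrative. The exact comparison with 0 is reasonable here because the entries are integers; with floating-point entries a tolerance would be needed.

```python
def det_2x2(M):
    """Determinant of [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c


def det_3x3(M):
    """Determinant of a 3 x 3 matrix: (aei + bfg + cdh) - (ceg + afh + bdi)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return (a * e * i + b * f * g + c * d * h) - (c * e * g + a * f * h + b * d * i)


P = [[3, 1],
     [2, 4]]
Q = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 1]]
R = [[1, 2],
     [2, 4]]      # second row is twice the first row

det_P = det_2x2(P)   # 3*4 - 1*2 = 10, so P is invertible
det_Q = det_3x3(Q)   # (6 + 0 + 1) - (0 + 4 + 0) = 3, so Q is invertible
det_R = det_2x2(R)   # 1*4 - 2*2 = 0, so R is singular
print(det_P, det_Q, det_R)
```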

Now compute the determinant of the following matrix, and determine if this matrix is invertible or not

$$
\boldsymbol{A} = \begin{bmatrix} 
1 & -2 & 0 \\
2 & 1 & -2\\
0 & -2 & 1\end{bmatrix} 
$$

## Finding the inverse of a 2x2 matrix

For a 2x2 matrix $\boldsymbol{A}$, its inverse $\boldsymbol{A}^{-1}$ is easy to find, provided $det(\boldsymbol{A}) \neq 0$:

$$
\boldsymbol{A} = \begin{bmatrix} 
a & b \\
c & d \end{bmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \boldsymbol{A}^{-1} = \frac{1}{det(\boldsymbol{A})}\begin{bmatrix} 
d & -b \\
-c & a \end{bmatrix} = \frac{1}{ad-bc}\begin{bmatrix} 
d & -b \\
-c & a \end{bmatrix}
$$
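
The $2 \times 2$ formula (swap $a$ and $d$, negate $b$ and $c$, divide by the determinant) can be sketched with exact fractions from the standard library. The function name is illustrative and the sketch assumes integer (or `Fraction`) entries so that the arithmetic stays exact.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] as (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is 0)")
    return [
        [Fraction(d, det), Fraction(-b, det)],
        [Fraction(-c, det), Fraction(a, det)],
    ]


A = [[1, 3],
     [2, 1]]

A_inv = inverse_2x2(A)   # [[-1/5, 3/5], [2/5, -1/5]]
print(A_inv)
```

The result matches the inverse given earlier for this matrix. A singular input such as `[[1, 2], [2, 4]]` raises a `ValueError`.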


## Finding the inverse of a matrix using minors, cofactors, and the adjugate

The first step in finding the inverse of a matrix is to find its **minor matrix** $\boldsymbol{M}$.

How?

For each entry $a_{ij}$ in the matrix $\boldsymbol{A}$, the minor for this entry is the determinant of the matrix after removing the $i$th row and the $j$th column. 

![matrix_inverse](./img/matrix_minors1.png)

## Cofactor matrix

After finding the minor matrix of $\boldsymbol{A}$, the next step is to find its cofactor matrix $\boldsymbol{C}$.

The cofactor matrix is obtained by changing the signs of the entries in the minor matrix: each entry $m_{ij}$ of $\boldsymbol{M}$ is multiplied by $(-1)^{i+j}$, such that $c_{ij} = (-1)^{i+j}m_{ij}$.

You can also use the checkerboard pattern to determine the signs of the entries in the cofactor matrix.

![matrix_inverse](./img/matrix_cofactor.svg)

## Adjugate matrix and the inverse matrix

The adjugate matrix is just the transpose of the cofactor matrix, $\boldsymbol{C}^T$.

The last step is to multiply the adjugate matrix by the reciprocal of the determinant of the matrix, $\frac{1}{det(\boldsymbol{A})}$.

$$
\boldsymbol{A}^{-1} = \frac{1}{det(\boldsymbol{A})} \boldsymbol{C}^T
$$
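
The three steps (minors, cofactors, adjugate) can be sketched in plain Python. This is an illustration rather than an efficient method: the function names are assumptions, the determinant is computed recursively by expansion along the first row, and integer entries are assumed so that `Fraction` keeps the result exact. The example matrix is hypothetical.

```python
from fractions import Fraction


def submatrix(A, i, j):
    """Copy of A with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1  # empty matrix: base case of the recursion
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j)) for j in range(len(A)))


def inverse(A):
    """Inverse via the minor matrix, the cofactor matrix, and the adjugate."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular (determinant is 0)")
    # Step 1: minor m_ij is the determinant after removing row i and column j
    minors = [[det(submatrix(A, i, j)) for j in range(n)] for i in range(n)]
    # Step 2: cofactor c_ij = (-1)^(i+j) * m_ij (checkerboard of signs)
    cofactors = [[(-1) ** (i + j) * minors[i][j] for j in range(n)] for i in range(n)]
    # Step 3: the adjugate is the transpose of the cofactor matrix
    adjugate = [[cofactors[j][i] for j in range(n)] for i in range(n)]
    # Step 4: divide every entry by the determinant
    return [[Fraction(entry, d) for entry in row] for row in adjugate]


A = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 1]]

A_inv = inverse(A)
for row in A_inv:
    print(row)
```

Multiplying `A` by `A_inv` gives the identity matrix, which is a convenient way to check the result.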

## Matrix inverse practice

Compute the inverse of the following matrix 

$$
\boldsymbol{A} = \begin{bmatrix} 
1 & -2 & 0 \\
2 & 1 & -2\\
0 & -2 & 1\end{bmatrix} 
$$

## Rules of matrix inverse

1. If $\boldsymbol{A}$ is symmetric and invertible, then $\boldsymbol{A}^{-1}$ is also symmetric.
2. The inverse of a matrix transpose is the transpose of the inverse: $(\boldsymbol{A}^T)^{-1} = (\boldsymbol{A}^{-1})^T$
3. The inverse of a product of matrices is the product of the inverses in reverse order.
   $$
   (\boldsymbol{A}\boldsymbol{B})^{-1} = \boldsymbol{B}^{-1}\boldsymbol{A}^{-1}
   $$
   $$
   (\boldsymbol{A}\boldsymbol{B}\boldsymbol{C})^{-1} = \boldsymbol{C}^{-1}\boldsymbol{B}^{-1}\boldsymbol{A}^{-1}
   $$
4. The inverse of a scalar times a matrix is the reciprocal of the scalar times the matrix inverse (for $c \neq 0$)
   $$
   (c\boldsymbol{A})^{-1} = \frac{1}{c}\boldsymbol{A}^{-1}
   $$
5. The inverse of a diagonal matrix with nonzero diagonal entries is also a diagonal matrix, with each diagonal entry replaced by its reciprocal.
   $$
    (\begin{bmatrix} 
a_{1} & ... & 0 \\
... & ... & ...\\
0 & ... & a_{u}\end{bmatrix})^{-1} = \begin{bmatrix} 
1/a_{1} & ... & 0 \\
... & ... & ...\\
0 & ... & 1/a_{u}\end{bmatrix}
   $$
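
Some of these rules can be spot-checked on $2 \times 2$ examples using exact fractions. The helper names and the matrices below are illustrative assumptions, and a check on one example is not a proof.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix with integer or Fraction entries."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[1, 3], [2, 1]]
B = [[2, 1], [1, 1]]

# Rule 3: (AB)^-1 equals B^-1 A^-1 (note the reversed order)
product_rule = inverse_2x2(matmul(A, B)) == matmul(inverse_2x2(B), inverse_2x2(A))

# Rule 4: (cA)^-1 equals (1/c) A^-1, here with c = 2
c = 2
cA = [[c * entry for entry in row] for row in A]
scalar_rule = inverse_2x2(cA) == [[entry / c for entry in row] for row in inverse_2x2(A)]

# Rule 5: the inverse of a diagonal matrix holds the reciprocals on the diagonal
D_inv = inverse_2x2([[2, 0], [0, 5]])   # [[1/2, 0], [0, 1/5]]
print(product_rule, scalar_rule, D_inv)
```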