**CS596 - Machine Learning**
<br>
Date: **2 September 2020**


Title: **Lecture 2: Appendix A**
<br>
Speaker: **Dr. Shota Tsiskaridze**
<br>
Teaching Assistant: **Levan Sanadiradze**

<h1 align="center">Appendix A</h1>

<h3 align="center">System of Linear Equations</h3>

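- In **Linear Algebra** we typically use **different notation** from the one in the lecture: a system is written in terms of $A$, $\mathbf{x}$ and $\mathbf{b}$ rather than $X$, $\boldsymbol{\theta}$ and $\mathbf{y}$.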


- A general system of $m$ **linear equations** with $n$ **unknowns** can be written as:
  
$$
\begin{matrix}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1\\ 
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2\\  
\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m
\end{matrix}
$$

  where $x_j$ are the unknowns, $a_{ij}$ are the coefficients of the system, and $b_i$ are the constant terms.
<br>

- We can write this system of linear equations in the equivalent matrix form:

  $$A \mathbf{x} = \mathbf{b},$$

  where $A = 
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\ 
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} &\cdots & a_{mn}\\
\end{bmatrix} \in \mathbb{R}^{m \times n}$, 
$\mathbf{x} = \begin{bmatrix}
x_{1}\\ 
x_{2}\\ 
\vdots\\
x_{n}
\end{bmatrix}  \in \mathbb{R}^{n}$ and $\mathbf{b} = \begin{bmatrix}
b_{1}\\ 
b_{2}\\ 
\vdots\\
b_{m}
\end{bmatrix}  \in \mathbb{R}^{m}$.
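As a minimal sketch (assuming NumPy, which these notes do not themselves use), the matrix form maps directly onto arrays: `A` holds the coefficients $a_{ij}$, `b` the constant terms $b_i$, and a candidate vector `x` solves the system exactly when `A @ x` equals `b`. The $3 \times 3$ system below is a hypothetical example.

```python
import numpy as np

# Hypothetical system:
#   2x1 +  x2 -  x3 =   8
#  -3x1 -  x2 + 2x3 = -11
#  -2x1 +  x2 + 2x3 =  -3
A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])   # coefficients a_ij, shape (m, n)
b = np.array([8.0, -11.0, -3.0])     # constant terms b_i, shape (m,)

m, n = A.shape
print(m, n)                          # 3 3

# A candidate x solves the system iff A @ x reproduces b.
x = np.array([2.0, 3.0, -1.0])
print(np.allclose(A @ x, b))         # True
```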

<h3 align="center">Solving a linear system</h3>

- There are **several algorithms** for solving a system of linear equations:
  - **Elimination of variables**: the simplest method; one equation is solved for one variable, which is then substituted into the remaining equations, repeating until all variables are eliminated.

  - **Gaussian elimination**: the augmented matrix $[A \mid \mathbf{b}]$ is modified using elementary row operations until it reaches **Row Echelon Form** (or **reduced Row Echelon Form** in the Gauss–Jordan variant):
    - **Type 1:** Swap the positions of two rows;
    - **Type 2:** Multiply a row by a nonzero scalar;
    - **Type 3:** Add to one row a scalar multiple of another.

  - **Cramer's rule**: an explicit formula for the solution when $A$ is square with $\det A \neq 0$, each variable being a ratio of two determinants, $x_j = \det A_j / \det A$, where $A_j$ is $A$ with its $j$-th column replaced by $\mathbf{b}$;

  - **Matrix solution**: if the matrix $A$ is square and has full rank, then the system has a unique solution given by $\mathbf{x} = A^{-1}\mathbf{b}$.
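The methods above can be sketched in NumPy (an assumption; `gaussian_elimination` and `cramer` are hypothetical helper names, not library functions). The elimination applies the three row operation types to the augmented matrix $[A \mid \mathbf{b}]$, with partial pivoting added for numerical stability, followed by back substitution; every result is compared with `np.linalg.solve`.

```python
import numpy as np


def gaussian_elimination(A, b):
    """Solve A x = b (A square, nonsingular) by Gaussian elimination
    with partial pivoting, followed by back substitution."""
    # Augmented matrix [A | b] as floats.
    M = np.column_stack([np.asarray(A, dtype=float), np.asarray(b, dtype=float)])
    n = M.shape[0]
    for k in range(n):
        # Type 1: swap row k with the row holding the largest |entry| in column k.
        p = k + int(np.argmax(np.abs(M[k:, k])))
        if np.isclose(M[p, k], 0.0):
            raise ValueError("matrix is singular")
        M[[k, p]] = M[[p, k]]
        # Type 2: scale the pivot row so that the pivot equals 1.
        M[k] = M[k] / M[k, k]
        # Type 3: subtract multiples of the pivot row to zero out column k below it.
        for i in range(k + 1, n):
            M[i] = M[i] - M[i, k] * M[k]
    # M is now in row echelon form; back substitution recovers x.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = M[i, -1] - M[i, i + 1:n] @ x[i + 1:]
    return x


def cramer(A, b):
    """Cramer's rule: x_j = det(A_j) / det(A), with A_j = A whose column j is b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    if np.isclose(det_A, 0.0):
        raise ValueError("det(A) = 0, Cramer's rule does not apply")
    x = np.empty(A.shape[1])
    for j in range(A.shape[1]):
        A_j = A.copy()
        A_j[:, j] = b
        x[j] = np.linalg.det(A_j) / det_A
    return x


A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

x_gauss = gaussian_elimination(A, b)
x_cramer = cramer(A, b)
x_inv = np.linalg.inv(A) @ b        # "matrix solution" x = A^{-1} b
x_solve = np.linalg.solve(A, b)     # NumPy's LU-based solver

# All four approaches agree with the exact solution x = (2, 3, -1).
print(all(np.allclose(x, [2.0, 3.0, -1.0]) for x in (x_gauss, x_cramer, x_inv, x_solve)))  # True
```

In practice a library solver such as `np.linalg.solve` is preferred over forming $A^{-1}$ explicitly or over Cramer's rule, which needs $n + 1$ determinants.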

<h3 align="center">Column Space</h3>

- Let $A$ be an $m \times n$ matrix, with column vectors $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_n$, where $\mathbf{v}_i \in \mathbb{R}^m$.


- The **set** of **all possible linear combinations** of the column vectors $\mathbf{v}_1, \cdots, \mathbf{v}_n$ is called the **column space** $C(A)$, i.e. $C(A)$ consists of all vectors of the form

  $$\mathbf{v} = \alpha_1 \mathbf{v}_1 + \cdots + \alpha_n \mathbf{v}_n,$$

  where $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ are arbitrary scalars.


- Any **linear combination of the column vectors** can be written as the **product of $A$** with a **column vector**:

$$\mathbf{v} = 
\alpha_1
\begin{bmatrix}
a_{11}\\ 
\vdots\\
a_{m1}
\end{bmatrix}
+ \cdots + \alpha_n
\begin{bmatrix}
a_{1n}\\ 
\vdots\\
a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
\alpha_1 a_{11} + \cdots + \alpha_n a_{1n}\\
\alpha_1 a_{21} + \cdots + \alpha_n a_{2n}\\
\vdots \\
\alpha_1 a_{m1} + \cdots + \alpha_n a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & \cdots & a_{1n}\\
\vdots & \ddots & \vdots\\
a_{m1} & \cdots & a_{mn}\\
\end{bmatrix}
\begin{bmatrix}
\alpha_{1}\\ 
\vdots\\
\alpha_{n}
\end{bmatrix}
= A 
\begin{bmatrix}
\alpha_{1}\\ 
\vdots\\
\alpha_{n}
\end{bmatrix}
$$ 

- Therefore, $C(A) = \{A\boldsymbol{\alpha} : \boldsymbol{\alpha} \in \mathbb{R}^n\}$, i.e. the **column space** of $A$ is the **range** of the matrix **transformation** $\mathbf{x} \mapsto A\mathbf{x}$.
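A short NumPy sketch (assumed library; `in_column_space` is a hypothetical helper) of both facts: `A @ alpha` equals the explicit linear combination of the columns, and $\mathbf{b} \in C(A)$ exactly when appending $\mathbf{b}$ as an extra column does not increase the rank (the Rouché–Capelli criterion). The rank is computed numerically, so the test is only as reliable as NumPy's default tolerance.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # 3 x 2 matrix, columns v1, v2 in R^3
alpha = np.array([2.0, -1.0])

# A @ alpha is the linear combination alpha_1 * v1 + alpha_2 * v2.
combo = alpha[0] * A[:, 0] + alpha[1] * A[:, 1]
print(np.allclose(A @ alpha, combo))                     # True


def in_column_space(A, b):
    """b is in C(A) iff appending b as a column leaves the rank unchanged."""
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)


print(in_column_space(A, np.array([2.0, -1.0, 1.0])))    # True: equals A @ [2, -1]
print(in_column_space(A, np.array([1.0, 1.0, 0.0])))     # False: no alpha works
```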

<h3 align="center">Projection onto Column Space</h3>

- Let's consider a **system of linear equations** in the matrix form $A \mathbf{x} = \mathbf{b}$.


- If $\mathbf{b} \notin C(A)$, then the system has no solution, since every product $A\mathbf{x}$ lies in $C(A)$.


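- We can find an **approximate solution** $\hat{\mathbf{x}}$ by projecting $\mathbf{b}$ onto $C(A)$: then $A\hat{\mathbf{x}}$ is the vector of $C(A)$ closest to $\mathbf{b}$, and the residual $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to every column of $A$, i.e. $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$.


- This condition is exactly what we get by multiplying both sides of $A\mathbf{x} = \mathbf{b}$ by $A^T$: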

  $$A^TA \mathbf{x} = A^T\mathbf{b},$$

  i.e. we obtain the system of **Normal Equations**.
  

- If $A^TA$ is invertible, then we can **find the solution analytically**:

  $$\hat{\mathbf{x}} = (A^TA)^{-1}A^T \mathbf{b} \equiv A^{+}\mathbf{b} .$$


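- $A^{+} = (A^TA)^{-1}A^T$ is called the (Moore–Penrose) **Pseudoinverse** of the matrix $A$; this formula is valid when the columns of $A$ are linearly independent, i.e. when $A^TA$ is invertible.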
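A minimal sketch of the projection, assuming NumPy and a hypothetical inconsistent $4 \times 2$ system: it solves the Normal Equations, checks the result against `np.linalg.pinv` and `np.linalg.lstsq`, and verifies that the residual is orthogonal to the columns of $A$.

```python
import numpy as np

# Overdetermined, inconsistent system: 4 equations, 2 unknowns (made-up data).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal Equations: solve (A^T A) x = A^T b without forming the inverse explicitly.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                                                      # [0.9 0.9]

# Same result through the pseudoinverse A^+ and NumPy's least-squares solver.
x_pinv = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_pinv), np.allclose(x_hat, x_lstsq))    # True True

# A @ x_hat is the projection of b onto C(A): the residual is orthogonal to C(A).
residual = b - A @ x_hat
print(np.allclose(A.T @ residual, 0.0))                           # True
```

Solving $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$ directly is fine for small, well-conditioned problems; `lstsq` and `pinv` work through the SVD and also cope with rank-deficient $A$.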

<h3 align="center">Invertibility of $A^TA$</h3>

- When is $A^TA$ **not** invertible?
  
  
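- $A^T A$ is not invertible if and only if the **columns** of $A$ are **linearly dependent**.


- For example, when we have **too many features** compared to the number of equations (examples), $m < n$: then $\operatorname{rank}(A) \le m < n$, so the $n$ columns of $A$ must be linearly dependent.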


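- **Solution**:
  - Remove the linear dependency (e.g. drop redundant features that are linear combinations of other features);
  - Delete some features (reduce $n$);
  - Use **regularization**.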
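A small NumPy illustration (assumed library, hypothetical matrices) of both causes of singularity: a $3 \times 5$ matrix ($m < n$) and a matrix whose third column is twice its first. `np.linalg.matrix_rank` reports a rank below the number of columns, and dropping the redundant column restores full rank.

```python
import numpy as np

# Too many features: m = 3 rows (examples), n = 5 columns (features).
A = np.array([[1.0, 0.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 0.0, 3.0, 1.0],
              [0.0, 0.0, 1.0, 1.0, 1.0]])
G = A.T @ A                                          # 5 x 5
print(np.linalg.matrix_rank(G))                      # 3 < 5, so A^T A is singular

# Linearly dependent columns (m = 4 >= n = 3): third column = 2 * first column.
B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 7.0],
              [1.0, 0.0]])
B = np.column_stack([B, 2.0 * B[:, 0]])
print(np.linalg.matrix_rank(B.T @ B))                # 2 < 3, so B^T B is singular

# Removing the redundant column gives an invertible 2 x 2 matrix B^T B.
B_fixed = B[:, :2]
print(np.linalg.matrix_rank(B_fixed.T @ B_fixed))    # 2
```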

<h3 align="center">Regularization in case of Normal Equations</h3>

- As derived above, and switching back to the lecture's notation, the analytical solution of the Normal Equations is:

  $$\hat{\boldsymbol{\theta}} = (X^TX)^{-1}X^T\mathbf{y} = X^{+}\mathbf{y}.$$


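- Adding a **regularization term** gives:

  $$\hat{\boldsymbol{\theta}} = (X^TX + \lambda E^{+})^{-1}X^T\mathbf{y},$$

  where $\lambda > 0$ is the regularization parameter and $E^{+} \in \mathbb{R}^{(n+1)\times(n+1)}$ is almost the identity matrix: its top-left entry is $0$, so the bias parameter $\theta_0$ is not penalized: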

$$
E^{+} = \begin{bmatrix}
0 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\ 
\vdots & \vdots & \ddots & \vdots \\
0 & 0 &\cdots & 1
\end{bmatrix}.
$$


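- Assuming the first column of $X$ is the column of ones for the bias term, for $\lambda > 0$ the matrix $X^TX + \lambda E^{+}$ is **always invertible**, even when $X^TX$ is not: for any $\mathbf{v} \neq \mathbf{0}$, $\mathbf{v}^T(X^TX + \lambda E^{+})\mathbf{v} = \|X\mathbf{v}\|^2 + \lambda \sum_{j=1}^{n} v_j^2 > 0$.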
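A minimal sketch of the regularized Normal Equation, assuming NumPy; `ridge_normal_equation` is a hypothetical helper, the data are made up, and column 0 of `X` is assumed to be the bias column of ones. With $m = 3$ examples and $n + 1 = 5$ parameters, $X^TX$ is singular, yet the regularized system can still be solved.

```python
import numpy as np


def ridge_normal_equation(X, y, lam):
    """Regularized Normal Equation: theta = (X^T X + lam * E+)^{-1} X^T y.

    Assumes column 0 of X is the bias column of ones; E+ is the identity
    with its top-left entry zeroed, so theta_0 is not penalized."""
    E_plus = np.eye(X.shape[1])
    E_plus[0, 0] = 0.0
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * E_plus, X.T @ y)


# Hypothetical data: m = 3 examples, n = 4 features, so X has n + 1 = 5 columns.
features = np.array([[1.0, 2.0, 0.0, 1.0],
                     [0.0, 1.0, 3.0, 2.0],
                     [2.0, 0.0, 1.0, 0.0]])
X = np.column_stack([np.ones(3), features])
y = np.array([1.0, 2.0, 3.0])

print(np.linalg.matrix_rank(X.T @ X))   # 3 < 5: X^T X alone is singular
theta = ridge_normal_equation(X, y, lam=1.0)
print(theta.shape)                      # (5,)
```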
