# Introduction to Linear Algebra

### Why Linear Algebra
Linear algebra is a branch of mathematics that is widely used throughout science and engineering. A good understanding of linear algebra is essential not only for understanding and working with machine learning algorithms but also for many other areas of mathematics.

In this module we will introduce the basic concepts of linear algebra required for machine learning algorithms.


### What is Linear Algebra
Linear algebra is concerned with linear equations, linear transformations (or maps), and their representations in vector spaces. We will explore these in the subsequent parts of the module.



Some applications of linear algebra:

![Why_Numpy.png](attachment:Why_Numpy.png)

# Basics of Linear Algebra

### Basic Elements

<ol>
<li> <b>Scalars</b> : A scalar is a single number; it is the most basic object in linear algebra. Mathematically, any $s ∈ R$ is called a scalar, where $R$ is the set of real numbers. <br>Example: $3, 4.3, \pi$, etc. 
    <br> </br> </li>
    <br> </br> 
    
<li> <b>Vectors</b>: A vector is an <b>ordered</b> 1-D array of scalars. We can identify each individual number by its index in that ordering. Mathematically, an n-dimensional vector is an element $x ∈ R^{n}$. <br>Example: $\begin{bmatrix} 2 & 3 & 4\end{bmatrix} ,  \begin{bmatrix} -1.2 & 4 & 9 & 0 \end{bmatrix} $, etc.
<br> </br> </li>
<br> </br> 


<li> <b>Matrices</b>: A matrix is an <b>ordered</b> 2-D array of scalars, so each element is identified by two indices instead of just one. Mathematically, a matrix with $m$ rows and $n$ columns is an element $A ∈ R^{m*n}$. <br>Example: $\begin{bmatrix} 1 & 2 & 4 \\  9.0 & -10 & 0.9 \end{bmatrix}$
<br> </br> </li>
<br> </br> 
<li> <b>Tensors</b>: A tensor is an <b>ordered</b> array of scalars arranged on a regular grid with a variable number of axes. It extends the idea of a matrix to arrays with more than two axes. 
<br>Example:  $\begin{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8  \end{bmatrix} \begin{bmatrix} -1 & -4 & -13 & 23 \\ 9 & 6 & 912 & 21 \end{bmatrix} \end{bmatrix}$
    <br> </br> 
</li>

</ol>
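As a rough sketch, the four objects above can be represented as NumPy arrays with 0, 1, 2 and 3 axes. The use of NumPy and the variable names are assumptions made for illustration; the values are taken from the examples in the list.

```python
import numpy as np

scalar = np.array(4.3)                      # 0 axes: a single number
vector = np.array([2, 3, 4])                # 1 axis: indexed by one index
matrix = np.array([[1, 2, 4],
                   [9.0, -10, 0.9]])        # 2 axes: indexed by (row, column)
tensor = np.array([[[1, 2, 3, 4],
                    [5, 6, 7, 8]],
                   [[-1, -4, -13, 23],
                    [9, 6, 912, 21]]])      # 3 axes: a stack of two 2*4 matrices

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, "number of axes =", obj.ndim, "shape =", obj.shape)
```

The `ndim` attribute reports the number of axes, and `shape` reports the size along each axis.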

### Illustrations

##### Row Vector
![row_vector](row_vector.png) 


##### Column vector
![column_vector](column_vector.png) 


##### Rectangular Matrix
![matrix](matrix.png) 


##### Square Matrix
![square_matrix](square_matrix.png) 




### Properties of vectors:
<ol>
    <li> <b>Addition</b>: Adding two vectors of the same dimension gives another vector of that dimension. 
        </li>
    <li> <b>Multiplication</b>: Multiplying a vector by a scalar gives another vector of the same dimension.
        </li>     
    </ol>

![vector_addition](vector_addition.png) 
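A minimal sketch of both properties, assuming NumPy arrays as the vector representation (the vectors `u` and `v` are made-up examples):

```python
import numpy as np

u = np.array([2.0, 3.0, 4.0])
v = np.array([1.0, -1.0, 0.5])

w = u + v      # component-wise sum, still a 3-dimensional vector
s = 2.5 * u    # every component scaled by 2.5, still a 3-dimensional vector

print(w, w.shape)
print(s, s.shape)
```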


# Arithmetic Operations on Matrices

<b>Elementwise operations</b>: Elementwise operations are defined only for vectors or matrices of the same dimensions. For two matrices $A$ and $B$ of the same dimension $m*n$, the output matrix $C$ is defined for each operation as:
<ol>

<li><b>Addition</b>: \begin{equation} C[i,j] = A[i,j] + B[i,j], \quad \text{where } i ∈ [0,m-1], j ∈ [0,n-1] \end{equation}
    <br> </br> </li>

<li><b>Subtraction</b>: \begin{equation} C[i,j] = A[i,j] - B[i,j], \quad \text{where } i ∈ [0,m-1], j ∈ [0,n-1] \end{equation}
    <br> </br> </li>

<li><b>Multiplication</b>: Also called the <b>Hadamard product</b>: \begin{equation} C[i,j] = A[i,j]*B[i,j], \quad \text{where } i ∈ [0,m-1], j ∈ [0,n-1] \end{equation}
    <br> </br> </li>

<li><b>Division</b>: \begin{equation} C[i,j] = A[i,j]/B[i,j], \quad \text{where } i ∈ [0,m-1], j ∈ [0,n-1] \text{ and } B[i,j] \neq 0 \end{equation} 
    <br> </br> </li>
</ol>
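The four elementwise operations can be sketched with NumPy's arithmetic operators, which act entry by entry on arrays of the same shape. The matrices below are made-up examples. Note that NumPy also broadcasts some arrays of different shapes, which goes beyond the definition given here.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[2.0, 4.0],
              [6.0, 8.0]])

C_add = A + B    # C[i, j] = A[i, j] + B[i, j]
C_sub = A - B    # C[i, j] = A[i, j] - B[i, j]
C_mul = A * B    # Hadamard product: C[i, j] = A[i, j] * B[i, j]
C_div = A / B    # C[i, j] = A[i, j] / B[i, j], all B[i, j] are non-zero here

print(C_add, C_sub, C_mul, C_div, sep="\n")
```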



<b>Dot Product</b>: 
<ol>
    <li> Dot product is the most common type of matrix multiplication used in mathematical practice.
        <br> </br> </li>
    <li> Thus the term matrix multiplication refers to dot product unless otherwise stated.
        <br> </br> </li>
    <li> Unlike the Hadamard product, the dot product is defined for matrices $A$ and $B$ with dimensions $m*n$ and $n*p$ respectively, i.e. the number of columns of the first matrix must equal the number of rows of the second matrix. The output $C$ of dimension $m*p$ is defined as:
        \begin{equation} C[i,j] = \sum_{k=0}^{n-1}(A[i,k]*B[k,j]) \end{equation}
    <br> </br> </li>
    </ol>
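The summation formula above can be written out directly as three nested loops. The helper name `matmul_loops` and the example matrices are hypothetical, chosen only for illustration; in practice NumPy's `@` operator computes the same product.

```python
import numpy as np

def matmul_loops(A, B):
    """Dot product of an m*n matrix A and an n*p matrix B, following the formula."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # dimension 2*3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])         # dimension 3*2

C = matmul_loops(A, B)             # dimension 2*2
print(C)
print(np.allclose(C, A @ B))       # compare with NumPy's built-in product
```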
    
<b> Properties of Dot Product</b>: 
<ol>
    <li> <b>Associative</b>:
        \begin{equation} (AB)C = A(BC) \\ 
        ∀ A ∈ R^{m*n}, B ∈ R^{n*p}, C ∈ R^{p*q}\end{equation}
        <br> </br> </li>
    <li> <b>Distributive</b>: 
        \begin{equation} (A+B)C = AC + BC \\
        A(C+D) = AC + AD \\
        ∀ A, B ∈ R^{m*n} \text{ and } C, D ∈ R^{n*p} \end{equation}
        <br> </br> </li>
    <li> <b>Not Commutative</b>: In general, \begin{equation} AB \neq BA \end{equation}
        <br> </br> </li>
    </ol>
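These properties can be spot-checked numerically. The sketch below uses randomly generated matrices with a fixed seed (an arbitrary choice) and a made-up pair `P`, `Q` for the non-commutative case. A numerical check on a few matrices is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))
D = rng.standard_normal((3, 4))
E = rng.standard_normal((4, 2))

associative = np.allclose((A @ C) @ E, A @ (C @ E))
distributive_right = np.allclose((A + B) @ C, A @ C + B @ C)
distributive_left = np.allclose(A @ (C + D), A @ C + A @ D)

# Two square matrices for which the order of multiplication matters.
P = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Q = np.array([[0.0, 1.0],
              [1.0, 0.0]])
commutes = np.allclose(P @ Q, Q @ P)

print(associative, distributive_right, distributive_left, commutes)
```

Here `P @ Q` swaps the columns of `P`, while `Q @ P` swaps its rows, so the two products differ.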


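<b>Note:</b> Elementwise addition and elementwise multiplication are associative and commutative, and elementwise multiplication distributes over addition. Each of these follows directly from the same property of real numbers. Elementwise subtraction and division are neither associative nor commutative.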

### Illustrations

##### Addition
![addition](addition.png) 


##### Subtraction
![subtraction](subtraction.png) 


##### Multiplication
![multiplication](multiplication.png) 


##### Division
![division](division.png) 


##### Dot Product
![dot_product](dot_product.png) 


##### Dot Product: Not Commutative
![dot_product_not_commutative](dot_product_not_commutative.png) 



# Numpy Tutorial

# Matrix Inversion

### Identity Matrix

<ol>
<li>Mathematically, an identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix. We denote the identity matrix that preserves n-dimensional vectors as $I_n$. Formally, $I_n ∈ R^{n×n}$, and 
\begin{equation} ∀x ∈ R^n, I_n*x = x \end{equation}
    <br> </br> </li>

<li> Informally, an identity matrix is a square matrix whose entries along the main diagonal are all 1 and whose other entries are all 0.
    <br> </br> </li>

   </ol>
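A short sketch of both descriptions, assuming NumPy's `np.eye` to build the identity matrix (the vector `x` is a made-up example):

```python
import numpy as np

I3 = np.eye(3)                     # ones on the main diagonal, zeros elsewhere
x = np.array([4.0, -1.0, 7.0])

print(I3)
print(I3 @ x)                      # multiplying by I3 leaves x unchanged
```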


### Illustrations

##### Identity Matrix of Dimension 2
![identity_mat2](identity_mat2.png) 



##### Identity Matrix of Dimension 4
![identity_mat4](identity_mat4.png) 


### Matrix Inverse

The matrix inverse of a <b>square matrix</b> $A$ is denoted as $A^{-1}$, and <b>under constraints</b> it is defined as the matrix such that:
\begin{equation}A^{-1}*A = I_n = A*A^{-1}\end{equation}

<ol>
    <li> <b>Constraints:</b> The inverse is only defined for matrices with a non-zero determinant. Such matrices are called regular, invertible, or nonsingular; otherwise the matrix is called singular or noninvertible. 
        <br> </br> </li>
    <li> For non-square matrices, pseudoinverses are defined, e.g. the Moore-Penrose inverse.
        <br> </br> </li>
    <li> The matrix inverse, if it exists, is unique. Can you prove it mathematically?
        <br> </br> </li>
    </ol>
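The points above can be illustrated with NumPy's `np.linalg.inv`, `np.linalg.det` and `np.linalg.pinv`. The matrices `A`, `S` and `R` are made-up examples: `A` is invertible, `S` is singular because its second row is twice its first, and `R` is non-square.

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 3.0]])              # determinant 4, so invertible
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(2)), np.allclose(A @ A_inv, np.eye(2)))

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # determinant 0, so no inverse exists
det_S = np.linalg.det(S)
print(det_S)

R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])              # dimension 3*2, not square
R_pinv = np.linalg.pinv(R)              # Moore-Penrose pseudoinverse, dimension 2*3
print(np.allclose(R_pinv @ R, np.eye(2)))
```

For this `R`, whose columns are linearly independent, the pseudoinverse acts as a left inverse: `R_pinv @ R` is the identity, while `R @ R_pinv` is not.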
            
        

### Matrix Inverse and Determinant

Consider two matrices $A$ and $B$ defined as:

\begin{equation} A = \begin{bmatrix}a_{11} & a_{12} \\   a_{21} & a_{22}\end{bmatrix} \end{equation}
<br> </br>
\begin{equation} B = \begin{bmatrix}a_{22} & -a_{12} \\   -a_{21} & a_{11}\end{bmatrix} \end{equation}
<br> </br>
The product $A*B$ is:
\begin{equation} A*B = \begin{bmatrix} a_{11}a_{22} - a_{12}a_{21} & 0 \\  0 & a_{11}a_{22} - a_{12}a_{21} \end{bmatrix} =(a_{11}a_{22} - a_{12}a_{21}) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{equation}

<br> </br>

Hence, whenever $det = a_{11}a_{22} - a_{12}a_{21}$ is not equal to 0, dividing $B$ by $det$ gives the inverse: $A^{-1} = \frac{1}{det} B$. If $det = 0$, $A$ has no inverse. The quantity $det$ is called the <b>determinant</b> of a $2*2$ matrix, and we can use it to check whether a matrix is invertible.
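The construction above translates into a few lines of code: build the matrix $B$ from the entries of $A$ and divide it by the determinant. The function name `inverse_2x2` and the example matrix are hypothetical, and the result is compared against NumPy's `np.linalg.inv` as a sanity check.

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2*2 matrix by dividing B (as defined above) by the determinant."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular: determinant is 0")
    B = np.array([[a22, -a12],
                  [-a21, a11]])
    return B / det

A = np.array([[2.0, 2.0],
              [1.0, 3.0]])          # det = 2*3 - 2*1 = 4
A_inv = inverse_2x2(A)

print(A_inv)
print(np.allclose(A @ A_inv, np.eye(2)))
```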




### Matrix Transpose

For $A ∈ R^{m*n}$ the matrix $A^T ∈ R^{n*m}$ defined by:

\begin{equation} A^T_{ij} = A_{ji} \end{equation} 
<br> </br>

is called the transpose of $A$. $A^T$ can be <b>obtained by writing the columns of $A$ as the rows of $A^T$</b>.
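As a small illustration, NumPy exposes the transpose of an array through its `.T` attribute. The matrix below is a made-up example of dimension $2*3$.

```python
import numpy as np

A = np.array([[1, 2, 4],
              [9, -10, 0]])     # dimension 2*3
A_T = A.T                       # dimension 3*2

print(A_T)

# Entry (j, i) of the transpose equals entry (i, j) of the original matrix.
entries_match = all(A_T[j, i] == A[i, j]
                    for i in range(A.shape[0])
                    for j in range(A.shape[1]))
print(entries_match)
```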

### Illustrations

##### Rectangular matrix and its transpose

![transpose_rectangle](transpose_rectangle.png) 



##### Square matrix and its transpose
![transpose_square](transpose_square.png) 


## Properties of matrix inverse and transpose:
<ol>
<li>\begin{equation} AA^{-1} = I = A^{-1}A \end{equation}
</li><br> </br>
<li>\begin{equation} (AB)^{-1} = B^{-1}A^{-1} \end{equation}
</li><br> </br>
<li>In general, \begin{equation} (A + B)^{-1} \neq A^{-1} + B^{-1} \end{equation}
</li><br> </br>

<li>\begin{equation} (A^T)^T = A \end{equation}
</li><br> </br>

<li>\begin{equation} (A + B)^T = A^T + B^T \end{equation}
</li><br> </br>

<li>\begin{equation} (AB)^T = B^TA^T \end{equation}
</li><br> </br>

</ol>
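A numerical spot check of these properties on two made-up invertible matrices, assuming NumPy. The final check shows that, for these particular matrices, the inverse of a sum is not the sum of the inverses.

```python
import numpy as np
from numpy.linalg import inv

A = np.array([[2.0, 2.0],
              [1.0, 3.0]])      # determinant 4
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # determinant -2
I = np.eye(2)

checks = {
    "A A^-1 = I = A^-1 A": np.allclose(A @ inv(A), I) and np.allclose(inv(A) @ A, I),
    "(AB)^-1 = B^-1 A^-1": np.allclose(inv(A @ B), inv(B) @ inv(A)),
    "(A^T)^T = A": np.array_equal(A.T.T, A),
    "(A + B)^T = A^T + B^T": np.array_equal((A + B).T, A.T + B.T),
    "(AB)^T = B^T A^T": np.allclose((A @ B).T, B.T @ A.T),
}
for name, ok in checks.items():
    print(name, ok)

# A + B is invertible here (determinant 5), yet the two sides differ.
sum_of_inverses_matches = np.allclose(inv(A + B), inv(A) + inv(B))
print("(A + B)^-1 equals A^-1 + B^-1:", sum_of_inverses_matches)
```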


# Special Operations: Norms, eigen values and eigen vectors

### Norm

The norm of a vector is its magnitude. A vector can be described by its norm and its direction. For example, the gravitational force that Earth exerts on an object is described by the magnitude and the direction of the force. 

There are several ways to compute the magnitude of a vector; the most common ones are: 
<ol>
    <li> <b>$L^2$ or Euclidean norm:</b> This is closely related to the squared error measures (such as mean squared error) commonly used in machine learning algorithms and is defined by:
        \begin{equation}  \lVert x \rVert_2 = \sqrt{\sum_{i=1}^n x^{2}_{i}} = \sqrt{x^T x} \end{equation}
        </li><br> </br>
    <li> <b>$L^1$ or Manhattan norm:</b> This is closely related to the absolute error measures (such as mean absolute error) commonly used in machine learning algorithms and is defined by:
         \begin{equation}  \lVert x \rVert_1 = \sum_{i=1}^n \lvert x_{i} \rvert  \end{equation}
    </li><br> </br>
</ol>
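Both norms can be computed directly from their definitions. The sketch below, assuming NumPy and a made-up vector `x`, also compares the results with `np.linalg.norm`.

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.sqrt(np.sum(x ** 2))    # square root of the sum of squared components
l2_dot = np.sqrt(x @ x)         # the same value written as sqrt(x^T x)
l1 = np.sum(np.abs(x))          # sum of absolute values of the components

print(l2, l2_dot, l1)
print(np.linalg.norm(x), np.linalg.norm(x, ord=1))
```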
    
Mathematically, a norm is any function $f(x)$ that satisfies the following conditions:
<ol>
    <li> $f(x) = 0 \implies x=0$
        </li><br> </br>
    <li> $f(x + y) ≤ f(x) + f(y)$, called the triangle inequality
        </li><br> </br>
    <li> $∀ \alpha \in R,f(\alpha x) = |\alpha|f(x)$
        </li><br> </br>
    </ol>
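The three conditions can be spot-checked for the $L^1$ and $L^2$ norms on sample inputs. The helper `check_norm_properties` and the sample vectors are hypothetical, and a check on one pair of vectors illustrates the conditions without proving them. The first condition is tested through its contrapositive: a non-zero vector must have a non-zero (positive) norm.

```python
import numpy as np

def check_norm_properties(f, x, y, alpha):
    """Spot-check the three norm conditions for one choice of x, y and alpha."""
    return {
        "nonzero_vector_has_positive_norm": bool(f(x) > 0),
        "triangle_inequality": bool(f(x + y) <= f(x) + f(y) + 1e-12),
        "absolute_homogeneity": bool(np.isclose(f(alpha * x), abs(alpha) * f(x))),
    }

x = np.array([3.0, -4.0])
y = np.array([-1.0, 2.0])
alpha = -2.5

results = {}
for p in (1, 2):
    norm_p = lambda v, p=p: np.linalg.norm(v, ord=p)
    results[f"L{p}"] = check_norm_properties(norm_p, x, y, alpha)

for name, checks in results.items():
    print(name, checks)
```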

### Eigen Values and Eigen Vectors: 

We have talked about matrices as transformations of vectors from one space to another. These vectors, however, are represented by their magnitude and direction only. 


For a transformation given by a square matrix, there may be some <b>special vectors which, when transformed by the matrix, change only in magnitude and not in direction</b>. These vectors are called eigen vectors of the matrix, and the corresponding factor by which the magnitude changes is called the eigen value (of these eigen vectors).


Eigen values and eigen vectors contain <b>summary information of the matrix</b> and are used in noise reduction, search engines, calculus, etc. 

Mathematically, let $A \in R^{n*n}$ be a square matrix. Then a non-zero vector $x \in R^n$ is an eigenvector of $A$ and $ \lambda \in R$ its corresponding eigenvalue if:
\begin{equation} A*x = \lambda * x \end{equation}
<br> </br>

Equivalently:

<ol>
    <li> \begin{equation} (A - \lambda * I_n)x = 0 \end{equation}
       </li><br> </br> 
    <li> \begin{equation} \det(A - \lambda * I_n) = 0 \end{equation}
       </li><br> </br> 
    <li> If $x$ is an eigen vector of $A$ with eigen value $\lambda$, then $c*x$ is also an eigen vector of $A$ with the same eigen value, for any non-zero $c \in R$:
    \begin{equation} A(cx) = cAx = c \lambda x = \lambda(cx) \end{equation}
    </li><br> </br> 
  </ol>

Generally we use the determinant (equation 2) to compute the eigen values and then compute the eigen vectors from the eigen values (equation 1). 

####  Working Example

\begin{equation}
A = \begin{bmatrix} 2 & 2 \\ 1 & 3\end{bmatrix}\\
det(\begin{bmatrix} 2-\lambda & 2 \\ 1 & 3 - \lambda\end{bmatrix})=0\\
(2-\lambda)*(3-\lambda) - 2*1 = 0\\
\lambda^2 -5*\lambda + 4 = 0 \\
(\lambda - 4)(\lambda - 1) = 0
\end{equation}

So $\lambda = 4$ and $\lambda = 1$ are the eigen values.

Corresponding to the eigen value 1, the eigen vector satisfies:
\begin{equation}
\begin{bmatrix} 2-1 & 2 \\ 1 & 3-1\end{bmatrix} * \begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}
\end{equation}
which transforms to the following system of equations:

\begin{equation}
x_1 + 2x_2 = 0, x_1 + 2x_2 = 0\\
x_1 = -2x_2
\end{equation}

So any non-zero vector satisfying the above is an eigen vector, e.g. take the vector:
\begin{equation}
\begin{bmatrix} 2 & 2 \\ 1 & 3\end{bmatrix} * \begin{bmatrix} -2 \\ 1\end{bmatrix} = 1* \begin{bmatrix} -2 \\ 1 \end{bmatrix}
\end{equation}

The matrix transforms the eigen vector by keeping its direction intact and scaling its magnitude by a factor of 1, so this eigen vector is left unchanged.

For a vector that is not an eigen vector, for example:

\begin{equation}
\begin{bmatrix} 2 & 2 \\ 1 & 3\end{bmatrix} * \begin{bmatrix} 0 \\ 1\end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\end{equation}
The matrix changes the magnitude as well as the direction of the vector during the transformation.
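The worked example can be cross-checked numerically. The sketch below assumes NumPy: `np.roots` solves the characteristic polynomial $\lambda^2 - 5\lambda + 4 = 0$, and `np.linalg.eig` returns the eigen values together with a matrix whose columns are eigen vectors. The order in which `eig` returns the eigen values is not something to rely on, so the code does not assume one.

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 3.0]])

# Roots of the characteristic polynomial lambda^2 - 5*lambda + 4.
char_roots = np.roots([1.0, -5.0, 4.0])
print(char_roots)

# Each column of `eigenvectors` pairs with the eigen value at the same position.
eigenvalues, eigenvectors = np.linalg.eig(A)
for lam, v in zip(eigenvalues, eigenvectors.T):
    print("lambda =", lam, "v =", v, "A v == lambda v:", np.allclose(A @ v, lam * v))

# The hand-picked eigen vector from the example, with eigen value 1.
v1 = np.array([-2.0, 1.0])
print(A @ v1)
```

The eigen vectors returned by `eig` are scaled to unit norm, so they are multiples of the hand-computed ones rather than identical to them.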


# More special matrices

<ol>
<li><b>Unit Vector</b>: A unit vector is a vector with unit norm.
    </li><br> </br>
<li><b>Diagonal matrix</b>: A diagonal matrix has non-zero entries only along the main diagonal and zero values at all other indices.
    </li><br> </br>
<li><b>Symmetric matrix</b>: A matrix is symmetric if $A=A^T$.
    </li><br> </br>
</ol>
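A brief sketch of the three definitions, assuming NumPy. The vector and matrices are made-up examples: a vector is rescaled to unit norm, a diagonal matrix is built with `np.diag`, and symmetry is tested by comparing a matrix with its transpose.

```python
import numpy as np

# Unit vector: divide a non-zero vector by its Euclidean norm.
v = np.array([3.0, 4.0])
unit = v / np.linalg.norm(v)
print(unit, np.linalg.norm(unit))

# Diagonal matrix: zero everywhere except the main diagonal.
D = np.diag([1.0, 2.0, 3.0])
is_diagonal = np.array_equal(D, np.diag(np.diag(D)))
print(D, is_diagonal)

# Symmetric matrix: equal to its own transpose.
S = np.array([[1.0, 7.0],
              [7.0, 2.0]])
is_symmetric = np.array_equal(S, S.T)
print(is_symmetric)
```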