In [9]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

## Basic setup

Create an Anaconda environment
<br>
```bash
conda create -n ml python=3.7.4 jupyter
```
Install the fastai library
<br>
```bash
conda install -c pytorch -c fastai fastai
```

# Special types of matrices

The identity matrix $I_{n} \in \mathbb{R}^{n \times n}$ is a matrix that leaves any other matrix unchanged under multiplication (whenever the shapes are compatible). It has ones on the main diagonal and zeros everywhere else:
$$\begin{align} I_{n} &= \begin{pmatrix}
           1 & 0 & \dots & 0 \\
           0 & 1 & \dots & 0 \\
           \vdots & \vdots & \ddots & \vdots \\
           0 & 0 & \dots & 1
         \end{pmatrix}
  \end{align}$$
<br>
Equivalently, $I_{n}$ is characterized by the property that $aI_{n} = a$ holds for every $a \in \mathbb{R}^{1 \times n}$ and $I_{n}a = a$ holds for every $a \in \mathbb{R}^{n \times 1}$.

In [1]:
import numpy as np

In [2]:
I = np.identity(4)

In [3]:
I

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [4]:
A = np.random.random(size=(4, 5))
B = np.random.random(size=(6, 4))

In [5]:
A

array([[0.16608384, 0.64658418, 0.33660896, 0.55163203, 0.83862164],
       [0.5902026 , 0.72837157, 0.95135543, 0.45924322, 0.81066017],
       [0.52679628, 0.02827689, 0.84247732, 0.51773502, 0.33968376],
       [0.35130613, 0.09634971, 0.3864694 , 0.07685268, 0.26896936]])

In [6]:
B

array([[0.98289049, 0.97048821, 0.09346472, 0.4949944 ],
       [0.75362206, 0.02645528, 0.139597  , 0.69335633],
       [0.54815407, 0.81031219, 0.10572942, 0.89742611],
       [0.85674399, 0.33898318, 0.49022799, 0.51391506],
       [0.00472914, 0.72224002, 0.95829003, 0.23703028],
       [0.2340194 , 0.83559035, 0.71281628, 0.73455152]])

In [7]:
I @ A

array([[0.16608384, 0.64658418, 0.33660896, 0.55163203, 0.83862164],
       [0.5902026 , 0.72837157, 0.95135543, 0.45924322, 0.81066017],
       [0.52679628, 0.02827689, 0.84247732, 0.51773502, 0.33968376],
       [0.35130613, 0.09634971, 0.3864694 , 0.07685268, 0.26896936]])

In [8]:
B @ I

array([[0.98289049, 0.97048821, 0.09346472, 0.4949944 ],
       [0.75362206, 0.02645528, 0.139597  , 0.69335633],
       [0.54815407, 0.81031219, 0.10572942, 0.89742611],
       [0.85674399, 0.33898318, 0.49022799, 0.51391506],
       [0.00472914, 0.72224002, 0.95829003, 0.23703028],
       [0.2340194 , 0.83559035, 0.71281628, 0.73455152]])

The inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ is the matrix $A^{-1} \in \mathbb{R}^{n \times n}$ for which $A^{-1}A = AA^{-1} = I_{n}$. It exists only when $\det A \neq 0$.

In [None]:
A = np.random.random(size=(8, 8))
A

In [None]:
invA = np.linalg.inv(A)
invA

In [None]:
Ia = invA @ A
print(Ia)

In [None]:
np.round(Ia)
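
Rounding hides how close `invA @ A` really is to the identity. A minimal sketch of a tolerance-based check with `np.allclose` follows. The names `M` and `M_inv`, the fixed seed, and the added `8 * np.eye(8)` term (which makes the matrix strictly diagonally dominant and therefore invertible) are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, only for reproducibility
M = rng.random((8, 8)) + 8 * np.eye(8)     # diagonally dominant, hence invertible
M_inv = np.linalg.inv(M)

# Floating point arithmetic gives tiny off-diagonal residues,
# so compare against the identity with a tolerance instead of rounding.
left_ok = np.allclose(M_inv @ M, np.eye(8))
right_ok = np.allclose(M @ M_inv, np.eye(8))
print(left_ok, right_ok)
```

Checking both products reflects that a true inverse works from the left and from the right.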

## SVD

For any $A \in \mathbb{R}^{n \times m} (\mathbb{C}^{n \times m})$ there exists a decomposition:
$$A = U \Sigma V^{T}$$ 
where $U \in \mathbb{R}^{n \times n}(\mathbb{C}^{n \times n})$ is a square orthogonal matrix, $\Sigma \in \mathbb{R}^{n \times m}$ is a rectangular diagonal matrix whose diagonal entries are the non-negative singular values, and $V \in \mathbb{R}^{m \times m} (\mathbb{C}^{m \times m})$ is also a square orthogonal matrix. In the complex case $U$ and $V$ are unitary and $V^{T}$ is replaced by the conjugate transpose $V^{*}$.
#### Note: We only discuss real valued vectors, matrices and tensors in this course

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
A, A.shape

In [None]:
U, S, V_T = np.linalg.svd(A)
U, S, V_T, U.shape, S.shape, V_T.shape

In [None]:
Sg = np.diag(S)
Sg, Sg.shape

In [None]:
np.linalg.norm(A), np.round(U[:, 0] @ U[:, 1].T), np.linalg.norm(U[2, :]), np.linalg.norm(V_T)
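
`np.linalg.svd` returns the singular values as a 1-D array, so `np.diag(S)` is a $3 \times 3$ matrix rather than the $4 \times 3$ matrix $\Sigma$ from the definition. The sketch below, under the assumption that the default `full_matrices=True` is used, pads the diagonal into the right shape and checks that $U \Sigma V^{T}$ reproduces $A$. The names `Sigma` and `A_rebuilt` are introduced here for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)
U, S, V_T = np.linalg.svd(A)               # full_matrices=True by default

# Build the rectangular diagonal Sigma with the same shape as A (4 x 3)
Sigma = np.zeros(A.shape)
Sigma[:len(S), :len(S)] = np.diag(S)

A_rebuilt = U @ Sigma @ V_T
print(np.allclose(A, A_rebuilt))

# U and V are orthogonal: their columns are orthonormal
print(np.allclose(U.T @ U, np.eye(4)), np.allclose(V_T @ V_T.T, np.eye(3)))
```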

# THINGS TO ADD

- Special types of matrices (I, Diagonal, Inverse)
- Norm, Determinant
- Eigenvalues and Eigenvectors
- SVD
- SVD Calculation
- Topic modeling ?? (this is LDA, not PCA)
- PCA
- Faces

## Determinant as a scaling factor

Here is a link to the [video](https://www.youtube.com/watch?v=Ip3X9LOh2dk) about determinants

Take two vectors $x, y$ in the plane and a transformation matrix $A$. 
Multiplying each vector by $A$ gives the transformed vectors $Ax$ and $Ay$. 
The area of the parallelogram spanned by the two vectors is scaled by exactly $|\det A|$; a negative determinant additionally means that the orientation is flipped.

![SegmentLocal](images/la2/determinant_as_scaling_factor.gif)
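
The scaling claim can be checked numerically. This is a small sketch, not taken from the animation: the matrix `T`, the unit vectors, and the helper `parallelogram_area` are hypothetical choices made for the example.

```python
import numpy as np

def parallelogram_area(u, v):
    """Area of the parallelogram spanned by the 2D vectors u and v."""
    return abs(u[0] * v[1] - u[1] * v[0])

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])                 # hypothetical transformation matrix
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

area_before = parallelogram_area(x, y)
area_after = parallelogram_area(T @ x, T @ y)

# The ratio of the two areas should match |det(T)|
print(area_before, area_after, abs(np.linalg.det(T)))
```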

## Eigenvalues and Eigenvectors

Consider the transformation matrix $A$ again and apply it to many vectors from the vector space it acts on. Some of these vectors keep their direction: the transformation only scales them by some factor. 

Each such nonzero vector $v$ is called an eigenvector of $A$, and the factor $\lambda$ by which it is scaled is the corresponding eigenvalue, so $Av = \lambda v$. A negative eigenvalue flips the vector but keeps it on the same line.

Each eigenvector has its own eigenvalue, and a matrix can have several such pairs, one for each axis that the transformation only stretches.
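
A short sketch of the defining relation $Av = \lambda v$ using `np.linalg.eig`. The symmetric matrix `T` and the test vector `w` are assumptions picked for illustration; `np.linalg.eig` returns the eigenvectors as the columns of its second result.

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # hypothetical transformation matrix

eigvals, eigvecs = np.linalg.eig(T)        # columns of eigvecs are eigenvectors
for lam, v in zip(eigvals, eigvecs.T):
    # Applying T only rescales v by its eigenvalue
    print(lam, v, np.allclose(T @ v, lam * v))

# A vector that is not an eigenvector changes direction
w = np.array([1.0, 0.0])
Tw = T @ w
cross = w[0] * Tw[1] - w[1] * Tw[0]        # nonzero means w and Tw are not parallel
print(Tw, cross)
```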

### Eigenvalues and Eigenvectors example

![SegmentLocal](images/la2/eigenvalues_and_vectors.gif)

### Non-eigenvector example

![SegmentLocal](images/la2/non_eigenvalues_and_vectors.gif)

# Matrix Decompositions

[fastai LA](https://nbviewer.jupyter.org/github/fastai/numerical-linear-algebra/blob/master/nbs/1.%20Why%20are%20we%20here.ipynb#Matrix-Decompositions)

[advanced matrix decompositions](https://sites.google.com/site/igorcarron2/matrixfactorizations)

[NMF tutorial](https://perso.telecom-paristech.fr/essid/teach/NMF_tutorial_ICME-2014.pdf)

[topic modeling](https://medium.com/@nixalo/comp-linalg-l2-topic-modeling-with-nmf-svd-78c94330d45f)

[background removal using svd](https://medium.com/@siavashmortezavi/fast-randomized-svd-singular-value-decomposition-using-pytorch-and-gpus-46b627511a6d)

## SVD

Short [tutorial](http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm) on SVD from MIT

Very interesting Medium [blog](https://medium.com/@jonathan_hui/machine-learning-singular-value-decomposition-svd-principal-component-analysis-pca-1d45e885e491) on SVD & PCA

![title](images/la2/svd_steps.jpeg)

## PCA

### Data Reduction
- PCA is most commonly used to condense the information contained in a large number of original variables into a smaller set of new composite dimensions, with a minimum loss of information.

[Example](https://www.projectrhea.org/rhea/index.php/PCA_Theory_Examples) of using PCA on image compression

#### Mapping of 2D points into 1D. 
PCA picks the 1D axis along which the data varies the most, so projecting onto it loses the least information. Storing one coordinate per point instead of two roughly halves the memory.

![SegmentLocal](images/la2/pca_1d.gif)
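
The 2D to 1D mapping can be sketched with the SVD of the centered data. Everything below is an assumption for illustration: the synthetic points scattered around a line, the fixed seed, and the variable names. The first row of `V_T` is the direction of largest variance.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, only for reproducibility
t = rng.normal(size=200)
# Synthetic 2D points lying close to the line y = 2x
points = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

mean = points.mean(axis=0)
centered = points - mean                   # PCA works on mean-centered data
_, S, V_T = np.linalg.svd(centered, full_matrices=False)

axis = V_T[0]                              # first principal direction (unit vector)
coords_1d = centered @ axis                # one number per point instead of two
approx = np.outer(coords_1d, axis) + mean  # map the 1D coordinates back to 2D

explained = S[0] ** 2 / np.sum(S ** 2)     # share of variance kept by the 1D axis
print(coords_1d.shape, explained)
```

Only `coords_1d`, `axis` and `mean` need to be stored to rebuild `approx`, which is where the memory saving comes from.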

### Interpretation
- PCA can be used to discover important features of a large data set. It often reveals relationships that were previously unsuspected, thereby allowing interpretations that would not ordinarily result.
PCA is typically used as an intermediate step in data analysis when the number of input variables is otherwise too large for useful analysis.

[example](https://towardsdatascience.com/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b) usage of PCA (and t-SNE) for data visualization

[example](https://github.com/aviolante/sas-python-work/blob/master/tSneExampleBlogPost.ipynb) notebook for comparing PCA and t-SNE for visualizing MNIST data 

# Additional Materials

[jupyter-notebook tips&tricks&shortcuts](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)