## Introduction to Linear Algebra for ML
[Youtube Video by TensorFlow](https://www.youtube.com/watch?v=LlKAna21fLE&ab_channel=TensorFlow)

### Main Topics
<ol>
    <li>Data Representation: representing data as vectors of numbers that computers can work with</li>
    <li>Vector Embeddings: choosing representations via matrix factorizations</li>
    <li>Dimensionality Reduction: reducing high-dimensional data using linear maps (eigenvectors and eigenvalues)</li>

</ol>

## 1. Data Representation
Organize information into a vector, since computers work best with numbers.<br>

A <b>vector</b> is a 1-dimensional array of numbers; geometrically, it has both a magnitude and a direction.<br>

An <b>n-dimensional vector space</b> is the set of all vectors with n (real) entries.

![3-dimensional space](img/3-dimensional.png)

<b>In ML:</b> a <b>feature vector</b> is a vector whose entries represent the features of some object.
![Feature Vector Example](img/feature_vector.png)
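As a minimal sketch, the snippet below encodes a hypothetical house as a feature vector. The feature names and values are made up for illustration; the key point is that every object uses the same fixed feature order.

```python
# Hypothetical feature names and values, chosen only for illustration.
feature_names = ["bedrooms", "bathrooms", "square_meters", "year_built"]
house = {"bedrooms": 3, "bathrooms": 2, "square_meters": 120, "year_built": 1995}

# The feature vector lists the values in a fixed feature order,
# so entry i means the same feature for every house.
feature_vector = [house[name] for name in feature_names]

print(feature_vector)       # [3, 2, 120, 1995]
print(len(feature_vector))  # 4 -> a vector in 4-dimensional space
```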

### 1.1 Representing Images
**Black & white pixels** correspond to 0s and 1s; grayscale pixels are numbers between 0 and 255. To turn the image's matrix of pixel values into a 1-dimensional array, stack its rows one after another.

![Images to 1-dimensional Arrays](img/images.png)
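A minimal sketch of stacking rows, assuming an image is stored as a list of pixel rows. The helper name `flatten_image` and the tiny example images are hypothetical.

```python
def flatten_image(pixels):
    """Stack the rows of a 2-D pixel matrix into one 1-D list."""
    flat = []
    for row in pixels:
        flat.extend(row)  # append this row's pixels after the previous rows
    return flat

# A hypothetical 3x3 black & white image (0s and 1s).
image = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
flat = flatten_image(image)
print(flat)  # [0, 1, 0, 1, 1, 1, 0, 1, 0]

# The same works for grayscale values between 0 and 255.
gray = [[0, 128],
        [200, 255]]
print(flatten_image(gray))  # [0, 128, 200, 255]
```

With NumPy, `ndarray.flatten()` does the same row-by-row stacking by default.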

### 1.2 Words and Documents
Given a collection of documents, set the $i^{th}$ entry of a word's vector to the number of times the word appears in document $i$.

**Example**

Say you have wiki articles (wiki#1, wiki#2, wiki#3, ...) and you want to produce a vector for the word $dog$. The word $dog$ doesn't appear in articles 1, 3, and 4, but it appears 7 times in wiki#2 and 51 times in wiki#5.

$$
dog =
\begin{bmatrix}
  0 \\
  7 \\
  0 \\
  0 \\
  51 \\
  \vdots \\
\end{bmatrix}
$$
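A minimal sketch of building such a count vector. It assumes naive lowercase whitespace tokenization (a real tokenizer would also handle punctuation), and the three short documents are made up for illustration.

```python
def word_vector(word, documents):
    """Entry i = number of times `word` appears in document i."""
    return [doc.lower().split().count(word) for doc in documents]

# Hypothetical documents standing in for wiki articles.
docs = [
    "the cat sat on the mat",
    "my dog likes the park dog dog",
    "a house with a garden",
]
print(word_vector("dog", docs))  # [0, 3, 0]
print(word_vector("the", docs))  # [2, 1, 0]
```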

### 1.3 Yes/No or Ratings
Given a list of movies, a user's vector can indicate whether the user has interacted with each movie (1 = yes, 0 = no), or hold the user's rating of each movie between 0 and 5.

For yes/no data (binary: 1 = yes, 0 = no):
$$
User1 =
\begin{bmatrix}
  0 \\
  1 \\
  0 \\
  0 \\
  0 \\
  \vdots \\
  1 \\
\end{bmatrix}
$$

Or, in the case of 0-to-5 ratings:

$$
User2 =
\begin{bmatrix}
  0 \\
  5 \\
  0 \\
  3 \\
  \vdots \\
  0 \\
  2 \\
\end{bmatrix}
$$
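A minimal sketch of building both kinds of user vector from one user's ratings. The movie titles and ratings are hypothetical, and two assumptions are baked in: a missing rating is stored as 0, and "interacted" is approximated by "rated".

```python
# Hypothetical movie list and one user's ratings (0 to 5); unrated movies are absent.
movies = ["Movie A", "Movie B", "Movie C", "Movie D"]
ratings = {"Movie B": 5, "Movie D": 3}

# Ratings vector: the user's rating for each movie, 0 if not rated (assumption).
rating_vector = [ratings.get(m, 0) for m in movies]

# Yes/no vector: 1 if the user interacted with (here: rated) the movie, else 0.
interaction_vector = [1 if m in ratings else 0 for m in movies]

print(rating_vector)       # [0, 5, 0, 3]
print(interaction_vector)  # [0, 1, 0, 1]
```

Note the design trade-off: when 0 is also a valid rating, storing "not rated" as 0 makes the two cases indistinguishable, which is one reason the separate yes/no vector can be useful.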

### 1.4 Non-Numerical Data
**One-Hot Encodings:**
Assign each word a vector with a 1 in that word's position and 0s everywhere else (a "standard basis vector"), so every vector is all 0s except for a single 1.

Let's say you have 4 words: apple, cat, house, and tiger. Their associated vectors are:

$$
apple =
\begin{bmatrix}
  1 \\
  0 \\
  0 \\
  0 \\
\end{bmatrix}
$$

$$
cat =
\begin{bmatrix}
  0 \\
  1 \\
  0 \\
  0 \\
\end{bmatrix}
$$

$$
house =
\begin{bmatrix}
  0 \\
  0 \\
  1 \\
  0 \\
\end{bmatrix}
$$

$$
tiger =
\begin{bmatrix}
  0 \\
  0 \\
  0 \\
  1 \\
\end{bmatrix}
$$
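A minimal sketch of one-hot encoding over the four-word vocabulary above. The function name `one_hot` is hypothetical; a word's position in the `vocabulary` list decides where its 1 goes.

```python
def one_hot(word, vocabulary):
    """Return the standard basis vector for `word`: a 1 at its index, 0s elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1  # raises ValueError if word is not in vocabulary
    return vector

vocabulary = ["apple", "cat", "house", "tiger"]
for w in vocabulary:
    print(w, one_hot(w, vocabulary))
# apple [1, 0, 0, 0]
# cat [0, 1, 0, 0]
# house [0, 0, 1, 0]
# tiger [0, 0, 0, 1]
```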

### 1.5 Drawbacks
<ul>
<li>These vectors can be sparse (having lots of zeros).</li>
<li>They may lack meaningful relationships: distinct one-hot encodings are never "similar" (for example, cat is no closer to tiger than to house).</li>
</ul>
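A small sketch of both drawbacks, assuming a hypothetical 10,000-word vocabulary and arbitrary positions for "cat" and "tiger". It uses the dot product (covered in the Dot Product subsection) as the similarity measure.

```python
def dot(u, v):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

def one_hot(index, size):
    """Vector of `size` zeros with a single 1 at `index`."""
    v = [0] * size
    v[index] = 1
    return v

size = 10_000  # hypothetical vocabulary size
cat = one_hot(1, size)
tiger = one_hot(3, size)

# Sparsity: only 1 of the 10,000 entries is non-zero.
zeros = cat.count(0)
print(zeros / size)  # 0.9999

# No similarity: distinct one-hot vectors always have dot product 0.
print(dot(cat, tiger))  # 0
print(dot(cat, cat))    # 1
```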

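### 1.6 Dot Product
The dot product of two vectors is a single number (a scalar), not another vector: multiply the entries pairwise and add the results. It is commonly used as a similarity measure.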
<h3 style="text-align:center">
$
\begin{bmatrix}
  1 \\
  0 \\
  3 \\
\end{bmatrix}
\cdot
\begin{bmatrix}
  7 \\
  2 \\
  -1 \\
\end{bmatrix}
= (1)(7) + (0)(2) + (3)(-1) = 4
$
</h3>
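A minimal sketch of the dot product in plain Python, reproducing the example above and then using it as a similarity score on made-up rating vectors.

```python
def dot(u, v):
    """Dot product of two equal-length vectors: a single number, not a vector."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

print(dot([1, 0, 3], [7, 2, -1]))  # 4

# Hypothetical rating vectors: users who rate the same movies highly
# get a larger dot product.
alice = [5, 0, 4, 0]
bob = [4, 0, 5, 0]
carol = [0, 5, 0, 3]
print(dot(alice, bob))    # 40
print(dot(alice, carol))  # 0
```

The raw dot product also grows with the vectors' lengths; a common variant, cosine similarity, divides by both lengths so that only direction matters.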


## 2. Vector Embeddings
An embedding of a vector is another vector, in a lower-dimensional space, that represents it.

For example, replace a 3-dimensional vector
$
\begin{bmatrix}
  * \\
  * \\
  * \\
\end{bmatrix}
$ with a 2-dimensional one $
\begin{bmatrix}
  * \\
  * \\
\end{bmatrix}
$.

### 2.1 Matrix Factorizations
A **matrix** is a 2-dimensional array of numbers. It represents a particular process of turning one vector into another: stretching, rotating, scaling ...

More generally, a **matrix** represents a **transformation** of an entire vector space into another (possibly of a different dimension): an $m \times n$ matrix maps n-dimensional vectors to m-dimensional ones.
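A minimal sketch of matrices acting as transformations, using plain lists of rows. The three example matrices (a stretch, a 90° rotation, and a 3-D to 2-D projection) are illustrative choices, and `mat_vec` is a hypothetical helper name.

```python
def mat_vec(matrix, vector):
    """Multiply an m x n matrix (list of rows) by an n-vector, giving an m-vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix columns must match vector length")
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# 2x2: stretch the x-axis by a factor of 2.
stretch = [[2, 0],
           [0, 1]]
# 2x2: rotate by 90 degrees counter-clockwise.
rotate_90 = [[0, -1],
             [1, 0]]
# 2x3: map 3-dimensional vectors to 2-dimensional ones by dropping z.
project = [[1, 0, 0],
           [0, 1, 0]]

print(mat_vec(stretch, [1, 1]))     # [2, 1]
print(mat_vec(rotate_90, [1, 0]))   # [0, 1]
print(mat_vec(project, [4, 5, 6]))  # [4, 5]
```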

**Factorization** undoes multiplication: it writes a matrix as a product of other matrices, just as 12 can be factored as 3 × 4.
![Factorization](img/factorization.png)
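A small worked example with numbers chosen for illustration: multiplying a $2 \times 1$ matrix by a $1 \times 3$ matrix gives a $2 \times 3$ matrix, and factorization goes the other way, recovering the two factors from the product.

$$
\begin{bmatrix}
  1 \\
  2 \\
\end{bmatrix}
\begin{bmatrix}
  3 & 4 & 5 \\
\end{bmatrix}
=
\begin{bmatrix}
  3 & 4 & 5 \\
  6 & 8 & 10 \\
\end{bmatrix}
$$

The product holds 6 numbers while the two factors hold only 2 + 3 = 5, which hints at why factorizations can give compact representations.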

**Singular Value Decomposition (SVD)**
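Every $n \times m$ matrix $A$ (n rows, m columns) can be written as a product of three matrices, $A = U \Sigma V^T$, where the columns of $U$ and $V$ are orthonormal and $\Sigma$ is a diagonal matrix of non-negative singular values, conventionally sorted from largest to smallest.

Keeping only the k largest singular values (and the matching columns of $U$ and $V$) gives smaller matrices whose product is close to the original matrix. The rows and columns of these smaller factors are candidates for embeddings: each row of $U$ embeds a row of $A$, and each column of $V^T$ embeds a column of $A$.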
![SVD](img/svd.png)
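A minimal sketch with NumPy's `np.linalg.svd`, assuming NumPy is installed. The 4-user by 5-movie ratings matrix is made up, `k = 2` is an arbitrary choice, and folding the singular values into the user factor is one convention among several (another splits $\sqrt{s}$ between both factors).

```python
import numpy as np

# Hypothetical user x movie ratings matrix (4 users, 5 movies); values are made up.
A = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 0, 1, 0],
    [0, 0, 5, 4, 4],
    [1, 0, 4, 5, 5],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s sorted largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (a rank-k approximation).
k = 2
U_k = U[:, :k] * s[:k]  # 4 x k: one k-dimensional embedding per user (row of A)
V_k = Vt[:k, :]         # k x 5: one k-dimensional embedding per movie (column of A)

A_approx = U_k @ V_k    # product of two smaller matrices, close to A
print(A_approx.round(1))
print("user embeddings:\n", U_k.round(2))
print("movie embeddings:\n", V_k.T.round(2))
```

By the Eckart–Young theorem, this truncation is the best rank-k approximation in the Frobenius norm, with error equal to the square root of the sum of the squared discarded singular values.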

### 2.2 Neural Networks
Feed a data vector into a neural network; the output is a vector embedding.
The goal is to compress high-dimensional data into a lower-dimensional, more meaningful space without losing too much information.
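A minimal sketch of the idea, assuming NumPy: a single dense layer maps an 8-dimensional vector to a 3-dimensional embedding. The sizes are hypothetical and the weights are random and untrained, so this only shows the shapes involved; a real encoder learns `W` and `b`, for example as the first half of an autoencoder.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

input_dim, embedding_dim = 8, 3  # hypothetical sizes: compress 8 features to 3

# Untrained weights, for illustration only.
W = rng.normal(size=(embedding_dim, input_dim))
b = np.zeros(embedding_dim)

def encode(x):
    """One dense layer: a linear map followed by a tanh non-linearity."""
    return np.tanh(W @ x + b)

x = rng.normal(size=input_dim)  # a hypothetical 8-dimensional data vector
z = encode(x)
print(z.shape)  # (3,)
```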

## 3. Dimensionality Reduction
A matrix M transforms a vector space, but some nonzero vectors never leave the line they span: M only scales them. Those vectors are called **eigenvectors**, and the scaling factor is called the **eigenvalue**. In symbols, $Mv = \lambda v$ for an eigenvector $v$ with eigenvalue $\lambda$.

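**Eigenvectors** help in data science and ML by encoding valuable information. The eigenvectors of a dataset's covariance matrix are its **principal components**: they point in the directions of greatest variance, so projecting the data onto the top few of them reduces its dimensionality while keeping most of the variation.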
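A minimal PCA sketch with NumPy, assuming NumPy is installed. The dataset is synthetic (3-D points stretched so they lie close to a plane), and `pca` is a hypothetical helper, not a library function.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    Principal components are eigenvectors of the covariance matrix of the
    centered data, ordered by eigenvalue (the variance along that direction).
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)           # features x features
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # symmetric input; ascending order
    order = np.argsort(eigenvalues)[::-1]            # largest variance first
    components = eigenvectors[:, order[:k]]          # columns = top-k eigenvectors
    return X_centered @ components, eigenvalues[order]

# Synthetic 3-D points that lie close to a 2-D plane (tiny spread along z).
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3)) * np.array([5.0, 2.0, 0.1])

X_reduced, variances = pca(X, k=2)
print(X_reduced.shape)  # (100, 2)
print(variances.round(2))
```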

Check out [3Blue1Brown](https://www.youtube.com/watch?v=PFDu9oVAE-g&ab_channel=3Blue1Brown) on eigenvectors and eigenvalues.

Given the matrix $M =
\begin{bmatrix}
  3 & 1 \\
  0 & 2 \\
\end{bmatrix}$ and the eigenvector $v =
\begin{bmatrix}
  -1 \\
  1 \\
\end{bmatrix}$, the eigenvalue is 2, as shown below:

<h3 style="text-align:center">
$
\begin{bmatrix}
  3 & 1 \\
  0 & 2 \\
\end{bmatrix}
\begin{bmatrix}
  -1 \\
  1 \\
\end{bmatrix}
=
\begin{bmatrix}
  -2 \\
  2 \\
\end{bmatrix}
= 2 \begin{bmatrix}
  -1 \\
  1 \\
\end{bmatrix}
$
</h3>
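A minimal check of this example with NumPy, assuming NumPy is installed. It first verifies $Mv = 2v$ directly, then lets `np.linalg.eig` find all eigenvalues. For an upper-triangular matrix like this one, the eigenvalues are its diagonal entries, 3 and 2.

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [0.0, 2.0]])
v = np.array([-1.0, 1.0])

# Check the definition directly: M v should equal lambda * v with lambda = 2.
print(M @ v)                      # [-2.  2.]
print(np.allclose(M @ v, 2 * v))  # True

# NumPy returns all eigenvalues and unit-length eigenvectors (as columns).
eigenvalues, eigenvectors = np.linalg.eig(M)
print(np.sort(eigenvalues))  # [2. 3.]
```

The eigenvector NumPy reports for eigenvalue 2 has length 1, so it is a scaled copy of $v$ (possibly with the opposite sign) rather than $v$ itself.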

