In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.datasets import load_breast_cancer
from mpl_toolkits.mplot3d import Axes3D
from IPython.display import display
from scipy import stats
from sklearn.datasets import fetch_olivetti_faces
from PIL import Image

# plt global style
plt.style.use('bmh')

In [None]:
# Functions we will use later

# Mean function built on DataFrame.sum (column-wise): sum of the values divided by their count
def mean(x):
    return x.sum(axis = 0, numeric_only = True) / len(x)

# Standard deviation function (sample version, divides by n - 1)
def std(x):
    variance = ((x - mean(x))**2).sum(numeric_only = True) / (len(x) - 1)
    std_variation = np.sqrt(variance)
    return std_variation

# Standardize data by subtracting the mean and dividing by std
def standartize_data(x):
    return (x - mean(x)) / std(x)


# Function that calculates the covariance matrix of a given dataset
# Note: it assumes the columns of x are already centered (mean 0), e.g. after standardizing
def cov(x):
    return (x.T @ x)/(x.shape[0]-1)

# Function that gets only the numeric data from df
def numeric_only_iris(x):
    '''
    The function takes only the numeric data
    Acts just like "numeric_only=" from pd. 
    '''
    return x.select_dtypes(include = np.number)

# PCA. Theory, uses and implementation.

***

###### Work by: Momchil Georgiev

###### Personal email: m.georgieff.public@gmail.com

###### *Every work cited in this project has been acquired legally through public domains such as university websites, online publications and personal/company blogs.*

***

## Contents:

#### 1. Project Motive
#### 2. PCA and the theory behind it:

    2.1. Standardize the data

        2.1.1 Calculating the mean

        2.1.2 Calculating the variance

        2.1.3 Calculating standard deviation 

    2.2 Eigenvectors and eigenvalues of the covariance matrix

    
#### 3. Some other examples of PCA:

    3.1 Breast cancer dataset
    3.2 Eigenfaces

#### 4. Conclusion
    
#### 5. References

#### 6. Bibliography
 

## 1. Project motive
During my course in data science I had the pleasure to work on a short project about PCA. I wanted to share what I have done and perhaps "shine a little light" on the subject while giving my own spin. 

This notebook is mainly designed for people who have **just started** studying programming and data science. I have tried to explain everything as simply as I can, and if it gets technical there are always sources to read that explain it in more detail.

With that out of the way. Enjoy!

## 2. PCA and the theory behind it

<div style = "text-align:center">
    <img src = "./Media/ezgif-3-b43a62758a.gif" width="20%" height="20%">
</div>

***

Principal component analysis (PCA), put simply, is an algorithm used for dimensionality reduction of data. It works by projecting the n-dimensional data onto fewer dimensions, say from 10D to 3D, while trying to preserve most of the information. It is important to mention that *information* in this sense means the variance in the data, i.e. how much the values spread out.

> The way I understand it, information is not as easy to define as it may seem at first. One way to think of information is variance: if you have a table with five athletes and their average speed at different elevations (0, 100 and 500m) and every single one of them has the same speed of 13 km/h, it doesn't say much. But if the speeds are different you start to see how every one of them does at those elevations. 

PCA is used a lot in statistics, medicine and chemistry for visualization, data optimization and cost reduction when doing sampling so one can guess it is very useful. Now let's see how it is made.

For starters we need to mention that there are a lot of ways to implement PCA (singular value decomposition, kernel PCA, sparse PCA and many more)[<sup>\[1\]</sup>](#fn1). In this project we will go with the classic route: eigendecomposition of the covariance matrix, which is closely related to singular value decomposition (SVD for short). This means we have to:

1. Standardize the data and find the covariance matrix
2. Find the eigenvalues and eigenvectors of the cov. matrix
3. Project the variables onto the principal components

### 2.1 Standardize the data
The first thing we have to do is standardize the data, since PCA is very sensitive to differences in scale between the variables. While we are here we will center the mean at 0 for ease of use in the future. To begin we need to:

1. Calculate the mean of the data
2. Calculate the variance
3. Calculate the standard deviation

For the purposes of our explanation we will use the standard `iris` dataset. Let's import it, name it $D$ and take a look at it.

In [None]:
data = pd.read_csv("https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv")

# Keep only the numeric columns in a new DataFrame so we can manipulate it freely without messing up the original
D = numeric_only_iris(data)
print(f"Iris dataset shape: {D.shape}")
D.head()

We can see that $D$ has 4 dimensions (columns), which we want to reduce to 2 without losing much information. So, as mentioned, to begin standardizing the data we first have to get the `mean`.

#### 2.1.1 Calculating the mean

The mean is the center of our data. The formula for it is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$$

Which in our case is: 

In [None]:
mean(D)

#### 2.1.2 Calculating the variance
Variance measures how far, on average, the values are from the mean (using the squared distance). The formula for it is:
$$S^{2}(x) = \frac{1}{n - 1} \sum_{i=1}^{n}(x_i - \bar{x})^2$$

While we are here we can see the formula for covariance. Covariance tells how the values of two variables change in relation to each other. The covariance *matrix* tells us the covariance of every variable with every other. Formula:

$$\text{cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

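Covariance itself is not bounded: its size depends on the units of $x$ and $y$. For **standardized** variables, though, it equals the correlation, so $\text{cov}(x, y) \in [-1, 1]$. Generally the sign tells us if $x$ and $y$ rise together (if it's >0), if one rises and the other falls (<0) or if they don't jointly vary (=0).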
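To make the point about units concrete, here is a small sketch with made-up data (the variable names `height_cm` and `weight_g` are just for illustration, they are not part of the iris dataset). It compares the covariance of two raw variables with the covariance of the same variables after standardization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up, positively related variables on very different scales
height_cm = rng.normal(170, 10, size=200)
weight_g = 400 * height_cm + rng.normal(0, 3000, size=200)

# Covariance of the raw variables: the sign is meaningful,
# but the magnitude depends on the units
raw_cov = np.cov(height_cm, weight_g)[0, 1]

# Standardize a variable (ddof=1 matches the n - 1 in the formulas)
def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

# Covariance of the standardized variables vs. the Pearson correlation
std_cov = np.cov(zscore(height_cm), zscore(weight_g))[0, 1]
corr = np.corrcoef(height_cm, weight_g)[0, 1]

print(f"raw covariance:          {raw_cov:.1f}")
print(f"standardized covariance: {std_cov:.4f}")
print(f"Pearson correlation:     {corr:.4f}")
```

The last two numbers should agree, which is why the covariance matrix of standardized data can be read like a correlation matrix.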

> We will see the value of the covariance matrix in the next point.


#### 2.1.3 Calculating standard deviation
Finally for std deviation we have:
$$S(x) = \sqrt{S^{2}(x)}$$

$S$ tells us how far, on average, the values lie from the mean. Unlike the individual deviations $x_i - \bar{x}$, which can be >0 or <0, $S(x)$ is never negative. For our dataset the standard deviation is:

In [None]:
std(D)

> The difference between $S^2$ and $S$ is that $S^2$ is the average squared distance from the mean and $S$ is, of course, the square root of that, so it is in the same units as the data.

Now we will use `zscoring` to standardize everything:
$$z = \frac{{x - \bar{x}}}{{S(x)}}$$
Let's see the first 5 results.

In [None]:
D = standartize_data(D)
D.head()
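A quick sanity check for standardization is that every column ends up with mean 0 and standard deviation 1. The sketch below shows this on a tiny made-up table (the `toy` DataFrame and its column names are hypothetical) and compares the result with `scipy.stats.zscore`, assuming the sample standard deviation (`ddof=1`) is used in both places.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Small made-up table with columns on different scales
toy = pd.DataFrame({
    "length_cm": [5.1, 4.9, 6.3, 5.8, 7.1],
    "weight_g": [120.0, 95.0, 210.0, 180.0, 260.0],
})

# z = (x - mean) / std, using the sample standard deviation (ddof=1)
toy_z = (toy - toy.mean()) / toy.std(ddof=1)

# Every column should now have mean 0 and standard deviation 1
print(toy_z.mean().round(10))
print(toy_z.std(ddof=1).round(10))

# scipy's zscore gives the same values when told to use ddof=1
scipy_z = stats.zscore(toy.to_numpy(), axis=0, ddof=1)
print(np.allclose(toy_z.to_numpy(), scipy_z))
```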

_Great!_ Now we can get the covariance matrix of the **standardized** variables.

In [None]:
Z = cov(D)
Z

### 2.2 Eigenvalues and eigenvectors of the covariance matrix

<div style="text-align:center">
    <img src="./Media/bench1.jpg" width="40%" height="20%">
</div>

***

Let's take a step back and put aside the calculations we did for a second so we can explain what these *eigen* thingies are.

An eigenvector ($v$) is a vector that doesn't change its *span* (the line it lies on) when a transformation, let's say the matrix $A$, is applied to it. The eigenvalue ($\lambda$) is a scalar that tells us how much the eigenvector has been scaled. We can define it as:

$$A v = \lambda v$$

This formula is really cool because we see that the product of $v$ and the transformation matrix $A$ equals the *same* vector, but **scaled** by $\lambda$. I think this animation shows it nicely (the red and green vectors don't change their span as the yellow one does, these are the eigenvectors):

<div style="text-align:center">
    <img src="./Media/eigen.gif" width="40%" height="20%">
</div>

> Generally when we apply a transformation matrix to a vector we get a new vector that points somewhere else: $A v = v'$. But here the vector stays on the same line and only gets longer or shorter :o
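The definition $A v = \lambda v$ is easy to check numerically. Below is a minimal sketch with a small symmetric matrix picked just for illustration (it is not taken from our dataset). It uses `np.linalg.eigh`, which is meant for symmetric matrices.

```python
import numpy as np

# A small symmetric matrix chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns the eigenvalues in ascending order and
# the eigenvectors as the COLUMNS of the second output
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # A @ v and lam * v should be the same vector
    print(f"lambda = {lam:.1f}, A @ v = {A @ v}, lambda * v = {lam * v}")

# An ordinary vector, by contrast, gets knocked off its line
w = np.array([1.0, 0.0])
print(f"w = {w}, A @ w = {A @ w}")
```

For this particular matrix the eigenvalues are 1 and 3, so one eigenvector is left as it is and the other is stretched to three times its length.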

Now that we know what eigenvectors and eigenvalues are, let's put them to use. It turns out that when we calculate the eigenvectors of a given covariance matrix, the vector that has the **largest eigenvalue** tells us the direction with the biggest variance (information)[<sup>\[2\]</sup>](#fn2)! The eigenvector with the second largest eigenvalue is *orthogonal* to the first and points in the direction with the next biggest variance.

Now is also a good time to mention that in PCA, the eigenvectors of the covariance matrix of the dataset are also called *principal components* (PCs), and each eigenvalue is the variance along its PC.

Let's see this with our data. We will:
1. Calculate the eigenvalues and eigenvectors with `np.linalg.eig`
2. Sort them from highest to lowest eigenvalue
3. Create separate variables for the vectors and values

In [None]:
# Calculate eigen- values and vectors of the cov. matrix Z
eigen_values, eigen_vectors = np.linalg.eig(Z)

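# An eigenvector is only defined up to its sign, so we pick a convention:
# flip every eigenvector (column) so that its largest entry in absolute value is positive
max_abs_idx = np.argmax(np.abs(eigen_vectors), axis=0)
signs = np.sign(eigen_vectors[max_abs_idx, range(eigen_vectors.shape[0])])
eigen_vectors = eigen_vectors*signs[np.newaxis,:]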

In [None]:
eigen_vectors = eigen_vectors.T

# Create a list of tuples with every eigenvector and its eigenvalue
eigen_pairs = [(np.abs(eigen_values[i]), eigen_vectors[i,:]) for i in range(len(eigen_values))]

# Sort the pairs by eigenvalue, largest first, so the PCs are ordered by explained variance
eigen_pairs.sort(key=lambda x: x[0], reverse=True)

# Set new variables for the sorted eigen values and vectors for future use
eigen_vals_sorted = np.array([x[0] for x in eigen_pairs])
eigen_vecs_sorted = np.array([x[1] for x in eigen_pairs])

Congrats. Now that we have the eigen- values and vectors of $\text{cov}(D)$ we have everything to start the dimensionality reduction process.
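As a side note, the sorting step can be written more compactly. The sketch below is an alternative, not the code used above: it assumes the covariance matrix is symmetric (which it always is) and uses `np.linalg.eigh` together with `np.argsort`. The data is random and the names `vals_sorted` and `vecs_sorted` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random stand-in data: 100 samples, 4 features, standardized column-wise
data = rng.normal(size=(100, 4))
data = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Covariance matrix of the standardized data (columns are variables)
C = np.cov(data, rowvar=False)

# eigh is designed for symmetric matrices; eigenvalues come back ascending
vals, vecs = np.linalg.eigh(C)

# Reorder from largest to smallest eigenvalue
order = np.argsort(vals)[::-1]
vals_sorted = vals[order]
vecs_sorted = vecs[:, order].T  # transpose so that every ROW is an eigenvector

print(vals_sorted)
print(vecs_sorted.shape)
```

Because the data is standardized, the eigenvalues add up to the number of features (the trace of the covariance matrix).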

For starters we need to transform our dataset $D_{m\times k}$ into a matrix $X_{m\times n}$ where $n < k$. We also need a projection matrix $P_{n\times k}$ in which every row is one of the top $n$ eigenvectors of the covariance matrix of $D$. We then multiply $D$ by the transpose of $P$. The expression looks like this:

$$X_{m\times n} = D_{m\times k} P_{n\times k}^T$$

> Note: Here projection and transformation look pretty similar. In short, projection is a transformation in which we reduce the dimension. Think about it - when we put a **3D** ball in front of a lamp the ball projects a **2D** image, a *shadow*. For more information check these publications (they are also in the bibliography): 
- Shlens, Jonathon. A Tutorial on Principal Component Analysis, Google Research, Mountain View, CA 94043. Link: https://arxiv.org/pdf/1404.1100.pdf
- Xu, Yang. 10-701 Machine Learning (Spring 2012) - Principal Component Analysis. Link: https://www.cs.cmu.edu/~tom/10601_fall2012/slides/pca.pdf

Aaaand this should be it. Let's see it in action! 

In [None]:
# Projection
# R - num of dimensions
R = 2
P = eigen_vecs_sorted[:R]
X = D @ P.T

features = len(P.T)

print(f"Original shape matrix D: {D.shape}")
print(f"Our new matrix: {X.shape}")
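If you want to double check a manual projection like this one, you can compare it with `sklearn`. The sketch below is self-contained and makes two assumptions: it loads iris from `sklearn.datasets.load_iris` instead of the CSV used above, and it recomputes the eigenvectors with `np.linalg.eigh`. Since eigenvectors are only defined up to their sign, the comparison is done on absolute values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: sklearn's copy of iris stands in for the CSV file
X_raw = load_iris().data
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# Manual route: covariance matrix -> eigenvectors -> projection
C = (X_std.T @ X_std) / (X_std.shape[0] - 1)
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
P = vecs[:, order[:2]].T          # top 2 eigenvectors as rows
X_manual = X_std @ P.T

# sklearn route on the same standardized data
pca_check = PCA(n_components=2)
X_sklearn = pca_check.fit_transform(X_std)

# Compare the projections up to the sign of each component
same_up_to_sign = np.allclose(np.abs(X_manual), np.abs(X_sklearn), atol=1e-6)

# Compare the explained variance ratios
manual_ratio = vals[order[:2]] / vals.sum()
same_ratio = np.allclose(manual_ratio, pca_check.explained_variance_ratio_)

print(f"Projections match up to sign: {same_up_to_sign}")
print(f"Explained variance ratios match: {same_ratio}")
```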

*Awesome.* Now for the fun part - plotting! Let's see how much of the variance each PC explains by dividing every eigenvalue by the sum of all eigenvalues.

In [None]:
# Calculate the variance that is explained by every single PC
total_eigen_values = sum(eigen_vals_sorted)
variance_per_pc = []

for i in eigen_vals_sorted:
    k = i / total_eigen_values
    variance_per_pc.append(k)

# Print the variance per PC
print("Variance per principal component:")

for i in range(len(eigen_vecs_sorted)):   
    print(f"PC{i + 1}: {variance_per_pc[i] * 100:.2f}%")

# Set x to the PC number and y to the cumulative fraction of explained variance
x = np.arange(1, features + 1)
y = np.cumsum(variance_per_pc)

# Make the plot more clear to read and set labels and colors
plt.xticks(np.arange(1, features+1))
plt.xlabel("PC #")
plt.ylabel("Explained variance (fraction)")
plt.title("Cumulative sum of the explained variance")

# I hardcoded the colors cause I wanted these exact ones
plt.bar(x, y, color = ["#FFC300", 
                       "#FF5733", 
                       "#C70039", 
                       "#900C3F"])
plt.show()

Now let's see the projected variables onto a scatter plot so we can visualize the data. We will replace the names of the plants with numbers to see which species is where.

In [None]:
# Create a copy of the data
data_copy = data.copy()

# Set the numbers for the species
species_mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}

# Replace the species names with numbers
data_copy["species"] = data_copy["species"].replace(species_mapping)

# Reverse the species mapping dictionary to get species names from numbers
species_names = {v: k for k, v in species_mapping.items()}

# Plot with legend. Here I use sns cause I find it 10x easier to plot the variables with their representing colors
plt.figure(figsize=(6, 4))
sns.scatterplot(x=X[0], y=X[1], hue=data_copy["species"], palette="Set1")

plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title(f"2 components, captures {y[1]:.2%} of total variation")

# Get the handles and labels of the legend
handles, labels = plt.gca().get_legend_handles_labels()

# Modify the legend labels using species names
legend_labels = [species_names[int(label)] for label in labels]

plt.legend(handles, legend_labels, title="Species")
plt.show()

Now we can visualize the 4D data in 2D. This is useful when we want to see what the data tells us. In this case we can see that `setosa` differs a lot from the two other species. This wouldn't be so clearly seen just from reading the raw data. 

## 3. Some other examples of PCA
Now we will look at more interesting cases where we can extract some meaningful information!
Here is the list of the datasets we will look at:
- Breast cancer dataset
- Eigenfaces

> From now on I will use sklearn for PCA because I find it more convenient.

### 3.1 Breast cancer dataset

This is a dataset from the `sklearn` library. We will name it $D_{\text{cancer}}$ so as not to confuse it with the first example when we write our code. The set contains 30 features and 569 samples from different people. Let's take a quick look.

In [None]:
cancer_ds = load_breast_cancer()
D_cancer = pd.DataFrame(data = cancer_ds.data, columns = cancer_ds.feature_names)
D_cancer.head()

Let's set our goal to transform $D_{\text{cancer}}$ from 30D to 3D.

In [None]:
# Standardize the data
D_cancer_std = StandardScaler().fit_transform(D_cancer)

# Put PCA into a variable for convenience
pca = PCA(n_components = 3).fit(D_cancer_std)

# Perform PCA
D_cancer_pca = pca.transform(D_cancer_std)

# Calculate explained variance
cum_sum_exp_ratio = np.cumsum(pca.explained_variance_ratio_ * 100)

print(f"Original shape of D_cancer: {D_cancer.shape}")
print(f"Shape of the new dataset: {D_cancer_pca.shape}")

From 30 to 3, nice. Now we will plot the results.

In [None]:
x1 = np.arange(1, D_cancer_pca.shape[1] + 1)
y1 = cum_sum_exp_ratio

print("Variance per principal component:")
for i in range(len(pca.explained_variance_ratio_)):
    print(f"PC{i + 1}: {pca.explained_variance_ratio_[i] * 100:.2f}%")


plt.bar(x1, y1, color = 'orange')
plt.xlabel('PC #')
plt.ylabel('% Explained variance')
plt.title("Cumulative sum of the explained variance")
plt.show()


From the graph we see that with 3 components we can only explain ~70% of the information. If we want, we can increase the number of dimensions to get above 90% explained variance, but the current result will suffice.
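If you are curious how many components it would take to get above 90%, here is a minimal sketch of two ways to find out. It reloads the dataset so that it runs on its own, and the names `pca_full`, `pca_90` and `n_needed` are made up for this example. The second option relies on `sklearn` accepting a float between 0 and 1 for `n_components`; I pass `svd_solver="full"` explicitly to be on the safe side.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the 30 features, same as before
X_cancer = StandardScaler().fit_transform(load_breast_cancer().data)

# Option 1: fit all components and find where the cumulative sum first reaches 0.90
pca_full = PCA().fit(X_cancer)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_needed = int(np.argmax(cumulative >= 0.90)) + 1

# Option 2: let sklearn choose the number of components for a target variance
pca_90 = PCA(n_components=0.90, svd_solver="full").fit(X_cancer)

print(f"Components needed (manual count): {n_needed}")
print(f"Components chosen by sklearn:     {pca_90.n_components_}")
print(f"Variance explained:               {cumulative[n_needed - 1]:.2%}")
```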

Now let's see the 2D plot. It is useful to know that the object returned by `load_breast_cancer()` has a `.target_names` attribute that tells us which class label stands for malignant and which for benign.

In [None]:
# Get the unique class labels (0 = malignant, 1 = benign)
unique_features = np.unique(cancer_ds.target)

# Set up a color map for the features
colors = plt.cm.Set1(np.linspace(0, 1, len(unique_features)))

# Create the scatter plot
plt.figure(figsize=[6,4])

for feature, color in zip(unique_features, colors):
    indices = np.where(cancer_ds.target == feature)
    plt.scatter(D_cancer_pca[indices, 0], D_cancer_pca[indices, 1], color=color)

# Add legend and plot the graph
legend_labels = cancer_ds.target_names
plt.legend(legend_labels)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("2D Projection of the 30D dataset")
plt.show()

We can see that the benign tumors are clustered together. This suggests that the benign tumors don't vary that much and tend to have similar shape and size, since their points aren't as spread out as the malignant ones. The malignant tumors show much more variance in size and shape. Let's see the 3D graph. 

In [None]:
# Create the scatter plot
fig = plt.figure(figsize = (7, 7))
ax = fig.add_subplot(projection='3d')

for feature, color in zip(unique_features, colors):
    indices = np.where(cancer_ds.target == feature)
    ax.scatter(D_cancer_pca[indices, 0], D_cancer_pca[indices, 1], D_cancer_pca[indices, 2], color=color)

# Add legend
legend_labels = cancer_ds.target_names
ax.legend(legend_labels)
plt.title("3D Projection of the 30D dataset")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")

plt.show()


The 3D graph gives us a better view of the clustering, but something else might catch the eye. There are some really distant points and we can check which of those are outliers. 

There are many tools for detecting outliers. A simple one is the already mentioned `zscore`, which measures how many standard deviations a value is from the mean. Let's plot the outliers using this method.

> As mentioned, there are many really cool ways to detect an outlier. Another one is the *Mahalanobis* distance[<sup>\[4\]</sup>](#fn4)[<sup>\[5\]</sup>](#fn5), which measures the distance from a point $a$ to the *distribution* of a given set, taking the correlations between the variables into account.
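For the curious, here is a rough sketch of how the Mahalanobis distance can flag an outlier that a per-column z-score misses. Everything in it is an assumption made for illustration: the data is random, one odd point is planted by hand, and the cutoff (the 97.5% quantile of a chi-squared distribution) is one common choice, not the only one.

```python
import numpy as np
from scipy.stats import chi2, zscore

rng = np.random.default_rng(42)

# Made-up 2D data with strongly correlated columns
cov_true = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
points = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=300)

# Plant one point that goes against the correlation:
# not extreme on either axis alone, but odd as a pair
points = np.vstack([points, [2.0, -2.0]])

# Squared Mahalanobis distance: (x - mu)^T  Sigma^-1  (x - mu) for every row
mu = points.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(points, rowvar=False))
diff = points - mu
d_squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Assumed cutoff: 97.5% quantile of chi-squared with one degree of freedom per column
cutoff = chi2.ppf(0.975, df=points.shape[1])
is_outlier = d_squared > cutoff

# Per-column z-scores of the planted point, for comparison
planted_z = zscore(points, axis=0)[-1]

print(f"Planted point flagged by Mahalanobis: {is_outlier[-1]}")
print(f"Planted point z-scores per column:    {planted_z.round(2)}")
```

The planted point stays under the usual z-score threshold of 3 on both axes, yet its Mahalanobis distance is far above the cutoff because it breaks the pattern the rest of the data follows.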


We will begin by creating a function `outlierz()` (hehe) that takes a matrix X as input and computes the z-score of every column. `np.where` finds the row indexes where the absolute z-score in at least one column exceeds `thresh`, and the function returns those rows. `thresh` is set manually (usually it equals 3).

In [None]:
def outlierz(X, thresh = 3):
    '''
    Takes X as input. 
    |zscore| > thresh (usually 3) in any column generally means outlier
    returns an array with all of the outlier rows
    '''
    z_scores = stats.zscore(X, axis = 0)
    # Use the absolute value so points far below the mean are caught too
    outliers_indexies = np.where(np.abs(z_scores) > thresh) 
    
    # np.unique keeps each row once even if several columns exceed thresh
    return X[np.unique(outliers_indexies[0])]

Let's try it on a 2D plot.

In [None]:
# Create the scatter plot
plt.figure(figsize=[6,4])

for feature, color in zip(unique_features, colors):
    indices = np.where(cancer_ds.target == feature)
    plt.scatter(D_cancer_pca[indices, 0], D_cancer_pca[indices, 1], color=color)
    

# Get and plot the outliers
outliers = outlierz(D_cancer_pca, 3)
plt.scatter(outliers[:, 0], outliers[:, 1], color="lime", marker='x', label='Outliers')

# Add legend and plot the graph
legend_labels = np.append(cancer_ds.target_names, 'Outliers')
plt.legend(legend_labels)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("2D Projection of the 30D dataset with Outliers")
plt.show()

*Hmm*. These outliers are strange. Many of them we weren't expecting to be so close to the center. What's up with that? 

Maybe a 3D plot will give us a better picture.

In [None]:
# Create the scatter plot
fig = plt.figure(figsize=[6, 6])
ax = fig.add_subplot(111, projection='3d')

for feature, color in zip(unique_features, colors):
    indices = np.where(cancer_ds.target == feature)
    ax.scatter(D_cancer_pca[indices, 0], D_cancer_pca[indices, 1], D_cancer_pca[indices, 2], color=color)

# Get and plot the outliers
outliers = outlierz(D_cancer_pca, 3)
ax.scatter(outliers[:, 0], outliers[:, 1], outliers[:, 2], color="lime", marker='x', label='Outliers')

# Add legend and plot the graph
legend_labels = np.append(cancer_ds.target_names, 'Outliers')
ax.legend(legend_labels)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")
ax.set_title("3D Projection of the 30D dataset with Outliers")

plt.show()

*Aha!* It looks like those outliers were behind the cluster of normal data. `outlierz()` checks all three PCs, so a point can sit near the center in the PC1/PC2 view and still be far out along PC3.

> Side note: As I was looking into this I noticed that as I increase the value of `n_components` (how many eigenvectors we keep), more of the points become outliers. At least when using zscore. Further testing is needed to see if other methods show the same behavior. For now I think it is simply because of the loss of information when we reduce the dimensions, since the first 2-3 PCs often capture ~90% of the variance (of course it depends on the data; as we saw in this example, the explained variance was less than 90%). It could be that the remaining percentages are the reason some points get newly categorized as outliers, but as mentioned - further tests are needed.

### 3.2 Eigenfaces
PCA can also be used to form a basis for face images (the *eigenfaces*), which is how the method is used for face recognition, for example to classify images of celebrities. The same basis can be used to reconstruct the original faces, which looks really cool. Let's see how we can do that.

We will use the tried-and-true Olivetti dataset. The goal is to reconstruct the original faces from the basis. Along the way we will see the "mean face", the face we get when we average every image in the dataset.

The workflow is almost exactly the same as with the other examples. The only difference is that to reconstruct a face we use `.inverse_transform`, which maps the reduced data back to the original pixel space (approximately, since the dropped components are lost).
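To take some of the magic out of `.inverse_transform`: with the default settings (no whitening) it multiplies the reduced data by the principal components and adds the mean back. The sketch below checks this on random stand-in data; the names `samples`, `pca_demo` and `scores` are made up for this example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Random stand-in data: 50 samples with 6 features
samples = rng.normal(size=(50, 6))

# Keep 3 of the 6 possible components
pca_demo = PCA(n_components=3).fit(samples)
scores = pca_demo.transform(samples)

# Reconstruction with sklearn
recon_sklearn = pca_demo.inverse_transform(scores)

# The same thing by hand: project back with the components and re-add the mean
recon_manual = scores @ pca_demo.components_ + pca_demo.mean_

print(np.allclose(recon_sklearn, recon_manual))

# The reconstruction is lossy because 3 components were dropped
print(f"Mean squared error: {np.mean((samples - recon_sklearn) ** 2):.4f}")
```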

In [None]:
# Load the Olivetti faces dataset
dataset_faces = fetch_olivetti_faces()
faces = dataset_faces.data
target = dataset_faces.target
image_shape_faces = dataset_faces.images[0].shape

# Perform PCA on the faces dataset
pca_faces = PCA(n_components=100)  
pca_result_faces = pca_faces.fit_transform(faces)

# Reconstruct the faces using the principal components
reconstructed_faces = pca_faces.inverse_transform(pca_result_faces)

plt.imshow(pca_faces.mean_.reshape(image_shape_faces), cmap="gray")

plt.title("Mean face")
plt.show()

Looks... like it's made of wax. Anyways let's see the others.

In [None]:
# Plot the original and reconstructed faces
# Number of faces to display
n_faces = 2  
fig, axes = plt.subplots(n_faces, 2, figsize=(5, 5))
for i in range(n_faces):
    axes[i, 0].imshow(faces[i].reshape(image_shape_faces), cmap='gray')
    axes[i, 0].set_title('Original Face')
    axes[i, 0].axis('off')

    axes[i, 1].imshow(reconstructed_faces[i].reshape(image_shape_faces), cmap='gray')
    axes[i, 1].set_title('Reconstructed Face')
    axes[i, 1].axis('off')
    

plt.tight_layout()
plt.show()

*It works!* We can get better results when we increase `n_components` if we want.
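Here is a small sketch of how the reconstruction error shrinks as `n_components` grows. To keep it quick and free of downloads it uses the small `load_digits` dataset from `sklearn` instead of the Olivetti faces (that swap is my assumption for this example, the idea is the same), and `svd_solver="full"` is set so the results are deterministic.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 images of 8x8 pixels, flattened to 64 features each
digits = load_digits().data

errors = {}
for k in [2, 8, 16, 32, 64]:
    # Fit PCA with k components, then reduce and reconstruct every image
    pca_k = PCA(n_components=k, svd_solver="full").fit(digits)
    recon = pca_k.inverse_transform(pca_k.transform(digits))
    # Mean squared error per pixel between original and reconstruction
    errors[k] = float(np.mean((digits - recon) ** 2))

for k, err in errors.items():
    print(f"n_components = {k:2d}: MSE = {err:.4f}")
```

With all 64 components nothing is thrown away, so the error drops to (numerically) zero.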

What about objects?

In [None]:
# Plot an image of a Babylonian clay tablet with mathematical formulas on it
image_path = "./Media/Babylonian-tablet.jpg"
image = np.array(Image.open(image_path).convert('L'))

# Plot
plt.imshow(image, cmap="gray")
plt.title("Original")
plt.show()

Now let's put it through PCA.

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(5, 5))

pc_values = [5, 15, 45, 135]  # Values for the number of principal components

for i, pc in enumerate(pc_values):
    # Put PCA into a variable for convenience
    pca = PCA(n_components=pc).fit(image)
    # Perform PCA
    image_pca = pca.transform(image)
    recon_image = pca.inverse_transform(image_pca)
    
    row = i // 2  # Determine the row index
    col = i % 2  # Determine the column index
    
    axes[row, col].imshow(recon_image, cmap='gray')
    axes[row, col].set_title(f'PC = {pc}')

plt.tight_layout()  # Adjust the spacing between subplots
plt.show()

Now let's create a function that puts an image through PCA and plots the result, so we can see some more examples just for fun.

In [None]:
def eigen_print(images, pc_values, s = 5):
    '''
    Takes a list of images and plots a 2x2 grid of PCA reconstructions for each:
    images - list of image paths to plot
    pc_values - list with up to 4 ints, the numbers of principal components to try
    s - size of plot, default = 5 but I suggest >= 8
    '''
    for m in images:
        # Get image
        image = np.array(Image.open(m).convert('L'))
        
        # Set plot
        fig, axes = plt.subplots(2, 2, figsize=(s, s))
        
        # Put image through PCA
        for i, pc in enumerate(pc_values):
            # Put PCA into a variable for convenience
            pca = PCA(n_components=pc).fit(image)
            # Perform PCA
            image_pca = pca.transform(image)
            recon_image = pca.inverse_transform(image_pca)

            row = i // 2  # Determine the row index
            col = i % 2  # Determine the column index
            
            axes[row, col].imshow(recon_image, cmap="gray")
            axes[row, col].set_title(f"PC = {pc}")
            
        # Plot
        fig.suptitle(m.split("/")[-1].split(".jpg")[0], fontsize=16)
        plt.tight_layout()  
        plt.show()

In [None]:
# List of .jpg directories. If for some reason they don't work they can be found
# in /Project/Media

img_data = ["./Media/Babylonian-tablet.jpg",
           "./Media/bench1.jpg",
           "./Media/banana.jpg",
           "./Media/fourier.jpg",
           "./Media/mLisa.jpg"]

pc_values = [1, 5, 15, 135]

In [None]:
eigen_print(img_data, pc_values, 5)

## 4. Conclusion

That's it folks! I am fully aware that the work in this project is nothing new (some might even say bloated), but for me the process was really fun and I enjoyed understanding and explaining the stuff I have learned. I believe tools such as ML (and AI), the category PCA falls into, can be extremely helpful to people, not only for compressing images and better visualization of data, but to actually get a better view of the world piece by piece. My plan for the future is to get an even deeper understanding of the matter at hand and use it for cool, meaningful things (and maybe upload this somewhere to help other hopeless data science students).

## 5. References:
    
   <span id="fn1"> 1. Types of PCA. AI Aspirant. Link: https://aiaspirant.com/types-of-pca/#:~:text=Types%20of%20PCA%20%7C%20Kernel%20PCA,PCA%20%7C%20Incremental%20PCA%20in%20Python. </span>
    
    
   <span id="fn2">2. Spruyt, Vincent. A geometric interpretation of the covariance matrix. Pg4. Link: https://users.cs.utah.edu/~tch/CS4640/resources/A%20geometric%20interpretation%20of%20the%20covariance%20matrix.pdf  </span>
   
   
   <span id="fn3"> 3. Wikipedia. Standard score. Link: https://en.wikipedia.org/wiki/Standard_score#</span>
   
   <span id="fn4"> 4. Prabhakaran, Selva. Mahalanobis Distance – Understanding the math with examples (python). Link: https://www.machinelearningplus.com/statistics/mahalanobis-distance/</span>
   
   <span id="fn5"> 5. Wikipedia. Mahalanobis distance. Link: https://en.wikipedia.org/wiki/Mahalanobis_distance</span>

## 6. Bibliography: 

1. Spruyt, Vincent. A geometric interpretation of the covariance matrix. Pg4. Link: https://users.cs.utah.edu/~tch/CS4640/resources/A%20geometric%20interpretation%20of%20the%20covariance%20matrix.pdf

2. Shlens, Jonathon. A Tutorial on Principal Component Analysis, Google Research, Mountain View, CA 94043. Link: https://arxiv.org/pdf/1404.1100.pdf

3. Xu, Yang. 10-701 Machine Learning (Spring 2012) - Principal Component Analysis. Link: https://www.cs.cmu.edu/~tom/10601_fall2012/slides/pca.pdf

4. Bagheri, Alireza. Principal Component Analysis (PCA) from Scratch. Link: https://bagheri365.github.io/blog/Principal-Component-Analysis-from-Scratch/#Step_5

5. Holland, Steven (5 December, 2019). PRINCIPAL COMPONENTS ANALYSIS (PCA), Department of Geology, University of Georgia, Athens, GA 30602-2501. Link: https://strata.uga.edu/8370/handouts/pcaTutorial.pdf

6. (Couldn't find author). 3.6.10.14. The eigenfaces example: chaining PCA and SVMs, scipy-lectures.org. Link: https://scipy-lectures.org/packages/scikit-learn/auto_examples/plot_eigenfaces.html

7. Jaadi, Zakaria. Step-by-Step Explanation of Principal Component Analysis (PCA). Link: https://builtin.com/data-science/step-step-explanation-principal-component-analysis