## Name: Matthias Bartolo                            Id: 0436103L

</br>

# <div style="text-align: left"><font size="+4"> A Guide to Principal Component Analysis </font></div>

</br>

## Table of Contents:
### (Click any of the following links, to be redirected to that section)
### [0. Introduction](#intro)
### [1. Loading the Data](#loadData)
### [2. Dataset Feature Selection](#featureSelection)
### [3. Dealing with Discrete Data](#discreteData)
### [4. Filtered Dataset Visualisations](#initialDataSetVisualisations)
### [5. Normalizing Data](#normalizingData)
### [6. Normalized Dataset Visualisations](#normalizedDataSetVisualisations)
### [7. Understanding PCA - SVD Approach](#pcaSVD)
### [8. Understanding PCA - Covariance Matrix Approach](#pcaCovariance)
### [9. Comparisons Between Approaches](#approachCompare)
### [10. Working out PCA on the Entire Dataset](#pcaEntire)
### [11. PCA Visualisations](#pcaVisualisations)
### [12. Conclusions and Limitations of PCA](#conclusion)
### [13. References](#references)

</br></br>
### Packages Install (Please uncomment if you are receiving any errors):

In [1]:
#!pip install plotly
#!pip install scikit-learn
#!pip install pandas
#!pip install gensim
#!pip install nltk

### Packages:

In [2]:
import os
import random
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import gensim
import collections
import nltk
from os.path import exists
from sklearn.decomposition import PCA
from numpy import linalg as LA
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

</br>

<a id='intro'></a>
### 0. Introduction
</br>

If you are a student who has recently completed a **Linear Algebra** or **AI Numerical Methods course**, you might be wondering how these seemingly abstract mathematical concepts relate to the real world. A noteworthy application of such concepts is the **Principal Component Analysis (PCA) Algorithm**, a widely used tool in fields such as Finance and Image Processing. Thus, the time and effort put into learning these mathematical ideas have not been in vain. In this notebook we will explore how the techniques you learned fit together to form the PCA Algorithm.</br>

**What is PCA?** </br>
Principal Component Analysis (PCA) is a widely used multivariate algorithm in **Machine Learning**. It is particularly helpful in the analysis of huge datasets, where it is used for **Dimensionality Reduction** and **Feature Selection**. By reducing the number of dimensions, PCA allows data scientists to load and work with large datasets on less powerful machines, which could not support the size of the full dataset. Additionally, PCA provides cleaner data visualisation, since it brings out the key features of the full dataset, which hold the largest degree of information. [1-2]</br>

Mathematically, PCA converts continuous data into a new coordinate system through a linear transformation. The new axes **(Principal Components)** are ordered by the amount of variance in the data that each of them captures, so the first principal component captures the most. This allows the leading principal components to be plotted on 2D or 3D graphs, thus presenting a satisfactory visualisation of a large dataset. Such a method does lose some information, since the discarded components are no longer represented; however, being able to visualise an n-dimensional dataset on a 3D plot is quite a benefit. PCA's main characteristic of decreasing the dimensionality of data whilst retaining salient information has made it one of the most widely used data analysis and machine learning techniques [1-2].</br></br>
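As a minimal sketch of this ordering, the snippet below uses made-up toy data (not one of the notebook's datasets) and NumPy's SVD to project a 3-feature dataset onto its principal components. The variable names and the data are illustrative assumptions only.

```python
import numpy as np

# Toy data (hypothetical): 200 samples, 3 features with very different spreads
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

# Centring the data, as PCA works on deviations from the mean
centred = toy - toy.mean(axis=0)

# SVD of the centred data: the rows of Vt are the principal components (the new axes)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)

# Variance captured along each principal component, and its share of the total
explained_variance = S**2 / (len(toy) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Projecting onto the first two components gives a 2D view of the 3D data
projected_2d = centred @ Vt[:2].T

print(explained_ratio)
```

The printed ratios decrease from the first component to the last, which is why keeping only the leading components retains most of the information.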


**Brief History of PCA:** </br>
- PCA has a long and illustrious history that goes back more than a century. The algorithm was pioneered by Karl Pearson, who introduced it in 1901 with the aim of undertaking data analysis and dimensionality reduction. The modern form of PCA was developed by Harold Hotelling in the 1930s, who led the way for the method to truly take shape. Hotelling was instrumental in formulating the concept of variance maximization, and the use of orthogonal projections to find the Principal Components. [1-2]
- Further improvements to the PCA algorithm were developed in the 1960s, in part due to the emergence of Singular Value Decomposition (SVD), which offered an alternate method for calculating the eigenvalues and eigenvectors necessary to perform the PCA algorithm. The growing adoption of PCA at that time was largely triggered by the need for dimensionality reduction and the widespread growth of computers. Consequently, the method gained a lot of popularity from the 1970s onwards, when data scientists and researchers fully comprehended the effectiveness of the technique in dealing with the enormous and complex datasets which were becoming more and more prevalent in industries such as banking, engineering, and medicine. [1-2]


</br></br>

**Citations in this Section:** </br>

[1] S. Mishra et al., "Multivariate Statistical Data Analysis-Principal Component Analysis," Int. J. Livest. Res., vol. 1, pp. 1-6, 2017. [Online]. Available: https://www.researchgate.net/publication/316652806_Principal_Component_Analysis. [Accessed: 18-Apr-2023].</br>


[2] D. Li and S. Liu, "4.2.3.1 Principal Component Analysis," in Water Quality Monitoring and Management: Basis, Technology and Case Studies, 1st ed., S. K. Gupta and R. Kumar, Eds. Amsterdam, Netherlands: Elsevier, 2019. [Online]. Available: https://www.sciencedirect.com/topics/agricultural-and-biological-sciences/principal-component-analysis. [Accessed: 18-Apr-2023].

</br>

</br></br>

<a id='loadData'></a>
### 1. Loading the Data
</br>

A key step before running the PCA Algorithm is the selection of a relevant dataset for the algorithm to analyse. This implementation gives students interacting with the notebook the choice of selecting any of the default datasets, and exploring how the PCA algorithm functions on them. Students are also given the option to load their preferred dataset. In addition, the default datasets have different attributes, so as to allow students to carry out different experiments, and to compare the results obtained when varying the datasets. **Additionally, students are highly encouraged, before commencing the Principal Component Analysis, to thoroughly analyse and understand the dataset's properties and qualities.** </br></br>

**The following are the default datasets (Obtained from [3-9]):**
1. **country_wise_latest.csv** - This dataset has a small Size, a large number of Features, and a few Discrete Columns.
2. **diabetes.csv** - This dataset has a small Size, a small number of Features, and no Discrete Columns.
3. **FIFA - 2014.csv** - This dataset has a small Size, a small number of Features, and one Discrete Column.
4. **IRIS.csv** - This dataset has a small Size, a small number of Features, and one Discrete Column.
5. **Salary_Dataset_with_Extra_Features.csv** - This dataset has a large Size, a small number of Features, and a reasonable number of Discrete Columns.
6. **spotify.csv** - This dataset has a large Size, a large number of Features, and a reasonable number of Discrete Columns.
7. **wine-quality-white-and-red.csv** - This dataset has a large Size, a large number of Features, and one Discrete Column.

</br>

**In the code cell below, Students are presented with a Menu, either to load a default dataset or a preferred dataset of their choice.**

</br></br>

**Citations in this Section:** </br>


[3] DEVAKUMAR K. P., "COVID-19 Dataset", Kaggle, 2020. [Online]. Available: https://www.kaggle.com/datasets/imdevskp/corona-virus-report?select=country_wise_latest.csv. [Accessed: 18-Apr-2023].</br>
 
[4] UCI MACHINE LEARNING, "Pima Indians Diabetes Database", Kaggle, 2016. [Online]. Available: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database. [Accessed: 18-Apr-2023].</br>

[5] S. BANERJEE, "FIFA - Football World Cup Dataset", Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/iamsouravbanerjee/fifa-football-world-cup-dataset?select=FIFA+-+2014.csv. [Accessed: 18-Apr-2023].</br>

[6] MATHNERD, "Iris Flower Dataset", Kaggle, 2018. [Online]. Available: https://www.kaggle.com/datasets/arshid/iris-flower-dataset. [Accessed: 18-Apr-2023].</br>

[7] S. BANERJEE, "Software Industry Salary Dataset - 2022", Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/iamsouravbanerjee/software-professional-salaries-2022. [Accessed: 18-Apr-2023].</br>

[8] R. Holbrook and A. Cook, "Principal Component Analysis, spotify.csv", Kaggle. [Online]. Available: https://www.kaggle.com/code/ryanholbrook/principal-component-analysis/data?select=spotify.csv. [Accessed: 18-Apr-2023].</br>

[9] RUTHGN, "Wine Quality Data Set (Red & White Wine)", Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/ruthgn/wine-quality-data-set-red-white-wine. [Accessed: 18-Apr-2023].</br>


In [4]:
#Giving the user the option to choose either a default dataset, or to enter his/her own dataset path
filename=None
path=""
#Looping until the user enters a valid choice
while True:
    #Displaying Menu
    print("\033[1m \nChoosing From the following Options: \033[0m \n1.Load a Default Dataset\n2.Choose a Requested Dataset")
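    try:
        choice=int(input())
    except ValueError:
        #Non-numeric input is treated as an invalid choice, and the Menu is displayed again
        continue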
    if(choice==1):
        #Showing a list of default datasets to the user, and awaiting valid user choice
        print("\033[1m \nChoose from the Following Default Datasets: \033[0m")
        validDataset=[]
        for dataset in os.listdir("Datasets"):
            validDataset.append(dataset)
            print(dataset)
        #Looping until the user enters a valid File Name
        while filename not in validDataset:
            print("\nPlease Input File Name:")
            filename=input()
        #Constructing File Path
        path="Datasets/"+filename
        break
        
    elif(choice==2):
        #Giving the user the option to load a preferred dataset
        print("\nPlease Input Requested File Path (Make sure that the file you wish to load is in the current directory, and is of the .csv type):")
        filename=input()
        path=filename
        #Error Checking whether dataset exists
        if(exists(path)):
            break
        else:
            print("\033[91m \n\nError: Requested File Not Found \033[0m")
            
#Loading the data from a specified path, and storing csv contents in a dataframe, with correct Error Handling
if(exists(path)):
    dataframe = pd.read_csv(path)
    #Error Checking whether dataframe has the relevant number of columns
    if(len(dataframe.columns)<3):
        print("\033[91m \n\nWarning: The Dataset has less than 3 features/columns, and would present Errors later on in forthcoming sections. Please make sure to load a dataset with at least 3 features/columns \033[0m")
else:
    print("\033[91m \n\nError: Requested File Not Found \033[0m")

Choosing From the following Options:
1.Load a Default Dataset
2.Choose a Requested Dataset
1

Choose from the Following Default Datasets:
country_wise_latest.csv
diabetes.csv
FIFA - 2014.csv
IRIS.csv
Salary_Dataset_with_Extra_Features.csv
spotify.csv
wine-quality-white-and-red.csv

Please Input File Name:
Salary_Dataset_with_Extra_Features.csv


</br>

### The following is the Requested Dataset loaded in a Pandas Dataframe:

In [None]:
display(dataframe)

</br></br>

<a id='featureSelection'></a>
### 2. Dataset Feature Selection
</br>

Another key step when performing a data analysis or a machine learning study, pertains to observing the type and number of different **Genes/Features**, which the dataset possesses. This step is highly critical, as sometimes processing a huge number of features in the dataset may cause memory allocation issues or prolong the processing time of algorithms.  

**Please note that if fewer than three columns are chosen, leading columns of the original dataset will be added to the filtered dataset until it has at least three columns. This is done to ensure that the filtered dataset has enough features for visualisation in the upcoming sections.**

</br>

### Displaying the number of Genes/Features in the Dataset:

In [None]:
print("The \033[1m",filename,"\033[0m dataset currently has \033[1m",len(dataframe.columns)," different Genes/Features.\033[0m")

</br> 

**In the code cell below, Students are presented with all the dataset Features/Columns one by one, and are given the option to continue processing with each feature, or to discard it.**

In [None]:
#Dataframe which will hold the selected features
filteredDataframe=dataframe
#Displaying Menu to the User, so that the user will choose which features to keep
print("\033[1mChoose whether to keep the following Genes/Features, to work with: \033[0m ")
for colNum,column in enumerate(dataframe.columns):
    print("Do you wish to keep: ",colNum+1,"\b.",column,"? \033[1m(Enter 'Y' to accept, and 'N' to decline)\033[0m")
    choice=input().strip().upper()
    if(choice!="Y"):
        filteredDataframe=filteredDataframe.drop(columns=[column])

#Adding leading columns from the original dataset until the filtered dataset has at least 3 columns (Error Checking)
for column in dataframe.columns:
    if(len(filteredDataframe.columns)>=3):
        break
    if(column not in filteredDataframe.columns):
        filteredDataframe.insert(len(filteredDataframe.columns),column,dataframe[column])

</br>

### The following is the Filtered Dataset with the Requested Genes/Features:

In [None]:
display(filteredDataframe)

</br>

### Displaying the number of Genes/Features in the Filtered Dataset:

In [None]:
print("The \033[1m",filename,"\033[0m dataset currently has \033[1m",len(filteredDataframe.columns)," different Genes/Features \033[0m")

</br></br>

<a id='discreteData'></a>
### 3. Dealing with Discrete Data
</br>
As previously mentioned, PCA is designed to be utilised on continuous data [10]. Consequently, discrete data must be transformed into continuous data before PCA is applied to a dataset. This is critical, as PCA relies on quantities such as means and variances, which are not meaningful for discrete values such as names or labels.

</br>

**Types of Data:**
- **Continuous Data** - This type of data refers to numeric values which can take any value within a bounded or unbounded interval. (For example a Worker's Pay.)
- **Discrete Data** - This type of data refers to values which belong to a set of separate, distinct values or categories, with nothing in between. (For example a Worker's Name.)

</br>

There are various ways how discrete data can be **Transformed/Encoded** to continuous data, in order to be examined by the PCA. 
</br></br>

**The following are different types of Encoders, which can be used:**
1. **One-Hot Encoding**
2. **Label Encoding**
3. **Ordinal Encoding (Similar to Label Encoding)**
4. **Count Encoding**
5. **Word Embeddings Model**

</br></br>

**Citations in this Section:** </br>

[10] V. Karthik, "PCA for categorical features", Stack Overflow, Dec. 2016. [Online]. Available: https://stackoverflow.com/questions/40795141/pca-for-categorical-features#:~:text=PCA%20is%20designed%20for%20continuous,yes%2C%20you%20can%20use%20PCA. [Accessed: 18-Apr-2023].</br>

</br>

###  Observing the type of Data for each Column in the Filtered Dataset:

In [None]:
display(filteredDataframe)
display(filteredDataframe.dtypes)#Data type of each column ('object' denotes discrete data)

</br>

###  3.1 One-Hot Encoding on the first Discrete Data Column 
</br>

**What is One-Hot Encoding?** </br>
One-hot Encoding is a data preparation technique used to transform discrete variables into a format that can be examined by machine learning algorithms. This encoding algorithm works by creating a binary column for each possible category in the data. Each row then holds a value of 1 or 0 in each of these columns, symbolising the presence or absence of the respective category. [11-13]
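To make this concrete, here is a small sketch on a made-up column of four values (the data and variable names are illustrative assumptions, not part of the loaded dataset):

```python
import pandas as pd

# Toy discrete column (hypothetical data)
items = pd.Series(["hat", "apple", "cap", "hat"], name="item")

# One column per distinct category; a 1 marks the category of each row
one_hot = pd.get_dummies(items, dtype=int)

print(one_hot)
```

Three distinct categories produce three columns, and every row contains exactly one 1.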

</br>

**In the code cell below, One-Hot Encoding is applied to the first Discrete Data Column, via the pd.get_dummies function.**

</br></br>

**Citations in this Section:** </br>

[11] Datagy. "Pandas get_dummies (One-Hot Encoding) Explained," Datagy.io, Feb. 2021. [Online]. Available: https://datagy.io/pandas-get-dummies/. [Accessed: 18-Apr-2023].</br>

[12] DataCamp. "Dealing with Categorical Data". DataCamp, 2021. [Online]. Available: https://www.datacamp.com/tutorial/categorical-data. [Accessed: 18-Apr-2023].</br>

[13] B. Roy, "All about Categorical Variable Encoding," Towards Data Science, Jul. 2, 2019. [Online]. Available: https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02. [Accessed: 18-Apr-2023].</br>

In [None]:
#Looping through all the columns in the data frame and checking whether column has object type (i.e., contains discrete data),
#if so, applying one hot encoding on the first discrete column, and exiting
oneHotEncDataframe=pd.DataFrame()
columnName=None
for column in filteredDataframe.columns:
    if filteredDataframe[column].dtype=='O':#O denotes type Object
        oneHotEncDataframe=pd.get_dummies(filteredDataframe[column])
        columnName=column
        break
#Showing one-hot encoding dataframe of first discrete column
display(oneHotEncDataframe)

</br>

### Displaying the number of Genes/Features in the Filtered Dataset, and One-Hot Encoded first Discrete Data Column:

In [None]:
print("The Filtered\033[1m",filename,"\033[0m dataset has \033[1m",len(filteredDataframe.columns)," different Genes/Features \033[0m")
if(columnName is None):
    print("\033[1mThere are no Discrete Columns\033[0m")
else:
    print("Applying \033[1mOne-Hot Encoding only on the",columnName,"column\033[0m, results in the encoded data to have \033[1m",len(oneHotEncDataframe.columns)," different Genes/Features \033[0m")

</br>
As can be seen from the above result, such encoding is quite explosive: applying One-Hot Encoding to a single column adds one new column for every distinct category in that column, which can greatly increase the number of Genes/Features. For an algorithm which aims to reduce dimensionality, such an approach to turning discrete data into continuous data is quite inefficient, not to mention the increase in memory and time complexity that it brings.

</br>

**You may wonder whether this binary vector can be transformed back into a single decimal value. Such a transformation exists and is known as Binary to Decimal Decoding. It effectively converts the binary vector back into a decimal value, thus reducing the number of Genes/Features to their original number [14]. Essentially, such an approach would take relatively more time whilst achieving the same results as Label Encoding or Ordinal Encoding.**

</br></br>

**Citations in this Section:** </br>


[14] T. Crosley, "What is the binary to decimal decoder?", Quora, May 8, 2018. [Online]. Available: https://www.quora.com/What-is-the-binary-to-decimal-decoder. [Accessed: 18-Apr-2023].</br>

</br>

### 3.2  Label Encoding
</br>

**What is Label Encoding?** </br>
Label Encoding is another data preparation technique which transforms discrete variables into a format that is easily readable by machine learning algorithms. Such an encoder works by giving each distinct category a unique numeric value or code [12,13,15]. For instance, the list of categories ["hat","apple","cap"] will be encoded as [3,1,2], with the codes assigned in alphabetical order of the categories. (Note that pd.factorize, which is used below, numbers the codes from 0, and would therefore return [2,0,1].)
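The same three categories can be encoded in a short sketch (toy data, with illustrative variable names), using pd.factorize with sorting enabled:

```python
import pandas as pd

# Toy categories (hypothetical data)
categories = pd.Series(["hat", "apple", "cap"])

# sort=True assigns the codes following the sorted (alphabetical) order of the categories
codes, uniques = pd.factorize(categories, sort=True)

print(codes.tolist())   # [2, 0, 1]
print(list(uniques))    # ['apple', 'cap', 'hat']
```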

</br>

**In the code cell below, Label Encoding is applied through the pd.factorize function, with the sort flag set to True, so that the codes follow the sorted order of the categories, rather than the order in which they first appeared.**

</br></br>

**Citations in this Section:** </br>

[12] DataCamp. "Dealing with Categorical Data". DataCamp, 2021. [Online]. Available: https://www.datacamp.com/tutorial/categorical-data. [Accessed: 18-Apr-2023].</br>

[13] B. Roy, "All about Categorical Variable Encoding," Towards Data Science, Jul. 2, 2019. [Online]. Available: https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02. [Accessed: 18-Apr-2023].</br>

[15] Pandas. "pandas.factorize()". pandas 1.4.0 documentation, Jan. 07, 2022. [Online]. Available: https://pandas.pydata.org/docs/reference/api/pandas.factorize.html. [Accessed: 18-Apr-2023].</br>

In [None]:
#Looping through every column, and applying the pd.factorize function with "sort=True" on the discrete data
labelEncDataframe=filteredDataframe.copy()
for column in labelEncDataframe.columns:
    if labelEncDataframe[column].dtype=='O':#O denotes type Object
        labelEncDataframe[column] = pd.factorize(filteredDataframe[column], sort=True)[0]
#Displaying Label Encoding DataFrame        
display(labelEncDataframe)

</br>

### 3.3  Ordinal Encoding
</br>

**What is Ordinal Encoding?** </br>
A similar data preparation technique to Label Encoding is Ordinal Encoding. Such an encoder works by giving each distinct category a unique numeric value or code, based on the order in which the categories first appeared [12,13,16]. For instance, the list of categories ["hat","apple","cap"] will be encoded as [1,2,3] (as numeric values, encoded in the order in which they appeared).
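A brief sketch of this behaviour on toy data (illustrative names only) is shown below. Note that pd.factorize numbers its codes from 0 rather than from 1, and that a repeated category reuses the code it was first given:

```python
import pandas as pd

# Toy categories (hypothetical data), with "hat" appearing twice
categories = pd.Series(["hat", "apple", "cap", "hat"])

# Without sorting, the codes follow the order of first appearance
codes, uniques = pd.factorize(categories)

print(codes.tolist())   # [0, 1, 2, 0]
print(list(uniques))    # ['hat', 'apple', 'cap']
```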

</br>

**In the code cell below, Ordinal Encoding is applied through the pd.factorize function, with the sort flag set to False, so that the elements are encoded in the order in which they first appeared.**

</br></br>

**Citations in this Section:** </br>

[12] DataCamp. "Dealing with Categorical Data". DataCamp, 2021. [Online]. Available: https://www.datacamp.com/tutorial/categorical-data. [Accessed: 18-Apr-2023].</br>

[13] B. Roy, "All about Categorical Variable Encoding," Towards Data Science, Jul. 2, 2019. [Online]. Available: https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02. [Accessed: 18-Apr-2023].</br>

[16] J. Brownlee, "One-Hot Encoding for Categorical Data," Machine Learning Mastery, Aug. 17, 2020. [Online]. Available: https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/. [Accessed: 18-Apr-2023].</br>

In [None]:
#Looping through every column, and applying the pd.factorize function on the discrete data
ordinalEncDataframe=filteredDataframe.copy()
for column in ordinalEncDataframe.columns:
    if ordinalEncDataframe[column].dtype=='O':#O denotes type Object
        ordinalEncDataframe[column] = pd.factorize(filteredDataframe[column])[0]
#Displaying Ordinal Encoding DataFrame        
display(ordinalEncDataframe)

</br>

### 3.4  Count Encoding
</br>

**What is Count Encoding?** </br>
Another data preparation technique is Count Encoding. This encoder works by encoding each distinct category, with the number of times such category appeared [12-13]. For instance, if the category "hat" appeared 5 times, then "hat" will be encoded by the number 5.
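As a small illustration on made-up data (the names below are assumptions for the example), each value is replaced by the number of times it occurs in the column:

```python
import pandas as pd

# Toy discrete column (hypothetical data): "hat" appears 3 times
items = pd.Series(["hat", "cap", "hat", "apple", "hat"])

# Frequency of every distinct category
freq = items.value_counts()

# Replacing each value by the frequency of its category
count_encoded = items.map(freq)

print(count_encoded.tolist())   # [3, 1, 3, 1, 3]
```

One consequence worth noticing is that "cap" and "apple" receive the same code, so categories with equal frequency can no longer be told apart.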

</br>

**In the code cell below, Count Encoding is applied through the .value_counts and .map function on the discrete data.**

</br></br>

**Citations in this Section:** </br>

[12] DataCamp. "Dealing with Categorical Data". DataCamp, 2021. [Online]. Available: https://www.datacamp.com/tutorial/categorical-data. [Accessed: 18-Apr-2023].</br>

[13] B. Roy, "All about Categorical Variable Encoding," Towards Data Science, Jul. 2, 2019. [Online]. Available: https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02. [Accessed: 18-Apr-2023].</br>

In [None]:
#Looping through every column, and applying the .value_counts and .map functions on the discrete data
countEncDataframe=filteredDataframe.copy()
for column in countEncDataframe.columns:
    if countEncDataframe[column].dtype=='O':#O denotes type Object
        colValueFreq=countEncDataframe[column].value_counts(dropna=False)
        countEncDataframe[column] =countEncDataframe[column].map(lambda x : colValueFreq[x])
#Displaying Count Encoding DataFrame    
display(countEncDataframe)


</br>

### 3.5  Word Embeddings Model (utilising Word2vec)
</br>

**What is a Word Embeddings Model?** </br>
A Word Embeddings Model is a type of natural language processing (NLP) model which represents words as numerical vectors. Such a model works by training a neural network on a large corpus of text data, in order to represent each word as a dense vector with far fewer dimensions than the size of the vocabulary. Taken together, the components of a word vector capture characteristics of the word, such as its semantic meaning, part of speech, or syntactic context [17]. The developed artefact focuses on the use of Word2vec, which is one type of Word Embeddings Model [17].

</br>

**In the code cell below, the Word Embeddings Model (Word2vec) is applied on the discrete data. Please note that the process may take some time to complete. Additionally, each value is encoded from the mean of the vectors of all the words in its text; since the column must hold a single number per row, the code keeps only the first component of this mean vector.**
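The averaging step can be sketched without training a model. The snippet below assumes a tiny, hand-written table of 3-dimensional word vectors (`toy_vectors`, entirely hypothetical) in place of a trained Word2vec model, and a helper `sentence_vector` introduced only for this illustration:

```python
import numpy as np

# Hypothetical 3-dimensional word vectors; a trained Word2vec model would supply these
toy_vectors = {
    "software": np.array([0.2, 0.4, -0.1]),
    "engineer": np.array([0.6, 0.0, 0.3]),
}

def sentence_vector(sentence):
    """Average the vectors of the known tokens in a sentence."""
    tokens = sentence.lower().split()   # simple whitespace tokenisation
    vectors = [toy_vectors[t] for t in tokens if t in toy_vectors]
    if not vectors:
        # No known token: falling back to a zero vector
        return np.zeros(3)
    return np.mean(vectors, axis=0)

mean_vec = sentence_vector("Software Engineer")
print(mean_vec)      # element-wise mean of the two word vectors
print(mean_vec[0])   # a single component, if one number per row is required
```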

</br></br>

**Citations in this Section:** </br>

[17] Vatsal, "Word2Vec Explained", Towards Data Science, Jul. 29, 2021. [Online]. Available: https://towardsdatascience.com/word2vec-explained-49c52b4ccb71. [Accessed: 18-Apr-2023].</br>

In [None]:
#Looping through every column, and training the Word2Vec model on the discrete data
wordEncDataframe=filteredDataframe.copy()
for column in wordEncDataframe.columns:
    if wordEncDataframe[column].dtype=='O':#O denotes type Object
        print("\033[1mCompleting Column:\033[0m",column)
        #Tokenising current Column
        tokens=wordEncDataframe[column].apply(lambda x: word_tokenize(str(x).lower()))
        #Feeding the model the tokens
        #min_count is the minimum number of occurrences a word needs in order to be kept,
        #a count of 1 means that every word is kept, even if it appears only once
        wordEmbeddingsModel=Word2Vec(tokens,min_count=1)
        
        #Retrieving the mean of the vectors of all the tokens in the sentence, keeping only the first component
        #of that mean vector so that the column stays numeric, and encoding nan values as 0
        wordEncDataframe[column] = wordEncDataframe[column].apply(lambda x: 
            np.mean([wordEmbeddingsModel.wv[token] for token in word_tokenize(str(x).lower())], axis=0).tolist()[0] 
            if str(x) != 'nan' else 0)

</br>

###  Displaying Dataframe after applying Word Embeddings Model:

In [None]:
display(wordEncDataframe)

</br>

###  Giving the User the Option to choose his/her preferred Encoding Technique:

In [None]:
print("\n\033[1mChoose which Encoding technique to utilise:\n1. Label Encoding\n2. Ordinal Encoding\n3. Count Encoding \n4. Word Embeddings Model\033[0m")
userChoice=int(input())
filteredContinousData=labelEncDataframe 
if(userChoice==2):
    filteredContinousData=ordinalEncDataframe 
elif(userChoice==3):
    filteredContinousData=countEncDataframe 
elif(userChoice==4):
    filteredContinousData=wordEncDataframe

</br></br>

<a id='initialDataSetVisualisations'></a>
### 4. Filtered Dataset Visualisations
</br>
Visualisation is a useful tool, as it aids in identifying patterns and characteristics in the data. Individuals generally find it simpler to spot patterns and trends when data is presented visually, rather than in numerical or written form. Unfortunately, not all features can be shown at once, as a plot is limited to three spatial dimensions. Thus, individuals need to choose which features to visualise from a high-dimensional dataset with many features.
</br></br>

**In the code cell below, Students are presented with the list of features in the dataset, and are asked to choose which feature will be represented on each of the three axes (X, Y and Z) of the plots.**

In [None]:
#Displaying list of features
print("\n\033[1mChoose 3 Features to represent the Data in 2D and 3D \033[0m")
for colNum,column in enumerate(filteredContinousData.columns):
    print(colNum,"\b.",column,)
#List which holds the valid Column ranges
validColumnRanges=range(0,len(filteredContinousData.columns))
#Variable Initialisation to null
x=None
y=None
z=None

#Looping until valid column index is entered for every Dimensional variable
while x not in validColumnRanges:
    x=int(input("\nPlease Input a Valid Column Index to represent the \033[1mX axis\033[0m attribute in the Plot:"))
    
while y not in validColumnRanges:
    y=int(input("\nPlease Input a Valid Column Index to represent the \033[1mY axis\033[0m attribute in the Plot:"))
    
while z not in validColumnRanges:
    z=int(input("\nPlease Input a Valid Column Index to represent the \033[1mZ axis\033[0m attribute in the Plot:"))

</br>

**Please note that the visualisation tools used are interactive and allow Students to zoom in or out of the plots, enabling them to recognize certain data trends better. Additionally, note that the colour component does not show any relationship between the variables, but is used as a marker in order to compare graph axes between the 2D and 3D plots. Furthermore, in some cases a code cell may need to be rerun for its graph to appear; this is due to the plotly limit of 10 graphs.**

</br>

###  The following is the Filtered Dataset with the selected Features visualised in 3D:

In [None]:
#Creating a 3D scatter plot, from the respective user inputted columns
fig3D = go.Figure(data=go.Scatter3d(
    x=filteredContinousData.iloc[:,x],
    y=filteredContinousData.iloc[:,y],
    z=filteredContinousData.iloc[:,z],
    mode='markers',
    marker=dict(color = filteredContinousData.iloc[:,x],line=dict(width=2000, color='DarkSlateGrey'))
))

#Setting the respective title preferences
fig3D.update_layout(width=1000, height=1000, title = '3D Representation of Filtered Dataset from '+filename+':',
                  font_color="blue",font_family="verdana",
                  scene = dict(xaxis=dict(title=filteredContinousData.columns[x], titlefont_color='blue'),
                               yaxis=dict(title=filteredContinousData.columns[y], titlefont_color='blue'),
                               zaxis=dict(title=filteredContinousData.columns[z], titlefont_color='blue')))
#Displaying plot
fig3D.show()

</br>

###  The following is the Filtered Dataset with the selected Features visualised in 2D:

In [None]:
#Creating a 2D scatter plot, from the respective user inputted columns
fig2D = px.scatter(x=filteredContinousData.iloc[:,x] ,y=filteredContinousData.iloc[:,y])
fig2D.update_layout(title = '2D Representation of Filtered Dataset from '+filename+':',
                    xaxis_title=filteredContinousData.columns[x],
                    yaxis_title=filteredContinousData.columns[y],
                    font_color="blue",font_family="verdana",coloraxis_showscale=False)
#Updating settings
fig2D.update_traces(marker=dict(color=filteredContinousData.iloc[:,x],size =10,line=dict(width=2,color='DarkSlateGrey')),selector=dict(mode='markers'))
#Displaying plot
fig2D.show()

</br></br>

<a id='normalizingData'></a>
### 5. Normalizing Data
</br>

**What is Normalization?** </br>
Normalization is the process of converting and scaling the numerical features of a dataset, so that all features lie on a comparable scale. Normalization's primary objective is to guarantee that no feature dominates or has an excessively large impact on the model's performance. **This process is critical in the calculation of the PCA, since if given unnormalized data, the PCA algorithm will load on the high variance data** [18]. An example would be having two features, one whose values vary by around 1 and another whose values vary by around 700, whereby the PCA algorithm will give far higher importance to the second feature, purely because of its scale. Normalizing data is especially important when one considers that the previous encoding techniques, such as Label Encoding or Ordinal Encoding, produce transformed variables whose scale is arbitrary. Normalization addresses this issue by bringing all features onto a common scale.</br></br>
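The effect of scale can be seen in a rough sketch on synthetic data. The two features below are made up for illustration: they are independent of each other, and differ only in their spread (about 1 versus about 700).

```python
import numpy as np

# Two independent toy features (hypothetical): spreads of roughly 1 and 700
rng = np.random.default_rng(1)
small = rng.normal(scale=1.0, size=500)
large = rng.normal(scale=700.0, size=500)
data = np.column_stack([small, large])

# Eigen-decomposition of the covariance matrix of the unnormalized data
covariance = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)   # eigenvalues in ascending order

# The first principal direction is the eigenvector with the largest eigenvalue
first_pc = eigenvectors[:, -1]
variance_share = eigenvalues[-1] / eigenvalues.sum()

print(np.abs(first_pc))   # weights of the two features on the first principal direction
print(variance_share)     # share of the total variance along that direction
```

Even though neither feature is more informative than the other, the first principal direction points almost entirely along the large-scale feature.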

This implementation uses **Z-Score Normalization**, also known as **Standardization** [19], which is computed through the following formula: 

<div style="text-align: center"><font size="+3"> $norm=\frac{(x - \mu)}{σ}$ </font></div></br>

<font size="+0.5">
    
**where:**
1. **x** - is the Original Value
2. **$\mu$** - is the Mean of the Data
3. **σ** - is the Standard Deviation of the Data
4. **norm** - is the Normalized Data
</font>
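
To make the effect of scale concrete, the following is a minimal sketch on a hypothetical two-column toy array (the values are invented for illustration and are not taken from the notebook's dataset). It compares how much of the total variance each column holds before and after Z-Score Normalization.

```python
import numpy as np

# Hypothetical toy data: column 0 is a small-scale feature, column 1 a large-scale one
data = np.array([[1.5, 500.0],
                 [1.6, 600.0],
                 [1.7, 700.0],
                 [1.8, 800.0],
                 [1.9, 900.0]])

# Share of the total variance held by each column before normalization
raw_var = data.var(axis=0)
raw_share = raw_var / raw_var.sum()

# Z-Score Normalization: subtract the column mean, divide by the column standard deviation
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Share of the total variance held by each column after normalization
z_var = z.var(axis=0)
z_share = z_var / z_var.sum()

print("Variance share before normalization:", raw_share)
print("Variance share after normalization: ", z_share)
```

Before normalization almost all of the variance sits in the large-scale column, whereas afterwards both columns contribute equally, which is the behaviour PCA relies on.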

</br></br>

**Citations in this Section:** </br>

[18] Stack Exchange. "Why do we need to normalize data before Principal Component Analysis (PCA)?", Cross Validated, May 26, 2014. [Online]. Available: https://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-principal-component-analysis-pca. [Accessed: 18-Apr-2023].</br>

[19] R. Sharma. "What is Normalization in Data Mining and How to Do It?", UpGrad, Sep. 22, 2022. [Online]. Available: https://www.upgrad.com/blog/normalization-in-data-mining/#:~:text=Project%20Ideas%20%26%20Topics-,Z%2DScore%20Normalization,up%20to%20%2B3%20standard%20deviation. [Accessed: 18-Apr-2023].</br>

</br>

###  5.1 First, Calculating the Mean ($\mu$) for each Column in the Dataframe

**In the code cell below, Students are presented with the calculation of the mean for each column through the .mean function.**

In [None]:
#Calculating the Mean for each Column through the .mean() function, and storing result in a list
colMean=list(filteredContinousData.mean())
#Displaying Mean for each column
for colNum, column in enumerate(colMean):
    print("\033[1mColumn:\033[0m ",colNum+1,"\t\033[1mMean: \033[0m",column)

</br>

###  5.2 Second, Calculating the Standard Deviation (σ) for each Column in the Dataframe

</br><font size="+1">**Calculating Standard Deviation through the Formula:**</font></br></br>

<div style="text-align: center"><font size="+3"> $σ=\sqrt{\frac{\sum (x_i - \mu)^2}{N}}$ </font></div></br>

<font size="+0.5">
    
**where:**
1. **$x_i$** - is a Single Value from the Whole Data
2. **$\mu$** - is the Mean of the Data
3. **N** - is the Size of the Data
4. **σ** - is the Standard Deviation of the Data
</font>

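**In the code cell below, Students are presented with the calculation of the standard deviation for each column through the .std function. Note that the pandas .std function divides by N - 1 (sample standard deviation) by default, whereas the formula above divides by N; passing ddof=0 reproduces the formula exactly.**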

In [None]:
#Calculating the Standard Deviation for each Column through the .std() function, and storing result in a list
colStandardDev=list(filteredContinousData.std())
#Displaying Standard Deviation for each column
for colNum, column in enumerate(colStandardDev):
    print("\033[1mColumn: \033[0m",colNum+1,"\t\033[1mStandard Deviation: \033[0m",column)

</br>
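
As a small illustration of the standard deviation formula, the sketch below evaluates it by hand on a hypothetical series of eight values and compares the result with the pandas `.std` function under both `ddof=0` (divide by N) and the default `ddof=1` (divide by N - 1). The series itself is an assumption made purely for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen so that the population standard deviation is a round number
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Direct application of the formula: square root of the mean squared deviation (divides by N)
manual_std = np.sqrt(((values - values.mean()) ** 2).sum() / len(values))

# pandas equivalent of the formula, obtained by setting ddof=0
population_std = values.std(ddof=0)

# pandas default (ddof=1), which divides by N - 1 instead of N
sample_std = values.std()

print("Manual:", manual_std, "ddof=0:", population_std, "ddof=1:", sample_std)
```

For large datasets the two conventions differ only marginally, but the difference is noticeable on small samples such as this one.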

###  5.3 Finally, Utilising the Calculated Mean and Standard Deviation for each Column, to Normalize the Dataframe

</br><font size="+1">**Applying Formula:**</font></br></br>

<div style="text-align: center"><font size="+3"> $norm=\frac{(x - \mu)}{σ}$ </font></div></br>

<font size="+0.5">
    
**where:**
1. **x** - is the Original Value
2. **$\mu$** - is the Mean of the Data
3. **σ** - is the Standard Deviation of the Data
4. **norm** - is the Normalized Data
</font>

**In the code cell below, Students are presented with the construction of the Z-Score Normalization.**

In [None]:
#Dictionary which will hold the normalized data
normalizedData=dict()
counter=0
#Looping through all the columns in the Dataframe
for column in filteredContinousData:
    #Dictionary which will hold the normalized Column
    normalizedColumn=dict()
    rowCounter=0
    #Looping through every row in the current column
    for row in filteredContinousData[column]:
        #Checking whether Standard Deviation is not 0, and if so normalizing current value
        if colStandardDev[counter]!=0:
            normalizedColumn[rowCounter]=(row-colMean[counter])/colStandardDev[counter]
        else:#Standard Deviation is 0, therefore setting value to 0
            normalizedColumn[rowCounter]=0
        #Incrementing row counter
        rowCounter+=1
    #appending normalized column to normalized data
    normalizedData[column]=normalizedColumn
    counter+=1

#Converting data into pandas Dataframe
normalizedDF=pd.DataFrame.from_dict(normalizedData)
#Changing nan to 0
normalizedDF=normalizedDF.replace(np.nan,0)

</br>
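
The explicit loops above are written for clarity. As a possible alternative, the same Z-Score Normalization can be sketched in vectorised form with pandas. The function name `normalize_vectorised` and the toy Dataframe are hypothetical, and, unlike the loop version, this sketch keeps the original row index instead of renumbering the rows from 0.

```python
import numpy as np
import pandas as pd

def normalize_vectorised(input_df):
    """Z-Score normalize every column; constant columns (σ = 0) become all zeros."""
    col_mean = input_df.mean()
    col_std = input_df.std()  # pandas default, divides by N - 1
    # Replacing a zero standard deviation with NaN avoids a division by zero
    safe_std = col_std.replace(0, np.nan)
    # Any resulting NaN (constant columns or missing values) is set to 0
    return ((input_df - col_mean) / safe_std).fillna(0)

# Hypothetical toy Dataframe: column 'b' is constant, so its standard deviation is 0
toy_df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [5.0, 5.0, 5.0]})
toy_normalized = normalize_vectorised(toy_df)
print(toy_normalized)
```

On large Dataframes the vectorised form is typically much faster than looping over every value in Python.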

###  Creating a function, which applies the aforementioned techniques in succession, in order to Normalize a given Dataframe

In [None]:
#Function to Normalize a Dataframe
def NormalizeDF(inputDF):
    #Calculating the Mean for each Column through the .mean() function, and storing result in a list
    colMean=list(inputDF.mean())
    #Calculating the Standard Deviation for each Column through the .std() function, and storing result in a list
    colStandardDev=list(inputDF.std())
    #Dictionary which will hold the normalized data
    normalizedData=dict()
    counter=0
    #Looping through all the columns in the DataSet
    for column in inputDF:
        #Dictionary which will hold the normalized Column
        normalizedColumn=dict()
        rowCounter=0
        #Looping through every row in the current column
        for row in inputDF[column]:
            #Checking whether Standard Deviation is not 0, and if so normalizing current value
            if colStandardDev[counter]!=0:
                normalizedColumn[rowCounter]=(row-colMean[counter])/colStandardDev[counter]
            else:#Standard Deviation is 0, therefore setting value to 0
                normalizedColumn[rowCounter]=0
            #Incrementing row counter
            rowCounter+=1
        #appending normalized column to normalized data
        normalizedData[column]=normalizedColumn
        counter+=1

    #Converting data into pandas DataFrame
    normalizedDF=pd.DataFrame.from_dict(normalizedData)
    #Changing nan to 0
    normalizedDF=normalizedDF.replace(np.nan,0)
    #Returning normalized Dataframe
    return normalizedDF

</br>

###  The following is the Normalized Dataframe:

In [None]:
display(normalizedDF)

</br>

<a id='normalizedDataSetVisualisations'></a>
### 6. Normalized Dataset Visualisations

</br>

**In the code sections below, the normalized dataset can be visualised in 3D and 2D respectively. Note how, in the following plots, the normalized data is plotted on a smaller range of values when compared to the original plots in the section above, and how the data values are now centred around zero.**

**Please also note that the colour of the points in these graphs may change slightly from the graphs above, as the data values are now centred around zero and plotted on a smaller range of values.**

</br>

###  The following is the Normalized Dataset visualised in 3D:

In [None]:
#Creating a 3D scatter plot, from the respective user inputted columns (columns are the same as the previous section)
fig3DNormalized = go.Figure(data=go.Scatter3d(
    x=normalizedDF.iloc[:,x],
    y=normalizedDF.iloc[:,y],
    z=normalizedDF.iloc[:,z],
    mode='markers',
    marker=dict(color = normalizedDF.iloc[:,x],line=dict(width=2000, color='DarkSlateGrey'))
))

#Setting the respective title preferences
fig3DNormalized.update_layout(width=1000, height=1000, title = '3D Representation of Normalized Dataset from '+filename+':',
                  font_color="blue",font_family="verdana",
                  scene = dict(xaxis=dict(title=normalizedDF.columns[x], titlefont_color='blue'),
                               yaxis=dict(title=normalizedDF.columns[y], titlefont_color='blue'),
                               zaxis=dict(title=normalizedDF.columns[z], titlefont_color='blue')))
#Displaying plot
fig3DNormalized.show()

</br>

###  The following is the Normalized Dataset visualised in 2D:

In [None]:
#Creating a 2D scatter plot, from the respective user inputted columns
fig2DNormalized = px.scatter(x=normalizedDF.iloc[:,x] ,y=normalizedDF.iloc[:,y])
fig2DNormalized.update_layout(title = '2D Representation of Normalized Dataset from '+filename+':',
                    xaxis_title=normalizedDF.columns[x],
                    yaxis_title=normalizedDF.columns[y],
                    font_color="blue",font_family="verdana",coloraxis_showscale=False)
#Updating settings
fig2DNormalized.update_traces(marker=dict(color=normalizedDF.iloc[:,x],size =10,line=dict(width=2,color='DarkSlateGrey')),selector=dict(mode='markers'))
#Displaying plot
fig2DNormalized.show()

</br></br>

<a id='pcaSVD'></a>
### 7. Understanding PCA - SVD Approach

</br>

**What is Singular Value Decomposition (SVD)?** </br>
The first step in the calculation of PCA via the SVD Approach is the Singular Value Decomposition (SVD), a decomposition method which factorises a matrix of size **m x n** into three components. The resultant components are **U** and **$V^T$**, which are two orthonormal matrices, and Sigma (**$\Sigma$**), which is a diagonal matrix containing the singular values of the original matrix. Additionally, the magnitude of each singular value signifies its importance in explaining the data [20]. For example, a component with a singular value of 10 explains more of the data than one with a singular value of 5. 

</br></br>

**Citations in this Section:** </br>

[20] M. E. Wall, A. Rechtsteiner, and L. M. Rocha, "Singular Value Decomposition and Principal Component Analysis," in Learning from Data: Concepts, Theory, and Methods, vol. 2, Springer, Boston, MA, 2007, pp. 151-176, doi: 10.1007/0-306-47815-3_5. [Online]. Available: https://www.researchgate.net/publication/2167923_Singular_Value_Decomposition_and_Principal_Component_Analysis. [Accessed: 18-Apr-2023].</br>

###  Taking a small subset of the entire dataset if dataset has a larger size than a respective threshold, and working out the PCA algorithm, via the SVD Approach.
</br>

**Note that the dataset is being reduced to a tenth of its size, whilst maintaining the number of columns, in order to aid the student to better understand the concept, and method of calculation, in case the dataset has a larger size than the respective threshold of 10000.**

In [None]:
#Taking a subset of the dataframe, normalizing it, and converting it to a numpy array, if the dataframe has a size more than 10000 
sizeThreshold=10000 
#Checking for the size
if(len(normalizedDF)>sizeThreshold):
    #Dataset is larger than the threshold, therefore keeping only the first tenth of its rows
    smallerDF=NormalizeDF(filteredContinousData[:int(len(normalizedDF)/10)])
else:
    #Dataset is within the threshold, therefore keeping all of its rows
    smallerDF=NormalizeDF(filteredContinousData)
SmalledDFMatrix=smallerDF.to_numpy()

</br><font size="+1">**SVD Decomposition of matrix A with size m x n results in:**</font></br></br>

<div style="text-align: center"><font size="+3"> $A = U.\Sigma.V^T$ </font></div></br>

<font size="+0.5">
    
**where:**
1. **U** - is an Orthonormal Matrix of size **m x m**
2. **$\Sigma$** - is a Diagonal Matrix of size **m x n**
3. **$V^T$** - is an Orthonormal Matrix of size **n x n**
</font></br>

**In the code cell below, Students are presented with the SVD Decomposition of the smaller dataset, done programmatically through the np.linalg.svd function. Note that the svd function presents the singular values only, and thus the matrix $\Sigma$ needs to be calculated, from such values whilst filling the empty slots with zeros.**

In [None]:
#Calculating the svd, via np.linalg.svd function
U, s, VT = np.linalg.svd(SmalledDFMatrix)
#Constructing Sigma Matrix of size m x n, with the singular values placed on its main diagonal
Sigma = np.zeros((smallerDF.shape[0], smallerDF.shape[1]))
Sigma[:len(s), :len(s)] = np.diag(s)

</br>
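
As a sanity check on the decomposition, the following minimal sketch applies the same steps to a hypothetical 4 x 3 matrix and verifies that multiplying the three components back together recovers the original matrix. The matrix values are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical 4 x 3 matrix (m = 4 rows, n = 3 columns)
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0],
              [1.0, 1.0, 0.0],
              [4.0, 2.0, 2.0]])
m, n = A.shape

# Full SVD: U is m x m, s holds the singular values, VT is n x n
U, s, VT = np.linalg.svd(A)

# Building the m x n Sigma matrix with the singular values on its main diagonal
k = len(s)
Sigma = np.zeros((m, n))
Sigma[:k, :k] = np.diag(s)

# Multiplying the three components back together
reconstructed = U @ Sigma @ VT

print("Singular values:", s)
print("Reconstruction matches A:", np.allclose(reconstructed, A))
```

The singular values are returned in descending order, which is why the first columns of the result later correspond to the most important components.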

###  The following is the U Matrix from the SVD Decomposition on the Small Dataset:

In [None]:
pd.DataFrame(U)

</br>

###  The following is the Sigma Matrix from the SVD Decomposition on the Small Dataset:

In [None]:
pd.DataFrame(Sigma)

</br>

###  The following is the $V^T$ Matrix from the SVD Decomposition on the Small Dataset:

In [None]:
pd.DataFrame(VT)

</br>

###  The second step in the Calculation of PCA via SVD Approach, involves multiplying the U matrix by the $\Sigma$ matrix.
</br>

This is done as the multiplication of **$U.\Sigma$** presents a matrix whose columns give the projections of the data points on each principal axis.

**In the code cell below, Students are presented with the construction of the matrix containing the Principal Components (the data projected onto each principal axis).**

In [None]:
#Multiplying U by Sigma to obtain the Principal Components
pcaSmallData1=U@Sigma
pd.DataFrame(pcaSmallData1)

</br>
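
To see why **$U.\Sigma$** gives the projections of the data, the sketch below checks on a hypothetical centred random matrix that **$U.\Sigma$** equals the data multiplied by **$V$**, that is, the data projected onto the principal axes. It uses the reduced SVD (`full_matrices=False`), which is an assumption made here to keep the matrices small; the notebook itself uses the full SVD.

```python
import numpy as np

# Hypothetical data: 6 samples, 3 attributes, centred column-wise
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
A = A - A.mean(axis=0)

# Reduced SVD: U is 6 x 3, s has 3 singular values, VT is 3 x 3
U, s, VT = np.linalg.svd(A, full_matrices=False)

# Scaling each column of U by its singular value is equivalent to U @ np.diag(s)
scores_from_u = U * s

# Projecting the data onto the principal axes (the columns of V)
scores_from_v = A @ VT.T

print("U.Sigma equals A.V:", np.allclose(scores_from_u, scores_from_v))
```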

###  Calculating the Variance Ratio for each Principal Component in order to determine which are the best Principal Components in explaining the variation in the data
</br>

Variance ratios are used in PCA to calculate the percentage of the overall variance in the data that each principal component accounts for. Each principal component captures a specific amount of the variation in the data, so computing its variance ratio tells us what share of the overall variation it explains. Through the variance ratio we are therefore able to determine which principal components are crucial for explaining the variation in the data. [21]</br>

**Therefore, we are able to lower the number of dimensions in the data whilst preserving a sizeable portion of the overall variance, ultimately simplifying the data.**</br>


</br><font size="+1">**The Variance Ratio is calculated through the following Formula (obtained from [21-22]):**</font></br></br></br>

<div style="text-align: center"><font size="+2.4"> $\text{Variance Ratio}_i = \frac{\lambda_i}{\sum_j \lambda_j}$ </font></div></br>

<font size="+0.5">
    
**where:**
1. **$\lambda_i$** - is the sum of the squared distances of the data points along the $i$-th principal component
2. **$\sum_j \lambda_j$** - is the sum of the squared distances over all of the principal components
</font></br>

**In the code cell below, Students are presented with the calculation of the Variance Ratio for the current PCA configuration.**


</br></br>

**Citations in this Section:** </br>

[21] I. T. Jolliffe and J. Cadima, "Principal component analysis: a review and recent developments," in The Data Deluge: Can Libraries Cope with E-Science? Proceedings of a Conference Held at the Royal Society, London, UK, 4-5 November 2004, vol. 463, Royal Society Publishing, 2016, pp. 21-36. doi: 10.1098/rsta.2015.0202.[Online]. Available: https://royalsocietypublishing.org/doi/10.1098/rsta.2015.0202. [Accessed: 18-Apr-2023].</br>

[22] K. Guillaumier, "Linear Algebra in Data Science and PCA"</br>

In [None]:
#Creating an array to hold the square of each principal component
square=[0]*len(pcaSmallData1[0])
#Looping through all the principal components, and calculating the square for each component
for row in range(len(pcaSmallData1)):
    for col in range(len(pcaSmallData1[row])):
        square[col]+=(pcaSmallData1[row][col])**2 
#Calculating the total Square
totalSquare=sum(square)
#Calculating the Variance Ratio by dividing the square by the total square and multiplying by 100 to obtain a percentage
svdVarianceRatio=(square/totalSquare)*100

</br>
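
Since the columns of **U** have unit length, the sum of squares of each column of **$U.\Sigma$** is simply the corresponding squared singular value. The sketch below, on a hypothetical centred random matrix, checks that the Variance Ratio computed from the projected data matches the one computed directly from the singular values.

```python
import numpy as np

# Hypothetical data: 8 samples, 4 attributes, centred column-wise
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 4))
A = A - A.mean(axis=0)

# Reduced SVD and the projected data (principal component scores)
U, s, VT = np.linalg.svd(A, full_matrices=False)
scores = U * s

# Variance Ratio from the column-wise sum of squares of the projected data
column_squares = (scores ** 2).sum(axis=0)
ratio_from_scores = column_squares / column_squares.sum() * 100

# Variance Ratio computed directly from the squared singular values
ratio_from_singular = s ** 2 / (s ** 2).sum() * 100

print("From projected data: ", np.around(ratio_from_scores, 2))
print("From singular values:", np.around(ratio_from_singular, 2))
```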

###  Visualising the Variance Ratio in the form of a Scree Plot
</br>
Through the use of the Scree Plot, which displays the percentage of variation explained by each principal component, we are able to determine the number of components required to account for a specific percentage of the overall variance in the data [23].

The following methods can be used to determine the optimal number of principal components to retain [23]:
1. **Elbow Method** - This method retains all the principal components which come before the point on the Scree Plot where the curve plateaus (the "elbow").
2. **Kaiser Rule** - This method retains all the principal components whose eigenvalues are at least 1.
3. **Proportion of Variance Plot** - This method retains the smallest number of principal components which together account for a chosen percentage (%) of the variance.



</br></br>
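
Two of the selection methods listed above can be sketched in a few lines. The eigenvalues and the 90% target below are hypothetical values chosen for illustration, and the Elbow Method is omitted since it is usually judged visually from the Scree Plot.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted in descending order
eigenvalues = np.array([2.4, 1.1, 0.3, 0.2])

# Percentage of variance explained by each component, and its running total
variance_ratio = eigenvalues / eigenvalues.sum() * 100
cumulative = np.cumsum(variance_ratio)

# Proportion of Variance: smallest number of components reaching the target percentage
target = 90.0  # assumed target, to be chosen by the analyst
n_by_proportion = int(np.argmax(cumulative >= target)) + 1

# Kaiser Rule: number of components whose eigenvalue is at least 1
n_by_kaiser = int((eigenvalues >= 1).sum())

print("Cumulative variance (%):", cumulative)
print("Components kept by Proportion of Variance:", n_by_proportion)
print("Components kept by Kaiser Rule:", n_by_kaiser)
```

The two rules need not agree, as in this example, so the choice of method remains a judgement call that depends on how much variance one is willing to discard.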

**Citations in this Section:** </br>

[23] S. Mangale, "Scree Plot," Medium, Aug. 28, 2020. [Online]. Available: https://sanchitamangale12.medium.com/scree-plot-733ed72c8608. [Accessed: 18-Apr-2023].</br>

</br>

###  Visual Representation of Scree Plot

In [None]:
#Creating a Scree Plot to visualise the importance of each Principal Component
figScreeSVD = go.Figure()
#Creating Line Graph
figScreeSVD.add_trace(go.Scatter(y=svdVarianceRatio))
#Creating Bar Graph, with values on top rounded to 2 decimal places
figScreeSVD.add_trace(go.Bar(marker_color=svdVarianceRatio,y=svdVarianceRatio, text=list(np.around(np.array(svdVarianceRatio),2)),textposition='outside'))
#Updating layout
figScreeSVD.update_layout(title = 'Scree Plot of PCA via SVD Approach of small dataset from '+filename+':',
                    yaxis_title="Percentage of Explained Variance",
                    xaxis_title="Components",
                    font_color="blue",font_family="verdana", showlegend=False)
#Displaying plot
figScreeSVD.show()

</br>

###  Visual Representation of Principal Components

</br>

**In the code sections below, the best Principal Components obtained will be plotted against each other on separate axes, in order to show the structure of the data after the reduction in dimensions.**

</br>

###  Plotting the best three Principal Components which retain the Highest Variance in a 3D plot:

In [None]:
#Creating a 3D scatter plot, from the respective Principal Components
fig3DSVD = go.Figure(data=go.Scatter3d(
    x=pcaSmallData1.T[0],
    y=pcaSmallData1.T[1],
    z=pcaSmallData1.T[2],
    mode='markers',
    marker=dict(color = pcaSmallData1.T[0],line=dict(width=2000, color='DarkSlateGrey'))
))

#Setting the respective title preferences
fig3DSVD.update_layout(width=1000, height=1000, title = '3D Representation of PCA Data via SVD Approach of small dataset from '+filename+':',
                  font_color="blue",font_family="verdana",
                  scene = dict(xaxis=dict(title="Principal Component 1", titlefont_color='blue'),
                               yaxis=dict(title="Principal Component 2", titlefont_color='blue'),
                               zaxis=dict(title="Principal Component 3", titlefont_color='blue')))
#Displaying plot
fig3DSVD.show()

</br>

###  Plotting the best two Principal Components which retain the Highest Variance in a 2D plot:

In [None]:
#Creating a 2D scatter plot, from the respective Principal Components
fig2DSVD = px.scatter(x=pcaSmallData1.T[0] ,y=pcaSmallData1.T[1])
fig2DSVD.update_layout(title = '2D Representation of PCA Data via SVD Approach of small dataset from '+filename+':',
                    xaxis_title="Principal Component 1",
                    yaxis_title="Principal Component 2",
                    font_color="blue",font_family="verdana",coloraxis_showscale=False)
#Updating settings
fig2DSVD.update_traces(marker=dict(color=pcaSmallData1.T[0],size =10,line=dict(width=2,color='DarkSlateGrey')),selector=dict(mode='markers'))
#Displaying plot
fig2DSVD.show()

</br></br>

<a id='pcaCovariance'></a>
### 8. Understanding PCA - Covariance Matrix Approach

</br>

**What is a Covariance Matrix?** </br>
Another approach which can be used to calculate PCA is the Covariance Matrix Method. The first step in this approach is the construction of the Covariance Matrix, also known as the **Variance-Covariance Matrix**. This matrix is an **n x n** symmetric matrix which holds the covariance between every pair of attributes in a dataset of n attributes. Additionally, the diagonal elements of the matrix hold the variance of each attribute. [24] 

</br><font size="+1">**An example of a 3 x 3 Covariance Matrix for a dataset containing 3 variables x, y, z:**</font></br></br></br>

<div style="text-align: center"><font size="+2.4">$$\begin{bmatrix} cov(x,x) & cov(x,y) & cov(x,z) \\ cov(y,x) & cov(y,y) & cov(y,z) \\ cov(z,x) & cov(z,y) & cov(z,z) \end{bmatrix}$$ </font></div></br>


</br><font size="+1">**Another notation for the Covariance Matrix above is as follows:**</font></br></br></br>

<div style="text-align: center"><font size="+2.4">$$\begin{bmatrix} var(x) & cov(x,y) & cov(x,z) \\ cov(y,x) & var(y) & cov(y,z) \\ cov(z,x) & cov(z,y) & var(z) \end{bmatrix}$$ </font></div></br></br></br>

</br><font size="+1">**The Covariance of two variables x and y can be calculated through the following Formula:**</font></br></br></br>

<div style="text-align: center"><font size="+2.5"> $cov(x,y)=\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N}$ </font></div></br>

<font size="+0.5">
    
**where:**
1. **$\bar{x}$** - is the mean value of the x attribute
2. **$\bar{y}$** - is the mean value of the y attribute
3. **$x_i$** - is the current data value of the x attribute
4. **$y_i$** - is the current data value of the y attribute
5. **$N$** - is the number of data points
</font></br>


**Interpreting Covariance:** </br>
Through Covariance we can determine the direction of the linear relationship between two attributes [25].
- In case both variables increase or decrease simultaneously, then their Covariance is positive.
- In case one variable increases whilst the other decreases simultaneously, then their Covariance is negative.


</br></br>
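
The covariance formula and its interpretation can be illustrated with a minimal sketch on three hypothetical variables, where **y** rises with **x** and **z** falls as **x** rises. The helper name `covariance` is hypothetical, and `np.cov` is called with `bias=True` so that it divides by N, as in the formula.

```python
import numpy as np

# Hypothetical variables: y increases with x, z decreases as x increases
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
z = np.array([8.0, 6.0, 4.0, 2.0])

def covariance(a, b):
    """Covariance of a and b following the formula: mean product of the deviations (divides by N)."""
    return ((a - a.mean()) * (b - b.mean())).sum() / len(a)

cov_xy = covariance(x, y)  # positive: both variables move in the same direction
cov_xz = covariance(x, z)  # negative: the variables move in opposite directions

# Same quantity through NumPy; bias=True makes np.cov divide by N instead of N - 1
numpy_cov_xy = np.cov(x, y, bias=True)[0, 1]

print("cov(x, y) =", cov_xy)
print("cov(x, z) =", cov_xz)
```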

**Citations in this Section:** </br>

[24] CUEMATH, "Covariance Matrix", CUEMATH. [Online]. Available: https://www.cuemath.com/algebra/covariance-matrix/. [Accessed: 18-Apr-2023].</br>

[25] Minitab LLC, "Interpret the key results for Covariance", Minitab Support, 2022. [Online]. Available: https://support.minitab.com/en-us/minitab/21/help-and-how-to/statistics/basic-statistics/how-to/covariance/interpret-the-results/key-results/. [Accessed: 18-Apr-2023].</br>

###  Taking a small subset of the entire dataset if dataset has a larger size than a respective threshold, and working out the PCA algorithm, via the Covariance Approach.
</br>

**Note that the dataset is being reduced to a tenth of its size, whilst maintaining the number of columns, in order to aid the student to better understand the concept, and method of calculation, in case the dataset has a larger size than the respective threshold of 10000.**

In [None]:
#Taking a subset of the dataframe, normalizing it, and converting it to a numpy array, if the dataframe has a size more than 10000
sizeThreshold=10000 
#Checking for the size
if(len(normalizedDF)>sizeThreshold):
    #Dataset is larger than the threshold, therefore keeping only the first tenth of its rows
    smallerDF=NormalizeDF(filteredContinousData[:int(len(normalizedDF)/10)])
else:
    #Dataset is within the threshold, therefore keeping all of its rows
    smallerDF=NormalizeDF(filteredContinousData)
SmalledDFMatrix=smallerDF.to_numpy()

###  Construction of Covariance Matrix
</br>

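**In the code cell below, Students are presented with the construction of the smaller dataset's Covariance Matrix, done programmatically through the np.cov function. Note that np.cov divides by N - 1 by default, whereas the formula above divides by N; passing bias=True reproduces the formula exactly.**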

In [None]:
#Calculating covariance matrix through np.cov function
covMatrix=np.cov(SmalledDFMatrix.T)
pd.DataFrame(covMatrix)

</br>
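
For centred data, the Covariance Matrix can also be written as a single matrix product. The sketch below, on a hypothetical Z-Score normalized random matrix, checks that `np.cov` with its default settings matches **$A^T.A$** divided by N - 1, and that the diagonal holds a variance of 1 for every attribute.

```python
import numpy as np

# Hypothetical data: 10 samples, 3 attributes, Z-Score normalized with the N - 1 convention
rng = np.random.default_rng(2)
A = rng.normal(size=(10, 3))
A = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
m = A.shape[0]

# np.cov expects one attribute per row, hence the transpose
cov_numpy = np.cov(A.T)

# Equivalent matrix product for centred data (default np.cov divides by N - 1)
cov_manual = A.T @ A / (m - 1)

print(np.around(cov_numpy, 3))
```

Because the data is normalized, the off-diagonal entries here are the correlations between the attributes.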

###  Computing the Eigenvectors and Eigenvalues for the Covariance Matrix
</br>

**Why do we compute the Eigen Decomposition for the Covariance Matrix?** </br>

The eigenvectors of the Covariance Matrix are the directions of the axes along which the data has the highest variance (carries the most information), which we refer to as Principal Components. Furthermore, the variance held by each Principal Component is given by the eigenvalues, which are the coefficients associated with the eigenvectors [26]. Therefore, computing the eigenvectors and eigenvalues, and sorting them by largest eigenvalue first, provides the Principal Components in order of importance.

</br>


**In the code cell below, Students are presented with the computation of the Eigen Decomposition of the Covariance Matrix through the .eigh function (intended for symmetric matrices), and sorting the eigenvalues and eigenvectors by the order of largest eigenvalue first.**

</br></br>

**Citations in this Section:** </br>

[26] Z. Jaadi, "A Step-by-Step Explanation of Principal Component Analysis (PCA)", Built In, 2023. [Online]. Available: https://builtin.com/data-science/step-step-explanation-principal-component-analysis. [Accessed: 18-Apr-2023].</br>

In [None]:
#Computing eigen decomposition for covariance matrix
eigenvalues, eigenvectors = LA.eigh(covMatrix)
# Sort the eigenvectors by descending eigenvalues
sortKey = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sortKey]
eigenvectors = eigenvectors[:, sortKey]

</br>
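
The following minimal sketch, on a hypothetical centred random matrix, checks the two properties that this step relies on: each sorted eigenvector **v** satisfies **C.v = λ.v**, and the eigenvalues of the Covariance Matrix equal the squared singular values of the data divided by N - 1.

```python
import numpy as np

# Hypothetical data: 12 samples, 3 attributes, centred column-wise
rng = np.random.default_rng(3)
A = rng.normal(size=(12, 3))
A = A - A.mean(axis=0)

# Covariance Matrix and its Eigen Decomposition (eigh returns ascending eigenvalues)
C = np.cov(A.T)
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Sorting by descending eigenvalue, keeping each eigenvector with its eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# C.v - λ.v should be zero for every eigenvector (stored as columns)
residual = C @ eigenvectors - eigenvectors * eigenvalues

# Link to the SVD Approach: eigenvalue = (singular value)^2 / (N - 1)
s = np.linalg.svd(A, compute_uv=False)
eigenvalues_from_svd = s ** 2 / (A.shape[0] - 1)

print("Eigenvalues:         ", np.around(eigenvalues, 4))
print("From singular values:", np.around(eigenvalues_from_svd, 4))
```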

###  Calculating the PCA by multiplying the Normalised Dataframe with Sorted Eigenvectors of the Covariance Matrix

</br>

This is done as the multiplication of **Normalized Dataframe . Eigenvectors** presents a matrix whose columns give the projections of the data points on each principal axis.

**In the code cell below, Students are presented with the construction of the matrix containing the Principal Components, through the @ matrix multiplication operator.**

In [None]:
#Multiplying the normalized Dataframe with the sorted eigenvectors of the Covariance Matrix
#(the eigenvectors were already sorted by descending eigenvalue in the previous cell, so no further reordering is needed)
pcaSmallData2=SmalledDFMatrix@eigenvectors
#Converting pcaSmallData2 to a pandas DataFrame and displaying it
pd.DataFrame(pcaSmallData2)

</br>

###  Calculating the Variance Ratio for each Principal Component in order to determine which are the best Principal Components in explaining the variation in the data
</br>

Variance ratios are used in PCA to calculate the percentage of the overall variance in the data that each principal component accounts for. Each principal component captures a specific amount of the variation in the data, so computing its variance ratio tells us what share of the overall variation it explains. Through the variance ratio we are therefore able to determine which principal components are crucial for explaining the variation in the data. [21]</br>

**Therefore, we are able to lower the number of dimensions in the data whilst preserving a sizeable portion of the overall variance, ultimately simplifying the data.**</br>


</br><font size="+1">**The Variance Ratio is calculated through the following Formula (obtained from [21-22]):**</font></br></br></br>

<div style="text-align: center"><font size="+2.4"> $\text{Variance Ratio}_i = \frac{\lambda_i}{\sum_j \lambda_j}$ </font></div></br>

<font size="+0.5">
    
**where:**
1. **$\lambda_i$** - is the sum of the squared distances of the data points along the $i$-th principal component
2. **$\sum_j \lambda_j$** - is the sum of the squared distances over all of the principal components
</font></br>

**In the code cell below, Students are presented with the calculation of the Variance Ratio for the current PCA configuration.**


</br></br>

**Citations in this Section:** </br>

[21] I. T. Jolliffe and J. Cadima, "Principal component analysis: a review and recent developments," in The Data Deluge: Can Libraries Cope with E-Science? Proceedings of a Conference Held at the Royal Society, London, UK, 4-5 November 2004, vol. 463, Royal Society Publishing, 2016, pp. 21-36. doi: 10.1098/rsta.2015.0202.[Online]. Available: https://royalsocietypublishing.org/doi/10.1098/rsta.2015.0202. [Accessed: 18-Apr-2023].</br>

[22] K. Guillaumier, "Linear Algebra in Data Science and PCA"</br>

In [None]:
#Creating an array to hold the square of each principal component
square=[0]*len(pcaSmallData2[0])
#Looping through all the principal components, and calculating the square for each component
for row in range(len(pcaSmallData2)):
    for col in range(len(pcaSmallData2[row])):
        square[col]+=(pcaSmallData2[row][col])**2 
#Calculating the total Square
totalSquare=sum(square)
#Calculating the Variance Ratio by dividing the square by the total square and multiplying by 100 to obtain a percentage
covVarianceRatio=(square/totalSquare)*100

</br>

###  Visualising the Variance Ratio in the form of a Scree Plot
</br>
Through the use of the Scree Plot, which displays the percentage of variation explained by each principal component, we are able to determine the number of components required to account for a specific percentage of the overall variance in the data [23].

The following methods can be used to determine the optimal number of principal components to retain [23]:
1. **Elbow Method** - This method retains all the principal components which come before the point on the Scree Plot where the curve plateaus (the "elbow").
2. **Kaiser Rule** - This method retains all the principal components whose eigenvalues are at least 1.
3. **Proportion of Variance Plot** - This method retains the smallest number of principal components which together account for a chosen percentage (%) of the variance.



</br></br>

**Citations in this Section:** </br>

[23] S. Mangale, "Scree Plot," Medium, Aug. 28, 2020. [Online]. Available: https://sanchitamangale12.medium.com/scree-plot-733ed72c8608. [Accessed: 18-Apr-2023].</br>

</br>

###  Visual Representation of Scree Plot

In [None]:
#Creating a Scree Plot to visualise the importance of each Principal Component
figScreeCov = go.Figure()
#Creating Line Graph
figScreeCov.add_trace(go.Scatter(y=covVarianceRatio))
#Creating Bar Graph, with values on top rounded to 2 decimal places
figScreeCov.add_trace(go.Bar(marker_color=covVarianceRatio,y=covVarianceRatio, text=list(np.around(np.array(covVarianceRatio),2)),textposition='outside'))
#Updating layout
figScreeCov.update_layout(title = 'Scree Plot of PCA via Covariance Matrix Approach of small dataset from '+filename+':',
                    yaxis_title="Percentage of Explained Variance",
                    xaxis_title="Components",
                    font_color="blue",font_family="verdana", showlegend=False)
#Displaying plot
figScreeCov.show()

</br>

###  Visual Representation of Principal Components

</br>

**In the code sections below, the best Principal Components obtained will be plotted against each other on separate axes, in order to show the structure of the data after the reduction in dimensions.**

</br>

###  Plotting the best three Principal Components which retain the Highest Variance in a 3D plot:

In [None]:
#Creating a 3D scatter plot, from the respective Principal Components
fig3DCov = go.Figure(data=go.Scatter3d(
    x=pcaSmallData2.T[0],
    y=pcaSmallData2.T[1],
    z=pcaSmallData2.T[2],
    mode='markers',
    marker=dict(color = pcaSmallData2.T[0],line=dict(width=2000, color='DarkSlateGrey'))
))

#Setting the respective title preferences
fig3DCov.update_layout(width=1000, height=1000, title = '3D Representation of PCA Data via Covariance Matrix Approach of small dataset from '+filename+':',
                  font_color="blue",font_family="verdana",
                  scene = dict(xaxis=dict(title="Principal Component 1", titlefont_color='blue'),
                               yaxis=dict(title="Principal Component 2", titlefont_color='blue'),
                               zaxis=dict(title="Principal Component 3", titlefont_color='blue')))
#Displaying plot
fig3DCov.show()

</br>

###  Plotting the best two Principal Components which retain the Highest Variance in a 2D plot:

In [None]:
#Creating a 2D scatter plot, from the respective Principal Components
fig2DCov = px.scatter(x=pcaSmallData2.T[0] ,y=pcaSmallData2.T[1])
fig2DCov.update_layout(title = '2D Representation of PCA Data via Covariance Matrix Approach of small dataset from '+filename+':',
                    xaxis_title="Principal Component 1",
                    yaxis_title="Principal Component 2",
                    font_color="blue",font_family="verdana",coloraxis_showscale=False)
#Updating settings
fig2DCov.update_traces(marker=dict(color=pcaSmallData2.T[0],size =10,line=dict(width=2,color='DarkSlateGrey')),selector=dict(mode='markers'))
#Displaying plot
fig2DCov.show()

</br></br>

<a id='approachCompare'></a>
### 9. Comparisons Between Approaches

</br>
Although both approaches to computing PCA produce essentially the same results, as can be seen in the graphs below, they differ in several respects. In the SVD approach, the principal components are obtained directly by applying the Singular Value Decomposition to the data matrix. In the Covariance Matrix approach, one must first compute the covariance matrix and then apply Eigen Decomposition to it, which makes the process lengthier than the SVD approach. The Covariance approach can also be memory inefficient, since an additional matrix must be constructed and stored before any reduction in dimensionality takes place. Performance-wise, PCA via SVD is often quicker and more numerically stable than PCA via the covariance matrix. However, in some circumstances, such as when the data includes missing values or when the data is not centred, PCA via the covariance matrix may be preferred. [20]

</br>

**In the code sections below, the Scree Plots and 2D Plots of each approach are shown one after the other, so that students can compare the two approaches visually. There might be some discrepancies between the plots of the two approaches: a graph may appear mirrored along an axis, since the sign (direction) of each eigenvector is arbitrary and may differ between the two decompositions.**
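
The mirroring effect can be reproduced on a small scale. The following is a minimal sketch on hypothetical random data (not the notebook's dataset): it computes the projected data with both approaches and checks that they agree once the sign of each component is aligned. The variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 correlated features (not the notebook's dataset)
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.1],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)  # centre each feature

# SVD approach: the rows of Vt are the principal directions
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_svd = Xc @ Vt.T

# Covariance approach: eigenvectors of the covariance matrix,
# reordered by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
scores_cov = Xc @ eigvecs[:, order]

# Flip every covariance-based component that points the opposite way
signs = np.sign(np.sum(scores_svd * scores_cov, axis=0))
scores_cov_aligned = scores_cov * signs

print("Signs needed to align the components:", signs)
print("Equal after alignment:", np.allclose(scores_svd, scores_cov_aligned, atol=1e-6))
```

A sign of -1 for a component means that the corresponding axis of the plot would appear mirrored between the two approaches, even though the underlying structure is identical.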


</br></br>

**Citations in this Section:** </br>

[20] M. E. Wall, A. Rechtsteiner, and L. M. Rocha, "Singular Value Decomposition and Principal Component Analysis," in Learning from Data: Concepts, Theory, and Methods, vol. 2, Springer, Boston, MA, 2007, pp. 151-176, doi: 10.1007/0-306-47815-3_5. [Online]. Available: https://www.researchgate.net/publication/2167923_Singular_Value_Decomposition_and_Principal_Component_Analysis. [Accessed: 18-Apr-2023].</br>

In [None]:
figScreeSVD.show()

In [None]:
figScreeCov.show()

In [None]:
fig2DSVD.show()

In [None]:
fig2DCov.show()

</br></br>

<a id='pcaEntire'></a>
### 10. Working out PCA on the Entire Dataset 

</br>

The **PCA** implementations above were written for the sole purpose of educating students on how the algorithm functions. In practice, using the algorithm does not require the lengthy implementation of the previous sections, as one can simply use the PCA class from the **scikit-learn library** through the import: **from sklearn.decomposition import PCA**.
</br></br>

Additionally, it is interesting to note that computing PCA via the NumPy library sometimes crashes the notebook on large datasets, whilst the PCA from the scikit-learn library does not. This is because, under its default solver selection policy as documented at the time of writing, scikit-learn uses the randomized SVD proposed by Halko et al. [27] whenever the input data is larger than 500 x 500 and the number of components to extract is lower than 80% of the smallest dimension of the data [28]. Otherwise, the exact full SVD is computed and optionally truncated afterwards [28]. Furthermore, NumPy stores each array in a contiguous block of memory, so the notebook may crash if the computer has insufficient memory to hold the matrices produced by the decomposition [29].
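
One way to see how memory can become a problem is to compare the sizes of the matrices returned by `np.linalg.svd`. The sketch below uses a hypothetical tall matrix and assumes that the full decomposition (`full_matrices=True`, the NumPy default) was requested; whether this was the exact cause of a given crash is an assumption.

```python
import numpy as np

# Hypothetical tall matrix: many samples, few features
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))

# Full SVD: U is (samples x samples), so it grows quadratically with the rows
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=True)

# Reduced SVD: U is only (samples x features)
U_thin, s_thin, Vt_thin = np.linalg.svd(X, full_matrices=False)

print("Full U:   ", U_full.shape, U_full.nbytes, "bytes")
print("Reduced U:", U_thin.shape, U_thin.nbytes, "bytes")
print("Same singular values:", np.allclose(s_full, s_thin))
```

Since PCA only needs the singular values and the right singular vectors, the reduced decomposition is sufficient and avoids allocating the large square matrix.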

</br>

The **Randomized SVD** proposed by Halko et al. approximates the leading part of the SVD at a substantially lower computational cost: the matrix is first projected onto a small random subspace that captures most of its range, and an exact SVD is then computed on the resulting, much smaller matrix [27]. This explains the enhanced efficiency of the scikit-learn approach when compared to the NumPy approach.
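
The idea can be illustrated with a short NumPy sketch in the spirit of [27]. This is a simplified illustration under stated assumptions, not the scikit-learn implementation: the helper name `randomized_svd_sketch`, its parameters, and the test matrix are all hypothetical.

```python
import numpy as np

def randomized_svd_sketch(A, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k SVD of A via a random projection (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # 1. Sample the range of A with a random Gaussian test matrix
    omega = rng.normal(size=(A.shape[1], k + oversample))
    Y = A @ omega
    # 2. A few power iterations emphasise the dominant singular directions
    for _ in range(n_power_iter):
        Y, _ = np.linalg.qr(Y)
        Y = A @ (A.T @ Y)
    # 3. Orthonormal basis Q for the sampled range
    Q, _ = np.linalg.qr(Y)
    # 4. Exact SVD of the much smaller matrix B = Q^T A
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

# Hypothetical matrix: rank-5 signal plus a small amount of noise
rng = np.random.default_rng(1)
A = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 200))
A += 0.01 * rng.normal(size=(1000, 200))

U, s, Vt = randomized_svd_sketch(A, k=5)
s_exact = np.linalg.svd(A, compute_uv=False)[:5]
print("Largest relative error:", np.max(np.abs(s - s_exact) / s_exact))
```

The expensive exact SVD is only applied to a matrix with `k + oversample` rows, which is where the savings come from when only a few components are required.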

</br>

**The code section below shows the implementation of the PCA algorithm via the scikit-learn library.**


</br></br>

**Citations in this Section:** </br>

[27] N. Halko, P. G. Martinsson, and J. A. Tropp, "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," arXiv preprint arXiv:0909.4061, 2009. [Online]. Available: https://arxiv.org/abs/0909.4061. [Accessed: 18-Apr-2023].</br>

[28] Scikit-learn, "sklearn.decomposition.PCA", scikit-learn.org. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html. [Accessed: 18-Apr-2023].</br>

[29] M. Kumar, "Memory error in NumPy SVD," Stack Overflow, 2014. [Online]. Available: https://stackoverflow.com/questions/21180298/memory-error-in-numpy-svd. [Accessed: 18-Apr-2023].</br>

In [None]:
#Calling PCA algorithm from sklearn library, and feeding it the normalized Dataframe in its entirety
pca = PCA(n_components=len(normalizedDF.columns))
pca.fit(normalizedDF)
pcaData=pca.transform(normalizedDF)
#Transposing the matrix, so that each row holds the values of one Principal Component
pcaData=pcaData.transpose()

</br></br>

<a id='pcaVisualisations'></a>
### 11. PCA Visualisations


</br>

###  Visualising the Variance Ratio in the form of a Scree Plot
</br>
Through the use of the Scree Plot, which displays the percentage of variance explained by each principal component, we are able to determine the number of components required to account for a specific percentage of the overall variance in the data [23].

The following methods can be used to determine the optimal number of principal components to retain [23]:
1. **Elbow Method** - This method locates the point on the Scree Plot where the curve begins to plateau (the "elbow"), and retains the principal components before this point.
2. **Kaiser Rule** - This method retains all the principal components whose eigenvalues are at least 1.
3. **Proportion of Variance Plot** - This method retains the smallest number of principal components which together explain a chosen percentage (%) of the total variance.
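
The three rules can be expressed in a few lines of NumPy. The following is a minimal sketch on hypothetical eigenvalues (assumed to come from standardised data and to be sorted in descending order). The 95% threshold and the "largest drop" heuristic used for the elbow are illustrative assumptions, as the elbow is usually judged by eye.

```python
import numpy as np

# Hypothetical eigenvalues from standardised data, sorted in descending order
eigenvalues = np.array([2.5, 2.0, 0.3, 0.15, 0.05])
ratio = eigenvalues / eigenvalues.sum()  # explained variance ratio
cumulative = np.cumsum(ratio)            # cumulative explained variance

# Kaiser Rule: keep components whose eigenvalue is at least 1
n_kaiser = int(np.sum(eigenvalues >= 1))

# Proportion of Variance: smallest number of components reaching the threshold
threshold = 0.95  # assumed target, to be chosen according to the task
n_variance = int(np.searchsorted(cumulative, threshold) + 1)

# Elbow Method (simple heuristic): keep the components before the largest drop
drops = ratio[:-1] - ratio[1:]
n_elbow = int(np.argmax(drops) + 1)

print("Kaiser Rule:", n_kaiser)
print("Proportion of Variance:", n_variance)
print("Elbow Method:", n_elbow)
```

With scikit-learn, the same quantities would be available from `pca.explained_variance_` and `pca.explained_variance_ratio_` after fitting.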



</br></br>

**Citations in this Section:** </br>

[23] S. Mangale, "Scree Plot," Medium, Aug. 28, 2020. [Online]. Available: https://sanchitamangale12.medium.com/scree-plot-733ed72c8608. [Accessed: 18-Apr-2023].</br>

In [None]:
#Creating a Scree Plot to visualise the importance of each Principal Component
figScreePca = go.Figure()
#Creating Line Graph (multiplying variance ratio by 100 to transform into percentage)
figScreePca.add_trace(go.Scatter(y=pca.explained_variance_ratio_*100))
#Creating Bar Graph, with values on top rounded to 2 decimal place
figScreePca.add_trace(go.Bar(marker_color=pca.explained_variance_ratio_*100,y=pca.explained_variance_ratio_*100, text=list(np.around(np.array(pca.explained_variance_ratio_*100),2)),textposition='outside'))
#Updating layout
figScreePca.update_layout(title = 'Scree Plot of PCA Algorithm on the entire dataset from '+filename+':',
                    yaxis_title="Percentage of Explained Variance",
                    xaxis_title="Components",
                    font_color="blue",font_family="verdana", showlegend=False)
#Displaying plot
figScreePca.show()

</br>

###  Visual Representation of Principal Components

</br>

**In the code sections below, the best Principal Components obtained are plotted on different axes, in order to show the relationship between the data points after the reduction in dimensions.**

</br>

###  Plotting the best three Principal Components which retain the Highest Variance in a 3D plot:

In [None]:
#Creating a 3D scatter plot, from the respective Principal Components
fig3DPca = go.Figure(data=go.Scatter3d(
    x=pcaData[0],
    y=pcaData[1],
    z=pcaData[2],
    mode='markers',
    marker=dict(color = pcaData[0],line=dict(width=2000, color='DarkSlateGrey'))
))

#Setting the respective title preferences
fig3DPca.update_layout(width=1000, height=1000, title = '3D Representation of PCA Algorithm on the entire dataset from '+filename+':',
                  font_color="blue",font_family="verdana",
                  scene = dict(xaxis=dict(title=dict(text="Principal Component 1", font=dict(color='blue'))),
                               yaxis=dict(title=dict(text="Principal Component 2", font=dict(color='blue'))),
                               zaxis=dict(title=dict(text="Principal Component 3", font=dict(color='blue')))))
#Displaying plot
fig3DPca.show()

</br>

###  Plotting the best two Principal Components which retain the Highest Variance in a 2D plot:

In [None]:
#Creating a 2D scatter plot, from the respective Principal Components
fig2DPca = px.scatter(x=pcaData[0] ,y=pcaData[1])
fig2DPca.update_layout(title = '2D Representation of PCA Algorithm on the entire dataset from '+filename+':',
                    xaxis_title="Principal Component 1",
                    yaxis_title="Principal Component 2",
                    font_color="blue",font_family="verdana",coloraxis_showscale=False)
#Updating settings
fig2DPca.update_traces(marker=dict(color=pcaData[0],size =10,line=dict(width=2,color='DarkSlateGrey')),selector=dict(mode='markers'))
#Displaying plot
fig2DPca.show()

</br>

###  Plotting the best Principal Component which retains the Highest Variance in a 1D plot:

In [None]:
#Creating a 1D scatter plot, from the respective Principal Components
fig1DPca = px.scatter(x=pcaData[0] ,y=[0]*len(pcaData[0]))
fig1DPca.update_layout(title = '1D Representation of PCA Algorithm on the entire dataset from '+filename+':',
                    xaxis_title="Principal Component 1",
                    yaxis_title="No Axis",
                    font_color="blue",font_family="verdana",coloraxis_showscale=False)
#Updating settings
fig1DPca.update_traces(marker=dict(color=pcaData[0],size =10,line=dict(width=2,color='DarkSlateGrey')),selector=dict(mode='markers'))
#Displaying plot
fig1DPca.show()

</br></br>

<a id='conclusion'></a>
### 12. Conclusions and Limitations of PCA

</br>

**Summary**</br>

In this notebook we have explained and delved deeply into the inner workings of the PCA algorithm. We began by covering the basic principles of PCA and how it may be applied for dimensionality reduction and feature selection. We then covered in depth the mathematical principles behind PCA, namely eigenvectors and eigenvalues, and the two approaches to computing it: the **Covariance Matrix approach** and the **SVD approach**. Moreover, through the various visualisation tools presented, we were also able to perceive and understand various data patterns and distributions. Throughout this notebook we have also discussed at length the different encoding algorithms which can be used to transform discrete data into a form that can be interpreted by the PCA algorithm.

</br>

**Advantages of PCA [30]:**
1. **Easily calculable** - PCA relies on standard linear algebra operations, which are simple to compute.
2. **Accelerating other Machine Learning Algorithms** - Such algorithms tend to converge faster when trained on the principal components, rather than on the original dataset.
3. **Minimising the issues of High-Dimensional Data** - Utilising the PCA algorithm to reduce the number of dimensions helps to lower the risk of predictive algorithms overfitting, so that such algorithms learn the robust features in the data.

</br>

**Disadvantages of PCA [30]:**
1. **Principal Components are hard to interpret** - Each principal component is a linear combination of the original features, so after computing the principal components it is quite challenging to determine which features in the dataset are the most crucial.
2. **Information loss and Dimensionality reduction Trade-off** - Whilst employing PCA, one must choose a balanced trade-off between dimensionality reduction and information loss, as dimensionality reduction comes at the cost of information loss.
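
The trade-off can be quantified by projecting the data onto the first k components, mapping it back, and measuring what was lost. The following is a minimal sketch on hypothetical random data; the feature spreads and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy data: 300 samples, 4 features with decreasing spread
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4)) * np.array([5.0, 2.0, 1.0, 0.2])
Xc = X - X.mean(axis=0)

# The rows of Vt are the principal directions, s holds the singular values
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
total = np.sum(Xc ** 2)

lost, retained = [], []
for k in range(1, 5):
    # Project onto the first k components, then map back to the original space
    X_hat = (Xc @ Vt[:k].T) @ Vt[:k]
    lost.append(np.sum((Xc - X_hat) ** 2) / total)
    retained.append(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    print(f"k={k}: retained={retained[-1]:.4f}, lost={lost[-1]:.4f}")
```

For every k, the retained and lost shares add up to 1, so each additional component that is dropped translates directly into reconstruction error.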

</br>

**Assumptions taken by PCA [30]:**
1. **PCA assumes that features are correlated** - If the features are not correlated, PCA cannot combine them into a smaller set of informative principal components.
2. **PCA assumes the connection between features is linear** - The PCA algorithm cannot capture non-linear relationships.

</br>

**Limitations of PCA [30]:**
1. **The Scale of the features affects PCA** - If given unnormalised/unscaled data (where some features have a high variance and others a low variance), the principal components will be dominated by the high-variance features.
2. **PCA lacks resistance to outliers** - The PCA algorithm can be biased towards outliers in the dataset. Thus, it is recommended to remove outliers beforehand.
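
The effect of scale can be demonstrated with two features measured in very different units. The sketch below uses hypothetical, independently generated "height" and "income" columns, and the helper `first_component` is an illustrative assumption.

```python
import numpy as np

# Hypothetical independent features in very different units
rng = np.random.default_rng(0)
height_m = rng.normal(1.7, 0.1, size=500)
income = rng.normal(30000, 8000, size=500)
X = np.column_stack([height_m, income])

def first_component(data):
    """Return the first principal direction of the centred data."""
    centred = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return Vt[0]

pc1_raw = first_component(X)
pc1_scaled = first_component((X - X.mean(axis=0)) / X.std(axis=0))

print("Unscaled loadings:    ", np.abs(pc1_raw))
print("Standardised loadings:", np.abs(pc1_scaled))
```

On the unscaled data the first component lies almost entirely along the income axis, simply because its numbers are larger. After standardisation both features contribute with equal magnitude.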


</br>

**In this notebook, we have explored the vanilla version of the PCA algorithm; however, there are various other types of PCA.**

</br>

**Types of PCA [31-32]:**
1. **Sparse PCA** - Sparse PCA makes use of sparse loadings, in an attempt to generate models that are simple to understand.
2. **Randomized PCA** - Randomized PCA makes use of randomized singular value decomposition, in order to quickly approximate the first K principal components.
3. **Incremental PCA** - Incremental PCA splits the dataset into mini-batches, and loads each mini-batch into memory one at a time.
4. **Kernel PCA** - Kernel PCA is a method that projects linearly inseparable data into a higher dimension where it is linearly separable, using the so-called kernel trick. Moreover, there are various kernels, such as linear, polynomial, RBF, and sigmoid. **Furthermore, Kernel PCA can be utilised to address one of the assumptions/limitations of PCA, i.e., that PCA only captures linear relationships.**
5. **Robust PCA** - Robust PCA is another version of PCA which is more resilient to outliers and data errors. Additionally, Robust PCA identifies the robust principal components in the presence of data faults and outliers.
</br>
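
Some of these variants are available in scikit-learn under the same `fit`/`transform` interface as `PCA`. The following is a minimal sketch on a hypothetical dataset of two concentric circles; the dataset, the RBF kernel with `gamma=10`, and the batch size are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA, IncrementalPCA

# Hypothetical non-linear dataset: two concentric circles
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Vanilla PCA only rotates the data, so the circles remain nested
linear_scores = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel can capture the non-linear structure
kernel_scores = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Incremental PCA is fitted one mini-batch at a time
ipca = IncrementalPCA(n_components=2)
for batch in np.array_split(X, 4):
    ipca.partial_fit(batch)
incremental_scores = ipca.transform(X)

print(linear_scores.shape, kernel_scores.shape, incremental_scores.shape)
```

Plotting `kernel_scores` coloured by `y` would be a natural way to check whether the two circles are pulled apart, which vanilla PCA is not expected to achieve on this kind of data.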

**If you are interested in learning more about the different types of PCA, you might want to look at [31], as it explains the different types of PCA whilst showing implementations of them in Python.**

</br></br></br>

## <div style="text-align: center"> Thank you for your time and attention. Have a Good one :) </div>


</br></br>

**Citations in this Section:** </br>

[30] Keboola, "A Guide to Principal Component Analysis (PCA) for Machine Learning", keboola.com, Apr. 02, 2022. [Online]. Available: https://www.keboola.com/blog/pca-machine-learning. [Accessed: 18-Apr-2023].</br>

[31] N. B. Subramanian, "Types of PCA", aiaspirant.com. [Online]. Available: https://aiaspirant.com/types-of-pca/. [Accessed: 18-Apr-2023].</br>

[32] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust Principal Component Analysis?", Journal of the ACM (JACM), vol. 58, no. 3, 2011. [Online]. Available: https://arxiv.org/pdf/0912.3599.pdf. [Accessed: 18-Apr-2023].</br>

</br></br>

<a id='references'></a>
### 13. References

</br>

[1] S. Mishra et al., "Multivariate Statistical Data Analysis-Principal Component Analysis," Int. J. Livest. Res., vol. 1, pp. 1-6, 2017, [Online]. Available: https://www.researchgate.net/publication/316652806_Principal_Component_Analysis. [Accessed: 18-Apr-2023].</br>


[2] D. Li and S. Liu, "4.2.3.1 Principal Component Analysis," in Water Quality Monitoring and Management: Basis, Technology and Case Studies, 1st ed., S. K. Gupta and R. Kumar, Eds. Amsterdam, Netherlands: Elsevier, 2019, [Online]. Available: https://www.sciencedirect.com/topics/agricultural-and-biological-sciences/principal-component-analysis. [Accessed: 18-Apr-2023].</br>

[3] DEVAKUMAR K. P., "COVID-19 Dataset", Kaggle, 2020. [Online]. Available: https://www.kaggle.com/datasets/imdevskp/corona-virus-report?select=country_wise_latest.csv. [Accessed: 18-Apr-2023].</br>
 
[4] UCI MACHINE LEARNING, "Pima Indians Diabetes Database", Kaggle, 2016. [Online]. Available: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database. [Accessed: 18-Apr-2023].</br>

[5] S. BANERJEE, "FIFA - Football World Cup Dataset", Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/iamsouravbanerjee/fifa-football-world-cup-dataset?select=FIFA+-+2014.csv. [Accessed: 18-Apr-2023].</br>

[6] MATHNERD, "Iris Flower Dataset", Kaggle, 2018. [Online]. Available: https://www.kaggle.com/datasets/arshid/iris-flower-dataset. [Accessed: 18-Apr-2023].</br>

[7] S. BANERJEE, "Software Industry Salary Dataset - 2022", Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/iamsouravbanerjee/software-professional-salaries-2022. [Accessed: 18-Apr-2023].</br>

[8] R. Holbrook and A. Cook, "Principal Component Analysis, spotify.csv", Kaggle. [Online]. Available: https://www.kaggle.com/code/ryanholbrook/principal-component-analysis/data?select=spotify.csv. [Accessed: 18-Apr-2023].</br>

[9] RUTHGN, "Wine Quality Data Set (Red & White Wine)", Kaggle, 2022. [Online]. Available: https://www.kaggle.com/datasets/ruthgn/wine-quality-data-set-red-white-wine. [Accessed: 18-Apr-2023].</br>

[10] V. Karthik, "PCA for categorical features", Stack Overflow, Dec. 2016. [Online]. Available: https://stackoverflow.com/questions/40795141/pca-for-categorical-features#:~:text=PCA%20is%20designed%20for%20continuous,yes%2C%20you%20can%20use%20PCA. [Accessed: 18-Apr-2023].</br>

[11] Datagy. "Pandas get_dummies (One-Hot Encoding) Explained," Datagy.io, Feb. 2021. [Online]. Available: https://datagy.io/pandas-get-dummies/. [Accessed: 18-Apr-2023].</br>

[12] DataCamp. "Dealing with Categorical Data". DataCamp, 2021. [Online]. Available: https://www.datacamp.com/tutorial/categorical-data. [Accessed: 18-Apr-2023].</br>

[13] B. Roy, "All about Categorical Variable Encoding," Towards Data Science, Jul. 2, 2019. [Online]. Available: https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02. [Accessed: 18-Apr-2023].</br>

[14] T. Crosley, "What is the binary to decimal decoder?", Quora, May 8, 2018. [Online]. Available: https://www.quora.com/What-is-the-binary-to-decimal-decoder. [Accessed: 18-Apr-2023].</br>

[15] Pandas. "pandas.factorize()". pandas 1.4.0 documentation, Jan. 07, 2022. [Online]. Available: https://pandas.pydata.org/docs/reference/api/pandas.factorize.html. [Accessed: 18-Apr-2023].</br>

[16] J. Brownlee, "One-Hot Encoding for Categorical Data," Machine Learning Mastery, Aug. 17, 2020. [Online]. Available: https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/. [Accessed: 18-Apr-2023].</br>

[17] Vatsal, "Word2Vec Explained", Towards Data Science, Jul. 29, 2021. [Online]. Available: https://towardsdatascience.com/word2vec-explained-49c52b4ccb71. [Accessed: 18-Apr-2023].</br>

[18] Stack Exchange. "Why do we need to normalize data before Principal Component Analysis (PCA)?", Cross Validated, May 26, 2014. [Online]. Available: https://stats.stackexchange.com/questions/69157/why-do-we-need-to-normalize-data-before-principal-component-analysis-pca. [Accessed: 18-Apr-2023].</br>

[19] R. Sharma. "What is Normalization in Data Mining and How to Do It?", UpGrad, Sep. 22, 2022. [Online]. Available: https://www.upgrad.com/blog/normalization-in-data-mining/#:~:text=Project%20Ideas%20%26%20Topics-,Z%2DScore%20Normalization,up%20to%20%2B3%20standard%20deviation. [Accessed: 18-Apr-2023].</br>

[20] M. E. Wall, A. Rechtsteiner, and L. M. Rocha, "Singular Value Decomposition and Principal Component Analysis," in Learning from Data: Concepts, Theory, and Methods, vol. 2, Springer, Boston, MA, 2007, pp. 151-176, doi: 10.1007/0-306-47815-3_5. [Online]. Available: https://www.researchgate.net/publication/2167923_Singular_Value_Decomposition_and_Principal_Component_Analysis. [Accessed: 18-Apr-2023].</br>

[21] I. T. Jolliffe and J. Cadima, "Principal component analysis: a review and recent developments," in The Data Deluge: Can Libraries Cope with E-Science? Proceedings of a Conference Held at the Royal Society, London, UK, 4-5 November 2004, vol. 463, Royal Society Publishing, 2016, pp. 21-36. doi: 10.1098/rsta.2015.0202. [Online]. Available: https://royalsocietypublishing.org/doi/10.1098/rsta.2015.0202. [Accessed: 18-Apr-2023].</br>

[22] K. Guillaumier, "Linear Algebra in Data Science and PCA"</br>

[23] S. Mangale, "Scree Plot," Medium, Aug. 28, 2020. [Online]. Available: https://sanchitamangale12.medium.com/scree-plot-733ed72c8608. [Accessed: 18-Apr-2023].</br>

[24] CUEMATH, "Covariance Matrix", CUEMATH. [Online]. Available: https://www.cuemath.com/algebra/covariance-matrix/. [Accessed: 18-Apr-2023].</br>

[25] Minitab LLC, "Interpret the key results for Covariance", Minitab Support, 2022. [Online]. Available: https://support.minitab.com/en-us/minitab/21/help-and-how-to/statistics/basic-statistics/how-to/covariance/interpret-the-results/key-results/. [Accessed: 18-Apr-2023].</br>

[26] Z. Jaadi, "A Step-by-Step Explanation of Principal Component Analysis (PCA)", Built In, 2023. [Online]. Available: https://builtin.com/data-science/step-step-explanation-principal-component-analysis. [Accessed: 18-Apr-2023].</br>

[27] N. Halko, P. G. Martinsson, and J. A. Tropp, "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," arXiv preprint arXiv:0909.4061, 2009. [Online]. Available: https://arxiv.org/abs/0909.4061. [Accessed: 18-Apr-2023].</br>

[28] Scikit-learn, "sklearn.decomposition.PCA", scikit-learn.org. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html. [Accessed: 18-Apr-2023].</br>

[29] M. Kumar, "Memory error in NumPy SVD," Stack Overflow, 2014. [Online]. Available: https://stackoverflow.com/questions/21180298/memory-error-in-numpy-svd. [Accessed: 18-Apr-2023].</br>

[30] Keboola, "A Guide to Principal Component Analysis (PCA) for Machine Learning", keboola.com, Apr. 02, 2022. [Online]. Available: https://www.keboola.com/blog/pca-machine-learning. [Accessed: 18-Apr-2023].</br>

[31] N. B. Subramanian, "Types of PCA", aiaspirant.com. [Online]. Available: https://aiaspirant.com/types-of-pca/. [Accessed: 18-Apr-2023].</br>

[32] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust Principal Component Analysis?", Journal of the ACM (JACM), vol. 58, no. 3, 2011. [Online]. Available: https://arxiv.org/pdf/0912.3599.pdf. [Accessed: 18-Apr-2023].</br>

</br></br></br>

</br>