# Principal Component Analysis :

There are as many principal components as there are variables in the data, and they are constructed so that the first principal component accounts for the largest possible variance in the dataset. For example, suppose the scatter plot of our dataset looks like the one below. Can we guess the first principal component? Yes: it is approximately the line that matches the purple marks, because it passes through the origin (the data are centered) and it is the line onto which the projections of the points (red dots) are the most spread out. Mathematically speaking, it is the line that maximizes the variance, i.e. the average of the squared distances from the projected points (red dots) to the origin.

<img src="https://builtin.com/sites/default/files/inline-images/Principal%20Component%20Analysis%20second%20principal.gif">
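To make the "maximum variance" idea concrete, here is a minimal NumPy sketch on synthetic 2-D data (the data, the angle grid, and the helper `projected_variance` are assumptions for illustration, not part of this notebook). It scans candidate lines through the origin, measures the variance of the projected points, and compares the best line with the top eigenvector of the covariance matrix.

```python
import numpy as np

# Synthetic 2-D data (assumption): elongated along a diagonal direction
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])
X = X - X.mean(axis=0)  # center so every candidate line passes through the origin

def projected_variance(X, angle):
    """Variance of the points projected onto the unit direction at `angle` (radians)."""
    u = np.array([np.cos(angle), np.sin(angle)])
    return np.mean((X @ u) ** 2)  # average squared distance of projections from the origin

# Brute-force scan over directions in [0, pi)
angles = np.linspace(0, np.pi, 1801)
variances = np.array([projected_variance(X, a) for a in angles])
best = angles[variances.argmax()]
best_dir = np.array([np.cos(best), np.sin(best)])

# Compare with the top eigenvector of the covariance matrix
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
pc1 = eigvecs[:, -1]

print(abs(best_dir @ pc1))           # close to 1: same line (the sign of pc1 is arbitrary)
print(variances.max(), eigvals[-1])  # best projected variance is about the largest eigenvalue
```

The scan is only for intuition; the eigen-decomposition in the steps below finds the same direction directly.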

# Steps Involved in the PCA : 

Step 1: Standardize the dataset.

Step 2: Calculate the covariance matrix for the features in the dataset.

Step 3: Calculate the eigenvalues and eigenvectors of the covariance matrix.

Step 4: Sort the eigenvalues in decreasing order, together with their corresponding eigenvectors.

Step 5: Pick the k largest eigenvalues and form a matrix whose columns are the corresponding eigenvectors.

Step 6: Transform the original matrix by projecting the standardized data onto these k eigenvectors.
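The six steps can be condensed into one function. This is a sketch assuming a plain NumPy array as input; `pca_from_scratch` is a hypothetical helper name and the random data only stand in for the notebook's CSV.

```python
import numpy as np

def pca_from_scratch(X, k):
    """Sketch of the six PCA steps; X is an (n_samples, n_features) array."""
    # Step 1: standardize each feature (z-scores)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized features (d x d)
    C = np.cov(Z, rowvar=False)
    # Step 3: eigen-decomposition; eigh suits symmetric matrices such as C
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 4: sort eigenpairs by decreasing eigenvalue
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: keep the k leading eigenvectors as the columns of W (d x k)
    W = eigvecs[:, :k]
    # Step 6: project the standardized data onto the k-dimensional subspace
    return Z @ W, eigvals[:k]

# Usage on synthetic data (assumption; not the notebook's CSV)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
scores, top_eigvals = pca_from_scratch(X, k=2)
print(scores.shape)  # (100, 2)
```

A quick sanity check: the projected columns are uncorrelated, and each has a variance equal to its eigenvalue, so `np.cov(scores, rowvar=False)` should be diagonal with `top_eigvals` on the diagonal.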


In [None]:
import pandas as pd
import seaborn as sns 
import numpy as np 
import matplotlib.pyplot as plt 
from numpy.linalg import eig 
from sklearn.decomposition import PCA 
import plotly.express as px
%matplotlib inline

In [None]:
df = pd.read_csv("../input/class12/Class2.csv")
df1 = df.copy()  # untouched copy; keeps the 'gender' column for coloring the plot later
df

In [None]:
df.drop(["id", "gender"], axis=1, inplace=True)  # drop the identifier and the categorical column; PCA needs numeric features

# Step 1: Standardize the Dataset :
The dataset below has 4 features and a total of 100 training examples.

In [None]:
df

First, we standardize the dataset. For that, we calculate the mean and standard deviation of each feature, then subtract the mean from every value of that feature and divide by its standard deviation.

<img src="https://miro.medium.com/max/333/1*X4YeGxtzOhnnOWBfoBBJfA.png">

In [None]:
df_std = (df - df.mean()) / df.std()  # z-score: subtract each feature's mean, divide by its std

In [None]:
df_std
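A common pitfall is operator precedence: without parentheses, `df - df.mean() / df.std()` divides only the mean by the standard deviation. The sketch below uses a small toy DataFrame (an assumption standing in for `df`) to show that only the parenthesized version gives columns with mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the notebook's `df` (assumption for illustration)
toy = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [10.0, 20.0, 30.0, 60.0]})

wrong = toy - toy.mean() / toy.std()    # precedence: only the mean gets divided
right = (toy - toy.mean()) / toy.std()  # z-score: (x - mean) / std

print(right.mean())  # each column is approximately 0
print(right.std())   # each column is approximately 1 (pandas uses ddof=1 by default)
print(wrong.mean())  # not 0: the data were shifted, not standardized
```

Note that pandas' `std()` uses the sample standard deviation (ddof=1), whereas scikit-learn's `StandardScaler` uses the population version (ddof=0); the results differ by a constant factor per column, which does not change the principal directions.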

# Step 2: Create the covariance matrix :

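Next, we create the covariance matrix of the standardized dataset. The code below uses pandas' `DataFrame.cov()`, which computes the sample covariance matrix (dividing by n − 1). NumPy's `cov()` with `bias=True` would instead give the population covariance matrix (dividing by n); the two differ only by a constant factor, so they have the same eigenvectors.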
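Both variants come from the same formula, the centered cross-product matrix divided by either n − 1 or n. A minimal NumPy sketch, using random data as a stand-in for the standardized features (an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 3))     # stand-in for the standardized data (assumption)
Zc = Z - Z.mean(axis=0)          # center each column
n = Z.shape[0]

sample_cov = Zc.T @ Zc / (n - 1)   # np.cov default (and pandas DataFrame.cov)
population_cov = Zc.T @ Zc / n     # np.cov(..., bias=True)

print(np.allclose(sample_cov, np.cov(Z, rowvar=False)))                 # True
print(np.allclose(population_cov, np.cov(Z, rowvar=False, bias=True)))  # True

# Only the eigenvalues rescale by (n - 1) / n; the eigenvectors are unchanged
w_s, _ = np.linalg.eigh(sample_cov)
w_p, _ = np.linalg.eigh(population_cov)
print(np.allclose(w_p, w_s * (n - 1) / n))  # True
```

`rowvar=False` tells NumPy that each column is a variable, which matches the DataFrame layout used in this notebook.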

In [None]:
df_std.cov()  # covariance matrix of the standardized features (sample covariance, n - 1)

In [None]:
df_std.var()  # per-feature variances: the diagonal of the covariance matrix

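# Step 3: Calculate eigenvalues and eigenvectors :

An eigenvector of a linear transformation is a nonzero vector that changes at most by a scalar factor when the transformation is applied to it. The corresponding eigenvalue is the factor by which the eigenvector is scaled.

Let A be a square matrix (in our case, the covariance matrix), ν a nonzero vector and λ a scalar such that Aν = λν. Then λ is called an eigenvalue of A, and ν is an eigenvector of A associated with λ.

Rearranging the above equation:

$$A\nu - \lambda\nu = 0 \quad\Longrightarrow\quad (A - \lambda I)\nu = 0$$

Because ν is nonzero, this system has a solution only when $A - \lambda I$ is singular, i.e. when $\det(A - \lambda I) = 0$; the roots of this equation are the eigenvalues.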
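A worked 2 × 2 example, where the matrix A is an arbitrary illustration rather than the notebook's covariance matrix: for A = [[2, 1], [1, 2]], det(A − λI) = (2 − λ)² − 1 = 0 gives λ = 1 and λ = 3. The sketch checks this with `numpy.linalg.eig`, which returns the eigenvectors as the columns of its second output.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# det(A - λI) = (2 - λ)^2 - 1 = 0  ->  λ = 1 or λ = 3
values, vectors = np.linalg.eig(A)  # eigenvectors are the *columns* of `vectors`

for lam, v in zip(values, vectors.T):
    # each (λ, ν) pair satisfies Aν = λν, and A - λI is singular
    print(lam, np.allclose(A @ v, lam * v), np.isclose(np.linalg.det(A - lam * np.eye(2)), 0))
```

Iterating over `vectors.T` is what pairs each eigenvalue with its eigenvector; iterating over `vectors` directly would walk the rows instead, which is a frequent source of bugs.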



In [None]:
values, vectors = eig(df_std.cov())  # eigenvalues, and eigenvectors stored as columns

In [None]:
values

In [None]:
vectors

# Step 4: Sort eigenvalues and their corresponding eigenvectors.

In [None]:
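order = np.argsort(values)[::-1]                    # eigenvalue indices, largest first
values, vectors = values[order], vectors[:, order]  # reorder eigenvector columns to match
eigen_vectors = pd.DataFrame(vectors, columns=['e1', 'e2', 'e3', 'e4'])
eigen_vectors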

In [None]:
eigen_values = pd.DataFrame(values.reshape(1, 4), columns=['lambda1', 'lambda2', 'lambda3', 'lambda4'])  # lambda_i belongs to eigenvector e_i

In [None]:
eigen_values

# Step 5: Pick k eigenvalues and form a matrix of eigenvectors

In [None]:
vectors = eigen_vectors[['e1', 'e2']].to_numpy()  # projection matrix W: the first k = 2 eigenvector columns (d x k)
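The cell above fixes k = 2. One common way to choose k instead is to keep the smallest number of components whose share of the total variance reaches a target. A minimal sketch, where both the eigenvalues and the 0.85 threshold are hypothetical values for illustration:

```python
import numpy as np

# Hypothetical eigenvalues, already sorted in decreasing order (not the notebook's values)
eigvals = np.array([2.9, 0.7, 0.3, 0.1])

explained_ratio = eigvals / eigvals.sum()  # share of total variance per component
cumulative = np.cumsum(explained_ratio)
threshold = 0.85                           # assumed target share of variance to keep
k = int(np.argmax(cumulative >= threshold)) + 1  # smallest k whose cumulative share reaches it

print(cumulative)
print(k)  # 2
```

`explained_ratio` is the same quantity that scikit-learn exposes as `explained_variance_ratio_`, which this notebook later uses for the plot labels.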

# Step 6: Transform the original matrix.

In [None]:
pca_components = pd.DataFrame(df_std.to_numpy() @ vectors, columns=['pc1', 'pc2'])  # project the standardized data onto W

In [None]:
pca_components

In [None]:
pca = PCA(n_components=4)

In [None]:
components = pca.fit_transform(df_std)  # scikit-learn centers but does not scale, so pass the standardized data
components
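scikit-learn's `PCA` centers the data (without scaling it) and computes the components from a singular value decomposition (SVD) rather than from the covariance matrix, so its output should agree with the manual eigen-decomposition route up to the sign of each component. The NumPy sketch below checks that equivalence on synthetic stand-in data (an assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # synthetic stand-in data
Xc = X - X.mean(axis=0)                                  # PCA works on centered data
n = X.shape[0]

# Route 1: eigen-decomposition of the covariance matrix (the manual steps)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix, Xc = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.allclose(S**2 / (n - 1), eigvals))       # True: same component variances
print(np.allclose(np.abs(Vt), np.abs(eigvecs.T)))  # True: same axes, up to sign
```

Because an eigenvector is only defined up to sign, a column of `components` may be the negative of the corresponding manual column; that does not change the geometry of the projection.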


In [None]:
labels = {
    str(i): f"PC {i+1} ({var:.1f}%)"
    for i, var in enumerate(pca.explained_variance_ratio_ * 100)
} 
print(labels)

In [None]:
components.shape  # (number of samples, number of components)

In [None]:
component = pd.DataFrame(components,columns=['A','B','C','D']) 

In [None]:
fig = px.scatter_matrix(components,labels = labels,dimensions=range(4),color=df1['gender']) 
fig.show()

# Linear Discriminant Analysis : 

Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class separability in order to avoid overfitting ("curse of dimensionality") and also to reduce computational costs.
    
Both Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are linear transformation techniques that are commonly used for dimensionality reduction. PCA can be described as an "unsupervised" algorithm, since it ignores class labels and its goal is to find the directions (the so-called principal components) that maximize the variance in a dataset. In contrast to PCA, LDA is "supervised" and computes the directions ("linear discriminants") that represent the axes that maximize the separation between multiple classes.
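The difference between the two objectives shows up clearly when the direction of largest variance is not the direction that separates the classes. This sketch uses two synthetic Gaussian classes (an assumption) and the two-class Fisher form of LDA, w ∝ S_W⁻¹(m₁ − m₀):

```python
import numpy as np

rng = np.random.default_rng(3)
# Two synthetic classes (assumption): wide spread along x, separated only along y
cov = np.array([[4.0, 0.0], [0.0, 0.2]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([0.0, 1.5], cov, size=200)
X = np.vstack([X0, X1])

# PCA (unsupervised): top eigenvector of the pooled covariance, ignoring labels
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pca_dir = eigvecs[:, -1]

# Two-class Fisher LDA (supervised): w proportional to S_W^{-1} (m1 - m0)
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
lda_dir = np.linalg.solve(S_W, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)

print(np.round(np.abs(pca_dir), 2))  # close to [1, 0]: the high-variance x axis
print(np.round(np.abs(lda_dir), 2))  # close to [0, 1]: the axis that separates the classes
```

Projecting onto `pca_dir` would mix the two classes, while projecting onto `lda_dir` keeps them apart.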

<img src="https://sebastianraschka.com/images/blog/2014/linear-discriminant-analysis/lda_1.png"></img>

# Steps involved in the LDA approach :

Listed below are the 5 general steps for performing a linear discriminant analysis; we will explore them in more detail in the following sections.


1. Compute the d-dimensional mean vectors for the different classes from the dataset.

2. Compute the scatter matrices (the between-class and the within-class scatter matrix).

3. Compute the eigenvectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_d$ and the corresponding eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_d$ for the scatter matrices (specifically, of $S_W^{-1} S_B$, where $S_W$ is the within-class and $S_B$ the between-class scatter matrix).

4. Sort the eigenvectors by decreasing eigenvalue and choose the k eigenvectors with the largest eigenvalues to form a $d \times k$ matrix $\mathbf{W}$ (where every column is an eigenvector).

5. Use this $d \times k$ eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication $\mathbf{Y} = \mathbf{X} \times \mathbf{W}$, where $\mathbf{X}$ is an $n \times d$ matrix representing the n samples and $\mathbf{Y}$ contains the transformed $n \times k$ samples in the new subspace.
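The five steps can be sketched in NumPy as follows. `lda_from_scratch` is a hypothetical helper and the 3-class, 4-feature data are synthetic (assumptions); this is an illustration of the steps, not a replacement for a library implementation.

```python
import numpy as np

def lda_from_scratch(X, y, k):
    """Sketch of the five LDA steps; X is (n, d), y holds integer class labels."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        m_c = Xc.mean(axis=0)                    # Step 1: class mean vector
        S_W += (Xc - m_c).T @ (Xc - m_c)         # Step 2: within-class scatter
        diff = (m_c - overall_mean).reshape(-1, 1)
        S_B += len(Xc) * diff @ diff.T           # Step 2: between-class scatter
    # Step 3: eigenpairs of S_W^{-1} S_B (not symmetric, so use eig rather than eigh)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    eigvals, eigvecs = eigvals.real, eigvecs.real  # any imaginary parts are round-off noise here
    # Step 4: sort by decreasing eigenvalue and keep the top k columns as W (d x k)
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # Step 5: project the samples, Y = X W
    return X @ W, eigvals[order]

# Usage on synthetic data (assumption, not the notebook's CSV)
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in means])
y = np.repeat([0, 1, 2], 50)
Y, eigvals = lda_from_scratch(X, y, k=2)
print(Y.shape)  # (150, 2)
```

Because $S_B$ is built from C class means, its rank is at most C − 1, so with 3 classes only the first two eigenvalues are meaningfully nonzero; that is why k = 2 is the natural choice here.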

<img src="https://lh4.googleusercontent.com/6NfVmmG39n41HvHiER7x-mBs8sjIDtAZnzZdt4cBUVU2Jw4chLOVEgYs28eqFq2w6P3Ow2sSDpFFkJ3VwCfqcEEqs_lbkEhjPZ36hOu-gAh6adJ5kgSnVgCA0LzDrCP4WeIhXAM">

# Assumptions of LDA :   
LDA assumes:

1. The features are normally (Gaussian) distributed within each class. In other words, each feature in the dataset is shaped like a bell curve.

2. Each class has the same variance (more precisely, all classes share the same covariance matrix): the values of each feature vary around their class mean by the same amount on average.

3. The observations are randomly sampled.

4. There is little or no multicollinearity among the independent features: as the correlations between independent features increase, the predictive power decreases.

# **LDA TOPIC WILL BE UPDATED SOON...**