**If you like my notebook, please upvote my work!**

**If you use parts of this notebook in your scripts/notebooks, giving some kind of credit (for instance, a link back to this notebook) would be very much appreciated. Thanks in advance! :)**

P.S:

Please make sure that you have plotly installed on your local machine.

Thank you! :) Hope you like my work!

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/digit-recognizer/sample_submission.csv
/kaggle/input/digit-recognizer/test.csv
/kaggle/input/digit-recognizer/train.csv


# Importing important libraries.

In [2]:
import plotly.express as px
import plotly.graph_objects as go
from sklearn.decomposition import PCA

# What is PCA

PCA is one of the most popular methods for dimensionality reduction and a method of lossy data compression.

For more information about PCA, read the Medium article [https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c](https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c)

# Importance of PCA

The reasons why we might want to use PCA are:
1. It reduces the dimensionality of the data, which can significantly speed up algorithms run on the dataset.
2. It **sometimes** makes the dataset more interpretable, as lower-dimensional data are easier to understand and handle.
3. It significantly reduces the memory required to store the dataset.
4. It ensures that the resulting components are uncorrelated with each other, which makes the dependence between the variables easier to understand.
5. It can reveal the main underlying directions of variation in the data.
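
Point 4 in the list above can be checked on a small example. The following is a minimal sketch using only NumPy on hypothetical toy data (not the digit dataset): the original features are strongly correlated, but after projecting onto the principal axes the covariance between components is numerically zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 samples, 3 strongly correlated features.
base = rng.normal(size=(500, 1))
X = np.hstack([base, 2 * base, -base]) + 0.1 * rng.normal(size=(500, 3))

# Centre the data and take the SVD; the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Express the data in the principal-component basis.
scores = Xc @ Vt.T

# Covariance between different components should vanish.
cov_scores = np.cov(scores, rowvar=False)
off_diag = cov_scores - np.diag(np.diag(cov_scores))

print("correlation of original features 0 and 1:", np.corrcoef(X, rowvar=False)[0, 1])
print("largest covariance between components:", np.abs(off_diag).max())
```

Note that this shows the components are uncorrelated, which is a weaker property than statistical independence.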

# Loading the Dataset.

In [3]:
df_train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
df_train.describe()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
count,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,...,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0,42000.0
mean,4.456643,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.219286,0.117095,0.059024,0.02019,0.017238,0.002857,0.0,0.0,0.0,0.0
std,2.88773,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,6.31289,4.633819,3.274488,1.75987,1.894498,0.414264,0.0,0.0,0.0,0.0
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,254.0,254.0,253.0,253.0,254.0,62.0,0.0,0.0,0.0,0.0


**About the dataset:** Our dataset, which is derived from the famous MNIST dataset, consists of 784 data fields containing pixel values from 0 to 255 inclusive, plus a target field "label". Each image is a handwritten digit converted to a greyscale image 28 pixels high and 28 pixels wide, for a total of 784 pixels, stored in the columns pixel0 to pixel783. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. The target field contains the digit shown in the handwritten image.
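
To make the pixel layout concrete, here is a small sketch of how one flat row of 784 values maps to a 28 x 28 image. It assumes the pixels are stored row by row (row-major order), and the `flat` array is a hypothetical stand-in for one row of the pixel columns.

```python
import numpy as np

# Hypothetical stand-in for one row of pixel0..pixel783: the value equals the pixel index.
flat = np.arange(784)

# Assuming row-major storage, pixel k lands at row k // 28, column k % 28.
image = flat.reshape(28, 28)

k = 300
row, col = divmod(k, 28)
print(row, col, image[row, col])  # 10 20 300
```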

# Splitting the target and predictor variables.

In [4]:
df_train_x = df_train.drop('label',axis =1)
df_train_y = df_train[['label']]

# Visualising percentage variance loss.

## Fetching the variance ratios for PCA over the given dataset.

In [5]:
pca = PCA().fit(df_train_x)
pca.explained_variance_ratio_

array([9.74893769e-02, 7.16026628e-02, 6.14590336e-02, 5.37930200e-02,
       4.89426213e-02, 4.30321399e-02, 3.27705076e-02, 2.89210317e-02,
       2.76690235e-02, 2.34887103e-02, 2.09932543e-02, 2.05900116e-02,
       1.70255350e-02, 1.69278702e-02, 1.58112641e-02, 1.48323962e-02,
       1.31968789e-02, 1.28272708e-02, 1.18797614e-02, 1.15275473e-02,
       1.07219122e-02, 1.01519930e-02, 9.64902259e-03, 9.12846068e-03,
       8.87640859e-03, 8.38766308e-03, 8.11855855e-03, 7.77405747e-03,
       7.40635116e-03, 6.86661489e-03, 6.57982211e-03, 6.38798611e-03,
       5.99367016e-03, 5.88913410e-03, 5.64335178e-03, 5.40967048e-03,
       5.09221943e-03, 4.87504936e-03, 4.75569422e-03, 4.66544724e-03,
       4.52952464e-03, 4.44989164e-03, 4.18255277e-03, 3.97505755e-03,
       3.84541993e-03, 3.74919479e-03, 3.61013219e-03, 3.48522166e-03,
       3.36487802e-03, 3.20738135e-03, 3.15467117e-03, 3.09145543e-03,
       2.93709181e-03, 2.86541339e-03, 2.80759437e-03, 2.69618435e-03,
      

The variance ratio array gives the fraction of the total variance explained by each eigenvector (principal component), in decreasing order: the first ratio is the variance explained by the top eigenvector.
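
As a rough illustration of where such ratios come from, the sketch below computes them by hand with NumPy on hypothetical toy data: each ratio is an eigenvalue of the covariance matrix divided by the sum of all eigenvalues. This is meant to mirror the idea behind `explained_variance_ratio_`, not to reproduce scikit-learn's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 5 features with different scales.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

# Eigenvalues of the covariance matrix are the variances along the principal axes.
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order, so reverse it

# Each ratio is the share of the total variance carried by one component.
ratios = eigvals / eigvals.sum()
print(ratios)
```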

## Plotting the loss in variance as we reduce the number of components

In [6]:
# Cumulative fraction of variance retained by the top k components (k = 1..784).
retained = np.cumsum(pca.explained_variance_ratio_)
# Percentage of variance lost for k = 0..784; with 0 components everything is lost.
lost = np.concatenate(([100.0], (1 - retained) * 100))
arr = pd.DataFrame({
    'No. of components used': np.arange(len(lost)),
    'Total variance lost (in percentage)': lost,
    'hover': ['Percentage variance lost is: ' + str(round(v, 4)) + '%' for v in lost],
})
px.line(data_frame = arr, x = 'No. of components used',
        y = 'Total variance lost (in percentage)',
        range_x = [0,784], range_y = [0,100], hover_name = 'hover',
        title = 'Graph depicting the loss in variance as we reduce the number of components.')

This graph depicts how the loss in variance decreases as we increase the number of components.
1. We can see that using only 100 components we can retain almost 92% of the variance in the data.
2. As we increase the number of components, the variance retained increases rapidly at first and then slowly afterwards.
3. If we keep increasing the number of components, the variance loss eventually becomes 0 at 784 components.
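
The same reasoning can be turned around: given a target fraction of variance to retain, find the smallest number of components that reaches it. The helper `components_for_variance` and the `ratios` array below are hypothetical names and values used for illustration only.

```python
import numpy as np

def components_for_variance(ratios, target):
    """Return the smallest k whose top-k ratios sum to at least `target`."""
    cumulative = np.cumsum(ratios)
    # searchsorted finds the first index where the cumulative sum reaches the target.
    return int(np.searchsorted(cumulative, target) + 1)

# Hypothetical explained-variance ratios, sorted in decreasing order.
ratios = np.array([0.5, 0.3, 0.15, 0.05])

print(components_for_variance(ratios, 0.9))  # 3
```

On the real data one would pass `pca.explained_variance_ratio_` from the fit above. As far as I know, scikit-learn's `PCA` also accepts a float between 0 and 1 as `n_components` (for example `PCA(n_components=0.92)`), in which case it selects the number of components needed to explain that fraction of the variance.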

# Visualising the effect of PCA over input images.

## Training different PCA Models.

In [7]:
# Fit one PCA model per component count.
# Index 0 keeps all 784 components and serves as the reference row for the originals.
n_components_list = [784, 10, 20, 50, 100, 200, 300, 500]
pca = [PCA(n_components = n).fit(df_train_x) for n in n_components_list]
pca

[PCA(copy=True, iterated_power='auto', n_components=784, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 PCA(copy=True, iterated_power='auto', n_components=20, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 PCA(copy=True, iterated_power='auto', n_components=50, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 PCA(copy=True, iterated_power='auto', n_components=100, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 PCA(copy=True, iterated_power='auto', n_components=200, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 PCA(copy=True, iterated_power='auto', n_components=300, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False),
 PCA(copy=True, iterated_power='auto', n_components=500, random_state=None,
     svd_solver='auto', tol=0.0, whiten=False)]

## Creating list of labels for the plot.

In [8]:
# Hover text for the heatmap: one string per cell of the 244 x 604 image grid.
# Each 28 x 28 digit is framed by a 1-pixel border and the whole grid by a 2-pixel border.
digits = df_train_y['label'][0:20].to_numpy()

# One row of hover text across all 20 samples (604 entries).
row = ['Border'] * 2
for d in digits:
    row += ['Border'] + ["The Label for the digit is: " + str(d)] * 28 + ['Border']
row += ['Border'] * 2

# Stack the rows: 8 bands of 30 rows each (one band per PCA model), plus the outer border.
border = ['Border'] * 604
label = [border] * 2
for _ in range(8):
    label += [border] + [row] * 28 + [border]
label += [border] * 2

## Creating the image matrix for the dataset.

In [9]:
numpy_train_x = df_train_x.to_numpy()
samples = numpy_train_x[:20]  # only the first 20 digits are displayed
rows = []
for i in range(8):
    # Project onto the retained components, then map back to pixel space.
    recon = pca[i].inverse_transform(pca[i].transform(samples))
    # Row 0 shows the untouched originals (border value 400, drawn in red);
    # the other rows show the reconstructions (border value 450, drawn in blue).
    source, border_value = (samples, 400) if i == 0 else (recon, 450)
    tiles = [np.pad(source[j].reshape(28,28), pad_width=1, mode='constant',
                    constant_values=border_value) for j in range(20)]
    rows.append(np.hstack(tiles))
# Stack the 8 rows and add the outer frame (value 500).
final = np.pad(np.vstack(rows), pad_width=2, mode='constant', constant_values=500)
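
The transform followed by inverse transform used above can be sketched by hand with NumPy. The function `reconstruct` below is a hypothetical illustration on toy data, not scikit-learn's implementation: it projects the centred data onto the top k principal axes and maps the result back to the original space, so the reconstruction error shrinks as k grows.

```python
import numpy as np

def reconstruct(X, k):
    """Project X onto its top-k principal axes and map back to the original space."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                # shape (k, n_features)
    codes = Xc @ components.T          # analogous to transform
    return codes @ components + mean   # analogous to inverse_transform

rng = np.random.default_rng(0)
# Hypothetical toy data: 100 samples with 8 correlated features.
X = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))

# Mean squared reconstruction error for an increasing number of components.
errors = [np.mean((X - reconstruct(X, k)) ** 2) for k in (1, 2, 4, 8)]
print(errors)
```

With all 8 components kept, the reconstruction matches the input up to floating-point error, which corresponds to the 784-component case in this notebook.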

## Plotting the image matrix.

In [10]:
fig = go.Figure(data = go.Heatmap(z = final,colorbar = None,
                                  colorscale = [[0,'white'],[0.7,'black'],[0.8,'red'],
                                                [0.9,'blue'],[1.0,'rgb(255,0,255)']],
                                  zmin = 0,zmax = 500,zauto = False,hovertext = label))
fig['layout']['yaxis']['autorange'] = "reversed"
fig.update_layout(title = 'The Distortion induced due to PCA while using different number of components.',
                  height  = 600,width = 1200,xaxis_dtick = 30,xaxis_tick0 = 15,
                  yaxis_tickvals = [15,45,75,105,135,165,195,225],
                  yaxis_ticktext =[' ','10','20','50','100','200','300','500'],
                  yaxis_title = 'No. of Components used for PCA: ',
                  xaxis_tickvals = [ 16,  46,  76, 106, 136, 166, 196, 226, 256, 286, 316,
                                    346, 376, 406, 436, 466, 496, 526, 556, 586],
                  xaxis_ticktext = ['Sample_1','Sample_2','Sample_3','Sample_4',
                                    'Sample_5','Sample_6','Sample_7','Sample_8',
                                    'Sample_9','Sample_10','Sample_11','Sample_12',
                                    'Sample_13','Sample_14','Sample_15','Sample_16',
                                    'Sample_17','Sample_18','Sample_19','Sample_20'],
                  xaxis_title = 'Samples (in red boxes are originals while in blue are their PCA transforms.)')
fig.update_traces(showscale = False)
fig.show()

1. In the matrix above, the first row consists of the original images, whereas the rows below contain the images reconstructed using the top k eigenvectors (the top k components).
2. We can see that as the number of components increases, the images become more and more similar to the originals.
3. For constructing a model, n_components = 300 is a reasonable choice, as the difference between the original and the transformed images is very subtle.
4. If we take n as 300, we reduce the number of dimensions from 784 to 300, which can speed up our machine learning algorithms (potentially by 3 to 10 times) without any significant difference in the results obtained.
5. The images obtained using 500 components look the same as the original images, but if we hover over an image to check the z values of the pixels, we can see that they are quite different.
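
As a back-of-the-envelope check on the storage side of this trade-off, the sketch below counts how many numbers must be stored before and after reducing the 42000 training images from 784 to 300 dimensions. It assumes that, to reconstruct the images, one keeps the reduced data plus the component matrix and the mean vector; the variable names are illustrative.

```python
n_samples, n_features, k = 42000, 784, 300

# Values stored for the raw pixel data.
original = n_samples * n_features

# Values stored after PCA: reduced data, k component vectors, and the mean vector.
compressed = n_samples * k + k * n_features + n_features

ratio = original / compressed
print(original, compressed, round(ratio, 2))
```

This counts stored values only. The actual speed-up of a downstream model depends on the algorithm and is not implied by this ratio.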