
# **Singular Value Decomposition (SVD) in Python**

$\textbf{Mandatory key steps:}$

1. Load the data set.
2. Use the `TruncatedSVD` class in scikit-learn to apply SVD for dimensionality reduction.
3. Set the target rank through the `n_components` parameter of `TruncatedSVD`.
4. Fit and transform the data set.

$\textbf{Optional steps:}$

1. Investigate the $U$, $\Sigma$ and $V$ matrices.
2. Find the best rank for the given data set using the Frobenius norm.

# Loading libraries and data set

https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html

In [None]:
import pandas as pd  # read the CSV file
import numpy as np
from numpy import diag  # build a diagonal matrix from the singular values
from sklearn.decomposition import TruncatedSVD

ds = pd.read_csv('/content/Admission_Predict.csv')
print(ds)
data_old = ds.values  # NumPy array of shape (n_samples, n_features)
print(data_old)

     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR   CGPA  \
0             1        337          118                  4  4.5   4.5  9.65   
1             2        324          107                  4  4.0   4.5  8.87   
2             3        316          104                  3  3.0   3.5  8.00   
3             4        322          110                  3  3.5   2.5  8.67   
4             5        314          103                  2  2.0   3.0  8.21   
..          ...        ...          ...                ...  ...   ...   ...   
395         396        324          110                  3  3.5   3.5  9.04   
396         397        325          107                  3  3.0   3.5  9.11   
397         398        330          116                  4  5.0   4.5  9.45   
398         399        312          103                  3  3.5   4.0  8.78   
399         400        333          117                  4  5.0   4.0  9.66   

     Research  Chance of Admit   
0           1    

Call `TruncatedSVD` from scikit-learn, then fit it and transform the data set into the desired number of dimensions. The number of dimensions is controlled by the `n_components` parameter of `TruncatedSVD`.

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html#

In [None]:
svd = TruncatedSVD(n_components=8)
result = svd.fit_transform(data_old)
print(result)

[[ 2.99567949e+02  1.94672749e+02  2.58029147e+00 ... -7.79730248e-02
  -2.52561050e-01 -2.64569107e-01]
 [ 2.86839754e+02  1.85119960e+02 -3.46393735e+00 ... -3.56596480e-01
  -2.61456288e-01 -1.27591746e-02]
 [ 2.80199116e+02  1.79577619e+02 -4.07474027e+00 ... -2.55370682e-01
  -4.99107594e-01  3.82067129e-01]
 ...
 [ 5.10723062e+02 -1.41583093e+02  5.18241633e+00 ...  4.09442706e-01
  -1.13633774e-01  1.09939702e-03]
 [ 4.93476512e+02 -1.54096296e+02 -1.45204456e+00 ... -2.60811161e-01
   5.52617089e-01 -2.51317659e-01]
 [ 5.14463263e+02 -1.41528079e+02  5.11638697e+00 ...  7.34422494e-01
  -9.79552666e-02 -1.86170680e-01]]


In SVD, the data set $D$ is decomposed into three matrices, namely $U$, $\Sigma$ and $V$, i.e.,

$\begin{equation}
D= U \Sigma V^{T}
\end{equation}$

- The `fit_transform` method of `TruncatedSVD` returns the product $U\Sigma$ (equivalently $DV$). Each row holds the coordinates of one data point projected onto the directions given by the columns of $V$.

- $\Sigma$ is available through the `singular_values_` attribute of `TruncatedSVD`, which stores the diagonal entries as a 1-D array.

- The matrix $V^{T}$ is available through the `components_` attribute.

- To obtain $U$, divide each column of the transformed data by the corresponding value in `singular_values_`.
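
The relationships in the list above can be checked on a small synthetic matrix. This is only a sketch, not the notebook's data: the random matrix `D`, the seed, and the choice of `algorithm="arpack"` with a fixed `random_state` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
D = rng.normal(size=(50, 6))      # hypothetical data: 50 samples, 6 features

svd = TruncatedSVD(n_components=3, algorithm="arpack", random_state=0)
US = svd.fit_transform(D)         # U Sigma, shape (50, 3)
sigma = svd.singular_values_      # diagonal of Sigma, shape (3,)
Vt = svd.components_              # V^T, shape (3, 6)
U = US / sigma                    # divide each column by its singular value

# The transformed data equals D projected onto the rows of V^T.
print(np.allclose(US, D @ Vt.T))
# The columns of U are orthonormal.
print(np.allclose(U.T @ U, np.eye(3)))
# The singular values match the three largest ones from a full SVD.
print(np.allclose(sigma, np.linalg.svd(D, compute_uv=False)[:3]))
```

Each check prints a boolean, so a `False` would point at the relationship that does not hold for the chosen solver or data.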

In [None]:
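sigma = svd.singular_values_      # diagonal entries of Sigma
v = svd.components_               # V^T: one row per retained component
u = result/svd.singular_values_   # U = (U Sigma) / Sigma, column by column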

print(sigma.shape)
print(v.shape)
print(u.shape)
print(sigma)
print(v)
print(u)


(8,)
(8, 9)
(400, 8)
[7.89777483e+03 1.97646590e+03 6.74311708e+01 2.53513624e+01
 1.19636809e+01 9.66880450e+00 8.65617085e+00 5.11325904e+00]
[[ 5.47200459e-01  7.92334707e-01  2.68520912e-01  7.69309514e-03
   8.47496126e-03  8.63378351e-03  2.15204537e-02  1.37543970e-03
   1.82117703e-03]
 [-8.36982837e-01  5.16043161e-01  1.81120679e-01  7.31909367e-03
   7.44453765e-03  6.56134504e-03  1.41144361e-02  1.37287927e-03
   1.15876859e-03]
 [ 5.39453534e-03 -3.20543160e-01  9.18666067e-01  1.52822304e-01
   1.30512608e-01  9.24170559e-02  5.52106891e-02  3.06269483e-02
   1.89815508e-02]
 [-1.34340610e-03 -5.35496893e-02  2.23543022e-01 -6.19652867e-01
  -5.35087606e-01 -4.77393191e-01 -1.45178837e-01 -1.58534847e-01
  -5.25001987e-02]
 [ 3.47844296e-04  7.08187552e-03 -1.23239968e-02  7.34030720e-01
  -3.59109254e-01 -5.73201504e-01 -1.09170336e-02  5.77587080e-02
   4.69859017e-03]
 [ 3.34094668e-04  5.05578437e-03 -1.23794949e-02 -1.65711623e-01
   7.24721666e-01 -6.49969032e-01 -

# Testing if the product of U, Σ and $V^{T}$ gives back the original data set

In [None]:
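s = diag(svd.singular_values_)  # Sigma as a diagonal matrix (not used below)
data_new = np.dot(result, v)    # result is already U Sigma, so this is U Sigma V^T
data_new = np.round(data_new, 2)
print(data_new)
print(data_old)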

[[  1.   337.   118.   ...   9.65   1.     0.88]
 [  2.   324.   107.   ...   8.87   1.     0.78]
 [  3.   316.   104.   ...   8.01   1.     0.64]
 ...
 [398.   330.   116.   ...   9.45   1.     0.93]
 [399.   312.   103.   ...   8.77  -0.     0.75]
 [400.   333.   117.   ...   9.66   1.     0.94]]
[[  1.   337.   118.   ...   9.65   1.     0.92]
 [  2.   324.   107.   ...   8.87   1.     0.76]
 [  3.   316.   104.   ...   8.     1.     0.72]
 ...
 [398.   330.   116.   ...   9.45   1.     0.91]
 [399.   312.   103.   ...   8.78   0.     0.67]
 [400.   333.   117.   ...   9.66   1.     0.95]]
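
The two printouts above agree closely but not exactly, which is what a truncated decomposition would be expected to give: `v.shape` is (8, 9), so one of the nine components was dropped. The sketch below illustrates the difference between a full and a truncated reconstruction. It uses `numpy.linalg.svd` on a hypothetical random matrix `A` rather than `TruncatedSVD` on the admission data, and the helper `reconstruct` is introduced here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))          # hypothetical data with 5 columns

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def reconstruct(k):
    """Rebuild A from its k largest singular values and vectors."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

full = reconstruct(5)                 # all components kept
truncated = reconstruct(4)            # smallest component dropped

print(np.allclose(A, full))           # exact up to floating-point error
print(np.allclose(A, truncated))      # only an approximation
print(np.abs(A - truncated).max())    # largest entrywise error
```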


Finding the best rank for the given data set using the Frobenius norm. The smaller the Frobenius norm of the difference between the original and the reconstructed data, the better the approximation at that rank.

$\begin{equation}
\lVert A - B \rVert_F = \sqrt{\sum_{ij} (A_{ij} - B_{ij})^2}
\end{equation}$

where $A$ and $B$ represent the old and new data sets, respectively.

In [None]:
# data_new was rounded to 2 decimals earlier, so the norm includes that rounding error
data_diff = np.subtract(data_old, data_new)
data_squarediff = np.square(data_diff)
print('Frobenius Norm = ', np.sqrt(data_squarediff.sum()))

Frobenius Norm =  1.370547335920945
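
To compare several ranks rather than a single one, the same norm can be computed in a loop. The sketch below is one possible way to do this, under stated assumptions: it uses a hypothetical random matrix `A` and `numpy.linalg.svd`, so that every rank comes from one exact decomposition, instead of refitting `TruncatedSVD` on the admission data. `np.linalg.norm(..., ord="fro")` computes the same quantity as the manual square, sum and square-root steps above.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(30, 6))                  # hypothetical data

U, s, Vt = np.linalg.svd(A, full_matrices=False)

errors = []
for k in range(1, len(s) + 1):
    B = (U[:, :k] * s[:k]) @ Vt[:k, :]        # rank-k reconstruction
    err = np.linalg.norm(A - B, ord="fro")    # Frobenius norm of the residual
    errors.append(err)
    # For an exact SVD, the error equals the root of the sum of the
    # squared singular values that were discarded.
    tail = np.sqrt(np.sum(s[k:] ** 2))
    print(k, round(err, 4), round(tail, 4))
```

The error can only shrink as the rank grows, so the "best" rank is a trade-off: a common approach is to pick the smallest rank whose error falls below a tolerance chosen for the application.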
