# PCA
We have 4 points in 5 dimensions, represented by the following matrix:

<pre>
1	6	2	3	2
3	4	6	1	2
5	1	10	4	2
6	0	11	2	2
</pre>

Compute the covariance matrix and apply PCA.

In [6]:
%matplotlib inline
from sklearn.decomposition import PCA
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

import warnings
warnings.filterwarnings('ignore')

In [3]:
data = [[1, 6, 2, 3, 2], [3, 4, 6, 1, 2], [5, 1, 10, 4, 2], [6, 0, 11, 2, 2]]

In [4]:
data

[[1, 6, 2, 3, 2], [3, 4, 6, 1, 2], [5, 1, 10, 4, 2], [6, 0, 11, 2, 2]]

# Applying PCA

In [5]:
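# http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
# a float in (0, 1) makes PCA keep the smallest number of components
# whose cumulative explained variance exceeds that fraction
explained_variance = .95
pca = PCA(n_components=explained_variance)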

In [6]:
# reduce the dimensionality of the data
lower_dimensional_data = pca.fit_transform(data)

In [8]:
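# n_components_ is the number of components kept (an int);
# the component vectors themselves are in pca.components_
principal_components = pca.n_components_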


In [9]:
lower_dimensional_data

array([[ 6.74099246, -0.68376928],
       [ 1.92632049,  1.4731224 ],
       [-3.52194195, -1.41055566],
       [-5.145371  ,  0.62120253]])
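
The two columns above mean that two components were needed to pass the 95% threshold. A minimal sketch for inspecting this on the fitted model follows; the names `reduced` and `cumulative` are illustrative, and `svd_solver="full"` is set explicitly as an assumption so that the fractional `n_components` is handled by the exact solver.

```python
import numpy as np
from sklearn.decomposition import PCA

data = [[1, 6, 2, 3, 2], [3, 4, 6, 1, 2], [5, 1, 10, 4, 2], [6, 0, 11, 2, 2]]

# keep as many components as needed to explain more than 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
reduced = pca.fit_transform(data)

# number of components actually kept
print(pca.n_components_)

# fraction of the total variance explained by each kept component
print(pca.explained_variance_ratio_)

# running total: the last entry is the variance retained after the reduction
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative)
```

If the first entry of `explained_variance_ratio_` is below 0.95, one component is not enough, which is consistent with the two-column result.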

In [16]:
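# np.cov treats each row as a variable by default, so this is the 4x4
# covariance between the points; pass rowvar=False to get the 5x5
# covariance between the features
covarianza = np.cov(data)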

In [17]:
covarianza

array([[ 3.7 ,  0.3 , -3.65, -5.2 ],
       [ 0.3 ,  3.7 ,  4.15,  5.7 ],
       [-3.65,  4.15, 12.3 , 14.9 ],
       [-5.2 ,  5.7 , 14.9 , 19.2 ]])
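
For PCA, the covariance of interest is usually the one between features (columns), not between points. The sketch below computes it with `rowvar=False` and compares its eigenvalues with the variances reported by scikit-learn; the names `feature_cov`, `eigenvalues`, and `full_pca` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1, 6, 2, 3, 2], [3, 4, 6, 1, 2], [5, 1, 10, 4, 2], [6, 0, 11, 2, 2]])

# rowvar=False: columns are variables (features), rows are observations (points)
feature_cov = np.cov(data, rowvar=False)
print(feature_cov.shape)  # (5, 5)

# eigenvalues of the symmetric covariance matrix, sorted from largest to smallest
eigenvalues = np.linalg.eigvalsh(feature_cov)[::-1]

# PCA keeping every component, for comparison
full_pca = PCA(svd_solver="full").fit(data)

# the leading eigenvalues should agree with PCA's explained variances
print(eigenvalues[:full_pca.n_components_])
print(full_pca.explained_variance_)
```

The last feature is constant, so its row and column in `feature_cov` are zero.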

In [18]:
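# note: this refits pca, treating the rows of the 4x4 covariance matrix as samples;
# PCA centers its input and works from its covariance internally,
# so fit_transform(data) is the usual way to apply it
pca_c = pca.fit_transform(covarianza)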

In [19]:
pca_c

array([[ 18.37335339],
       [  4.20944876],
       [ -8.59175032],
       [-13.99105182]])

# Computing the SVD with scipy

In [1]:
# we will use the base svd implementation
from scipy.linalg import svd # https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html
# there are also other options, such as svds
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html 
from scipy.sparse.linalg import svds
# we will use eigs to verify the eigenvalue computation
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigs.html
from scipy.sparse.linalg import eigs

In [11]:
# compute the SVD (of the raw, uncentered data)
u, s, vt = svd(data)

In [12]:
u.shape

(4, 4)

In [13]:
u

array([[-0.22116432, -0.85013696,  0.26640324, -0.39671501],
       [-0.39157469, -0.34342534, -0.70567948,  0.48035899],
       [-0.61644331,  0.12606306,  0.6092386 ,  0.48263245],
       [-0.64633834,  0.37872719, -0.24469185, -0.61557969]])

In [14]:
s

array([19.42929757,  6.97741707,  2.17516297,  0.2944707 ])

In [15]:
vt

array([[-0.43007911, -0.18064101, -0.82689205, -0.24774532, -0.19306109],
       [ 0.14651059, -0.90985819,  0.23874218, -0.23391601, -0.19742895],
       [-0.12532088, -0.28276496, -0.13814818,  0.93836691, -0.06871163],
       [-0.80128164,  0.08074946,  0.48789833, -0.03537073, -0.33486025],
       [-0.36852275, -0.23032672,  0.04606534, -0.04606534,  0.89827421]])
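
The factors above can be checked by multiplying them back together, and the link to PCA becomes visible once the data is centered first. This is a minimal sketch under those assumptions; `sigma`, `centered`, and `scores` are illustrative names, and `diagsvd` is the scipy helper that builds the rectangular matrix of singular values.

```python
import numpy as np
from scipy.linalg import svd, diagsvd

data = np.array([[1, 6, 2, 3, 2], [3, 4, 6, 1, 2],
                 [5, 1, 10, 4, 2], [6, 0, 11, 2, 2]], dtype=float)

# full SVD: u is 4x4, s has 4 singular values, vt is 5x5
u, s, vt = svd(data)

# place the singular values on the diagonal of a 4x5 matrix and rebuild the data
sigma = diagsvd(s, *data.shape)
print(np.allclose(u @ sigma @ vt, data))  # True

# PCA corresponds to the SVD of the column-centered data
centered = data - data.mean(axis=0)
uc, sc, vtc = svd(centered, full_matrices=False)

# coordinates of the points on the first two principal directions;
# the sign of each column is arbitrary, so compare absolute values
scores = uc[:, :2] * sc[:2]
print(np.abs(scores))
```

Up to the sign of each column, `scores` should match the `lower_dimensional_data` array obtained earlier with scikit-learn.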

## Consider the following 5x4 matrix:
<pre>
3	1	1	0
2	1	0	2
3	3	0	1
0	1	2	0
2	0	2	2
</pre>

Compute the SVD of the matrix and then identify the correct option among the following:

a) The principal vector of the basis is [-0.77,-0.45,-0.25,-0.36]

b) When the matrix is reduced to 2 dimensions, three points of the original set end up very close together, forming a cluster.

c) The fourth point reduced to 3 dimensions is [-0.15,-0.44,0]

d) The most important singular value is 1.51

In [2]:
m = [[3, 1, 1, 0], [2, 1, 0, 2], [3, 3, 0, 1], [0, 1, 2, 0], [2, 0, 2, 2]]

In [3]:
# compute the SVD
u, s, vt = svd(m)

In [8]:
print(u)

[[-0.46968068  0.00570064 -0.3083355   0.77855809 -0.27954262]
 [-0.42262917  0.06403416  0.54753326 -0.27696589 -0.66391373]
 [-0.62711442  0.54488918 -0.19579499 -0.30929506  0.41931393]
 [-0.14849486 -0.44030138 -0.68185617 -0.46931336 -0.31448545]
 [-0.43065233 -0.71070219  0.31917556  0.03491026  0.45425676]]


In [7]:
print(s)

[6.45017782 2.74947668 2.13135033 1.51424231]


In [9]:
print(vt)

[[-0.77469931 -0.45303371 -0.25239228 -0.36180048]
 [ 0.13036423  0.45976057 -0.83517948 -0.2722143 ]
 [ 0.10370242 -0.48328205 -0.48499616  0.72143121]
 [ 0.60999341 -0.5914551  -0.05959952 -0.52396259]]
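
Each option can be checked against these factors. The sketch below prints the quantity each option refers to, assuming the reduced coordinates of a point are the rows of `u[:, :k] * s[:k]`; the names `points_2d` and `points_3d` are illustrative.

```python
import numpy as np
from scipy.linalg import svd

m = np.array([[3, 1, 1, 0], [2, 1, 0, 2], [3, 3, 0, 1],
              [0, 1, 2, 0], [2, 0, 2, 2]], dtype=float)
u, s, vt = svd(m)

# option a: first right singular vector (its overall sign is arbitrary)
print(np.round(vt[0], 2))

# option d: singular values are returned sorted from largest to smallest
print(np.round(s, 2))

# option b: the five points projected onto the first two singular directions
points_2d = u[:, :2] * s[:2]
print(np.round(points_2d, 2))

# option c: the fourth point (index 3) in three dimensions
points_3d = u[:, :3] * s[:3]
print(np.round(points_3d[3], 2))
```

Judging from the outputs printed above, the first row of `vt` rounds to the vector in option a), while 1.51 is the smallest singular value rather than the largest.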


In [4]:
from functools import reduce
# product of all the singular values
reduce(lambda x, y: x*y, s)


57.23635208501677
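
For a matrix with more rows than columns, the product of the singular values equals the square root of the determinant of MᵀM, which gives an independent check of this number. A short sketch, with `product` and `gram_det` as illustrative names:

```python
import numpy as np

m = np.array([[3, 1, 1, 0], [2, 1, 0, 2], [3, 3, 0, 1],
              [0, 1, 2, 0], [2, 0, 2, 2]], dtype=float)

# singular values only (no U or Vt needed)
s = np.linalg.svd(m, compute_uv=False)

# product of the singular values, equivalent to the reduce(...) call
product = np.prod(s)

# determinant of the 4x4 Gram matrix M^T M, which equals the product of s**2
gram_det = np.linalg.det(m.T @ m)

print(product, np.sqrt(gram_det))
```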

In [7]:
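x = [[1,6,2,3,2], [3,4,6,1,2], [5,1,10,4,2],[6,0,11,2,2]]
# standardize each column to zero mean and unit variance (the constant last column becomes all zeros)
x = StandardScaler().fit_transform(x)
# covariance of the standardized data, dividing by n = 4 samples;
# this equals the correlation matrix of the original columns
cov_matrix = 1/4*np.dot(x.transpose(), x)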

In [8]:
cov_matrix

array([[ 1.        , -0.99626795,  0.99598426,  0.05822225,  0.        ],
       [-0.99626795,  1.        , -0.99326482, -0.14064217,  0.        ],
       [ 0.99598426, -0.99326482,  1.        ,  0.09416472,  0.        ],
       [ 0.05822225, -0.14064217,  0.09416472,  1.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
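
Because the columns were standardized first, this matrix is the correlation matrix of the original features. The sketch below checks that against `np.corrcoef`, restricted to the first four columns since the fifth is constant and has no defined correlation; `raw` and `corr` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

raw = np.array([[1, 6, 2, 3, 2], [3, 4, 6, 1, 2],
                [5, 1, 10, 4, 2], [6, 0, 11, 2, 2]], dtype=float)

# standardize, then average the outer products over the n = 4 samples
x = StandardScaler().fit_transform(raw)
cov_matrix = x.T @ x / len(x)

# correlation matrix of the four non-constant columns
corr = np.corrcoef(raw[:, :4], rowvar=False)
print(np.allclose(cov_matrix[:4, :4], corr))  # True

# the constant fifth feature contributes a row of zeros
print(cov_matrix[4])
```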