
Applying the procedure for computing PCA to an example dataset

In [None]:
import numpy as np

In [None]:
np.random.seed(4)
m = 60
w1, w2 = 0.1, 0.3
noise = 0.1

In [None]:
angles = np.random.rand(m) * 3 * np.pi / 2 - 0.5
X = np.empty((m, 3))
X[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * np.random.randn(m) / 2
X[:, 1] = np.sin(angles) * 0.7 + noise * np.random.randn(m) / 2
X[:, 2] = X[:, 0] * w1 + X[:, 1] * w2 + noise * np.random.randn(m)

In [None]:
print('X.shape:', X.shape)

X.shape: (60, 3)


To compute PCA, we first need the covariance matrix of the mean-centered data.

In [None]:
X_cen = X - X.mean(axis=0)  # center each feature at zero mean
X_cov = np.dot(X_cen.T, X_cen) / (X_cen.shape[0] - 1)  # unbiased estimate: divide by m - 1 = 59

print(X_cov)

[[0.69812855 0.17640539 0.12137931]
 [0.17640539 0.1801727  0.07253614]
 [0.12137931 0.07253614 0.04552382]]


In [None]:
# The same matrix can also be obtained with np.cov().
print(np.cov(X_cen.T))

[[0.69812855 0.17640539 0.12137931]
 [0.17640539 0.1801727  0.07253614]
 [0.12137931 0.07253614 0.04552382]]
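
The covariance above uses the unbiased m - 1 normalization (59 for m = 60 samples), which is also the default of np.cov. The following is a minimal sketch on hypothetical toy data (the names `X_demo`, `cov_manual`, and so on are illustrative and not part of the notebook) showing that the manual formula and np.cov agree, and how `ddof=0` rescales the result:

```python
import numpy as np

# Hypothetical toy data (not the notebook's X): 60 samples, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))

m_demo = X_demo.shape[0]
X_demo_cen = X_demo - X_demo.mean(axis=0)              # center each feature

cov_manual = X_demo_cen.T @ X_demo_cen / (m_demo - 1)  # unbiased estimate
cov_numpy = np.cov(X_demo, rowvar=False)               # columns are features
cov_biased = np.cov(X_demo, rowvar=False, ddof=0)      # divides by m instead

print(np.allclose(cov_manual, cov_numpy))
print(np.allclose(cov_biased, cov_manual * (m_demo - 1) / m_demo))
```

Passing `rowvar=False` avoids the explicit transpose, and np.cov subtracts the mean itself, so it gives the same matrix for centered and uncentered input.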


Applying np.linalg.eig to the covariance matrix X_cov gives its eigenvalues (w) and eigenvectors (v); each column of v is one eigenvector.

In [None]:
w, v = np.linalg.eig(X_cov)

print('eigenvalue :', w)

eigenvalue : [0.77830975 0.1351726  0.01034272]


In [None]:
print('eigenvector :\n', v)

eigenvector :
 [[ 0.93636116  0.34027485 -0.08626012]
 [ 0.29854881 -0.90119108 -0.31420255]
 [ 0.18465208 -0.2684542   0.94542898]]
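
As a sanity check, the sketch below re-enters the printed covariance matrix (rounded to 8 decimals, so results match only approximately) and verifies the defining relation C v = λ v. np.linalg.eig does not guarantee any ordering of the eigenvalues, so the sketch sorts them explicitly; the variable names are illustrative. It also compares against np.linalg.eigh, which is meant for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

# Covariance matrix copied from the printed output above (rounded values).
C = np.array([[0.69812855, 0.17640539, 0.12137931],
              [0.17640539, 0.1801727,  0.07253614],
              [0.12137931, 0.07253614, 0.04552382]])

vals, vecs = np.linalg.eig(C)

# eig promises no particular order, so sort by decreasing eigenvalue.
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Each column vecs[:, i] satisfies C @ v = lambda * v.
print(np.allclose(C @ vecs, vecs * vals))
# The columns are orthonormal because C is symmetric with distinct eigenvalues.
print(np.allclose(vecs.T @ vecs, np.eye(3)))

# eigh targets symmetric matrices and returns ascending eigenvalues.
vals_h, vecs_h = np.linalg.eigh(C)
print(np.allclose(vals_h[::-1], vals))
```

The sign of each eigenvector is arbitrary, so columns from eig and eigh may differ by a factor of -1 while describing the same axis.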


In [None]:
print('explained variance ratio :', w / w.sum())

explained variance ratio : [0.84248607 0.14631839 0.01119554]
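
One common use of these ratios is choosing how many components to keep. The sketch below starts from the eigenvalues printed above and finds the smallest number of components whose cumulative ratio reaches a target; the 0.95 threshold and the variable names are assumptions made for illustration, not values from the notebook.

```python
import numpy as np

# Eigenvalues copied from the printed output above.
eigvals = np.array([0.77830975, 0.1351726, 0.01034272])

ratio = eigvals / eigvals.sum()
cumulative = np.cumsum(ratio)

threshold = 0.95  # assumed target; choose whatever the application needs
# Index of the first cumulative ratio that reaches the threshold, as a count.
n_components = int(np.argmax(cumulative >= threshold)) + 1

print('cumulative ratio:', cumulative)
print('components needed:', n_components)
```

With these eigenvalues the first two components together account for about 98.9% of the variance, so two components are enough for a 95% target.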


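The code below projects the data onto each principal component vector and then computes the share of variance along each one, i.e. the explained variance ratio.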

In [None]:
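# Project onto PC1 (first column of v).
# X is not centered here, but a constant shift does not change the variance.
pc1 = v[:, 0]
proj1 = np.dot(X, pc1)

# Project onto PC2
pc2 = v[:, 1]
proj2 = np.dot(X, pc2)

# Project onto PC3
pc3 = v[:, 2]
proj3 = np.dot(X, pc3)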

In [None]:
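# ndarray.var() defaults to ddof=0 (divide by m); ddof=1 matches the m - 1 normalization of X_cov
proj_list = np.array([proj1.var(ddof=1), proj2.var(ddof=1), proj3.var(ddof=1)])

print('variance(==eigenvalue) :', proj_list)

variance(==eigenvalue) : [0.77830975 0.1351726  0.01034272]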


In [None]:
print('explained variance ratio :', proj_list / proj_list.sum())

explained variance ratio : [0.84248607 0.14631839 0.01119554]
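
An alternative route, not used in this notebook, is the singular value decomposition of the centered data: the squared singular values divided by m - 1 equal the covariance eigenvalues, and the rows of `Vt` are the principal directions. The following is a minimal sketch on hypothetical toy data (all names are illustrative) that also projects onto the first two components to get a 2-D representation:

```python
import numpy as np

# Hypothetical toy data (not the notebook's X): 60 samples, 3 correlated features.
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.5, 0.3],
                [0.0, 1.0, 0.2],
                [0.0, 0.0, 0.1]])
X_demo = rng.normal(size=(60, 3)) @ mix
X_demo_cen = X_demo - X_demo.mean(axis=0)
m_demo = X_demo.shape[0]

# Rows of Vt are the principal directions, sorted by decreasing singular value.
U, s, Vt = np.linalg.svd(X_demo_cen, full_matrices=False)
var_from_svd = s ** 2 / (m_demo - 1)

# Covariance eigenvalues, sorted in descending order for comparison.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X_demo, rowvar=False)))[::-1]
print(np.allclose(var_from_svd, eigvals))

# Project onto the first two components: a 2-D version of the data.
X_2d = X_demo_cen @ Vt[:2].T
print(X_2d.shape)
print(np.allclose(X_2d.var(axis=0, ddof=1), var_from_svd[:2]))
```

The SVD works on the data matrix directly and never forms the covariance matrix, which tends to be numerically more stable when features are on very different scales.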
