```
From: https://github.com/ksatola
Version: 0.0.1

TODOs
1. https://machinelearningmastery.com/calculate-principal-component-analysis-scratch-python/
2. https://machinelearningmastery.com/dimensionality-reduction-for-machine-learning/

```

# Dimensionality Reduction - PCA
[Principal Component Analysis (or PCA)](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) uses linear algebra to project the dataset onto a smaller set of orthogonal directions, ordered by how much of the data's variance each one captures. This makes it a `data reduction technique`. With PCA you choose the number of `dimensions` or `principal components` to keep in the transformed result.
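
To make the linear algebra concrete, here is a minimal sketch of PCA computed by hand with NumPy, using the covariance-matrix eigendecomposition route. The data is synthetic and the names (`X_demo`, `k`, `components`) are illustrative assumptions, not part of the original notebook; scikit-learn's `PCA` uses an SVD internally, so signs and numerical details can differ.

```python
import numpy as np

# Synthetic stand-in data (an assumption, not the diabetes dataset):
# 200 rows, 5 features with deliberately different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])

# 1. Center each feature on its mean.
X_centered = X_demo - X_demo.mean(axis=0)

# 2. Covariance matrix of the features (columns are variables).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition. eigh is meant for symmetric matrices and
#    returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Reorder so the direction with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Keep the first k components and project the centered data onto them.
k = 3
components = eigenvectors[:, :k].T               # shape (k, n_features)
explained_ratio = eigenvalues[:k] / eigenvalues.sum()
X_projected = X_centered @ components.T          # shape (n_samples, k)

print(explained_ratio)
print(X_projected.shape)
```

Each row of `components` is a unit-length direction in the original feature space, and `explained_ratio` plays the same role as `explained_variance_ratio_` in scikit-learn.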

In [1]:
# Feature extraction with PCA: keep 3 principal components
from pandas import read_csv
from sklearn.decomposition import PCA

# Load data
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(url, names=names)
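array = dataframe.values
# Columns 0-7 are the input features; column 8 is the class label
X = array[:,0:8]
# Y is not used by PCA, which is unsupervised
Y = array[:,8]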

# Feature extraction
pca = PCA(n_components=3)
fit = pca.fit(X)

# Summarize components
print(f"Explained Variance: {fit.explained_variance_ratio_}")
print(fit.components_)

Explained Variance: [0.88854663 0.06159078 0.02579012]
[[-2.02176587e-03  9.78115765e-02  1.60930503e-02  6.07566861e-02
   9.93110844e-01  1.40108085e-02  5.37167919e-04 -3.56474430e-03]
 [-2.26488861e-02 -9.72210040e-01 -1.41909330e-01  5.78614699e-02
   9.46266913e-02 -4.69729766e-02 -8.16804621e-04 -1.40168181e-01]
 [-2.24649003e-02  1.43428710e-01 -9.22467192e-01 -3.07013055e-01
   2.09773019e-02 -1.32444542e-01 -6.39983017e-04 -1.25454310e-01]]


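Each row of `components_` is one principal component, expressed as weights over the 8 original features, so the 3 components bear little resemblance to the source data. The first component alone explains about 89% of the variance and is dominated by the `test` feature (weight 0.993). Note that the code above only fits the PCA; the transformed dataset itself would come from `pca.transform(X)`.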
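
The following sketch shows how the reduced dataset itself could be obtained and what shape it has. It uses random synthetic data in place of the diabetes CSV so that it runs without a download; `X_demo`, `pca_demo`, `X_reduced`, and `X_restored` are illustrative names, not taken from the notebook above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data (an assumption): 100 rows, 8 features,
# matching the number of input columns used above.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 8))

# Fit the PCA and project the data onto 3 components in one step.
pca_demo = PCA(n_components=3)
X_reduced = pca_demo.fit_transform(X_demo)

print(X_demo.shape)     # (100, 8)
print(X_reduced.shape)  # (100, 3)

# inverse_transform maps the 3 columns back into the 8-feature space.
# The reconstruction is approximate because 5 components were discarded.
X_restored = pca_demo.inverse_transform(X_reduced)
print(X_restored.shape)  # (100, 8)
```

The columns of `X_reduced` are new coordinates along the principal components, which is why they no longer correspond to any single original feature.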