# Resources

1. The book Chapter 8
2. A good [link](https://aiaspirant.com/types-of-pca/)

# Types of PCA
1. PCA
2. Sparse PCA
3. Randomized PCA
4. Incremental PCA
5. Kernel PCA

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier  
from sklearn.metrics import f1_score

# [MNIST Database](https://en.wikipedia.org/wiki/MNIST_database)

In [None]:
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=True)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=y, random_state=42)

In [None]:
for c in X.columns:
    print(c,X[c].min(),X[c].max())

In [None]:
print('Shape of X:', X.shape, '\n', 'Shape of y:', y.shape)

In [None]:
y.value_counts()

In [None]:
X_train.iloc[1].value_counts()

# Data Standardization 

In [None]:
sc = StandardScaler()
sc.fit(X_train)
X_train = sc.transform(X_train)
X_test = sc.transform(X_test)

# Baseline classifier with all features

In [None]:
RF=RandomForestClassifier()
RF.fit(X_train,y_train)
yhat=RF.predict(X_test)
f1_all=f1_score(y_test,yhat,average='macro')
print(f1_all)

# PCA-2-components

## PCA-2-training

In [None]:
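def scatter_plot(X_trans, y):
    # Scatter the first two components, colored by label.
    # Pass `y` as a plain array with one label per row of `X_trans`;
    # a pandas Series would be re-aligned on its index instead.
    X_p = pd.DataFrame(data=X_trans, columns=['PC1', 'PC2'])
    X_p['Label'] = y
    sns.lmplot(x="PC1", y="PC2", hue="Label", data=X_p, fit_reg=False)
    plt.show()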

In [None]:
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
# Use the training labels (as an array) so each point gets its own label
scatter_plot(X_train_pca, y_train.to_numpy())

## PCA-2-testing

In [None]:
X_test_pca = pca.transform(X_test)
scatter_plot(X_test_pca, y_test.to_numpy())

In [None]:
RF=RandomForestClassifier()
RF.fit(X_train_pca,y_train)
yhat=RF.predict(X_test_pca)
f1_pca_2=f1_score(y_test,yhat,average='macro')
print(f1_pca_2)

# PCA-0.98-explained_variance_ratio

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=y,random_state=42)
pca = PCA()
pca.fit(X_train)
total_explained_variance = pca.explained_variance_ratio_.cumsum()
print(total_explained_variance)

In [None]:
plt.plot(total_explained_variance)
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance');

In [None]:
n_over_98 = len(total_explained_variance[total_explained_variance >= .98])
n_to_reach_98 = X_train.shape[1]-n_over_98 + 1
print('Number of components: {}\tTotal variance explained: {}'.format(n_to_reach_98, total_explained_variance[n_to_reach_98-1]))
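The threshold search above can also be written more directly. The following is a minimal sketch rather than the notebook's own method: it uses scikit-learn's bundled `load_digits` data (an assumption, chosen so the example runs without downloading MNIST) and compares `np.argmax` on the cumulative ratio with passing the target fraction straight to `PCA(n_components=0.98)`. The names `X_small`, `n_needed`, and `pca_auto` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Small stand-in dataset (8x8 digits), used only to keep the sketch self-contained.
X_small, _ = load_digits(return_X_y=True)

# Option 1: fit all components, then find the first index whose
# cumulative explained variance reaches the target.
target = 0.98
cumulative = np.cumsum(PCA().fit(X_small).explained_variance_ratio_)
n_needed = int(np.argmax(cumulative >= target)) + 1

# Option 2: let PCA choose the number of components for the same target.
pca_auto = PCA(n_components=target, svd_solver="full").fit(X_small)

print(n_needed, pca_auto.n_components_)
```

Both routes should report the same number of components; the second avoids hard-coding that number in later cells.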

## Reconstruct the data and compare with the original

In [None]:
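# Compute the components and the projected (reconstructed) images,
# keeping the number of components found above for 98% explained variance
pca = PCA(n_components=n_to_reach_98)
pca.fit(X_train)
X_98 = pca.transform(X_train)
projected = pca.inverse_transform(X_98)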

In [None]:
# Plot the results
fig, ax = plt.subplots(2, 10, figsize=(10, 2.5),
                       subplot_kw={'xticks':[], 'yticks':[]},
                       gridspec_kw=dict(hspace=0.1, wspace=0.1))
for i in range(10):
    ax[0, i].imshow(X_train.iloc[i].values.reshape((28,28)), cmap='binary_r')
    ax[1, i].imshow(projected[i].reshape((28,28)), cmap='binary_r')
    
ax[0, 0].set_ylabel('full-dim\ninput')
ax[1, 0].set_ylabel(f'{pca.n_components_}-dim\nreconstruction');

In [None]:
X_test_98 = pca.transform(X_test)

In [None]:
RF=RandomForestClassifier()
RF.fit(X_98,y_train)
yhat=RF.predict(X_test_98)
f1_pca_98=f1_score(y_test,yhat,average='macro')
print(f1_pca_98)

# [Sparse PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.SparsePCA.html)

Contemporary datasets often have a number of input variables $p$ that is comparable to, or even much larger than, the number of samples $n$. It has been shown that classical PCA is not consistent when $p/n$ does not converge to zero, whereas sparse PCA can remain consistent even when $p \gg n$. A further disadvantage of ordinary PCA is that each principal component is usually a linear combination of all input variables, which makes the components hard to interpret. Sparse PCA is a variant of PCA that addresses this by finding components with sparse loadings, so that each component involves only a few input variables.


For example, in machine learning problems such as gene analytics, each axis might correspond to a specific gene. If most of the entries in the loadings are zero, it is easy to see which genes contribute to each principal component and to interpret its physical meaning.
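The effect of sparse loadings can be illustrated on a small synthetic dataset. This is a hedged sketch, not part of the original analysis: the data layout (two hidden factors, each driving three features, plus four noise-only features), the value `alpha=2`, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(0)
n_samples = 200

# Two hidden factors; each drives its own group of three features.
f1 = rng.normal(size=(n_samples, 1))
f2 = rng.normal(size=(n_samples, 1))
X_toy = np.hstack([
    f1 + 0.1 * rng.normal(size=(n_samples, 3)),   # features 0-2
    f2 + 0.1 * rng.normal(size=(n_samples, 3)),   # features 3-5
    0.1 * rng.normal(size=(n_samples, 4)),        # features 6-9: noise only
])

pca_toy = PCA(n_components=2).fit(X_toy)
spca_toy = SparsePCA(n_components=2, alpha=2, random_state=0).fit(X_toy)

# Fraction of loadings that are exactly zero in each model.
zero_frac_pca = np.mean(pca_toy.components_ == 0)
zero_frac_spca = np.mean(spca_toy.components_ == 0)
print("PCA zero loadings:      ", zero_frac_pca)
print("SparsePCA zero loadings:", zero_frac_spca)
```

Ordinary PCA assigns a small but non-zero weight to every feature, while the L1 penalty in `SparsePCA` drives the loadings of the noise-only features to exactly zero.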


In [None]:
from sklearn.decomposition import SparsePCA
help(SparsePCA)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=y,random_state=42)
spca = SparsePCA(n_components=2, alpha=0.0001)
X_spca = spca.fit_transform(X_train)
 
scatter_plot(X_spca, y_train.to_numpy())

In [None]:
X_test_spca = spca.transform(X_test)
scatter_plot(X_test_spca, y_test.to_numpy())

# [Randomized PCA](https://scikit-learn.org/0.15/modules/generated/sklearn.decomposition.RandomizedPCA.html)


Classical PCA estimates the principal components from a full decomposition of the data matrix. For large datasets this becomes costly and makes the whole process difficult to scale.


Randomized PCA is principal component analysis computed with a randomized SVD: a linear dimensionality reduction that uses an approximate singular value decomposition of the data and keeps only the most significant singular vectors to project the data to a lower-dimensional space. The separate `RandomizedPCA` class linked in the heading is no longer available in current scikit-learn releases; the same method is selected with `PCA(svd_solver='randomized')`, as in the cell below.
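To see how close the approximation is, the sketch below compares the exact and randomized solvers on synthetic data. It is an illustration under stated assumptions: the low-rank data, the noise level, and the names `X_lowrank`, `pca_full`, and `pca_rand` are invented for this example and are not taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 5 strong latent directions embedded in 50 features, plus small noise.
latent = rng.normal(size=(500, 5)) * np.array([10.0, 6.0, 3.0, 2.0, 1.0])
mixing = rng.normal(size=(5, 50))
X_lowrank = latent @ mixing + 0.01 * rng.normal(size=(500, 50))

# Exact solver versus randomized solver, keeping the two leading components.
pca_full = PCA(n_components=2, svd_solver="full").fit(X_lowrank)
pca_rand = PCA(n_components=2, svd_solver="randomized", random_state=0).fit(X_lowrank)

print("full      :", pca_full.explained_variance_ratio_)
print("randomized:", pca_rand.explained_variance_ratio_)
```

When the spectrum decays quickly, as it does here, the two solvers should agree closely on the leading components, and the randomized one avoids the full decomposition.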

In [None]:
from sklearn.decomposition import PCA
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=y,random_state=42)


rpca = PCA(n_components=2, svd_solver='randomized')
X_rpca = rpca.fit_transform(X_train)
 
scatter_plot(X_rpca, y_train.to_numpy())

In [None]:
X_test_rpca = rpca.transform(X_test)
scatter_plot(X_test_rpca, y_test.to_numpy())

# Incremental PCA

Incremental PCA can be used when the dataset is too large to fit in memory.

Here the dataset is split into mini-batches, each small enough to fit in memory, and the mini-batches are fed to the IPCA algorithm one at a time. In the cell below, `batch_size=500` makes `fit_transform` do this batching internally.
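When the data really cannot be loaded at once, the batches can be fed explicitly with `partial_fit`. The following is a minimal sketch under assumptions: `X_big` is a random in-memory array standing in for data that would be read from disk in chunks, and `np.array_split` stands in for whatever chunked loader is actually used.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X_big = rng.normal(size=(1000, 20))  # stand-in for data streamed from disk

ipca_stream = IncrementalPCA(n_components=2)

# Feed the model one mini-batch at a time; only the current batch
# needs to be in memory when partial_fit is called.
for batch in np.array_split(X_big, 10):
    ipca_stream.partial_fit(batch)

# The fitted model can then transform new rows as usual.
X_reduced = ipca_stream.transform(X_big[:5])
print(ipca_stream.n_samples_seen_, X_reduced.shape)
```

Each batch must contain at least `n_components` samples, which is easily satisfied here with 100 rows per batch.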

In [None]:
from sklearn.decomposition import IncrementalPCA
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=y,random_state=42)
 
ipca = IncrementalPCA(n_components=2, batch_size=500)
X_ipca = ipca.fit_transform(X_train)
 
scatter_plot(X_ipca, y_train.to_numpy())

In [None]:
X_test_ipca = ipca.transform(X_test)
scatter_plot(X_test_ipca, y_test.to_numpy())

# Kernel PCA

PCA is a linear method. It works well for linearly separable datasets, but when the dataset has non-linear relationships it can produce poor results.

Kernel PCA is a technique that uses the so-called kernel trick to implicitly map linearly inseparable data into a higher-dimensional space where it becomes linearly separable.

There are various kernels that are popularly used; some of them are linear, polynomial, RBF, and sigmoid.
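The kernel trick itself can be checked numerically. The sketch below is an illustration, not part of the original notebook: it uses the homogeneous degree-2 polynomial kernel $k(x, z) = (x \cdot z)^2$ on 2-D inputs, and the helper names `phi` and `poly2_kernel` are hypothetical. The kernel value equals the inner product of the explicit 3-D features $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, yet it is computed without ever building them.

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-D input.
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    # Same inner product, computed directly in the original 2-D space.
    return np.dot(x, z) ** 2

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.5])

explicit = np.dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)       # kernel trick: no feature map needed
print(explicit, implicit)
```

Note that the first and third features of `phi(x)` sum to the squared radius of `x`, which is the kind of quantity that separates concentric circles with a single linear threshold.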

Let’s create a dataset that is not linearly separable, using sklearn’s `make_circles`.

In [None]:
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
 
X,y = make_circles(n_samples=500, factor=.1, noise=0.02, random_state=47)
 
plt.scatter(X[:,0], X[:,1], c=y)
plt.show()

## Applying PCA 

In [None]:
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
 
plt.title("PCA")
plt.scatter(X_pca[:,0], X_pca[:,1], c=y)
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.show()

## Applying Kernel PCA

In [None]:
kpca = KernelPCA(kernel='rbf', gamma=1)
X_kpca = kpca.fit_transform(X)
 
plt.title("Kernel PCA")
plt.scatter(X_kpca[:,0], X_kpca[:,1], c=y)
plt.show()
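The two scatter plots can be backed by a rough numeric check: fit a linear classifier on each projection and compare training accuracy. This is a sketch under assumptions, not the notebook's own evaluation. The helper `linear_accuracy` is hypothetical, and `gamma=10` is chosen here for illustration (the cell above uses `gamma=1`, which may behave differently).

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

X_c, y_c = make_circles(n_samples=500, factor=0.1, noise=0.02, random_state=47)

def linear_accuracy(features, labels):
    # Training accuracy of a linear classifier: a rough proxy for linear separability.
    clf = LogisticRegression().fit(features, labels)
    return clf.score(features, labels)

acc_pca = linear_accuracy(PCA(n_components=2).fit_transform(X_c), y_c)
acc_kpca = linear_accuracy(
    KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X_c), y_c
)
print("PCA:", acc_pca, "Kernel PCA:", acc_kpca)
```

Linear PCA only rotates the concentric circles, so a linear classifier cannot separate them well, whereas the RBF kernel projection should place the two classes on opposite sides of a line.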