**pca** is a Python package for performing Principal Component Analysis and creating insightful plots. The core of pca is built on sklearn functionality to maximize compatibility when combining it with other packages.

But this package can do a lot more. Besides regular PCA, it can also perform **SparsePCA** and **TruncatedSVD**. Depending on your input data, the best approach is chosen.

Other functionalities are:
  * **Biplot** to plot the loadings
  * Determine the **explained variance**
  * Extract the best performing **features**
  * Scatter plot with the **loadings**
  * Outlier detection using **Hotelling T2 and/or SPE/Dmodx**
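
To make the **explained variance** item concrete, here is a minimal NumPy sketch of how the variance captured by each component can be computed from the singular values of mean-centered data. The toy data and all variable names are assumptions for illustration; this is not the package's own implementation.

```python
import numpy as np

# Toy data: 200 samples, 4 features with decreasing spread (illustrative values only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Center the data; PCA operates on mean-centered features
Xc = X - X.mean(axis=0)

# The singular values of the centered data give the variance along each component
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained_var = s**2 / (X.shape[0] - 1)
explained_ratio = explained_var / explained_var.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative ratio reaches 95%
n_needed = int(np.searchsorted(cumulative, 0.95) + 1)
print(explained_ratio.round(3), n_needed)
```

The cumulative ratio is what a fractional component setting, such as "keep 95% of the variance", would be compared against.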

This notebook will show some examples.

More information can be found here:

* [github pca](https://github.com/erdogant/pca)



In [None]:
!pip install pca

Let's check the version.

In [None]:
import pca
print(pca.__version__)

Import the pca library

In [None]:
from pca import pca
import numpy as np
import pandas as pd

Here we load the iris dataset into a DataFrame, using the class labels as the index.

In [None]:
# Dataset
from sklearn.datasets import load_iris
X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)

Initialize with explicitly specified parameters: three components, with normalization enabled.

In [None]:
# Initialize
model = pca(n_components=3, normalize=True)
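
As a rough illustration of why normalization matters, the sketch below assumes that `normalize=True` amounts to standardizing each feature to zero mean and unit variance; that is an assumption, not something this notebook states. It compares the first principal direction before and after standardization. The data and the helper `first_pc` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two features on very different scales (illustrative values)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])

# Standardize: subtract the column mean and divide by the column standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

def first_pc(data):
    """Return the first principal direction of the mean-centered data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

pc_raw = first_pc(X)
pc_std = first_pc(X_std)
print(np.abs(pc_raw).round(3))  # dominated by the large-scale feature
print(np.abs(pc_std).round(3))  # both features contribute after standardization
```

Without standardization, the feature with the largest numeric range tends to dictate the first component regardless of how informative it is.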

In [None]:
# Fit transform
out = model.fit_transform(X)

In [None]:
# Make biplot with sample labels and without the legend
fig, ax = model.biplot(label=True, legend=False)

In [None]:
# Make biplot with only the loading directions (cmap=None hides the scatter)
fig, ax = model.biplot(cmap=None, label=False, legend=False)

In [None]:
from pca import pca
# Load example data
from sklearn.datasets import load_iris
X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)

In [None]:
# Initialize
model = pca(n_components=3)
# Fit using PCA
results = model.fit_transform(X)

In [None]:
# Make plots
fig, ax = model.scatter()
fig, ax = model.plot()
fig, ax = model.biplot()
fig, ax = model.biplot(SPE=True, hotellingt2=True)
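
The `SPE` and `hotellingt2` options refer to two outlier statistics. The following NumPy sketch shows one common way to compute them by hand: Hotelling's T2 measures distance inside the retained PC space, while SPE measures the residual outside it. The toy data, the choice `k = 2`, and all names are assumptions. The package's own thresholds and significance testing are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
X[0] += 8.0  # plant one obvious outlier in the first row (illustrative)

Xc = X - X.mean(axis=0)
_, s, vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                  # number of retained components
scores = Xc @ vt[:k].T                 # projection onto the first k PCs
var_k = s[:k] ** 2 / (X.shape[0] - 1)  # variance of each retained PC

# Hotelling T2: squared distance from the origin in PC space, scaled by each PC's variance
t2 = np.sum(scores**2 / var_k, axis=1)

# SPE: squared residual left after reconstructing each sample from the k PCs
residual = Xc - scores @ vt[:k]
spe = np.sum(residual**2, axis=1)

print("sample with largest T2:", int(np.argmax(t2)))
```

A sample can score high on one statistic and low on the other, which is why the two are often inspected together.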


In [None]:
# 3D plots
fig, ax = model.scatter3d()
fig, ax = model.biplot3d()
fig, ax = model.biplot3d(SPE=True, hotellingt2=True)

In [None]:
# Normalize out principal components from the data
X_norm = model.norm(X)

In [None]:
print(X_norm)
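
For intuition about what "normalizing out" a component means, here is a minimal NumPy sketch that subtracts the contribution of the first PC from the data. It assumes only the first component is removed and that the feature means are restored afterwards. How `model.norm` handles these details is not shown in this notebook, so treat this as an illustration rather than the package's method.

```python
import numpy as np

rng = np.random.default_rng(3)
# A strong shared signal across all features plus small noise (illustrative)
signal = rng.normal(size=(150, 1)) * 5.0
X = signal @ np.ones((1, 4)) + rng.normal(size=(150, 4)) * 0.5

mean = X.mean(axis=0)
Xc = X - mean
_, s, vt = np.linalg.svd(Xc, full_matrices=False)

# Subtract the part of the data explained by the first PC, then restore the mean
pc1 = vt[:1]
X_removed = Xc - (Xc @ pc1.T) @ pc1 + mean

# The remaining data has (numerically) no variance along the removed direction
leftover = ((X_removed - mean) @ pc1.T).var()
print(X.var(axis=0).sum(), X_removed.var(axis=0).sum(), leftover)
```

This kind of removal is typically used when the dominant component reflects an unwanted effect, such as a batch or technical artifact, and the remaining structure is of interest.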