## 1 Reducing Features Using Principal Components
### Problem
Given a set of features, you want to reduce the number of features while retaining the
variance in the data.
### Solution
Use principal component analysis with scikit-learn's PCA:

In [1]:
# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets
# Load the data
digits = datasets.load_digits()
# Standardize the feature matrix
features = StandardScaler().fit_transform(digits.data)
# Create a PCA that will retain 99% of variance
pca = PCA(n_components=0.99, whiten=True)
# Conduct PCA
features_pca = pca.fit_transform(features)
# Show results
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_pca.shape[1])


Original number of features: 64
Reduced number of features: 54
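
To see how the 99% threshold translates into a component count, one option is to inspect the fitted model's `explained_variance_ratio_`. The following is a minimal sketch that repeats the recipe's setup; the variable name `cumulative_variance` is introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the digits features, as in the recipe above
features = StandardScaler().fit_transform(datasets.load_digits().data)

# Fit PCA with the same 99% variance threshold
pca = PCA(n_components=0.99, whiten=True)
features_pca = pca.fit_transform(features)

# Running total of the variance share explained by each retained component
# (cumulative_variance is an illustrative name, not part of the recipe)
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

print("Components kept:", pca.n_components_)
print("Share of variance retained:", cumulative_variance[-1])
```

The last entry of the running total should sit at or just above the requested 0.99, which is how a fractional `n_components` is resolved into a whole number of components.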


## 2 Reducing Features When Data Is Linearly Inseparable
### Problem
You suspect you have linearly inseparable data and want to reduce the dimensions.
### Solution
Use an extension of principal component analysis that uses kernels to allow for nonlinear dimensionality reduction:

In [2]:
# Load libraries
from sklearn.decomposition import PCA, KernelPCA
from sklearn.datasets import make_circles
# Create linearly inseparable data
features, _ = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)
# Apply kernel PCA with a radial basis function (RBF) kernel
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
features_kpca = kpca.fit_transform(features)
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_kpca.shape[1])

Original number of features: 2
Reduced number of features: 1
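
One rough way to check whether the kernel projection helps is to compare how well a linear classifier separates the two circles after linear PCA versus kernel PCA. This is only a sketch under stated assumptions: the helper `training_accuracy` is hypothetical, the use of `LogisticRegression` is a choice made here rather than part of the recipe, and the scores are measured on the training data, so they indicate separability rather than generalization.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

# Same data as the recipe, but keep the class labels this time
features, target = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# Project to one dimension with linear PCA and with RBF kernel PCA
features_pca = PCA(n_components=1).fit_transform(features)
features_kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1).fit_transform(features)

def training_accuracy(projected, labels):
    # Hypothetical helper: fit a linear classifier and score it on the same data
    return LogisticRegression().fit(projected, labels).score(projected, labels)

acc_pca = training_accuracy(features_pca, target)
acc_kpca = training_accuracy(features_kpca, target)

print("Training accuracy after linear PCA:", acc_pca)
print("Training accuracy after kernel PCA:", acc_kpca)
```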


## 3 Reducing Features by Maximizing Class Separability
### Problem
You want to reduce the features to be used by a classifier.
### Solution
Try linear discriminant analysis (LDA) to project the features onto component axes
that maximize the separation of classes:

In [3]:
# Load libraries
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Load Iris flower dataset:
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create and run an LDA, then use it to transform the features
lda = LinearDiscriminantAnalysis(n_components=1)
features_lda = lda.fit(features, target).transform(features)
# Print the number of features
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_lda.shape[1])

Original number of features: 4
Reduced number of features: 1


In [4]:
lda.explained_variance_ratio_

array([0.9912126])

In [5]:
# Create and fit LDA, keeping all components
lda = LinearDiscriminantAnalysis(n_components=None)
lda.fit(features, target)
# Create array of explained variance ratios
lda_var_ratios = lda.explained_variance_ratio_
# Create function
def select_n_components(var_ratio, goal_var):
    # Set initial variance explained so far
    total_variance = 0.0
    # Set initial number of components
    n_components = 0
    # For the explained variance of each component:
    for explained_variance in var_ratio:
        # Add the explained variance to the total
        total_variance += explained_variance
        # Add one to the number of components
        n_components += 1
        # If we reach our goal level of explained variance
        if total_variance >= goal_var:
            # End the loop
            break
    # Return the number of components
    return n_components
# Run function
select_n_components(lda_var_ratios, 0.95)

1
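
The same selection can be written without an explicit loop by taking a cumulative sum of the ratios. The sketch below is an alternative formulation, not the recipe's own function; `select_n_components_cumsum` and the ratios in `example_ratios` are made up for illustration.

```python
import numpy as np

def select_n_components_cumsum(var_ratio, goal_var):
    # Running total of explained variance across components
    cumulative = np.cumsum(var_ratio)
    # Index of the first running total that reaches the goal
    first_hit = int(np.searchsorted(cumulative, goal_var, side="left"))
    # If the goal is never reached, fall back to using every component
    return min(first_hit + 1, len(cumulative))

# Illustrative ratios only, not taken from a fitted model
example_ratios = [0.5, 0.3, 0.2]
print(select_n_components_cumsum(example_ratios, 0.4))  # 1
print(select_n_components_cumsum(example_ratios, 0.9))  # 3
```

Like the loop version, this returns the total number of components when the goal cannot be met.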

## 4 Reducing Features Using Matrix Factorization
### Problem
You have a feature matrix of nonnegative values and want to reduce the dimensionality.
### Solution
Use non-negative matrix factorization (NMF) to reduce the dimensionality of the feature matrix:

In [6]:
# Load libraries
from sklearn.decomposition import NMF
from sklearn import datasets
# Load the data
digits = datasets.load_digits()
# Load feature matrix
features = digits.data
# Create, fit, and apply NMF
nmf = NMF(n_components=10, random_state=1)
features_nmf = nmf.fit_transform(features)
# Show results
print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_nmf.shape[1])

Original number of features: 64
Reduced number of features: 10




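NMF factorizes the feature matrix into two nonnegative matrices whose product approximates it:

$$V \approx WH$$

where V is our d × n feature matrix (i.e., d features, n observations), W is a d × r matrix, and H is an r × n matrix. By adjusting the value of r we can set the amount of dimensionality reduction desired.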
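
The two factors can be inspected directly on a fitted model. Note that scikit-learn stores observations as rows, so its factors are transposed relative to the d × n convention: `fit_transform` returns an n × r matrix and `components_` holds an r × d matrix. The sketch below assumes the digits data from the recipe; `max_iter=1000` and the name `relative_error` are choices made here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import NMF

# Nonnegative feature matrix (observations in rows)
features = datasets.load_digits().data

# max_iter is raised here as an assumption, to give the solver room to converge
nmf = NMF(n_components=10, random_state=1, max_iter=1000)
W = nmf.fit_transform(features)   # shape: observations x r
H = nmf.components_               # shape: r x features

# Multiply the factors back together and measure how close they come
reconstruction = W @ H
relative_error = np.linalg.norm(features - reconstruction) / np.linalg.norm(features)

print("W shape:", W.shape)
print("H shape:", H.shape)
print("Relative reconstruction error:", relative_error)
```

A smaller r gives a more compact representation at the cost of a larger reconstruction error.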

## 5 Reducing Features on Sparse Data
### Problem
You have a sparse feature matrix and want to reduce the dimensionality.
### Solution
Use Truncated Singular Value Decomposition (TSVD):

In [7]:
# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets
import numpy as np
# Load the data
digits = datasets.load_digits()
# Standardize feature matrix
features = StandardScaler().fit_transform(digits.data)
# Make sparse matrix
features_sparse = csr_matrix(features)
# Create a TSVD
tsvd = TruncatedSVD(n_components=10)
# Conduct TSVD on sparse matrix
features_sparse_tsvd = tsvd.fit(features_sparse).transform(features_sparse)
# Show results
print("Original number of features:", features_sparse.shape[1])
print("Reduced number of features:", features_sparse_tsvd.shape[1])


Original number of features: 64
Reduced number of features: 10
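
After fitting, `explained_variance_ratio_` can be summed to see how much of the original variance the kept components account for. This is a minimal sketch that repeats the recipe's setup; `random_state=1` is added here as an assumption to make the randomized solver repeatable, and the two summary variable names are illustrative.

```python
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Standardize the digits features and store them as a sparse matrix
features = StandardScaler().fit_transform(datasets.load_digits().data)
features_sparse = csr_matrix(features)

# Fit TSVD and project onto 10 components
tsvd = TruncatedSVD(n_components=10, random_state=1)
features_sparse_tsvd = tsvd.fit_transform(features_sparse)

# Share of variance explained by the first three and by all ten components
variance_first_three = tsvd.explained_variance_ratio_[0:3].sum()
variance_all_ten = tsvd.explained_variance_ratio_.sum()

print("Variance explained by first three components:", variance_first_three)
print("Variance explained by all ten components:", variance_all_ten)
```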
