Fortunately, not all features are created equal. The goal of feature extraction for dimensionality reduction is to transform our original set of p_original features into a new set of p_new features, where p_original > p_new, while still keeping much of the underlying information. Put another way, we reduce the number of features with only a small loss in our data’s ability to generate high-quality predictions. In this chapter, we will cover a number of feature extraction techniques to do just this.

One downside of the feature extraction techniques we discuss is that the new features they generate are not interpretable by humans. They retain as much, or nearly as much, of the information needed to train our models, but to the human eye they look like a collection of random numbers. If we want to keep our models interpretable, dimensionality reduction through feature selection is a better option.
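To make the interpretability trade-off concrete, here is a minimal sketch on the iris data. The choice of `SelectKBest` with `f_classif` and of `k=2` is an assumption made for illustration, not something prescribed by the text: an extracted component mixes every original feature, while a selected feature is simply one of the original columns.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

iris = datasets.load_iris()

# Feature extraction: each principal component is a weighted mix of ALL four inputs
pca = PCA(n_components=2).fit(iris.data)
for i, weights in enumerate(pca.components_):
    print("Component", i, "weights:", dict(zip(iris.feature_names, weights.round(2))))

# Feature selection (illustrative choice): keeps a subset of the original, named columns
selector = SelectKBest(f_classif, k=2).fit(iris.data, iris.target)
kept = [name for name, keep in zip(iris.feature_names, selector.get_support()) if keep]
print("Selected original features:", kept)
```

The selected features keep their original names and units, whereas the component weights have to be read as a combination of all four measurements.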

In [82]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, KernelPCA, NMF, TruncatedSVD
from sklearn import datasets
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from scipy.sparse import csr_matrix

In [40]:
# Given a set of features, reduce the number of features while retaining
# most of the variance (spread around the mean) in the data
digits = datasets.load_digits()

# Standardize the features (zero mean, unit variance) before PCA
features = StandardScaler().fit_transform(digits.data)

# Keep as many principal components as needed to retain 99% of the variance
pca = PCA(n_components=0.99, whiten=True)

features_pca = pca.fit_transform(features)

print("Original number of features:", features.shape[1])
print("Reduced number of features:", features_pca.shape[1])

Original number of features: 64
Reduced number of features: 54
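As a rough check on what `n_components=0.99` means, the sketch below refits the same PCA and inspects the cumulative explained variance. The variable names (`cumulative`, `n_kept`) are illustrative; `explained_variance_ratio_` and `n_components_` are attributes of a fitted `PCA`.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same preprocessing as above: standardize the digits features
features = StandardScaler().fit_transform(datasets.load_digits().data)

# A float n_components asks PCA to keep enough components to reach that variance share
pca = PCA(n_components=0.99, whiten=True).fit(features)

# Running total of the variance explained by the retained components
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_kept = pca.n_components_

print("Components kept:", n_kept)
print("Variance retained:", cumulative[-1])
```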


In [41]:
# Build a 16 x 19 array by repeating the same 19-value row 16 times
arr = np.tile([1,2,3,2,3,4,5,5,6,7,8,8,9,9,7,6,5,4,4], (16, 1))

arr.shape

(16, 19)

In [52]:
# You suspect the data is not linearly separable and want to reduce its dimensions.
# make_circles generates such data: one class is surrounded on all sides by the other.
features_kernel, _ = datasets.make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# KernelPCA performs non-linear dimensionality reduction through the use of kernels
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
features_kpca = kpca.fit_transform(features_kernel)

print("Original number of features:", features_kernel.shape[1])
print("Reduced number of features:", features_kpca.shape[1])

Original number of features: 2
Reduced number of features: 1
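One way to see why a kernel helps here is to compare a linear and an RBF projection on the same circles data. This is a sketch under assumptions: scoring each 1-D projection with the training accuracy of a `LogisticRegression` is an illustrative yardstick chosen for this example, not part of the recipe above.

```python
from sklearn import datasets
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

# Concentric circles: the inner class cannot be split from the outer class by a line
X, y = datasets.make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.1)

# Reduce to one dimension with linear PCA and with RBF kernel PCA
X_pca = PCA(n_components=1).fit_transform(X)
X_kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1).fit_transform(X)

# Illustrative check: how well does a linear classifier do on each 1-D projection?
pca_acc = LogisticRegression().fit(X_pca, y).score(X_pca, y)
kpca_acc = LogisticRegression().fit(X_kpca, y).score(X_kpca, y)

print("Training accuracy on linear PCA projection:", pca_acc)
print("Training accuracy on kernel PCA projection:", kpca_acc)
```

The linear projection squashes the inner circle into the middle of the outer one, while the RBF projection can place the two classes at different positions along the single retained component.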


In [76]:
# You want to reduce the number of features used by a classifier
iris = datasets.load_iris()
iris_features = iris.data
iris_target = iris.target

# LDA is supervised: it uses the targets to find the component that best separates the classes
lda = LinearDiscriminantAnalysis(n_components=1)
features_lda = lda.fit(iris_features, iris_target).transform(iris_features)

print("Original number of features:", iris_features.shape[1])
print("Reduced number of features:", features_lda.shape[1])
lda.explained_variance_ratio_

Original number of features: 4
Reduced number of features: 1


array([0.99147248])
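The explained variance ratio can also guide the choice of `n_components`. The sketch below assumes a hypothetical helper, `select_n_components`, and an illustrative 95% threshold; neither comes from scikit-learn. It fits LDA with `n_components=None` so that every available component is reported, then counts how many are needed to reach the threshold.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def select_n_components(var_ratio, goal_var):
    """Hypothetical helper: smallest number of components whose ratios sum to goal_var."""
    total = 0.0
    for n, ratio in enumerate(var_ratio, start=1):
        total += ratio
        if total >= goal_var:
            return n
    return len(var_ratio)

iris = datasets.load_iris()

# n_components=None keeps all min(n_classes - 1, n_features) components
lda = LinearDiscriminantAnalysis(n_components=None)
lda.fit(iris.data, iris.target)

# Assumed threshold for illustration: retain 95% of the between-class variance
n_needed = select_n_components(lda.explained_variance_ratio_, 0.95)
print("Components needed:", n_needed)
```

Because the first component alone explains about 99% of the variance on iris, a single component is enough at this threshold.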

In [81]:
# You have a feature matrix of non-negative values and want to reduce the dimensionality.
# NMF finds two non-negative matrices (W, H) whose product approximates the non-negative matrix X.
# This factorization can be used, for example, for dimensionality reduction, source separation, or topic extraction.
nmf_digits = datasets.load_digits()
nmf_features = nmf_digits.data

# Create, fit, and apply NMF
nmf = NMF(n_components=30, random_state=1)
features_nmf = nmf.fit_transform(nmf_features)

# Show results
print("Original number of features:", nmf_features.shape[1])
print("Reduced number of features:", features_nmf.shape[1])

Original number of features: 64
Reduced number of features: 30
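To connect the W and H of the comment to the scikit-learn objects, here is a small sketch: `fit_transform` returns W and the fitted `components_` attribute holds H. Raising `max_iter` to 500 is an assumption made here to give the solver more room to converge, and the relative error is just one illustrative way to gauge the approximation.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import NMF

X = datasets.load_digits().data  # pixel intensities, all non-negative

# max_iter=500 is an illustrative choice, not taken from the recipe above
nmf = NMF(n_components=30, random_state=1, max_iter=500)
W = nmf.fit_transform(X)   # observations x components (the reduced features)
H = nmf.components_        # components x original features

# W @ H approximates X; measure how close the approximation is
X_approx = W @ H
rel_error = np.linalg.norm(X - X_approx) / np.linalg.norm(X)

print("W shape:", W.shape, "H shape:", H.shape)
print("Relative reconstruction error:", rel_error)
```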


In [85]:
# You have a sparse feature matrix and want to reduce the dimensionality
ss_digits = datasets.load_digits()
ss_features = StandardScaler().fit_transform(ss_digits.data)

# Make sparse matrix
features_sparse = csr_matrix(ss_features)

# Create a TSVD
tsvd = TruncatedSVD(n_components=10)

features_sparse_tsvd = tsvd.fit(features_sparse).transform(features_sparse)

# Show results
print("Original number of features:", features_sparse.shape[1])
print("Reduced number of features:", features_sparse_tsvd.shape[1])

Original number of features: 64
Reduced number of features: 10
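Unlike PCA, `TruncatedSVD` does not center the data, which is why it can work directly on a sparse matrix. As a rough sketch of how much information the ten components keep, the fitted `explained_variance_ratio_` can be summed. The `random_state=1` below is an assumption added only to make the randomized solver repeatable.

```python
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Same setup as above: standardized digits stored in CSR (sparse) format
features = StandardScaler().fit_transform(datasets.load_digits().data)
features_sparse = csr_matrix(features)

# random_state is an illustrative addition for reproducibility
tsvd = TruncatedSVD(n_components=10, random_state=1)
reduced = tsvd.fit_transform(features_sparse)

# Share of the original variance captured by the 10 retained components
total_variance = tsvd.explained_variance_ratio_.sum()
print("Reduced shape:", reduced.shape)
print("Variance retained by 10 components:", total_variance)
```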
