# Introduction

In this lab, we will learn how to concatenate multiple feature extraction methods using Python's scikit-learn library. We will use the **FeatureUnion** transformer to combine features obtained by PCA and univariate selection. Combining features using this transformer has the benefit that it allows cross-validation and grid searches over the whole process.

# Import Libraries

In [1]:
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

# Load the Dataset

Next, we will load the iris dataset using the **load_iris** function.

In [2]:
iris = load_iris()

X, y = iris.data, iris.target
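
Before extracting features, it can help to check what was loaded. The following is a minimal sketch that repeats the loading step and prints the shape of the data; the variable names simply mirror the cell above.

```python
from sklearn.datasets import load_iris

# Load the bundled iris dataset (no download required).
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features
print(iris.feature_names)  # names of the four measurements
print(iris.target_names)   # names of the three classes
```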

# Feature Extraction

The iris dataset has only four features, so dimensionality reduction is not strictly necessary here. We use it as a small, convenient example to demonstrate feature extraction with PCA and univariate selection.

# PCA

We will use PCA to project the four original features onto two principal components.

In [3]:
pca = PCA(n_components=2)
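
The cell above only configures the transformer. As a rough illustration of what it does once fitted, the sketch below applies PCA on its own and prints the shape of the result and the share of variance each component explains. The name `X_pca` is introduced here for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it is fitted on X alone and ignores y.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print(X_pca.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)  # fraction of variance per component
```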

# Univariate Selection

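We will use univariate selection (**SelectKBest**) to keep the highest-scoring original features. With **k=1** and the default scoring function (**f_classif**, an ANOVA F-test), only the single best-scoring feature is retained.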

In [4]:
selection = SelectKBest(k=1)
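
To see which feature the selector keeps, it can be fitted on its own. This is a small sketch, not part of the original lab; `X_selected` and `kept` are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest

iris = load_iris()
X, y = iris.data, iris.target

# Unlike PCA, univariate selection is supervised and needs y to score features.
selection = SelectKBest(k=1)
X_selected = selection.fit_transform(X, y)

# Indices of the columns that were kept, mapped back to feature names.
kept = selection.get_support(indices=True)
print(X_selected.shape)                        # (150, 1)
print([iris.feature_names[i] for i in kept])   # name of the selected feature
print(selection.scores_)                       # one F-score per original feature
```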

# Combined Features

We will combine the features obtained from PCA and univariate selection using the **FeatureUnion transformer**.

In [5]:
combined_features = FeatureUnion([
    ('pca', pca),
    ('univ_select', selection),
])

# Transformed Dataset

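We will fit the combined transformer and use it to transform the dataset. The result concatenates the two PCA components with the one selected feature, giving three columns.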

In [6]:
X_features = combined_features.fit(X, y).transform(X)
print('Combined space has', X_features.shape[1], 'features')

Combined space has 3 features
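
One way to convince yourself that **FeatureUnion** simply places the outputs of its transformers side by side is to build the same matrix by hand. The sketch below does this with fresh transformer instances; `X_union` and `X_manual` are names introduced only for this comparison.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# Output of the union: PCA columns first, then the selected feature.
union = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(k=1)),
])
X_union = union.fit_transform(X, y)

# The same matrix built manually from separately fitted transformers.
X_pca = PCA(n_components=2).fit_transform(X)
X_best = SelectKBest(k=1).fit_transform(X, y)
X_manual = np.hstack([X_pca, X_best])

print(X_union.shape)  # (150, 3)
print(np.allclose(X_union, X_manual))
```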


# Model Training

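We will use a support vector machine (SVM) with a linear kernel as the classifier. The cell below only creates the estimator; it is fitted later, as the final step of the pipeline, during the grid search.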

In [8]:
svm = SVC(kernel='linear')

# Grid Search

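We will wrap the feature union and the SVM in a **Pipeline** and run **GridSearchCV** over the number of PCA components, the number of selected features, and the SVM penalty **C**. Because feature extraction is part of the pipeline, it is re-fitted on the training portion of each cross-validation fold.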
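
Parameters of nested steps are addressed by joining the step names with double underscores, for example **features__pca__n_components**. The sketch below rebuilds an equivalent pipeline and uses `get_params()` to list the names that can appear in a parameter grid; the variable `tunable` is illustrative only.

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("pca", PCA(n_components=2)),
        ("univ_select", SelectKBest(k=1)),
    ])),
    ("svm", SVC(kernel="linear")),
])

# Nested parameter names have the form <step>__<sub-step>__<parameter>.
tunable = sorted(name for name in pipeline.get_params() if "__" in name)
for name in tunable:
    print(name)

# The same names work with set_params, which is what GridSearchCV uses internally.
pipeline.set_params(features__pca__n_components=3, svm__C=10)
print(pipeline.get_params()["features__pca__n_components"])  # 3
```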

In [10]:
pipeline = Pipeline([("features", combined_features), ("svm", svm)])

param_grid = dict(
    features__pca__n_components=[1, 2, 3],
    features__univ_select__k=[1, 2],
    svm__C=[0.1, 1, 10],
)

grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)

Fitting 5 folds for each of 18 candidates, totalling 90 fits
[CV 1/5; 1/18] START features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV 1/5; 1/18] END features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1;, score=0.933 total time=   0.0s
[CV 2/5; 1/18] START features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV 2/5; 1/18] END features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1;, score=0.933 total time=   0.0s
[CV 3/5; 1/18] START features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV 3/5; 1/18] END features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1;, score=0.867 total time=   0.0s
[CV 4/5; 1/18] START features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV 4/5; 1/18] END features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1;, score=0.933 total time=   0.0s
[CV 5/5; 1/18] START features__pca__n_components=1, features__univ_select__k=1, svm__C=
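
The verbose log above reports every individual fit. For a more compact view, the fitted search object exposes the winning configuration directly. The following sketch reruns the same search without verbose output and reads `best_params_` and `best_score_`; the exact values printed may vary between scikit-learn versions, so none are shown here.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("pca", PCA(n_components=2)),
        ("univ_select", SelectKBest(k=1)),
    ])),
    ("svm", SVC(kernel="linear")),
])

param_grid = dict(
    features__pca__n_components=[1, 2, 3],
    features__univ_select__k=[1, 2],
    svm__C=[0.1, 1, 10],
)

# 3 * 2 * 3 = 18 candidates, each evaluated with the default 5-fold CV.
grid_search = GridSearchCV(pipeline, param_grid=param_grid)
grid_search.fit(X, y)

print(grid_search.best_params_)           # best combination found
print(round(grid_search.best_score_, 3))  # its mean cross-validated accuracy

# With the default refit=True, the search object itself can predict
# using the best pipeline refitted on all of X.
predictions = grid_search.predict(X[:5])
print(predictions)
```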

# Summary

In this lab, we learned how to concatenate multiple feature extraction methods using Python's scikit-learn library. We used the **FeatureUnion** transformer to combine features obtained by PCA and univariate selection. We also trained a support vector machine (SVM) model and performed a grid search over the hyperparameters of the pipeline.