# Kernel-Based Principal Component Analysis (Kernel PCA)

Kernel Principal Component Analysis (Kernel PCA) is an extension of the traditional PCA algorithm that allows for nonlinear dimensionality reduction by using kernel methods. The goal of Kernel PCA is to perform dimensionality reduction in a higher-dimensional feature space implicitly defined by a kernel function, without the need to explicitly compute the transformed features.

## Kernel Function and Gram Matrix

Consider $N$ feature vectors $\{\mathbf{x}_n\}$ in $\mathbb{R}^M$ and a kernel function $K(\mathbf{x}_n, \mathbf{x}_m)$ that computes the inner product in the transformed feature space:

$$
 K(\mathbf{x}_n, \mathbf{x}_m) = (\phi(\mathbf{x}_n))^T \phi(\mathbf{x}_m)
$$

for some mapping $\phi(\cdot)$. The corresponding $N \times N$ Gram matrix $\mathbf{A}$ is defined as:

$$
[\mathbf{A}]_{n,m} = K(\mathbf{x}_n, \mathbf{x}_m)
$$

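The matrix $\mathbf{A}$ is symmetric and non-negative definite. The aim is to replace each transformed vector $\phi(\mathbf{x}_n)$ with a reduced vector $\phi'(\mathbf{x}_n)$ in $\mathbb{R}^{M'}$, where $M'$ is much smaller than the dimension of the feature space in which $\phi(\mathbf{x}_n)$ lives (which is generally not the input dimension $M$).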
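
As a concrete illustration, the following sketch builds the Gram matrix for a Gaussian (RBF) kernel and checks the two properties stated above. The function name `rbf_gram_matrix`, the value of `gamma`, and the random demo data are assumptions made for this example only.

```python
import numpy as np

def rbf_gram_matrix(X, gamma=1.0):
    """Return the N x N matrix A with A[n, m] = exp(-gamma * ||x_n - x_m||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 3))  # N = 6 vectors in R^3 (illustrative data)
A = rbf_gram_matrix(X_demo, gamma=0.5)

print("symmetric:", np.allclose(A, A.T))
print("smallest eigenvalue:", np.linalg.eigvalsh(A).min())
```

The smallest eigenvalue should be non-negative up to floating-point round-off, which is what non-negative definiteness means numerically.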

## Centering the Data

To perform Kernel PCA, we start by centering the transformed feature vectors. The sample mean of the transformed vectors is:

$$ 
\bar{\phi} = \frac{1}{N} \sum_{n=0}^{N-1} \phi(\mathbf{x}_n)
$$

The centered transformed vectors are:

$$
\phi_c(\mathbf{x}_n) = \phi(\mathbf{x}_n) - \bar{\phi}, \quad n = 0, 1, 2, \ldots, N-1
$$

The Gram matrix of the centered variables, denoted by $\mathbf{A}_c$, can be related to the original Gram matrix $\mathbf{A}$ as follows:

$$
[\mathbf{A}_c]_{n,m} = K(\mathbf{x}_n, \mathbf{x}_m) - \frac{1}{N} \sum_{k=0}^{N-1} K(\mathbf{x}_n, \mathbf{x}_k) - \frac{1}{N} \sum_{k=0}^{N-1} K(\mathbf{x}_k, \mathbf{x}_m) + \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} K(\mathbf{x}_k, \mathbf{x}_j)
$$
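
The centering formula can be checked numerically. The sketch below uses a linear kernel, for which $\phi(\mathbf{x}) = \mathbf{x}$ is known explicitly, so the centered Gram matrix can also be computed directly from the centered vectors. The helper name `center_gram_matrix` and the random data are assumptions for illustration.

```python
import numpy as np

def center_gram_matrix(A):
    """Apply the entrywise centering formula to an N x N Gram matrix."""
    row_mean = A.mean(axis=1, keepdims=True)  # (1/N) sum_k K(x_n, x_k)
    col_mean = A.mean(axis=0, keepdims=True)  # (1/N) sum_k K(x_k, x_m)
    total_mean = A.mean()                     # (1/N^2) sum_k sum_j K(x_k, x_j)
    return A - row_mean - col_mean + total_mean

rng = np.random.default_rng(1)
X_lin = rng.normal(size=(8, 3))

# Linear kernel: the feature map is the identity, so both routes are available
A_lin = X_lin @ X_lin.T
A_c = center_gram_matrix(A_lin)

X_centered = X_lin - X_lin.mean(axis=0)
A_c_explicit = X_centered @ X_centered.T

print("formula matches explicit centering:", np.allclose(A_c, A_c_explicit))
```

For kernels whose feature map is not available explicitly, only the first route (working on the Gram matrix) is possible, which is the point of the formula.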

## Eigenvalue Decomposition

We then proceed with dimensionality reduction in the kernel domain. The sample covariance matrix of the centered and transformed feature vectors is:

$$
\hat{\mathbf{R}} = \frac{1}{N-1} \sum_{n=0}^{N-1} \phi_c(\mathbf{x}_n) \phi_c(\mathbf{x}_n)^T
$$

We consider its eigen-decomposition:

$$
\hat{\mathbf{R}} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^T
$$

where $\mathbf{U}$ is orthogonal and $\mathbf{\Lambda}$ is diagonal with non-negative entries. The challenge is that we cannot directly compute the matrix $\hat{\mathbf{R}}$ because it requires the explicit computation of the transformed features $\phi(\mathbf{x}_n)$. Instead, we solve the eigenvalue problem using the Gram matrix $\mathbf{A}_c$.
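
A short derivation sketch shows why the Gram matrix is enough. The stacked matrix $\mathbf{\Phi}_c$ is notation introduced here for illustration only. Let $\mathbf{\Phi}_c = [\phi_c(\mathbf{x}_0), \ldots, \phi_c(\mathbf{x}_{N-1})]$ hold the centered transformed vectors as columns, so that

$$
\hat{\mathbf{R}} = \frac{1}{N-1} \mathbf{\Phi}_c \mathbf{\Phi}_c^T, \qquad \mathbf{A}_c = \mathbf{\Phi}_c^T \mathbf{\Phi}_c
$$

If $\mathbf{A}_c \mathbf{a} = \lambda \mathbf{a}$ with $\lambda > 0$ and $\mathbf{a}^T \mathbf{a} = 1$, then

$$
\hat{\mathbf{R}} (\mathbf{\Phi}_c \mathbf{a}) = \frac{1}{N-1} \mathbf{\Phi}_c \mathbf{A}_c \mathbf{a} = \frac{\lambda}{N-1} \mathbf{\Phi}_c \mathbf{a}, \qquad \|\mathbf{\Phi}_c \mathbf{a}\|^2 = \mathbf{a}^T \mathbf{A}_c \mathbf{a} = \lambda
$$

Hence $\mathbf{u} = \mathbf{\Phi}_c \mathbf{a} / \sqrt{\lambda}$ is a unit-norm eigenvector of $\hat{\mathbf{R}}$ with eigenvalue $\lambda / (N-1)$, and the projection of a centered vector onto it needs only entries of $\mathbf{A}_c$:

$$
\mathbf{u}^T \phi_c(\mathbf{x}_n) = \frac{1}{\sqrt{\lambda}} \sum_{m=0}^{N-1} a(m) [\mathbf{A}_c]_{m,n}
$$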



## Kernel PCA Algorithm

1. **Compute the Gram matrix**:
   $$ 
   [\mathbf{A}]_{n,m} = K(\mathbf{x}_n, \mathbf{x}_m)
   $$

2. **Center the Gram matrix**:
   $$
   \mathbf{A}_c = \mathbf{A} - \frac{1}{N} \mathbf{A} \mathbf{1}_N - \frac{1}{N} \mathbf{1}_N \mathbf{A} + \frac{1}{N^2} \mathbf{1}_N \mathbf{A} \mathbf{1}_N
   $$
   where $\mathbf{1}_N$ denotes the $N \times N$ matrix whose entries are all equal to one.

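3. **Compute the eigenvalues and eigenvectors of the centered Gram matrix**:
   Solve $\mathbf{A}_c \mathbf{a}_k = \lambda_k \mathbf{a}_k$ for unit-norm eigenvectors $\mathbf{a}_k$, and keep the $M'$ eigenvectors with the largest eigenvalues $\lambda_k > 0$.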

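4. **Construct the reduced features**:
   For each feature vector $\mathbf{x}_n$, compute the $k$-th reduced feature from the unit-norm eigenvector $\mathbf{a}_k$ and its eigenvalue $\lambda_k$:
   $$
   \phi'_k(\mathbf{x}_n) = \frac{1}{\sqrt{\lambda_k}} \sum_{m=0}^{N-1} a_k(m) K_c(\mathbf{x}_n, \mathbf{x}_m)
   $$
   where $a_k(m)$ is the $m$-th entry of $\mathbf{a}_k$ and $K_c(\mathbf{x}_n, \mathbf{x}_m) = [\mathbf{A}_c]_{n,m}$ is the centered kernel.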
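
The four steps can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `kernel_pca_fit_transform` is hypothetical, it assumes the retained eigenvalues are strictly positive, and it is checked against ordinary PCA using a linear kernel, where both methods should agree up to the sign of each component.

```python
import numpy as np

def kernel_pca_fit_transform(A, n_components):
    """Reduced features of the training points from an N x N Gram matrix A."""
    N = A.shape[0]
    ones_over_N = np.full((N, N), 1.0 / N)  # (1/N) times the all-ones matrix
    # Step 2: center the Gram matrix
    A_c = A - A @ ones_over_N - ones_over_N @ A + ones_over_N @ A @ ones_over_N
    # Step 3: eigh returns unit-norm eigenvectors, eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(A_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, a = eigvals[order], eigvecs[:, order]
    # Step 4: (1 / sqrt(lam_k)) * sum_m a_k(m) [A_c]_{n,m}
    return (A_c @ a) / np.sqrt(lam)

rng = np.random.default_rng(2)
X_demo = rng.normal(size=(10, 3))

# Step 1 with a linear kernel, so the result can be compared with plain PCA
Z = kernel_pca_fit_transform(X_demo @ X_demo.T, n_components=2)

# Ordinary PCA scores via the SVD of the centered data
X_centered = X_demo - X_demo.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
scores = X_centered @ Vt[:2].T

print("matches PCA up to sign:", np.allclose(np.abs(Z), np.abs(scores)))
```

Because $\mathbf{A}_c \mathbf{a}_k = \lambda_k \mathbf{a}_k$, the reduced features of the training points simplify to $\sqrt{\lambda_k}\, a_k(n)$; the explicit sum is kept here to mirror step 4.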



## Conclusion

Kernel PCA provides a powerful method for nonlinear dimensionality reduction by leveraging kernel functions. This approach allows us to perform PCA in a high-dimensional feature space without explicitly computing the transformed features, enabling us to capture complex patterns and structures in the data.

Next, we apply Kernel PCA in Python, using scikit-learn's `KernelPCA` as a feature-extraction step ahead of a logistic regression classifier on the `satimages` dataset fetched from OpenML.

In [1]:
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.decomposition import KernelPCA

# Fetch the dataset (OpenML data_id=182) as a pandas DataFrame
satimages = fetch_openml(data_id=182, as_frame=True)
X, y = satimages.data, satimages.target

# Display dataset information (info() prints directly and returns None)
X.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6430 entries, 0 to 6429
Data columns (total 36 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Aattr    6430 non-null   float64
 1   Battr    6430 non-null   float64
 2   Cattr    6430 non-null   float64
 3   Dattr    6430 non-null   float64
 4   Eattr    6430 non-null   float64
 5   Fattr    6430 non-null   float64
 6   A1attr   6430 non-null   float64
 7   B2attr   6430 non-null   float64
 8   C3attr   6430 non-null   float64
 9   D4attr   6430 non-null   float64
 10  E5attr   6430 non-null   float64
 11  F6attr   6430 non-null   float64
 12  A7attr   6430 non-null   float64
 13  B8attr   6430 non-null   float64
 14  C9attr   6430 non-null   float64
 15  D10attr  6430 non-null   float64
 16  E11attr  6430 non-null   float64
 17  F12attr  6430 non-null   float64
 18  A13attr  6430 non-null   float64
 19  B14attr  6430 non-null   float64
 20  C15attr  6430 non-null   float64
 21  D16attr  6430 

In [9]:

# Preprocessing: standardize numeric columns, one-hot encode any object columns
# (the columns listed above are float64, so the 'cat' branch may receive no columns)
numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(), categorical_features)
    ])

# Apply preprocessing (note: fitted on the full dataset, before the train/test split)
X_preprocessed = preprocessor.fit_transform(X)

# Encode the class labels as integers 0..n_classes-1
label_mapping = {label: i for i, label in enumerate(np.unique(y))}
y_numeric = np.array([label_mapping[label] for label in y])

# Split the data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_preprocessed, y_numeric, test_size=0.2, random_state=42)

# Pipeline with KernelPCA and Logistic Regression
# Note: the data are already standardized above, so this scaler is largely redundant.
# n_components=5000 is close to the 5144 training samples, so this step expands the
# 36 input features rather than reducing them; gamma=None defaults to 1 / n_features.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('kpca', KernelPCA(n_components=5000, kernel='rbf', gamma=None)),
    ('classifier', LogisticRegression(max_iter=1000))
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))


Accuracy: 0.8950233281493002
Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.99      0.97       289
           1       0.98      0.97      0.98       160
           2       0.91      0.94      0.93       270
           3       0.70      0.58      0.63       139
           4       0.91      0.82      0.86       136
           5       0.84      0.90      0.87       292

    accuracy                           0.90      1286
   macro avg       0.88      0.87      0.87      1286
weighted avg       0.89      0.90      0.89      1286
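
Predicting on held-out points requires projecting vectors that were not part of the training set. The sketch below shows one way this can be done with the centering formula from earlier, using training-set statistics to center the kernel between new and training points. It is an illustration under stated assumptions, not a description of scikit-learn's internals: the function names and the dictionary keys are hypothetical, and a linear kernel is used so the result can be checked against ordinary PCA.

```python
import numpy as np

def fit_kernel_pca(A_train, n_components):
    """Store the eigenpairs and the means needed to center new kernel rows."""
    col_mean = A_train.mean(axis=0)   # (1/N) sum_k K(x_k, x_m), one value per m
    total_mean = A_train.mean()       # (1/N^2) sum_k sum_j K(x_k, x_j)
    A_c = A_train - col_mean[None, :] - col_mean[:, None] + total_mean
    eigvals, eigvecs = np.linalg.eigh(A_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    return {"col_mean": col_mean, "total_mean": total_mean,
            "lam": eigvals[order], "a": eigvecs[:, order]}

def transform_new_points(K_new, model):
    """K_new[i, m] = K(x_new_i, x_train_m); returns the reduced features."""
    K_new_c = (K_new
               - K_new.mean(axis=1, keepdims=True)  # (1/N) sum_k K(x_new, x_k)
               - model["col_mean"][None, :]
               + model["total_mean"])
    return (K_new_c @ model["a"]) / np.sqrt(model["lam"])

rng = np.random.default_rng(3)
X_tr = rng.normal(size=(12, 3))
X_new = rng.normal(size=(4, 3))

model = fit_kernel_pca(X_tr @ X_tr.T, n_components=2)  # linear kernel
Z_new = transform_new_points(X_new @ X_tr.T, model)

# Reference: project the new points with ordinary PCA fitted on the training set
mean_tr = X_tr.mean(axis=0)
_, _, Vt = np.linalg.svd(X_tr - mean_tr, full_matrices=False)
scores_new = (X_new - mean_tr) @ Vt[:2].T

print("matches PCA up to sign:", np.allclose(np.abs(Z_new), np.abs(scores_new)))
```

Only kernel evaluations between new and training points are needed, so the cost of transforming a new point grows with the number of training samples $N$.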

