In [1]:
import pandas as pd
from sklearn.decomposition import KernelPCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

Load and pre-process the dataset.

In [2]:
NUM_FEATURES = 224
TRAIN_DATA = '../datasets/datasetTV.csv'

train = pd.read_csv(TRAIN_DATA, header=None)

feature_columns = [f'feature_{i+1}' for i in range(NUM_FEATURES)]
train.columns = feature_columns + ['label']

X_train = train[feature_columns]
y_train = train['label']

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

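A kernel function computes inner products in an implicit, typically higher-dimensional, feature space, where classes that overlap in the original space may become linearly separable. We will test the RBF (Gaussian), polynomial, sigmoid, and cosine kernels to evaluate whether our data becomes linearly separable after such a mapping. For each kernel, we will test several values of $\gamma$, the kernel coefficient that controls how far the influence of an individual data point extends. Note that scikit-learn's `KernelPCA` uses $\gamma$ only for the RBF, polynomial, and sigmoid kernels and ignores it for the cosine kernel.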
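To make the role of $\gamma$ concrete, the following minimal sketch evaluates the RBF kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$ on two made-up points for the same $\gamma$ grid used in this notebook. The points and the `similarities` dictionary are illustrative assumptions and are not part of the dataset or the analysis.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two illustrative points whose squared Euclidean distance is 2 (made-up data).
points = np.array([[0.0, 0.0], [1.0, 1.0]])

similarities = {}
for gamma in [0.001, 0.01, 0.1, 1, 10]:
    # rbf_kernel computes exp(-gamma * ||x - y||^2) for every pair of rows.
    K = rbf_kernel(points, gamma=gamma)
    similarities[gamma] = K[0, 1]
    print(f"gamma={gamma}: k(x, y) = {K[0, 1]:.4f}")
```

A small $\gamma$ makes the two points look almost identical to the kernel, while a large $\gamma$ drives their similarity toward zero, so each point only influences its immediate neighbourhood.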

In [3]:
kernels = ['rbf', 'poly', 'sigmoid', 'cosine']
gamma_values = [0.001, 0.01, 0.1, 1, 10]

To check whether the kernel-transformed data is linearly separable, we will use the **silhouette score**, computed with the true class labels, as a proxy.
The silhouette score quantifies how well a data point fits into its own class compared with the nearest other class and ranges from -1 to 1:
- 1: The point is far closer to its own class than to any other (compact, well-separated classes, which suggests linear separability).
- 0: The point lies on the boundary between classes.
- -1: The point is closer to another class than to its own (overlapping classes, which suggests the data is not linearly separable).
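As a rough illustration of these ranges, the sketch below computes the silhouette score on synthetic data: once with labels that match two well-separated groups, and once with the same labels randomly shuffled. The blob centres, sample count, and variable names are assumptions chosen for the example and have no connection to the dataset used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: two tight groups placed far apart (illustrative only).
X_demo, y_demo = make_blobs(
    n_samples=200,
    centers=[[-10, -10], [10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# Labels that match the groups: each point sits close to its own class.
score_true = silhouette_score(X_demo, y_demo)

# Shuffled labels: every "class" is now spread over both groups.
rng = np.random.default_rng(0)
y_shuffled = rng.permutation(y_demo)
score_shuffled = silhouette_score(X_demo, y_shuffled)

print(f"Matching labels: {score_true:.2f}")
print(f"Shuffled labels: {score_shuffled:.2f}")
```

The first score should land close to 1 and the second close to 0, which is the contrast the list above describes.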

In [4]:
score = silhouette_score(X_train_scaled, y_train)
print(f"Original Dataset: Silhouette Score: {score:.2f}")

for kernel in kernels:
    print(f"\nTesting Kernel: {kernel}")
    for gamma in gamma_values:
        try:
            kpca = KernelPCA(n_components=50, kernel=kernel, gamma=gamma, random_state=42)
            X_kpca = kpca.fit_transform(X_train_scaled)
            
            # Calculate silhouette score
            score = silhouette_score(X_kpca, y_train)
            print(f"Kernel: {kernel}, Gamma: {gamma}, Silhouette Score: {score:.2f}")
            
        except Exception as e:
            print(f"Kernel: {kernel}, Gamma: {gamma} - Failed with error: {e}")

Original Dataset: Silhouette Score: 0.01

Testing Kernel: rbf
Kernel: rbf, Gamma: 0.001, Silhouette Score: 0.03
Kernel: rbf, Gamma: 0.01, Silhouette Score: 0.03
Kernel: rbf, Gamma: 0.1, Silhouette Score: -0.37
Kernel: rbf, Gamma: 1, Silhouette Score: -0.03
Kernel: rbf, Gamma: 10, Silhouette Score: -0.03

Testing Kernel: poly
Kernel: poly, Gamma: 0.001, Silhouette Score: 0.03
Kernel: poly, Gamma: 0.01, Silhouette Score: -0.00
Kernel: poly, Gamma: 0.1, Silhouette Score: -0.07
Kernel: poly, Gamma: 1, Silhouette Score: -0.09
Kernel: poly, Gamma: 10, Silhouette Score: -0.09

Testing Kernel: sigmoid
Kernel: sigmoid, Gamma: 0.001, Silhouette Score: 0.03
Kernel: sigmoid, Gamma: 0.01, Silhouette Score: 0.03
Kernel: sigmoid, Gamma: 0.1, Silhouette Score: 0.03
Kernel: sigmoid, Gamma: 1, Silhouette Score: 0.03
Kernel: sigmoid, Gamma: 10, Silhouette Score: 0.04

Testing Kernel: cosine
Kernel: cosine, Gamma: 0.001, Silhouette Score: 0.04
Kernel: cosine, Gamma: 0.01, Silhouette Score: 0.04
Kernel: co

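As the results above show, the silhouette scores of the original and the transformed data all lie between -0.37 and 0.04. Scores this close to zero or below mean that the classes overlap heavily under every kernel and $\gamma$ value tested, which indicates that our data is not linearly separable.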
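The silhouette score is only an indirect indicator of linear separability. A more direct check, which this notebook does not run, would be to fit a linear classifier and inspect its training accuracy: an accuracy of 1.0 means a separating hyperplane exists. The sketch below shows the idea on synthetic data only; the helper `linear_training_accuracy`, the choice of `LogisticRegression`, and all of its settings are assumptions made for illustration.

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.linear_model import LogisticRegression


def linear_training_accuracy(X, y):
    """Fit a weakly regularized linear classifier and return its training accuracy."""
    # A large C keeps regularization weak, so the model is free to
    # find a separating hyperplane if one exists.
    clf = LogisticRegression(C=1e4, max_iter=1000)
    clf.fit(X, y)
    return clf.score(X, y)


# Linearly separable toy data: two groups placed far apart.
X_blobs, y_blobs = make_blobs(
    n_samples=200, centers=[[-10, -10], [10, 10]], cluster_std=1.0, random_state=0
)

# Toy data that no straight line can separate: one circle inside another.
X_circles, y_circles = make_circles(
    n_samples=200, noise=0.05, factor=0.5, random_state=0
)

acc_blobs = linear_training_accuracy(X_blobs, y_blobs)
acc_circles = linear_training_accuracy(X_circles, y_circles)

print(f"Blobs:   training accuracy = {acc_blobs:.2f}")
print(f"Circles: training accuracy = {acc_circles:.2f}")
```

Applied to `X_train_scaled` or to a `KernelPCA` output such as `X_kpca`, the same helper would give a complementary view to the silhouette scores reported above.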