## 1. Set Data For Training - Breast Cancer Dataset

- Dataset: Breast Cancer Wisconsin (Diagnostic)
- Classes: 2 (Malignant=0, Benign=1)
- Original features: 30 (cell nucleus characteristics)
- Reduced features: 2-8 principal components via PCA (tuned; 8 used initially)
- Random seed: 42
- Scaling: MinMaxScaler to [-π, π] for quantum compatibility (fit on the training split, applied before PCA)
- **Data Split: 60% Train / 20% Validation / 20% Test**
- **Step-by-step hyperparameter tuning using validation set**
- SVM Pipeline with preprocessing and systematic parameter optimization
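
The tuning step listed above can be sketched as a scikit-learn `Pipeline` whose PCA size is chosen on the validation split. This is a minimal sketch, not the notebook's actual tuning code: the default RBF `SVC` and the choice to tune only `n_components` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# 60/20/20 stratified split with seed 42, as listed in the bullets
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

val_scores = {}
for n in range(2, 9):  # candidate PCA sizes 2..8
    pipe = Pipeline([
        ("scale", MinMaxScaler(feature_range=(-np.pi, np.pi))),
        ("pca", PCA(n_components=n, random_state=42)),
        ("svm", SVC(kernel="rbf")),  # assumption: default RBF SVC
    ])
    pipe.fit(X_train, y_train)                # fit on the training split only
    val_scores[n] = pipe.score(X_val, y_val)  # select on the validation split

best_n = max(val_scores, key=val_scores.get)
print(val_scores)
print(f"Best n_components on validation: {best_n}")
```

The test split is deliberately left untouched here; it would only be scored once, after `best_n` is fixed.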

In [40]:
from sklearn import datasets
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt


# Load the Breast Cancer Wisconsin (Diagnostic) dataset
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

print(f"Original X shape: {X.shape}")

# First split: 60% train, 40% temp (stratified to preserve class balance)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

# Second split: 20% validation, 20% test (from the 40% temp)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)


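# STEP 1: Scale first (features have very different ranges).
# The scaler is fit on the training split only, so no validation/test
# statistics leak into preprocessing.
scaler = MinMaxScaler(feature_range=(-np.pi, np.pi))
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

# STEP 2: Apply PCA for dimensionality reduction (30 → 8 features initially).
# Note: PCA centers and rotates the data, so its output is not guaranteed
# to stay within [-π, π].
n_components = 8
pca = PCA(n_components=n_components, random_state=42)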
X_train_pca = pca.fit_transform(X_train_scaled)
X_val_pca = pca.transform(X_val_scaled)
X_test_pca = pca.transform(X_test_scaled)


print("Final shapes:")
print(f"  X_train: {X_train_pca.shape}")
print(f"  X_val: {X_val_pca.shape}")
print(f"  X_test: {X_test_pca.shape}")

Original X shape: (569, 30)
Final shapes:
  X_train: (341, 8)
  X_val: (114, 8)
  X_test: (114, 8)
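
Because the scaler runs before PCA, the reduced features are not necessarily confined to [-π, π]. The sketch below checks the actual range and shows one possible remedy: a second `MinMaxScaler` applied after PCA. The second scaler (`post_scaler`) and the `*_angles` names are hypothetical additions, not part of the original pipeline.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

# Same order as the cell above: scale to [-pi, pi], then PCA
scaler = MinMaxScaler(feature_range=(-np.pi, np.pi))
pca = PCA(n_components=8, random_state=42)
X_train_pca = pca.fit_transform(scaler.fit_transform(X_train))
X_temp_pca = pca.transform(scaler.transform(X_temp))

print("PCA output range (train):", X_train_pca.min(), X_train_pca.max())
print("Explained variance kept:", pca.explained_variance_ratio_.sum())

# Hypothetical extra step: rescale the PCA output back into [-pi, pi].
# clip=True keeps held-out points inside the range as well.
post_scaler = MinMaxScaler(feature_range=(-np.pi, np.pi), clip=True)
X_train_angles = post_scaler.fit_transform(X_train_pca)
X_temp_angles = post_scaler.transform(X_temp_pca)
print("Rescaled range (train):", X_train_angles.min(), X_train_angles.max())
```

Whether this extra step helps depends on how the downstream quantum feature map consumes the values; it changes the data and therefore any reported accuracies.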


In [41]:
np.savez_compressed(
    "../data/cancer_8features_pi_data.npz",
    X_train=X_train_pca,
    y_train=y_train,
    X_val=X_val_pca,
    y_val=y_val,
    X_test=X_test_pca,
    y_test=y_test
)

In [42]:
data = np.load("../data/cancer_8features_pi_data.npz")

X_train = data['X_train']
y_train = data['y_train']
X_val = data['X_val']
y_val = data['y_val']
X_test = data['X_test']
y_test = data['y_test']
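
The save and load cells assume that `../data/` already exists. A minimal sketch of a safer round trip follows: it creates the output directory first and confirms that the arrays survive unchanged. It uses a temporary directory and random placeholder arrays (`X_demo`, `y_demo`, `demo_split.npz` are all hypothetical names) so that it does not touch the project's files.

```python
import tempfile
from pathlib import Path

import numpy as np

# Placeholder arrays standing in for one of the PCA-reduced splits
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(10, 8))
y_demo = rng.integers(0, 2, size=10)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "data"
    out_dir.mkdir(parents=True, exist_ok=True)  # avoid a missing-directory error
    path = out_dir / "demo_split.npz"
    np.savez_compressed(path, X_train=X_demo, y_train=y_demo)

    # The context manager closes the archive once the arrays are read
    with np.load(path) as data:
        X_loaded = data["X_train"]
        y_loaded = data["y_train"]

round_trip_ok = np.array_equal(X_loaded, X_demo) and np.array_equal(y_loaded, y_demo)
print("Round trip preserved arrays:", round_trip_ok)
```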

In [43]:
import sys
import os

# Add the project root (parent of the notebook directory) so `src` is importable
sys.path.append(os.path.dirname(os.getcwd()))

In [44]:
from sklearn.metrics.pairwise import rbf_kernel
from src.utils import calculate_accuracy

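# Classical baseline: RBF kernel matrices (default gamma = 1 / n_features)
rbf_K_train = rbf_kernel(X_train)         # shape (n_train, n_train)
rbf_K_val = rbf_kernel(X_val, X_train)    # shape (n_val, n_train)
rbf_K_test = rbf_kernel(X_test, X_train)  # shape (n_test, n_train)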

classical_val_acc, classical_test_acc, _ = calculate_accuracy(
    rbf_K_train, rbf_K_val, rbf_K_test,
    y_train, y_val, y_test
)
print("Val acc | Test acc")
print(f"{classical_val_acc} | {classical_test_acc}")

Val acc | Test acc
0.9736842105263158 | 0.956140350877193
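
The implementation of `calculate_accuracy` in `src.utils` is not shown here. One plausible reading is that it trains an SVM on the precomputed training kernel and scores it on the validation and test kernels. The sketch below illustrates that idea under this assumption; `precomputed_kernel_accuracy` is a hypothetical stand-in, and its default `SVC` settings may differ from the project's helper, so the numbers it prints need not match those above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def precomputed_kernel_accuracy(K_train, K_val, K_test, y_train, y_val, y_test):
    """Hypothetical stand-in for src.utils.calculate_accuracy."""
    clf = SVC(kernel="precomputed")  # assumption: default C and settings
    clf.fit(K_train, y_train)        # K_train: (n_train, n_train)
    # Rows of K_val / K_test are held-out samples, columns are training samples
    return clf.score(K_val, y_val), clf.score(K_test, y_test), clf


# Rebuild the splits and preprocessing from the cells above
X, y = load_breast_cancer(return_X_y=True)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)
scaler = MinMaxScaler(feature_range=(-np.pi, np.pi))
pca = PCA(n_components=8, random_state=42)
X_train = pca.fit_transform(scaler.fit_transform(X_train))
X_val = pca.transform(scaler.transform(X_val))
X_test = pca.transform(scaler.transform(X_test))

K_train = rbf_kernel(X_train)
K_val = rbf_kernel(X_val, X_train)
K_test = rbf_kernel(X_test, X_train)

val_acc, test_acc, _ = precomputed_kernel_accuracy(
    K_train, K_val, K_test, y_train, y_val, y_test
)
print(f"Val acc: {val_acc:.4f} | Test acc: {test_acc:.4f}")
```

Passing kernel matrices rather than raw features keeps the evaluation code identical for classical and quantum kernels, since only the matrices change.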
