# CIFAR-10 Classifier Documentation Using SVM

Author: Filip Gębala

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
from sklearn import svm
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

## SVM Model
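Support Vector Machine (SVM) is a supervised classification method that finds a separating hyperplane in feature space. Among all hyperplanes that separate the classes, it chooses the one that maximizes the margin, i.e. the distance between the hyperplane and the nearest training points of each class (the support vectors). A wide margin tends to generalize better, and the soft-margin formulation, controlled by the parameter C, tolerates some misclassified or noisy points instead of fitting them exactly.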
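
To make the margin concrete, the following sketch fits a linear SVM to a tiny, hand-made 2D dataset (an assumption for illustration only; the notebook itself uses an RBF kernel on CIFAR-10 features). A very large C approximates a hard margin, the margin width is 2 / ||w||, and the support vectors lie on the margin boundaries, where the decision function is approximately -1 or +1.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two linearly separable groups in 2D.
X = np.array([[0, 0], [1, 0], [0, 1],
              [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (margin violations are heavily penalized).
clf = svm.SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                # normal vector of the separating hyperplane
b = clf.intercept_[0]           # offset of the hyperplane
margin = 2 / np.linalg.norm(w)  # distance between the two margin boundaries

print("w =", w, "b =", b)
print("margin width =", margin)
print("support vectors:\n", clf.support_vectors_)
# Support vectors sit on the margin boundaries: decision values close to -1 or +1.
print("decision values:", clf.decision_function(clf.support_vectors_))
```

For this data the closest points of the two groups are (1, 0), (0, 1) and (3, 3), so the margin should be the distance between the lines x1 + x2 = 1 and x1 + x2 = 6, i.e. 5 / √2 ≈ 3.54.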

For CIFAR-10, each RGB image is converted into a tensor and flattened into a feature vector of 32 × 32 × 3 = 3072 values. Because of this high dimensionality, the data is standardized and then reduced with Principal Component Analysis (PCA) before classification. PCA projects the data onto 300 dimensions, which speeds up training, discards low-variance directions that are likely to carry mostly noise, and produces mutually uncorrelated features.
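
The flattening step can be illustrated with NumPy. The sketch assumes a batch in PyTorch's (N, C, H, W) layout; calling `images.view(N, -1)` on such a contiguous tensor produces the same row-major ordering as NumPy's `reshape`, so pixel (c, h, w) ends up at feature index c·1024 + h·32 + w. The random batch and the helper `feature_index` are placeholders for illustration, not CIFAR-10 data or part of the notebook.

```python
import numpy as np

N, C, H, W = 4, 3, 32, 32  # CIFAR-10 image shape: 3 channels, 32x32 pixels
rng = np.random.default_rng(0)
images = rng.random((N, C, H, W), dtype=np.float32)  # placeholder batch, values in [0, 1)

# Flatten each image into one feature vector (row-major / C order).
X = images.reshape(N, -1)
print(X.shape)  # (4, 3072)

def feature_index(c, h, w):
    """Hypothetical helper: position of pixel (c, h, w) in the flattened vector."""
    return c * H * W + h * W + w

# Channel 1 (green), row 5, column 7 of the first image:
assert X[0, feature_index(1, 5, 7)] == images[0, 1, 5, 7]
```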

The SVM classifier uses the Radial Basis Function (RBF) kernel, which allows it to model non-linear decision boundaries. The model is trained with the regularization parameter C=10, larger than scikit-learn's default of 1, which favors fitting the training data closely over a wider margin. Gamma is set to "scale", which derives the kernel width from the number of features and the variance of the training data instead of requiring it to be tuned by hand. Training uses the full CIFAR-10 training set of 50,000 images, and testing is performed on a random sample of 1,000 images from the CIFAR-10 test set.
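
For reference, the standard soft-margin problem that an SVM solves, the RBF kernel, and the value that `gamma="scale"` resolves to (as documented by scikit-learn) can be written as follows, with labels $y_i \in \{-1, +1\}$ for a single binary sub-problem and $d$ the number of input features (300 after PCA):

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^\top \phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0
$$

$$
K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2\right),
\qquad
\gamma_{\text{scale}} = \frac{1}{d \cdot \operatorname{Var}(X)}
$$

Here $\operatorname{Var}(X)$ is the variance of all entries of the training matrix. A larger $C$ penalizes the slack variables $\xi_i$ more heavily (narrower margin, closer fit to the training data), and a larger $\gamma$ makes the kernel more local.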

Thanks to standardization and dimensionality reduction, the classifier reaches reasonable accuracy for a model without learned feature extraction such as the convolutional layers of a CNN. An SVM of this kind works well as a baseline classifier or as a component of a larger classification system, especially when combined with dimensionality reduction and other preprocessing.

## Data Preparation
As with the CNN models, the data is loaded using torchvision. However, no augmentation or normalization is applied, because every feature is standardized later with StandardScaler. The full CIFAR-10 training set contains 50,000 images; `get_data` can also draw a random subset (e.g., 10,000 images) to shorten SVM training time. Each image is flattened into a vector that serves as input to the SVM.

In [None]:
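# === TRANSFORMATION ===
# Only convert images to float tensors in [0, 1]. Per-feature standardization is
# done later with StandardScaler, so no Normalize step is needed here.
transform = transforms.Compose([
    transforms.ToTensor(),
])

# === LOAD DATASET ===
trainset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Draw a random sample of n_samples images as one batch and flatten each image.
# If n_samples >= len(dataset), the batch simply contains the whole dataset.
def get_data(dataset, n_samples):
    loader = torch.utils.data.DataLoader(dataset, batch_size=n_samples, shuffle=True)
    images, labels = next(iter(loader))       # images: (N, 3, 32, 32)
    images = images.view(images.size(0), -1)  # flattened: (N, 3072)
    return images.numpy(), labels.numpy()

# Full training set (50,000 images, roughly 0.6 GB as float32).
# Pass a smaller number, e.g. 10_000, to shorten SVM training.
X_train, y_train = get_data(trainset, len(trainset))
X_test, y_test = get_data(testset, 1000)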
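
`get_data` draws a purely random sample, so with a small subset the class proportions can drift slightly. If a balanced subset is preferred, one option (not used in this notebook) is a stratified split of the indices with scikit-learn's `train_test_split`. The helper name `stratified_subset_indices` is hypothetical, and the labels below are synthetic stand-ins for CIFAR-10's 50,000 training labels (5,000 per class).

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_subset_indices(labels, n_samples, seed=0):
    """Hypothetical helper: pick n_samples indices with the same class
    proportions as `labels` (a 1-D array of integer class ids)."""
    indices = np.arange(len(labels))
    subset, _ = train_test_split(indices, train_size=n_samples,
                                 stratify=labels, random_state=seed)
    return subset

# Synthetic labels mimicking CIFAR-10's training set: 10 classes x 5,000 images.
labels = np.repeat(np.arange(10), 5000)
idx = stratified_subset_indices(labels, 10_000)
print(np.bincount(labels[idx]))  # 1,000 images per class
```

With torchvision, the labels are available as `trainset.targets`, and the resulting indices could be applied through `torch.utils.data.Subset(trainset, idx)` before passing the subset to `get_data`.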

## Preprocessing: Standardization and PCA
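Standardizing the data matters for an RBF-kernel SVM: the kernel is based on Euclidean distances, so features with larger variance would otherwise dominate. Standardization also gives every pixel feature equal weight in PCA, which is variance-based. Both the scaler and PCA are fitted on the training data only and then applied unchanged to the test data. PCA reduces the dimensionality from 3072 to 300, which shortens training and discards low-variance directions that are likely to carry mostly noise.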

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# === PCA ===
pca = PCA(n_components=300)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
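
A useful property of the PCA step is that its outputs are uncorrelated. The sketch below checks this on synthetic, correlated 3072-dimensional data (an assumption standing in for flattened CIFAR-10 images) by standardizing, projecting onto 50 components and inspecting the covariance matrix of the result. `svd_solver="full"` is chosen so that the check is exact; on the full dataset scikit-learn may pick a randomized solver, whose components are only approximations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: 20 latent factors mixed into 3072 correlated features, plus noise.
latent = rng.normal(size=(400, 20))
mixing = rng.normal(size=(20, 3072))
X = latent @ mixing + 0.1 * rng.normal(size=(400, 3072))

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=50, svd_solver="full")
Z = pca.fit_transform(X_std)

cov = np.cov(Z, rowvar=False)              # 50 x 50 covariance of the PCA outputs
off_diag = cov - np.diag(np.diag(cov))     # keep only the off-diagonal entries
print("largest |off-diagonal covariance|:", np.abs(off_diag).max())
print("variance retained by 50 components:", pca.explained_variance_ratio_.sum())
```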

## Training the SVM Model
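The SVM is trained with the RBF (Gaussian) kernel. C=10 sets the penalty for margin violations (a larger C fits the training data more closely at the cost of a narrower margin), and gamma='scale' sets the RBF kernel coefficient to 1 / (n_features · X.var()), computed from the training data. For multi-class problems, scikit-learn's SVC trains one binary classifier per pair of classes (one-vs-one), i.e. 45 classifiers for the 10 CIFAR-10 classes. Training time grows faster than linearly with the number of samples, so depending on the training-set size and the hardware it can take from a few minutes to considerably longer.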

In [None]:
# === TRAINING ===
clf = svm.SVC(kernel='rbf', C=10, gamma='scale')
print("Training...")
clf.fit(X_train_pca, y_train)
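
The effect of `gamma='scale'` can be checked directly, since scikit-learn documents it as `1 / (n_features * X.var())`. The sketch below uses a synthetic dataset from `make_classification` (an assumption, not CIFAR-10), computes that value by hand and confirms that an SVC with the explicit gamma produces the same decision function.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# Value that gamma='scale' resolves to, computed from the training data.
gamma_manual = 1.0 / (X.shape[1] * X.var())

clf_scale = svm.SVC(kernel="rbf", C=10, gamma="scale").fit(X, y)
clf_manual = svm.SVC(kernel="rbf", C=10, gamma=gamma_manual).fit(X, y)

print("gamma =", gamma_manual)
same = np.allclose(clf_scale.decision_function(X), clf_manual.decision_function(X))
print("identical decision functions:", same)
```

Applied to the notebook, the classifier fitted on `X_train_pca` therefore uses gamma = `1 / (300 * X_train_pca.var())`.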

## Testing the Model

In [None]:
# === TESTING ===
y_pred = clf.predict(X_test_pca)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy * 100:.2f}%")
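
Because the test sample here has only 1,000 images, the measured accuracy carries noticeable sampling uncertainty. A rough 95% confidence interval can be obtained with the normal approximation to the binomial distribution. The sketch below is an addition, not part of the original evaluation; the helper name is hypothetical, and it plugs in the ~55% accuracy reported in the summary.

```python
import math

def accuracy_confidence_interval(accuracy, n, z=1.96):
    """Hypothetical helper: normal-approximation confidence interval for an
    accuracy measured on n samples (z=1.96 gives roughly 95% coverage)."""
    std_err = math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - z * std_err, accuracy + z * std_err

low, high = accuracy_confidence_interval(0.55, 1000)
print(f"95% CI: {low * 100:.1f}% to {high * 100:.1f}%")  # 95% CI: 51.9% to 58.1%
```

With the notebook's variables it would be called as `accuracy_confidence_interval(accuracy, len(y_test))`. An interval of about ±3 percentage points means that differences of a point or two between runs (each run draws a new random test sample) are within the expected variation.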

## Plots

1. Explained Variance by PCA
<br>
To check how much of the total variance is preserved by the 300 retained components, we plot the cumulative explained variance ratio of the fitted PCA:

In [None]:
plt.figure(figsize=(10, 5))
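components = np.arange(1, len(pca.explained_variance_ratio_) + 1)  # 1-based component count for the x-axis
plt.plot(components, np.cumsum(pca.explained_variance_ratio_), marker='o', markersize=3)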
plt.title("Cumulative Explained Variance by PCA")
plt.xlabel("Number of Components")
plt.ylabel("Cumulative Variance")
plt.grid(True)
plt.tight_layout()
plt.show()
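
The same cumulative curve can be used to read off how many components are needed to retain a given fraction of the variance. The helper below is a hypothetical sketch with made-up example ratios; with the notebook's fitted model it would be called as `components_for_variance(pca.explained_variance_ratio_, 0.9)`.

```python
import numpy as np

def components_for_variance(ratios, threshold):
    """Hypothetical helper: smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold`; None if all given components fall short."""
    cumulative = np.cumsum(ratios)
    if cumulative[-1] < threshold:
        return None
    # searchsorted returns the first index where cumulative >= threshold.
    return int(np.searchsorted(cumulative, threshold)) + 1

ratios = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])  # made-up example ratios
print(components_for_variance(ratios, 0.85))  # 4
print(components_for_variance(ratios, 0.6))   # 2
```

As an alternative, scikit-learn's `PCA` accepts a float between 0 and 1 as `n_components` (together with `svd_solver="full"`) and then selects the number of components for that variance fraction itself.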

2. PCA for Visualization in 2D
<br>
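Projecting the training data onto the first two principal components gives an intuitive picture of how well the classes separate along the two directions of highest variance. PCA is unsupervised and ignores the labels, so considerable overlap between classes in this plot is to be expected: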

In [None]:
pca_2d = PCA(n_components=2)
X_vis = pca_2d.fit_transform(X_train)

plt.figure(figsize=(10, 6))
scatter = plt.scatter(X_vis[:, 0], X_vis[:, 1], c=y_train, cmap='tab10', alpha=0.5, s=10)
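# num=None keeps one legend entry per class (the default 'auto' thins out legends with more than 9 values).
handles, _ = scatter.legend_elements(num=None)
plt.legend(handles, trainset.classes, title="Classes")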
plt.title("Training Data after PCA (2 Components)")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.tight_layout()
plt.show()
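
Fitting a separate 2-component PCA is not strictly necessary: with an exact (full) SVD solver, the first two components of a larger PCA are the same directions. The sketch below checks this on synthetic data (an assumption). With the randomized solver that scikit-learn may pick for the full CIFAR-10 matrix, the two projections should agree only approximately, so `X_train_pca[:, :2]` would be a cheaper, approximately equivalent input for the scatter plot.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Synthetic correlated data standing in for standardized image features.
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 100)) + 0.1 * rng.normal(size=(500, 100))

Z2 = PCA(n_components=2, svd_solver="full").fit_transform(X)
Z10 = PCA(n_components=10, svd_solver="full").fit_transform(X)

# Compare absolute values, since the sign of a principal component is arbitrary.
print(np.allclose(np.abs(Z2), np.abs(Z10[:, :2])))
```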

3. Confusion Matrix 
<br>
The confusion matrix shows, for each true class, how its test images were distributed over the predicted classes, revealing which classes the model confuses most often:

In [None]:
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=trainset.classes, yticklabels=trainset.classes)
plt.xlabel("Prediction")
plt.ylabel("True Label")
plt.title("Confusion Matrix - SVM on CIFAR-10")
plt.tight_layout()
plt.show()
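
The confusion matrix can also be summarized numerically: the diagonal divided by the row sums gives the per-class recall (the fraction of each true class predicted correctly), and the largest off-diagonal entry identifies the most frequent confusion. The tiny label arrays below are made up for illustration; in the notebook the same lines would run on `cm`, with `trainset.classes` providing the class names.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for three classes (illustration only).
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])
cm = confusion_matrix(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
per_class_recall = np.diag(cm) / cm.sum(axis=1)
print("per-class recall:", per_class_recall)

# Most frequent confusion: largest entry outside the diagonal.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_cls, pred_cls = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(f"most confused: true {true_cls} -> predicted {pred_cls} ({off_diag[true_cls, pred_cls]} times)")
```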

## Results and Summary
SVM can achieve reasonable results without deep learning, especially with appropriate dimensionality reduction (PCA) and well-tuned hyperparameters. With far fewer parameters to fit than a CNN, an SVM can be less prone to overfitting, but its effectiveness is limited because it works on fixed input features instead of learning hierarchical, multi-layer representations from the data.

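In this case, unfortunately, the SVM did not perform well, reaching an accuracy of about 55% on the 1,000 test images (chance level for 10 classes is 10%). The main limitation is most likely the input representation: the model compares images through Euclidean distances between (PCA-compressed) raw pixel vectors, and these distances change substantially under small shifts, changes in lighting, or different backgrounds, even when the depicted object is the same. Without features that are invariant to such changes, such as those learned by CNNs, a kernel on raw pixels struggles to separate the classes.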