# COMP4318/5318 Assignment 2: Image Classification

### Group number: 42, SID1: 520330054, SID2: ..., SID3: ...

This template notebook includes code to load the dataset and a skeleton for the main sections that should be included in the notebook. Please stick to this structure for your submitted notebook.

Please focus on making your code clear, with appropriate variable names and whitespace. Include comments and markdown text to aid the readability of your code where relevant. Use the specification and its marking criteria to guide you when completing your implementation.

## Setup and dependencies
Please use this section to list and set up all your required libraries/dependencies and your plotting environment. 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, top_k_accuracy_score
from sklearn.decomposition import PCA
from sklearn.svm import SVC 
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, train_test_split, StratifiedKFold

## 1. Data loading, exploration, and preprocessing


Code to load the dataset is provided in the following cell. Please proceed with your data exploration and preprocessing in the remainder of this section.

In [None]:
# Load the training and test sets as NumPy arrays, assuming the
# Assignment2Data folder is in the same directory as the notebook
X_train: np.ndarray = np.load('Assignment2Data/X_train.npy')
y_train: np.ndarray = np.load('Assignment2Data/y_train.npy')
X_test: np.ndarray = np.load('Assignment2Data/X_test.npy')
y_test: np.ndarray = np.load('Assignment2Data/y_test.npy')

In [None]:
# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

In [None]:
# Flatten each image into a 1-D feature vector for the SVM (float32 halves memory vs float64)
X_train_flat = X_train.reshape(len(X_train), -1).astype(np.float32)
X_test_flat  = X_test.reshape(len(X_test), -1).astype(np.float32)
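The two cells above scale and then flatten the images. The sketch below, on a small synthetic batch with an assumed 32x32x3 image shape, shows that dividing a `uint8` array by `255.0` produces `float64`, while casting to `float32` first halves the memory. If the training set is the standard 50,000 CIFAR-10 images, that is roughly 1.2 GB versus 0.6 GB for the scaled array.

```python
import numpy as np

# Synthetic stand-in for a small batch of 32x32 RGB uint8 images (assumed shape)
rng = np.random.default_rng(0)
images_uint8 = rng.integers(0, 256, size=(4, 32, 32, 3)).astype(np.uint8)

# Dividing uint8 by a Python float promotes the result to float64 (8 bytes per value)
scaled_f64 = images_uint8 / 255.0

# Casting first keeps the result in float32 (4 bytes per value)
scaled_f32 = images_uint8.astype(np.float32) / 255.0

# Flatten each image into one row of 32 * 32 * 3 = 3072 features
flat = scaled_f32.reshape(len(scaled_f32), -1)

print(scaled_f64.dtype, scaled_f64.nbytes)  # float64 98304
print(scaled_f32.dtype, scaled_f32.nbytes)  # float32 49152
print(flat.shape)                           # (4, 3072)
```

Because `reshape` only changes the view, `flat.reshape(images_uint8.shape)` recovers the original image layout for display.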

### Examples of preprocessed data
Please print/display some examples of your preprocessed data here.

In [None]:
cifar10_label_to_text = {
    0: "airplane",
    1: "automobile",
    2: "bird", 
    3: "cat",
    4: "deer", 
    5: "dog",
    6: "frog", 
    7: "horse", 
    8: "ship", 
    9: "truck",
}

In [None]:
print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape} type: {type(y_train[0])}")
print(f"X_test shape: {X_test.shape}")
print(f"y_test shape: {y_test.shape} type: {type(y_test[0])}")

In [None]:
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(9, 3))

cmap = plt.get_cmap('viridis')

for ax, (labels_array, title) in zip(axs, [(y_train, 'training'), (y_test, 'test')]):
    labels, counts = np.unique(labels_array, return_counts=True)
    # Separate index name (k) so the colour comprehension does not shadow the loop variables
    colors = [cmap(k / len(labels)) for k in range(len(labels))]
    labels_text = [cifar10_label_to_text[label] for label in labels]

    ax.barh(labels_text, counts, color=colors)
    ax.set_title(f'Classes in {title} set')
    ax.set_xlabel('Count')

fig.supylabel('Classes')
plt.tight_layout()
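The bar charts show class balance visually. A numeric check can complement them; the sketch below uses synthetic labels and calls `ravel()` on the assumption that some loaders store labels with shape `(N, 1)` rather than `(N,)`.

```python
import numpy as np

# Synthetic labels standing in for y_train, stored with shape (N, 1)
y_example = np.array([[0], [1], [1], [2], [2], [2]])

# ravel() handles (N, 1) or (N,) labels; minlength keeps absent classes as zeros
counts = np.bincount(y_example.ravel(), minlength=10)
print(counts)  # [1 2 3 0 0 0 0 0 0 0]

# Every class has the same count only in a perfectly balanced dataset
is_balanced = counts.min() == counts.max()
print(is_balanced)  # False
```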

In [None]:
N_ROWS, N_COLS = 5, 8
fig, axs = plt.subplots(nrows=N_ROWS, ncols=N_COLS, figsize=(10, 7))

for i in range(N_ROWS):
    for j in range(N_COLS):
        idx = i * N_COLS + j  # position of this image in the training set
        axs[i][j].imshow(X_train[idx])
        axs[i][j].axis('off')
        axs[i][j].set_title(cifar10_label_to_text[y_train[idx]])

plt.show()

## 2. Algorithm design and setup

### Algorithm of choice from first six weeks of course
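The pipeline in this section standardizes the features, reduces them with `PCA(n_components=0.8)` and fits an RBF-kernel SVC. A float `n_components` between 0 and 1 tells scikit-learn to keep the smallest number of components whose cumulative explained variance ratio exceeds that fraction. The sketch below checks this on synthetic correlated data (an assumption; it does not use the image arrays) by reproducing the choice from a full decomposition.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic correlated features standing in for the flattened images
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) @ rng.normal(size=(50, 50))
X_scaled = StandardScaler().fit_transform(X)

# Fractional n_components: keep enough components to explain more than 80% of the variance
pca_fraction = PCA(n_components=0.8, svd_solver="full").fit(X_scaled)

# Reproduce the same choice by hand from the full spectrum
full = PCA(svd_solver="full").fit(X_scaled)
cumulative = np.cumsum(full.explained_variance_ratio_)
k_manual = int(np.searchsorted(cumulative, 0.8, side="right") + 1)

print(pca_fraction.n_components_, k_manual)
```

The same rule explains the tuning grid later on: `0.4` keeps far fewer components than `0.8`, which trades accuracy for faster SVC training.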

In [28]:
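# Standardize features, keep enough principal components to explain 80% of the variance, then fit an RBF-kernel SVM
svm_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.8)),
    ("svc", SVC(kernel="rbf")),
])

# SVC fit time scales at least quadratically with the number of samples, so this can be slow on the full training set
svm_pipe.fit(X_train_flat, y_train)
y_predicted = svm_pipe.predict(X_test_flat)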

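The scikit-learn `SVC` documentation notes that fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. One way to keep experiments tractable is to fit on a stratified subset of the training data. The sketch below uses the already imported `train_test_split` on synthetic stand-ins for `X_train_flat` and `y_train`; the 20% fraction and `random_state=42` are illustrative assumptions, not tuned choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X_train_flat / y_train: 10 balanced classes, 3072 features
rng = np.random.default_rng(0)
X_full = rng.normal(size=(1000, 3072)).astype(np.float32)
y_full = np.repeat(np.arange(10), 100)

# Keep a stratified 20% subset so each class keeps its original proportion
X_sub, _, y_sub, _ = train_test_split(
    X_full, y_full, train_size=0.2, stratify=y_full, random_state=42
)

print(X_sub.shape)         # (200, 3072)
print(np.bincount(y_sub))  # [20 20 20 20 20 20 20 20 20 20]
```

The subset can then be passed to `svm_pipe.fit` in place of the full arrays, while evaluation still uses the full test set.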

In [None]:
print(classification_report(y_test, y_predicted))
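`top_k_accuracy_score` is imported in the setup cell but not used. With the default `decision_function_shape="ovr"`, a multiclass `SVC` (and a `Pipeline` ending in one) returns decision scores of shape `(n_samples, n_classes)`, which `top_k_accuracy_score` accepts directly. The sketch below shows this on a small synthetic 5-class problem (an assumption standing in for the image data).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import top_k_accuracy_score
from sklearn.svm import SVC

# Small synthetic 5-class problem standing in for the flattened images
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=5, random_state=0
)

clf = SVC(kernel="rbf").fit(X[:200], y[:200])

# One score per class for each test sample
scores = clf.decision_function(X[200:])

top1 = top_k_accuracy_score(y[200:], scores, k=1, labels=clf.classes_)
top3 = top_k_accuracy_score(y[200:], scores, k=3, labels=clf.classes_)
print(f"top-1: {top1:.3f}, top-3: {top3:.3f}")
```

For the notebook's pipeline, the corresponding scores would come from `svm_pipe.decision_function(X_test_flat)`.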

### Fully connected neural network

### Convolutional neural network

## 3. Hyperparameter tuning

### Algorithm of choice from first six weeks of course

In [None]:
param_grid = {
    "pca__n_components": [0.4, 0.8],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 1e-3, 1e-2],
}
clf_pipe = GridSearchCV(
    svm_pipe,
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),  # fixed seed for reproducible folds
    return_train_score=True,
    verbose=3,
)
clf_pipe.fit(X_train_flat, y_train)
y_predicted = clf_pipe.predict(X_test_flat)
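Each grid candidate is fitted once per fold, and with the default `refit=True` the best candidate is refitted once more on all training data. The total cost can be computed before launching the search; the sketch below repeats the grid from the cell above and assumes 5 folds, the `StratifiedKFold` default.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the tuning cell above
param_grid = {
    "pca__n_components": [0.4, 0.8],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 1e-3, 1e-2],
}

n_candidates = len(ParameterGrid(param_grid))  # 2 * 3 * 3 combinations
n_splits = 5                                   # StratifiedKFold default
n_fits = n_candidates * n_splits + 1           # +1 for the final refit on all training data
print(n_candidates, n_fits)  # 18 91
```

Given how slow a single full-data SVC fit is, this count is a useful sanity check on expected runtime.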

In [None]:
print(f"Best params: {clf_pipe.best_params_}")
print(f"Best mean cross-validation accuracy: {clf_pipe.best_score_:.4f}")
print(classification_report(y_test, y_predicted))
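Because the search uses `return_train_score=True`, `cv_results_` stores mean training and validation scores for every candidate, and a large gap between them points to overfitting. The sketch below shows how to read them on a small synthetic problem; PCA is left out of this toy pipeline purely to keep it fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic 3-class problem standing in for the image data
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_classes=3, random_state=0
)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
search = GridSearchCV(
    pipe,
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    return_train_score=True,
)
search.fit(X, y)

results = search.cv_results_
for params, train, val in zip(
    results["params"], results["mean_train_score"], results["mean_test_score"]
):
    # Mean accuracy over the 5 folds; a large train-validation gap suggests overfitting
    print(f"{params}  train={train:.3f}  val={val:.3f}  gap={train - val:.3f}")
```

`best_score_` is simply the highest `mean_test_score`, so it reports validation accuracy only and says nothing about the train-validation gap.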

### Fully connected neural network

### Convolutional neural network

## 4. Final models
In this section, please include cells that train each model with its best hyperparameter combination independently of the hyperparameter tuning cells, i.e. don't rely on the hyperparameter tuning cells having been run.

### Algorithm of choice from first six weeks of course

In [None]:
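# These settings currently match the Section 2 baseline; update them to the best combination found in Section 3
final_svm_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.8)),
    ("svc", SVC(kernel="rbf")),
])
final_svm_pipe.fit(X_train_flat, y_train)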
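One way to keep the final model independent of the tuning cells is to hard-code the chosen hyperparameters in a dictionary and apply them with `Pipeline.set_params`, which accepts the same `step__param` keys used in the grid. In the sketch below, `BEST_PARAMS` is a hypothetical name and its values are placeholders, not tuning results; they would be copied from the printed `best_params_`.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# HYPOTHETICAL placeholder values, not tuned results: replace with the printed best_params_
BEST_PARAMS = {
    "pca__n_components": 0.8,
    "svc__C": 10,
    "svc__gamma": "scale",
}

final_pipe_sketch = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC(kernel="rbf")),
])
# set_params routes each "step__param" key to the matching pipeline step
final_pipe_sketch.set_params(**BEST_PARAMS)

print(final_pipe_sketch.get_params()["svc__C"])  # 10
```

Keeping the values in one dictionary makes it easy to see exactly which configuration the final model was trained with.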


### Fully connected neural network

### Convolutional neural network