### Part f): Classification analysis using neural networks

With well-structured code it should now be easy to change the
activation function of the output layer.

Here we will change the cost function of the neural network code
developed in parts b), d) and e) in order to perform a classification
analysis. The classification problem we will study is the multiclass
MNIST problem; see the description of the full data set at
<https://www.kaggle.com/datasets/hojjatk/mnist-dataset>. We will use the Softmax cross entropy function discussed in a).
The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full data set.
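
As a reminder of what the Softmax cross entropy computes, here is a minimal NumPy sketch. The function names are hypothetical and it is not the project's own `cross_entropy_with_logits`; in particular, averaging over the batch (rather than summing) is an assumption. The loss takes the raw outputs (logits) of a linear output layer and applies the softmax internally, which is numerically safer than exponentiating first and taking the logarithm afterwards.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum before exponentiating to avoid overflow
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def softmax_cross_entropy(logits, targets):
    # Mean cross entropy over the batch, computed from the log-softmax
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return -np.mean(np.sum(targets * log_probs, axis=1))

def softmax_cross_entropy_grad(logits, targets):
    # Gradient of the mean loss with respect to the logits
    return (softmax(logits) - targets) / logits.shape[0]

# Two samples, three classes, one-hot targets
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

loss = softmax_cross_entropy(logits, targets)
grad = softmax_cross_entropy_grad(logits, targets)
print(loss)
print(grad)
```

The gradient with respect to the logits is simply the predicted probabilities minus the one-hot targets (scaled by the batch size), which is why a linear output layer combines so conveniently with this cost function in backpropagation.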

Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the
Fashion-MNIST data set, available for example at <https://www.kaggle.com/datasets/zalando-research/fashionmnist>.

To set up the data set, the following Python code may be useful:

In [21]:
import sys
import os

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

import numpy as np
import torch
from torchvision import datasets, transforms


from Implementations.activations import relu, relu_deriv, linear, linear_deriv
from Implementations.losses import cross_entropy_with_logits, cross_entropy_with_logits_deriv
from Implementations.optimizers import Adam
from Implementations.neural_network import NeuralNetwork

In [22]:
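def accuracy_from_logits(logits, y_true):
    """Fraction of samples whose largest logit matches the one-hot target."""
    preds = np.argmax(logits, axis=1)    # predicted class per sample
    labels = np.argmax(y_true, axis=1)   # true class from one-hot rows
    return np.mean(preds == labels)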

In [23]:
transform = transforms.Compose([
    transforms.ToTensor()
])

fashion_train = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
fashion_test = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)

X_train = fashion_train.data.numpy().astype(np.float64) / 255.0
X_test = fashion_test.data.numpy().astype(np.float64) / 255.0

y_train = fashion_train.targets.numpy().astype(np.int64)
y_test = fashion_test.targets.numpy().astype(np.int64) 

X_train = X_train.reshape(-1, 28 * 28)  # flatten each 28x28 image to a vector
X_test = X_test.reshape(-1, 28 * 28)

# One-hot encoding of the 10 class labels
Y_train = np.zeros((y_train.size, 10))
Y_train[np.arange(y_train.size), y_train] = 1.0

Y_test = np.zeros((y_test.size, 10))
Y_test[np.arange(y_test.size), y_test] = 1.0


In [24]:
layer_sizes = [128, 64, 10]
activation_funcs = [relu, relu, linear]
activation_ders  = [relu_deriv, relu_deriv, linear_deriv]

net = NeuralNetwork(
    network_input_size=X_train.shape[1],
    layer_output_sizes=layer_sizes,
    activation_funcs=activation_funcs,
    activation_ders=activation_ders,
    cost_fun=cross_entropy_with_logits,
    cost_der=cross_entropy_with_logits_deriv,
    seed=6114,
    l2_lambda=1e-4
)

optimizer = Adam(lr=1e-3)

history = net.fit(
    X_train, Y_train,
    epochs=100,
    batch_size=128,
    optimizer=optimizer,
    shuffle=True
)

Epoch   1 | train: 0.555700
Epoch   2 | train: 0.446539
Epoch   3 | train: 0.404633
Epoch   4 | train: 0.377808
Epoch   5 | train: 0.356745
Epoch   6 | train: 0.359595
Epoch   7 | train: 0.325346
Epoch   8 | train: 0.302108
Epoch   9 | train: 0.286780
Epoch  10 | train: 0.288042
Epoch  11 | train: 0.287692
Epoch  12 | train: 0.270885
Epoch  13 | train: 0.271304
Epoch  14 | train: 0.272062
Epoch  15 | train: 0.258025
Epoch  16 | train: 0.243352
Epoch  17 | train: 0.242686
Epoch  18 | train: 0.229097
Epoch  19 | train: 0.223139
Epoch  20 | train: 0.212646
Epoch  21 | train: 0.209677
Epoch  22 | train: 0.202467
Epoch  23 | train: 0.211456
Epoch  24 | train: 0.189287
Epoch  25 | train: 0.189279
Epoch  26 | train: 0.192431
Epoch  27 | train: 0.184544
Epoch  28 | train: 0.183938
Epoch  29 | train: 0.175303
Epoch  30 | train: 0.188206
Epoch  31 | train: 0.167668
Epoch  32 | train: 0.170759
Epoch  33 | train: 0.159406
Epoch  34 | train: 0.151235
Epoch  35 | train: 0.171708
Epoch  36 | train: 0

To measure the performance of our classifier we will use the
so-called *accuracy* score. The accuracy is, as you would expect,
the number of correctly predicted targets $t_i$ divided by the total
number of targets, that is

$$
\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n} ,
$$

where $I$ is the indicator function, equal to $1$ if $t_i = y_i$ and $0$
otherwise. Here $t_i$ is the target class, $y_i$ is the class predicted by
your FFNN code, and $n$ is the number of targets $t_i$. The definition is the same for binary and multiclass problems; in the multiclass case $y_i$ is the class with the largest output.
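
The formula can be checked by hand on a tiny example. The logits and targets below are made up for illustration only; the predicted class $y_i$ is taken as the index of the largest output.

```python
import numpy as np

# Hypothetical network outputs (logits) for n = 4 samples and 3 classes
logits = np.array([
    [2.0, 0.1, 0.3],   # largest entry at index 0
    [0.2, 0.4, 1.9],   # largest entry at index 2
    [0.1, 1.2, 0.8],   # largest entry at index 1
    [1.5, 0.3, 0.9],   # largest entry at index 0
])
t = np.array([0, 2, 1, 2])         # integer targets t_i

y = np.argmax(logits, axis=1)      # predicted classes y_i
indicator = (t == y).astype(int)   # I(t_i = y_i) for each sample
accuracy = indicator.sum() / t.size

print(indicator, accuracy)         # prints: [1 1 1 0] 0.75
```

Three of the four predictions agree with the targets, so the accuracy is $3/4$.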

Discuss your results and give a critical analysis of the various parameters, including hyperparameters such as the learning rate and the regularization parameter $\lambda$, the choice of activation functions, and the number of hidden layers and nodes.

Again, we strongly recommend that you compare your own neural network
code for classification and its results against similar code using **Scikit-Learn**, **TensorFlow/Keras** or **PyTorch**.
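
One possible way to set up such a comparison, and at the same time scan the learning rate and $\lambda$, is sketched below with `MLPClassifier` from Scikit-Learn. Several choices here are assumptions made for illustration: the small $8\times 8$ digits data set bundled with Scikit-Learn stands in for the full data (it runs in seconds and needs no download), the grid values are arbitrary, and the hidden layers mirror the `[128, 64]` architecture used above. Note that `alpha` is Scikit-Learn's L2 penalty strength; its scaling need not match the `l2_lambda` of your own code, so compare trends rather than individual values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumption: the bundled 8x8 digits data stands in for (Fashion-)MNIST
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values lie in [0, 16]
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

learning_rates = [1e-4, 1e-3, 1e-2]   # illustrative grid values
lambdas = [1e-5, 1e-3, 1e-1]
val_acc = np.zeros((len(learning_rates), len(lambdas)))

for i, lr in enumerate(learning_rates):
    for j, lam in enumerate(lambdas):
        clf = MLPClassifier(
            hidden_layer_sizes=(128, 64),
            activation="relu",
            solver="adam",
            alpha=lam,               # L2 regularization strength
            learning_rate_init=lr,
            batch_size=128,
            max_iter=50,             # few epochs; may warn about convergence
            random_state=0,
        )
        clf.fit(X_tr, y_tr)
        val_acc[i, j] = clf.score(X_val, y_val)  # accuracy on held-out data

best_i, best_j = np.unravel_index(np.argmax(val_acc), val_acc.shape)
print(val_acc)
print(f"Best: lr={learning_rates[best_i]}, lambda={lambdas[best_j]}")
```

Running the same grid with your own network and plotting both accuracy tables as heat maps makes it easy to see whether the two implementations respond to the hyperparameters in the same way.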

If you have time, you can use the functionality of **Scikit-Learn** and compare your neural network results with those from logistic regression. This is optional.
The article at <https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3> compares logistic regression and FFNN using the MNIST data set. You may find several useful hints and ideas in this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers.
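
To see why a network without hidden layers is equivalent to logistic regression, note that it reduces to a single affine map followed by the softmax, which is exactly the multinomial logistic regression model. The NumPy sketch below trains such a model with plain gradient descent. The toy data (three Gaussian blobs), the learning rate and the number of iterations are assumptions chosen for illustration; your own `NeuralNetwork` class is not used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: three Gaussian blobs in two dimensions
n_per_class, n_classes = 100, 3
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)
Y = np.eye(n_classes)[y]              # one-hot targets

# "Network" with zero hidden layers: one weight matrix and one bias vector
W = np.zeros((X.shape[1], n_classes))
b = np.zeros(n_classes)
eta = 0.2                             # learning rate (assumed value)

for _ in range(1000):
    z = X @ W + b                           # output-layer logits
    z -= z.max(axis=1, keepdims=True)       # stabilize the exponentials
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)       # softmax probabilities
    delta = (p - Y) / X.shape[0]            # gradient of mean cross entropy w.r.t. logits
    W -= eta * X.T @ delta
    b -= eta * delta.sum(axis=0)

accuracy = np.mean(np.argmax(X @ W + b, axis=1) == y)
print(f"Training accuracy: {accuracy:.3f}")
```

The decision boundaries of this model are linear in the inputs; any improvement your network shows over it on the image data can therefore be attributed to the hidden layers.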

If you wish to compare with, say, logistic regression from **Scikit-Learn**, the following code uses the Fashion-MNIST arrays `X_train`, `y_train`, `X_test` and `y_test` set up above:

In [None]:
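from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Initialize the model. With the 'saga' solver and more than two classes,
# scikit-learn fits a multinomial (softmax) model by default; the explicit
# multi_class argument is deprecated in recent releases and is omitted here.
model = LogisticRegression(solver='saga', max_iter=1000, random_state=42)
# Train the model on the integer class labels (not the one-hot targets)
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")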