# CIS 678 - Machine Learning - Kaggle Challenge 3: MNIST Digit Classification

<!-- ## Data Loading -->

<!-- ## Model Implementation -->

<!-- ## Model Evaluation -->

<!-- ## Conclusion -->

## Notebook Configuration
Before running our model, let's change the working directory to the folder that holds our Python scripts by running the following cell:

In [None]:
!echo sourcing MNIST model directory...
%cd '../model/src/'

## Importing our Custom Neural Network Libraries
Now that we are in the right directory, let's start by importing all of our custom Python scripts.

In [None]:
import numpy as np

# custom modules
from toolkit import Toolkit
from parameters import ParameterManager
from model import Model

# used for visualizing our training progress
from rich.progress import track

# import cross validation libraries
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

# creating our notebook toolkit helper
tools = Toolkit()
tools.configure(name = 'Notebook', level = 'INFO')

## Parameter Configuration
Declaring our hyper-parameters in one place during model development makes it easier to optimize our model.
For example, we can tune our model's learning rate and observe its effect on model performance.
Using our custom `ParameterManager` class, we can easily create a variety of parameters to pass onto our MNIST model.

In [None]:
# hyper-parameter manager
parameters = ParameterManager()

# create learning parameters
parameters.add_parameter(epochs = [15])
parameters.add_parameter(learning_rate = [0.001]) # note: 0.1 too high, 0.0001 too low?
parameters.add_parameter(loss = ['mse'])
parameters.add_parameter(select_case = [-1])

# create an architecture parameter
parameters.add_nested_parameter(hidden_layers = [2], hidden_dimensions = [256, 128, 64], activation = ['tanh'])

tools.info(f'showing notebook parameters: {parameters}')
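
One way to picture the parameter lists above is as a grid, where every combination of listed values is one candidate configuration. The sketch below illustrates that idea with the standard library only. It is not the implementation of our `ParameterManager` class; `expand_grid` and the second learning rate value are hypothetical and exist only for illustration.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the listed values (hypothetical helper)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# hypothetical grid in the style of the cell above; 0.01 is an added example value
grid = {
    "epochs": [15],
    "learning_rate": [0.01, 0.001],
    "loss": ["mse"],
}

combinations = list(expand_grid(grid))
for combo in combinations:
    print(combo)
```

With one value per parameter, as in our notebook, the grid collapses to a single configuration; every extra value multiplies the number of training runs.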

## Load in Data
This model will train on the MNIST data set.
A collection of three `csv` files is imported to generate our training and testing data sets.
Let's load in our `mnist_train.csv`, `mnist_train_targets.csv`, and `mnist_test.csv` files.


In [None]:
tools.info("loading in our MNIST data sets")

train_data    = tools.load_data("../data/train/mnist_train.csv")
train_targets = tools.load_data("../data/train/mnist_train_targets.csv", transpose = False)
test_data     = tools.load_data("../data/test/mnist_test.csv")
test_targets  = np.zeros((test_data.shape[0], 1), dtype=int)

tools.info(f"read in our training data with shapes: {train_data.shape} and {train_targets.shape}")
tools.info(f"read in our testing data with shapes: {test_data.shape} and {test_targets.shape}")

tools.info("applying normalization to our train and test data")
train_data = tools.normalize(train_data)
test_data  = tools.normalize(test_data)

tools.info("visualizing a few MNIST training samples...")
tools.visualize(train_data)

tools.info("visualizing a few MNIST testing samples...")
tools.visualize(test_data)

tools.warning("plots should be labeled!")
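
The cell above normalizes both data sets before training. We have not shown what `tools.normalize` does internally, so the following is only a minimal sketch of one common choice: dividing by the maximum intensity, assuming raw pixel values in the range 0 to 255 as in the standard MNIST images. The `scale_pixels` helper is hypothetical.

```python
import numpy as np


def scale_pixels(pixels, max_value=255.0):
    """Map raw pixel intensities from [0, max_value] to [0, 1] (hypothetical helper)."""
    return np.asarray(pixels, dtype=np.float64) / max_value


# three made-up pixel intensities: black, mid-gray, white
sample = np.array([[0, 128, 255]])
scaled = scale_pixels(sample)
print(scaled)
```

Keeping inputs in a small range matters for `tanh` units, which saturate for large inputs and then pass back very small gradients.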

## MNIST Model Configuration and Compilation
With our training and testing data sets ready for processing and all of our tunable parameters declared, we can start compiling our custom neural architecture.

In [None]:
mnist_model = Model(debug_mode = False)

mnist_model.configure(parameters = parameters)
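
To make the configuration above more concrete, here is a rough NumPy sketch of a forward pass through a small fully connected network with `tanh` hidden layers and a mean squared error loss on one-hot targets. This is not the code inside our `Model` class. The layer sizes (784 inputs for 28x28 images, hidden layers of 256 and 128 units, 10 outputs), the random initialization, and the helper names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_layers(sizes):
    """Create (weights, bias) pairs for consecutive layer sizes."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, layers):
    """Apply tanh after every hidden layer; leave the output layer linear."""
    for weights, bias in layers[:-1]:
        x = np.tanh(x @ weights + bias)
    weights, bias = layers[-1]
    return x @ weights + bias


def one_hot(labels, num_classes=10):
    """Encode integer class labels as rows with a single 1.0 entry."""
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


# assumed sizes: 784 inputs, two hidden layers, 10 output classes
layers = init_layers([784, 256, 128, 10])
batch = rng.random((4, 784))        # four made-up "images"
labels = np.array([3, 1, 4, 1])     # made-up digit labels

outputs = forward(batch, layers)
mse = np.mean((outputs - one_hot(labels)) ** 2)
predictions = outputs.argmax(axis=1)  # predicted digit = index of largest output
print(outputs.shape, mse, predictions)
```

With an `mse` loss the network regresses toward the one-hot vector, and the predicted digit is read off as the index of the largest output.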

## Cross-Validating Model Performance
The following section uses 5-fold cross-validation to estimate our model's performance.

In [None]:

# cross-validate on the labeled training data only; the test targets are
# zero placeholders, so mixing them in would corrupt the accuracy estimate
X = train_data
y = train_targets

# performing cross-validation to assess model performance
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# run cross-validation train and predict loop
# note: this assumes fit() starts from fresh weights on every call; if it
# does not, later folds are scored on samples the model has already seen
for train_index, test_index in kf.split(X):
  X_train, X_test = X[train_index], X[test_index]
  y_train, y_test = y[train_index], y[test_index]

  # train on the k-1 training folds, then predict the held-out fold
  mnist_model.fit(X_train, y_train)
  y_pred = mnist_model.predict(X_test)

  # evaluate the held-out fold with classification accuracy
  accuracy = accuracy_score(y_test, y_pred)
  tools.info(f"cross-validation accuracy: {accuracy * 100:.1f}%")
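
To show what the split in the loop above is doing, the sketch below builds shuffled folds by hand and averages per-fold accuracies into a single score. It is a NumPy-only stand-in for illustration, not scikit-learn's `KFold` implementation, and it will not reproduce the exact same splits. The helper names and the example labels are made up.

```python
import numpy as np


def kfold_indices(n_samples, k, seed=42):
    """Shuffle sample indices and yield (train, validation) index arrays."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of matching labels; ravel() guards against (n, 1) vs (n,) shapes."""
    return float(np.mean(np.asarray(y_true).ravel() == np.asarray(y_pred).ravel()))


splits = list(kfold_indices(n_samples=10, k=5))

# made-up labels and predictions for two folds, only to show the aggregation step
fold_scores = [
    accuracy([1, 0, 1, 1], [1, 0, 0, 1]),
    accuracy([0, 1], [0, 1]),
]
mean_score = np.mean(fold_scores)
print(f"mean cross-validation accuracy: {mean_score * 100:.1f}%")
```

Reporting the mean across folds (and ideally the spread) gives a steadier estimate than any single fold's accuracy.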


## Generating Kaggle Submission
Now that we have cross-validated our model's performance, we can retrain our model on the full training data set and generate predictions for the test set.

In [None]:
mnist_model.fit(train_data, train_targets)

test_targets = mnist_model.predict(test_data)

tools.info("saving our test targets")

tools.save_data("../data/test/mnist_test_targets.csv", test_targets)