# Implementation of a Neural Network Classification Model
**Joey Shi**

This notebook describes the theory and implementation details behind neural network classification.
Topics include the softmax loss function, gradients of multivariable vector-valued functions, and the gradient descent algorithm.

## Notation
1. $X$ is an $n \times d$ feature matrix of real numbers, such that each row $x_i\in\mathbb{R}^d$ is an example.
2. $y$ is an $n$-dimensional label vector, such that each $y_i\in\{1,2,\dots , k_L\}$ is a class label.
3. $h(z)$ is a non-linear function, called the activation function.
4. $W^{(l)}$ is the $k_{l} \times k_{l-1}$ weight matrix of layer $l$, where $k_l$ is the number of units in layer $l$ and $k_0 = d$.
5. $b^{(l)}$ is the $k_{l}$-dimensional bias vector of layer $l$.
6. $p(y_i \mid D, x_i)$ is the probability that the model with parameters $D$ assigns to label $y_i$ given $x_i$.
7. $\delta_{a, b} = 1$ if $a = b$, $\delta_{a, b} = 0$ otherwise [Kronecker delta function].
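
To make the shapes in the notation concrete, here is a minimal NumPy sketch of a forward pass. It is not the `NNClassifier` implementation. It assumes that $h$ is `tanh`, that $p(y_i \mid D, x_i)$ is a softmax over the final layer's outputs, and that the loss is the negative log-likelihood. The helper names (`softmax`, `forward`) and the toy sizes are hypothetical.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting each row's max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def forward(X, weights, biases, h=np.tanh):
    """Return an n x k_L matrix whose row i holds class probabilities for x_i."""
    A = X                                   # n x k_0, with k_0 = d
    for W, b in zip(weights[:-1], biases[:-1]):
        A = h(A @ W.T + b)                  # n x k_l (W is k_l x k_{l-1})
    Z = A @ weights[-1].T + biases[-1]      # n x k_L, no activation on the output layer
    return softmax(Z)

# Toy problem (hypothetical sizes): n = 5 examples, d = 4 features, k_1 = 3, k_2 = k_L = 2
rng = np.random.default_rng(0)
n, d, hidden_and_output = 5, 4, [3, 2]
layer_sizes = [d] + hidden_and_output
X_toy = rng.normal(size=(n, d))
weights = [rng.normal(size=(k_out, k_in))
           for k_in, k_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(k_out) for k_out in layer_sizes[1:]]

P = forward(X_toy, weights, biases)

# One-hot labels built from the Kronecker delta: Y[i, c] = delta_{y_i, c}
y_toy = np.array([1, 2, 1, 2, 2])           # labels in {1, ..., k_L}
classes = np.arange(1, layer_sizes[-1] + 1)
Y = (y_toy[:, None] == classes[None, :]).astype(float)

# Negative log-likelihood: -sum_i log p(y_i | D, x_i)
loss = -np.sum(Y * np.log(P))
print(P.shape, loss)
```

Each row of `P` sums to 1. The one-hot matrix `Y` selects the probability of the true class in each row, which is the role the Kronecker delta plays when such a loss is written as a sum over classes.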

In [None]:
import pickle
import gzip
import numpy as np
import matplotlib.pyplot as plt
from joblib import dump
from nnclassifier.nnclassifier import NNClassifier

In [None]:
with gzip.open('data/mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding="latin1")
X, y = train_set
X_test, y_test = test_set
print("n =", X.shape[0])
print("d =", X.shape[1])

In [None]:
model = NNClassifier(hidden_layer_sizes=[300], lammy=0.01, epochs=20, verbose=True)
model.fit(X, y)

In [None]:
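# Classification error: fraction of examples whose predicted label differs from the true label
print("Training error: %.4f" % np.mean(model.predict(X) != y))
print("Test error: %.4f" % np.mean(model.predict(X_test) != y_test))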

In [None]:
dump(model, "trained_models/nn2.joblib")

In [None]:
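i = 50
# Each example is a flattened 28 x 28 image; invert so the digit appears dark on a light background
image = 1 - np.reshape(X[i], (28, 28))
plt.imshow(image, cmap='gray')
# Slice to keep a 2-D input, then take the single prediction as a scalar for %d
print("Prediction: %d" % model.predict(X[i:i+1])[0])
print("Actual: %d" % y[i])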