# Convolutional Neural Network for MNIST

## Introduction

TODO

## Requirements

### Python Modules

In [1]:
# third party
import numpy as np
# mnist data
from deep_teaching_commons.data.fundamentals.mnist import Mnist
# tensorflow
import tensorflow as tf

from tqdm import tqdm

import framework.loss_function as loss_function
import framework.layer as layer
import framework.network as network
import framework.activation_function as activation_function
import framework.optimizer as optimizer

### Data

In [2]:
# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data')

# load all data; labels are one-hot encoded, images keep their 28x28 shape (flatten=False) and pixel values are scaled to [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=True, normalized=True)
print(train_images.shape, train_labels.shape)

# add a channel axis to match the general framework architecture: (N, 28, 28, 1)
train_images, test_images = train_images.reshape(60000, 28, 28, 1), test_images.reshape(10000, 28, 28, 1)
print(train_images.shape, train_labels.shape)

# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]

auto download is active, attempting download
mnist data directory already exists, download aborted
(60000, 28, 28) (60000, 10)
(60000, 28, 28, 1) (60000, 10)
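
The data preparation above can be reproduced on toy data without downloading MNIST. The following is a minimal sketch in plain NumPy; the helper `one_hot` and the toy arrays are hypothetical and not part of `deep_teaching_commons`. It shows one-hot encoding, adding the channel axis, and shuffling images and labels with one shared permutation so that each image stays paired with its label.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot rows (hypothetical helper)."""
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# toy stand-in for MNIST: 5 "images" of 28x28 pixels with integer labels
images = np.arange(5 * 28 * 28, dtype=np.float32).reshape(5, 28, 28)
labels = np.array([3, 0, 9, 1, 3])

targets = one_hot(labels)
images = images.reshape(-1, 28, 28, 1)  # add the channel axis

# one shared permutation keeps every image paired with its label
rng = np.random.default_rng(0)
perm = rng.permutation(len(images))
shuffled_images, shuffled_targets = images[perm], targets[perm]

print(shuffled_images.shape, shuffled_targets.shape)
```

Indexing both arrays with the same `perm` is what the notebook cell does with `shuffle_index`; shuffling the two arrays independently would silently break the image/label pairing.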


## Building the Neural Network
For a better understanding of neural networks, you will start implementing your own framework. This notebook explains some core functions and concepts of the framework so that you all have the same starting point. The pipeline will be:

**define a model architecture -> construct a neural network from the model -> define an evaluation criterion -> optimize the network**

### Creating a custom architecture
To create a custom model, you first have to define the layers and activation functions it is built from. Layers and activation functions are modeled as objects. Each object that you want to use has to implement a `forward` method, which is called by the `NeuralNetwork` class. Additionally, the `self.params` attribute is mandatory to meet the specification of the `NeuralNetwork` class. It stores all learnable parameters that the optimization algorithm needs. We implement our neural network so that the objects can be used as building blocks and stacked to create a custom model.
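
To make the building-block idea concrete, here is a minimal sketch in plain NumPy rather than TensorFlow. The classes `ToyDense` and `ToyReLU` and the helpers `run_forward` and `collect_params` are hypothetical names chosen for illustration; they are not the framework's own classes. They only mirror the contract described above: every object has a `forward` method and a `params` list.

```python
import numpy as np

class ToyDense:
    """Hypothetical layer exposing `forward` and `params`."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)
        self.params = [self.W, self.b]  # learnable parameters

    def forward(self, X):
        return X @ self.W + self.b

class ToyReLU:
    """Hypothetical activation object; it has nothing to learn."""

    def __init__(self):
        self.params = []

    def forward(self, X):
        return np.maximum(X, 0.0)

def run_forward(layers, X):
    """Feed X through the stacked objects, one after the other."""
    for layer in layers:
        X = layer.forward(X)
    return X

def collect_params(layers):
    """Gather every learnable parameter for an optimizer."""
    return [p for layer in layers for p in layer.params]

model = [ToyDense(4, 3), ToyReLU(), ToyDense(3, 2, seed=1)]
out = run_forward(model, np.ones((5, 4)))
print(out.shape, len(collect_params(model)))
```

Because every object follows the same two-member contract, the loop in `run_forward` does not need to know which kind of layer it is calling.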

#### Layers Class
The file `layer.py` contains implementations of neural network layers and regularization techniques that can be inserted as layers into the architecture. This file already exists in the `framework` folder. Here we implement only the convolution layer, because the flatten and dense layers were already covered in the last notebook.

In [3]:
class Convolution():
    ''' 2D convolution layer that slides learnable filters over the
        input and adds one bias term per filter. An optional activation
        function is applied to the result.
    '''

    def __init__(self, input_channels=1, filter_num=32, filter_dim=(3, 3), stride=1, activation_func=None):
        self.W = tf.Variable(tf.truncated_normal([filter_dim[0], filter_dim[1], input_channels, filter_num], stddev=0.1))
        self.b = tf.Variable(tf.ones([filter_num])/10)
        self.params = [self.W, self.b]
        self.stride = stride
        self.activation_func = activation_func

    def forward(self, X):
        ''' Convolve the input with the filters and add the bias terms

        Args:
            X: Batch of images with shape (batch, height, width, channels)

        Returns:
            out: conv2d(X, W) + b, passed through the activation function if one is set
        '''
        Z = tf.nn.conv2d(X, self.W, strides=[1, self.stride, self.stride, 1], padding='SAME') + self.b
        if self.activation_func is None:
            return Z
        else:
            return self.activation_func.forward(Z)

## Put it all together
Now you have all the parts needed to create and train the network. First, you define a network architecture by stacking layers: convolution layers, a flatten layer, and fully connected layers with activation functions. Your custom architecture is passed to a `NeuralNetwork` object that handles the inter-layer communication during the forward and backward pass. Finally, you optimize the model with a chosen algorithm, here stochastic gradient descent. This kind of pipeline is similar to the one you would create with a framework like TensorFlow or PyTorch.
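
The internals of `optimizer.Optimizer.sgd` are not shown in this notebook, so the following is only a sketch of what mini-batch stochastic gradient descent with a softmax cross-entropy loss does, written in plain NumPy for a single linear layer. All names, the toy data, the learning rate and the batch size of 32 are assumptions made for illustration, not the framework's implementation.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # stabilise the exponentials
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def cross_entropy(P, Y, eps=1e-12):
    """Mean cross-entropy between predicted probabilities P and one-hot Y."""
    return -np.mean(np.sum(Y * np.log(P + eps), axis=1))

def sgd_epoch(X, Y, W, b, lr=0.1, batch_size=32):
    """One pass over the data, updating W and b after every mini-batch."""
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], Y[start:start + batch_size]
        P = softmax(xb @ W + b)
        grad_Z = (P - yb) / len(xb)        # gradient of the loss w.r.t. the logits
        W -= lr * (xb.T @ grad_Z)          # gradient step on the weights
        b -= lr * grad_Z.sum(axis=0)       # gradient step on the biases
    return W, b

# toy, linearly separable 3-class problem
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))
Y = np.eye(3)[np.argmax(X @ rng.normal(size=(20, 3)), axis=1)]

W, b = np.zeros((20, 3)), np.zeros(3)
loss_before = cross_entropy(softmax(X @ W + b), Y)
for _ in range(5):
    W, b = sgd_epoch(X, Y, W, b)
loss_after = cross_entropy(softmax(X @ W + b), Y)
print(loss_before, loss_after)
```

In the framework the gradients come from TensorFlow rather than from a hand-written formula, but the loop structure (slice a batch, compute the loss gradient, update the parameters) is the same idea.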

In [4]:
# design an architecture with two convolution layers followed by
# fully connected layers, using ReLU as activation function
def fcn_mnist():
    conv1 = Convolution(1, 64, stride=2, activation_func=activation_function.ReLU())
    conv2 = Convolution(64, 32, stride=2, activation_func=activation_function.ReLU())
    flat = layer.Flatten()
    # two stride-2 'SAME' convolutions shrink 28x28 to 7x7 with 32 channels: 7 * 7 * 32 = 1568 inputs
    hidden_01 = layer.FullyConnected(7 * 7 * 32, 256, activation_func=activation_function.ReLU())
    hidden_02 = layer.FullyConnected(256, 100, activation_func=activation_function.ReLU())
    output = layer.FullyConnected(100, 10, activation_func=activation_function.Softmax())
    # convolution layers come first, then the flatten and fully connected layers
    return [conv1, conv2, flat, hidden_01, hidden_02, output]

# create a neural network from the specified architecture, with softmax as score function
fcn = network.NeuralNetwork(fcn_mnist())

In [5]:
# optimize the network with SGD, using a cross-entropy loss on the softmax output
fcn = optimizer.Optimizer.sgd(fcn, train_images, train_labels, loss_function.cross_entropy, X_test=test_images, y_test=test_labels, verbose=True)

  0%|          | 1/1875 [00:00<04:08,  7.53it/s]

Epoch 1


 77%|███████▋  | 1436/1875 [28:05<08:35,  1.17s/it]

KeyboardInterrupt: 