# ML-Fundamentals - Neural Networks - Exercise: Neural Network Framework

# Tasks
Your main goal is to extend the existing framework, to perform experiments with different model combinations and to document your observations. Here is a list of necessary tasks and some ideas for additional points:
  * (6) Write a data loader for a different image dataset, e.g., CIFAR or Labelled Faces in the Wild. Feel free to search for a dataset you would like to classify. Create and train a simple fully connected network on that dataset in this notebook.
  * (10) Implement the `Conv` and `Pool` layers in `layer.py`. Create and train a convolutional neural network on MNIST and your chosen dataset in this notebook.

Bonus points
  * (5) 1 to 5 points are given for improving the class and method comments in the framework files. Points are given based on the quality and quantity of the comments.
  * (1) For each additional activation function implemented in `activation_func.py` you get 1 bonus point (max 4 points). Test your implementation in this notebook and observe the effects on your networks. Keep an eye on your layer initialization.
  * (2) Implement `Dropout` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Dropout as a layer.
  * (5) Implement `Batchnorm` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Batchnorm as a layer.
  * (4) Implement another optimization algorithm in `optimizer.py`. Train one of your models with that new optimizer.
  * (5) Do something extra, up to 5 points.  
  
Please document thoroughly and explain what you do in your experiments, so that the work in the notebook is comprehensible; otherwise no points are given.

# Requirements

## Python-Modules

In [None]:
%load_ext autoreload
%autoreload 2
# custom 
from htw_nn_framework.networks import NeuralNetwork
from htw_nn_framework.layer import *
from htw_nn_framework.activation_func import *
from htw_nn_framework.loss_func import *
from htw_nn_framework.optimizer import *
from htw_nn_framework.cifar import *
from htw_nn_framework.initializer import *

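# third party
import numpy as np  # used below as `np`; imported explicitly instead of relying on the star imports
from deep_teaching_commons.data.fundamentals.mnist import Mnist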

# Data
## MNIST

In [None]:
# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data')

# load all data; labels are not one-hot encoded, images are not flattened and pixel values are scaled to [0, 1]
mnist_train_images, mnist_train_labels, mnist_test_images, mnist_test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=False, normalized=True)
print(mnist_train_images.shape, mnist_train_labels.shape)

# reshape to (N, channels, height, width) to match the general framework architecture
mnist_train_images, mnist_test_images = mnist_train_images.reshape(60000, 1, 28, 28), mnist_test_images.reshape(10000, 1, 28, 28)            
print(mnist_train_images.shape, mnist_train_labels.shape)

# shuffle training data
shuffle_index = np.random.permutation(60000)
mnist_train_images, mnist_train_labels = mnist_train_images[shuffle_index], mnist_train_labels[shuffle_index]

# Generate an even smaller data set to be able to compute and debug faster. 
# Can be left out if computing power is high enough or if the computation is 
# not for debugging but for computing the best model possible.
mnist_train_images_small, mnist_train_labels_small, mnist_test_images_small, mnist_test_labels_small = mnist_train_images[:100,:,:,:], mnist_train_labels[:100], mnist_test_images[:100,:,:,:], mnist_test_labels[:100]
print(mnist_train_images_small.shape, mnist_train_labels_small.shape)

## CIFAR10

In [None]:
# create cifar loader
cifar_loader = Cifar(data_dir='cifar-10-batches-py')

# load all data, pixel squashed between [0,1]
cifar_train_images, cifar_train_labels, cifar_test_images, cifar_test_labels = cifar_loader.get_all_data(normalized=True)
print(cifar_train_images.shape, cifar_train_labels.shape)

# shuffle training data
shuffle_index = np.random.permutation(len(cifar_train_images))
cifar_train_images, cifar_train_labels = cifar_train_images[shuffle_index], cifar_train_labels[shuffle_index]

# Generate an even smaller data set to be able to compute and debug faster. 
# Can be left out if computing power is high enough or if the computation is 
# not for debugging but for computing the best model possible.
cifar_train_images_small, cifar_train_labels_small, cifar_test_images_small, cifar_test_labels_small = cifar_train_images[:100,:,:,:], cifar_train_labels[:100], cifar_test_images[:100,:,:,:], cifar_test_labels[:100]
print(cifar_train_images_small.shape, cifar_train_labels_small.shape)

# MNIST Fully Connected Network Example
This model and its optimization are taken from `framework_exercise.ipynb` as an example of a typical pipeline using the framework files.

In [None]:
# Design a three hidden layer architecture with dense layer
# and ReLU as activation function
def fcn_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_mnist = NeuralNetwork(fcn_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
fcn_mnist = Optimizer.sgd(fcn_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
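
The pipeline above pairs a softmax score function with a cross-entropy loss on labels that are not one-hot encoded (the loader is called with `one_hot_enc=False`). As a rough illustration of what such a loss computes, here is a minimal NumPy sketch. The function name `softmax_cross_entropy` is hypothetical; it is not the framework's `LossCriteria.cross_entropy_softmax`, whose internals are not shown in this notebook, and it assumes integer class ids as labels.

```python
import numpy as np

def softmax_cross_entropy(scores, labels):
    """Hypothetical helper: mean cross-entropy loss and its gradient w.r.t. the scores.

    scores: (N, C) raw class scores, labels: (N,) integer class ids (assumed).
    """
    # subtract the row-wise max for numerical stability (softmax is unchanged by this shift)
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    n = scores.shape[0]
    # negative log-probability of the correct class, averaged over the batch
    loss = -np.log(probs[np.arange(n), labels]).mean()
    # gradient of the mean loss: softmax probabilities minus one-hot targets, divided by N
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0
    return loss, grad / n

# toy check: identical scores for 10 classes give a loss of log(10)
loss, grad = softmax_cross_entropy(np.zeros((4, 10)), np.array([0, 3, 5, 9]))
print(loss, np.log(10))
```

A freshly initialized 10-class network should start near this value of log(10), which is a quick sanity check for the first epoch's reported loss.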

# MNIST Experiments
## Convolutional Neural Network

In [None]:
# Design an architecture with one conv layer followed by one dense hidden layer
# and ReLU as activation function
def cnn_mnist_layer():
    hidden_01 = Conv(mnist_train_images_small.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    flat = Flatten()
    hidden_02 = FullyConnected(784, 200)
    relu_02 = ReLU()
    ouput = FullyConnected(200, 10)
    return [hidden_01, relu_01, flat, hidden_02, relu_02, ouput]

# create a neural network on specified architecture with softmax as score function
cnn_mnist = NeuralNetwork(cnn_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
cnn_mnist = Optimizer.sgd(cnn_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
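
The input size of the first dense layer (784) has to match the flattened output of the conv layer. The sketch below shows the usual output-size arithmetic. It assumes that `Conv(..., 1, 3, 1, True)` means 1 filter, a 3x3 kernel, stride 1 and zero padding of 1; this reading is an assumption, since the constructor signature is not shown here, and `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed reading of Conv(..., 1, 3, 1, True): 1 filter, 3x3 kernel, stride 1, padding 1
n_filters = 1
out = conv_output_size(28, kernel=3, stride=1, padding=1)
flat_features = n_filters * out * out
print(out, flat_features)  # 28 784
```

Under the same assumptions a 32x32 CIFAR10 image keeps its spatial size as well, giving 1 * 32 * 32 = 1024 flattened features.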

## Convolutional Neural Network with He Normal Initialization for the Conv Layer (instead of Glorot Normal)

In [None]:
# Design an architecture with one conv layer (He normal initialization) followed by
# one dense hidden layer, with ReLU as activation function
def cnn_he_mnist_layer():
    hidden_01 = Conv(mnist_train_images_small.shape, 1, 3, 1, True, w_initializer=Initializer.he_normal)
    relu_01 = ReLU()
    flat = Flatten()
    hidden_02 = FullyConnected(784, 200)
    relu_02 = ReLU()
    ouput = FullyConnected(200, 10)
    return [hidden_01, relu_01, flat, hidden_02, relu_02, ouput]

# create a neural network on specified architecture with softmax as score function
cnn_he_mnist = NeuralNetwork(cnn_he_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
cnn_he_mnist = Optimizer.sgd(cnn_he_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
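
To make the difference between the two initializers concrete, the following sketch compares the standard deviations of He normal and Glorot (Xavier) normal initialization for a single 3x3 kernel. The helper names are hypothetical, and the framework's `Initializer.he_normal` may compute the fan values differently; the formulas are the commonly used ones.

```python
import numpy as np

def he_normal_std(fan_in):
    """Standard deviation of He normal initialization: sqrt(2 / fan_in)."""
    return np.sqrt(2.0 / fan_in)

def glorot_normal_std(fan_in, fan_out):
    """Standard deviation of Glorot normal initialization: sqrt(2 / (fan_in + fan_out))."""
    return np.sqrt(2.0 / (fan_in + fan_out))

# one 3x3 kernel on a single input channel, producing a single filter (assumed setup)
fan_in = 1 * 3 * 3
fan_out = 1 * 3 * 3

rng = np.random.default_rng(0)
w_he = rng.normal(0.0, he_normal_std(fan_in), size=(1, 1, 3, 3))
w_glorot = rng.normal(0.0, glorot_normal_std(fan_in, fan_out), size=(1, 1, 3, 3))
print(he_normal_std(fan_in), glorot_normal_std(fan_in, fan_out))
```

When `fan_in` equals `fan_out`, He normal draws weights with a standard deviation larger by a factor of sqrt(2), which compensates for ReLU zeroing out roughly half of the activations.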

## Convolutional Neural Network with Pooling

In [None]:
# Design a one hidden layer architecture with conv layer, pooling layer
# and ReLU as activation function
def cnn_pool_mnist_layer():
    hidden_01 = Conv(mnist_train_images_small.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    pool = Pool(mnist_train_images_small.shape, np.max, 2, 1)
    flat = Flatten()
    ouput = FullyConnected(729, 10)
    return [hidden_01, relu_01, pool, flat, ouput]

# create a neural network on specified architecture with softmax as score function
cnn_pool_mnist = NeuralNetwork(cnn_pool_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
cnn_pool_mnist = Optimizer.sgd(cnn_pool_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=20, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
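
The output layer above expects 729 inputs because a 2x2 max pooling with stride 1 shrinks a 28x28 feature map to 27x27. A naive NumPy version of such a pooling step is sketched below; `max_pool2d` is a hypothetical stand-in for illustration and not the `Pool` layer from `layer.py`.

```python
import numpy as np

def max_pool2d(x, size=2, stride=1):
    """Naive max pooling over an (N, C, H, W) array; readable rather than fast."""
    n, c, h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((n, c, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # window of shape (N, C, size, size) starting at the strided position
            window = x[:, :, i * stride:i * stride + size, j * stride:j * stride + size]
            out[:, :, i, j] = window.max(axis=(2, 3))
    return out

x = np.arange(2 * 1 * 28 * 28, dtype=float).reshape(2, 1, 28, 28)
pooled = max_pool2d(x, size=2, stride=1)
print(pooled.shape)  # (2, 1, 27, 27)
```

Flattening one pooled sample gives 1 * 27 * 27 = 729 features; for a 32x32 CIFAR10 feature map the same arithmetic yields 31 * 31 = 961.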

## Fully Connected Neural Network with Dropout

In [None]:
# Design a two hidden layer architecture with dense layer, dropout layer
# and ReLU as activation function
def fcn_dropout_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    dropout = Dropout()
    hidden_02 = FullyConnected(500, 100)
    relu_02 = ReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, dropout, hidden_02, relu_02, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_dropout_mnist = NeuralNetwork(fcn_dropout_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
fcn_dropout_mnist = Optimizer.sgd(fcn_dropout_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
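
For reference, a toy example of how a dropout layer can behave is sketched below. It uses the common "inverted dropout" variant, which rescales the kept activations during training so that nothing changes at test time. The class `InvertedDropout` and its `train` flag are assumptions made for illustration; the `Dropout` layer in `layer.py` may be implemented differently.

```python
import numpy as np

class InvertedDropout:
    """Hypothetical dropout layer: drops units with probability p_drop while training."""

    def __init__(self, p_drop=0.5, seed=None):
        self.p_drop = p_drop
        self.rng = np.random.default_rng(seed)
        self.mask = None

    def forward(self, x, train=True):
        if not train:
            # at test time the input passes through unchanged
            return x
        keep = 1.0 - self.p_drop
        # kept units are scaled by 1 / keep so the expected activation stays the same
        self.mask = (self.rng.random(x.shape) < keep) / keep
        return x * self.mask

    def backward(self, grad_out):
        # gradients flow only through the units that were kept in the forward pass
        return grad_out * self.mask

layer = InvertedDropout(p_drop=0.5, seed=0)
x = np.ones((4, 6))
y_train = layer.forward(x, train=True)
y_eval = layer.forward(x, train=False)
print(y_train)
print(y_eval)
```

With `p_drop=0.5` every training output is either 0 or 2, while the evaluation output equals the input.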

## Fully Connected Neural Network with Leaky ReLU

In [None]:
# Design a three hidden layer architecture with dense layer
# and Leaky ReLU as activation function
def fcn_leaky_relu_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    leaky_relu_01 = LeakyReLU()
    hidden_02 = FullyConnected(500, 200)
    leaky_relu_02 = LeakyReLU()
    hidden_03 = FullyConnected(200, 100)
    leaky_relu_03 = LeakyReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, leaky_relu_01, hidden_02, leaky_relu_02, hidden_03, leaky_relu_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_leaky_relu_mnist = NeuralNetwork(fcn_leaky_relu_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
fcn_leaky_relu_mnist = Optimizer.sgd(fcn_leaky_relu_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
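
As a small toy example of the activation used here, the sketch below implements the forward pass and the derivative of Leaky ReLU in NumPy. The slope `alpha=0.01` for negative inputs is an assumed default; the `LeakyReLU` class in `activation_func.py` may use a different value or interface.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Forward pass: identity for positive inputs, small slope alpha otherwise."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """Derivative w.r.t. the input: 1 for positive inputs, alpha otherwise."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))
print(leaky_relu_grad(x))
```

Unlike plain ReLU, the gradient never becomes exactly zero for negative inputs, so units cannot get permanently stuck in the inactive region.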

## Fully Connected Neural Network with Sigmoid and Xavier Initialization

In [None]:
# Design a three hidden layer architecture with dense layer, sigmoid as activation function
# and xavier initialization
def fcn_sigmoid_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500, w_initializer=Initializer.xavier_normal)
    sigmoid_01 = sigmoid()
    hidden_02 = FullyConnected(500, 200, w_initializer=Initializer.xavier_normal)
    sigmoid_02 = sigmoid()
    hidden_03 = FullyConnected(200, 100, w_initializer=Initializer.xavier_normal)
    sigmoid_03 = sigmoid()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, sigmoid_01, hidden_02, sigmoid_02, hidden_03, sigmoid_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_sigmoid_mnist = NeuralNetwork(fcn_sigmoid_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
fcn_sigmoid_mnist = Optimizer.sgd(fcn_sigmoid_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)

## Fully Connected Neural Network with Tanh and Xavier Initialization

In [None]:
# Design a three hidden layer architecture with dense layer, tanh as activation function
# and xavier initialization
def fcn_tanh_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500, w_initializer=Initializer.xavier_normal)
    tanh_01 = tanh()
    hidden_02 = FullyConnected(500, 200, w_initializer=Initializer.xavier_normal)
    tanh_02 = tanh()
    hidden_03 = FullyConnected(200, 100, w_initializer=Initializer.xavier_normal)
    tanh_03 = tanh()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, tanh_01, hidden_02, tanh_02, hidden_03, tanh_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_tanh_mnist = NeuralNetwork(fcn_tanh_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
fcn_tanh_mnist = Optimizer.sgd(fcn_tanh_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)

## Fully Connected Neural Network with SGD Momentum

In [None]:
# Design a three hidden layer architecture with dense layer
# and ReLU as activation function
def fcn_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_mnist = NeuralNetwork(fcn_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with sgd momentum and a softmax loss
fcn_mnist = Optimizer.sgd_momentum(fcn_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
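
The update rule behind SGD with momentum can be sketched on a toy problem as follows. This is the classical momentum formulation with an assumed momentum coefficient `mu=0.9`; `sgd_momentum_step` is a hypothetical helper and not the framework's `Optimizer.sgd_momentum`.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, learning_rate=0.01, mu=0.9):
    """One classical momentum update: the velocity accumulates past gradients."""
    velocity = mu * velocity - learning_rate * grad
    return w + velocity, velocity

# toy problem: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_momentum_step(w, w, v)
print(np.linalg.norm(w))
```

On this quadratic the iterate spirals in towards the minimum at the origin; with `mu=0` the step reduces to plain SGD.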

## Fully Connected Neural Network with RMSProp

In [None]:
# Design a three hidden layer architecture with dense layer
# and ReLU as activation function
def fcn_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_mnist = NeuralNetwork(fcn_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with RMSProp and a softmax cross-entropy loss
fcn_mnist = Optimizer.rmsprop(fcn_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
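
A single RMSProp update can be sketched like this. The decay rate `decay=0.9` and `eps=1e-8` are commonly used defaults and assumed here; `rmsprop_step` is a hypothetical helper, and the framework's `Optimizer.rmsprop` may differ in its details.

```python
import numpy as np

def rmsprop_step(w, grad, cache, learning_rate=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: scale each step by a running average of squared gradients."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    w = w - learning_rate * grad / (np.sqrt(cache) + eps)
    return w, cache

# toy problem: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for _ in range(100):
    w, cache = rmsprop_step(w, w, cache)
print(w)
```

Because each parameter is divided by its own gradient magnitude, both coordinates move by the same amount in the first step even though their gradients differ by a factor of two.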

## Fully Connected Neural Network with Adam

In [None]:
# Design a three hidden layer architecture with dense layer
# and ReLU as activation function
def fcn_mnist_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_mnist = NeuralNetwork(fcn_mnist_layer(), score_func=LossCriteria.softmax)

# optimize the network with Adam and a softmax cross-entropy loss
fcn_mnist = Optimizer.adam(fcn_mnist, mnist_train_images_small, mnist_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=mnist_test_images_small, y_test=mnist_test_labels_small, verbose=True)
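
Adam combines a momentum-like first moment with an RMSProp-like second moment and corrects both for their zero initialization. The sketch below shows one update with the commonly used defaults `beta1=0.9`, `beta2=0.999` and `eps=1e-8`, which are assumptions here; `adam_step` is a hypothetical helper and not the framework's `Optimizer.adam`.

```python
import numpy as np

def adam_step(w, grad, m, v, t, learning_rate=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)              # bias-corrected second moment
    w = w - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# toy problem: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 101):
    w, m, v = adam_step(w, w, m, v, t)
print(w)
```

In the very first step the bias correction makes the update roughly `learning_rate * sign(grad)`, independent of the gradient's scale.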

# CIFAR10 Experiments
## Fully Connected Network

In [None]:
# Design a three hidden layer architecture with dense layer
# and ReLU as activation function
def fcn_cifar_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(3072, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_cifar = NeuralNetwork(fcn_cifar_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
fcn_cifar = Optimizer.sgd(fcn_cifar, cifar_train_images_small, cifar_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=cifar_test_images_small, y_test=cifar_test_labels_small, verbose=True)

## Convolutional Neural Network

In [None]:
# Design a one hidden layer architecture with conv layer
# and ReLU as activation function
def cnn_cifar_layer():
    conv = Conv(cifar_train_images_small.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    flat = Flatten()
    ouput = FullyConnected(1024, 10)
    return [conv, relu_01, flat, ouput]

# create a neural network on specified architecture with softmax as score function
cnn_cifar = NeuralNetwork(cnn_cifar_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
cnn_cifar = Optimizer.sgd(cnn_cifar, cifar_train_images_small, cifar_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=32, epoch=10, learning_rate=0.01, X_test=cifar_test_images_small, y_test=cifar_test_labels_small, verbose=True)

## Convolutional Neural Network with Pooling

In [None]:
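# Design a one hidden layer architecture with conv layer, pooling layer
# and ReLU as activation function
def cnn_pool_cifar_layer():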
    conv = Conv(cifar_train_images_small.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    pool = Pool(cifar_train_images_small.shape, np.max, 2, 1)
    flat = Flatten()
    ouput = FullyConnected(961, 10)
    return [conv, relu_01, pool, flat, ouput]

# create a neural network on specified architecture with softmax as score function
cnn_pool_cifar = NeuralNetwork(cnn_pool_cifar_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
cnn_pool_cifar = Optimizer.sgd(cnn_pool_cifar, cifar_train_images_small, cifar_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=cifar_test_images_small, y_test=cifar_test_labels_small, verbose=True)

## Fully Connected Neural Network with Dropout

In [None]:
# Design a two hidden layer architecture with dense layer, dropout layer
# and ReLU as activation function
def fcn_dropout_cifar_layer():
    flat = Flatten()
    hidden_01 = FullyConnected(3072, 500)
    relu_01 = ReLU()
    dropout = Dropout()
    hidden_02 = FullyConnected(500, 100)
    relu_02 = ReLU()
    ouput = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, dropout, hidden_02, relu_02, ouput]

# create a neural network on specified architecture with softmax as score function
fcn_dropout_cifar = NeuralNetwork(fcn_dropout_cifar_layer(), score_func=LossCriteria.softmax)

# optimize the network with SGD and a softmax cross-entropy loss
fcn_dropout_cifar = Optimizer.sgd(fcn_dropout_cifar, cifar_train_images_small, cifar_train_labels_small, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=cifar_test_images_small, y_test=cifar_test_labels_small, verbose=True)