# ML-Fundamentals - Neural Networks - Exercise: Neural Network Framework

# Tasks
Your main goal is to extend the existing framework, run experiments with different model combinations, and document your observations. Here is a list of required tasks and some ideas for additional points:
  * (6) Write a data loader for a different image dataset, e.g., CIFAR or Labeled Faces in the Wild. Feel free to search for a dataset you would like to classify. Create and train a simple fully connected network on that dataset in this notebook.
  * (10) Implement the `Conv` and `Pool` layers in `layer.py`. Create and train a convolutional neural network on MNIST and your chosen dataset in this notebook.

Bonus points
  * (5) 1 to 5 points are given for improving the class and method comments in the framework files. Points are given based on the quality and quantity of the comments.
  * (1) For each additional activation function implemented in `activation_func.py` you get 1 bonus point (max 4 points). Test your implementation in this notebook and observe the effects on your networks. Keep an eye on your layer initialization.
  * (2) Implement `Dropout` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Dropout as a layer.
  * (5) Implement `Batchnorm` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Batchnorm as a layer.
  * (4) Implement another optimization algorithm in `optimizer.py`. Train one of your models with that new optimizer.
  * (5) Do something extra, up to 5 points.  
  
Please document thoroughly and explain what you do in your experiments so that the work in the notebook is comprehensible; otherwise no points are given.
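
The Batchnorm bonus task asks for a toy example. The following is a minimal sketch of what such a check could look like, written in plain NumPy and independent of the framework. The function name `batchnorm_forward` and its signature are assumptions for illustration, not the interface expected in `layer.py`. It only covers the training-mode forward pass and verifies that each feature ends up with roughly zero mean and unit standard deviation.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0).

    Hypothetical helper for a toy check, not the framework's API.
    """
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize each feature
    return gamma * x_hat + beta              # learnable scale and shift

# toy input: 64 samples, 4 features, deliberately not centered or scaled
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # each entry should be close to 0
print(out.std(axis=0))   # each entry should be close to 1
```

A full layer would additionally need a backward pass and running statistics for prediction time, which this sketch leaves out.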

Questions
* Should we implement only max forward and backward for the Pool layer? If not, how should we handle the backward implementation?
* See `layer.py`:
    * In `Pooling.backward`: what happens if more than one pixel has the max value?
    * In the Conv layer: should the bias be added to every pixel, or only after the sum?
    * In Dropout: is the mask the same for all images, or a different one for each image? And how should we handle it when we want to predict?
    * Is an activation function needed after a Dropout or Pool layer?
    * The Pooling layer always returns the same loss; what could be the cause?
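
One of the questions above concerns what `Pooling.backward` should do when several pixels in a window share the maximum. A common convention, sketched here for a single 2D feature map in plain NumPy, is to route the whole upstream gradient to the first maximum that `np.argmax` finds. The function name `maxpool_backward_2d` is hypothetical and not part of the framework. Splitting the gradient evenly among the tied positions would be another defensible choice.

```python
import numpy as np

def maxpool_backward_2d(x, d_out, size=2, stride=1):
    """Backward pass of max pooling for one 2D feature map.

    Hypothetical helper: on ties, np.argmax returns the first maximum,
    so exactly one position per window receives the gradient.
    """
    dx = np.zeros_like(x, dtype=float)
    out_h, out_w = d_out.shape
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = x[r:r + size, c:c + size]
            k = np.argmax(window)                       # flat index of first max
            dr, dc = np.unravel_index(k, window.shape)  # back to 2D offsets
            dx[r + dr, c + dc] += d_out[i, j]           # accumulate for overlapping windows
    return dx

# all four pixels tie for the maximum
x = np.array([[1.0, 1.0],
              [1.0, 1.0]])
d_out = np.array([[2.0]])
dx = maxpool_backward_2d(x, d_out, size=2, stride=2)
print(dx)  # only the top-left entry receives the gradient of 2.0
```

Either convention keeps the sum of the input gradient equal to the sum of the upstream gradient, which is a useful sanity check for a toy example.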

# Requirements

## Python-Modules

In [1]:
%load_ext autoreload
%autoreload 2
# custom 
from htw_nn_framework.networks import NeuralNetwork
from htw_nn_framework.layer import *
from htw_nn_framework.activation_func import *
from htw_nn_framework.loss_func import *
from htw_nn_framework.optimizer import *
from htw_nn_framework.cifar import *

# third party
import numpy as np
from deep_teaching_commons.data.fundamentals.mnist import Mnist

## Data

In [2]:
# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data')

# load all data; labels are integer class indices (not one-hot), images keep their 2D shape and pixel values are scaled to [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=False, normalized=True)
print(train_images.shape, train_labels.shape)

# reshape to match the general framework architecture (samples, channels, height, width)
train_images, test_images = train_images.reshape(60000, 1, 28, 28), test_images.reshape(10000, 1, 28, 28)
print(train_images.shape, train_labels.shape)

# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]

auto download is active, attempting download
mnist data directory already exists, download aborted
(60000, 28, 28) (60000,)
(60000, 1, 28, 28) (60000,)


# MNIST Fully Connected Network Example
This model and its optimization are taken from `framework_exercise.ipynb` as an example of a typical pipeline using the framework files.

In [3]:
train_images_batch, train_labels_batch, test_images_batch, test_labels_batch = train_images[:100,:,:,:], train_labels[:100], test_images[:100,:,:,:], test_labels[:100]

In [4]:
# design a three hidden layer architecture with Dense-Layer
# and ReLU as activation function
def fcn_mnist():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, output]

# create a neural network on specified architecture with softmax as score function
fcn = NeuralNetwork(fcn_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
fcn = Optimizer.sgd(fcn, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 2.693177362393938 :: Training = 0.3 :: Test = 0.24
Epoch 2
Loss = 1.9220157832908111 :: Training = 0.64 :: Test = 0.44
Epoch 3
Loss = 1.989212269543896 :: Training = 0.4 :: Test = 0.32
Epoch 4
Loss = 1.7863229941199785 :: Training = 0.44 :: Test = 0.28
Epoch 5
Loss = 2.6829131411569977 :: Training = 0.5 :: Test = 0.23
Epoch 6
Loss = 1.7813905743512126 :: Training = 0.71 :: Test = 0.6
Epoch 7
Loss = 1.6224524127319702 :: Training = 0.75 :: Test = 0.65
Epoch 8
Loss = 0.7361944780450254 :: Training = 0.91 :: Test = 0.7
Epoch 9
Loss = 0.3773846787111502 :: Training = 0.95 :: Test = 0.68
Epoch 10
Loss = 0.7268348383292372 :: Training = 0.88 :: Test = 0.51


# Todo: Your Extensions and Experiments

In [5]:
train_images_batch, train_labels_batch, test_images_batch, test_labels_batch = train_images[:100,:,:,:], train_labels[:100], test_images[:100,:,:,:], test_labels[:100]

In [6]:
# design a one hidden layer architecture with Conv-Layer
# and ReLU as activation function
def cnn_mnist():
    conv = Conv(train_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    flat = Flatten()
    output = FullyConnected(784, 10)
    return [conv, relu_01, flat, output]

# create a neural network on specified architecture with softmax as score function
cnn = NeuralNetwork(cnn_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
cnn = Optimizer.sgd(cnn, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 3.1421774728373197 :: Training = 0.13 :: Test = 0.14
Epoch 2
Loss = 2.2978636745267353 :: Training = 0.13 :: Test = 0.14
Epoch 3
Loss = 2.2946815750663947 :: Training = 0.13 :: Test = 0.14
Epoch 4
Loss = 2.2920567596444745 :: Training = 0.13 :: Test = 0.14
Epoch 5
Loss = 2.289887245866618 :: Training = 0.13 :: Test = 0.14
Epoch 6
Loss = 2.2880899656750646 :: Training = 0.13 :: Test = 0.14
Epoch 7
Loss = 2.2865972625589217 :: Training = 0.13 :: Test = 0.14
Epoch 8
Loss = 2.285354031196735 :: Training = 0.13 :: Test = 0.14
Epoch 9
Loss = 2.2843153828337623 :: Training = 0.13 :: Test = 0.14
Epoch 10
Loss = 2.2834447412802854 :: Training = 0.13 :: Test = 0.14
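
The losses above stay between about 2.28 and 2.30 while both accuracies remain fixed. For reference, a classifier that assigns equal probability to all ten classes has a cross-entropy of ln(10). The short check below computes that value. A loss that stays near it suggests that the network output is close to uniform. This is only a diagnostic hint, not a confirmed cause for the behaviour of this particular model.

```python
import numpy as np

num_classes = 10

# softmax output of a model that has learned nothing: equal probability per class
uniform_probs = np.full(num_classes, 1.0 / num_classes)

# cross-entropy for any true label under that uniform prediction
baseline_loss = -np.log(uniform_probs[0])
print(baseline_loss)  # ln(10), about 2.3026
```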


In [7]:
# design a two hidden layer architecture with Conv-Layer and Pooling-Layer
# and ReLU as activation function
def cnn_pool_mnist():
    conv = Conv(train_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    pool = Pool(train_images_batch.shape, np.max, 2, 1)
    flat = Flatten()
    output = FullyConnected(729, 10)
    return [conv, relu_01, pool, flat, output]

# create a neural network on specified architecture with softmax as score function
cnn_pool = NeuralNetwork(cnn_pool_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
cnn_pool = Optimizer.sgd(cnn_pool, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 2.6699041081506745 :: Training = 0.67 :: Test = 0.34
Epoch 2
Loss = 1.0209147848148612 :: Training = 0.92 :: Test = 0.55
Epoch 3
Loss = 0.6732797592248874 :: Training = 0.93 :: Test = 0.57
Epoch 4
Loss = 0.5267299566689198 :: Training = 1.0 :: Test = 0.62
Epoch 5
Loss = 0.1994946962175744 :: Training = 1.0 :: Test = 0.64
Epoch 6
Loss = 0.1308696549109375 :: Training = 1.0 :: Test = 0.65
Epoch 7
Loss = 0.09233207062411494 :: Training = 1.0 :: Test = 0.65
Epoch 8
Loss = 0.07203453986831009 :: Training = 1.0 :: Test = 0.65
Epoch 9
Loss = 0.05910917229565508 :: Training = 1.0 :: Test = 0.65
Epoch 10
Loss = 0.05059595506940542 :: Training = 1.0 :: Test = 0.64
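
The input sizes 784 and 729 of the final `FullyConnected` layers in the two CNN models follow from the usual output-size formula for convolution and pooling windows. The sketch below reproduces them under some assumptions about the layer arguments, which the notebook does not document: `Conv(..., 1, 3, 1, True)` is read as one filter, a 3x3 kernel, stride 1, and zero padding of 1 pixel that preserves the spatial size, and `Pool(..., np.max, 2, 1)` is read as a 2x2 window with stride 1. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * padding - kernel) // stride + 1

# MNIST: 28x28 input, assumed 3x3 kernel, stride 1, padding 1
conv_mnist = conv_output_size(28, kernel=3, stride=1, padding=1)

# assumed 2x2 max pooling with stride 1 on the conv output
pool_mnist = conv_output_size(conv_mnist, kernel=2, stride=1)

# with a single filter, the flattened size is height * width
print(conv_mnist ** 2, pool_mnist ** 2)  # 784 729
```

Under the same assumptions, a 32x32 CIFAR image gives 32 * 32 = 1024 features after the convolution and 31 * 31 = 961 after pooling.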


In [8]:
# design an architecture with two hidden Dense layers, a Dropout layer in between,
# and ReLU as activation function
def fcn_dropout_mnist():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    dropout = Dropout()
    hidden_02 = FullyConnected(500, 100)
    relu_02 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, dropout, hidden_02, relu_02, output]

# create a neural network on specified architecture with softmax as score function
fcn = NeuralNetwork(fcn_dropout_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
fcn = Optimizer.sgd(fcn, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 2.3059331810437658 :: Training = 0.39 :: Test = 0.25
Epoch 2
Loss = 2.0777573044698574 :: Training = 0.58 :: Test = 0.38
Epoch 3
Loss = 1.8621823155728445 :: Training = 0.62 :: Test = 0.47
Epoch 4
Loss = 1.748727881925048 :: Training = 0.72 :: Test = 0.55
Epoch 5
Loss = 1.4392470857493598 :: Training = 0.75 :: Test = 0.54
Epoch 6
Loss = 1.3172429870906357 :: Training = 0.8 :: Test = 0.56
Epoch 7
Loss = 1.2069414659291644 :: Training = 0.86 :: Test = 0.67
Epoch 8
Loss = 0.9768855426047318 :: Training = 0.9 :: Test = 0.66
Epoch 9
Loss = 0.7892253872356049 :: Training = 0.88 :: Test = 0.57
Epoch 10
Loss = 0.9160760861920048 :: Training = 0.94 :: Test = 0.62
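
The model above uses a `Dropout()` layer, and the questions at the top ask whether the mask should be shared across images and what to do at prediction time. A common approach is inverted dropout: draw an independent mask for every sample and unit during training, rescale the surviving activations by `1 / keep_prob`, and pass the input through unchanged when predicting. The sketch below illustrates this in plain NumPy. The function names and the `training` flag are assumptions for illustration and may differ from the framework's `Dropout` implementation.

```python
import numpy as np

def dropout_forward(x, keep_prob=0.5, training=True, rng=None):
    """Inverted dropout forward pass (hypothetical helper).

    Returns the output and the mask needed for the backward pass.
    """
    if not training:
        return x, None  # prediction: identity, no rescaling needed
    rng = np.random.default_rng() if rng is None else rng
    # independent mask per sample and per unit, scaled so the expected
    # activation stays the same as without dropout
    mask = (rng.random(x.shape) < keep_prob) / keep_prob
    return x * mask, mask

def dropout_backward(d_out, mask):
    """Gradient flows only through the units that were kept."""
    return d_out * mask

x = np.ones((4, 6))
out_train, mask = dropout_forward(x, keep_prob=0.5, rng=np.random.default_rng(0))
out_test, _ = dropout_forward(x, training=False)

print(out_train)  # entries are either 0.0 (dropped) or 2.0 (kept and rescaled)
print(out_test)   # identical to x
```

Because the rescaling happens during training, no correction is needed at prediction time, which keeps the prediction path simple.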


In [11]:
c10 = Cifar()
c10.load()
tr_images, tr_labels, te_images, te_labels = c10.get()

Data already downloaded
htw_nn_framework/cifar-10-batches-py/data_batch_1
htw_nn_framework/cifar-10-batches-py/data_batch_2
htw_nn_framework/cifar-10-batches-py/data_batch_3
htw_nn_framework/cifar-10-batches-py/data_batch_4
htw_nn_framework/cifar-10-batches-py/data_batch_5


In [32]:
# shuffle training data
shuffle_index = np.random.permutation(len(tr_images))
tr_images, tr_labels = tr_images[shuffle_index], tr_labels[shuffle_index]

tr_images_batch, tr_labels_batch, te_images_batch, te_labels_batch = tr_images[:100,:,:,:], tr_labels[:100], te_images[:100,:,:,:], te_labels[:100]

In [33]:
# design a one hidden layer architecture with Conv-Layer
# and ReLU as activation function
def cnn_mnist_cifar():
    conv = Conv(tr_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    flat = Flatten()
    output = FullyConnected(1024, 10)
    return [conv, relu_01, flat, output]

In [41]:
cnn_c10 = NeuralNetwork(cnn_mnist_cifar(), score_func=LossCriteria.softmax)
cnn_c10 = Optimizer.sgd(cnn_c10, tr_images_batch, tr_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=32, epoch=20, learning_rate=0.01, X_test=te_images_batch, y_test=te_labels_batch, verbose=True)

Epoch 1
Loss = 2.295610368710087 :: Training = 0.22 :: Test = 0.14
Epoch 2
Loss = 2.2327829770554253 :: Training = 0.22 :: Test = 0.15
Epoch 3
Loss = 2.21578763006391 :: Training = 0.27 :: Test = 0.13
Epoch 4
Loss = 2.1976602019938967 :: Training = 0.26 :: Test = 0.12
Epoch 5
Loss = 2.1802088449040706 :: Training = 0.26 :: Test = 0.13
Epoch 6
Loss = 2.164802345515623 :: Training = 0.27 :: Test = 0.13
Epoch 7
Loss = 2.1511370707347313 :: Training = 0.26 :: Test = 0.13
Epoch 8
Loss = 2.1348636239292014 :: Training = 0.26 :: Test = 0.13
Epoch 9
Loss = 2.118315752218825 :: Training = 0.26 :: Test = 0.13
Epoch 10
Loss = 2.1031619910138946 :: Training = 0.28 :: Test = 0.13
Epoch 11
Loss = 2.084932197971387 :: Training = 0.27 :: Test = 0.13
Epoch 12
Loss = 2.0773118361104292 :: Training = 0.28 :: Test = 0.12
Epoch 13
Loss = 2.057887852590206 :: Training = 0.28 :: Test = 0.13
Epoch 14
Loss = 2.0476036192213787 :: Training = 0.3 :: Test = 0.13
Epoch 15
Loss = 2.026343496353146 :: Training = 0.3

In [45]:
def cnn_pool_cifar():
    conv = Conv(tr_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    pool = Pool(tr_images_batch.shape, np.max, 2, 1)
    flat = Flatten()
    output = FullyConnected(961, 10)
    return [conv, relu_01, pool, flat, output]

# create a neural network on specified architecture with softmax as score function
cnn_pool_c10 = NeuralNetwork(cnn_pool_cifar(), score_func=LossCriteria.softmax)
cnn_pool_c10 = Optimizer.sgd(cnn_pool_c10, tr_images_batch, tr_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=20, learning_rate=0.01, X_test=te_images_batch, y_test=te_labels_batch, verbose=True)

Epoch 1
Loss = 3.16766413488691 :: Training = 0.17 :: Test = 0.13
Epoch 2
Loss = 2.2978438285669425 :: Training = 0.18 :: Test = 0.13
Epoch 3
Loss = 2.2685694015025866 :: Training = 0.18 :: Test = 0.13
Epoch 4
Loss = 2.2380109522582217 :: Training = 0.19 :: Test = 0.13
Epoch 5
Loss = 2.1989712753540775 :: Training = 0.21 :: Test = 0.13
Epoch 6
Loss = 2.1563434748251034 :: Training = 0.28 :: Test = 0.13
Epoch 7
Loss = 2.107127331130487 :: Training = 0.32 :: Test = 0.12
Epoch 8
Loss = 2.037163928798608 :: Training = 0.33 :: Test = 0.15
Epoch 9
Loss = 1.979981432961945 :: Training = 0.38 :: Test = 0.17
Epoch 10
Loss = 1.8776078380537378 :: Training = 0.4 :: Test = 0.16
Epoch 11
Loss = 1.8036824252812436 :: Training = 0.47 :: Test = 0.18
Epoch 12
Loss = 1.6519583298269562 :: Training = 0.51 :: Test = 0.2
Epoch 13
Loss = 1.6146995291333217 :: Training = 0.63 :: Test = 0.2
Epoch 14
Loss = 1.3247238890346096 :: Training = 0.64 :: Test = 0.2
Epoch 15
Loss = 1.3951303027017947 :: Training = 0.7