# ML-Fundamentals - Neural Networks - Exercise: Neural Network Framework

# Tasks
Your main goal is to extend the existing framework, perform experiments with different model combinations, and document your observations. Here is a list of required tasks and some ideas for additional points:
  * (6) Write a data loader for a different image dataset, e.g., CIFAR or Labelled Faces in the Wild. Feel free to pick any dataset you would like to classify. Create and train a simple fully connected network on that dataset in this notebook.
  * (10) Implement the `Conv` and `Pool` layers in `layer.py`. Create and train a convolutional neural network on MNIST and on your chosen dataset in this notebook.

Bonus points
  * (5) 1 to 5 points are given for improving the class and method comments in the framework files. Points are given based on the quality and quantity of the comments.
  * (1) For each additional activation function implemented in `activation_func.py` you get 1 bonus point (max. 4 points). Test your implementation in this notebook and observe its effects on your networks. Keep an eye on your layer initialization.
  * (2) Implement `Dropout` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Dropout as a layer.
  * (5) Implement `Batchnorm` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Batchnorm as a layer.
  * (4) Implement another optimization algorithm in `optimizer.py`. Train one of your models with that new optimizer.
  * (5) Do something extra, up to 5 points.  
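As an illustration of the activation-function bonus task, here is a minimal sketch of a Leaky ReLU with a forward and a backward pass, checked against a finite-difference gradient. The class `LeakyReLU`, its `forward`/`backward` methods and the helper `numerical_grad` are hypothetical and standalone; they are not taken from `activation_func.py`, whose interface is not shown in this notebook.

```python
import numpy as np

class LeakyReLU:
    """Hypothetical standalone layer: f(x) = x for x > 0 and alpha * x otherwise."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.X = None

    def forward(self, X):
        self.X = X  # keep the input for the backward pass
        return np.where(X > 0, X, self.alpha * X)

    def backward(self, dout):
        # local derivative: 1 for positive inputs, alpha otherwise
        return dout * np.where(self.X > 0, 1.0, self.alpha)

def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of sum(f(X)) with respect to X."""
    grad = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        old = X[idx]
        X[idx] = old + eps
        plus = f(X).sum()
        X[idx] = old - eps
        minus = f(X).sum()
        X[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# toy check on inputs kept away from the kink at 0
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(3, 4)) * rng.choice([-1.0, 1.0], size=(3, 4))
layer = LeakyReLU(alpha=0.1)
layer.forward(X)
analytic = layer.backward(np.ones_like(X))
numeric = numerical_grad(lambda Z: LeakyReLU(alpha=0.1).forward(Z), X.copy())
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

The same gradient check works as a toy test for any new layer. Regarding initialization: for ReLU-like activations, He initialization (weights drawn with standard deviation `sqrt(2 / fan_in)`) is a common choice.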
  
Please document thoroughly and explain what you do in your experiments so that the work in the notebook is comprehensible; otherwise no points are awarded.

Questions
* Is it enough to implement only max pooling (forward and backward) for the Pool layer? If not, how should we handle the backward implementation?
* See `layer.py`:
    * In Pooling.backward: what happens if more than one pixel has the max value?
    * In the Conv layer: should the bias be added to every pixel or only after the sum?
    * In Dropout: should the mask be the same for all images or different for each one? And how should we handle it when making predictions?
    * Is an activation function needed after a Dropout or Pool layer?
    * The pooling layer always returns the same loss. What could be causing this?
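One common way to handle the max-pooling backward pass: during the forward pass, remember which element of each window was the maximum, and during the backward pass send the upstream gradient only to that element. If several elements tie for the maximum, `argmax` picks the first one, which is a valid subgradient; splitting the gradient equally among the tied elements would be another valid choice. The sketch below assumes inputs of shape `(N, C, H, W)`; `maxpool_forward` and `maxpool_backward` are hypothetical helpers, not the framework's `Pool` API. When a loss stays constant, comparing a layer's backward pass against a finite-difference gradient is one way to check whether gradients reach the earlier layers at all.

```python
import numpy as np

def maxpool_forward(X, size=2, stride=1):
    """Max pooling over (N, C, H, W); also returns the argmax position of every window."""
    N, C, H, W = X.shape
    H_out = (H - size) // stride + 1
    W_out = (W - size) // stride + 1
    out = np.empty((N, C, H_out, W_out), dtype=X.dtype)
    argmax = np.empty((N, C, H_out, W_out), dtype=np.int64)  # flat index inside the window
    for i in range(H_out):
        for j in range(W_out):
            window = X[:, :, i*stride:i*stride+size, j*stride:j*stride+size].reshape(N, C, -1)
            argmax[:, :, i, j] = window.argmax(axis=2)  # on ties, the first maximum wins
            out[:, :, i, j] = window.max(axis=2)
    return out, argmax

def maxpool_backward(dout, argmax, X_shape, size=2, stride=1):
    """Route each upstream gradient to the input element that was the window's maximum."""
    N, C = X_shape[:2]
    dX = np.zeros(X_shape, dtype=float)
    n_idx, c_idx = np.meshgrid(np.arange(N), np.arange(C), indexing='ij')
    H_out, W_out = dout.shape[2:]
    for i in range(H_out):
        for j in range(W_out):
            rows = i*stride + argmax[:, :, i, j] // size
            cols = j*stride + argmax[:, :, i, j] % size
            # accumulate, because overlapping windows (stride < size) can select the same pixel
            np.add.at(dX, (n_idx, c_idx, rows, cols), dout[:, :, i, j])
    return dX

# toy example with a tie: both 3s are maxima, only the first one receives the gradient
X = np.array([[[[1.0, 3.0], [3.0, 2.0]]]])
out, argmax = maxpool_forward(X, size=2, stride=1)
print(out.ravel())  # [3.]
print(maxpool_backward(np.ones_like(out), argmax, X.shape, 2, 1)[0, 0])  # 1.0 only at position (0, 1)
```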

# Requirements

## Python-Modules

In [1]:
%load_ext autoreload
%autoreload 2
# custom 
from htw_nn_framework.networks import NeuralNetwork
from htw_nn_framework.layer import *
from htw_nn_framework.activation_func import *
from htw_nn_framework.loss_func import *
from htw_nn_framework.optimizer import *
from htw_nn_framework.cifar import *
from htw_nn_framework.initializer import *

# third party
from deep_teaching_commons.data.fundamentals.mnist import Mnist
import numpy as np

## Data

In [2]:
# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data')

# load all data: labels are integer class indices (not one-hot), images are kept as 28x28 arrays (not flattened), pixel values are scaled to [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=False, normalized=True)
print(train_images.shape, train_labels.shape)

# reshape to (N, C, H, W) to match the general framework architecture
train_images, test_images = train_images.reshape(60000, 1, 28, 28), test_images.reshape(10000, 1, 28, 28)
print(train_images.shape, train_labels.shape)

# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]

auto download is active, attempting download
mnist data directory already exists, download aborted
(60000, 28, 28) (60000,)
(60000, 1, 28, 28) (60000,)


# MNIST Fully Connected Network Example
This model and its optimization are taken from `framework_exercise.ipynb` as an example of a typical pipeline using the framework files.

In [3]:
# use only the first 100 training and test samples for quick experiments
train_images_batch, train_labels_batch, test_images_batch, test_labels_batch = train_images[:100,:,:,:], train_labels[:100], test_images[:100,:,:,:], test_labels[:100]

In [4]:
# Design a three-hidden-layer architecture with dense layers
# and ReLU as activation function
def fcn_mnist():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, output]

# create a neural network with the specified architecture and softmax as score function
fcn = NeuralNetwork(fcn_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
fcn = Optimizer.sgd(fcn, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 2.3364738791479858 :: Training = 0.13 :: Test = 0.16
Epoch 2
Loss = 2.0272483809532784 :: Training = 0.37 :: Test = 0.16
Epoch 3
Loss = 2.193337858230821 :: Training = 0.51 :: Test = 0.51
Epoch 4
Loss = 1.3557988884971797 :: Training = 0.79 :: Test = 0.65
Epoch 5
Loss = 0.8306299892125698 :: Training = 0.77 :: Test = 0.54
Epoch 6
Loss = 2.1740561027037675 :: Training = 0.46 :: Test = 0.41
Epoch 7
Loss = 2.013534465468906 :: Training = 0.56 :: Test = 0.39
Epoch 8
Loss = 0.9540863399348548 :: Training = 0.85 :: Test = 0.66
Epoch 9
Loss = 0.5767976726141201 :: Training = 0.81 :: Test = 0.65
Epoch 10
Loss = 0.5009758424512314 :: Training = 0.95 :: Test = 0.69


# Todo: Your Extensions and Experiments
# MNIST
## Convolutional Neural Network

In [5]:
# Design an architecture with a conv layer followed by one dense hidden layer,
# using ReLU as activation function
def cnn_mnist():
    hidden_01 = Conv(train_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    flat = Flatten()
    hidden_02 = FullyConnected(784, 1500)
    relu_02 = ReLU()
    output = FullyConnected(1500, 10)
    return [hidden_01, relu_01, flat, hidden_02, relu_02, output]

# create a neural network with the specified architecture and softmax as score function
cnn = NeuralNetwork(cnn_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
cnn = Optimizer.sgd(cnn, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 2.304337018828493 :: Training = 0.21 :: Test = 0.08
Epoch 2
Loss = 2.2638550200719196 :: Training = 0.2 :: Test = 0.09
Epoch 3
Loss = 2.14215034329222 :: Training = 0.46 :: Test = 0.17
Epoch 4
Loss = 2.2382018859818853 :: Training = 0.15 :: Test = 0.07
Epoch 5
Loss = 2.2934995505407887 :: Training = 0.15 :: Test = 0.07
Epoch 6
Loss = 2.2946863213449706 :: Training = 0.15 :: Test = 0.07
Epoch 7
Loss = 2.295984899810657 :: Training = 0.15 :: Test = 0.07
Epoch 8
Loss = 2.2973857901683608 :: Training = 0.15 :: Test = 0.07
Epoch 9
Loss = 2.2987866039692118 :: Training = 0.15 :: Test = 0.07
Epoch 10
Loss = 2.300079054682855 :: Training = 0.15 :: Test = 0.07
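Regarding the bias question for the `Conv` layer: in the usual convention, each filter has a single bias that is added once to every output value, after the weighted sum over the filter window (and, for multi-channel inputs, after summing over all input channels). The naive single-channel sketch below shows this; `conv2d_single` is a hypothetical helper, not the framework's `Conv` implementation.

```python
import numpy as np

def conv2d_single(X, K, b, stride=1, pad=1):
    """Naive cross-correlation of one (H, W) channel with one (F, F) filter K plus scalar bias b."""
    Xp = np.pad(X, pad, mode='constant')  # zero padding on all four sides
    F = K.shape[0]
    H_out = (Xp.shape[0] - F) // stride + 1
    W_out = (Xp.shape[1] - F) // stride + 1
    out = np.empty((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = Xp[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * K) + b  # bias added once, after the weighted sum
    return out

X = np.ones((4, 4))
K = np.ones((3, 3))
out = conv2d_single(X, K, b=0.5)
print(out.shape)  # (4, 4)
print(out[1, 1])  # 9.5: nine products of 1 * 1, plus the bias once
print(out[0, 0])  # 4.5: only four products fall inside the image because of the zero padding
```

With this convention the gradient of the bias is the sum of the upstream gradient over all output positions of that filter. Adding the bias to every pixel of the window before the weighted sum would instead contribute `b * sum(K)` to each output.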


## Convolutional Neural Network with Pooling

In [6]:
# Design a one-hidden-layer architecture with a conv layer, a max-pooling layer
# and ReLU as activation function
def cnn_pool_mnist():
    hidden_01 = Conv(train_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    pool = Pool(train_images_batch.shape, np.max, 2, 1)
    flat = Flatten()
    output = FullyConnected(729, 10)
    return [hidden_01, relu_01, pool, flat, output]

# create a neural network with the specified architecture and softmax as score function
cnn_pool = NeuralNetwork(cnn_pool_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
cnn_pool = Optimizer.sgd(cnn_pool, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=20, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 9.838239235428928 :: Training = 0.11 :: Test = 0.11
Epoch 2
Loss = 2.2907622739947784 :: Training = 0.15 :: Test = 0.07
Epoch 3
Loss = 2.290511825915284 :: Training = 0.15 :: Test = 0.07
Epoch 4
Loss = 2.2906626881965453 :: Training = 0.15 :: Test = 0.07
Epoch 5
Loss = 2.2911097407428156 :: Training = 0.15 :: Test = 0.07
Epoch 6
Loss = 2.2917685153074334 :: Training = 0.15 :: Test = 0.07
Epoch 7
Loss = 2.2925719955461403 :: Training = 0.15 :: Test = 0.07
Epoch 8
Loss = 2.2934677179302785 :: Training = 0.15 :: Test = 0.07
Epoch 9
Loss = 2.294415197992204 :: Training = 0.15 :: Test = 0.07
Epoch 10
Loss = 2.295383690377126 :: Training = 0.15 :: Test = 0.07
Epoch 11
Loss = 2.296350275024445 :: Training = 0.15 :: Test = 0.07
Epoch 12
Loss = 2.297298248956498 :: Training = 0.15 :: Test = 0.07
Epoch 13
Loss = 2.298215794699637 :: Training = 0.15 :: Test = 0.07
Epoch 14
Loss = 2.2990948920643 :: Training = 0.15 :: Test = 0.07
Epoch 15
Loss = 2.299930439032074 :: Training = 0.15 
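The input sizes of the `FullyConnected` layers after flattening (784, 729, 1024 and 961 in this notebook) follow from the standard output-size formula for convolution and pooling windows, $O = \lfloor (I - F + 2P)/S \rfloor + 1$. The sketch below reproduces them under the assumption that `Conv(..., 1, 3, 1, True)` means one 3x3 filter with stride 1 and zero padding of 1, and that `Pool(..., np.max, 2, 1)` means a 2x2 window with stride 1; these argument meanings are inferred from the layer sizes, not taken from `layer.py`. `out_size` is a hypothetical helper.

```python
def out_size(in_size, window, stride=1, pad=0):
    """Spatial output size of a conv or pooling window: floor((I - F + 2P) / S) + 1."""
    return (in_size - window + 2 * pad) // stride + 1

n_filters = 1  # assumed: a single conv filter, so flattened features = 1 * H * W

# MNIST: 28x28 input
mnist_conv = out_size(28, 3, stride=1, pad=1)   # 28
mnist_pool = out_size(mnist_conv, 2, stride=1)  # 27
print(n_filters * mnist_conv ** 2, n_filters * mnist_pool ** 2)  # 784 729

# CIFAR-10: 32x32 input
cifar_conv = out_size(32, 3, stride=1, pad=1)   # 32
cifar_pool = out_size(cifar_conv, 2, stride=1)  # 31
print(n_filters * cifar_conv ** 2, n_filters * cifar_pool ** 2)  # 1024 961
```

With stride 1 the 2x2 pooling barely shrinks the feature map; a stride of 2 (`out_size(28, 2, stride=2)` gives 14) is the more common setting when pooling is meant to downsample.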

## Fully Connected Neural Network with Dropout

In [7]:
# Design a two-hidden-layer architecture with dense layers, a dropout layer
# and ReLU as activation function
def fcn_dropout_mnist():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    dropout = Dropout()
    hidden_02 = FullyConnected(500, 100)
    relu_02 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, dropout, hidden_02, relu_02, output]

# create a neural network with the specified architecture and softmax as score function
fcn_dropout = NeuralNetwork(fcn_dropout_mnist(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
fcn_dropout = Optimizer.sgd(fcn_dropout, train_images_batch, train_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=20, learning_rate=0.01, X_test=test_images_batch, y_test=test_labels_batch, verbose=True)

Epoch 1
Loss = 3.2722752812145584 :: Training = 0.37 :: Test = 0.27
Epoch 2
Loss = 2.072553360792204 :: Training = 0.54 :: Test = 0.33
Epoch 3
Loss = 2.1560332106372146 :: Training = 0.53 :: Test = 0.38
Epoch 4
Loss = 3.7683582115401086 :: Training = 0.22 :: Test = 0.11
Epoch 5
Loss = 3.9531304728098147 :: Training = 0.3 :: Test = 0.25
Epoch 6
Loss = 1.9102260490358134 :: Training = 0.52 :: Test = 0.4
Epoch 7
Loss = 1.9088210835779809 :: Training = 0.58 :: Test = 0.34
Epoch 8
Loss = 1.561222744940665 :: Training = 0.72 :: Test = 0.62
Epoch 9
Loss = 1.0319520500645334 :: Training = 0.9 :: Test = 0.72
Epoch 10
Loss = 0.5116928893799273 :: Training = 0.95 :: Test = 0.74
Epoch 11
Loss = 0.6970934951951289 :: Training = 0.96 :: Test = 0.74
Epoch 12
Loss = 0.6559776133949877 :: Training = 0.88 :: Test = 0.66
Epoch 13
Loss = 0.5098367199151349 :: Training = 0.95 :: Test = 0.72
Epoch 14
Loss = 0.2584170402654539 :: Training = 1.0 :: Test = 0.75
Epoch 15
Loss = 0.45337824572705565 :: Training =
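Regarding the Dropout questions: a common approach is "inverted" dropout, where a fresh random mask is drawn for every sample and every unit during training and the kept activations are scaled by `1 / p_keep`, so that at prediction time the layer can simply pass its input through unchanged. The sketch below assumes a `train` flag to switch between the two modes; the class `InvertedDropout` and its arguments are hypothetical and may differ from the framework's `Dropout`. Since dropout only masks and rescales values, it does not need an activation function of its own.

```python
import numpy as np

class InvertedDropout:
    """Hypothetical dropout sketch: random mask plus 1 / p_keep scaling during training only."""

    def __init__(self, p_drop=0.5, seed=None):
        self.p_keep = 1.0 - p_drop
        self.rng = np.random.default_rng(seed)
        self.mask = None

    def forward(self, X, train=True):
        if not train:
            return X  # prediction: no mask and no rescaling
        # a new mask per call, independent for every sample and every unit
        self.mask = (self.rng.random(X.shape) < self.p_keep) / self.p_keep
        return X * self.mask

    def backward(self, dout):
        # only the units kept in the forward pass receive gradient, with the same scaling
        return dout * self.mask

layer = InvertedDropout(p_drop=0.5, seed=0)
X = np.ones((1000, 50))
out = layer.forward(X, train=True)
print(np.unique(out))  # [0. 2.]: kept units are scaled by 1 / 0.5
print(out.mean())      # close to 1.0, the mean of X
```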

# CIFAR10

In [8]:
Cifar.load()

Data already downloaded


In [9]:
tr_images, tr_labels, te_images, te_labels = Cifar.get()

/storage/neural-networks/Assignment_03/htw_nn_framework/cifar-10-batches-py/data_batch_1
/storage/neural-networks/Assignment_03/htw_nn_framework/cifar-10-batches-py/data_batch_2
/storage/neural-networks/Assignment_03/htw_nn_framework/cifar-10-batches-py/data_batch_3
/storage/neural-networks/Assignment_03/htw_nn_framework/cifar-10-batches-py/data_batch_4
/storage/neural-networks/Assignment_03/htw_nn_framework/cifar-10-batches-py/data_batch_5
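The `Cifar` class from `htw_nn_framework.cifar` is not shown in this notebook. As an illustration of the data-loader task, here is a minimal sketch for reading the Python version of CIFAR-10, assuming the standard `cifar-10-batches-py` layout: pickled dictionaries with a `b'data'` array of shape `(10000, 3072)` (1024 red, then 1024 green, then 1024 blue values per image) and a `b'labels'` list. The function names and the scaling to `[0, 1]` are assumptions for illustration, not the framework's behavior.

```python
import os
import pickle
import numpy as np

def unpack_cifar_batch(batch):
    """Turn one unpickled CIFAR-10 batch dict into (images, labels)."""
    data = np.asarray(batch[b'data'], dtype=np.uint8)
    # each row holds the R, G and B planes of a 32x32 image, in that order
    images = data.reshape(-1, 3, 32, 32).astype(np.float32) / 255.0  # (N, C, H, W) in [0, 1]
    labels = np.asarray(batch[b'labels'], dtype=np.int64)
    return images, labels

def load_cifar_batch(path):
    """Read a single batch file such as data_batch_1 or test_batch."""
    with open(path, 'rb') as f:
        batch = pickle.load(f, encoding='bytes')  # the dict keys come back as bytes
    return unpack_cifar_batch(batch)

def load_cifar10(directory):
    """Load the five training batches and the test batch from cifar-10-batches-py."""
    parts = [load_cifar_batch(os.path.join(directory, f'data_batch_{i}')) for i in range(1, 6)]
    tr_images = np.concatenate([images for images, _ in parts])
    tr_labels = np.concatenate([labels for _, labels in parts])
    te_images, te_labels = load_cifar_batch(os.path.join(directory, 'test_batch'))
    return tr_images, tr_labels, te_images, te_labels
```

The `(N, C, H, W)` layout matches the 4-D indexing used on `tr_images` in the next cell.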


In [10]:
# shuffle training data
shuffle_index = np.random.permutation(len(tr_images))
tr_images, tr_labels = tr_images[shuffle_index], tr_labels[shuffle_index]

tr_images_batch, tr_labels_batch, te_images_batch, te_labels_batch = tr_images[:100,:,:,:], tr_labels[:100], te_images[:100,:,:,:], te_labels[:100]

## Fully Connected Network

In [11]:
# Design a three-hidden-layer architecture with dense layers
# and ReLU as activation function
def fcn_cifar():
    flat = Flatten()
    hidden_01 = FullyConnected(3072, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, output]

# create a neural network with the specified architecture and softmax as score function
fcn_c10 = NeuralNetwork(fcn_cifar(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
fcn_c10 = Optimizer.sgd(fcn_c10, tr_images_batch, tr_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=32, epoch=20, learning_rate=0.01, X_test=te_images_batch, y_test=te_labels_batch, verbose=True)

Epoch 1
Loss = inf :: Training = 0.05 :: Test = 0.11
Epoch 2
Loss = inf :: Training = 0.16 :: Test = 0.11
Epoch 3


  log_likelihood = -np.log(p[range(m), y])


Loss = 2.340927595804878 :: Training = 0.12 :: Test = 0.1
Epoch 4
Loss = 2.350726080536907 :: Training = 0.12 :: Test = 0.1
Epoch 5
Loss = 2.360355980771642 :: Training = 0.12 :: Test = 0.1
Epoch 6
Loss = 2.3696593427873043 :: Training = 0.12 :: Test = 0.1
Epoch 7
Loss = 2.3785246537519305 :: Training = 0.12 :: Test = 0.1
Epoch 8
Loss = 2.3868788654599373 :: Training = 0.16 :: Test = 0.11
Epoch 9
Loss = 2.394679957935995 :: Training = 0.16 :: Test = 0.11
Epoch 10
Loss = 2.4019102310508353 :: Training = 0.16 :: Test = 0.11
Epoch 11
Loss = 2.4085704442111684 :: Training = 0.16 :: Test = 0.11
Epoch 12
Loss = 2.414674850345958 :: Training = 0.16 :: Test = 0.11
Epoch 13
Loss = 2.420247106955406 :: Training = 0.16 :: Test = 0.11
Epoch 14
Loss = 2.4253170008146796 :: Training = 0.16 :: Test = 0.11
Epoch 15
Loss = 2.429917894796839 :: Training = 0.16 :: Test = 0.11
Epoch 16
Loss = 2.4340847926112423 :: Training = 0.16 :: Test = 0.11
Epoch 17
Loss = 2.437852916152391 :: Training = 0.16 :: Test 
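The `inf` losses and the warning pointing at `log_likelihood = -np.log(p[range(m), y])` appear when the softmax probability of the true class underflows to exactly 0, so that the logarithm diverges. Computing the log-probabilities directly from the scores with the log-sum-exp trick avoids this. The sketch below compares both variants; the function names are hypothetical, and the framework's `LossCriteria` implementation is not shown here.

```python
import numpy as np

def naive_cross_entropy(scores, y):
    """Softmax followed by -log(p): diverges once p underflows to 0."""
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    p = exp / exp.sum(axis=1, keepdims=True)
    m = scores.shape[0]
    with np.errstate(divide='ignore'):  # silence the divide-by-zero warning for the demo
        return -np.log(p[range(m), y]).mean()

def stable_cross_entropy(scores, y):
    """Log-softmax via log-sum-exp, so the true-class term never becomes log(0)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # the largest score becomes 0
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    m = scores.shape[0]
    return -log_p[np.arange(m), y].mean()

scores = np.array([[1000.0, 0.0, 0.0]])  # very confident in class 0
y = np.array([1])                        # but the true class is 1
print(naive_cross_entropy(scores, y))    # inf
print(stable_cross_entropy(scores, y))   # 1000.0
```

A simpler workaround is to clip the probabilities, e.g. `np.clip(p, 1e-12, 1.0)`, before taking the logarithm; the log-sum-exp form stays exact, while clipping caps the loss.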

## Convolutional Neural Network

In [12]:
# Design a one-hidden-layer architecture with a conv layer
# and ReLU as activation function
def cnn_cifar():
    conv = Conv(tr_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    flat = Flatten()
    output = FullyConnected(1024, 10)
    return [conv, relu_01, flat, output]

# create a neural network with the specified architecture and softmax as score function
cnn_c10 = NeuralNetwork(cnn_cifar(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
cnn_c10 = Optimizer.sgd(cnn_c10, tr_images_batch, tr_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=32, epoch=20, learning_rate=0.01, X_test=te_images_batch, y_test=te_labels_batch, verbose=True)

Epoch 1
Loss = 2.358531432748527 :: Training = 0.19 :: Test = 0.11
Epoch 2
Loss = 2.4233869762495646 :: Training = 0.17 :: Test = 0.12
Epoch 3
Loss = 2.430393885283067 :: Training = 0.23 :: Test = 0.16
Epoch 4
Loss = 2.355750678188758 :: Training = 0.19 :: Test = 0.1
Epoch 5
Loss = 2.373550359638425 :: Training = 0.34 :: Test = 0.18
Epoch 6
Loss = 1.9098234901638533 :: Training = 0.22 :: Test = 0.11
Epoch 7
Loss = 2.3681873072855453 :: Training = 0.16 :: Test = 0.11
Epoch 8
Loss = 2.3748321227269367 :: Training = 0.16 :: Test = 0.11
Epoch 9
Loss = 2.3812944975304857 :: Training = 0.16 :: Test = 0.11
Epoch 10
Loss = 2.3874993820377837 :: Training = 0.16 :: Test = 0.11
Epoch 11
Loss = 2.3933982699357 :: Training = 0.16 :: Test = 0.11
Epoch 12
Loss = 2.3989632221497335 :: Training = 0.16 :: Test = 0.11
Epoch 13
Loss = 2.40418185642091 :: Training = 0.16 :: Test = 0.11
Epoch 14
Loss = 2.409053241775854 :: Training = 0.16 :: Test = 0.11
Epoch 15
Loss = 2.41358460574074 :: Training = 0.16 ::

## Convolutional Neural Network with Pooling

In [13]:
# Design a one-hidden-layer architecture with a conv layer, a max-pooling layer
# and ReLU as activation function
def cnn_pool_cifar():
    conv = Conv(tr_images_batch.shape, 1, 3, 1, True)
    relu_01 = ReLU()
    pool = Pool(tr_images_batch.shape, np.max, 2, 1)
    flat = Flatten()
    output = FullyConnected(961, 10)
    return [conv, relu_01, pool, flat, output]

# create a neural network with the specified architecture and softmax as score function
cnn_pool_c10 = NeuralNetwork(cnn_pool_cifar(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
cnn_pool_c10 = Optimizer.sgd(cnn_pool_c10, tr_images_batch, tr_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=20, learning_rate=0.01, X_test=te_images_batch, y_test=te_labels_batch, verbose=True)

Epoch 1
Loss = 2.312640820130772 :: Training = 0.15 :: Test = 0.11
Epoch 2
Loss = 2.2943106838072747 :: Training = 0.15 :: Test = 0.11
Epoch 3
Loss = 2.2748085767086392 :: Training = 0.16 :: Test = 0.11
Epoch 4
Loss = 2.2547448909364327 :: Training = 0.17 :: Test = 0.11
Epoch 5
Loss = 2.2328610319977105 :: Training = 0.17 :: Test = 0.11
Epoch 6
Loss = 2.214845308820109 :: Training = 0.18 :: Test = 0.11
Epoch 7
Loss = 2.1950710624951255 :: Training = 0.18 :: Test = 0.11
Epoch 8
Loss = 2.172548565156189 :: Training = 0.2 :: Test = 0.12
Epoch 9
Loss = 2.155283666071791 :: Training = 0.2 :: Test = 0.13
Epoch 10
Loss = 2.1133945868246458 :: Training = 0.24 :: Test = 0.13
Epoch 11
Loss = 2.0653839612277 :: Training = 0.25 :: Test = 0.13
Epoch 12
Loss = 2.045934189198816 :: Training = 0.31 :: Test = 0.19
Epoch 13
Loss = 2.184712053544089 :: Training = 0.28 :: Test = 0.14
Epoch 14
Loss = 2.0394407583892056 :: Training = 0.22 :: Test = 0.11
Epoch 15
Loss = 1.9859670577519024 :: Training = 0.34 

## Fully Connected Neural Network with Dropout

In [14]:
# Design a two-hidden-layer architecture with dense layers, a dropout layer
# and ReLU as activation function
def fcn_dropout_cifar():
    flat = Flatten()
    hidden_01 = FullyConnected(3072, 500)
    relu_01 = ReLU()
    dropout = Dropout()
    hidden_02 = FullyConnected(500, 100)
    relu_02 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, dropout, hidden_02, relu_02, output]

# create a neural network with the specified architecture and softmax as score function
fcn_dropout_c10 = NeuralNetwork(fcn_dropout_cifar(), score_func=LossCriteria.softmax)

# optimize the network with a softmax cross-entropy loss
fcn_dropout_c10 = Optimizer.sgd(fcn_dropout_c10, tr_images_batch, tr_labels_batch, LossCriteria.cross_entropy_softmax, batch_size=64, epoch=20, learning_rate=0.01, X_test=te_images_batch, y_test=te_labels_batch, verbose=True)

Epoch 1
Loss = 78.38093186385963 :: Training = 0.12 :: Test = 0.16
Epoch 2
Loss = inf :: Training = 0.16 :: Test = 0.11
Epoch 3


  log_likelihood = -np.log(p[range(m), y])


Loss = inf :: Training = 0.1 :: Test = 0.08
Epoch 4
Loss = inf :: Training = 0.1 :: Test = 0.06
Epoch 5
Loss = 2.297609378682687 :: Training = 0.1 :: Test = 0.06
Epoch 6
Loss = 2.2887696619593476 :: Training = 0.1 :: Test = 0.06
Epoch 7
Loss = 2.2819134754025723 :: Training = 0.1 :: Test = 0.06
Epoch 8
Loss = 2.2766332133303275 :: Training = 0.16 :: Test = 0.11
Epoch 9
Loss = 2.272600493867971 :: Training = 0.16 :: Test = 0.11
Epoch 10
Loss = 2.2695521892185613 :: Training = 0.16 :: Test = 0.11
Epoch 11
Loss = 2.267278449445659 :: Training = 0.16 :: Test = 0.11
Epoch 12
Loss = 2.2656125713778748 :: Training = 0.16 :: Test = 0.11
Epoch 13
Loss = 2.2644225415073604 :: Training = 0.16 :: Test = 0.11
Epoch 14
Loss = 2.2636040639593427 :: Training = 0.16 :: Test = 0.11
Epoch 15
Loss = 2.2630748771252964 :: Training = 0.16 :: Test = 0.11
Epoch 16
Loss = 2.2627701659443087 :: Training = 0.16 :: Test = 0.11
Epoch 17
Loss = 2.262638889036512 :: Training = 0.16 :: Test = 0.11
Epoch 18
Loss = 2.2