# ML-Fundamentals - Neural Networks - Exercise: Neural Network Framework

# Tasks
Your main goal is to extend the existing framework, to perform experiments with different model combinations and to document your observations. Here is a list of necessary tasks and some ideas for additional points:

  * (5) 1 to 5 points are given for improving the class and method comments in the framework files. Points are given based on the quality and quantity of the comments.
  * (2) Implement `Dropout` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Dropout as a layer.
  * (5) Implement `Batchnorm` in `layer.py` and test your implementation with a toy example. Create and train a model that includes Batchnorm as a layer.
  * (5) Do something extra, up to 5 points.  
  
Please document your experiments thoroughly and explain what you do, so that the work in the notebook is comprehensible; otherwise no points are given.
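As a starting point for the `Dropout` task, here is a minimal sketch of inverted dropout with a toy example. The class below is standalone: its `forward`/`backward` method names, the `prob` parameter and the `training` flag are assumptions made for illustration and may not match the layer interface in `layer.py`.

```python
import numpy as np


class Dropout:
    """Inverted dropout (sketch; the interface is an assumption)."""

    def __init__(self, prob=0.5, seed=None):
        self.prob = prob                      # probability of dropping a unit
        self.rng = np.random.default_rng(seed)
        self.mask = None
        self.training = True                  # set to False at test time

    def forward(self, X):
        if not self.training:
            return X                          # no dropout at test time
        keep = 1.0 - self.prob
        # kept units are scaled by 1/keep so the expected activation is unchanged
        self.mask = (self.rng.random(X.shape) < keep) / keep
        return X * self.mask

    def backward(self, dout):
        if not self.training:
            return dout
        # gradients flow only through the units that were kept
        return dout * self.mask


# toy example: constant input, so the effect of the mask is easy to see
layer = Dropout(prob=0.5, seed=0)
X = np.ones((4, 1000))
out = layer.forward(X)
grad = layer.backward(np.ones_like(X))
print("fraction kept:", (out != 0).mean())
print("mean activation:", out.mean())
```

With `prob=0.5` roughly half of the entries should be zero and the rest scaled to 2, so the mean activation stays close to 1.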

# Requirements

## Python-Modules

In [1]:
# custom
from htw_nn_framework.networks import NeuralNetwork
from htw_nn_framework.layer import *
from htw_nn_framework.activation_func import *
from htw_nn_framework.loss_func import *
from htw_nn_framework.optimizer import *

# third party
import numpy as np
from deep_teaching_commons.data.fundamentals.mnist import Mnist

## Data

In [2]:
# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data')

# load all data; labels are integer class ids (not one-hot-encoded), images keep their 2D shape (not flattened) and pixels are scaled to [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=False, normalized=True)
print(train_images.shape, train_labels.shape)

# reshape to match the general framework architecture
train_images, test_images = train_images.reshape(60000, 1, 28, 28), test_images.reshape(10000, 1, 28, 28)
print(train_images.shape, train_labels.shape)

# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]

auto download is active, attempting download
mnist data directory already exists, download aborted
(60000, 28, 28) (60000,)
(60000, 1, 28, 28) (60000,)


# MNIST Fully Connected Network Example
This model and its optimization are taken from `framework_exercise.ipynb` as an example of a typical pipeline using the framework files.

# Your Extensions and Experiments

## Preparations
### Download dataset

In [3]:
from sklearn.datasets import fetch_lfw_people
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# for our network we use the data directly
X = lfw_people.data
print(X.shape)

# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
print(y.shape)
print(target_names.shape)

(1288, 1850)
(1288,)
(7,)


### Splitting, normalizing and randomizing the Dataset

In [4]:
from sklearn.model_selection import train_test_split

# split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# shuffle training data (train_test_split already shuffles by default, so this is an extra pass)
shuffle_index = np.random.permutation(len(X_train))
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

print("train shapes:", X_train.shape, ", ", y_train.shape)
print("test shapes:", X_test.shape, ", ", y_test.shape)
print(y_train[0])

# scale pixel values into the range [0, 1]
X_train /= 255
X_test /= 255
print("max value (train)", np.max(X_train))
print("max value (test)", np.max(X_test))

train shapes: (966, 1850) ,  (966,)
test shapes: (322, 1850) ,  (322,)
3
max value (train) 1.0
max value (test) 1.0
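The division by 255 above assumes raw pixel values in the range 0 to 255. A more defensive variant, sketched here, derives the scale from the training data and applies the same factor to both sets. The helper name `scale_to_unit_range` and the random stand-in data are hypothetical and not part of the framework.

```python
import numpy as np


def scale_to_unit_range(train, test):
    """Divide both sets by the maximum found in the training set (sketch)."""
    train = np.asarray(train, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    max_value = train.max()
    if max_value == 0:
        return train, test            # nothing to scale
    return train / max_value, test / max_value


# stand-in data in place of the real X_train / X_test
rng = np.random.default_rng(0)
demo_train = rng.integers(0, 256, size=(10, 1850)).astype(np.float32)
demo_test = rng.integers(0, 256, size=(5, 1850)).astype(np.float32)

demo_train, demo_test = scale_to_unit_range(demo_train, demo_test)
print("train range:", demo_train.min(), demo_train.max())
```

Using the training maximum for both sets keeps the test data on the same scale as the data the model was fitted on.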


### Some cropping magic

In [5]:
# drop the last of the 1850 features so that each sample has 1849 = 43 * 43 values and can be reshaped to a square
X_train = X_train[:, 0:1849]
X_test = X_test[:, 0:1849]

print("train shapes:", X_train.shape, ", ", y_train.shape)
print("test shapes:", X_test.shape, ", ", y_test.shape)

train shapes: (966, 1849) ,  (966,)
test shapes: (322, 1849) ,  (322,)


### Labels to binary vector - not needed

### Some reshape magic

In [6]:
# reshape to match the general framework architecture
X_train, X_test = X_train.reshape(
    966, 1, 43, 43), X_test.reshape(322, 1, 43, 43)
print(X_train.shape, X_test.shape)

(966, 1, 43, 43) (322, 1, 43, 43)
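One caveat about the crop and reshape above: 1850 features are consistent with flattened 50 × 37 images (1850 = 50 × 37), and the sketch below assumes that this is the shape of `lfw_people.images`. Under that assumption, cutting one value and reshaping to 43 × 43 does not keep the original pixel rows together. The toy array stores, for each pixel, the index of the image row it came from.

```python
import numpy as np

h, w = 50, 37                                  # assumed original image shape
row_ids = np.repeat(np.arange(h), w)           # flat "image": each pixel holds its source row
assert row_ids.size == 1850

# the crop + reshape used in this notebook
cropped = row_ids[:1849].reshape(43, 43)
print("source rows in first 43x43 row:", np.unique(cropped[0]))    # -> [0 1]

# reshaping to the original height and width keeps each row intact
preserved = row_ids.reshape(h, w)
print("source rows in first 50x37 row:", np.unique(preserved[0]))  # -> [0]
```

If the framework's `Conv` layer accepts non-square inputs (not verified here), reshaping to `(n, 1, 50, 37)` would keep the spatial layout; the input size of the first `FullyConnected` layer would then have to be recomputed.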


## Testing same MNIST Model

## Testing activation functions and optimizers

### Testing Leaky Relu and sgd_momentum + nesterov

### Testing Sigmoid and adagrad + rmsprop

### Testing tan hyperbolic and adadelta + adam

## Conv + Pooling Layer Test

### MNIST test
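In the conv models of this section, the input size of the first `FullyConnected` layer (576 for MNIST, 400 for LFW) has to equal the number of values that come out of `Flatten`. The sketch below reproduces that arithmetic. It assumes that `Conv` uses the usual floor-based output size formula and that `Pool()` defaults to a 2 × 2 window with stride 2; the notebook does not show those defaults, and the helper names are hypothetical.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (floor-based formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(side, out_channels):
    """Feature count after two 3x3/stride-2 convs and one assumed 2x2/stride-2 pool."""
    side = out_size(side, kernel=3, stride=2)   # conv_01
    side = out_size(side, kernel=3, stride=2)   # conv_02
    side = out_size(side, kernel=2, stride=2)   # max_pooling (assumed defaults)
    return out_channels * side * side


print(flattened_features(28, 64))   # MNIST, 28x28 input, 64 channels -> 576
print(flattened_features(43, 16))   # LFW, 43x43 input, 16 channels -> 400
```

Both results match the `FullyConnected(576, 500)` and `FullyConnected(400, 500)` layers used in the cells of this section.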

In [None]:
# design a network with two conv layers, one pooling layer and
# three fully connected hidden layers with ReLU as activation function


def fcn_conv_test():
    conv_01 = Conv(1, 32, (3, 3), stride=2, padding=0)
    conv_02 = Conv(32, 64, (3, 3), stride=2, padding=0)
    max_pooling = Pool()
    flat = Flatten()
    hidden_01 = FullyConnected(576, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 10)
    return [conv_01, conv_02, max_pooling, flat, hidden_01, relu_01,
            hidden_02, relu_02, hidden_03, relu_03, output]


# create a neural network from the specified architecture with softmax as score function
fcn_conv_test_1 = NeuralNetwork(
    fcn_conv_test(), score_func=LossCriteria.softmax)

# train the network with Adam and a softmax cross-entropy loss
fcn_conv_test_1 = Optimizer.adam(fcn_conv_test_1, train_images, train_labels, LossCriteria.cross_entropy_softmax,
                                 batch_size=128, epoch=10, learning_rate=0.0001, X_test=test_images, y_test=test_labels, verbose=True)

Epoch 1


  log_likelihood = -np.log(p[range(m), y])
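The source line echoed in the output above looks like the tail of a NumPy warning. A likely cause, stated here as an assumption, is that the predicted probability of the true class reached exactly 0, so that `np.log` returned `-inf`. A minimal sketch of a guarded version follows; the function names and the `eps` parameter are hypothetical and this is not the framework's actual `cross_entropy_softmax`.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(p, y, eps=1e-12):
    """Mean negative log-likelihood of the true classes, with clipping."""
    m = y.shape[0]
    # clip so that log never sees an exact zero
    p_true = np.clip(p[np.arange(m), y], eps, 1.0)
    return -np.log(p_true).mean()


# toy example: the first sample assigns probability 0 to its true class
p = np.array([[1.0, 0.0], [0.5, 0.5]])
y = np.array([1, 0])
print("loss:", cross_entropy(p, y))
```

The clipping bounds the loss for a single sample at `-log(eps)` instead of infinity, which keeps the reported epoch loss finite.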


### LFW test

In [14]:
# design a network with two conv layers, one pooling layer and
# three fully connected hidden layers with ReLU as activation function
def fcn_conv_test():
    conv_01 = Conv(1, 8, (3, 3), stride=2, padding=0)
    conv_02 = Conv(8, 16, (3, 3), stride=2, padding=0)
    max_pooling = Pool()
    flat = Flatten()
    hidden_01 = FullyConnected(400, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 7)
    return [conv_01, conv_02, max_pooling, flat, hidden_01, relu_01,
            hidden_02, relu_02, hidden_03, relu_03, output]


# create a neural network from the specified architecture with softmax as score function
fcn_conv_test_2 = NeuralNetwork(fcn_conv_test(), score_func=LossCriteria.softmax)

# train the network with Adam and a softmax cross-entropy loss
fcn_conv_test_2 = Optimizer.adam(fcn_conv_test_2, X_train, y_train, LossCriteria.cross_entropy_softmax, batch_size=64,
                                 epoch=50, learning_rate=0.01, X_test=X_test, y_test=y_test, verbose=True)

Epoch 1
Loss = 4.752318926052453 :: Training = 0.2505175983436853 :: Test = 0.2453416149068323
Epoch 2
Loss = 1.7507978155072184 :: Training = 0.3612836438923395 :: Test = 0.3695652173913043
Epoch 3
Loss = 1.6860057606400363 :: Training = 0.3944099378881988 :: Test = 0.422360248447205
Epoch 4
Loss = 1.6072208893237012 :: Training = 0.39751552795031053 :: Test = 0.453416149068323
Epoch 5
Loss = 1.5867796036746022 :: Training = 0.39751552795031053 :: Test = 0.4472049689440994
Epoch 6
Loss = 1.6253500838414503 :: Training = 0.39648033126293997 :: Test = 0.43788819875776397
Epoch 7
Loss = 1.5651489034428692 :: Training = 0.39544513457556935 :: Test = 0.4440993788819876
Epoch 8
Loss = 1.5399461818259794 :: Training = 0.39544513457556935 :: Test = 0.4440993788819876
Epoch 9
Loss = 1.5266031430415952 :: Training = 0.386128364389234 :: Test = 0.40062111801242234
Epoch 10
Loss = 1.5015057386908723 :: Training = 0.38819875776397517 :: Test = 0.3944099378881988
Epoch 11
Loss = 1.46845107659631 ::