# Important Information

This notebook can be used in two ways:
1. It can be displayed in Jupyter, where you can read the tasks and insert your answers. Open an Anaconda prompt, change to the working directory containing the *.ipynb file, and start the notebook from there.
2. You can execute the code in Google Colab using Keras and TensorFlow. If you do not want to create a Google account, you have to set up a local environment for Keras and TensorFlow.

For submission, upload your edited notebook together with all images used in a separate folder (/images).

Setup for this exercise sheet: download the data and select the TensorFlow version.
Execute this code only if you have set up your environment correctly or if you are inside a Colab environment.
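
Before running the setup cell, it can help to check whether its prerequisites are available. The sketch below is one possible way to do this and is not part of the original exercise; the helper names `in_colab` and `git_available` are hypothetical. It uses only the Python standard library.

```python
import importlib.util
import shutil


def in_colab() -> bool:
    """Return True if the google.colab package can be found (likely inside Colab)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised, for example, when the parent package "google" is not installed.
        return False


def git_available() -> bool:
    """Return True if a git executable is found on the PATH."""
    return shutil.which("git") is not None


print("Running in Colab:", in_colab())
print("git found on PATH:", git_available())
```

If `git_available()` returns `False`, the `git clone` command in the next cell cannot run; if `in_colab()` returns `False`, the `%tensorflow_version` line magic is not available.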

In [1]:
! git clone https://gitlab+deploy-token-26:XBza882znMmexaQSpjad@git.informatik.uni-kiel.de/las/nndl.git

%tensorflow_version 2.x

The command "git" is either misspelled or
could not be found.
ERROR:root:Line magic function `%tensorflow_version` not found.


# Exercise 1 (Learning in neural networks)

a) Explain the following terms related to neural networks as briefly and precisely as possible.

* Loss function
* Stochastic gradient descent
* Mini-batch 
* Regularization
* Dropout
* Batch normalization
* Learning with momentum
* Data augmentation
* Unsupervised pre-training / supervised fine-tuning
* Deep learning
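
Some of these terms can be made concrete on a toy problem. The following is a minimal NumPy sketch, not a reference answer: it fits a linear model by mini-batch stochastic gradient descent with momentum, using the mean squared error as the loss function. The data, the hyperparameter values, and the function name `sgd_momentum` are illustrative assumptions.

```python
import numpy as np


def sgd_momentum(X, y, lr=0.1, momentum=0.9, batch_size=16, epochs=200, seed=0):
    """Fit y ~ X @ w + b by mini-batch SGD with momentum on the MSE loss."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    v_w = np.zeros(n_features)  # velocity (accumulated update) for w
    v_b = 0.0                   # velocity for b
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the samples every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]  # indices of one mini-batch
            err = X[idx] @ w + b - y[idx]
            # gradients of the mini-batch loss mean(err**2)
            grad_w = 2.0 * X[idx].T @ err / len(idx)
            grad_b = 2.0 * err.mean()
            # momentum: keep a decaying sum of past gradient steps
            v_w = momentum * v_w - lr * grad_w
            v_b = momentum * v_b - lr * grad_b
            w += v_w
            b += v_b
    return w, b


# toy data generated from y = 3x - 1 without noise
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(128, 1))
y = 3.0 * X[:, 0] - 1.0
w, b = sgd_momentum(X, y)
print("estimated slope:", w[0], "estimated intercept:", b)
```

Setting `momentum=0` reduces the update to plain mini-batch SGD, and setting `batch_size=1` gives the classic single-sample variant.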


In [None]:
Answer: Write your answer here.

b) Name the most important output activation functions f(z), i.e., activation function of the output neuron(s), together with a corresponding suitable loss function L (in both cases, give the mathematical equation). Indicate whether such a perceptron is used for a classification or a regression task.

In [None]:
Answer: Write your answer here.

# Exercise 2 (Multi-layer perceptron – regression problem)

The goal of this exercise is to train a multi-layer perceptron to solve a difficult nonlinear regression problem. The data has been generated using an exponential function with the following shape:

![IMAGE: perceptron](images/Eckerle4Dataset.png)

This graph shows the values of a dataset that can be downloaded from the Statistical Reference Datasets of the Information Technology Laboratory at the U.S. National Institute of Standards and Technology (NIST):
http://www.itl.nist.gov/div898/strd/nls/data/eckerle4.shtml

This dataset is provided in the file Eckerle4.csv. Note that the dataset is divided into a training and a test corpus comprising 60% and 40% of the data samples, respectively. Moreover, the input and output values are normalized to the interval [0, 1]. Basic code to load the dataset, normalize the data, divide it into a training and a test corpus, and apply a multi-layer perceptron is provided in the Jupyter notebook.

Choose a suitable network topology (number of hidden layers and hidden neurons, optionally dropout, activation function of the hidden layers) and use it for the multi-layer perceptron defined in the Jupyter notebook. Set the remaining parameters (learning rate, loss function, optimizer, number of epochs, batch size; see the lines marked with *# FIX!!!* in the Jupyter notebook). Try to avoid underfitting and overfitting. Vary the network and parameter configuration in order to achieve the best network performance you can. Because the experiment has random components, perform (at least) 4 different training and evaluation runs for each network configuration and report the mean and standard deviation of the training and evaluation results. Report on your results and conclusions.
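
The reporting step for repeated runs can be sketched as follows. This is an illustrative helper, not part of the provided notebook: `summarize_runs` is a hypothetical name, and the loss values in the usage example are made-up placeholders rather than measured results. The sample standard deviation (`ddof=1`) is an assumption that suits a small number of runs.

```python
import numpy as np


def summarize_runs(values):
    """Return (mean, sample standard deviation) of per-run results."""
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return float(np.mean(values)), float(np.std(values, ddof=1))


# placeholder numbers for illustration only, not measured results
example_losses = [0.010, 0.012, 0.008, 0.010]
mean, std = summarize_runs(example_losses)
print("loss: %f +/- %f" % (mean, std))
```

In practice, the list would hold the final training or test error of each of the (at least) 4 runs for one configuration.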

(Source of exercise: http://gonzalopla.com/deep-learning-nonlinear-regression)

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from os.path import join
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras import Model, Input, Sequential
from tensorflow.keras.optimizers import SGD, Adam, Adadelta, Adagrad, Nadam, RMSprop
from tensorflow.keras.utils import normalize
import pandas
from sklearn import preprocessing
from sklearn import model_selection
import sys

###--------
# load data
###--------

# Imports csv into pandas DataFrame object.
path_to_task = "nndl/Lab4"
Eckerle4_df = pandas.read_csv(join(path_to_task,"Eckerle4.csv"), header=0)
 
# Converts dataframes into numpy objects.
Eckerle4_dataset = Eckerle4_df.values.astype("float32")
# Slicing all rows, second column...
X = Eckerle4_dataset[:,1]
# Slicing all rows, first column...
y = Eckerle4_dataset[:,0]
 
# plot data
plt.plot(X,y, color='red')
plt.legend(labels=["data"], loc="upper right")
plt.title("data")
plt.show()

###-----------
# process data
###-----------

# Data Scaling from 0 to 1, X and y originally have very different scales.
X_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
y_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
X_scaled = ( X_scaler.fit_transform(X.reshape(-1,1)))
y_scaled = (y_scaler.fit_transform(y.reshape(-1,1)).reshape(-1) )
 
# Preparing test and train data: 60% training, 40% testing.
X_train, X_test, y_train, y_test = model_selection.train_test_split( X_scaled, y_scaled, test_size=0.40, random_state=3)


###-----------
# define model
###-----------

num_inputs = X_train.shape[1] # should be 1 in case of Eckerle4
num_hidden = ... # for each hidden layer: number of hidden units in form of a python list   # FIX!!!
num_outputs = 1 # predict single number in case of Eckerle4

activation = '...' # activation of hidden layers   # FIX!!!
dropout = ... # 0 if no dropout, else fraction of dropout units (e.g. 0.2)   # FIX!!!

# Sequential network structure.
model = Sequential()

if len(num_hidden) == 0:
  print("Error: Must at least have one hidden layer!")
  sys.exit()  

# add first hidden layer connecting to input layer
model.add(Dense(num_hidden[0], input_dim=num_inputs, activation=activation))

if dropout:
  # dropout of fraction dropout of the neurons and activation layer.
  model.add(Dropout(dropout))
#  model.add(Activation("linear"))

# potentially further hidden layers
for i in range(1, len(num_hidden)):
  # add hidden layer with num_hidden[i] neurons
  model.add(Dense(num_hidden[i], activation=activation))
#  model.add(Activation("linear"))

# output layer
model.add(Dense(1))

# show how the model looks
model.summary()

# compile model
opt = ... # FIX!!!
model.compile(loss='...', optimizer=opt, metrics=["..."])# FIX!!!

# Training model with train data. Fixed random seed:
np.random.seed(3)
num_epochs = ...   # FIX !!!
batch_size = ... # FIX !!! 
history = model.fit(X_train, y_train, epochs=num_epochs, batch_size=batch_size, verbose=2)

###-----------
# plot results
###-----------

print("final (mse) training error: %f" % history.history['loss'][num_epochs-1])

plt.plot(history.history['loss'], color='red', label = 'training loss')
plt.legend(labels=["loss"], loc="upper right")
plt.title("training (mse) error")
plt.show()

# Plot in blue color the predicted data and in green color the
# actual data to verify visually the accuracy of the model.
predicted = model.predict(X_test)
plt.plot(y_scaler.inverse_transform(predicted.reshape(-1,1)), color="blue")
plt.plot(y_scaler.inverse_transform(y_test.reshape(-1,1)), color="green")
plt.legend(labels=["predicted", "target"], loc="upper right")
plt.title("evaluation on test corpus")
plt.show()
print("test error: %f" % model.evaluate(X_test, y_test)[0])

In [None]:
Answer: Write your answer here.

# Exercise 3 (Parameters of a multi-layer perceptron – digit recognition)

The 

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from os.path import join
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras import Model, Input, Sequential
from tensorflow.keras.optimizers import SGD, Adam, Adadelta, Adagrad, Nadam, RMSprop
from tensorflow.keras.utils import normalize
import tensorflow.keras.datasets as tfds
import sys  # needed for sys.exit() below

###--------
# load data
###--------

(training_input, training_target), (test_input, test_target)  = tfds.mnist.load_data()

# Reserve 10,000 samples for validation
validation_input = training_input[-10000:]
validation_target = training_target[-10000:]
training_input = training_input[:-10000]
training_target = training_target[:-10000]

print("training input shape: %s, training target shape: %s"  % (training_input.shape, training_target.shape))
print("validation input shape: %s, validation target shape: %s"  % (validation_input.shape, validation_target.shape))
print("test input shape: %s, test target shape: %s"  % (test_input.shape, test_target.shape))
# range of input values: 0 ... 255
print("\n")

# plot some sample images
num_examples = 1
for s in range(num_examples):
  print("Example image, true label: %d" % training_target[s])
  plt.imshow(training_input[s], vmin=0, vmax=255, cmap=plt.cm.gray)
  plt.show()

###-----------
# process data
###-----------

# Note: shuffling is performed in fit method

# scaling inputs from range 0 ... 255 to range [0,1] if desired
scale_inputs = True # scale inputs to range [0,1]
if scale_inputs:
  training_input = training_input / 255
  validation_input = validation_input / 255 
  test_input = test_input / 255

print("min. training data: %f" % np.min(training_input))
print("max. training data: %f" % np.max(training_input))
print("min. validation data: %f" % np.min(validation_input))
print("max. validation data: %f" % np.max(validation_input))
print("min. test data: %f" % np.min(test_input))
print("max. test data: %f" % np.max(test_input))

# flatten inputs to vectors
training_input = training_input.reshape(training_input.shape[0], training_input.shape[1] * training_input.shape[2])
validation_input = validation_input.reshape(validation_input.shape[0], validation_input.shape[1] * validation_input.shape[2])
test_input = test_input.reshape(test_input.shape[0], test_input.shape[1] * test_input.shape[2])
print(training_input.shape)
print(validation_input.shape)
print(test_input.shape)

num_classes = 10 # 10 digits

###-----------
# define model
###-----------

histories = {}
opt_learning_rate = {}
final_training_loss = {}
final_training_accuracy = {}
final_validation_loss = {}
final_validation_accuracy = {}
final_test_loss = {}
final_test_accuracy = {}

configurations = [
        # single hidden layer with 50 neurons
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [50],
         'solver': 'SGD',
         'activation':'relu'}, # activation of hidden layers
         
        # single hidden layer with 100 neurons
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [100],
         'solver': 'SGD',
         'activation':'relu'}, # activation of hidden layers

        # single hidden layer with 200 neurons
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [200],
         'solver': 'SGD',
         'activation':'relu'}, # activation of hidden layers
         
        # two hidden layers with 100 neurons each
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [100, 100],
         'solver': 'SGD',
         'activation':'relu'}, # activation of hidden layers
         
         # three hidden layers with 100 neurons each
        {'learningRates': [0.001, 0.01, 0.1], 
         'hiddenLayerSizes': [100, 100, 100],
         'solver': 'SGD',
         'activation':'relu'}, # activation of hidden layers
         
         # four hidden layers with 100 neurons each
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [100, 100, 100, 100],
         'solver': 'SGD',
         'activation':'relu'}, # activation of hidden layers

        # single hidden layer with 100 neurons, Adam
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [100],
         'solver': 'Adam',
         'activation':'relu'}, # activation of hidden layers

        # single hidden layer with 100 neurons, AdaGrad
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [100],
         'solver': 'Adagrad',
         'activation':'relu'}, # activation of hidden layers

        # single hidden layer with 100 neurons, SGD, logistic
        {'learningRates': [0.001, 0.01, 0.1],
         'hiddenLayerSizes': [100],
         'solver': 'SGD',
         'activation':'sigmoid'}, # activation of hidden layers (Keras name of the logistic function)
]

numRepetitions = 4 # repetitions of experiment due to stochastic nature

num_inputs = training_input.shape[1] 
num_outputs = num_classes 
dropout = 0 # 0 if no dropout, else fraction of dropout units (e.g. 0.2)   # FIX!!!

idx_config = 0

for config in configurations:
  print("=======")
  print("Now running tests for config", config)

  learningRates = config['learningRates']
  num_hidden = config['hiddenLayerSizes']
  solver = config['solver']
  activation = config['activation']

  # Sequential network structure.
  model = Sequential()

  if len(num_hidden) == 0:
    print("Error: Must at least have one hidden layer!")
    sys.exit()  

  # add first hidden layer connecting to input layer
  model.add(Dense(num_hidden[0], input_dim=num_inputs, activation=activation))

  if dropout:
    # dropout of fraction dropout of the neurons and activation layer.
    model.add(Dropout(dropout))
  #  model.add(Activation("linear"))

  # potentially further hidden layers
  for i in range(1, len(num_hidden)):
    # add hidden layer with num_hidden[i] neurons
    model.add(Dense(num_hidden[i], activation=activation))
  #  model.add(Activation("linear"))

  # output layer
  model.add(Dense(units=num_outputs, name = "output"))

  # print configuration
  print("\nModel configuration: ")
  print(model.get_config())
  print("\n")

  # show how the model looks
  model.summary()

  optLearningRate = 0
  optValidationAccuracy = 0

  histories_lr = [] # remember history for each learning rate

  for idx_lr in range(len(learningRates)):
  
    print("MODIFYING LEARNING RATE")
    learningRate = learningRates[idx_lr]
    print("learning rate = %f" % learningRate)

    train_loss = np.zeros(numRepetitions)
    train_acc = np.zeros(numRepetitions)
    val_loss = np.zeros(numRepetitions)
    val_acc = np.zeros(numRepetitions)
    test_loss = np.zeros(numRepetitions)
    test_acc = np.zeros(numRepetitions)

    histories_rep = [] # (temporarily) remember history of each repetition
    for idx_rep in range(numRepetitions):
      print("\nIteration %d..." % idx_rep)  
      
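      # re-create the model with freshly initialized weights so that each
      # repetition (and each learning rate) starts training from scratch
      model = tf.keras.models.clone_model(model)

      # compile model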
      if solver == 'SGD':
        opt = SGD(learning_rate=learningRate) # SGD or Adam, Nadam, Adadelta, Adagrad, RMSProp, potentially setting more parameters
      elif solver == 'Adam':
        opt = Adam(learning_rate=learningRate)
      elif solver == 'Nadam':
        opt = Nadam(learning_rate=learningRate)
      elif solver == 'Adadelta':
        opt = Adadelta(learning_rate=learningRate)
      elif solver == 'Adagrad':
        opt = Adagrad(learning_rate=learningRate)
      elif solver == 'RMSprop':
        opt = RMSprop(learning_rate=learningRate)
      model.compile(optimizer=opt,loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=['sparse_categorical_accuracy'])

      # Train the model on the training data.
      num_epochs = 30 # FIX !!!
      batch_size = 1 # FIX !!! 
      history = model.fit(training_input, training_target, epochs=num_epochs, batch_size=batch_size, shuffle=True, verbose=2)
      histories_rep.append(history) # remember all histories from all repetitions
      train_loss[idx_rep] = history.history['loss'][-1]
      train_acc[idx_rep] = history.history['sparse_categorical_accuracy'][-1]
      # evaluate once per data set; evaluate() returns [loss, accuracy]
      val_loss[idx_rep], val_acc[idx_rep] = model.evaluate(validation_input, validation_target)
      test_loss[idx_rep], test_acc[idx_rep] = model.evaluate(test_input, test_target)

    # print results:
    print("training loss (in brackets: mean +/- std):")
    for i in range(numRepetitions):
        print("%f" % train_loss[i])
    print("(%f +/- %f)\n" % (np.mean(train_loss), np.std(train_loss, ddof=1)))

    print("training accuracy (in brackets: mean +/- std):")
    for i in range(numRepetitions):
        print("%f" % train_acc[i])
    print("(%f +/- %f)\n" % (np.mean(train_acc), np.std(train_acc, ddof=1)))

    print("validation loss (in brackets: mean +/- std):")
    for i in range(numRepetitions):
        print("%f" % val_loss[i])
    print("(%f +/- %f)\n" % (np.mean(val_loss), np.std(val_loss, ddof=1)))

    print("validation accuracy (in brackets: mean +/- std):")
    for i in range(numRepetitions):
        print("%f" % val_acc[i])
    print("(%f +/- %f)\n" % (np.mean(val_acc), np.std(val_acc, ddof=1)))

    print("test loss (in brackets: mean +/- std):")
    for i in range(numRepetitions):
        print("%f" % test_loss[i])
    print("(%f +/- %f)\n" % (np.mean(test_loss), np.std(test_loss, ddof=1)))

    print("test accuracy (in brackets: mean +/- std):")
    for i in range(numRepetitions):
        print("%f" % test_acc[i])
    print("(%f +/- %f)\n" % (np.mean(test_acc), np.std(test_acc, ddof=1)))

    # remember history of best repetition (based on maximal validation accuracy)
    idx_best_rep = np.argmax(val_acc)

    # determine optimal learning rate (based on mean validation accuracy over repetitions)
    if np.mean(val_acc) > optValidationAccuracy:
        optValidationAccuracy = np.mean(val_acc)
        opt_learning_rate[idx_config] = learningRate  
        # remember history
        histories[idx_config] = histories_rep[idx_best_rep]
        # remember evaluation results
        final_training_loss[idx_config] = train_loss[idx_best_rep]
        final_training_accuracy[idx_config] = train_acc[idx_best_rep]
        final_validation_loss[idx_config] = val_loss[idx_best_rep]
        final_validation_accuracy[idx_config] = val_acc[idx_best_rep]
        final_test_loss[idx_config] = test_loss[idx_best_rep]
        final_test_accuracy[idx_config] = test_acc[idx_best_rep]   

  print("optimal learning rate for this configuration: %f" % opt_learning_rate[idx_config])

  idx_config = idx_config + 1

###-----------------------
# print evaluation results
###-----------------------

for i in range(len(configurations)): 
  print("\nconfiguration %s:\n" % configurations[i])
  print("optimal learning rate: %f" % opt_learning_rate[i])
  print("final training loss: %f" % final_training_loss[i])
  print("final training accuracy: %f" % final_training_accuracy[i])
  print("final validation loss: %f" % final_validation_loss[i])
  print("final validation accuracy: %f" % final_validation_accuracy[i])
  print("final test loss: %f" % final_test_loss[i])
  print("final test accuracy: %f" % final_test_accuracy[i])

###-----------
# plot results
###-----------
 
# plot setup
num_rows = int(np.ceil(len(configurations)/2)) # plt.subplots and range need an integer
fig, axes = plt.subplots(num_rows, 2, figsize=(15, 10))
fig.tight_layout() # improve spacing between subplots, doesn't work
plt.subplots_adjust(left=0.125, right=0.9, bottom=0.1, top=0.9, wspace=0.2, hspace=0.2) # doesn't work
legend = []
i = 0
axes_indices = {}

for i in range(num_rows):
  axes_indices[2*i] = (i, 0)
  axes_indices[2*i+1] = (i, 1)

for i in range(len(configurations)):
  # plot loss    
  axes[axes_indices[i]].set_title('configuration ' + str(i))
  if i == 8 or i == 9:  
    axes[axes_indices[i]].set_xlabel('Epoch number')
  axes[axes_indices[i]].set_ylim(0, 1)
  axes[axes_indices[i]].plot(histories[i].history['loss'], color = 'blue', 
              label = 'training loss')
  axes[axes_indices[i]].plot(histories[i].history['sparse_categorical_accuracy'], color = 'red', 
              label = 'training accuracy')
  axes[axes_indices[i]].legend()


# show the plot
plt.show()
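
The output layer in the cell above has no activation, so the network produces raw scores (logits), and the loss is created with `from_logits=True`. The NumPy sketch below shows what a sparse categorical cross-entropy computed from logits amounts to: a softmax followed by the negative log-probability of the true class. The function name `sparse_ce_from_logits` and the example values are hypothetical, and this is not the Keras implementation.

```python
import numpy as np


def sparse_ce_from_logits(logits, labels):
    """Mean cross-entropy for integer class labels, given unnormalized scores."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # subtract the row-wise maximum for numerical stability (the softmax is unchanged)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the true class for every sample
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# two samples, three classes; values chosen only for illustration
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
print(sparse_ce_from_logits(logits, labels))
```

For a sample whose logits are all equal, the softmax is uniform, so the loss for that sample is the logarithm of the number of classes.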

In [None]:
Answer: Write your answer here.

# Exercise 4 (Vanishing gradient)

The 

In [None]:
Answer: Write your answer here.