# Basics of Deep Learning Projects


The goal of this notebook is to teach the fundamental properties of deep learning projects. Training runs of a model should be reproducible, and different runs should be comparable to each other. For this, we will use [MLflow](https://mlflow.org/), an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. Its core components include:

* MLflow Tracking: Record and query experiments: code, data, config, and results.
* MLflow Projects: Packaging format for reproducible runs on any platform.
* MLflow Models: General format for sending models to diverse deployment tools.

ToDos:

* Define different neural networks using `tf.keras`.
* Train them and monitor the training using TensorBoard and MLflow.
* Try to intentionally overfit the training set. There are multiple ways to achieve this, e.g. a larger network, more training epochs, or fewer training samples.
* Compare different runs using MLflow. Can you spot the point where overfitting happened?


Help:
* [TensorFlow API Documentation](https://www.tensorflow.org/api_docs)
* [MLflow Documentation](https://mlflow.org/docs/latest/index.html)


In [None]:
import tensorflow as tf
import mlflow
import numpy as np

tf.__version__

In [None]:
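%%script bash --bg
# start an MLflow tracking server in the background, using a local PostgreSQL
# database for run metadata and ./mlruns for artifacts (port 5000 by default)
mlflow server  \
--backend-store-uri "postgresql://mlflow:mlflow@localhost:5432/mlflow" \
--default-artifact-root file:./mlruns \
--host 0.0.0.0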

In [None]:
# ToDo: load the dataset from the file `data/fashion_mnist.npz`.
# The images are stored under the key `data`, the labels under the key `label`.
with np.load('data/fashion_mnist.npz') as fashion:
    x = # ToDo
    y = # ToDo
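
A `.npz` archive behaves like a dictionary of arrays. The sketch below writes a tiny stand-in archive with the same keys (`data`, `label`) to a temporary file and reads it back; the file name and shapes are made up for illustration and are not the real dataset.

```python
import os
import tempfile
import numpy as np

# Build a small stand-in archive with the same keys as data/fashion_mnist.npz.
# The shapes here are illustrative only.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "fake_fashion_mnist.npz")
np.savez(path,
         data=np.zeros((6, 28, 28), dtype=np.uint8),
         label=np.arange(6, dtype=np.uint8))

# np.load returns an NpzFile; index it by key inside the `with` block,
# because the file is closed when the block ends.
with np.load(path) as archive:
    x = archive["data"]
    y = archive["label"]

print(x.shape, y.shape)  # (6, 28, 28) (6,)
```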

In [None]:
def shuffle_split(x, y, at=0.8):
    # ToDo: shuffle data and labels with the same permutation, then split them
    # at the given fraction. Return x_train, x_test, y_train, y_test in this order.
    pass

x_train, x_test, y_train, y_test = shuffle_split(x, y, at=0.8)
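
The key point of `shuffle_split` is that data and labels must be shuffled with one shared permutation, otherwise images and labels no longer match. Below is one possible sketch; the name `shuffle_split_sketch` and its `seed` parameter are hypothetical additions, not part of the exercise.

```python
import numpy as np

def shuffle_split_sketch(x, y, at=0.8, seed=0):
    """Hypothetical version of `shuffle_split`: shuffle x and y with the
    same permutation, then cut both at fraction `at`."""
    assert len(x) == len(y), "data and labels must have the same length"
    rng = np.random.default_rng(seed)   # fixed seed -> reproducible split
    perm = rng.permutation(len(x))      # one permutation for both arrays
    x, y = x[perm], y[perm]
    cut = int(len(x) * at)              # index of the train/test boundary
    return x[:cut], x[cut:], y[:cut], y[cut:]

# Toy data where each sample equals its label, so pairing is easy to check.
x_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)
x_tr, x_te, y_tr, y_te = shuffle_split_sketch(x_demo, y_demo, at=0.8)
print(len(x_tr), len(x_te))  # 8 2
```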

In [None]:
# ToDo: normalize data
x_train, x_test = # ToDo
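
Two common normalization choices are scaling to [0, 1] and standardizing to zero mean and unit variance. The sketch assumes pixels are stored as `uint8` values in [0, 255] (as in the original Fashion-MNIST release); the tiny arrays are stand-ins. Note that standardization reuses the statistics of the training set for the test set, so no information leaks from the test data.

```python
import numpy as np

# Stand-in pixel arrays; assumes uint8 values in [0, 255].
x_train_demo = np.array([[0, 128, 255]], dtype=np.uint8)
x_test_demo = np.array([[255, 0, 64]], dtype=np.uint8)

def scale_to_unit(a):
    # convert to float first, then map [0, 255] -> [0, 1]
    return a.astype(np.float32) / 255.0

x_train_scaled = scale_to_unit(x_train_demo)
x_test_scaled = scale_to_unit(x_test_demo)

# Alternative: standardize with statistics computed on the training set only.
mean, std = x_train_scaled.mean(), x_train_scaled.std()
x_train_std = (x_train_scaled - mean) / std
x_test_std = (x_test_scaled - mean) / std  # reuse the training statistics
```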

In [None]:
# print the first 15 training labels, five per row
for i in range(15):
    print(y_train[i], end=' ')
    if i % 5 == 4:
        print()
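
The printed labels are integer class ids. Assuming the file uses the standard Fashion-MNIST ordering published by Zalando Research, they map to readable names as sketched below (the example labels are made up).

```python
import numpy as np

# Standard Fashion-MNIST class names, indexed by label id 0..9.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

labels_demo = np.array([9, 0, 0, 3, 0])  # made-up example labels
names = [CLASS_NAMES[i] for i in labels_demo]
print(names)
# ['Ankle boot', 'T-shirt/top', 'T-shirt/top', 'Dress', 'T-shirt/top']
```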

In [None]:
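# plot the first 15 images of the training set
import matplotlib.pyplot as plt
%matplotlib inline

for i in range(15):
    plt.subplot(3, 5, i + 1)
    # reshape in case images are stored as flat vectors (28x28 pixels assumed)
    plt.imshow(x_train[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()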

In [None]:
# ToDo: define input and output network parameters
n_input =  # Fashion-MNIST input size (pixels per image)
n_classes = # Fashion-MNIST total number of classes
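
Rather than hard-coding the sizes, they can be derived from the arrays. The sketch below uses stand-in arrays with Fashion-MNIST shapes (28x28 images, 10 classes); it assumes integer labels 0..K-1 where the largest label actually occurs, and must run before the labels are one-hot encoded.

```python
import numpy as np

# Stand-in arrays with Fashion-MNIST shapes.
x_demo = np.zeros((4, 28, 28), dtype=np.float32)
y_demo = np.array([0, 3, 9, 1])

n_input_demo = int(np.prod(x_demo.shape[1:]))  # pixels per image: 28 * 28 = 784
n_classes_demo = int(y_demo.max()) + 1         # assumes labels 0..K-1
print(n_input_demo, n_classes_demo)            # 784 10
```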

In [None]:
# one hot encoding of labels
def one_hot_encode(a, length):
    temp = np.zeros((a.shape[0], length))
    temp[np.arange(a.shape[0]), a] = 1
    return temp

y_train = one_hot_encode(y_train, n_classes)
y_test = one_hot_encode(y_test, n_classes)
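
Row `i` of an identity matrix is exactly the one-hot vector for class `i`, so `np.eye(length)[labels]` is an equivalent one-liner; `tf.keras.utils.to_categorical` does the same job in Keras. The sketch checks the equivalence and shows that `argmax` recovers the integer labels. (If you keep integer labels instead, the matching loss is `sparse_categorical_crossentropy`.)

```python
import numpy as np

def one_hot_encode(a, length):
    temp = np.zeros((a.shape[0], length))
    temp[np.arange(a.shape[0]), a] = 1
    return temp

labels = np.array([2, 0, 1])
encoded = one_hot_encode(labels, 3)

# Fancy-indexing the identity matrix gives the same encoding.
print(np.array_equal(encoded, np.eye(3)[labels]))  # True
# argmax along each row recovers the original integer labels.
print(encoded.argmax(axis=1))  # [2 0 1]
```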

In [None]:
# ToDo: define hyperparameters
training_iters = 
learning_rate = 
batch_size = 
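
Whether `training_iters` counts epochs or individual batch updates, it helps to know how the two relate: one epoch consists of `ceil(n_train / batch_size)` updates, the last batch possibly being smaller. The numbers below are illustrative only: 56,000 is 80% of the 70,000 images in the full Fashion-MNIST release (the local file may differ), and the batch size is a hypothetical choice.

```python
import math

n_train = 56_000  # e.g. 80% of the 70,000 images in the full Fashion-MNIST release
batch_size = 128  # hypothetical choice
epochs = 10       # hypothetical choice

steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial
total_updates = epochs * steps_per_epoch
print(steps_per_epoch, total_updates)  # 438 4380
```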

In [None]:
# ToDo: MLP definition
model = tf.keras.models.Sequential([
    
])
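
A simple MLP for this task is a stack such as `tf.keras.layers.Flatten`, `tf.keras.layers.Dense(128, activation='relu')` and `tf.keras.layers.Dense(10, activation='softmax')`, where the hidden width of 128 is only a hypothetical choice. The NumPy sketch below spells out what that stack computes in the forward pass, with random (untrained) weights, and counts its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

batch, n_input, n_hidden, n_classes = 4, 784, 128, 10  # n_hidden is hypothetical

# Random initial weights; training would adjust these.
W1, b1 = rng.normal(0, 0.05, (n_input, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.05, (n_hidden, n_classes)), np.zeros(n_classes)

images = rng.random((batch, 28, 28))  # stand-in for normalized images
x = images.reshape(batch, -1)         # Flatten:           (4, 28, 28) -> (4, 784)
h = relu(x @ W1 + b1)                 # Dense(128, relu):  (4, 784) -> (4, 128)
probs = softmax(h @ W2 + b2)          # Dense(10, softmax): (4, 128) -> (4, 10)

n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, n_params)  # (4, 10) 101770
```

Increasing `n_hidden` or adding layers raises the parameter count and thus the capacity to memorize the training set, which is one way to provoke overfitting.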

In [None]:
# ToDo: define the cost (loss) function
cost = 
# ToDo: define the optimizer
optimizer = 
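
For one-hot labels and softmax outputs, the usual cost is categorical cross-entropy, the batch mean of `-sum_k y_true[k] * log(y_pred[k])`; in Keras it is available as `tf.keras.losses.CategoricalCrossentropy`. The NumPy sketch below computes it by hand; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.2, 0.7, 0.1],
                   [0.5, 0.25, 0.25]])

loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # (-ln 0.7 - ln 0.5) / 2 = 0.5249
```

Only the predicted probability of the true class enters the loss, so confident correct predictions drive it toward zero.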

In [None]:
# ToDo: compile the model with optimizer, loss function and evaluation metric
model.compile(...)
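
With one-hot labels, accuracy compares the index of the largest predicted probability with the index of the true class. Keras typically resolves the string `'accuracy'` in `metrics=[...]` to a matching variant based on the loss; the sketch below shows the categorical case in NumPy with made-up predictions.

```python
import numpy as np

# Made-up one-hot labels and softmax outputs for four samples.
y_true = np.eye(3)[[0, 2, 1, 1]]
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1],   # wrong: predicts class 0, true class is 1
                   [0.1, 0.7, 0.2]])

correct = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
accuracy = float(np.mean(correct))
print(accuracy)  # 0.75
```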

In [None]:
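# ToDo: Train your model and evaluate it on the held-out test set
import mlflow.tensorflow as mltf

# log to the tracking server started above (mlflow server listens on port 5000 by default)
mlflow.set_tracking_uri("http://localhost:5000")
mltf.autolog(every_n_iter=1)

model.fit(...)
model.evaluate(...)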
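
Overfitting shows up when the training loss keeps falling while the validation loss starts to rise. `model.fit(..., validation_data=...)` returns a `History` object whose `.history` dictionary contains `loss` and `val_loss` per epoch. The curves below are hypothetical, made-up numbers in that shape (not real results); the sketch finds the epoch with the lowest validation loss, which marks the point where overfitting begins.

```python
# Hypothetical loss curves shaped like `History.history`; not real results.
history = {
    "loss":     [0.60, 0.42, 0.36, 0.31, 0.27, 0.23, 0.19, 0.16],
    "val_loss": [0.48, 0.41, 0.38, 0.36, 0.37, 0.39, 0.42, 0.46],
}

val_loss = history["val_loss"]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)  # 0-based index
gap = val_loss[-1] - history["loss"][-1]  # widening train/validation gap

print(f"validation loss is lowest after epoch {best_epoch + 1}")  # epoch 4
```

In Keras, the `tf.keras.callbacks.EarlyStopping` callback (with `monitor='val_loss'` and `restore_best_weights=True`) automates this check during training.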