# Introduction
This notebook covers autoencoders on the MNIST dataset. In particular, it builds denoising autoencoders that reduce noise in the data by reconstructing the original image.

What sets an autoencoder apart from common deep-learning architectures is that it needs no label to predict. Instead, the goal is to "predict" the given input values (the features) themselves, so "reconstruct" is the better word. The model is forced to reconstruct the given features through three major changes to the traditional NN structure:
1. Building the model out of two separate models (encoder and decoder).
2. Reducing the given information in the middle of the network (codings).
3. Using the features as label values as well (prediction value is the input).
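
The three points above can be sketched without any deep-learning library. The following NumPy example is only an illustration under simplifying assumptions (a purely linear encoder and decoder, toy data instead of MNIST, plain gradient descent); all names in it are made up for this sketch and do not appear elsewhere in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples with 8 features that really live on a 2-D subspace.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing

# 1. Two separate parts: encoder (8 -> 2) and decoder (2 -> 8).
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

def reconstruction_loss(X, W_enc, W_dec):
    codings = X @ W_enc        # 2. reduced information in the middle (codings)
    X_hat = codings @ W_dec    # reconstruction of the input
    return np.mean((X_hat - X) ** 2)

loss_before = reconstruction_loss(X, W_enc, W_dec)

learning_rate = 0.01
for _ in range(500):
    codings = X @ W_enc
    error = codings @ W_dec - X                  # 3. the target is X itself
    # Gradients of the squared error (up to a constant factor).
    grad_dec = codings.T @ error / len(X)
    grad_enc = X.T @ (error @ W_dec.T) / len(X)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

loss_after = reconstruction_loss(X, W_enc, W_dec)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Note that no label array appears anywhere: the same `X` serves as input and as target.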

You will find these structures in this notebook, along with two variations of denoising autoencoders: one built from a simple deep neural network with a dropout layer (First Model - Deep Denoising Autoencoder), and one built from convolutional layers with a GaussianNoise layer (Second Model - Convolutional Denoising Autoencoder).

Finally, to show what autoencoding can be used for, layers from one of the two trained models are reused to train a new model for a completely different task: classification. This illustrates unsupervised pre-training, an approach to model development for cases where there is not enough labeled data to train a model from scratch.

## My other Projects
If you are interested in a more detailed analysis of the dataset, take a look at my other notebooks on the MNIST dataset:

- cnn-for-digit-recognition-MNIST (accuracy score: 0.99135):
    - https://www.kaggle.com/skiplik/cnn-for-digit-recognition-mnist 
- Finetuning Hyperparameters in Deep Neural Network:
    - https://www.kaggle.com/skiplik/finetuning-hyperparameters-in-deep-neural-network
- Digit Recognition with a Deep Neural Network:
    - https://www.kaggle.com/skiplik/digit-recognition-with-a-deep-neural-network
- Another MNIST Try:
    - https://www.kaggle.com/skiplik/another-mnist-try
- First NN by Detecting Handwritten Characters:
    - https://www.kaggle.com/skiplik/first-nn-by-detecting-handwritten-characters 
- ...

# Imports and Helper Functions

In [None]:
import sys
assert sys.version_info >= (3,5)

# Is this notebook running on Colab or Kaggle?
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from functools import partial
from sklearn.model_selection import train_test_split

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Normalizer

import tensorflow as tf
from tensorflow import keras
from tensorflow import image
from tensorflow import core
from tensorflow.keras import layers
from tensorflow.keras.utils import plot_model

#import mlflow
#from mlflow import log_metric, log_param, log_artifacts
#import mlflow.tensorflow
#from mlflow import pyfunc

assert tf.__version__ >= "2.0"

print(f"Tensorflow Version: {tf.__version__}")
print(f"Keras Version: {keras.__version__}")

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. LSTMs and CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime and select a GPU hardware accelerator.")
    if IS_KAGGLE:
        print("Go to Settings > Accelerator and select GPU.")


tf.random.set_seed(42)
np.random.seed(42)
random_state=42

kaggle = 1 # Kaggle path active = 1

# change your local path here
if kaggle == 1 :
    MNIST_PATH= '../input/digit-recognizer'
else:
    MNIST_PATH= '../autoencoding-mnist/data/input/digit-recognizer'



import os
for dirname, _, filenames in os.walk(MNIST_PATH): 
    for filename in filenames:
        print(os.path.join(dirname, filename))
        

## Get The Data

In [None]:
# Data path and file
CSV_FILE_TRAIN='train.csv'
CSV_FILE_TEST='test.csv'

def load_csv_data(path, csv_file):
    csv_path = os.path.join(path, csv_file)
    return pd.read_csv(csv_path)

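def load_csv_data_manuel(path, csv_file):
    '''Reads the CSV file without pandas and returns its raw lines'''
    csv_path = os.path.join(path, csv_file)
    with open(csv_path, 'r') as file_handle:   # closes the file even if reading fails
        csv_data = file_handle.readlines()
    return csv_data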
    

train = load_csv_data(MNIST_PATH,CSV_FILE_TRAIN)
test = load_csv_data(MNIST_PATH,CSV_FILE_TEST)

### Train Val Split

In [None]:
X_train, X_valid, y_train, y_valid =  train_test_split(train.loc[:, train.columns != 'label'], train['label'], test_size=0.33, random_state=random_state)

### Preparing for Neural Network Structure

In [None]:
X_train = X_train.astype(np.float32) / 255
X_valid = X_valid.astype(np.float32) / 255
X_test = test.astype(np.float32) / 255

In [None]:
print(f"'X_train' shape: {X_train.shape}")
print(f"'X_valid' shape: {X_valid.shape}")
print(f"'X_test' shape: {X_test.shape}")

#### Reshaping Data in Proper Structure

In [None]:
### DEBUG TODO: implement without channel param
def reshape_images(df_images):
    '''Reshaping image structure to a three dimensional tensor'''
    df_images_resh = df_images.values.reshape(df_images.shape[0], 28, 28)
    return df_images_resh

def reshape_conv_images(df_images):
    '''
    Reshaping image structure to a four dimensional tensor for conv networks,
    where channel is last
    '''
    df_images_conv_resh = df_images.values.reshape(df_images.shape[0], 28, 28, 1)
    return df_images_conv_resh
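
To make the two target layouts concrete, here is a small NumPy sketch of the same reshapes on made-up data. The array `flat` is a stand-in for the pixel columns of the CSV (one flat row of 784 values per image), which is an assumption for illustration only.

```python
import numpy as np

# Three fake "images", each stored as one flat row of 784 pixel values.
flat = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

dense_input = flat.reshape(flat.shape[0], 28, 28)      # (samples, height, width)
conv_input = flat.reshape(flat.shape[0], 28, 28, 1)    # (samples, height, width, channels)

print(dense_input.shape)
print(conv_input.shape)

# Row-major reshape: pixel 28 of the flat row becomes row 1, column 0.
assert dense_input[0, 1, 0] == flat[0, 28]
```

The values themselves are untouched; only the way they are indexed changes, and the convolutional variant simply appends a channel axis of length 1.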

In [None]:
# using this data for only dense layer models
X_train_resh = reshape_images(X_train)
X_valid_resh = reshape_images(X_valid)
X_test_resh = reshape_images(X_test)

# using this data for convolutional models
X_train_resh_ch = reshape_conv_images(X_train)
X_valid_resh_ch = reshape_conv_images(X_valid)
X_test_resh_ch = reshape_conv_images(X_test)

y_train_resh =  y_train.values.reshape(y_train.shape[0], 1)
y_valid_resh = y_valid.values.reshape(y_valid.shape[0], 1)

In [None]:
print(f"'X_train_resh' shape: {X_train_resh.shape}")
print(f"'X_valid_resh' shape: {X_valid_resh.shape}")
print(f"'X_test_resh' shape: {X_test_resh.shape}")
print(f"'y_train_resh' shape: {y_train_resh.shape}")
print(f"'y_valid_resh' shape: {y_valid_resh.shape}")

## Plotting Functions

In [None]:
def plot_image(image):
    plt.imshow(image, cmap="binary")
    plt.axis("off")

def show_reconstructions(model, X_valid, n_images=5):
    reconstructions = model.predict(X_valid[:n_images])
    fig = plt.figure(figsize=(n_images * 1.5, 3))

    for image_index in range(n_images):
        plt.subplot(2, n_images, 1 + image_index)
        plot_image(X_valid[image_index])
        plt.subplot(2, n_images, 1 + n_images + image_index)
        plot_image(reconstructions[image_index])

### Plotting One Example Value

In [None]:
print(f"Example label: {y_valid.iloc[1]}")

In [None]:
plot_image(X_valid.iloc[1].values.reshape(28,28))

# Autoencoding Models for Denoising
This chapter presents two denoising autoencoders and the different ways they are built to reduce noise in data. The results of these models are shown in the next chapter, "Results - Denoising Autoencoders".

An autoencoder architecture basically consists of two separate models, the encoder and the decoder. Seen in terms of neurons, the network narrows toward its middle, which forces the information to be compressed: only the really relevant information is kept and the less relevant information is dropped. The encoder narrows down to the codings and the decoder widens again from them, so the two models combined have to recreate the input from the compressed representation, which is the main idea of an autoencoder structure.

## First Model - Deep Denoising Autoencoder 
This model uses fully connected layers and a dropout layer that simulates noise in the data during training. Unlike the model in the following chapter, it uses no convolutional layers or other structures.
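
As a rough illustration of what a dropout layer does to an input image during training, here is a NumPy sketch. It assumes inverted dropout, which is how the Keras Dropout layer behaves (dropped inputs are set to zero and the kept ones are scaled by 1 / (1 - rate)). The helper `dropout_noise` and the uniform grey test image are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout_noise(pixels, rate, rng):
    """Zero a random fraction `rate` of the inputs and rescale the rest."""
    keep_mask = rng.random(pixels.shape) >= rate
    return pixels * keep_mask / (1.0 - rate)

image = np.full((28, 28), 0.8, dtype=np.float32)   # uniform grey test image
noisy = dropout_noise(image, rate=0.5, rng=rng)

dropped_fraction = np.mean(noisy == 0.0)
print(f"fraction of pixels dropped: {dropped_fraction:.2f}")
```

With a rate of 0.5, roughly half of the pixels are blanked out on every training step, and the autoencoder has to fill them in from the surviving ones. At inference time the layer passes its input through unchanged.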

### Encoder Architecture

In [None]:
dropout_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu")           # Codings
])

In [None]:
dropout_encoder.summary()

### Decoder Architecture

In [None]:
dropout_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])

In [None]:
dropout_decoder.summary()
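
The parameter counts reported by the two summaries can be checked by hand. The sketch below assumes the usual formula for a fully connected layer (one weight per input-unit pair plus one bias per unit); Flatten, Dropout and Reshape have no trainable parameters. The helper `dense_params` is hypothetical and only serves this calculation.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Encoder: 784 -> 100 -> 30, decoder: 30 -> 100 -> 784
encoder_params = dense_params(28 * 28, 100) + dense_params(100, 30)
decoder_params = dense_params(30, 100) + dense_params(100, 28 * 28)

print(f"encoder: {encoder_params}, decoder: {decoder_params}")
```

The 30 codings are the narrowest point of the network: each 784-pixel image has to pass through 30 values before it is reconstructed.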

### Building the Model

In [None]:
dropout_ae = keras.models.Sequential([dropout_encoder, dropout_decoder])
dropout_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.5))

### Training and Logging -  Deep Denoising Autoencoder

#### Logging the Model (ML Flow)

In [None]:
#mlflow.tensorflow.autolog()

#### Training the Model

In [None]:
history = dropout_ae.fit(X_train_resh, X_train_resh, epochs= 10, validation_data=(X_valid_resh, X_valid_resh))

#### Losses
The question is: "How well does the model reconstruct the given features/information?" There is no prediction of a label value. The model "predicts", or better, reconstructs its input values as well as it can.
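
To give a feeling for the numbers on the loss curve, the following NumPy sketch computes a per-pixel binary cross-entropy between a target image and two candidate reconstructions. It is a simplified stand-in for the loss used in training, not the Keras implementation itself; the function, the clipping constant and the test images are assumptions made for this example.

```python
import numpy as np

def binary_crossentropy(target, reconstruction, eps=1e-7):
    """Mean per-pixel binary cross-entropy between two images in [0, 1]."""
    reconstruction = np.clip(reconstruction, eps, 1.0 - eps)   # avoid log(0)
    per_pixel = -(target * np.log(reconstruction)
                  + (1.0 - target) * np.log(1.0 - reconstruction))
    return per_pixel.mean()

target = np.zeros((28, 28))
target[10:18, 10:18] = 1.0                    # a white square on a black background

good = np.where(target == 1.0, 0.95, 0.05)    # close to the target
bad = np.full_like(target, 0.5)               # uninformative grey image

loss_good = binary_crossentropy(target, good)
loss_bad = binary_crossentropy(target, bad)
print(f"good: {loss_good:.4f}, bad: {loss_bad:.4f}")
```

A reconstruction that stays close to the input yields a small loss, while an image that ignores the input is penalized on every pixel.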

In [None]:
# Getting in touch with the model statistics 
# shows the available keys in models history:  print(history.history.keys())
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

## Second Model - Convolutional Denoising Autoencoder
The second model is built on a convolutional structure. It combines several Conv2D layers with MaxPool2D layers to form the base structure. Instead of a Dropout layer (as in the first model), a GaussianNoise layer creates the noise in the data this time.

### Convolutional Encoder Architecture  

In [None]:
conv_encoder = keras.models.Sequential([
    keras.layers.GaussianNoise(0.1, input_shape = [28, 28, 1]),
    keras.layers.Conv2D(16, kernel_size=3, activation='selu'),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="selu"),
    keras.layers.MaxPool2D(pool_size=4),
    keras.layers.Dense(392, activation="relu")           # Codings
])

In [None]:
conv_encoder.summary()
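
The spatial sizes in the summary can be reproduced with a little arithmetic. This sketch assumes stride-1 convolutions and non-overlapping pooling with the default "valid" padding, which matches the layer arguments used above; the two helper functions are hypothetical.

```python
def conv_valid(size, kernel):
    """Output length of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1

def max_pool(size, pool):
    """Output length of non-overlapping max pooling (stride equal to pool size)."""
    return (size - pool) // pool + 1

size = 28
size = conv_valid(size, kernel=3)   # Conv2D(16, kernel_size=3)
size = max_pool(size, pool=2)       # MaxPool2D(pool_size=2)
# Conv2D(32, ..., padding="same") keeps the spatial size unchanged
size = max_pool(size, pool=4)       # MaxPool2D(pool_size=4)

codings_values = size * size * 392  # the final Dense(392) acts on the channel axis
print(f"spatial size of the codings: {size} x {size}, values: {codings_values}")
```

Because the Dense layer is applied to a 4-D tensor, it only changes the last (channel) axis, so the codings have shape 3 x 3 x 392. That is 3528 values per image, which is more than the 784 input pixels.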

### Convolutional Decoder Architecture

In [None]:
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding="valid",activation="selu", input_shape=[3, 3, 392]),
    keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="same",activation="selu"),
    keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="same", activation="sigmoid"),
    keras.layers.Reshape([28, 28, 1])
])

In [None]:
conv_decoder.summary()
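
The decoder has to grow the 3 x 3 codings back to 28 x 28. The sketch below reproduces the spatial sizes under the assumption of no output padding and no dilation, using the length formula for transposed convolutions as I understand it from Keras; the helper `conv_transpose` is hypothetical.

```python
def conv_transpose(size, kernel, stride, padding):
    """Output length of a transposed convolution (no output padding, no dilation)."""
    if padding == "same":
        return size * stride
    return size * stride + max(kernel - stride, 0)   # padding == "valid"

size = 3
sizes = [size]
for padding in ("valid", "same", "same"):            # the three Conv2DTranspose layers
    size = conv_transpose(size, kernel=3, stride=2, padding=padding)
    sizes.append(size)

print(sizes)
```

The "valid" padding in the first layer is what turns 3 into 7 rather than 6, so that two further doublings land exactly on 28.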

### Building the Model - Deep Convolutional Denoising Autoencoder

In [None]:
conv_denoising_ae = keras.models.Sequential([conv_encoder, conv_decoder])
conv_denoising_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.5))

In [None]:
conv_denoising_ae.summary()

### Training and Logging - Deep Convolutional Denoising Autoencoder

#### Training the Model

In [None]:
history = conv_denoising_ae.fit(X_train_resh_ch, X_train_resh_ch, epochs= 10, validation_data=(X_valid_resh_ch, X_valid_resh_ch))


In [None]:
# saving the model so its weights can be reused later on
conv_denoising_ae.save('conv_denoising_ae')

In [None]:
# Probably not working on Kaggle
plot_model(conv_denoising_ae, 'conv_autencoder.png', show_shapes=True)

# Results - Denoising Autoencoders
The "Results" chapter first visualizes the results of the models (First Model - Deep Denoising Autoencoder and Second Model - Convolutional Denoising Autoencoder) by showing the original digit next to its reconstruction, without any noise added to the input. This is followed by a test on noisy data, where a noisy digit is the input to the model and a denoised, reconstructed digit is its result ("prediction").

## Results - Deep Denoising Autoencoder

### Original Digit

In [None]:
plot_image(X_valid_resh[10])

### ("Predicted") Reconstructed Digit

In [None]:
testimage = X_valid_resh[10].reshape(1, 28, 28)     # single image prepared for the test (reshape)
plot_image(dropout_ae.predict(testimage)[0, :, :])  # drop the batch axis to get the 28x28 image for plotting

### Showing Multiple Results

In [None]:
show_reconstructions(dropout_ae, X_valid_resh)

### Using Noised Images

#### Creating Noisy Data


In [None]:
new_images = X_test[5:10]
new_images = reshape_images(new_images)
noisy_images = new_images + np.random.randn(5,28,28) *0.1
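
One side effect of adding Gaussian noise is that some pixel values leave the [0, 1] range the images were scaled to. The following sketch shows this on random stand-in data (not the actual test digits) and, as an optional step that the notebook itself does not perform, clips the result back into range.

```python
import numpy as np

rng = np.random.default_rng(42)

clean = rng.random((5, 28, 28)).astype(np.float32)        # stand-in for scaled digits in [0, 1]
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)    # same noise scale as in the cell above

print(f"noisy range:   [{noisy.min():.2f}, {noisy.max():.2f}]")

clipped = np.clip(noisy, 0.0, 1.0)                        # optional: force values back into [0, 1]
print(f"clipped range: [{clipped.min():.2f}, {clipped.max():.2f}]")
```

Whether clipping helps depends on the model. It mainly keeps the plotted images and the model inputs within the same value range as the training data.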

#### Original Digit


In [None]:
idx_testnumber = 0      # Index of test digit for the next plottings 
plot_image(new_images[idx_testnumber])

#### Noisy Digit

In [None]:
plot_image(noisy_images[idx_testnumber])

#### Denoised Digit

In [None]:
# Important: add a batch axis, because the model expects a batch of images
plot_image(dropout_ae.predict(noisy_images[np.newaxis ,idx_testnumber]).reshape(28,28)) 

#### Multiple Digits in Comparison

In [None]:
show_reconstructions(dropout_ae, noisy_images)

## Results - Convolutional Denoising Autoencoder

### Using Noised Images

#### Creating Noisy Data

In [None]:
new_images_conv = X_test[5:10]
new_images_conv = reshape_conv_images(new_images_conv)
noisy_images_conv = new_images_conv + np.random.randn(5, 28, 28, 1) *0.1

#### Original Digit


In [None]:
idx_testnumber = 0      # Index of test number for the next plottings 
plot_image(new_images_conv[idx_testnumber])

#### Noisy Digit

In [None]:
plot_image(noisy_images_conv[idx_testnumber])

#### Denoised Digit

In [None]:
# Important: add a batch axis, because the model expects a batch of images
plot_image(conv_denoising_ae.predict(noisy_images_conv[np.newaxis ,idx_testnumber]).reshape(28,28))

#### Multiple Digits in Comparison

In [None]:
show_reconstructions(conv_denoising_ae, noisy_images_conv)

# Third Model - Unsupervised Pre-Training for Supervised Training
This chapter covers a third model whose task is to predict the digit shown in an MNIST image. The architecture demonstrates how unsupervised pre-training can be used in projects that do not have enough training data to train a model properly. Here, layers from the already trained Convolutional Denoising Autoencoder (second model) are reused to build a classification model.

In [None]:
# Loading untouched model weights
# Reason: as far as I could observe, training modifies the weights not only of the newly created model object but also of the original model held in memory, because the layers are shared.
# Therefore the model is loaded fresh from storage here, so that multiple new training runs can be started without being influenced by previous ones (reproducible training).
conv_denoising_ae_clone = tf.keras.models.load_model("conv_denoising_ae")

In [None]:
ae_model_id = 0 # 0 for encoder / 1 for decoder
layerscount = 5 # how many layers of the model ...


new_model = keras.models.Sequential(conv_denoising_ae_clone.layers[ae_model_id].layers[:layerscount])
new_model.trainable =True  # Set to True first so that individual layers can be set to False afterwards

In [None]:
new_model.summary()

### Making the Layers Trainable / Not-Trainable
The layers that were already trained in the unsupervised way above need to be frozen so that only the new layers for the new task (classification) are trained.

In [None]:
# Make the layers not trainable 
for layer in new_model.layers[:layerscount -1 ]:              # the last reused layer stays trainable
    layer.trainable = False
    print(f"Layer '{layer.name}' is Trainable = {layer.trainable}")

Adding the layers for the new task

In [None]:
new_model.add(keras.layers.Flatten())
new_model.add(keras.layers.Dense(10, activation="softmax"))

Viewing the final layer structure and each layer's trainable state

In [None]:
for layer in new_model.layers:
    print(f"Layer '{layer.name}' trainable state = {layer.trainable}")

In [None]:
new_model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.Nadam(), metrics=["accuracy"]) ##keras.optimizers.SGD(learning_rate=0.0003)

In [None]:
new_model.summary()

In [None]:
plot_model(new_model, 'autencoder_base_w_classifier.png', show_shapes=True)

In [None]:
new_model.save('class_net_01')

## Creating Training Data for Unsupervised Pre-Training Model
In this part, 500 data points are selected for training a normal backpropagation model on top of the unsupervised pre-trained layers.

Additionally, the remaining data points of the validation set are used for testing the model's accuracy.

In [None]:
#DEBUG TODO: Using the reshape_conv_image function here or better the already converted dataframe set.

X_train_pret_resh_500 = X_train_resh_ch[:500]
X_valid_pret_resh_500 = X_valid_resh_ch[:500]

X_valid_pret_resh_r = X_valid_resh_ch[500:]   # Remaining validation digits (everything after the first 500) for testing on unseen data


y_train_pretrained_500 = y_train_resh[:500]
y_valid_pretrained_500 = y_valid_resh[:500]

y_valid_pretrained_r = y_valid_resh[500:]

## Training and Logging

#### Tensorboard Logging

In [None]:
# Tensorboard logging structure function
root_logdir = "../../tensorboard-logs"

def get_run_logdir(root_logdir, project):
    '''
    Returns logdir to the Tensorboard log for a specific project.

            Parameters:
                    root_logdir (str) : basic logdir from Tensorboard
                    project (str): projectname that will be logged in TB

            Returns:
                    os.path (str): Path to the final logdir
    '''
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    project_logdir = os.path.join(root_logdir,project)
    return os.path.join(project_logdir, run_id)


tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=get_run_logdir(root_logdir,"autoencoding_mnist"), histogram_freq=1)


#### Training the Pre-Trained Model

In [None]:
history_pretrainted = new_model.fit(X_train_pret_resh_500, y_train_pretrained_500, epochs=20,
                                        validation_data=(X_valid_pret_resh_500, y_valid_pretrained_500),
                                        callbacks=[tensorboard_callback]   # passed as a list of callbacks
                                    )


### Loss Plotting

In [None]:
plt.plot(history_pretrainted.history['loss'])
plt.plot(history_pretrainted.history['val_loss'])

## Predicting the Digit on a Pre-Trained Autoencoder Based Model

The prediction test on unseen data uses the validation dataset. The model was trained on the first 500 digits of the train set and validated on the first 500 digits of the validation set. That leaves enough data in the validation set to test the model on unseen digits (13,360 in total). The rest of the train set would work as well, but the validation set is the more natural choice.

First comes a visual test, in which the test digit is plotted and its prediction is printed.

In [None]:
idx_testnumber_cls_ae = 8      # Index of test number for the next plottings 
plot_image(X_valid_pret_resh_r[idx_testnumber_cls_ae])

In [None]:
probabilities = new_model.predict(X_valid_pret_resh_r[np.newaxis, idx_testnumber_cls_ae])   # predict once, reuse the result
print(f"Prediction based on the picture above: {np.argmax(probabilities)}")
print(f"Class probabilities of the prediction: {probabilities}")

Next, the accuracy is determined on the remaining 13,360 digits of the validation dataset, which shows the overall accuracy of the model after pre-training and classification training.

In [None]:
y_val_pred_r = new_model.predict(X_valid_pret_resh_r)
y_val_pred_r = y_val_pred_r.argmax(axis = 1)[:,None]

In [None]:
from sklearn.metrics import accuracy_score
acc_score_val = accuracy_score(y_valid_pretrained_r, y_val_pred_r)

print(f"Unsupervised pre-trained model, trained on 500 digits for only 20 epochs, predicts {y_val_pred_r.shape[0]} digits with an accuracy of: {acc_score_val}")
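
The step from softmax outputs to an accuracy value can be followed on a tiny example. The probabilities and labels below are invented for illustration (4 images, 3 classes instead of 10), and the accuracy is computed with plain NumPy rather than scikit-learn.

```python
import numpy as np

# Hypothetical softmax outputs for 4 images and 3 classes (rows sum to 1).
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
true_labels = np.array([[1], [0], [2], [0]])          # column vector, like y_valid_resh

predicted = probabilities.argmax(axis=1)[:, None]     # most likely class per row, kept as a column
accuracy = np.mean(predicted == true_labels)

print(predicted.ravel(), accuracy)
```

Keeping both arrays in the same shape matters here: comparing a flat array of shape (n,) with a column of shape (n, 1) would broadcast to an n x n matrix and give a misleading mean.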