# **Landscape Generation**
## **Project for the course of Deep Learning 2022-2023**

**Author**: Jahrim Gabriele Cesario

**GitHub Repository**: [link](https://github.com/jahrim/dl-project)

**PowerPoint Presentation (Summary):** [link](https://github.com/jahrim/dl-project/blob/master/docs/landscape-generation.pptx)

---

## Introduction

In this section, the content of this notebook will be briefly introduced.

### Goal

The goal of this project is to train a model capable of generating images representing different kinds of landscapes. In other words, the final result
of this project should be a **Landscape Generator**.

### Dataset

The training will be based on the [Landscape Recognition Dataset](https://www.kaggle.com/datasets/utkarshsaxenadn/landscape-recognition-image-dataset-12k-images) available on [Kaggle](https://www.kaggle.com/), which contains 12,000 images of **Coasts**, **Deserts**, **Forests**, **Glaciers** and **Mountains**, so the model will only be able to generate images of these landscape types.

### Strategy

The architecture chosen for implementing the **Landscape Generator** will be
that of a **Conditional Variational AutoEncoder (CVAE)**, made of two main
components:
- **Encoder**: mapping original images of landscapes into a compressed latent space;
- **Decoder**: mapping points of the latent space into generated images of landscapes. This part of the model will play the role of **Landscape Generator**.

To make it **Conditional**, the **Encoder** and **Decoder** will be trained with
knowledge about the type of landscape they are processing, so that during prediction it will be possible to use the **Decoder** to generate a specific landscape type decided by the user.

More formally, the **Decoder** will be a function $D$ with the following
specification:

$D(z,y) = x$

where

$z$ : is a point in the latent space $Z$

$y$ : is a point representing one of the possible landscape types in $Y=\{Coast, Desert, Forest, Glacier, Mountain\}$

$x$ : is a point representing the generated image.

#### **Example**

Consider the latent space $Z$ to be **bidimensional** and the representation of
$Y$ to be the following:

| Landscape | y         |
|-----------|-----------|
| Coast     | 0         |
| Desert    | 1         |
| Forest    | 2         |
| Glacier   | 3         |
| Mountain  | 4         |

Then, $D$ should yield a behavior similar to the following:

$D((0,0), 0) =$

![Coast Image](https://drive.google.com/uc?export=view&id=1wLylelAmZ7DtqevFF0A-77nBYj-SbRs2)

$D((0,0), 1) =$

![Desert Image](https://drive.google.com/uc?export=view&id=151PmJtSqxIZ_VT_ghZc15SxFp216zc4l)
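
The example above can be made more concrete by looking at what the **Decoder** receives as input. The sketch below assumes, purely for illustration, that the landscape type is passed to the decoder by concatenating the latent point $z$ with a one-hot encoding of $y$. The helper names `one_hot` and `decoder_input` are hypothetical and are not the functions used later in this notebook.

```python
def one_hot(label_id, n_labels):
    """Return a one-hot list for 'label_id' among 'n_labels' landscape types."""
    return [1.0 if i == label_id else 0.0 for i in range(n_labels)]

def decoder_input(z, label_id, n_labels=5):
    """Hypothetical helper: concatenate a latent point with its one-hot label."""
    return list(z) + one_hot(label_id, n_labels)

# Same latent point, two different landscape types (0: Coast, 1: Desert)
coast_input = decoder_input((0.0, 0.0), 0)
desert_input = decoder_input((0.0, 0.0), 1)
print(coast_input)
print(desert_input)
```

With this assumption, the two inputs share the same latent coordinates and differ only in the label part, which is what allows the user to choose the landscape type at generation time.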

### Evaluation

In order to evaluate the quality of the trained **Landscape Generator**,
two main metrics will be considered:
- **Loss**: which is a weighted sum of:

    - **Reconstruction Error**: the error committed by the **CVAE** when
      attempting to reconstruct the training, validation and test images.

      A low **Reconstruction Error** tells that the model has learned to
      correctly encode images into the latent space and to correctly decode
      points from the latent space into images;

    - **Regularization Error**: the error committed by the **CVAE** when
      encoding the training, validation and test images into an irregular
      latent space.

      A low **Regularization Error** tells that the model has learned to
      correctly encode images into a regular latent space, that is both
      **complete** (i.e. most latent space points can be decoded into a
      meaningful generated image) and **continuous** (i.e. similar points
      in the latent space are decoded into similar generated images).

- **Generation Quality**: an indicator of how simple it is to identify the
  images generated by the **Landscape Generator** as the correct landscape
  type.

  The **Generation Quality** will be measured as the **accuracy** achieved
  by a **Landscape Classifier** trained on the same dataset when applied to a
  batch of images generated by the **Landscape Generator**.

  A high **accuracy** of such a classifier indicates that the model has learned
  a correct representation for all the possible landscape types.
  
  However, the **accuracy** won't measure the variance in the possible landscapes generated by the **Landscape Generator** (e.g. the generator may produce images of forests that are all classified as forests, but they all look the same).
  
  In other words, another quality metric should be introduced to consider also the variance in the generated images as a metric of **Generation Quality**.
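
To illustrate the **Loss** described in the first bullet above, here is a minimal sketch of a weighted sum of the two errors. It assumes the mean squared error as **Reconstruction Error** and the KL divergence from a standard normal distribution as **Regularization Error**, which are common choices for variational autoencoders. The exact terms and weights used by this notebook are not stated in this section, so the function names and the default weights below are assumptions.

```python
import math

def reconstruction_error(x, x_hat):
    """Mean squared error between a flattened image and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def regularization_error(mean, log_var):
    """KL divergence between N(mean, exp(log_var)) and N(0, 1), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mean, log_var))

def cvae_loss(x, x_hat, mean, log_var, reconstruction_weight=1.0, regularization_weight=1.0):
    """Weighted sum of the reconstruction and regularization errors (weights are assumptions)."""
    return (reconstruction_weight * reconstruction_error(x, x_hat)
            + regularization_weight * regularization_error(mean, log_var))

# A latent distribution equal to N(0, 1) is perfectly regular, so only
# the reconstruction term contributes to the loss.
loss = cvae_loss(x=[0.0, 1.0], x_hat=[0.0, 0.0], mean=[0.0, 0.0], log_var=[0.0, 0.0])
print(loss)
```

Raising `regularization_weight` in this sketch favors a more regular latent space at the cost of reconstruction fidelity, and vice versa.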

---

## Notebook Configuration

In this section, the user can configure the major settings for executing this
notebook.

In [None]:
# If 'False', use default values for skipping user interactions
# with the notebook when possible
is_interactive = False

# If 'True', use verbose logger for tensorflow or other dependencies
verbose = True

# The random seed to make this notebook deterministic. Set to 'None' to make it non-deterministic.
seed = 1234

---

## Environment Configuration

In this section, the environment of the host machine will be configured, installing and importing the dependencies required to execute this notebook.

### Install Dependencies

Let's install the dependencies required to execute this notebook on the host
machine. To do so, it is possible to use a combination of the **apt-get** and
**pip** package managers.

In [None]:
!apt-get update
!apt-get -y install graphviz
!pip install h5py matplotlib opendatasets pandas tensorflow-datasets pydot pyyaml

### Show Dependencies

In order to list the dependencies installed in the host machine, it is possible
to run the `pip freeze` command. This may prove useful to verify version compatibilities between several dependencies.

In [None]:
!pip freeze

### Import Dependencies

Once the dependencies are installed in the host machine, it is possible to
leverage them in this notebook by importing them.

In particular, **Keras (Tensorflow)** will be used as the machine learning framework
for training the **Landscape Generator**.

In [None]:
from datetime import datetime
from matplotlib import pyplot as plt
import opendatasets as od
import os
import pandas as pd
import random
import shutil
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as tfhub
import tensorflow.keras as keras
import tensorflow.keras.backend as K
import tensorflow.keras.layers as layers
import time

### Configure Dependencies

Finally, the imported dependencies can be configured for this specific notebook.

In [None]:
# If verbose, use tensorflow in DEBUG mode
if verbose:
    os.environ['TF_CPP_MIN_VLOG_LEVEL'] = '2'
    tf.keras.utils.enable_interactive_logging()

In [None]:
# If a random seed is given (including 0), use it globally
if seed is not None:
    random.seed(seed)
    tf.random.set_seed(seed)

### GPU Support

Another important step during the preparation of the environment is to verify
that the host machine is capable of executing code using its GPU, which is
crucial during training.

First, let's display some information about the GPU of the host machine.

In [None]:
print("Host GPU Support:")
!nvidia-smi

# For more details use:
# !nvidia-smi -q

Then, let's verify that **Tensorflow** is properly configured to run on such
GPU.

If it is, the GPU of the host machine should be shown amongst the available
GPUs detected by **Tensorflow**.

In [None]:
print("Tensorflow GPU Support:")
print("- Physical GPUs Available:", tf.config.list_physical_devices('GPU'))
print("- Logical GPUs Available:", tf.config.list_logical_devices('GPU'))
print("- GPU Used:", tf.test.gpu_device_name())
print("- GPU Support:", tf.test.is_built_with_gpu_support())
print("- Cuda:", tf.test.is_built_with_cuda())
print("- ROCm:", tf.test.is_built_with_rocm())
print("- XLA:", tf.test.is_built_with_xla())

### Prepare Directories

Here, all the directories that will be used during the execution
of the notebook are created.

In particular, the following directories will be created:
- **Dataset**: the directory where the training, validation and test sets
  will be stored;
- **Cache**: the directory where **Tensorflow** will store intermediate
  processing of the dataset to achieve faster performance during training;
- **Images**: the directory where **Tensorflow** will store the images
  of the computational graphs of the trained models;
- **Model**: the directory where **Tensorflow** will save the models
  trained during the execution of this notebook.

The paths to these directories can be changed in the code below, by
modifying the `directories` dictionary.

In [None]:
directories = {
    "dataset": "./.dataset/",
    "cache": "./.cache/",
    "images": "./.images/",
    "model": "./.model/",
}

If the dataset is already present, the notebook will avoid re-downloading it
from Kaggle.

In [None]:
dataset_directory_already_exist = os.path.isdir(directories['dataset'])

If the dataset has changed since the previous run of this notebook,
the user should delete its cache, to recompute its records.

In [None]:
if os.path.isdir(directories['cache']):
    delete_cache = input("Delete cache? (Y/n)") if is_interactive else ""
    if not(delete_cache and delete_cache[0].lower() == 'n'):
        print("Deleting cache...")
        shutil.rmtree(directories['cache'])
        print("Cache deleted.")

Finally, let's create the directories of the host environment.

In [None]:
print("Creating directories...")
for name in directories:
    try:
        os.mkdir(directories[name])
        print(directories[name], "directory created.")
    except FileExistsError:
        print(directories[name], "directory already exists.")
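
As an alternative to catching `FileExistsError`, the same result can be obtained with `os.makedirs` and its `exist_ok` flag, which also creates missing parent directories. The sketch below uses a temporary directory and a hypothetical helper named `ensure_directories`, so that it can be run without touching the directories of this notebook.

```python
import os
import tempfile

def ensure_directories(paths):
    """Create every directory in 'paths' (parents included), skipping existing ones."""
    for path in paths:
        os.makedirs(path, exist_ok=True)

with tempfile.TemporaryDirectory() as root:
    demo_directories = [
        os.path.join(root, ".dataset"),
        os.path.join(root, ".model", "checkpoints"),  # nested path, parent created too
    ]
    ensure_directories(demo_directories)
    ensure_directories(demo_directories)  # calling it again does not raise
    created = [os.path.isdir(path) for path in demo_directories]
    print(created)
```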

### Download Dataset

Finally, as the last step of the environment configuration, let's download the
dataset on which the model will be trained.

First, we can define a utility function for moving and renaming files from
the download directory to the dataset directory.

In [None]:
def index_files(path, moveTo = None, i = 0):
    """Renames all files in the specified directory to an increasing index.

    Description:
    Renames all files inside the 'path' directory to an increasing index,
    starting from 'i', optionally moving them to the 'moveTo' directory.
    """
    if moveTo is None: moveTo = path
    for filename in os.listdir(path):
        name, extension = os.path.splitext(filename)
        os.rename(path + "/" + filename, moveTo + "/" + str(i) + extension)
        i += 1
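
To show how this kind of indexing behaves, here is a small self-contained sketch that works on temporary directories. It uses a hypothetical variant named `index_files_sorted`, which differs from the function above in two ways: it visits the files in sorted order (since `os.listdir` returns entries in arbitrary order) and it builds paths with `os.path.join`.

```python
import os
import tempfile

def index_files_sorted(path, move_to=None, start=0):
    """Rename the files in 'path' to increasing indices, visiting them in sorted order."""
    if move_to is None:
        move_to = path
    for offset, filename in enumerate(sorted(os.listdir(path))):
        _, extension = os.path.splitext(filename)
        new_name = str(start + offset) + extension
        os.rename(os.path.join(path, filename), os.path.join(move_to, new_name))

with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as target:
    # Create two empty files standing in for downloaded records
    for name in ("b.tfrecord", "a.tfrecord"):
        open(os.path.join(source, name), "w").close()
    index_files_sorted(source, move_to=target, start=10000)
    moved = sorted(os.listdir(target))
    remaining = os.listdir(source)
    print(moved, remaining)
```

Starting each group of files from a different index, as done with `start=10000` here, keeps the names unique when several source directories are moved into the same target directory.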

Then, it is possible to leverage the [opendatasets](https://pypi.org/project/opendatasets/) library to download the [Landscape Recognition Dataset](https://www.kaggle.com/datasets/utkarshsaxenadn/landscape-recognition-image-dataset-12k-images) from [Kaggle](https://www.kaggle.com/). In order to do so, a **Kaggle Account** is required. In fact, **opendatasets** asks the user to enter their **Kaggle Username** and **API Token** when downloading Kaggle datasets (refer to [Kaggle Authentication](https://www.kaggle.com/docs/api#getting-started-installation-&-authentication) for more information).

When successfully authenticated, **opendatasets** will create a download
directory and it will start downloading the dataset. The download directory will contain all the files of the dataset hosted on
Kaggle, which includes different formats for representing the data of the dataset, such as [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord)s or JPEGs, together with the best classifier trained on the dataset,
namely [BiT-LR-91-83.h5](https://www.kaggle.com/datasets/utkarshsaxenadn/landscape-recognition-image-dataset-12k-images).

In this notebook, we'll use **TFRecords**, so all **TFRecords** are moved from the download directory to the dataset directory, while **BiT-LR-91-83.h5** is
moved to the current directory and renamed to **landscape-classifier.h5**. Finally, the download directory is deleted.

If the dataset directory was already present when running the notebook, it is assumed that the dataset has already been downloaded and the download is skipped.

In [None]:
download_directory = "./landscape-recognition-image-dataset-12k-images"
download_dataset_directory = download_directory + "/Landscape Classification/Landscape Classification/TFrecords/"

if not dataset_directory_already_exist:
  print("Downloading dataset...")
  od.download("https://www.kaggle.com/datasets/utkarshsaxenadn/landscape-recognition-image-dataset-12k-images")
  index_files(path = download_dataset_directory + "Train", moveTo = directories['dataset'], i = 0)
  index_files(path = download_dataset_directory + "Valid", moveTo = directories['dataset'], i = 10000)
  index_files(path = download_dataset_directory + "Test", moveTo = directories['dataset'], i = 11500)
  os.rename(download_directory + "/BiT-LR-91-83.h5", "./landscape-classifier.h5")
  shutil.rmtree(download_directory)
else:
  print("Skipping dataset download: dataset directory already exists.")

---

## Utilities

In this section, a set of utilities that will be used within the notebook is defined.

### Boolean Input

Here, a function to ask the user for a boolean input is provided.

In [None]:
def boolean_input(question):
    """Ask the specified yes/no 'question' to the user."""
    if not is_interactive: print(question)
    answer = input(question + ' (y/N)\n') if is_interactive else ""
    boolean_answer = bool(answer) and answer[0].lower() == 'y'
    if boolean_answer:
        print("Answer: [Yes]")
    else:
        print("Answer: [No]")
    return boolean_answer

### Count Records

Here, a function that counts the records of a **tf.data.Dataset** is provided, for cases in which its size is unknown.

In [None]:
def count_records(dataset):
    """Count the number of records contained by the specified 'dataset'."""
    dataset_size = 0
    for record in dataset: dataset_size = dataset_size + 1
    return dataset_size

### Log

Here, a logging function to customize the logs within this notebook is provided.

In [None]:
def log(*args):
    """Print the specified sequence of objects."""
    prefix = ["[" + str(datetime.utcnow()) + "]:"]
    print(*(prefix + [*args]))

### Plot History

Here, a function to plot the history of a training session is provided.

In [None]:
def plot_history(history, metrics=(), skip=0):
    """
    Plot the loss history of a training session.
    'metrics' can be used to specify additional metrics to plot.
    'skip' can be used to specify how many epochs will be skipped
    from the start of the training in the plot.
    """
    fig, ax1 = plt.subplots(figsize=(10, 8))

    loss = history.history['loss'][skip:]
    epoch_count=len(loss)
    epochs=range(1+skip,1+skip+epoch_count)

    line1,=ax1.plot(epochs,loss,label='loss',color='orange')
    ax1.set_xlim([1, epochs[-1]])
    ax1.set_ylim([0, max(loss)])
    ax1.set_ylabel('loss',color = line1.get_color())
    ax1.tick_params(axis='y', labelcolor=line1.get_color())
    ax1.grid(True)
    ax1.set_xlabel('Epochs')

    for metric_name in metrics:
        metric=history.history[metric_name][skip:]
        line2,=ax1.plot(epochs,metric,label=metric_name)
        ax1.set_ylim([0, max(ax1.get_ylim()[1], max(metric))])

    _=ax1.legend(loc='best')

---

## Preprocessing

In this section, the dataset will be prepared for training using different
preprocessing techniques.

### Configuration

In the code below, the user can configure how the **preprocessing** should be
performed.

In [None]:
cropped_image_size = [256, 256, 3]   # The size of the dataset images after cropping
resized_image_size = [128, 128, 3]   # The size of the dataset images after resizing
image_size = resized_image_size      # The size of the dataset images after preprocessing

### Load Dataset

First, an in-memory view of the dataset should be created. In order to do so, we'll rely on
[tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset)s.

In particular, the [tf.data.TFRecordDataset](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset) can be used to reference **TFRecord**s and to prepare a processing pipeline for the dataset, without loading
its actual content in memory until training.

Here, a **TFRecordDataset** is created from the list of **TFRecord** files that
were previously moved to the dataset directory.

First, let's collect the dataset files in a list.

In [None]:
dataset_files = list(map(lambda f: os.path.join(directories["dataset"], f), os.listdir(directories["dataset"])))
dataset_size = len(dataset_files)
log("Dataset Size:", dataset_size)

Then, let's shuffle the dataset files once to ensure that the following code
won't rely on the order in which the data is presented.

> _**Note**: since shuffling a **tf.data.Dataset** requires loading a buffer of the data of the dataset, a proper full shuffle of the dataset can only occur when the whole dataset can be loaded in memory. At this point of the notebook, the dataset consists only of a list of filenames, so a full shuffle can be easily performed._

In [None]:
random.shuffle(dataset_files)
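
The interaction between the random seed and this shuffle can be sketched as follows. The helper `shuffled_copy` is hypothetical: it sorts the filenames before shuffling (an addition of this sketch, since `os.listdir` returns entries in arbitrary order) and uses a private `random.Random` generator, so that the result depends only on the seed.

```python
import random

def shuffled_copy(items, seed):
    """Return a shuffled copy of 'items' that depends only on their content and on 'seed'."""
    result = sorted(items)     # remove any dependency on the listing order
    rng = random.Random(seed)  # private generator, the global state is untouched
    rng.shuffle(result)
    return result

files = ["{}.tfrecord".format(i) for i in range(10)]
first = shuffled_copy(files, seed=1234)
second = shuffled_copy(reversed(files), seed=1234)
print(first == second)
```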

Finally, let's create a **TFRecordDataset** from the dataset files.

In [None]:
dataset = tf.data.TFRecordDataset(dataset_files)

### Parse Dataset

At this point, each record of the dataset is represented as the raw content of the corresponding **TFRecord**, which is akin to a binary format.

Before the dataset can be used, its records should be parsed into a format that is suitable for training.

The **TFRecord**s of the [Landscape Recognition Dataset](https://www.kaggle.com/datasets/utkarshsaxenadn/landscape-recognition-image-dataset-12k-images)
contain two **features** per record:
- **image**: the string encoding of a 256x256x3 JPEG image;
- **label**: an integer indicating the type of landscape represented in the
  image (0: Coast, 1: Desert, 2: Forest, 3: Glacier, 4: Mountain).

In [None]:
# The features of a TFRecord
features = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'image': tf.io.FixedLenFeature([], tf.string),
}

# A map from label_ids to label_names (and vice versa)
label_names = { 0: 'Coast', 1: 'Desert', 2: 'Forest', 3: 'Glacier', 4: 'Mountain' }
label_ids = { name: id for id, name in label_names.items() }
label_count = len(label_names)

def label_id_of(label):
    """Return the id of the specified 'label', which is either its name or id."""
    return label_ids[label] if isinstance(label, str) else label

With `tf.io.parse_single_example` it is possible to convert each **TFRecord**
into a dictionary of its parsed content.

Here, `preprocess_record` does exactly that, producing for each **TFRecord** a
pair of **(Object, Label)** (i.e. **(x, y)**), which is standard when dealing with **tf.data.Datasets** in **Keras**.

In [None]:
def preprocess_record(tfrecord):
    """Decode the specified raw TFRecord into a pair of (object, label)"""
    xyi = tf.io.parse_single_example(tfrecord, features)
    return (xyi['image'], xyi['label'])

In [None]:
dataset = dataset.map(preprocess_record, num_parallel_calls=tf.data.AUTOTUNE)

### Filter Dataset

For additional flexibility, it is also possible to exclude certain landscape types from the dataset, in order to analyze the performance of the model when the variance of the records in the dataset is reduced.

First, let's define some filters that extract only the images representing specific landscape types from the dataset.

In [None]:
def has_label(label):
    """Filter out all the records without the specified 'label'."""
    label_id = label_id_of(label)
    def apply(xi, yi): return tf.argmax(yi) == label_id if tf.rank(yi) == 1 else yi == label_id
    return apply

def has_not_label(label):
    """Filter out all the records with the specified 'label'."""
    has_label_apply = has_label(label)
    def apply(xi, yi): return not has_label_apply(xi, yi)
    return apply

Then, the user can decide which landscape types will be excluded from the
original dataset.

In [None]:
excluded_labels = []
for label_name in label_names.copy().values():
    if boolean_input("Exclude " + label_name + "s from the dataset?"):
        dataset = dataset.filter(has_not_label(label_name))
        excluded_labels.append(label_name)

If any landscape type has been excluded, then we need to recompute the
size of the dataset accordingly. Since the landscape types in the dataset are equally balanced, this is as easy as multiplying its original size by the fraction of included landscape types.

In [None]:
if excluded_labels:
    included_label_count = label_count - len(excluded_labels)
    dataset_size = int(dataset_size * included_label_count / label_count)

In [None]:
log("Filtered Dataset Size:", dataset_size)
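
As a worked example of this computation, the sketch below assumes a balanced dataset of 12,000 records with 5 landscape types, from which a single type is excluded. The function name `filtered_dataset_size` is hypothetical.

```python
def filtered_dataset_size(dataset_size, label_count, excluded_count):
    """Estimate the size of a balanced dataset after excluding some of its labels."""
    included_count = label_count - excluded_count
    return int(dataset_size * included_count / label_count)

# Excluding 1 of 5 equally represented landscape types keeps 4/5 of the records
size_after_excluding_one = filtered_dataset_size(12000, label_count=5, excluded_count=1)
print(size_after_excluding_one)
```

This estimate is only exact when every landscape type has the same number of records. Otherwise, the records would have to be counted explicitly.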

Also, the information known about the labels of the dataset should be updated.
In particular, the excluded label names must be removed from the known labels,
while the label identifiers must be updated so that they progressively increase
starting from 0.

> _**Note**: such constraints on the label identifiers are only needed for
**one-hot encoding**._

In [None]:
# Save original label names and identifiers
original_label_names = label_names.copy()
original_label_ids = label_ids.copy()

if excluded_labels:
    # Update label names and identifiers
    i = 0
    label_names.clear()
    label_ids.clear()
    for label_name in original_label_ids.keys():
        if label_name not in excluded_labels:
            label_names[i] = label_name
            label_ids[label_name] = i
            i = i + 1
    label_count = included_label_count

In [None]:
log("Original Dataset Labels: ", original_label_ids)
log("Filtered Dataset Labels: ", label_ids)

Finally, the transformation on the label identifiers should also be reflected on the dataset. In order to do so, we can apply a discrete mapping function from
the old identifiers contained in the dataset to the new ones.

> _**Note**: a discrete function can be modelled as a dictionary whose keys are
elements of the **Domain** and whose values are elements of the **Codomain**._

In [None]:
def discrete_map_labels(partial_discrete_map_fn, default=-1):
    """
    Update the specified label using 'partial_discrete_map_fn', which is a dictionary from old
    label ids to new label ids. If no matching old label id is found for an id, then the id is
    mapped to 'default'.
    """
    discrete_map_fn_as_list = []
    for old_id in range(0, max(list(partial_discrete_map_fn.keys()))+1):
        discrete_map_fn_as_list.append(partial_discrete_map_fn.get(old_id, default))
    tf_discrete_map_fn = tf.constant(discrete_map_fn_as_list, dtype=tf.int64)
    def apply(xi, yi): return (xi, tf.gather(tf_discrete_map_fn, yi))
    return apply

Here, we define the discrete mapping function from old to new label identifiers.

In [None]:
old_id_to_new_id = {original_label_ids[label_name]: label_ids[label_name] for label_name in label_ids}

In [None]:
print(old_id_to_new_id)

Then, such mapping function is applied to the labels of the dataset.

In [None]:
if excluded_labels:
    dataset = dataset.map(discrete_map_labels(old_id_to_new_id), num_parallel_calls=tf.data.AUTOTUNE)
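
The idea behind the mapping can be illustrated in plain Python: the partial dictionary is expanded into a dense lookup table indexed by the old identifier, and each label is then replaced by the table entry at its position. The sketch below assumes that **Desert** (old identifier 1) was excluded. The names `build_lookup_table` and `example_map` are hypothetical.

```python
def build_lookup_table(partial_map, default=-1):
    """Expand a partial {old_id: new_id} dictionary into a list indexed by old id."""
    return [partial_map.get(old_id, default) for old_id in range(max(partial_map) + 1)]

# Old ids 0, 2, 3, 4 survive the filtering and are renumbered from 0 to 3
example_map = {0: 0, 2: 1, 3: 2, 4: 3}
table = build_lookup_table(example_map)

old_labels = [0, 2, 3, 4, 4]
new_labels = [table[old_id] for old_id in old_labels]
print(table)
print(new_labels)
```

The entry for the excluded identifier holds the default value, which should never be read in practice, since the records with that label have already been filtered out.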

### Preprocess Dataset

After filtering the dataset, it is finally possible to apply several preprocessing techniques to its data.

First, let's decode the string encoding of the **image** feature into a JPEG,
while cropping the image to ensure that all images will have the same size.

In [None]:
def decode_and_crop_image(crop_window, channels):
    """Decode the specified string image to JPEG and crop it with the specified 'crop_window'."""
    def apply(xi, yi):
        return (tf.image.decode_and_crop_jpeg(xi, crop_window=crop_window, channels=channels), yi)
    return apply

Then, let's resize the image to a size that is suitable for training.

In [None]:
def resize_image(new_size):
    """Resize the specified image to the specified 'new_size'."""
    def apply(xi, yi):
        return (tf.image.resize(xi, size=new_size, method=tf.image.ResizeMethod.BILINEAR), yi)
    return apply

After that, let's normalize the RGB values of the images from the range
$[0..255]$ to the range $[0..1]$, as a good practice for increased
training stability.

In [None]:
def normalize_image(xi, yi):
    """Normalize the specified image from the space [0..255]^N to the space [0..1]^N"""
    return (tf.cast(xi, tf.float32) / 255, yi)

Finally, let's represent the labels of the images as **one-hot vectors** to prevent the model from inferring nonexistent correlations between **scalar** labels.

> **Example** \\
If $0$ means **Coast**, $1$ means **Desert** and $4$ means **Mountain**, the model could infer that coasts are more similar to deserts than to mountains just because $|0-1| < |0-4|$. Instead, one-hot encoding ensures that all labels are equidistant from each other.

In addition, a function is provided to map a probability distribution on labels to the name of the most probable label and its probability.

In [None]:
def label_to_one_hot_encoding(xi, yi):
    """Encode the specified label using one-hot-encoding"""
    return (xi, tf.one_hot(yi, label_count, dtype=tf.float32))

def one_hot_encoding_to_label_name(yi, attach_probability = True):
    """Decode a probability distribution on labels to the human-friendly name of the most probable label."""
    label_index = tf.argmax(yi).numpy()
    label_name = label_names[label_index]
    return "{} ({:.2f}%)".format(label_name, yi[label_index].numpy()*100) if attach_probability else label_name
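
The equidistance argument made for one-hot encoding can be checked numerically. The following plain-Python sketch (its helper names are hypothetical) compares the distances between scalar labels with the Euclidean distances between the corresponding one-hot vectors.

```python
import itertools
import math

def one_hot(label_id, n_labels):
    """Return a one-hot list for 'label_id' among 'n_labels' labels."""
    return [1.0 if i == label_id else 0.0 for i in range(n_labels)]

def euclidean_distance(a, b):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

n_labels = 5
pairs = list(itertools.combinations(range(n_labels), 2))

# Scalar labels: the distance depends on the arbitrary numbering of the labels
scalar_distances = {(i, j): abs(i - j) for i, j in pairs}

# One-hot labels: every pair of distinct labels is at the same distance
one_hot_distances = {
    (i, j): euclidean_distance(one_hot(i, n_labels), one_hot(j, n_labels))
    for i, j in pairs
}
print(scalar_distances[(0, 1)], scalar_distances[(0, 4)])
print(set(one_hot_distances.values()))
```

With scalar labels, the distance between two landscape types changes with their numbering, while all one-hot vectors are $\sqrt{2}$ apart.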

Here, all the aforementioned preprocessing techniques are applied on the dataset.

In [None]:
dataset = dataset.map(decode_and_crop_image(crop_window=[0,0,cropped_image_size[0],cropped_image_size[1]], channels=cropped_image_size[2]), num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.map(resize_image(new_size=resized_image_size[0:2]), num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.map(normalize_image, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.map(label_to_one_hot_encoding, num_parallel_calls=tf.data.AUTOTUNE)

### Visualize Dataset

After preprocessing the dataset, we can finally take a look at the data that
will be fed to the model during training.

First, let's analyze the shape of the records of the dataset.

In [None]:
for (x0, y0) in dataset.take(1):
    log("Object Shape:", x0.shape)
    log("Label Shape:", y0.shape)

Then, let's look at the images that will be observed by the model during training.

In [None]:
def show_dataset_sample(dataset, n_rows=1, n_cols=1, image_header=True, show_label_probability=True, figsize=None):
    """Display a grid of 'n_rows*n_cols' samples from 'dataset'."""
    if figsize is None: figsize = (15, 3*n_rows)
    xys = iter(dataset)
    fig, axs = plt.subplots(n_rows, n_cols, squeeze=False, figsize = figsize)
    for row in range(0, n_rows):
        for col in range(0, n_cols):
            (xi, yi) = next(xys)
            axs[row, col].imshow(xi)
            if image_header: axs[row, col].title.set_text(one_hot_encoding_to_label_name(yi, show_label_probability))
            axs[row, col].axis('off')
    plt.show()

In [None]:
show_dataset_sample(dataset, n_rows=5, n_cols=5, image_header=True, show_label_probability=True)

We can observe that the images present a **high variance**, even within the same landscape type.

Moreover, some images contain a lot of **noise**, usually a subject placed in front of the landscape portrayed in the image (e.g. people, buildings, picture frames...), while others contain **mixed landscapes** (e.g. mountainous landscapes mixed with forested landscapes...).

Finally, some images are completely misleading on purpose, since the dataset was
prepared for a classification challenge (e.g. an indoor image of a water bottle branded 'Glacier', labeled as a **Glacier**...).

---

## Split Dataset

In this section, the dataset will be split into three datasets for training, validation and testing of the model.

### Configuration

In the code below, the user can change the relative sizes of the **training**, **validation** and **test sets**.

In [None]:
relative_training_set_size = 0.80
relative_validation_set_size = 0.10
relative_test_set_size = 0.10

### Homogenize Dataset Distribution

Before splitting the dataset into training, validation and
test sets, it is necessary to make the distribution of data
in the dataset homogeneous.

Otherwise, when splitting the dataset, it may happen that
the classes of the dataset are contained in disproportionate amounts in the training set (e.g. in extreme cases, the training set may contain no images related to deserts, because all the deserts are contained in the validation and test sets...).

First, let's extract all the images related to each landscape type into separate datasets.

In [None]:
landscape_datasets = []
for label_name in label_ids:
    landscape_datasets.append(dataset.filter(has_label(label_name)))

Then, let's recreate the dataset so that its data is
generated by alternating the data from each of the previous landscape datasets (e.g. first it picks a record from the coast dataset, then from the desert dataset, then from the forest dataset and so on...).

> _**Note**: this homogeneous distribution is only used for splitting the data into training, validation and test sets. During training, the data will be shuffled to prevent such distribution from influencing the training._

In [None]:
dataset = tf.data.Dataset.choose_from_datasets(
    landscape_datasets,
    tf.data.Dataset.range(len(landscape_datasets)).repeat()
)
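
The alternation performed above can be pictured with a plain-Python round-robin over a few lists. This is only an analogy for the interleaving order: the hypothetical `round_robin` helper below stops as soon as one source is exhausted, which is a simplification of this sketch.

```python
def round_robin(*iterables):
    """Yield one item from each iterable in turn, stopping when any of them is exhausted."""
    for group in zip(*iterables):
        yield from group

coasts = ["coast-0", "coast-1"]
deserts = ["desert-0", "desert-1"]
forests = ["forest-0", "forest-1"]

interleaved = list(round_robin(coasts, deserts, forests))
print(interleaved)
```

In the resulting sequence every landscape type reappears at a fixed interval, so any contiguous slice of it contains the landscape types in nearly equal proportions.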

Now, if we take a look at the dataset, we can observe that each landscape type repeats after a fixed amount of images.

In [None]:
show_dataset_sample(dataset, n_rows=5, n_cols=label_count, image_header=True, show_label_probability=True)

### Training, Validation & Test Sets

With the landscape types uniformly distributed within the dataset, it is possible to partition the dataset into training, validation and test sets simply by taking and skipping records of the dataset.

First, let's compute the absolute sizes of the training, validation and test sets.

In [None]:
training_set_size = int(relative_training_set_size * dataset_size)
validation_set_size = int(relative_validation_set_size * dataset_size)
test_set_size = int(relative_test_set_size * dataset_size)

In [None]:
log("Training Set Size:", training_set_size)
log("Validation Set Size:", validation_set_size)
log("Test Set Size:", test_set_size)

Then, let's partition the dataset.

In [None]:
training_set = dataset.take(training_set_size)
validation_set = dataset.skip(training_set_size).take(validation_set_size)
test_set = dataset.skip(training_set_size + validation_set_size).take(test_set_size)
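
To see why taking and skipping records of an interleaved dataset yields balanced partitions, consider the following plain-Python sketch. It works on a list of 100 hypothetical records whose labels cycle through 5 landscape types, and the function name `split_records` is an assumption of this example.

```python
from collections import Counter

def split_records(records, training_size, validation_size, test_size):
    """Partition a sequence into consecutive training, validation and test slices."""
    records = list(records)
    validation_start = training_size
    test_start = training_size + validation_size
    return (
        records[:validation_start],
        records[validation_start:test_start],
        records[test_start:test_start + test_size],
    )

n_labels = 5
records = [("image-{}".format(i), i % n_labels) for i in range(100)]
training, validation, test = split_records(records, 80, 10, 10)

# Count how many records of each label ended up in each partition
labels_per_split = [Counter(label for _, label in part) for part in (training, validation, test)]
print(labels_per_split)
```

Each partition contains the same number of records for every label, which is the property that the homogenization step is meant to provide.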

---

## Data Augmentation

In this section, a set of augmentation techniques will be provided, so that they
may optionally be applied to the training set before training, in an attempt to improve the model performance.

### Grayscale Augmentation

The **Grayscale Augmentation** consists in extending the training set with the
grayscale version of each one of its images.

The purpose of such augmentation would be to reduce the amount of focus put by
the model on the colors of the landscapes during training, so that it may learn
also other features characterizing each landscape type.


In [None]:
def to_grayscale(xi, yi):
    """Map an RGB image to grayscale without reducing the number of channels."""
    return (tf.repeat(tf.reduce_mean(xi, axis=2, keepdims=True), repeats=3, axis=2), yi)

In [None]:
grayscale_training_set = training_set.map(to_grayscale, num_parallel_calls=tf.data.AUTOTUNE)

In [None]:
show_dataset_sample(grayscale_training_set, n_rows=5, n_cols=5, show_label_probability=True)
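
The effect of `to_grayscale` on the pixel values can be checked on a tiny image with a NumPy counterpart of the same operation. The function name `to_grayscale_np` and the sample pixel values are assumptions of this sketch, which uses a plain average of the channels, as the function above does, rather than a luminance-weighted conversion.

```python
import numpy as np

def to_grayscale_np(image):
    """Average the channels of an image and repeat the result to keep 3 channels."""
    return np.repeat(image.mean(axis=2, keepdims=True), repeats=3, axis=2)

# A 1x2 RGB image: one colored pixel and one white pixel
image = np.array([[[0.0, 0.3, 0.6], [1.0, 1.0, 1.0]]])
gray = to_grayscale_np(image)
print(gray.shape)
print(gray[0, 0], gray[0, 1])
```

The shape of the image is unchanged, so grayscale and RGB images can be fed to the same model, while each pixel now carries the same value in all three channels.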

### Cut-Out Augmentation

The **Cut-Out Augmentation** consists of extending the training set with new images
obtained by removing portions of the original images and scaling the
corresponding labels by the fraction of the image that remains visible.

The purpose of such augmentation would be to increase the robustness of the
model by hiding part of each image from it, therefore forcing it to focus
on different areas of the same image.

The provided implementation of **Cut-Out** generates, for each image
of the training set, a set of augmented images obtained by sliding a **mask** of a
specified size by the specified horizontal and vertical **strides**.

In [None]:
def cut_out(mask_shape, strides):
    """
    Generate new images by applying cut-out to the specified image.

    The images are generated by applying a mask with the specified
    'mask_shape' in all possible positions using the specified 'strides'.
    """
    cut_out_percentage = tf.cast(mask_shape[0] * mask_shape[1] / (image_size[0] * image_size[1]), dtype=tf.float32)
    base_mask = tf.fill(value=0.0, dims=(mask_shape[0], mask_shape[1], image_size[2]))

    mask_max_padding = (image_size[0] - mask_shape[0], image_size[1] - mask_shape[1])
    n_strides = (int(mask_max_padding[0] / strides[0]), int(mask_max_padding[1] / strides[1]))
    masks = []
    for stride_x in range(0, n_strides[0]+1):
        left = stride_x*strides[0]
        right = mask_max_padding[0] - left
        for stride_y in range(0, n_strides[1]+1):
            top = stride_y*strides[1]
            bottom = mask_max_padding[1] - top
            masks.append(tf.pad(base_mask, paddings=[[left,right],[top,bottom],[0,0]], constant_values=1))
    masks = tf.stack(masks)

    def apply(xi, yi):
        new_images = tf.data.Dataset.from_tensor_slices(tf.unstack(tf.repeat([xi], repeats=masks.shape[0], axis=0) * masks))
        new_labels = tf.data.Dataset.from_tensor_slices(tf.repeat([yi*(1-cut_out_percentage)], repeats=masks.shape[0], axis=0))
        return tf.data.Dataset.zip(new_images, new_labels)
    return apply

In the code below, it is possible to configure the **Cut-Out Augmentation** by
specifying the size of the mask as `mask_shape` and the `strides` of its
application.

In [None]:
mask_relative_size = 0.333
mask_shape = (int(image_size[0]*mask_relative_size), int(image_size[1]*mask_relative_size))
strides = mask_shape
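
To get a feeling for this configuration, the following sketch in plain Python enumerates the mask positions using the same arithmetic as `cut_out`. The image size `(150, 150, 3)` is an assumption made only for this example (the actual `image_size` is defined earlier in the notebook), and `mask_offsets` is a hypothetical helper.

```python
def mask_offsets(image_length, mask_length, stride):
    """Offsets at which a mask of 'mask_length' is placed when slid by 'stride'."""
    max_padding = image_length - mask_length
    n_strides = int(max_padding / stride)
    return [step * stride for step in range(n_strides + 1)]

image_size = (150, 150, 3)     # assumption, for illustration only
mask_relative_size = 0.333
mask_shape = (int(image_size[0] * mask_relative_size), int(image_size[1] * mask_relative_size))
strides = mask_shape           # non-overlapping mask positions

rows = mask_offsets(image_size[0], mask_shape[0], strides[0])
cols = mask_offsets(image_size[1], mask_shape[1], strides[1])
positions = [(row, col) for row in rows for col in cols]

# Fraction of each image left visible, which is also the factor applied to the labels
kept_fraction = 1 - (mask_shape[0] * mask_shape[1]) / (image_size[0] * image_size[1])
print(mask_shape, len(positions), round(kept_fraction, 4))
```

Under these assumptions the mask is 49 x 49 pixels, it is applied at 9 positions per image, and each label is scaled by roughly 0.89. Setting `strides` smaller than `mask_shape` would produce overlapping positions and therefore more augmented images per original.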

In [None]:
cut_out_training_set = training_set.flat_map(cut_out(mask_shape=mask_shape, strides=strides))

In [None]:
show_dataset_sample(cut_out_training_set, n_rows=6, n_cols=3)

### Augment Training Set

Here, the training set may be augmented by applying the aforementioned augmentation
techniques at the user's discretion.

In [None]:
augmentations = []

use_grayscale_augmentation = boolean_input('Use grayscale augmentation?')
if use_grayscale_augmentation: augmentations.append(grayscale_training_set)

use_cut_out_augmentation = boolean_input('Use cut-out augmentation?')
if use_cut_out_augmentation: augmentations.append(cut_out_training_set)

In [None]:
if augmentations:
    log("Using augmented training set.")
    training_set = tf.data.Dataset.sample_from_datasets([training_set] + augmentations)
    training_set_size = count_records(training_set)
    log("Augmented training set size:", training_set_size)
else:
    log("Using original training set.")

Let's take a look at a sample of the training set in its final state.

In [None]:
show_dataset_sample(training_set, n_rows=5, n_cols=5)

---

## Model Definition

In this section, a description and definition of the model used for
implementing the **Landscape Generator** will be provided.

### Configuration

Below is a list of the **hyperparameters** that the user can set to modify the
**architecture** of the model.


In [None]:
condition_size = label_count    # Size of the condition (possible classes of the input)
code_size = 64                  # Size of the code in the latent space (i.e. #dimensions of the latent space)

cnn_activation = 'relu'         # Activation function of the convolutional layers in the encoder and decoder
latent_activation = None        # Activation function used by the encoder to generate the mean and variance of multi-normal distributions
output_activation = 'sigmoid'   # Activation function used by the decoder to generate the image of a landscape

### Architecture

The implemented model has the architecture illustrated below.

![CVAE](https://drive.google.com/uc?export=view&id=1En-N_ibDOIh5kfoZCgq64L9tJcGJoA8O)

#### Encoding

The model takes as input an RGB image of a landscape $x$ and the type of landscape $y$ represented by the image.

First, the landscape type $y$ is reshaped using an ad-hoc transformation, so that it may be concatenated to the landscape image $x$.

The concatenation $xy$ is then fed to the **Encoder Core** $EC$, whose purpose is to map the landscape image $x$ and its type $y$ into multinormal distributions in the latent space $Z$.

A multinormal distribution can be uniquely identified by two main values: the mean of the distribution $\mu$, which tells the position of the center of the distribution in the latent space, and the covariance matrix $\Sigma$, which tells the shape of the distribution.

Basically, as the **Encoder** $E$ observes more data, it will project different distributions into the latent space $Z$, covering $Z$ as the training goes on.

> _**Note**: to reduce computation time during training, the **Encoder** will
ignore covariances in $\Sigma$. In other words, it will only learn the diagonal of $\Sigma$, namely the **variance** vector $σ^2$._

> _**Note**: to increase the stability during training, the **Encoder** will
learn the **logarithmic variance** $ln(σ^2)$ instead of the **variance** $σ^2$ itself._

#### Decoding

Using the latent space $Z$ generated by the **Encoder**, it is possible to
sample random points $z$ (or **codes**) to train a **Decoder** $D$ for reconstructing the original landscapes from their latent representations. During prediction, the same can be done for generating new landscapes.

By feeding the concatenation $zy$ to the **Decoder**, instead of just the code $z$, the **Decoder** will learn to reconstruct the original landscapes with knowledge about their landscape types. This will let the user specify what
type of landscape they want to generate when using the **Decoder** during
prediction.

In particular, the concatenation $zy$ is processed by the **Decoder Core** $DC$, whose purpose is to map the code $z$ and the landscape type $y$ into the corresponding landscape image $x'$.

#### Sampling

During training, it is not possible to use standard sampling techniques for
extracting points from the latent space $Z$, as they require generating
random numbers, which is a non-differentiable operation. This breaks training
algorithms like **Gradient Descent**, which rely on the differentiability of the computation graph of the neural network.

This problem can be solved using the **Reparametrization Trick**, which consists
in treating all non-differentiable operations as inputs of the computation graph of the neural network. In fact, the error does not need to be backpropagated to the inputs of the graph.

In the case of **Sampling**, this can be achieved with the following expansion
of the computation graph:

- From:

  $z \sim N(μ, Σ)$ : Intermediate Node

- To:

  $ϵ \sim N(0^{k}, I_{k})$ : Input Node

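  $z = Σ^{\frac{1}{2}} \cdot ϵ + μ$ : Intermediate Node

  where $k$ is the number of dimensions of the latent space $Z$ and $Σ^{\frac{1}{2}}$ is a matrix square root of $Σ$ (for a diagonal $Σ$, it reduces to the standard deviations $σ$).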

> _**Note**: if we consider the **logarithmic variance** $ln(σ^2)$ instead of the **covariance matrix** $Σ$, then $z$ can be sampled using the following formula: \\
$z = e^{\frac{1}{2} \cdot ln(σ^2)} \odot ϵ + μ$_

### Layers

First, let's define some custom layers that will be used for building the model.

#### Multinormal Sampling

The **Multinormal Sampling Layer** extracts random points from a batch of multinormal probability distributions using the **Reparametrization Trick**,
just as described before.

In [None]:
@keras.saving.register_keras_serializable()
class MultinormalSamplingLayer(keras.layers.Layer):
    """
    A layer that takes as input the `mean` and `log_variance` of a batch of
    multinormal distributions and produces as output a batch of random points
    extracted from such distributions.
    """
    @staticmethod
    def sample(mean, log_variance):
        batch_size = K.shape(mean)[0]
        dim = K.int_shape(mean)[1]
        epsilon = K.random_normal(shape=(batch_size, dim), mean=0., stddev=1.0)
        z = K.exp(0.5 * log_variance) * epsilon + mean
        return z

    def call(self, inputs):
        return MultinormalSamplingLayer.sample(inputs[0], inputs[1])

Let's use it to extract some random points from a batch of **standard** multinormal distributions.

In [None]:
sample_count = 20000
std_mean = tf.repeat([tf.constant([0.0, 0.0])], repeats=sample_count, axis=0)
std_log_variance = tf.repeat([tf.constant([0.0, 0.0])], repeats=sample_count, axis=0)
sample_points = MultinormalSamplingLayer()([std_mean, std_log_variance])

In [None]:
fig, axes = plt.figure(figsize=(5,5)), plt.axes()
axes.set_aspect("equal")
_ = plt.plot(sample_points[:, 0], sample_points[:, 1], ',', color="orange", zorder=0)
_ = plt.plot([0], [0], '.', color="black", zorder=0)
_ = plt.axvline(color="black", linewidth=1)
_ = plt.axhline(color="black", linewidth=1)

From the graph, we can see that most of the sample points are positioned inside a circular cluster centered in the origin $(0,0)$ as expected.
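
The same behavior can also be checked numerically for a non-standard distribution. The sketch below is written in plain Python rather than with the Keras backend, so `sample` is a hypothetical stand-in for `MultinormalSamplingLayer.sample`, and the means and logarithmic variances are arbitrary values chosen for illustration. It draws points with $z = e^{\frac{1}{2} \cdot ln(σ^2)} \odot ϵ + μ$ and compares the empirical mean and standard deviation with the requested ones.

```python
import math
import random
import statistics

def sample(mean, log_variance, rng):
    """Draw one point z = exp(0.5 * ln(sigma^2)) * epsilon + mu, per component."""
    return [
        math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) + m
        for m, lv in zip(mean, log_variance)
    ]

rng = random.Random(0)                    # fixed seed, for reproducibility
mean = [2.0, -1.0]                        # arbitrary means
log_variance = [math.log(0.25), 0.0]      # standard deviations 0.5 and 1.0
points = [sample(mean, log_variance, rng) for _ in range(20000)]

for axis in range(2):
    values = [point[axis] for point in points]
    print(axis, statistics.fmean(values), statistics.pstdev(values))
```

The randomness is confined to `rng.gauss`, which plays the role of the input node $ϵ$, while `mean` and `log_variance` only enter through a multiplication and a sum, which are differentiable.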

### Encoder

Let's start building the **CVAE** from the **Encoder**.

First, let's define the **Encoder Core**, whose architecture is a simple **CNN**.

> **Note**: the architecture should be designed to compress the original images
while retaining most of their information. A simple way to do that is to apply a small compression, extracting a large number of features from the images.

In [None]:
def build_encoder_core(activation):
    return keras.Sequential(
        name="Encoder_Core",
        layers=[
            layers.Conv2D(name="EC_Convolution1", filters=16, kernel_size=(3, 3), strides=1, padding='same', activation=activation),
            layers.Conv2D(name="EC_Convolution2", filters=32, kernel_size=(3, 3), strides=1, padding='same', activation=activation),
            layers.Conv2D(name="EC_Convolution3", filters=64, kernel_size=(3, 3), strides=2, padding='same', activation=activation),
            layers.Conv2D(name="EC_Convolution4", filters=128, kernel_size=(3, 3), strides=1, padding='same', activation=activation),
            layers.Conv2D(name="EC_Convolution5", filters=256, kernel_size=(3, 3), strides=1, padding='same', activation=activation),
            layers.Flatten(name="OriginalImage_Features"),
        ]
    )

Then, let's define the **Encoder**, which contains the **Encoder Core**.

In [None]:
def build_cvae_encoder(image_shape, condition_size, code_size, activation, latent_activation):
    original_image = layers.Input(name="OriginalImage", shape=image_shape)
    condition = layers.Input(name="Condition", shape=condition_size)

    condition_image = layers.Dense(name="ImageCondition_Flatten", units=image_shape[0]*image_shape[1], activation=activation)(condition)
    condition_image = layers.Reshape(name="ImageCondition", target_shape=(image_shape[0:2] + [1]))(condition_image)

    x = layers.Concatenate(name="E_Input", axis=3)([original_image, condition_image])
    x = build_encoder_core(activation)(x)

    mean = layers.Dense(name='Mean', units=code_size, activation=latent_activation)(x)
    log_variance = layers.Dense(name='LogVariance', units=code_size, activation=latent_activation)(x)

    return keras.Model(name="CVAE_Encoder", inputs=[original_image, condition], outputs=[mean, log_variance])

Here, we create an instance of the **Encoder**.

In [None]:
cvae_encoder = build_cvae_encoder(
    image_shape=image_size,
    condition_size=condition_size,
    code_size=code_size,
    activation=cnn_activation,
    latent_activation=latent_activation
)

Let's take a look at the shape of the features extracted from the original images by the **Encoder**.

In [None]:
feature_shape = cvae_encoder.get_layer("Encoder_Core").get_layer("OriginalImage_Features").input.shape[1:]
log("Feature shape:", feature_shape)
log("Feature count:", feature_shape.num_elements())

Then, let's take a look at its description.

In [None]:
cvae_encoder.summary()

Finally, let's take a look at its **computation graph**.

> _**Note**: there may be some visual bugs in the graph (known issue of
`plot_model` when used in composite models with multiple inputs and/or outputs)._

In [None]:
keras.utils.plot_model(cvae_encoder, show_shapes=True, show_layer_names=True, expand_nested=True, to_file=directories["images"]+"cvae_encoder.png")

### Decoder

Once the **Encoder** has been defined, the only missing piece of the **CVAE**
is the **Decoder**.

First, let's define the **Decoder Core**, which is the reverse architecture of the **Encoder Core**.

In [None]:
def build_decoder_core(feature_shape, activation, output_activation):
    return keras.Sequential(
        name="Decoder_Core",
        layers=[
            layers.Reshape(name="DC_Reshape", target_shape=feature_shape),
            layers.Conv2DTranspose(name="DC_TranConvolution1", filters=128, kernel_size=(2, 2), strides=1, padding='same', activation=activation),
            layers.Conv2DTranspose(name="DC_TranConvolution2", filters=64, kernel_size=(2, 2), strides=2, padding='same', activation=activation),
            layers.Conv2DTranspose(name="DC_TranConvolution3", filters=32, kernel_size=(2, 2), strides=1, padding='same', activation=activation),
            layers.Conv2DTranspose(name="DC_TranConvolution4", filters=16, kernel_size=(2, 2), strides=1, padding='same', activation=activation),
            layers.Conv2D(name="GeneratedImage", filters=3, kernel_size=(2, 2), strides=1, padding='same', activation=output_activation),
        ]
    )

Then, let's define the **Decoder**, which contains the **Decoder Core**.

In [None]:
def build_cvae_decoder(feature_shape, code_size, condition_size, activation, output_activation):
    code_input = layers.Input(name="Code", shape=code_size)
    condition_input = layers.Input(name="Condition", shape=condition_size)

    x = layers.Concatenate(name='D_Input')([code_input, condition_input])
    x = layers.Dense(name='GeneratedImage_Features', units=feature_shape.num_elements(), activation=activation)(x)
    x = build_decoder_core(feature_shape, activation, output_activation)(x)

    return keras.Model(name='CVAE_Decoder', inputs=[code_input, condition_input], outputs=x)

Here, we create an instance of the **Decoder**.

In [None]:
cvae_decoder = build_cvae_decoder(
    feature_shape=feature_shape,
    code_size=code_size,
    condition_size=condition_size,
    activation=cnn_activation,
    output_activation=output_activation
)

Let's take a look at its description.

In [None]:
cvae_decoder.summary()
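
The shapes reported in the summaries can be anticipated with some simple arithmetic. With `padding='same'`, a `Conv2D` with stride $s$ maps a spatial dimension of length $n$ to $\lceil n / s \rceil$, while a `Conv2DTranspose` (with no explicit output padding) maps it to $n \cdot s$. The sketch below applies these rules to the strides used by the **Encoder Core** and the **Decoder Core**; the input lengths `150` and `151` are assumptions chosen for illustration, not the actual `image_size` of the notebook.

```python
import math

def conv_same(length, stride):
    """Output length of a Conv2D with padding='same'."""
    return math.ceil(length / stride)

def conv_transpose_same(length, stride):
    """Output length of a Conv2DTranspose with padding='same'."""
    return length * stride

ENCODER_STRIDES = [1, 1, 2, 1, 1]           # EC_Convolution1..5
DECODER_TRANSPOSE_STRIDES = [1, 2, 1, 1]    # DC_TranConvolution1..4
OUTPUT_STRIDE = 1                           # GeneratedImage (a regular Conv2D)

def round_trip(length):
    """Return (encoded_length, decoded_length) for one spatial dimension."""
    encoded = length
    for stride in ENCODER_STRIDES:
        encoded = conv_same(encoded, stride)
    decoded = encoded
    for stride in DECODER_TRANSPOSE_STRIDES:
        decoded = conv_transpose_same(decoded, stride)
    decoded = conv_same(decoded, OUTPUT_STRIDE)
    return encoded, decoded

print(round_trip(150))   # even length
print(round_trip(151))   # odd length
print(75 * 75 * 256)     # feature count for a hypothetical 150 x 150 input
```

For a length of 150, the encoder produces 75 and the decoder restores 150, giving 75 x 75 x 256 = 1,440,000 features. For an odd length such as 151, the decoder would produce 152, so this architecture reconstructs images of the original size only when both spatial dimensions are even.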

Finally, let's take a look at its **computation graph**.

In [None]:
keras.utils.plot_model(cvae_decoder, show_shapes=True, show_layer_names=True, expand_nested=True, to_file=directories["images"]+"cvae_decoder.png")

### CVAE

Finally, let's define the **CVAE (Conditional Variational AutoEncoder)** that will contain both the **Encoder** and the **Decoder**.

In [None]:
class CVAE:
    """A Conditional Variational AutoEncoder made of the specified 'encoder' and 'decoder'."""
    def __init__(self, encoder, decoder):
        self.encoder = encoder
        self.decoder = decoder

        self.input = self.encoder.input
        self.condition_size = self.input[1].shape[1]
        self.mean = self.encoder.output[0]
        self.log_variance = self.encoder.output[1]
        self.code_size = self.mean.shape[1]
        self.sampling = MultinormalSamplingLayer(name='Sampling')([self.mean, self.log_variance])
        self.output = self.decoder([self.sampling, self.encoder.input[1]])

        self.model = keras.Model(name='CVAE', inputs=self.input, outputs=self.output)

    def generate_single_data(self, code, data_type):
        """Generate a new image corresponding to the specified 'code' and 'data_type'."""
        code_batch = tf.reshape(code, shape=(1, self.code_size))
        data_type_batch = tf.reshape(data_type, shape=(1, self.condition_size))
        return self.generate_batch_data(code_batch, data_type_batch)[0]

    def generate_batch_data(self, code_batch, data_type_batch):
        """Generate some new images corresponding to the specified batch of codes and data types."""
        return self.decoder([code_batch, data_type_batch], training=False)

Here, an instance of the **CVAE** is created.

In [None]:
cvae = CVAE(cvae_encoder, cvae_decoder)

Let's take a look at a description of the model.

In [None]:
cvae.model.summary()

Also, let's look at the **computation graph** of the model.

In [None]:
keras.utils.plot_model(cvae.model, show_shapes=True, show_layer_names=True, expand_nested=True, to_file=directories["images"]+"cvae.png")

Finally, the user can decide to load the weights learned from previous runs of this notebook into the model.

In [None]:
load_previous_weights=boolean_input('Load previous weights?')
if load_previous_weights:
    cvae.model.set_weights(keras.models.load_model(directories["model"]+'cvae.keras', safe_mode=False).get_weights())
    log("Using pre-trained model.")
else:
    log("Using new model.")

---

## Training

In this section, the training of the previously defined model will be configured
and performed.

### Configuration

Below is a list of the **hyperparameters** that the user can set to affect the
**training** of the model.

In [None]:
shuffle_buffer_size = 1024                                          # How much shuffling is applied to the data during training (at the cost of memory)
batch_size = 16                                                     # How many samples are observed by the model before updating its weights
epoch_count = 1000                                                  # How many times the training set will be fully observed by the model
patience = 200                                                      # How many epochs of non-improvement before the training is stopped
recon_coefficient = 30                                              # How much reconstruction of the original images weighs on the loss of the model
kl_coefficient = 1                                                  # How much regularization of the latent space weighs on the loss of the model
learning_rate = 0.001                                               # How much the model will learn from new observations at each training iteration
def optimizer(): return tf.keras.optimizers.Adam(learning_rate)     # How the weights of the model are updated during training

### Input Preparation

Before training the model, it is necessary to prepare the dataset appropriately
for the training session.

In particular, when using **tf.data.Dataset**s as inputs, the [fit](https://keras.io/api/models/model_training_apis/#fit-method) method of **Keras**
for training a model expects a **tf.data.Dataset** whose elements are mini-batches containing pairs of **(Input, TargetOutput)**.

For training the **CVAE**, we'll define a custom loss, so we can avoid specifying the **TargetOutput** in the `fit` method. In other words, each mini-batch of the training set will actually contain singleton tuples of **(Input, )**.

Finally, a **CVAE** takes both elements of each dataset record as input, namely the image (**Input**) and its landscape type (**TargetOutput**), so each mini-batch of the training set will actually contain singleton tuples of **((Input, TargetOutput), )**.

This transformation is performed by `to_cvae_input` as shown below.

In [None]:
def to_cvae_input(xi, yi): return ((xi, yi),)

Below, the training set is mapped to an input suitable for training a **CVAE**,
then it is **cached**, **shuffled** and **batched** using the configuration specified previously.

Caching the training set will allow for faster training. In fact, it will store the results of previous computations of the training set pipeline in a file, fetching those results instead of recomputing the pipeline for each object of the training set each time it is observed. In this case, it will avoid recomputing the preprocessing and augmentation pipelines at the start of every epoch.

Instead, the `shuffle` and `batch` pipelines are added after caching, so they
will be recomputed at the start of every epoch.

> _**Note**: `prefetch` is just a **TensorFlow** optimization on the generation
of the objects of a dataset._

In [None]:
def prepare_training_set(training_set, prepare_fn, cache_name):
    training_set = training_set.map(prepare_fn, num_parallel_calls=tf.data.AUTOTUNE)
    training_set = training_set.cache(filename = directories["cache"] + cache_name)
    training_set = training_set.shuffle(buffer_size = shuffle_buffer_size)
    training_set = training_set.batch(batch_size = batch_size)
    training_set = training_set.prefetch(tf.data.AUTOTUNE)
    return training_set
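
The reason for placing `cache` before `shuffle` and `batch` can be illustrated with a plain Python analogy. This is not how `tf.data` is implemented: `make_pipeline` and `prepare` are hypothetical names, the cache is an in-memory list rather than a file, and the whole dataset is shuffled at once, which corresponds to a shuffle buffer at least as large as the dataset.

```python
import random

def make_pipeline(records, prepare_fn, batch_size, rng):
    """Return a function that produces the mini-batches of one epoch."""
    cache = []

    def epoch():
        if not cache:
            # Executed during the first epoch only: results are stored and reused
            cache.extend(prepare_fn(record) for record in records)
        shuffled = cache[:]
        rng.shuffle(shuffled)        # recomputed at every epoch
        return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

    return epoch

calls = []                           # records every execution of the preparation step

def prepare(record):
    calls.append(record)
    return record * 10

epoch = make_pipeline(range(6), prepare, batch_size=4, rng=random.Random(0))
first, second = epoch(), epoch()
print(len(calls), first, second)
```

After two epochs, `prepare` has been executed only 6 times, once per record, while each epoch produced its own shuffled mini-batches. Caching after `shuffle` would instead freeze the order of the first epoch.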

In [None]:
cvae_training_set = prepare_training_set(training_set, prepare_fn=to_cvae_input, cache_name="cvae_training_set")

Similar transformations must also be applied to the validation and test sets.

In [None]:
def prepare_validation_set(validation_set, prepare_fn, cache_name):
    validation_set = validation_set.map(prepare_fn, num_parallel_calls=tf.data.AUTOTUNE)
    validation_set = validation_set.batch(batch_size = batch_size)
    validation_set = validation_set.cache(filename = directories["cache"] + cache_name)
    validation_set = validation_set.prefetch(tf.data.AUTOTUNE)
    return validation_set

In [None]:
# The preparation of the test set is the same as the preparation of the validation set
prepare_test_set = prepare_validation_set

In [None]:
cvae_validation_set = prepare_validation_set(validation_set, prepare_fn=to_cvae_input, cache_name="cvae_validation_set")
cvae_test_set = prepare_test_set(test_set, prepare_fn=to_cvae_input, cache_name="cvae_test_set")

### Loss Function

In order to train a model, it is necessary to define a **loss function**, which
tells what behavior should be learned by the model.

As discussed in the introduction, the **loss function** for this model will
consider the **Reconstruction Error** and the **Regularization Error**.

#### Reconstruction Error

The **Reconstruction Error** $E_{recon}$ is evaluated as the **loss** between the original images $x$ and the generated images $x'$.

Such **loss** can be evaluated as the **cost** between the original images and
the generated images multiplied by the number of images observed by the model
every epoch.

In this case, the **cost** chosen for the **Reconstruction Error** is the
**Mean Squared Error (MSE)** and the corresponding **loss** is the **L2 Loss**.
In particular, the **Reconstruction Error** is evaluated as follows:

$
E_{recon}
= MSE(x, x') \cdot |T|
= \frac{\sum_{i=1}^{|T|} (x_{i} - x_{i}')^2}{|T|} \cdot |T|
= \sum_{i=1}^{|T|} (x_{i} - x_{i}')^2
= L_2(x, x')
$

where

$T$ : is the training set

$x_i$ : is the $i^{th}$ image of the training set

$x_i'$ : is the image generated by the model when receiving $x_i$ as input

The purpose of the **Reconstruction Error** is to train the model to generate
images that are faithful with respect to the original images of the training set.

In [None]:
def reconstruction_error(input, output, training_set_size):
    input_flatten = tf.reshape(input, shape=(-1, tf.size(input)))
    output_flatten = tf.reshape(output, shape=(-1, tf.size(output)))
    return keras.losses.mean_squared_error(input_flatten, output_flatten) * training_set_size

In [None]:
cvae_recon_error = reconstruction_error(cvae.model.input[0], cvae.model.output, training_set_size)
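
The identity $MSE(x, x') \cdot |T| = L_2(x, x')$ can be verified on a toy example. In the sketch below, written in plain Python, each "image" is reduced to a single number for simplicity, and the values are made up for illustration; `mean_squared_error` and `sum_squared_error` are hypothetical helpers, not the Keras functions.

```python
def mean_squared_error(originals, reconstructions):
    """Average of the squared differences."""
    squared = [(x - y) ** 2 for x, y in zip(originals, reconstructions)]
    return sum(squared) / len(squared)

def sum_squared_error(originals, reconstructions):
    """Sum of the squared differences (the L2 loss of the formula above)."""
    return sum((x - y) ** 2 for x, y in zip(originals, reconstructions))

originals = [0.0, 0.5, 1.0, 0.25]          # made-up "images"
reconstructions = [0.0, 0.25, 0.5, 0.25]   # made-up outputs of the model

mse = mean_squared_error(originals, reconstructions)
l2 = sum_squared_error(originals, reconstructions)
print(mse, mse * len(originals), l2)
```

Multiplying the mean by the number of elements simply undoes the averaging, so the reconstruction error grows with the number of samples considered instead of staying on the scale of a single sample.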

#### Regularization Error

The **Regularization Error** $E_{regul}$ is evaluated as the **KL Divergence**
between the latent space distributions generated by the **Encoder** and the standard multinormal distribution. In fact, the **KL Divergence** is a measure of the difference between two probability distributions, meaning that the model
will be trained to produce distributions in the latent space that are as close
as possible to a standard multinormal distribution.

In particular, the **Regularization Error** is evaluated as follows:

\begin{align}
E_{regul}
&= λ \cdot KL[N(μ,Σ) || N(0^{k}, I_{k})] = \\
&= λ \cdot \frac{1}{2}[(0^k-μ)^T \cdot I_{k}^{-1} \cdot (0^k-μ) + tr(I_{k}^{-1} \cdot Σ) + ln\frac{det(I_{k})}{det(Σ)} - k] = \\
&= λ \cdot \frac{1}{2}[μ^Tμ + tr(Σ) - ln(det(Σ)) - k]
\end{align}

where

$λ$ : is the **KL Coefficient** (hyperparameter).

$k$ : is the number of dimensions of the latent space.

> _**Note**: if we consider the **logarithmic variance** $ln(σ^2)$ instead of the **covariance matrix** $Σ$, then $E_{regul}$ can be computed using the following formula: \\
$E_{regul} = λ \cdot \frac{1}{2} \sum_{i=1}^{k} [μ_i^2 + e^{ln(σ_i^2)} - ln(σ_i^2) - 1]$_

The purpose of the **Regularization Error** is to train the **Encoder** to
generate a **regular** latent space, that is both **complete** and **continuous**, so that the latent space could be safely sampled for generating
new data.



In [None]:
def regularization_error(mean, log_variance, kl_coefficient):
    return kl_coefficient * 0.5 * (K.sum(K.square(mean) + K.exp(log_variance) - log_variance - 1, axis = -1))

In [None]:
cvae_regul_error = regularization_error(cvae.mean, cvae.log_variance, kl_coefficient)
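
A few hand-checkable values help to build intuition on this formula. The sketch below re-implements it in plain Python for a single distribution (the function mirrors `regularization_error` but works on lists instead of tensors, and the inputs are arbitrary examples).

```python
import math

def regularization_error(mean, log_variance, kl_coefficient=1.0):
    """KL divergence between N(mean, diag(exp(log_variance))) and N(0, I)."""
    return kl_coefficient * 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0
        for m, lv in zip(mean, log_variance)
    )

# A standard multinormal distribution: no penalty
print(regularization_error([0.0, 0.0], [0.0, 0.0]))
# Mean moved away from the origin by 1 along one axis
print(regularization_error([1.0, 0.0], [0.0, 0.0]))
# Variance of 4 along a single axis
print(regularization_error([0.0], [math.log(4.0)]))
```

The error is zero only when the distribution is exactly the standard multinormal one, and it increases both when the mean drifts away from the origin and when the variance departs from 1 in either direction.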

#### Cumulative Loss

Finally, the cumulative loss of the model is the sum of the **Reconstruction Error** and the **Regularization Error**, as defined below.

In [None]:
def cvae_loss(recon_error, regul_error, recon_coefficient): return recon_coefficient * recon_error + regul_error

Once a loss has been defined, it is necessary to tell **Keras** to use that loss
during training, using the method `add_loss` (for losses that do not depend only on
the **Input** and **TargetOutput**, like in this case).

In [None]:
cvae.model.add_loss(cvae_loss(cvae_recon_error, cvae_regul_error, recon_coefficient))

Even if the loss that the **CVAE** should minimize is the sum of the **Reconstruction Error** and **Regularization Error**, it is still useful to
monitor both errors separately as metrics, in order to understand why the model is struggling to achieve better performance.

In [None]:
cvae.model.add_metric(cvae_recon_error, name="reconstruction_error")
cvae.model.add_metric(cvae_regul_error, name="regularization_error")

### Callbacks

In this subsection, we'll define a set of callbacks to execute during the
training of the model.

#### Image Generation Callback

In order to monitor the images generated by the **CVAE** during the training, here a callback is provided to generate a batch of images at the end of each epoch.

In [None]:
class ImageGenerationCallback(keras.callbacks.Callback):
    """
    A callback that generates an image for each possible combination of the
    specified 'image_types' and 'latent_points' at the end of each epoch,
    using the specified 'cvae'.
    If a number of 'random_latent_points' greater than zero is specified,
    the callback will also generate the images corresponding to those
    points (randomly generated at the end of each epoch) for each of the
    'image_types'.
    """
    def __init__(self,
        cvae,
        image_types = tf.one_hot(tf.constant(list(label_names.keys()), shape=(label_count, )), label_count),
        latent_points = [tf.zeros(shape=(code_size,))],
        random_latent_points = 0,
        generateWhen = lambda epoch: True
    ):
        keras.callbacks.Callback.__init__(self)
        self.cvae = cvae
        self.image_types = image_types
        self.latent_points = latent_points
        self.generateWhen = generateWhen
        self.random_latent_point_count = random_latent_points
        self.sample_dataset = self.__combinations(self.image_types, self.latent_points)

    def on_epoch_end(self, epoch, logs=None):
        actualEpoch = epoch + 1
        if self.generateWhen(actualEpoch):
            print()
            log("Generating fixed images for epoch", actualEpoch, "...")
            generated_image_dataset = self.sample_dataset.map(lambda x,y: (self.cvae.generate_single_data(x, y), y))
            show_dataset_sample(
                dataset=generated_image_dataset,
                n_rows=len(self.latent_points),
                n_cols=len(self.image_types),
                show_label_probability=False,
            )

            if self.random_latent_point_count > 0 :
                log("Generating random images for epoch", actualEpoch, "...")
                random_latent_points = tf.random.normal(shape=(self.random_latent_point_count, self.cvae.code_size))
                random_sample_dataset = self.__combinations(self.image_types, random_latent_points)
                generated_image_dataset = random_sample_dataset.map(lambda x,y: (self.cvae.generate_single_data(x, y), y))
                show_dataset_sample(
                    dataset=generated_image_dataset,
                    n_rows=len(random_latent_points),
                    n_cols=len(self.image_types),
                    show_label_probability=False,
                )

    def __combinations(self, image_types, latent_points):
        """Return a dataset containing all the possible combinations between the specified 'image_types' and 'latent_points'"""
        samples_x, samples_y = [], []
        for latent_point in latent_points:
            for image_type in image_types:
                samples_x.append(latent_point)
                samples_y.append(image_type)
        return tf.data.Dataset.from_tensor_slices((samples_x, samples_y))

### Compilation

The last step before training is to compile the model, choosing the type of
training algorithm that will be used for training the model.

In [None]:
cvae.model.compile(optimizer=optimizer())

### Execution

Finally, let's train the model using the prepared training and validation sets.

During training, the following callbacks will be used:
- [Early Stopping](https://keras.io/api/callbacks/early_stopping/): stops the
training session after `patience` epochs of non-improvement of a monitored metric.
In this case, it monitors the loss evaluated on the training set (`monitor='loss'`).
- [Model Checkpoint](https://keras.io/api/callbacks/model_checkpoint/): saves
the current state of the model at the end of each epoch during the training session,
in order to avoid losing progress after a possible crash of the system.
- **Image Generation Callback**: generates a batch of images using the model
during training. In particular, the callback has been configured to
generate the images corresponding to the origin of the latent space and some random sample points for each landscape type at the end of the first epoch, then every 10 epochs, and at the end of the last epoch. \\
The origin of the latent space should correspond to the mean of the distribution
in the latent space discovered by the encoder (due to the regularization error). In that case, it represents the landscape representation that is closest to all the landscape representations learned by the model. In other words, it's the **most general representation** that the model was able to find **for each landscape type**.

In [None]:
cvae_history = cvae.model.fit(
    cvae_training_set,
    validation_data=cvae_validation_set,
    epochs=epoch_count,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor='loss', patience=patience, restore_best_weights=True),
        keras.callbacks.ModelCheckpoint(monitor='loss', filepath=directories["model"]+'cvae.checkpoint.keras'),
        ImageGenerationCallback(cvae, random_latent_points=4, generateWhen=lambda epoch: epoch%10==1 or epoch==epoch_count),
    ],
)
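
The effect of `patience` can be understood with a simplified simulation. The sketch below, in plain Python, is not the Keras implementation: `early_stopping` is a hypothetical function that only reproduces the basic rule (stop once the monitored loss has not improved for `patience` consecutive epochs, remembering the best epoch), and the loss values are made up.

```python
def early_stopping(losses, patience):
    """
    Return (stop_epoch, best_epoch) as 0-based indexes.
    'stop_epoch' is None if the training is never interrupted.
    """
    best_loss, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]   # made-up monitored losses
print(early_stopping(losses, patience=3))
print(early_stopping(losses, patience=4))
```

With a patience of 3, the training stops at the fifth epoch and the best weights are those of the second epoch, missing the later improvement. With a patience of 4, the training survives long enough to reach the last and best epoch. This is why a generous `patience` is paired with `restore_best_weights=True` in the configuration above.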

Let's plot the history of the loss and the other metrics on the training and
validation sets.

In [None]:
history_metrics = [
    'reconstruction_error',
    'regularization_error',
    'val_loss',
    'val_reconstruction_error',
    'val_regularization_error'
]

In [None]:
plot_history(cvae_history, metrics=history_metrics)

By observing the graph above, it is possible to infer whether the model is
**underfitting**, meaning that it's too simple to perform well on the task, or **overfitting**, meaning that it's learning too much from the training set and
it does not generalize enough to perform well on new data.

For an **AutoEncoder**, **overfitting** would be the desired outcome, even
though being able to reconstruct never-observed images like those of the
validation set would still be good, meaning that the model has learned a
more general representation of the data it is trained to generate.
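
The visual check described above can be complemented by a crude numerical heuristic. The sketch below is only an illustration, not part of the notebook's pipeline: the function name, the `window` parameter and the loss values are hypothetical. In the notebook, the two curves would presumably come from `cvae_history.history['loss']` and `cvae_history.history['val_loss']`.

```python
def diagnose(train_losses, val_losses, window=3):
    """Classify the trend over the last 'window' epochs of two loss curves."""
    train_delta = train_losses[-1] - train_losses[-window]
    val_delta = val_losses[-1] - val_losses[-window]
    if train_delta < 0 and val_delta > 0:
        return "overfitting"        # training improves while validation worsens
    if train_delta < 0 and val_delta < 0:
        return "still improving"    # both curves are still decreasing
    return "plateau"                # no clear improvement on the training set

# Hypothetical loss curves, for illustration only.
print(diagnose([1.0, 0.8, 0.6, 0.5, 0.4], [1.1, 0.9, 0.85, 0.9, 0.95]))  # overfitting
```

Such a rule is sensitive to noise in the curves, so it should not replace the inspection of the plot.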

As the last step of training, let's save the model, so that its weights may be
loaded in the future, for using the model or continuing its training.

In [None]:
cvae.model.save(directories["model"]+'cvae.keras')

### Observations

`[11-22-2023]`

In this revision of the notebook, from the training curve of the **CVAE**, we can observe that there is still a margin for improvement on the training set by increasing the number of training epochs.

Another consideration is that the model is overfitting. In fact, the training and validation losses are diverging.

---

## Evaluation

In this section, the performance of the trained **Landscape Generator** will be measured, using the metrics discussed during the introduction of the notebook.

### Configuration

Below, the user can configure some parameters for the validation.

In [None]:
samples_per_test_set = 128   # The number of samples per landscape type to generate and evaluate (requires gpu memory)

### Landscape Generator

First, let's define a utility class for generating datasets of images representing a specific landscape type using the trained model.

In [None]:
class LandscapeGenerator:
    """A landscape generator relying on the specified 'cvae' for the generation of landscapes."""
    def __init__(self, cvae):
        self.cvae = cvae

    def generate(self, landscape_type, samples = 100):
        """Generate the specified number of 'samples' of the specified 'landscape_type'."""
        label_id = label_id_of(landscape_type)
        latent_points = tf.random.normal(shape=(samples, self.cvae.code_size))
        image_types = keras.utils.to_categorical(tf.repeat([label_id], samples), num_classes=self.cvae.condition_size)
        return tf.data.Dataset.from_tensor_slices((self.cvae.generate_batch_data(latent_points, image_types), image_types))

In [None]:
landscape_generator = LandscapeGenerator(cvae)
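
To clarify how the condition passed to the decoder is built, here is a minimal pure-Python sketch of the one-hot encoding performed in `generate`. The ordering of `LANDSCAPE_TYPES` is an assumption made for illustration; in the notebook, the actual mapping is given by `label_id_of`.

```python
LANDSCAPE_TYPES = ["Coast", "Desert", "Forest", "Glacier", "Mountain"]  # assumed ordering

def one_hot_conditions(landscape_type, samples):
    """Build 'samples' identical one-hot vectors for the given landscape type."""
    label_id = LANDSCAPE_TYPES.index(landscape_type)
    row = [1.0 if i == label_id else 0.0 for i in range(len(LANDSCAPE_TYPES))]
    return [list(row) for _ in range(samples)]

print(one_hot_conditions("Forest", samples=2))
# [[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
```

Every generated sample shares the same condition vector, so only the random latent point differs from one image to the next.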

### Landscape Generation

Then, let's create a test set for each landscape type, each containing generated
images of the corresponding landscape.

In [None]:
coast_test_set = landscape_generator.generate("Coast", samples=samples_per_test_set)
desert_test_set = landscape_generator.generate("Desert", samples=samples_per_test_set)
forest_test_set = landscape_generator.generate("Forest", samples=samples_per_test_set)
glacier_test_set = landscape_generator.generate("Glacier", samples=samples_per_test_set)
mountain_test_set = landscape_generator.generate("Mountain", samples=samples_per_test_set)

### Generated Landscape Visualization

Let's take a look at a sample of generated images for each landscape type.

#### Coasts

In [None]:
show_dataset_sample(coast_test_set, n_cols=5, n_rows=5, image_header=False)

#### Deserts

In [None]:
show_dataset_sample(desert_test_set, n_cols=5, n_rows=5, image_header=False)

#### Forests

In [None]:
show_dataset_sample(forest_test_set, n_cols=5, n_rows=5, image_header=False)

#### Glaciers

In [None]:
show_dataset_sample(glacier_test_set, n_cols=5, n_rows=5, image_header=False)

#### Mountains

In [None]:
show_dataset_sample(mountain_test_set, n_cols=5, n_rows=5, image_header=False)

### Performance Evaluation

Provided the test sets of generated images, it is possible to evaluate the
performance of the model by using the **Loss** and the **Generation Quality** as metrics, just as
discussed in the introduction.

#### Landscape Classifier

In order to evaluate the **Generation Quality** of the landscape images generated by
the model, we'll load the best classifier trained on the original dataset
from Kaggle, already downloaded during the environment configuration.

In [None]:
landscape_classifier = keras.models.load_model('./landscape-classifier.h5', custom_objects={'KerasLayer':tfhub.KerasLayer})

Let's take a look at its description.

In [None]:
landscape_classifier.summary()

Then, let's take a look at its computation graph.

In [None]:
keras.utils.plot_model(landscape_classifier, show_shapes=True, show_layer_names=True, expand_nested=True, to_file=directories["images"]+"landscape-classifier.png")

From its computation graph,
we can infer that the landscape classifier can process batches of 32 images of
unknown size at a time.

Finally, let's look at the metrics that it will compute during the evaluation
of its performance on a dataset.

In [None]:
log("Landscape Classifier Metrics:", landscape_classifier.metrics_names)

These are the metrics that we can use for evaluating the quality of the images
generated by the model.

#### Landscape Evaluator

Before the evaluation of the model, let's define a utility class for evaluating
the quality of the landscape images in a dataset.

In [None]:
class LandscapeEvaluator:
    """A landscape evaluator that measures the quality of images representing landscapes."""
    def __init__(self, landscape_classifier):
        self.landscape_classifier = landscape_classifier

    def evaluate(self, landscape_dataset):
        """Evaluate the average quality of the images in the specified 'landscape_dataset'."""
        return self.landscape_classifier.evaluate(
            x=landscape_dataset.map(lambda xi, yi: (xi, tf.argmax(yi))).batch(32),
            verbose=1,
            return_dict=True,
        )

In [None]:
landscape_evaluator = LandscapeEvaluator(landscape_classifier)
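
Conceptually, the accuracy reported by the evaluator on a generated test set is the fraction of images that the classifier assigns to the landscape type they were generated for. The sketch below illustrates this idea in plain Python; the function name and the classifier outputs are hypothetical.

```python
def generation_accuracy(predicted_probabilities, intended_label_id):
    """Fraction of generated images classified as the intended landscape type."""
    hits = 0
    for probabilities in predicted_probabilities:
        # Index of the most probable class for this image.
        predicted_id = max(range(len(probabilities)), key=probabilities.__getitem__)
        if predicted_id == intended_label_id:
            hits += 1
    return hits / len(predicted_probabilities)

# Hypothetical classifier outputs for three images generated as class 2.
predictions = [
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.5, 0.1, 0.2, 0.1, 0.1],
    [0.0, 0.2, 0.7, 0.1, 0.0],
]
accuracy = generation_accuracy(predictions, intended_label_id=2)  # 2 of 3 images match
```

A low accuracy therefore means that the generated images are often mistaken for another landscape type.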

Let's test it on the original dataset. Supposedly, it has an **accuracy** of **91.83%** on the test set of the original Kaggle competition,
so it should achieve similar performance on the whole dataset.

In [None]:
evaluator_performance = landscape_evaluator.evaluate(dataset)
log("Loss:", evaluator_performance['loss'])
log("Accuracy:", evaluator_performance['accuracy'])

#### Generation Quality

Below, the **Landscape Evaluator** is used to measure the **Generation Quality** of each type of landscape generated by the **Landscape Generator** separately.

In [None]:
coast_evaluation = landscape_evaluator.evaluate(coast_test_set)
desert_evaluation = landscape_evaluator.evaluate(desert_test_set)
forest_evaluation = landscape_evaluator.evaluate(forest_test_set)
glacier_evaluation = landscape_evaluator.evaluate(glacier_test_set)
mountain_evaluation = landscape_evaluator.evaluate(mountain_test_set)

Then, an average of the **Generation Quality** of all the landscape types is computed.

In [None]:
evaluations = [coast_evaluation, desert_evaluation, forest_evaluation, glacier_evaluation, mountain_evaluation]
# Unweighted mean over the landscape types (every test set has the same number of samples).
avg_evaluation = {
    'loss': sum(evaluation['loss'] for evaluation in evaluations) / len(evaluations),
    'accuracy': sum(evaluation['accuracy'] for evaluation in evaluations) / len(evaluations)
}

#### Reconstruction & Regularization

Here, the **Loss** on the training, validation and test sets is computed, as a
metric measuring both the **Reconstruction Error** and the **Regularization Error**.

In [None]:
training_evaluation = cvae.model.evaluate(cvae_training_set, return_dict=True)
validation_evaluation = cvae.model.evaluate(cvae_validation_set, return_dict=True)
test_evaluation = cvae.model.evaluate(cvae_test_set, return_dict=True)

In order to get a better understanding of the reconstruction capabilities of
the model, let's try to reconstruct some of the original images.

> _**Note**: each original image will have its reconstructed version to its
right_.

In [None]:
((original_images, original_labels),) = next(iter(cvae_training_set))
reconstructed_images = cvae.model.predict([((original_images, original_labels),)])

original_images_dataset = tf.data.Dataset.zip(tf.data.Dataset.from_tensor_slices(original_images), tf.data.Dataset.from_tensor_slices(original_labels))
reconstructed_images_dataset = tf.data.Dataset.zip(tf.data.Dataset.from_tensor_slices(reconstructed_images), tf.data.Dataset.from_tensor_slices(original_labels))
original_vs_reconstructed_dataset = tf.data.Dataset.choose_from_datasets([original_images_dataset, reconstructed_images_dataset], tf.data.Dataset.range(2).repeat())

In [None]:
show_dataset_sample(original_vs_reconstructed_dataset, n_rows = 8, n_cols = 4, show_label_probability=False)
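
The pairing of each original image with its reconstruction relies on `choose_from_datasets` with a choice sequence that cycles over `0, 1, 0, 1, ...`. The following plain-Python sketch reproduces the resulting order on placeholder strings (the names are hypothetical).

```python
from itertools import chain

def interleave(originals, reconstructions):
    """Alternate items: original, reconstruction, original, reconstruction, ..."""
    return list(chain.from_iterable(zip(originals, reconstructions)))

print(interleave(["x0", "x1"], ["x0'", "x1'"]))  # ['x0', "x0'", 'x1', "x1'"]
```

Since the grid has an even number of columns, each reconstruction ends up immediately to the right of its original.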

#### Performance Summary

Finally, a summary of the configuration of the model and of the achieved performance is provided.

In [None]:
def log_landscape_evaluation(landscape_type, landscape_evaluation):
    """Print the 'accuracy' and 'loss' obtained on the specified 'landscape_type'"""
    log('{:10s} | accuracy: {:10.2f}% | loss: {:10.3f}'.format(landscape_type, landscape_evaluation['accuracy']*100, landscape_evaluation['loss']))

In [None]:
log('Configuration')
log('- Preprocessing:')
log('  - Dataset Size:', dataset_size)
log('  - Dataset Labels:', label_ids)
log('  - Image Size:', next(iter(dataset))[0].shape)
log('  - Label Size:', next(iter(dataset))[1].shape)
log('- Augmentations:')
log('  - Grayscale:', "Yes" if use_grayscale_augmentation else "No")
log('  - Cut Out:', "Yes" if use_cut_out_augmentation else "No")
log('- Architecture:')
log('  - Condition Size:', condition_size)
log('  - Code Size:', code_size)
log('  - CNN Activation:', cnn_activation)
log('  - Latent Activation:', latent_activation if latent_activation else "None")
log('  - Output Activation:', output_activation)
log('- Training:')
log('  - Training Set Size:', training_set_size)
log('  - Validation Size:', validation_set_size)
log('  - Test Size:', test_set_size)
log('  - Shuffle Size:', shuffle_buffer_size)
log('  - Batch Size:', batch_size)
log('  - Max Epochs:', epoch_count)
log('  - Patience:', patience)
log('  - KL Coefficient:', kl_coefficient)
log('  - Optimizer:', optimizer())
log('  - Learning Rate:', learning_rate)
log('  - Pre-trained:', "Yes" if load_previous_weights else "No")

print()

log('Reconstruction & Regularization')
log('Training loss:   {:.6f}'.format(training_evaluation['loss']))
log('- Reconstruction Error:   {:.6f}'.format(training_evaluation['reconstruction_error']))
log('- Regularization Error:   {:.6f}'.format(training_evaluation['regularization_error']))
log('Validation loss:   {:.6f}'.format(validation_evaluation['loss']))
log('- Reconstruction Error:   {:.6f}'.format(validation_evaluation['reconstruction_error']))
log('- Regularization Error:   {:.6f}'.format(validation_evaluation['regularization_error']))
log('Test loss:   {:.6f}'.format(test_evaluation['loss']))
log('- Reconstruction Error:   {:.6f}'.format(test_evaluation['reconstruction_error']))
log('- Regularization Error:   {:.6f}'.format(test_evaluation['regularization_error']))

print()

log('Generation Quality')
log_landscape_evaluation('Coast', coast_evaluation)
log_landscape_evaluation('Desert', desert_evaluation)
log_landscape_evaluation('Forest', forest_evaluation)
log_landscape_evaluation('Glacier', glacier_evaluation)
log_landscape_evaluation('Mountain', mountain_evaluation)
log('-'*53)
log_landscape_evaluation('Average', avg_evaluation)

### Observations

`[11-22-2023]`

In this revision of the notebook, we can observe that the **CVAE** is not able
to fully reconstruct the details of the original images with a latent space of 200 dimensions after 100 epochs.

In the performance summary, we can see the effects of overfitting in the loss
achieved on the training, validation and test sets. Also, the **Generation
Quality** is pretty low overall, probably because the generated images do not
contain many details and they often contain a mix of elements from different landscape types.

---

## Further Analysis

In this section, an analysis of the results of the trained **Landscape Generator** will be performed, in an attempt to identify the causes
of its possible underperformance.

### Undercomplete AutoEncoder

In order to determine if the architecture could be capable of reconstructing
the images of the training set, we'll train an **Undercomplete AutoEncoder**
with a similar architecture.

In particular, if the **AutoEncoder** is capable of reconstructing the images
of the training set, then the **CVAE** should also be able to do so, meaning
that the core design of the **CVAE** is sound and the problem lies elsewhere, for example in its implementation. Otherwise, there is a problem in the design of the **CVAE** (e.g. the model may be too small for the task...).

Basically, we are debugging the **CVAE** by training parts of its architecture as standalone components, in order to understand which of them are not behaving as expected, if any.

#### Architecture

The architecture of the **AutoEncoder** is a simplified version of the
architecture of the **CVAE**.

In particular, an **AutoEncoder** does not require additional knowledge about the processed input. In other words, it is no longer required to pass the landscape type $y$ as input to the model.
Moreover, the inputs are mapped into the latent space directly, without
intermediate probability distributions.

As a consequence, the **Encoder Core** and the **Decoder Core**, that were already introduced in the architecture of the **CVAE**, can be used
directly as the **Encoder** and **Decoder** of the **AutoEncoder**.

Such architecture is shown in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=16u5at1lFxPsnyQ5M2ohZgbI33JwKIETu" width="60%"/>
</div>

First, the **Encoder Core** processes a landscape image $x$, producing
its corresponding code $z$.

Then, the code $z$ is fed to the **Decoder Core**, which tries to produce a reconstruction $x'$ of the original image $x$ from the limited information contained in $z$.

#### Model Definition

In order to build the **AutoEncoder**, let's start by defining the **Encoder** as the **Encoder Core**.

In [None]:
def build_ae_encoder(image_shape, activation):
    input = layers.Input(name="OriginalImage", shape=image_shape)
    output = build_encoder_core(activation)(input)
    return keras.Model(name = "AE_Encoder", inputs=input, outputs=output)

In [None]:
ae_encoder = build_ae_encoder(
    image_shape=image_size,
    activation=cnn_activation
)

Then, let's define the **Decoder** as the **Decoder Core**.

In [None]:
def build_ae_decoder(feature_shape, activation, output_activation):
    return build_decoder_core(feature_shape, activation, output_activation)

In [None]:
ae_decoder = build_ae_decoder(
    feature_shape=ae_encoder.get_layer("Encoder_Core").get_layer("OriginalImage_Features").input.shape[1:],
    activation=cnn_activation,
    output_activation=output_activation
)

Finally, let's define the **AutoEncoder** itself, which contains both the **Encoder** and the **Decoder**.

In [None]:
ae = keras.Model(name='AE', inputs=ae_encoder.input, outputs=ae_decoder(ae_encoder.output))

Let's take a look at a description of the model.

In [None]:
ae.summary()

Also, let's look at its **computation graph**.

In [None]:
keras.utils.plot_model(ae, show_shapes=True, show_layer_names=True, expand_nested=True, to_file=directories["images"]+"ae.png")

#### Input Preparation

Before training the model, it is necessary to prepare the dataset appropriately
for the training session.

For training the **AutoEncoder**, we'll be using a **built-in** loss, so the training set should contain batches of tuples of (**Input**, **TargetOutput**).
Since the **TargetOutput** is the **Input** itself for an **AutoEncoder**, the
training set will actually contain batches of tuples of (**Input**, **Input**).

In [None]:
def to_ae_input(xi, yi): return (xi, xi)

Then, the training, validation and test sets can be prepared similarly to those
used for the **CVAE**.

In [None]:
ae_training_set = prepare_training_set(training_set, prepare_fn=to_ae_input, cache_name="ae_training_set")
ae_validation_set = prepare_validation_set(validation_set, prepare_fn=to_ae_input, cache_name="ae_validation_set")
ae_test_set = prepare_test_set(test_set, prepare_fn=to_ae_input, cache_name="ae_test_set")

#### Loss Function

The loss of an **AutoEncoder** is simply equal to the **Reconstruction Error**.
However, since the **Reconstruction Error** is the only component of the loss,
it does not need to be weighted by the number of samples in the training set.
Therefore, we can use the built-in **MSE** cost function directly.

In [None]:
ae_loss = "mse"
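
For reference, the quantity computed by the **MSE** can be sketched in plain Python as follows. This is only an illustration on flattened images represented as lists of floats; the built-in Keras loss operates on tensors and averages over the batch as well.

```python
def mean_squared_error(original, reconstruction):
    """Average of the squared pixel-wise differences between two flat images."""
    assert len(original) == len(reconstruction)
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)

print(mean_squared_error([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # 0.0
print(mean_squared_error([0.0, 1.0], [1.0, 0.0]))            # 1.0
```

A perfect reconstruction yields an error of zero, while the error grows quadratically with the pixel-wise differences.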

#### Image Reconstruction Callback

In order to monitor the reconstruction of the images produced by the **AutoEncoder** during the training, here a callback is provided to
reconstruct a batch of images at the end of each epoch.

In [None]:
class ImageReconstructionCallback(keras.callbacks.Callback):
    """
    A callback that reconstructs the specified 'dataset' of images
    at the end of each epoch, displaying them in a grid of 'n_rows'
    and 'n_cols'.
    """
    def __init__(self, ae, dataset, n_rows, n_cols, generateWhen = lambda epoch: True):
        super().__init__()
        self.ae = ae
        self.dataset = dataset
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.generateWhen = generateWhen

    def on_epoch_end(self, epoch, logs=None):
        actual_epoch = epoch + 1
        if self.generateWhen(actual_epoch):
            print()
            log("Reconstructing images for epoch", actual_epoch, "...")
            reconstructed_dataset = self.dataset.map(lambda x, y: (self.ae(tf.expand_dims(x, axis=0), training=False)[0], y))
            original_vs_reconstructed_dataset = tf.data.Dataset.choose_from_datasets([self.dataset, reconstructed_dataset], tf.data.Dataset.range(2).repeat())
            show_dataset_sample(
                dataset=original_vs_reconstructed_dataset,
                n_rows=self.n_rows,
                n_cols=self.n_cols,
                show_label_probability=False,
            )

#### Model Compilation

The last step before training is to compile the model, choosing the type of
training algorithm and loss that will be used for training the model.

In [None]:
ae.compile(loss=ae_loss, optimizer=optimizer())

#### Training

Finally, let's train the model using the prepared training and validation sets.

In [None]:
ae_history = ae.fit(
    ae_training_set,
    validation_data=ae_validation_set,
    epochs=epoch_count,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor='loss', patience=patience, restore_best_weights=True),
        keras.callbacks.ModelCheckpoint(monitor='loss', filepath=directories["model"]+'ae.checkpoint.keras'),
        ImageReconstructionCallback(ae, training_set.take(16), 4, 4, generateWhen=lambda epoch: epoch%10==1 or epoch==epoch_count),
    ],
)

Let's plot the loss on the training and validation sets.

In [None]:
plot_history(ae_history, metrics=['val_loss'])

As the last step of training, let's save the model, so that its weights may be
loaded in the future, for using the model or continuing its training.

In [None]:
ae.save(directories["model"]+'ae.keras')

#### Observations

`[11-22-2023]`

In this revision of the notebook, we can see that the **Encoder Core** and the **Decoder Core** are complex enough to reconstruct the original images flawlessly in the latest epochs.

Knowing that, we can infer that the problem with the **CVAE** should be either
in its inputs or in the latent space (e.g. the dimension of the latent space...).

### Variational AutoEncoder

The previous analysis can be deepened further by training a **Variational
AutoEncoder** similar to the **CVAE**.

#### Architecture

The architecture of the **Variational AutoEncoder** is the same as the
architecture of the **CVAE**, except that it is not required to pass the landscape type $y$ as input to the model.

Such architecture is shown in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1ObHbDOn_6lpHgECZflQsNuQTpc9cCAeh" width="80%"/>
</div>

#### Model Definition

In order to build the **Variational AutoEncoder**, let's start by defining the **Encoder**, built on top of the **Encoder Core**.

Contrary to an **AutoEncoder**, the **Encoder** of a **Variational AutoEncoder**
produces the means and the (log-)variances of multinormal distributions projected in the latent space, just like the **CVAE**.

In [None]:
def build_vae_encoder(image_shape, condition_size, code_size, activation, latent_activation):
    input = layers.Input(name="OriginalImage", shape=image_shape)
    x = build_encoder_core(activation)(input)

    mean = layers.Dense(name='Mean', units=code_size, activation=latent_activation)(x)
    log_variance = layers.Dense(name='LogVariance', units=code_size, activation=latent_activation)(x)

    return keras.Model(name="VAE_Encoder", inputs=input, outputs=[mean, log_variance])

In [None]:
vae_encoder = build_vae_encoder(
    image_shape=image_size,
    condition_size=condition_size,
    code_size=code_size,
    activation=cnn_activation,
    latent_activation=latent_activation
)

Then, let's define the **Decoder**, which maps a code back to image features before applying the **Decoder Core**.

In [None]:
def build_vae_decoder(feature_shape, activation, output_activation):
    input = layers.Input(name="Code", shape=(code_size,))  # relies on the global 'code_size'
    x = layers.Dense(name='GeneratedImage_Features', units=feature_shape.num_elements())(input)
    output = build_decoder_core(feature_shape, activation, output_activation)(x)
    return keras.Model(name='VAE_Decoder', inputs=input, outputs=output)

In [None]:
vae_decoder = build_vae_decoder(
    feature_shape=vae_encoder.get_layer("Encoder_Core").get_layer("OriginalImage_Features").input.shape[1:],
    activation=cnn_activation,
    output_activation=output_activation
)

Finally, let's define the **Variational AutoEncoder** itself, which contains both the **Encoder** and the **Decoder**.

In [None]:
class VAE:
    def __init__(self, encoder, decoder, name="VAE"):
        self.encoder = encoder
        self.decoder = decoder

        self.input = self.encoder.input
        self.mean = self.encoder.output[0]
        self.log_variance = self.encoder.output[1]
        self.code_size = self.mean.shape[1]
        self.sampling = MultinormalSamplingLayer(name='Sampling')([self.mean, self.log_variance])
        self.output = self.decoder(self.sampling)

        self.model = keras.Model(name=name, inputs=self.input, outputs=self.output)

    def generate_single_data(self, code):
        """Generate a new image corresponding to the specified 'code'."""
        code_batch = tf.reshape(code, shape=(1, self.code_size))
        return self.generate_batch_data(code_batch)[0]

    def generate_batch_data(self, code_batch):
        """Generate some new images corresponding to the specified batch of codes."""
        return self.decoder([code_batch], training=False)

In [None]:
vae = VAE(vae_encoder, vae_decoder)
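
The sampling step that connects the **Encoder** to the **Decoder** can be illustrated with the usual reparameterization trick. The sketch below is a plain-Python approximation written under that assumption; the `MultinormalSamplingLayer` defined earlier in this notebook may differ in its details, and the function name and values are hypothetical.

```python
import math
import random

def sample_code(mean, log_variance, rng=random):
    """Draw z = mean + sigma * eps, with sigma = exp(0.5 * log_variance)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_variance)
    ]

rng = random.Random(0)  # fixed seed, so that the sketch is reproducible
z = sample_code(mean=[0.0, 1.0], log_variance=[0.0, -2.0], rng=rng)
print(len(z))  # 2
```

Expressing the sample as a deterministic function of the mean, the log-variance and an independent noise term is what allows gradients to flow through the sampling step during training.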

Let's take a look at a description of the model.

In [None]:
vae.model.summary()

Also, let's look at its **computation graph**.

In [None]:
keras.utils.plot_model(vae.model, show_shapes=True, show_layer_names=True, expand_nested=True, to_file=directories["images"]+"vae.png")

#### Input Preparation

Before training the model, it is necessary to prepare the dataset appropriately
for the training session.

For training the **Variational AutoEncoder**, we'll be using the same loss as the **CVAE**, so the training set should contain batches of singleton tuples of (**Input**, ). However, contrary to the **CVAE**, the **Input** does not include
the landscape type $y$.

In [None]:
def to_vae_input(xi, yi): return (xi,)

Then, the training, validation and test sets can be prepared similarly to those
used for training the **CVAE**.

In [None]:
vae_training_set = prepare_training_set(training_set, prepare_fn=to_vae_input, cache_name="vae_training_set")
vae_validation_set = prepare_validation_set(validation_set, prepare_fn=to_vae_input, cache_name="vae_validation_set")
vae_test_set = prepare_test_set(test_set, prepare_fn=to_vae_input, cache_name="vae_test_set")

#### Loss Function

The loss of a **Variational AutoEncoder** is the same as the loss of the **CVAE**,
that is, the sum of the **Reconstruction Error** and the **Regularization Error**.

Let's start by defining these separate losses.

In [None]:
vae_regul_error = regularization_error(vae.mean, vae.log_variance, kl_coefficient)
vae_recon_error = reconstruction_error(vae.model.input, vae.model.output, training_set_size)

Then, let's configure the model to train using the same loss and metrics as the
**CVAE**.

In [None]:
vae.model.add_loss(cvae_loss(vae_recon_error, vae_regul_error, recon_coefficient))
vae.model.add_metric(vae_recon_error, name="reconstruction_error")
vae.model.add_metric(vae_regul_error, name="regularization_error")
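
As a reminder of what the **Regularization Error** measures, the sketch below computes the closed-form KL divergence between a diagonal multinormal distribution and the standard multinormal distribution for a single sample. It assumes the standard formula; the `regularization_error` function of this notebook may apply additional scaling (e.g. the **KL Coefficient**) or averaging, and the function name used here is hypothetical.

```python
import math

def kl_to_standard_normal(mean, log_variance):
    """KL( N(mean, diag(exp(log_variance))) || N(0, I) ) for a single sample."""
    return 0.5 * sum(
        m ** 2 + math.exp(lv) - lv - 1.0
        for m, lv in zip(mean, log_variance)
    )

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
print(kl_to_standard_normal([1.0], [0.0]))            # 0.5
```

The divergence is zero only when the encoded distribution coincides with the standard multinormal distribution, and it grows as the mean moves away from the origin or the variance departs from 1.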

#### Model Compilation

The last step before training is to compile the model, choosing the type of
training algorithm that will be used for training the model.

In [None]:
vae.model.compile(optimizer=optimizer())

#### Training

Finally, let's train the model using the prepared training and validation sets.

In [None]:
vae_history = vae.model.fit(
    vae_training_set,
    validation_data=vae_validation_set,
    epochs=epoch_count,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor='loss', patience=patience, restore_best_weights=True),
        keras.callbacks.ModelCheckpoint(monitor='loss', filepath=directories["model"]+'vae.checkpoint.keras'),
        ImageReconstructionCallback(vae.model, training_set.take(16), 4, 4, generateWhen=lambda epoch: epoch%10==1 or epoch==epoch_count),
    ],
)

Let's plot the loss on the training and validation sets.

In [None]:
plot_history(history=vae_history, metrics=history_metrics, skip=1)

As the last step of training, let's save the model, so that its weights may be
loaded in the future, for using the model or continuing its training.

In [None]:
vae.model.save(directories["model"]+'vae.keras')

#### Generation

Contrary to the **AutoEncoder**, the **Variational AutoEncoder** can also be used for generating new landscape images, since it is trained to discover a **regular** latent space during the compression of the original images.

In order to do so, it is possible to apply its **Decoder** on random samples
extracted from a standard multinormal distribution (which is the same constraint enforced on the latent space during training).

First, let's use the **Variational AutoEncoder** to generate some images of landscapes.

> _**Note**: in a **Variational AutoEncoder**, all the landscape types are mixed within the same latent space, so when an image is generated from a random point in the latent space, it is not possible to know its landscape
type beforehand. In a **CVAE** instead, a latent space is produced for each landscape type, letting the user decide the landscape type of the generated images._

In [None]:
latent_points_count = 25
latent_points = tf.random.normal(shape=(latent_points_count, vae.code_size))
generated_images = tf.data.Dataset.from_tensor_slices((vae.generate_batch_data(latent_points), None))

Then, let's take a look at the landscapes generated by the **Variational AutoEncoder**.

In [None]:
show_dataset_sample(generated_images, n_rows=5, n_cols=5, image_header=False)

#### Observations

`[11-22-2023]`

In this revision of the notebook, we can see that the **Variational AutoEncoder** is not complex enough to reconstruct the original images as well
as the simpler **AutoEncoder**. That is probably due to a combination of the
dimension of the latent space and the constraints on its regularity, making the problem of reconstructing the original images more difficult to solve.

Knowing that, we can infer that the **CVAE** may suffer from the same problems.

Moreover, the generation capability of the **Variational AutoEncoder** trained on a subset of the dataset is only slightly better with respect to the **CVAE** trained on the whole dataset.

---

## Results

Let's take a look at the images generated by different configurations of the model trained in this notebook.





### **E100-D2-Tanh-KL1**
These are the images generated after 100 epochs when setting the code size to 2 (i.e. bidimensional latent space), using **tanh** as the latent activation
function and a **KL Coefficient** equal to 1.

**Coasts**

![Generated Coasts](https://drive.google.com/uc?export=view&id=1d1Epqh_SHclJ54jFfBrAipm0UeDpH24r)

**Deserts**

![Generated Deserts](https://drive.google.com/uc?export=view&id=1NgsqCpnekM-lzPkjYn3R_BLKwk1O5z-f)

**Forests**

![Generated Forests](https://drive.google.com/uc?export=view&id=1ALwJzCkUlWSe4utgPuHN5UIX_r1GhnOo)

**Glaciers**

![Generated Glaciers](https://drive.google.com/uc?export=view&id=1YxafLzIoH1mM_fjW4u0jcETLbO0pvCD2)

**Mountains**

![Generated Mountains](https://drive.google.com/uc?export=view&id=13pKtBCvxeYFRFpSYS86m802t3LZ1Ottc)

### **E100-D10-Tanh-KL1**
These are the images generated after 100 epochs when setting the code size to 10, using **tanh** as the latent activation function and a **KL Coefficient** equal to 1.

**Coasts**

![Generated Coasts](https://drive.google.com/uc?export=view&id=1In-y4AsHOpFigWCZBISJyyDq_c4EXHu3)

**Deserts**

![Generated Deserts](https://drive.google.com/uc?export=view&id=1t_m4d-vJGkyHrsTGsfViuVPkA7hQdFSI)

**Forests**

![Generated Forests](https://drive.google.com/uc?export=view&id=1tgcLDxWCFKFKryEjXx9CY_UcM1oRMVMX)

**Glaciers**

![Generated Glaciers](https://drive.google.com/uc?export=view&id=10u_zqAkKoSkm7kL9zgXg9WuaT6yTobKi)

**Mountains**

![Generated Mountains](https://drive.google.com/uc?export=view&id=1z_h21xqEvl8BAJEXxbZJFbLkGwFrAOSF)

### **E100-D100-Tanh-KL1**
These are the images generated after 100 epochs when setting the code size to 100, using **tanh** as the latent activation function and a **KL Coefficient** equal to 1.

**Coasts**

![Generated Coasts](https://drive.google.com/uc?export=view&id=1myald24ny3kpAVv4S9zWLHZ58WYXyD3b)

**Deserts**

![Generated Deserts](https://drive.google.com/uc?export=view&id=10E3-dnCenN_dp5vz6j-kWt7fdzrGWBVe)

**Forests**

![Generated Forests](https://drive.google.com/uc?export=view&id=1YlCwGYcOp7-y8JI_SzAIO7VPMcvj-8E1)

**Glaciers**

![Generated Glaciers](https://drive.google.com/uc?export=view&id=1J4kJmywIosERlyQUiB_agjgPlQ4MPOh-)

**Mountains**

![Generated Mountains](https://drive.google.com/uc?export=view&id=1xOzrPmcNKr1-VL8HzZBV-ZMTqmCjsHI0)

### **E100-D200-None-KL1**
These are the images generated after 100 epochs when setting the code size to 200, using no latent activation function and a **KL Coefficient** equal to 1.
This was the best configuration considering overall performance.

> _**Note**: the latent activation function was set to `None` because it was
constraining the latent space, positioning the means of the multinormal distributions within a D-dimensional hypercube with side length equal to 2, thereby implicitly imposing some regularization on the latent space._
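
The constraint mentioned in the note can be verified numerically: **tanh** maps any real value into the interval $(-1, 1)$, so each coordinate of an activated mean spans a range of length 2. The input values below are hypothetical pre-activation outputs, chosen only for illustration.

```python
import math

raw_means = [-5.0, -0.5, 0.0, 0.5, 5.0]   # hypothetical pre-activation outputs
bounded_means = [math.tanh(v) for v in raw_means]

# Every activated mean falls inside (-1, 1), i.e. inside a hypercube of side 2.
print(all(-1.0 < m < 1.0 for m in bounded_means))  # True
```

In the `build_vae_encoder` function of this notebook, the same activation is also applied to the `LogVariance` layer, so with **tanh** the variances would be similarly confined between $e^{-1}$ and $e$.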

**Coasts**

![Generated Coasts](https://drive.google.com/uc?export=view&id=1gvoBPuM0aIwNjWMHVaxg8qb2XtsOc1Y7)

**Deserts**

![Generated Deserts](https://drive.google.com/uc?export=view&id=1t_Ua9vypo699d7k6xdConfU5HJJ3TqEV)

**Forests**

![Generated Forests](https://drive.google.com/uc?export=view&id=1luniZQE4QTyBCSxmIzBr0eGIKgBOJ1yh)

**Glaciers**

![Generated Glaciers](https://drive.google.com/uc?export=view&id=1FiBCRz2YBbVezsgXro-bYb_gy-hvUEko)

**Mountains**

![Generated Mountains](https://drive.google.com/uc?export=view&id=1n83MOIz4bBvrtPUR1JWKvgus6udkDIeb)

### **E100-D200-None-KL0.1**

These are the images generated after 100 epochs when setting the code size to 200, using no latent activation function and a **KL Coefficient** equal to 0.1.
This was the best configuration considering reconstruction performance, but also the worst
considering regularization performance.

**Coasts**

![Generated Coasts](https://drive.google.com/uc?export=view&id=1kqb6vCX4iXiWe8j7JHEMwKVFivb9psjQ)

**Deserts**

![Generated Deserts](https://drive.google.com/uc?export=view&id=1vSFEV32sxkE8taoV1yhShIfuskgZNtsm)

**Forests**

![Generated Forests](https://drive.google.com/uc?export=view&id=1tNchtk6cQJeJyNu1GGzHEFFgkmbSH1th)

**Glaciers**

![Generated Glaciers](https://drive.google.com/uc?export=view&id=1tNchtk6cQJeJyNu1GGzHEFFgkmbSH1th)

**Mountains**

![Generated Mountains](https://drive.google.com/uc?export=view&id=1b4TGREO9CF4LV_gfzlunAvKMU4css40e)

### Observations
`[11-22-2023]`

From the images above, we can observe that increasing the size of the latent space allows the model to provide finer details in the generated landscapes,
achieving somewhat **impressionistic representations of landscapes** when the latent space has a hundred dimensions.

Of course this comes at a cost. In fact, increasing the dimensions of the latent space also causes an increase of the model complexity, reaching levels of
complexity that make the model very difficult to train or even create in standard environments.

Another consideration is the possibility of balancing the reconstruction
capability of the model with the regularity of its latent space.

For example, it is possible to reduce the weight of the **Regularization Error** (namely the **KL Coefficient**), giving the model more flexibility on the generation of the latent space when reconstructing the images.
However, while this would increase the reconstruction capability of the model, it would also decrease the regularity of the latent space as a tradeoff, increasing the chances of generating images unrelated to landscapes (e.g. blank images...).
In other words, this would increase the amount of details in the images that
are correctly generated, but it would also increase the chances of incorrectly
generated images.
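
The tradeoff described above can be illustrated with a toy computation. The sketch assumes a simplified loss of the form `reconstruction + kl_coefficient * regularization` (the loss used in this notebook also involves a reconstruction coefficient), and the two candidate solutions and their error values are entirely hypothetical.

```python
def total_loss(reconstruction_error, regularization_error, kl_coefficient):
    """Weighted sum of the two loss terms (assumed, simplified form)."""
    return reconstruction_error + kl_coefficient * regularization_error

# Hypothetical candidate solutions: (reconstruction error, regularization error).
sharp_but_irregular = (0.10, 2.00)
blurry_but_regular = (0.50, 0.20)

for kl_coefficient in (1.0, 0.1):
    sharp = total_loss(*sharp_but_irregular, kl_coefficient)
    blurry = total_loss(*blurry_but_regular, kl_coefficient)
    preferred = "sharp_but_irregular" if sharp < blurry else "blurry_but_regular"
    print(kl_coefficient, preferred)
# 1.0 blurry_but_regular
# 0.1 sharp_but_irregular
```

With a lower **KL Coefficient**, the optimizer is rewarded for solutions that reconstruct better even if their latent space is less regular, which matches the behaviour observed in the last configuration.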