<h1><font color="#113D68" size=6>TINTOlib: Converting Tidy Data into Synthetic Images</font></h1>

<h1><font color="#113D68" size=5>Template Regression problem with a Vision Transformer (ViT)</font></h1>

<br><br>
<div style="text-align: right">
<font color="#113D68" size=3>Manuel Castillo-Cara</font><br>
<font color="#113D68" size=3>Raúl García-Castro</font><br>
<font color="#113D68" size=3>Jiayun Liu</font><br>
</div>


---

<div class="alert alert-block alert-info">
    
<i class="fa fa-info-circle" aria-hidden="true"></i>
More information about [Manuel Castillo-Cara](https://www.manuelcastillo.eu/)

<div class="alert alert-block alert-info">

<i class="fa fa-info-circle" aria-hidden="true"></i>
You can find more Artificial Intelligence, Machine Learning and Deep Learning courses at a discount on my [Personal Website](https://www.manuelcastillo.eu/udemy/)

---

<h2><font color="#004D7F" size=5>License</font></h2>

<p><small><small>Improving Deep Learning by Exploiting Synthetic Images Copyright 2024 Manuel Castillo Cara.</p>
<p><small><small> Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at </p>
<p><small><small> <a href="https://www.apache.org/licenses/LICENSE-2.0">https://www.apache.org/licenses/LICENSE-2.0</a> </p>
<p><small><small> Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. </p>

---

<a id="indice"></a>
<h2><font color="#004D7F" size=5>Index</font></h2>

* [0. Context](#section0)
* [1. Description](#section1)
    * [1.1. Main Features](#section11)
    * [1.2. Citation](#section12)
    * [1.3. Documentation and License](#section13)
* [2. Libraries](#section2)
    * [2.1. System setup](#section21)
    * [2.2. Invoke the libraries](#section22)
* [3. Data processing](#section3)
    * [3.1. Read the dataset](#section31)
    * [3.2. Create images with TINTOlib](#section32)
    * [3.3. Generate images](#section33)
    * [3.4. Read images](#section34)
    * [3.5. Mix images and tidy data](#section35)
* [4. Pre-modelling phase](#section4)
    * [4.1. Data curation](#section41)
* [5. Modelling with ViT](#section5)
    * [5.1. ViT for TINTOlib images](#section51)
    * [5.2. MLP](#section52)
    * [5.3. Metrics](#section53)
    * [5.4. Compile and fit](#section54)
* [6. Results](#section6)
    * [6.1. Train/Validation representation](#section61)
    * [6.2. Validation/Test evaluation](#section62)

---
<a id="section0"></a>
# <font color="#004D7F" size=6> 0. Context</font>

This tutorial explains how to convert a tabular regression dataset into images with TINTOlib, read those images, and feed them into a Vision Transformer (ViT). For more details on how to generate images from tabular data, refer to the TINTOlib documentation on GitHub.

Remember that you can run the training on a GPU to improve performance.

---
<div style="text-align: right"> <font size=5> <a href="#indice"><i class="fa fa-arrow-circle-up" aria-hidden="true" style="color:#004D7F"></i></a></font></div>

---

<div class="alert alert-block alert-info">
    
<i class="fa fa-info-circle" aria-hidden="true"></i>
See the paper from [Information Fusion Journal](https://doi.org/10.1016/j.inffus.2022.10.011)

<div class="alert alert-block alert-info">
    
<i class="fa fa-info-circle" aria-hidden="true"></i>
See the paper from [SoftwareX](https://doi.org/10.1016/j.softx.2023.101391)

---
<div style="text-align: right"> <font size=5> <a href="#indice"><i class="fa fa-arrow-circle-up" aria-hidden="true" style="color:#004D7F"></i></a></font></div>

---

<a id="section1"></a>
# <font color="#004D7F" size=6> 1. Description</font>

The growing interest in machine learning for predictive tasks has led to a large and diverse set of algorithms. However, not all of these algorithms perform efficiently on datasets in tidy data format. For this reason, novel techniques are being developed to convert tidy data into images so that vision models such as Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) can be applied. TINTOlib can convert tidy data into images through several techniques: TINTO, IGTD, REFINED, SuperTML, BarGraph, DistanceMatrix, Combination, FeatureWrap and BIE.

In this tutorial, we train a ViT on synthetic images for a regression problem.

<figure><center>
  <img src="Images/Tabular-to-image-ViT.png" width="600" height="350" alt="Graph">
  <figcaption><blockquote>ViT architecture with synthetic images.</blockquote></figcaption>
</center></figure>

---
<a id="section11"></a>
# <font color="#004D7F" size=5> 1.1. Main Features</font>

- Supports all CSV data in **[Tidy Data](https://www.jstatsoft.org/article/view/v059i10)** format.
- TINTOlib converts tabular data for binary classification, multi-class classification and regression problems into images for machine learning.
- Input data formats:
    - **Tabular files**: The input data can be a **[CSV](https://en.wikipedia.org/wiki/Comma-separated_values)** file in **[Tidy Data](https://www.jstatsoft.org/article/view/v059i10)** format.
    - **Dataframe**: The input data can be a **[Pandas Dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)** in **[Tidy Data](https://www.jstatsoft.org/article/view/v059i10)** format.
    - **Tidy Data**: The **target** (variable to be predicted) must be the last column of the dataset, so the preceding columns are the features.
    - All data must be numerical. TINTOlib does not accept strings or any other non-numeric format.
- Runs on **Linux**, **Windows** and **macOS** systems.
- Compatible with **[Python](https://www.python.org/)** 3.7 or higher.
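If the target is not already the last column, or some columns are not numeric, the data has to be prepared before it is passed to TINTOlib. The following is a minimal pandas sketch of those two requirements; the helper name `to_tintolib_format` and the toy columns are hypothetical and not part of TINTOlib.

```python
import pandas as pd


def to_tintolib_format(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a copy of df with `target` moved to the last column.

    Raises ValueError if any column is non-numeric, since TINTOlib
    only accepts numerical data.
    """
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric columns must be encoded first: {non_numeric}")
    features = [c for c in df.columns if c != target]
    return df[features + [target]]


# Toy data: the target "price" is not the last column yet
raw = pd.DataFrame({"price": [24.0, 21.6], "rooms": [6.5, 6.4], "age": [65.2, 78.9]})
tidy = to_tintolib_format(raw, target="price")
print(list(tidy.columns))  # ['rooms', 'age', 'price']
```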

---
<a id="section12"></a>
# <font color="#004D7F" size=5> 1.2. Citation</font>

**TINTOlib** is a Python library that creates **Synthetic Images** from [Tidy Data](https://www.jstatsoft.org/article/view/v059i10) (also known as **Tabular Data**).

**Citing TINTO**: If you use TINTO in your work, please cite the **[SoftwareX](https://doi.org/10.1016/j.softx.2023.101391)** paper:

```bib
@article{softwarex_TINTO,
    title = {TINTO: Converting Tidy Data into Image for Classification
            with 2-Dimensional Convolutional Neural Networks},
    journal = {SoftwareX},
    author = {Manuel Castillo-Cara and Reewos Talla-Chumpitaz and
              Raúl García-Castro and Luis Orozco-Barbosa},
    year = {2023},
    pages = {101391},
    issn = {2352-7110},
    doi = {https://doi.org/10.1016/j.softx.2023.101391}
}
```

You can also cite the use case developed in the **[INFFUS paper](https://doi.org/10.1016/j.inffus.2022.10.011)**:

```bib
@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image
            techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and
              Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}
```

---
<a id="section13"></a>
# <font color="#004D7F" size=5> 1.3. Documentation and License</font>

TINTOlib has extensive documentation on both GitHub and PyPI.

Moreover, TINTOlib is free and open-source software released under the Apache 2.0 license.

<div class="alert alert-block alert-info">
    
<i class="fa fa-info-circle" aria-hidden="true"></i>
You can see all information about TINTOlib in [GitHub](https://github.com/oeg-upm/TINTOlib)

<div class="alert alert-block alert-info">

<i class="fa fa-info-circle" aria-hidden="true"></i>
You can see all information about TINTOlib documentation in the official [Webpage](https://tintolib.readthedocs.io/en/latest/installation.html)

<div class="alert alert-block alert-info">

<i class="fa fa-info-circle" aria-hidden="true"></i>
You can find the TINTOlib package on [PyPI](https://pypi.org/project/TINTOlib/)

<div class="alert alert-block alert-info">

<i class="fa fa-info-circle" aria-hidden="true"></i>
You can see more information and examples in [TINTOlib Crash Course](https://github.com/oeg-upm/TINTOlib-Crash_Course)

---
<div style="text-align: right"> <font size=5> <a href="#indice"><i class="fa fa-arrow-circle-up" aria-hidden="true" style="color:#004D7F"></i></a></font></div>

---

<a id="section2"></a>
# <font color="#004D7F" size=6> 2. Libraries</font>

---
<a id="section21"></a>
# <font color="#004D7F" size=5> 2.1. System setup</font>

Before installing the libraries, you must install the `mpi4py` package on the native (Linux) system. The following link shows how to do it:
- Link: [`mpi4py` in Linux](https://www.geeksforgeeks.org/how-to-install-python3-mpi4py-package-on-linux/)

For example, in Linux:

```
    sudo apt-get install python3
    sudo apt install python3-pip
    sudo apt install python3-mpi4py
```

On Windows, macOS or Linux, you can also install it from PyPI:
```
    pip3 install mpi4py
```

<div class="alert alert-block alert-info">
    
<i class="fa fa-info-circle" aria-hidden="true"></i>
Note that you must **restart the kernel or the system** so that it can load the libraries.

Once `mpi4py` is installed, you can install the PyPI libraries and dependencies.

In [None]:
!pip install -U tintolib

In [None]:
!pip install -U torchmetrics pytorch_lightning TINTOlib imblearn keras_preprocessing mpi4py tifffile tqdm seaborn bitstring opencv-python pydot graphviz keras

<div class="alert alert-block alert-info">
    
<i class="fa fa-info-circle" aria-hidden="true"></i>
Note that you must **restart the kernel** so that it can load the libraries.

---
<a id="section22"></a>
# <font color="#004D7F" size=5> 2.2. Invoke the libraries</font>

First, we import the required libraries.

In [None]:
import gc
import os
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import sklearn
import tensorflow as tf
import tifffile as tifi
import keras
from keras.utils import plot_model
from keras import ops
from PIL import Image
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             root_mean_squared_error, mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split

from imblearn.over_sampling import RandomOverSampler

# TensorFlow and Keras
from keras import layers, models, Model
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.layers import (Activation, BatchNormalization, concatenate,
                          Conv2D, Dense, Dropout, Flatten, Input,
                          LayerNormalization, MaxPool2D, MaxPooling2D)
from keras.losses import MeanAbsoluteError, MeanAbsolutePercentageError
from keras.models import Sequential, Model, load_model
from keras.optimizers import Adadelta, Adam, Adamax, SGD

#Models of TINTOlib
from TINTOlib.barGraph import BarGraph
from TINTOlib.combination import Combination
from TINTOlib.distanceMatrix import DistanceMatrix
from TINTOlib.igtd import IGTD
from TINTOlib.refined import REFINED
from TINTOlib.supertml import SuperTML
from TINTOlib.tinto import TINTO
from TINTOlib.featureWrap import FeatureWrap
from TINTOlib.bie import BIE

# SET RANDOM SEED FOR REPRODUCIBILITY
SEED = 64
#torch.manual_seed(SEED)
#torch.cuda.manual_seed(SEED)
#torch.cuda.manual_seed_all(SEED)
#torch.backends.cudnn.deterministic = True
#torch.backends.cudnn.benchmark = False
os.environ['PYTHONHASHSEED']=str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

---
<div style="text-align: right"> <font size=5> <a href="#indice"><i class="fa fa-arrow-circle-up" aria-hidden="true" style="color:#004D7F"></i></a></font></div>

---

<a id="section3"></a>
# <font color="#004D7F" size=6> 3. Data processing</font>

TINTOlib creates a folder structure to store images corresponding to each target in a problem. For regression problems, since there are no distinct classes, all images are stored in a single subfolder named images/. Additionally, a CSV file is generated containing:

- The file paths of all images.
- The target value for each image, which corresponds to a sample from the original dataset.
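To make this layout concrete, here is a small pandas sketch of the CSV that the rest of this notebook expects: a `regression.csv` file inside each split folder, with an `images` column of relative paths and a `values` column of targets, which is what `process_dataset` reads in Section 3.3. The file names and target values below are invented for illustration.

```python
import pandas as pd

# Hypothetical contents of <split folder>/regression.csv: one row per sample.
# The column names match what process_dataset() reads; the file names are made up.
imgs = pd.DataFrame({
    "images": ["images/000000.png", "images/000001.png", "images/000002.png"],
    "values": [24.0, 21.6, 34.7],
})

folder_path = "Synthetic_images/images_boston_Combination/train"

# Same update as in process_dataset(): prefix the folder so each path can be opened directly
imgs["images"] = folder_path + "/" + imgs["images"]

print(imgs["images"].iloc[0])
# Synthetic_images/images_boston_Combination/train/images/000000.png
```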

<a id="section31"></a>
# <font color="#004D7F" size=5> 3.1. Read the dataset</font>

In this part, we read the dataset.

In [None]:
dataset_name = 'boston'

#Read CSV
df = pd.read_csv(f"Datasets/{dataset_name}.csv")
df.head(2)

In [None]:
df.shape

To determine the size of a square image that can hold one pixel per feature, take the square root of the number of features (all columns except the target) and round it up. The resulting value can be used by the methods that require the image size as a parameter, such as IGTD.

In [None]:
# Get the shape of the dataframe
num_columns = df.shape[1]

# Calculate number of columns - 1
columns_minus_one = num_columns - 1

# Calculate the square root for image size
import math
image_size = math.ceil(math.sqrt(columns_minus_one))
print(image_size)

---
<a id="section32"></a>
# <font color="#004D7F" size=5> 3.2. Create images with TINTOlib</font>

Next, we instantiate the TINTOlib method that will perform the transformation. TINTOlib offers several methods and each one generates different images, so we must choose one of them.

We also define the folder where the results will be stored and the folder where the images will be created.

In [None]:
#Select the model and the parameters
#problem_type = "supervised"
problem_type = "regression"

# Transformation methods
#image_model = TINTO(problem=problem_type, blur=True, option='maximum', pixels=20, random_seed=SEED)
# name = f"TINTO_blur_maximum"
# image_model = REFINED(problem=problem_type, random_seed=SEED, zoom=1, n_processors=8)
# name = f"REFINED"
# image_model = IGTD(problem=problem_type, scale=[image_size,image_size], fea_dist_method='Euclidean', image_dist_method='Euclidean', error='abs', max_step=30000, val_step=300, random_seed=SEED)
# name = f"IGTD_fEuclidean_iEuclidean_abs"
# image_model = BarGraph(problem=problem_type, zoom=2)
# name = f"BarGraph_zoom2"
# image_model = DistanceMatrix(problem=problem_type, zoom=2)
# name = f"DistanceMatrix_zoom2"
image_model = Combination(problem=problem_type, zoom=3)
name = f"Combination"
# image_model = SuperTML(problem=problem_type, pixels=pixel, font_size=30, feature_importance=True, random_seed=SEED)
# name = f"SuperTML-VF_FS30"
# image_model = FeatureWrap(problem = problem_type, bins=10)
# name = f"FeatureWrap_bins10"
# image_model = BIE(problem = problem_type)
# name = f"BIE"

# Define the results folder and the folder where the images will be saved
results_folder = f"Results/{dataset_name}_{name}"
images_folder = f"Synthetic_images/images_{dataset_name}_{name}"

<div class="alert alert-block alert-info">

<i class="fa fa-info-circle" aria-hidden="true"></i>
You can find the parameters of each method in the official TINTOlib [documentation](https://tintolib.readthedocs.io/en/latest/installation.html)

---
<a id="section33"></a>
# <font color="#004D7F" size=5> 3.3. Generate images</font>

In this section, we generate images from the dataset using three key functions of the image generation model:

- `fit`: Trains the image generation model without generating images. This function is used exclusively for training.
- `fit_transform`: Trains the image generation model and generates images for the dataset at the same time. It is applied to the training set, where the model is both trained and used to create images.
- `transform`: Generates images with the already trained model. After training on the training set, it is used to generate images for unseen data, such as the validation and test sets.
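The `fit`/`transform` split matters because statistics learned from the training rows must be reused, unchanged, for the validation and test rows; otherwise information from unseen data would leak into the images. The class below is a hypothetical toy (`ToyRowToImage`) that only mimics this contract using min-max scaling and zero padding into a square grid. It is not how any TINTOlib method actually builds its images.

```python
import numpy as np


class ToyRowToImage:
    """Hypothetical stand-in for a TINTOlib model (not the library's code).

    fit() learns per-feature min/max on the training rows only;
    transform() reuses those statistics to turn each row into a
    side x side grayscale array, zero-padding unused pixels.
    """

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        self.range_ = np.where(span > 0, span, 1.0)  # avoid division by zero
        self.side_ = int(np.ceil(np.sqrt(X.shape[1])))
        return self

    def transform(self, X):
        X = (np.asarray(X, dtype=float) - self.min_) / self.range_
        X = np.clip(X, 0.0, 1.0)  # unseen rows may fall outside the training range
        padded = np.zeros((X.shape[0], self.side_ ** 2))
        padded[:, : X.shape[1]] = X
        return padded.reshape(-1, self.side_, self.side_)

    def fit_transform(self, X):
        return self.fit(X).transform(X)


train = np.array([[0.0, 10.0, 5.0], [2.0, 30.0, 7.0]])
val = np.array([[1.0, 20.0, 9.0]])

model = ToyRowToImage()
train_imgs = model.fit_transform(train)  # learns statistics from train only
val_imgs = model.transform(val)          # reuses the training statistics
print(train_imgs.shape, val_imgs.shape)  # (2, 2, 2) (1, 2, 2)
```

Note that the third validation value (9.0) lies above the training maximum (7.0) and is clipped to 1.0 instead of being rescaled with validation statistics.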

Each row of the dataset is transformed into its own image, so the number of generated images equals the number of rows in the dataset. The resulting datasets contain the paths to these images, which are then combined with the original tabular data for further processing.

Split the data into training, validation, and test sets:

In [None]:
import cv2
from sklearn.preprocessing import MinMaxScaler

X_train, X_val = train_test_split(df, test_size=0.20, random_state=SEED)
X_val, X_test = train_test_split(X_val, test_size=0.50, random_state=SEED)

X_train = X_train.reset_index(drop=True)
X_val = X_val.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
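The two `train_test_split` calls above first hold out 20% of the rows and then split that hold-out set in half, giving an 80/10/10 train/validation/test split. A quick check on a hypothetical 100-row frame, using the same seed value:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row dataset, split exactly as in the cell above
toy = pd.DataFrame({"x": range(100), "y": range(100)})
train, rest = train_test_split(toy, test_size=0.20, random_state=64)
val, test = train_test_split(rest, test_size=0.50, random_state=64)
print(len(train), len(val), len(test))  # 80 10 10
```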

Define a function to streamline the repetitive process of generating images, updating paths, and combining datasets:

In [None]:
def process_dataset(X, folder_name, generate_function, problem_type):
    """
    Handles dataset processing, including image generation, path updates, 
    and combining the dataset with image paths.

    Parameters:
    ----------
    X : DataFrame
        The dataset to process (training, validation, or test).
    
    folder_name : str
        The name of the folder where generated images will be stored 
        (e.g., 'train', 'val', 'test').
    
    generate_function : function
        The function used to generate the images. It can be one of the following:
        - `fit_transform`: Trains the model and generates images for the dataset (used for training).
        - `transform`: Uses the pre-trained model to generate images for validation and testing.
        Note that `fit` alone does not generate images, so it cannot be used here.
    
    problem_type : str
        The type of problem being addressed (e.g., regression, supervised).
        This is used to locate the corresponding `.csv` file containing image paths.

    Returns:
    --------
    X_processed : DataFrame
        The dataset with updated image paths and raw tabular data, ready for further processing.
    
    y_processed : Series
        The labels corresponding to the dataset (target values).
    """
    # Generate the images if the folder does not exist
    folder_path = f"{images_folder}/{folder_name}"
    if not os.path.exists(folder_path):
        generate_function(X, folder_path)
    else:
        print(f"The images for {folder_name} are already generated")

    # Load image paths
    img_paths = os.path.join(folder_path, f"{problem_type}.csv")
    imgs = pd.read_csv(img_paths)

    # Update image paths
    imgs["images"] = folder_path + "/" + imgs["images"]

    # Combine datasets
    combined_dataset = pd.concat([imgs, X], axis=1)

    # Split data and labels
    # Drop the original target column (last column of X) and the duplicated "values" column
    X_processed = combined_dataset.drop(X.columns[-1], axis=1).drop("values", axis=1)
    y_processed = combined_dataset["values"]

    return X_processed, y_processed


In [None]:
### X_train
X_train, y_train = process_dataset(X_train, "train", image_model.fit_transform, problem_type)

In [None]:
### X_val
X_val, y_val = process_dataset(X_val, "val", image_model.transform, problem_type)

In [None]:
### X_test
X_test, y_test = process_dataset(X_test, "test", image_model.transform, problem_type)

---
<div style="text-align: right"> <font size=5> <a href="#indice"><i class="fa fa-arrow-circle-up" aria-hidden="true" style="color:#004D7F"></i></a></font></div>

---

<a id="section4"></a>
# <font color="#004D7F" size=6> 4. Pre-modelling phase</font>

Once the data is ready, we load the images into memory as NumPy arrays and scale the tabular data so that they can be passed to the ViT.

---
<a id="section41"></a>
# <font color="#004D7F" size=5> 4.1. Data curation</font>

Note that each method generates images of a **different pixel size**. For example:
- The `TINTO` method has a `pixels` parameter that sets the image size, which is 20 by default.
- Other methods, such as `Combination`, set the size automatically, so you must obtain it from the _shape_ of the generated images.

<div class="alert alert-block alert-info">

<i class="fa fa-info-circle" aria-hidden="true"></i>
You can find the parameters of each method in the official TINTOlib [documentation](https://tintolib.readthedocs.io/en/latest/installation.html)

Separate the train/validation/test sets into tabular data and images.

Note that the images are split in the same way as the tabular data, so each image stays aligned with its row.

<div class="alert alert-block alert-info">

<i class="fa fa-info-circle" aria-hidden="true"></i>
💡 **Important!**: Depending on the method used, you need to identify the number of pixels in the image. For example, in TINTO it is specified as a parameter, whereas in IGTD it is determined after the image is created (and the width and height may even differ).

In [None]:
#TIDY DATA SPLITTED
X_train_num = X_train.drop("images",axis=1)
X_val_num = X_val.drop("images",axis=1)
X_test_num = X_test.drop("images",axis=1)

#IMAGES
# For 3 channels (RGB)
X_train_img = np.array([cv2.imread(img) for img in X_train["images"]])
X_val_img = np.array([cv2.imread(img) for img in X_val["images"]])
X_test_img = np.array([cv2.imread(img) for img in X_test["images"]])

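# For 1 channel (GRAYSCALE): use these lines instead. cv2.IMREAD_GRAYSCALE returns
# 2D (height, width) arrays, so a channel axis is added for the shape unpacking below
# X_train_img = np.array([cv2.imread(img, cv2.IMREAD_GRAYSCALE) for img in X_train["images"]])[..., np.newaxis]
# X_val_img = np.array([cv2.imread(img, cv2.IMREAD_GRAYSCALE) for img in X_val["images"]])[..., np.newaxis]
# X_test_img = np.array([cv2.imread(img, cv2.IMREAD_GRAYSCALE) for img in X_test["images"]])[..., np.newaxis]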

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Scale numerical data
X_train_num = scaler.fit_transform(X_train_num)
X_val_num = scaler.transform(X_val_num)
X_test_num = scaler.transform(X_test_num)

attributes = X_train_num.shape[1]
height, width, channels = X_train_img[0].shape
imgs_shape = (height, width, channels)

print("Images shape: ", imgs_shape)
print("Attributes: ", attributes)
pixel = X_train_img[0].shape[0]
print("Image size (pixels):", pixel)

In [None]:
# Plot an example image (e.g., the first image in the array)
example_image = X_train_img[0]

# Convert the image from BGR (OpenCV default) to RGB for correct color display
example_image_rgb = cv2.cvtColor(example_image, cv2.COLOR_BGR2RGB)

# Display the image using matplotlib
plt.imshow(example_image_rgb)
plt.title("Example Image from X_train")
plt.axis('off')  # Hide the axis for a cleaner look
plt.show()

In [None]:
X_train_num

<a id="section5"></a>
# <font color="#004D7F" size=6> 5. Modelling with ViT</font>

Now we can build and train the ViT. First, we prepare the model components that read the image data.

---
<a id="section51"></a>
# <font color="#004D7F" size=5> 5.1. ViT for TINTOlib images</font>

This is an example of a simple ViT for TINTOlib images. Note that the goal is not to optimize the ViT but to show an example of running TINTOlib end to end.

Choosing an appropriate patch size is crucial. The patch size should be a divisor of the input image size; for example, a 20x20 image with a patch size of 5 yields 16 patches (a 4x4 grid). Smaller patches mean more patches, and the cost of self-attention grows quadratically with the number of patches, so the patch size should be chosen carefully based on the image dimensions.
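The arithmetic behind this choice can be tabulated. The sketch below assumes a square image and non-overlapping patches extracted with `valid` padding, so a border that does not fill a whole patch is dropped; the helper `patch_grid` is illustrative and not part of Keras or TINTOlib. The attention count is the number of query-key scores per head in one self-attention layer.

```python
def patch_grid(image_size: int, patch_size: int):
    """Return (num_patches, attention_entries, discarded_border) for a square image.

    Assumes non-overlapping patches with 'valid' extraction, so any
    border that does not fill a whole patch is dropped.
    """
    per_side = image_size // patch_size
    num_patches = per_side ** 2
    attention_entries = num_patches ** 2  # one score per (query, key) pair, per head
    discarded_border = image_size - per_side * patch_size
    return num_patches, attention_entries, discarded_border


for p in (2, 4, 5, 10, 3):
    print(p, patch_grid(20, p))
```

For a 20x20 image, a patch size of 5 gives 16 patches and 256 attention scores per head, whereas a patch size of 3 is not a divisor: it gives 36 patches and silently drops a 2-pixel border.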

<figure><center>
  <img src="Images/visionTransformer.png" width="500" height="350" alt="Graph">
  <figcaption><blockquote>ViT Architecture and components. Extracted from the <a href="https://arxiv.org/abs/2010.11929">ViT article</a></blockquote></figcaption>
</center></figure>

This code helps identify the valid patch sizes by finding the divisors of a given number.

In [None]:
def find_divisors(n):
    divisors = []
    for i in range(1, int(n**0.5) + 1):
        if n % i == 0:
            divisors.append(i)
            if i != n // i:  # Check to include both divisors if they are not the same
                divisors.append(n // i)
    divisors.sort()
    return divisors

In [None]:
find_divisors(imgs_shape[1])

The following cell defines the hyperparameters of the ViT model:

- `image_size`: input image size (in pixels).
- `patch_size`: size of the patches extracted from the images.
- `num_patches`: total number of patches extracted from each image.
- `projection_dim`: dimensionality of the linear projection of each patch.
- `num_heads`: number of attention heads in each Transformer block.
- `transformer_units`: list of units in the dense layers of the MLP inside each Transformer block.
- `transformer_layers`: number of Transformer blocks.
- `mlp_head_units`: list of units in the dense layers of the final MLP head.

These hyperparameters configure the ViT model and its training process.


In [None]:
pixel

In [None]:
image_size = pixel
patch_size = 3
num_patches = (image_size // patch_size) ** 2
projection_dim = 32
num_heads = 4
transformer_units = [
    projection_dim * 2,
    projection_dim,
]
transformer_layers = 4
mlp_head_units = [
    128,
    64,
]

<a id="section511"></a>
## <font color="#004D7F" size=5> 5.1.1. `Patches` Class</font>

The `Patches` class divides an image into small, fixed-size patches, rearranging them into a tensor that can be used as input for a Vision Transformer. This is essential because Transformers work with sequences, and this class converts images into sequences of patches.

<figure><center>
  <img src="Images/patch_embedding.png" width="850" height="200" alt="Graph">
  <figcaption><blockquote>ViT Architecture - Patch embedding. Extracted from <a href="https://arxiv.org/abs/2010.11929">Aman Arora's Blog</a></blockquote></figcaption>
</center></figure>

The `Patches` class is a subclass of `layers.Layer` in Keras, that is, a custom layer.

##### `__init__` Constructor
- `__init__` is the class constructor.
- `patch_size` is a parameter that specifies the size of each patch into which the image will be divided.
- `super().__init__()` calls the constructor of the base class (`layers.Layer`), initializing the layer.

##### `call` Method
- `call` is the method that defines the logic of the layer. It is invoked when a tensor (in this case, images) is passed through the layer.
- `input_shape = ops.shape(images)` retrieves the shape (dimensions) of the input tensor `images`, which is assumed to be a 4D tensor `(batch, height, width, channels)`.
- `batch_size`, `height`, `width`, and `channels` extract the respective dimensions of the image.
- `num_patches_h` and `num_patches_w` calculate the number of patches in the height and width of the image, respectively, by dividing the corresponding dimension by `patch_size`.
- `patches = keras.ops.image.extract_patches(images, size=self.patch_size)` uses a Keras function to extract patches of size `patch_size` from the input images.
- `patches = ops.reshape(...)` reshapes the `patches` tensor to have the shape `(batch_size, num_patches_h * num_patches_w, patch_size * patch_size * channels)`. This means that each patch is flattened and organized into a sequence of patches.

##### `get_config` Method

- `get_config` is a standard method in Keras for custom layers. It allows the layer's configuration to be saved and reloaded.
- `config = super().get_config()` calls the base class's `get_config` method to retrieve the basic configuration of the layer.
- `config.update({"patch_size": self.patch_size})` adds the `patch_size` to the configuration.
- `return config` returns the complete configuration.

In [None]:
class Patches(layers.Layer):
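    def __init__(self, patch_size, **kwargs):
        # Forward standard layer arguments (name, dtype, ...) so that the layer
        # can be rebuilt from get_config() when a saved model is reloaded
        super().__init__(**kwargs)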
        self.patch_size = patch_size

    def call(self, images):
        input_shape = ops.shape(images)
        batch_size = input_shape[0]
        height = input_shape[1]
        width = input_shape[2]
        channels = input_shape[3]
        num_patches_h = height // self.patch_size
        num_patches_w = width // self.patch_size
        patches = keras.ops.image.extract_patches(images, size=self.patch_size)
        patches = ops.reshape(
            patches,
            (
                batch_size,
                num_patches_h * num_patches_w,
                self.patch_size * self.patch_size * channels,
            ),
        )
        return patches

    def get_config(self):
        config = super().get_config()
        config.update({"patch_size": self.patch_size})
        return config
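To see exactly what the reshape in `call` produces, here is a NumPy sketch with the same input and output shapes as the `Patches` layer, from `(batch, height, width, channels)` to `(batch, num_patches, patch_size * patch_size * channels)`. It is an illustrative re-implementation written with `reshape`/`transpose`, not the Keras code path, and it assumes patches are ordered row by row and each patch is flattened in (row, column, channel) order.

```python
import numpy as np


def extract_patches_np(images: np.ndarray, patch_size: int) -> np.ndarray:
    """NumPy sketch of the Patches layer: (B, H, W, C) -> (B, N, P*P*C).

    Patches are taken row by row (left to right, top to bottom); any border
    that does not fill a whole patch is cropped away.
    """
    b, h, w, c = images.shape
    nh, nw = h // patch_size, w // patch_size
    x = images[:, : nh * patch_size, : nw * patch_size, :]
    # Split height and width into (grid index, offset inside the patch)
    x = x.reshape(b, nh, patch_size, nw, patch_size, c)
    # Put the two grid axes together, followed by the two in-patch axes
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, nh * nw, patch_size * patch_size * c)


images = np.arange(2 * 6 * 6 * 3).reshape(2, 6, 6, 3)
patches = extract_patches_np(images, patch_size=3)
print(patches.shape)  # (2, 4, 27)
```

Each of the 4 patches is a flattened 3x3x3 block, matching `num_patches = (6 // 3) ** 2` and a patch length of `3 * 3 * 3`.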


<a id="section512"></a>
## <font color="#004D7F" size=5> 5.1.2. Patch Encoder</font>

The `PatchEncoder` class takes the image patches and projects them into a higher-dimensional space using a dense layer (`Dense`). It then adds positional information to each patch using a positional embedding layer (`Embedding`). This encoding is crucial for the functioning of Transformers, which need to know both the content of the patches and their position in the original image.

<figure><center>
  <img src="Images/vit.png" width="450" height="250" alt="Graph">
  <figcaption><blockquote>ViT Architecture and Transformer Encoder. Extracted from the <a href="https://arxiv.org/abs/2010.11929">ViT article</a></blockquote></figcaption>
</center></figure>

The `PatchEncoder` class is a subclass of `layers.Layer` in Keras, and it is used to project and encode the image patches.

##### `__init__` Constructor
- `__init__` is the class constructor.
- `num_patches` is the total number of patches into which the image has been divided.
- `projection_dim` is the dimension into which the patches will be projected.
- `self.projection` is a `Dense` layer that projects each patch into a higher-dimensional space specified by `projection_dim`.
- `self.position_embedding` is an `Embedding` layer that creates positional embeddings for each patch, with `num_patches` as the vocabulary size and `projection_dim` as the output dimension.

##### `call` Method
- `call` is the method that defines the logic of the layer. It is invoked when a tensor (in this case, patches) is passed through the layer.
- `positions` creates a tensor with the sequence of positions from 0 to `num_patches - 1` and adds a leading axis of size 1, so that the positional embeddings broadcast across every sample in the batch.
- `projected_patches` applies a linear projection to each patch using the `Dense` layer, resulting in a higher-dimensional tensor.
- `encoded` adds the linear projection of the patches (`projected_patches`) to the positional embeddings (`self.position_embedding(positions)`). This sum incorporates information about the position of each patch, which is crucial for the Transformer to understand the spatial arrangement of the patches.
- `return encoded` returns the encoded tensor.

##### `get_config` Method
- `get_config` is a standard method in Keras for custom layers. It allows the layer's configuration to be saved and reloaded.
- `config = super().get_config()` calls the base class's `get_config` method to retrieve the basic configuration of the layer.
- `config.update({"num_patches": self.num_patches})` adds `num_patches` to the configuration.
- `return config` returns the complete configuration.


In [None]:
class PatchEncoder(layers.Layer):
    def __init__(self, num_patches, projection_dim, **kwargs):
        # Forward standard layer arguments (name, dtype, ...) so that the layer
        # can be rebuilt from get_config() when a saved model is reloaded
        super().__init__(**kwargs)
        self.num_patches = num_patches
        self.projection_dim = projection_dim
        self.projection = layers.Dense(units=projection_dim)
        self.position_embedding = layers.Embedding(
            input_dim=num_patches, output_dim=projection_dim
        )

    def call(self, patch):
        positions = ops.expand_dims(
            ops.arange(start=0, stop=self.num_patches, step=1), axis=0
        )
        projected_patches = self.projection(patch)
        encoded = projected_patches + self.position_embedding(positions)
        return encoded

    def get_config(self):
        config = super().get_config()
        # Both constructor arguments are needed to rebuild the layer
        config.update({"num_patches": self.num_patches,
                       "projection_dim": self.projection_dim})
        return config
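The same computation can be sketched in NumPy to make the shapes and the broadcasting explicit. Random arrays stand in for the learned `Dense` weights and the `Embedding` table, and the sizes (4 patches of 3x3x3 values, `projection_dim = 32`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_patches, patch_dim, projection_dim = 2, 4, 27, 32

patches = rng.normal(size=(batch, num_patches, patch_dim))
# Dense layer: one weight matrix and bias shared by every patch
W = rng.normal(size=(patch_dim, projection_dim))
b = np.zeros(projection_dim)
# Embedding table: one learnable row per patch position
position_table = rng.normal(size=(num_patches, projection_dim))

positions = np.arange(num_patches)[np.newaxis, :]  # (1, num_patches)
projected = patches @ W + b                        # (batch, num_patches, projection_dim)
pos_emb = position_table[positions]                # (1, num_patches, projection_dim)
encoded = projected + pos_emb                      # broadcasts over the batch axis
print(encoded.shape)  # (2, 4, 32)
```

Because `pos_emb` has a leading axis of size 1, every sample in the batch receives the same positional embedding for a given patch position.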

<a id="section513"></a>
## <font color="#004D7F" size=5> 5.1.3. Classifier</font>

The `mlp` function constructs an MLP that transforms the input tensor `x` through several dense and dropout layers. Each dense layer applies a linear transformation followed by a `gelu` activation, and each dropout layer randomly deactivates a percentage of the units from the previous layer to improve model generalization. This function is useful for adding non-linear learning capacity to the model, allowing it to capture more complex relationships in the data:
- `x`: the input tensor to be transformed.
- `hidden_units`: a list specifying the number of units (neurons) in each hidden layer of the MLP network.
- `dropout_rate`: the dropout rate applied after each dense layer, helping to prevent overfitting.


<figure><center>
  <img src="Images/vit.png" width="450" height="250" alt="Graph">
  <figcaption><blockquote>ViT Architecture and Transformer Encoder. Extracted from the <a href="https://arxiv.org/abs/2010.11929">ViT article</a></blockquote></figcaption>
</center></figure>

In [None]:
def mlp(x, hidden_units, dropout_rate):
    for units in hidden_units:
        x = layers.Dense(units, activation=keras.activations.gelu)(x)
        x = layers.Dropout(dropout_rate)(x)
    return x
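The forward pass of `mlp` can be sketched in plain NumPy. This is an illustrative re-implementation under stated assumptions, not Keras code: the layer widths and input are hypothetical, GELU uses the exact erf form (the Keras 3 default for `keras.activations.gelu`), and dropout is written as inverted dropout, which is active only when `training=True`.

```python
import math
import numpy as np

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF written with erf.
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def mlp_forward(x, weights, dropout_rate, rng=None, training=False):
    """Dense + GELU followed by Dropout for every (W, b) pair in `weights`."""
    for W, b in weights:
        x = gelu(x @ W + b)
        if training and dropout_rate > 0:
            # Inverted dropout: zero a fraction of units and rescale the survivors.
            keep = rng.random(x.shape) >= dropout_rate
            x = np.where(keep, x / (1.0 - dropout_rate), 0.0)
    return x

rng = np.random.default_rng(0)
hidden_units = [16, 8]           # hypothetical, plays the role of `hidden_units`
sizes = [8] + hidden_units       # hypothetical input width of 8
weights = [(rng.normal(size=(m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(4, 8))
out_inference = mlp_forward(x, weights, dropout_rate=0.1)
out_training = mlp_forward(x, weights, dropout_rate=0.1, rng=rng, training=True)
print(out_inference.shape)  # (4, 8)
```

At inference time dropout is a no-op, so repeated calls give identical outputs; during training the rescaling by `1 / (1 - dropout_rate)` keeps the expected activation unchanged.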

<a id="section514"></a>
## <font color="#004D7F" size=5> 5.1.4. ViT Classifier</font>

The `create_vit_classifier` function builds the Vision Transformer backbone. Despite its name, it returns a feature extractor; the regression head is added on top of it in Section 5.2.
1. It first divides the input image into patches and encodes those patches.
2. It then passes the encoded patches through several Transformer blocks, each of which includes layer normalization, multi-head attention, residual connections, and an MLP.
3. Finally, it normalizes the representation, flattens it, and applies dropout; the result is the model output.

<figure><center>
  <img src="Images/Encoder-decoder.png" width="450" height="250" alt="Graph">
  <figcaption><blockquote>Transformer Architecture. Extracted from the <a href="https://arxiv.org/abs/2010.11929">ViT article</a></blockquote></figcaption>
</center></figure>

#### Model Input
- `inputs`: defines the model input with the shape given by `imgs_shape`.

#### Patch Creation
- `patches`: instantiates the `Patches` layer (defined earlier) to divide the input image into smaller patches of the size specified by `patch_size`.

#### Patch Encoding
- `encoded_patches`: instantiates the `PatchEncoder` layer (defined earlier) to project the patches into a high-dimensional space and add positional embeddings.

#### Transformer Blocks
- Repeats the Transformer block `transformer_layers` times:
    - `x1`: applies layer normalization to `encoded_patches`.
    - `attention_output`: applies multi-head self-attention, using `x1` as both query and value.
    - `x2`: performs a residual (skip) connection by adding `attention_output` and `encoded_patches`.
    - `x3`: applies another layer normalization to `x2`.
    - `x3`: passes `x3` through an MLP built with the `mlp` function defined earlier.
    - `encoded_patches`: performs another residual connection by adding `x3` and `x2`.
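The pre-normalization block described above can be sketched in NumPy to show the data flow and why the shape of `encoded_patches` never changes. This is a simplified illustration, not the Keras layers: it uses a single attention head, omits biases, dropout and the learned LayerNorm scale/offset, uses the tanh approximation of GELU, and all sizes are hypothetical. The last MLP layer must output `projection_dim` values so that the second residual addition is valid.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize every token over its feature axis (learned scale and offset omitted).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # tanh approximation of GELU (Keras uses the exact erf form by default).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def self_attention(x, Wq, Wk, Wv, Wo):
    # Single-head scaled dot-product attention: query, key and value all come from x.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(k.shape[-1])
    return softmax(scores) @ v @ Wo

def transformer_block(encoded_patches, params):
    x1 = layer_norm(encoded_patches)
    attention_output = self_attention(x1, *params["attn"])
    x2 = attention_output + encoded_patches      # skip connection 1
    x3 = layer_norm(x2)
    W1, W2 = params["mlp"]
    x3 = gelu(gelu(x3 @ W1) @ W2)                 # two Dense + GELU layers (biases omitted)
    return x3 + x2                                # skip connection 2

rng = np.random.default_rng(0)
d = 8                                             # hypothetical projection_dim
params = {
    "attn": [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)],
    "mlp": [rng.normal(scale=0.1, size=(d, 2 * d)), rng.normal(scale=0.1, size=(2 * d, d))],
}
encoded_patches = rng.normal(size=(2, 4, d))      # (batch, num_patches, projection_dim)
for _ in range(3):                                # hypothetical transformer_layers = 3
    # Weights are reused here for brevity; Keras creates new weights for every block.
    encoded_patches = transformer_block(encoded_patches, params)
print(encoded_patches.shape)  # (2, 4, 8)
```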

#### Final Representation
- `representation`: applies layer normalization, flattens the tensor, and applies dropout for regularization.

#### Model Output
- The model returns `representation` directly; no MLP head is applied inside this function (the regression head is added in Section 5.2).
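Because the representation is flattened rather than pooled, the width of the model output grows with the number of patches. The helper below is a hypothetical sketch of that arithmetic; it assumes that the `Patches` layer tiles the image into non-overlapping `patch_size` × `patch_size` patches and drops leftover pixels, which should be checked against the `Patches` definition used in this notebook.

```python
def vit_feature_size(height, width, channels, patch_size, projection_dim):
    """Shapes through create_vit_classifier, assuming non-overlapping square patches."""
    num_patches = (height // patch_size) * (width // patch_size)
    patch_dim = patch_size * patch_size * channels    # values in one flattened patch
    flattened = num_patches * projection_dim           # width after layers.Flatten()
    return num_patches, patch_dim, flattened

# Hypothetical configuration: 20x20 RGB images, 4x4 patches, projection_dim=64.
print(vit_feature_size(20, 20, 3, patch_size=4, projection_dim=64))  # (25, 48, 1600)
```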


In [None]:
def create_vit_classifier():
    inputs = keras.Input(shape=imgs_shape)
    # Augment data.
    #augmented = data_augmentation(inputs)
    # Create patches.
    patches = Patches(patch_size)(inputs)
    # Encode patches.
    encoded_patches = PatchEncoder(num_patches, projection_dim)(patches)

    # Create multiple layers of the Transformer block.
    for _ in range(transformer_layers):
        # Layer normalization 1.
        x1 = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
        # Create a multi-head attention layer.
        attention_output = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=projection_dim, dropout=0.1
        )(x1, x1)
        # Skip connection 1.
        x2 = layers.Add()([attention_output, encoded_patches])
        # Layer normalization 2.
        x3 = layers.LayerNormalization(epsilon=1e-6)(x2)
        # MLP.
        x3 = mlp(x3, hidden_units=transformer_units, dropout_rate=0.1)
        # Skip connection 2.
        encoded_patches = layers.Add()([x3, x2])

    # Create a [batch_size, num_patches * projection_dim] tensor.
    representation = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
    representation = layers.Flatten()(representation)
    representation = layers.Dropout(0.1)(representation)

    # Create the Keras model.
    model = keras.Model(inputs=inputs, outputs=representation)
    return model

In [None]:
vit_model = create_vit_classifier()

---
<a id="section52"></a>
# <font color="#004D7F" size=5> 5.2. MLP</font>

Finally, we pass the ViT output through an MLP head that ends in a single linear unit, which performs the regression.

In [None]:
output = vit_model.output
x = Dense(128, activation="relu")(output)
x = Dense(64, activation="relu")(x)
x = Dense(32, activation="relu")(x)
x = Dense(16, activation="relu")(x)
x = Dense(1, activation="linear")(x)
model = Model(inputs=[vit_model.input], outputs=x)
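Each `Dense` layer has `inputs * units` weights plus `units` biases, so the first layer of this head dominates the parameter count when the flattened ViT output is wide. The helper below is a small hypothetical sketch of that arithmetic for the `128-64-32-16-1` head above; `model.summary()` reports the actual numbers.

```python
def dense_stack_params(input_dim, units=(128, 64, 32, 16, 1)):
    """Trainable parameters of a chain of Dense layers: inputs * units weights + units biases."""
    total = 0
    for u in units:
        total += input_dim * u + u
        input_dim = u
    return total

# Hypothetical flattened width of 1600 coming out of the ViT backbone.
print(dense_stack_params(1600))  # 215809
```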

---
<a id="section54"></a>
# <font color="#004D7F" size=5> 5.3. Metrics</font>

Define the metrics used to monitor training, including a custom R² function.

In [None]:
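import keras
from keras import ops

def r_square(y_true, y_pred):
    """Coefficient of determination (R^2) of one batch; Keras averages it over the epoch."""
    # Flatten both tensors so (batch,) targets and (batch, 1) predictions
    # do not broadcast to a (batch, batch) matrix.
    y_true = ops.reshape(ops.cast(y_true, "float32"), (-1,))
    y_pred = ops.reshape(ops.cast(y_pred, "float32"), (-1,))
    SS_res = ops.sum(ops.square(y_true - y_pred))
    SS_tot = ops.sum(ops.square(y_true - ops.mean(y_true)))
    r2 = 1 - SS_res / (SS_tot + keras.backend.epsilon())
    return r2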

METRICS = [
    tf.keras.metrics.MeanSquaredError(name = 'mse'),
    tf.keras.metrics.MeanAbsoluteError(name = 'mae'),
    tf.keras.metrics.RootMeanSquaredError(name = 'rmse'),
    r_square,
]
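Keras wraps a plain metric function such as `r_square` so that its per-batch values are averaged over the epoch. The average of per-batch R² values is not the same as R² computed on the whole dataset, which is what `r2_score` reports in Section 6.2. The NumPy sketch below uses a tiny, made-up dataset to show how large the gap can be when each batch covers only a narrow range of targets.

```python
import numpy as np

def r2(y_true, y_pred, eps=1e-7):
    # Same formula as r_square: 1 - SS_res / (SS_tot + epsilon).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / (ss_tot + eps)

# Made-up targets and predictions, split into two batches of size 2.
y_true = np.array([0.0, 1.0, 10.0, 11.0])
y_pred = np.array([0.5, 0.5, 10.5, 10.5])

global_r2 = r2(y_true, y_pred)
batch_r2 = np.mean([r2(y_true[i:i + 2], y_pred[i:i + 2]) for i in range(0, 4, 2)])
print(round(global_r2, 3), round(batch_r2, 3))  # 0.99 0.0
```

With small batches (here `batch_size=16`), the per-batch R² in `model_history` should therefore be read as a training signal rather than as the final score.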

Save the model summary and a diagram of the architecture to `results_folder`.

In [None]:
os.makedirs(results_folder, exist_ok=True)

# Write the model summary to a text file inside results_folder
with open(os.path.join(results_folder, "model_summary.txt"), "w") as f:
    model.summary(print_fn=lambda line, **kwargs: f.write(line + '\n'))

# Disable matplotlib's automatic (interactive) display
plt.ioff()
# Save a diagram of the model architecture inside results_folder
plot_model(model, to_file=os.path.join(results_folder, "model_plot.png"), show_shapes=True, expand_nested=True)
# Re-enable matplotlib's automatic display (optional)
plt.ion()

---
<a id="section55"></a>
# <font color="#004D7F" size=5> 5.5. Compile and fit</font>


In [None]:
opt = Adam(learning_rate=1e-4)

In [None]:
model.compile(
    loss="mse",
    optimizer=opt,
    metrics = METRICS
)

In [None]:
# Configure EarlyStopping for regression
early_stopper = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',        # Monitor the validation loss (MSE)
    min_delta=0.001,           # Minimum decrease in val_loss that counts as an improvement
    patience=10,               # Number of epochs with no improvement after which training will be stopped
    verbose=1,                 # Log when training stops
    mode='min',                # A lower val_loss is better
    restore_best_weights=True  # Restore model weights from the epoch with the best value of the monitored quantity
)
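The interaction between `min_delta` and `patience` can be simulated on a list of validation losses. The function below is a simplified sketch written for this notebook (it is not part of Keras); it follows the rule that, in `min` mode, an epoch counts as an improvement only if the loss drops by more than `min_delta` below the best value seen so far.

```python
def simulate_early_stopping(val_losses, min_delta=0.001, patience=10):
    """Return (stop_epoch, best_epoch) for a 'min' monitor; simplified sketch."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:       # improvement must exceed min_delta
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training stops here
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping

# Hypothetical losses: epoch 2 is lower than epoch 1, but by less than min_delta.
print(simulate_early_stopping([1.0, 0.5, 0.4995, 0.4999, 0.6], min_delta=0.001, patience=3))  # (4, 1)
```

Note that the epoch with the lowest raw loss (epoch 2) is not the one treated as best, because its improvement is smaller than `min_delta`.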

In [None]:
model_history=model.fit(
    x=[X_train_img], y=y_train,
    validation_data=([X_val_img], y_val),
    epochs=50,
    batch_size=16,
    callbacks = [early_stopper]
)

In [None]:
print(model_history.history.keys())

<a id="section6"></a>
# <font color="#004D7F" size=6> 6. Results</font>

Finally, we evaluate the model trained on the synthetic images created by TINTOlib, using the training curves and the metrics below.

---
<a id="section61"></a>
# <font color="#004D7F" size=5> 6.1. Train/Validation representation</font>

In [None]:
plt.plot(model_history.history['loss'], color = 'red', label = 'loss')
plt.plot(model_history.history['val_loss'], color = 'green', label = 'val loss')
plt.legend(loc = 'upper right')
plt.show()

In [None]:
plt.plot(model_history.history['mse'], color = 'red', label = 'mse')
plt.plot(model_history.history['val_mse'], color = 'green', label = 'val mse')
plt.legend(loc = 'upper right')
plt.show()

---
<a id="section62"></a>
# <font color="#004D7F" size=5> 6.2. Validation/Test evaluation</font>

In [None]:
# Evaluate on the validation set (the test set is evaluated in the next cell)
score_val = model.evaluate([X_val_img], y_val)

In [None]:
prediction = model.predict([X_test_img])

test_mape = mean_absolute_percentage_error(y_test, prediction)
test_mae = mean_absolute_error(y_test, prediction)
test_mse = mean_squared_error(y_test, prediction)
test_rmse = root_mean_squared_error(y_test, prediction)
test_r2 = r2_score(y_test, prediction)

# Print the evaluation metrics
print("Mean Absolute Percentage Error:", test_mape)
print("Mean Absolute Error:", test_mae)
print("Mean Squared Error:", test_mse)
print("Root Mean Squared Error:", test_rmse)
print("R2 Score:", test_r2)

# Define the metrics and their values
metrics = {
    "Mean Absolute Percentage Error": test_mape,
    "Mean Absolute Error": test_mae,
    "Mean Squared Error": test_mse,
    "Root Mean Squared Error": test_rmse,
    "R2 Score": test_r2,
}

# Save the metrics to a text file
with open(f"{results_folder}/metrics.txt", "w") as file:
    for metric, value in metrics.items():
        file.write(f"{metric}: {value}\n")

print(f"Metrics saved to {results_folder}/metrics.txt")
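For reference, the metrics above can be computed by hand with NumPy. This sketch follows the usual definitions (and the scikit-learn convention of guarding the MAPE denominator with machine epsilon); it flattens both inputs because `model.predict` returns an array of shape `(n, 1)`. MAPE is returned as a fraction, not a percentage, and becomes very large when some targets are close to zero. The values in the usage line are made up.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()   # (n, 1) -> (n,)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    eps = np.finfo(np.float64).eps                      # avoids division by zero in MAPE
    return {
        "MAPE": np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps)),
        "MAE": np.mean(np.abs(err)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
    }

m = regression_metrics([1, 2, 4], [[1.5], [2], [3]])
print({k: round(float(v), 4) for k, v in m.items()})
# {'MAPE': 0.25, 'MAE': 0.5, 'MSE': 0.4167, 'RMSE': 0.6455, 'R2': 0.7321}
```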

In [None]:
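# Values from the last completed epoch. With restore_best_weights=True the final model
# may hold the weights of an earlier epoch, so these numbers can differ from a fresh
# evaluation of the model. r_square is averaged over batches.
train_mse = model_history.history["mse"][-1]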
train_r2 = model_history.history["r_square"][-1]

val_mse = model_history.history["val_mse"][-1]
val_r2 = model_history.history["val_r_square"][-1]

print("Train Mean Squared Error:", train_mse)
print("Train R2 Score:", train_r2)

print("Val Mean Squared Error:", val_mse)
print("Val R2 Score:", val_r2)
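To report the training history at the epoch with the lowest validation loss instead of the last epoch, the history dictionary can be indexed by that epoch. The helper below is a hypothetical sketch; the toy dictionary only mimics the structure of `model_history.history`. Because plain `argmin` ignores `min_delta`, the selected epoch can occasionally differ from the one chosen by `EarlyStopping`.

```python
import numpy as np

def metrics_at_best_epoch(history, monitor="val_loss"):
    """Return the epoch with the lowest `monitor` value and every metric at that epoch."""
    best = int(np.argmin(history[monitor]))
    return best, {name: values[best] for name, values in history.items()}

# Hypothetical history with the same structure as model_history.history.
history = {
    "loss": [0.9, 0.5, 0.4, 0.35],
    "val_loss": [1.0, 0.6, 0.7, 0.8],
    "r_square": [0.1, 0.5, 0.6, 0.65],
    "val_r_square": [0.0, 0.4, 0.3, 0.2],
}
best_epoch, best_metrics = metrics_at_best_epoch(history)
print(best_epoch, best_metrics["val_r_square"])  # 1 0.4
```

In the notebook, the same call would be `metrics_at_best_epoch(model_history.history)`.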

<div style="text-align: right"> <font size=5> <a href="#indice"><i class="fa fa-arrow-circle-up" aria-hidden="true" style="color:#004D7F"></i></a></font></div>

---

<div style="text-align: right"> <font size=6><i class="fa fa-coffee" aria-hidden="true" style="color:#004D7F"></i> </font></div>