# Fully Convolutional Network (FCN) Training-Test (Tensorflow 2.2)

This notebook demonstrates how to train a deep learning (DL) model built on a fully convolutional network (FCN) architecture to predict the column heights (CHs) in gold nanoparticles (NPs) imaged by High-Resolution Transmission Electron Microscopy (HRTEM), using Tensorflow 2.2.0.

Given the complexity of the problem, parallel computation is mandatory, since learning to accurately predict the CHs for each element requires a substantial number of epochs. There are two ways to implement a parallel DL calculation: **data parallelization** and **model parallelization**. Data parallelization is implemented using the **Mirrored Strategy** method from Tensorflow. A detailed explanation of data parallelization using Mirrored Strategy is provided here:

**Mirrored Strategy (Data Parallelization)**:  https://www.tensorflow.org/tutorials/distribute/custom_training

How does it work?:
- All the variables and the model graph are replicated on the replicas.
- Input is evenly distributed across the replicas.
- Each replica calculates the loss and gradients for the input it received.
- The gradients are synced across all the replicas by summing them.
- After the sync, the same update is made to the copies of the variables on each replica.
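The five steps above can be sketched in plain Python for a toy one-parameter model. This is only an illustration of the idea, not the Mirrored Strategy implementation: the model `y = w * x`, the helper names and the learning rate are all assumptions made for the example.

```python
# Toy data-parallel training step for a 1-parameter model y = w * x with MSE loss.
# All names here are hypothetical; real training uses tf.distribute.MirroredStrategy.

def replica_gradient(w, xs, ys, global_batch_size):
    """Gradient of sum((w*x - y)^2) / global_batch_size on one replica's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / global_batch_size


def data_parallel_step(replica_weights, xs, ys, learning_rate):
    num_replicas = len(replica_weights)
    global_batch_size = len(xs)
    shard = global_batch_size // num_replicas  # input split evenly across replicas

    # Each replica computes the gradient for the shard it received.
    grads = [
        replica_gradient(w,
                         xs[i * shard:(i + 1) * shard],
                         ys[i * shard:(i + 1) * shard],
                         global_batch_size)
        for i, w in enumerate(replica_weights)
    ]

    # Sync: the per-replica gradients are summed.
    synced_grad = sum(grads)

    # Every replica applies the same update, so the copies stay identical.
    return [w - learning_rate * synced_grad for w in replica_weights]


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated with a true weight of 2
weights = [0.0, 0.0]        # two replicas holding a copy of the same variable

for _ in range(50):
    weights = data_parallel_step(weights, xs, ys, learning_rate=0.05)

print(weights)
```

Because every replica starts from the same value and applies the same summed gradient, the two copies of `w` never diverge.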


Model parallelization is implemented using the **Horovod** library. A detailed explanation of how to use **Horovod** for model parallelization is provided here:

**Horovod (Model Parallelization)**: https://github.com/horovod/horovod.

We have also implemented a technique called Mixed Precision, which accelerates tensor operations on GPUs with a compute capability of at least 7.0, and Accelerated Linear Algebra (XLA) from Tensorflow, in order to speed up the computation as much as possible. More information can be found on the related webpages:

**Mixed Precision**: https://www.tensorflow.org/guide/mixed_precision.

"Mixed precision is the combined use of different numerical precisions in a computational method. There are numerous benefits to using numerical formats with lower precision than 32-bit floating point. First, they require less memory, enabling the training and deployment of larger neural networks. Second, they require less memory bandwidth, thereby speeding up data transfer operations. Third, math operations run much faster in reduced precision, especially on GPUs with Tensor Core support for that precision. Mixed precision training achieves all these benefits while ensuring that no task-specific accuracy is lost compared to full precision training. It does so by identifying the steps that require full precision and using 32-bit floating point for only those steps while using 16-bit floating point everywhere else. Significant training speedups are experienced by switching to mixed precision -- up to 3x overall speedup on the most arithmetically intense model architectures. Half-precision floating point format (FP16) uses 16 bits, compared to 32 bits for single precision (FP32). Lowering the required memory enables training of larger models or training with larger mini-batches." (https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html).


**XLA**: https://www.tensorflow.org/xla.


Fortunately, we have access to a cluster of **4 NVIDIA V100 GPUs** with a **compute capability of 7.0**. Even so, sufficiently accurate results took at least 5 months (approximately 600 epochs) using data parallelization and at least 3 months (approximately 400 epochs) using model parallelization. We found that model parallelization is slightly faster both in computation and in reaching a sufficiently high performance (fewer epochs required).

In this notebook we illustrate both the implementations.


The main files are *training_data-parallelization.py* and *training_model-parallelization.py*. In addition, the file *fcn.py* contains the implementation of the FCN, while *training_utils.py* contains the modules that perform random imaging transformations of the input images, calculate the R^2 between the predicted and true CHs, and plot the input data in a debug folder.

## Data Parallelization Implementation

Here we provide the code to implement the training of the FCN using data parallelization in Tensorflow 2.2

### -Step 1: importing the libraries:

- **Numpy**.

- **Tensorflow**. In particular, we import the module mixed_precision to implement the mixed precision technique.

- **fcn**: file containing the architecture of the FCN.

- **training_utils**: file containing the modules for the calculation of the R^2 (R2_CHs), the implementation of the random transformations on the input images (Random_Imaging) and plotting in debug folder (plot_debug).

- **time, datetime**: libraries to manage timing, used to calculate the processing speed of the learning process in images/second.

- **logging, platform**: used to print the host on which the code is running.

In [1]:
import numpy as np

import tensorflow as tf
from tensorflow.keras.mixed_precision import experimental as mixed_precision

from fcn import FCN
from training_utils import R2_CHs,Random_Imaging,plot_debug

import os

import time
from datetime import datetime

import logging
import platform

### - Step 2: defining the directories path to load data and save results:

- **training_folder_path, test_folder_path**: paths to training and test data. The data are saved as numpy arrays data_1.npy, data_2.npy, etc., each a tensor containing both the image and the label maps.


- **training_results_folder_path, test_results_folder_path**: paths to the parent directories containing the saved training and test results.


- **debug_folder_path**: path to the debug directory, where plots of the input images and labels are saved to check what is going through the network.


- **weights_folder_path**: path to the directory to save the weights of the FCN at each epoch.


- **training_learning_curve_folder_path,test_learning_curve_folder_path**: paths to the directories containing the training and test learning curves.


In [2]:
training_folder_path = '../training_data/data/'
test_folder_path = '../test_data/data/'

training_results_folder_path = 'results_data-parallelization/training_results/'
debug_folder_path = training_results_folder_path + 'debug/'
weights_folder_path = training_results_folder_path + 'weights/'
training_learning_curve_folder_path = training_results_folder_path + 'train_learning_curve/'

test_results_folder_path = 'results_data-parallelization/test_results/'
test_learning_curve_folder_path = test_results_folder_path + 'test_learning_curve/'


# create the output directories if they do not exist yet
for folder_path in (training_results_folder_path,
                    debug_folder_path,
                    weights_folder_path,
                    training_learning_curve_folder_path,
                    test_results_folder_path,
                    test_learning_curve_folder_path):
    os.makedirs(folder_path, exist_ok=True)

### - Step 3: defining the computing techniques: Mirrored Strategy, Mixed Precision, Config Proto and XLA

 - **Mirrored Strategy**: implementation of data parallelization. **For the sake of the visualization of the distributed training implementation**, the code is run on my personal laptop on **2 CPU cores**. The results provided in the paper were obtained by running the model on a cluster of **4 V100 GPUs**.
 
 - **Mixed Precision**: mixed precision should be activated (mp = True) only if the code is run on an NVIDIA GPU with a compute capability of at least 7.0. Otherwise, mixed precision actually slows down the calculation. 
 
 - **Config Proto**: method to define server parameters for training. In particular:
 
 
   - **allow_soft_placement**: lets TensorFlow fall back to an available device when the requested device does not exist or cannot run an operation.
   
   - **log_device_placement**: printing of device information.
   
   - **gpu_options.allow_growth**: allowing to allocate only the memory required by the process, instead of allocating the full memory of the device where the process runs.
   
   - **gpu_options.force_gpu_compatible**: force all tensors to be gpu_compatible. All CPU tensors will be allocated with Cuda pinned memory.
   
   - **graph_options.optimizer_options.global_jit_level**: XLA activation.
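To illustrate why the Mixed Precision item above needs care with small values (and why the optimizer in Step 5 is wrapped with loss scaling), here is a minimal standard-library sketch. The gradient value and the scale factor are made-up numbers chosen for the example, and `to_float16` is a hypothetical helper, not part of Tensorflow.

```python
import struct


def to_float16(value):
    """Round a Python float to IEEE 754 half precision (FP16) and back."""
    return struct.unpack('<e', struct.pack('<e', value))[0]


small_gradient = 1e-8   # illustrative value, too small for FP16
loss_scale = 1024.0     # illustrative power-of-two scale factor

# Stored directly in FP16, the gradient underflows to zero and is lost.
unscaled_fp16 = to_float16(small_gradient)

# Scaling first keeps the value representable in FP16;
# the unscaling is then done in full precision.
scaled_fp16 = to_float16(small_gradient * loss_scale)
recovered = scaled_fp16 / loss_scale

print(unscaled_fp16, recovered)
```

The recovered value is not bit-exact, but it is close to the original gradient instead of being zero.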
   
   

In [3]:
# Mirrored Strategy (1)
#strategy = tf.distribute.MirroredStrategy()
strategy = tf.distribute.MirroredStrategy(['/cpu:1','/cpu:2'])

num_devices = strategy.num_replicas_in_sync

# Mixed Precision (2) 
# set to True only if running on a device with computing capability at least 7.0 (ex. V-100 GPU)
mp = False

if mp:

    policy = mixed_precision.Policy('mixed_float16')

    mixed_precision.set_policy(policy)

# set gpus options
config_proto = tf.compat.v1.ConfigProto()

config_proto.allow_soft_placement = True

config_proto.log_device_placement = True

config_proto.gpu_options.allow_growth = True

config_proto.gpu_options.force_gpu_compatible = True

# XLA (3)
config_proto.graph_options.optimizer_options.global_jit_level = tf.compat.v1.OptimizerOptions.ON_1

# session definition
sess = tf.compat.v1.InteractiveSession(config = config_proto)

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:1', '/job:localhost/replica:0/task:0/device:CPU:2')
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device



### -Step 4: Training-Test datasets definition

   - **training_data_path**: path to training data files in '.npy' format.
   
   
   - **train_dataset**: definition of the training dataset object using the Tensorflow function **data.Dataset.list_files**. The shuffling required by the training process is implemented with the **.shuffle** method, setting **buffer_size** to the total number of training data, followed by **.batch** with **batch_size** set to the global batch size.
   
   
   - **batch_size_per_replica**: batch of data allocated to a single device. In the example of this notebook, the batch_size_per_replica is set to 4.
   
   
   - **global_batch_size**: batch of data allocated among all the used devices. In the example of this notebook, we use 2 CPU devices, and since batch_size_per_replica is 4, the global_batch_size is 8. Since there are 80 training data, we have 10 global batches in total.
   
   
   - **distributed_train_dataset**: the training dataset object implemented for parallel distribution using the Mirrored Strategy technique.
   
   
All the operations described above are used for the test dataset as well.
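The batch arithmetic described above can be checked with a few lines of plain Python. The file names below are placeholders standing in for the '.npy' training files, and the shuffle-then-batch logic only mimics what the dataset pipeline does.

```python
import random

batch_size_per_replica = 4
num_devices = 2
num_training_data = 80

global_batch_size = batch_size_per_replica * num_devices
num_global_batches = num_training_data // global_batch_size

# Placeholder file names standing in for the '.npy' training files.
files = ['data_{}.npy'.format(i + 1) for i in range(num_training_data)]

# A shuffle buffer as large as the dataset amounts to a full shuffle.
random.Random(0).shuffle(files)

# Group the shuffled paths into global batches.
global_batches = [files[i:i + global_batch_size]
                  for i in range(0, len(files), global_batch_size)]

print(global_batch_size, num_global_batches, len(global_batches))
```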
   

In [4]:
training_data_path = training_folder_path + str('*.npy')
test_data_path = test_folder_path + str('*.npy')

train_dataset = tf.data.Dataset.list_files(training_data_path)
test_dataset = tf.data.Dataset.list_files(test_data_path)

num_training_data = len(os.listdir(training_folder_path))
num_test_data = len(os.listdir(test_folder_path))

batch_size_per_replica = 4

global_batch_size = batch_size_per_replica * num_devices

train_dataset = train_dataset.shuffle(buffer_size = num_training_data).batch(batch_size = global_batch_size)
test_dataset = test_dataset.batch(batch_size = global_batch_size)

num_global_batches_train = num_training_data // global_batch_size

distributed_train_dataset = strategy.experimental_distribute_dataset(train_dataset)
distributed_test_dataset = strategy.experimental_distribute_dataset(test_dataset)

### -Step 5: Model definition

- **get_model**: function to define the model on the basis of the implemented FCN network architecture.


- **with strategy.scope()**: model definition under the data distribution strategy.


- **optimizer**: Adam optimizer is used to compile the model. If mixed precision is enabled, the optimizer must be scaled, since the mixed precision operation involves the calculation of a scaled loss. Loss scaling is used to preserve small gradient values.


- **loss_object**: the mean squared error (MSE) is adopted as loss function. With Mirrored Strategy, the model on each replica does a forward pass with its respective input and calculates the loss. Now, instead of dividing the loss by the number of examples in its respective input (BATCH_SIZE_PER_REPLICA), **the loss should be divided by the GLOBAL_BATCH_SIZE**. This needs to be done because after the gradients are calculated on each replica, they are **synced across the replicas by summing them (reduction)**. If using **tf.keras.losses classes** (as in the example below), **the loss reduction needs to be explicitly specified to be NONE**, so we can disable automatic reduction and explicitly define it using other functions which are **appropriate for distributed training**. 

  In particular, we use **tf.nn.compute_average_loss(per_example_loss,global_batch_size = GLOBAL_BATCH_SIZE)** to perform the loss reduction. 
  

  It should be noted that even if the data are distributed across different devices (CPUs as in this example, or GPUs), the reduction always takes place on the head CPU, labeled CPU:0.
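The reason for dividing by the global batch size can be verified numerically. The sketch below uses made-up per-example losses and a hypothetical `replica_loss` helper that only mirrors the idea behind **tf.nn.compute_average_loss**; it does not call Tensorflow.

```python
# Made-up per-example losses for a global batch of 8 examples.
per_example_losses = [0.9, 0.1, 0.4, 0.6, 0.2, 0.8, 0.5, 0.5]

num_replicas = 2
global_batch_size = len(per_example_losses)
local_size = global_batch_size // num_replicas

# Each replica sees only its own slice of the global batch.
shards = [per_example_losses[i * local_size:(i + 1) * local_size]
          for i in range(num_replicas)]


def replica_loss(local_losses, global_batch_size):
    # Sum of the local losses divided by the GLOBAL batch size.
    return sum(local_losses) / global_batch_size


# Summing the per-replica losses reproduces the mean over the global batch.
summed_loss = sum(replica_loss(shard, global_batch_size) for shard in shards)
global_mean = sum(per_example_losses) / global_batch_size

# Dividing by the local batch size instead overcounts by the number of replicas.
overcounted = sum(sum(shard) / local_size for shard in shards)

print(summed_loss, global_mean, overcounted)
```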

In [5]:
def get_model(FCN,input_shape, output_channels):

    input_channel = 1

    input_tensor = tf.keras.Input(shape = input_shape+(input_channel,))

    model = FCN(input_tensor, output_channels)

    return model

input_shape = (256,256)
    
output_channels = 1
    
with strategy.scope():

    model = get_model(FCN, input_shape, output_channels)

    optimizer = tf.keras.optimizers.Adam()

    if mp:
        
        optimizer = mixed_precision.LossScaleOptimizer(optimizer, loss_scale='dynamic')

    loss_object = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)


    def compute_loss(labels,predictions):

        per_example_loss = loss_object(labels, predictions)

        per_example_loss /= tf.cast(tf.reduce_prod(tf.shape(labels)[1:]), tf.float32)

        return tf.nn.compute_average_loss(per_example_loss, global_batch_size = labels.shape[0])

INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).

### -Step 6: Extract a global batch

 - **get_global_batch**: function to extract a global batch of images and labels. The input is 'global_batch', a list containing a batch of paths to data files in '.npy' format. Each file is loaded in a for loop and the batches of images and labels are populated. The images and labels are 4D tensors, which is the appropriate format for deep learning computation.
 
  Random transformations (lighting, blurring, increasing/decreasing contrast and rotations) are applied to the images with the class **Random_Imaging**. 
  The function **plot_debug** is used to save a plot of the image and the label maps for each element in the debug folder, in order to visualize the input going through the network.
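The splitting and stacking performed by **get_global_batch** can be tried on synthetic arrays. The sketch below assumes each file holds an array of shape (1, height, width, 1 + number of label channels), which is inferred from the indexing used in the function; the sizes and the `make_sample` and `split_sample` helpers are hypothetical, and the random transformations and plotting are left out.

```python
import numpy as np

height, width, label_channels = 8, 8, 1   # small illustrative sizes


def make_sample(seed):
    """Fake sample with the assumed layout (1, H, W, 1 + label_channels)."""
    rng = np.random.RandomState(seed)
    return rng.rand(1, height, width, 1 + label_channels).astype(np.float32)


def split_sample(data):
    # Channel 0 is the image; restore the channel axis dropped by the indexing.
    image = data[:, :, :, 0]
    image = image.reshape(image.shape + (1,))
    # The remaining channels are the label maps.
    labels = data[:, :, :, 1:]
    return image, labels


images, labels = zip(*(split_sample(make_sample(seed)) for seed in range(4)))

# Concatenating along the first axis gives the 4D batch tensors.
batch_images = np.concatenate(images)
batch_labels = np.concatenate(labels)

print(batch_images.shape, batch_labels.shape)
```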

In [6]:
def get_global_batch(global_batch):

    global_batch = np.array(global_batch)

    global_batch_images = []

    global_batch_labels = []

    for i in range(len(global_batch)):
        
        
        # load numpy file
        data = np.load(global_batch[i])
        
        # extract the image
        img = data[:,:,:,0]
        img = img.reshape(img.shape+(1,)).astype(np.float32)

        # extract the label segmentation map
        lbl = data[:,:,:,1:]
        
        # apply random transformations
        rnd_imgng = Random_Imaging(image=img,labels=lbl)
        img,lbl = rnd_imgng.get_trasform()

        global_batch_images.append(img)

        global_batch_labels.append(lbl)
        
        # plot the batch of images and labels in debug folder to visualize it
        plot_debug(img, lbl, i, debug_folder_path)

    global_batch_images = np.concatenate(global_batch_images)

    global_batch_labels = np.concatenate(global_batch_labels)

    return  [global_batch_images, global_batch_labels]

### -Step 6: Training and Test functions definition

  - **train_step**: function which defines the training process (forward pass + loss calculation + backpropagation + update of the model's parameters). The input is a local batch of data (batch of data for a single device).
  
  
  - **test_step**: function which defines the test process (forward pass + loss calculation). The input is a local batch of data (batch of data for a single device).
  
  
  - **@tf.function**: compilation of a function into a callable TensorFlow graph.

In [7]:
@tf.function
def train_step(local_batch):
    
    local_batch_images, local_batch_labels = local_batch

    with tf.GradientTape() as tape:

        local_batch_predictions = model(local_batch_images, training=True)

        train_loss = compute_loss(local_batch_labels, local_batch_predictions)


        if mp:

            scaled_train_loss = optimizer.get_scaled_loss(train_loss)


    if mp:

        scaled_gradients = tape.gradient(scaled_train_loss, model.trainable_variables)

        gradients = optimizer.get_unscaled_gradients(scaled_gradients)

    else:

        gradients = tape.gradient(train_loss, model.trainable_variables)

    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    return train_loss, local_batch_predictions


@tf.function
def test_step(local_batch):

    local_batch_images, local_batch_labels = local_batch

    local_batch_predictions = model(local_batch_images, training = False)

    test_loss = compute_loss(local_batch_labels, local_batch_predictions)

    return test_loss,local_batch_predictions

### -Step 7: Parallel Training and Test definition


  - **distributed_train_step**: implementation of the training step in the parallel distribution. The loss and predictions are calculated on each device using **strategy.run** applied to the **train_step** function and the **global_batch**. The **loss reduction** is implemented using the **.reduce** method of the strategy.
  
  
  - **distributed_test_step**: implementation of the test step in the parallel distribution. The loss and predictions are calculated on each device using **strategy.run** applied to the **test_step** function and the **global_batch**. The **loss reduction** is implemented using the **.reduce** method of the strategy.

In [8]:
@tf.function
def distributed_train_step(global_batch):

    per_replica_losses, per_replica_predictions = strategy.run(train_step, args=(global_batch,))

    loss = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses,axis=None)
    predictions = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_predictions,axis=None)

    return loss,predictions


@tf.function
def distributed_test_step(global_batch):

    per_replica_losses, per_replica_predictions = strategy.run(test_step, args=(global_batch,))

    loss = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)
    predictions = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_predictions, axis=None)

    return loss, predictions

### -Step 8: Training and Test loop definition

  - **training_test_loop**: function which implements the training and test loop. Inputs:
  
  
   - **first_epoch**: epoch at which the run should start. If the training is run from scratch, first_epoch has to be set to 0. If for any reason the training is interrupted (out of memory problems, server shutdown etc.), setting first_epoch makes it possible to reload the results saved up to the epoch at which training was interrupted and restart training from there.
   
   
   - **num_epochs**: number of epochs for training the model.
   
   - **save_every**: to save space, the weights are saved only every few epochs. In this example we save the weights every 5 epochs.
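When choosing **first_epoch** for a restart, it helps to know which weight files exist. The sketch below reproduces only the saving rule used in the loop (`epoch % save_every == 0`, file named with `epoch+1`); `saved_checkpoints` is a hypothetical helper written for this illustration.

```python
def saved_checkpoints(first_epoch, num_epochs, save_every):
    """File numbers N of the 'epoch-N.h5' files written by the saving rule."""
    return [epoch + 1
            for epoch in range(first_epoch, first_epoch + num_epochs)
            if epoch % save_every == 0]


# Training from scratch for 20 epochs, saving every 5 epochs.
checkpoints = saved_checkpoints(first_epoch=0, num_epochs=20, save_every=5)
print(checkpoints)

# Restarting from the file 'epoch-11.h5' for another 20 epochs.
resumed = saved_checkpoints(first_epoch=11, num_epochs=20, save_every=5)
print(resumed)
```

Since the loop loads `epoch-{first_epoch}.h5` when restarting, first_epoch should be one of the numbers for which a file was actually written.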
   
   
At the beginning of the run, the code prints the host where it is running as well as the number of devices used in the computation.
   
After each batch has been processed, the code prints the running loss and average R2. It can also print the processing frequency (PF), calculated as the number of images per second processed by the model; that part of the print statement is commented out in the cell below. On my personal laptop the processing frequency is very low (0.2 imgs/s), with a P100 GPU it is 20 imgs/s and with a V100 GPU it is 40 imgs/s.
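As a small illustration of the processing frequency, the sketch below times an arbitrary step function and divides the number of images by the elapsed time. The helper names and the dummy step are assumptions made for the example; in the training loop the timed call is the distributed training step.

```python
import time


def processing_frequency(num_images, elapsed_seconds):
    """Images processed per second."""
    return num_images / elapsed_seconds


def timed_step(step_fn, batch):
    """Run step_fn on the batch and return its result with the images/second."""
    start = time.perf_counter()
    result = step_fn(batch)
    elapsed = time.perf_counter() - start
    return result, processing_frequency(len(batch), elapsed)


def dummy_step(batch):
    # Stand-in for a training step: just waits a little.
    time.sleep(0.01)
    return len(batch)


result, pf = timed_step(dummy_step, list(range(8)))
print('Processing Frequency = {:.1f} imgs/s'.format(pf))
```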


The **R2_CHs** class is used to calculate the R2 between the predicted and the true CHs. It should be noted that the R2 requires a substantial number of epochs to grow from its initial value of 0.
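For reference, the standard coefficient of determination can be written in a few lines. This is the textbook definition applied to made-up column heights; the actual **R2_CHs** class in *training_utils.py* may extract the CHs and handle edge cases differently.

```python
def r2_score(true_values, predicted_values):
    """Textbook R2: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    return 1.0 - ss_res / ss_tot


true_chs = [1, 2, 3, 4, 5]   # made-up column heights

perfect = r2_score(true_chs, true_chs)            # exact predictions
mean_only = r2_score(true_chs, [3, 3, 3, 3, 3])   # always predicting the mean

print(perfect, mean_only)
```

A model that always predicts the mean CH scores 0, which is consistent with the R2 starting near 0 early in training.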


The **loss and R2 results**, as well as the **model's parameters**, are saved in the appropriate folders defined in **Step 2**.

In [9]:
def training_test_loop(first_epoch,num_epochs, save_every):
    
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    logging.getLogger('tensorflow').setLevel(logging.FATAL)

    print("Running on host '{}'".format(platform.node()))
    print('Running on {} devices'.format(num_devices))


    train_loss_learning_curve = []
    train_r2_learning_curve = []

    test_loss_learning_curve = []
    test_r2_learning_curve = []

    if first_epoch > 0:
        model.load_weights(weights_folder_path+'epoch-{}.h5'.format(first_epoch))

        train_loss_learning_curve = list(np.load(training_learning_curve_folder_path+'train_loss_learning_curve.npy'))
        train_r2_learning_curve = list(np.load(training_learning_curve_folder_path+'train_r2_learning_curve.npy'))
        test_loss_learning_curve = list(np.load(test_learning_curve_folder_path+'test_loss_learning_curve.npy'))
        test_r2_learning_curve = list(np.load(test_learning_curve_folder_path+'test_r2_learning_curve.npy'))


    for epoch in range(first_epoch, first_epoch + num_epochs):

        total_train_loss = 0.0

        total_average_r2_train = 0.0

        processed_batches_train = 0

        for train_batch_index,train_batch in enumerate(train_dataset):

            train_images,train_labels = get_global_batch(train_batch)

            num_images = train_images.shape[0]

            before = time.time()

            train_loss,train_predictions = distributed_train_step([train_images,train_labels])

            total_train_loss = total_train_loss + train_loss

            processed_batches_train += 1

            train_loss = total_train_loss / processed_batches_train

            r2_CHs = R2_CHs(train_predictions, train_labels)
            r2_train = r2_CHs.get_r2_batch()

            total_average_r2_train += r2_train
            r2_average_train = total_average_r2_train/processed_batches_train

            totaltime = time.time() - before
            
            if (train_batch_index +1) % 1 == 0:
                
                
                # Processing-frequency variant (disabled): append
                # ', Processing Frequency = {:.1f} imgs/s' to the format string
                # and pass num_images/totaltime as the last argument.
                print('Epoch [{}/{}] : Batch [{}/{}] : Train Loss = {:.4f}, Train R2 = {:.4f}'.format(
                    epoch + 1,
                    first_epoch + num_epochs,
                    train_batch_index + 1,
                    num_global_batches_train,
                    train_loss,
                    r2_average_train))

            
        total_test_loss = 0

        total_average_r2_test = 0.0

        processed_batches_test = 0

        for test_batch_index,test_batch in enumerate(test_dataset):

            test_images, test_labels = get_global_batch(test_batch)

            test_loss, test_predictions = distributed_test_step([test_images, test_labels])

            total_test_loss += test_loss

            processed_batches_test += 1

            test_loss = total_test_loss/processed_batches_test

            r2_CHs = R2_CHs(test_predictions, test_labels)
            r2_test = r2_CHs.get_r2_batch()

            total_average_r2_test += r2_test
            r2_average_test = total_average_r2_test / processed_batches_test

        print('Epoch [{}/{}]: Test Loss = {:.4f}, Test R2 = {:.4f}'.format(epoch + 1,
                                                                           first_epoch + num_epochs,
                                                                           test_loss,
                                                                           r2_average_test))


        train_loss_learning_curve.append(train_loss)
        train_loss_learning_curve_array = np.array(train_loss_learning_curve)

        train_r2_learning_curve.append(r2_train)
        train_r2_learning_curve_array = np.array(train_r2_learning_curve)

        np.save(training_learning_curve_folder_path+'train_loss_learning_curve',train_loss_learning_curve_array)
        np.save(training_learning_curve_folder_path+'train_r2_learning_curve',train_r2_learning_curve_array)


        test_loss_learning_curve.append(test_loss)
        test_loss_learning_curve_array = np.array(test_loss_learning_curve)

        test_r2_learning_curve.append(r2_test)
        test_r2_learning_curve_array = np.array(test_r2_learning_curve)

        np.save(test_learning_curve_folder_path+'test_loss_learning_curve',test_loss_learning_curve_array)
        np.save(test_learning_curve_folder_path+'test_r2_learning_curve',test_r2_learning_curve_array)

        if epoch % save_every == 0:
            model.save_weights(weights_folder_path+'epoch-{}.h5'.format(epoch+1))
    
    

The training and test loop can be run by simply defining **first_epoch**, **num_epochs** and **save_every** and calling the function **training_test_loop**.

In [None]:
first_epoch = 0
num_epochs = 500
save_every = 5

training_test_loop(first_epoch,num_epochs,save_every)

Running on host 'Marcos-MBP.attlocal.net'
Running on 2 devices
Epoch [1/500] : Batch [1/10] : Train Loss = 0.6586, Train R2 = 0.0000
Epoch [1/500] : Batch [2/10] : Train Loss = 0.5955, Train R2 = 0.0000
Epoch [1/500] : Batch [3/10] : Train Loss = 0.5512, Train R2 = 0.0097
Epoch [1/500] : Batch [4/10] : Train Loss = 0.4608, Train R2 = 0.0672
Epoch [1/500] : Batch [5/10] : Train Loss = 0.4282, Train R2 = 0.0782
Epoch [1/500] : Batch [6/10] : Train Loss = 0.4013, Train R2 = 0.1105
Epoch [1/500] : Batch [7/10] : Train Loss = 0.4071, Train R2 = 0.1142
Epoch [1/500] : Batch [8/10] : Train Loss = 0.3781, Train R2 = 0.1288
Epoch [1/500] : Batch [9/10] : Train Loss = 0.3536, Train R2 = 0.1528
Epoch [1/500] : Batch [10/10] : Train Loss = 0.3552, Train R2 = 0.1708
Epoch [1/500]: Test Loss = 76.5387, Test R2 = 0.0000
Epoch [2/500] : Batch [1/10] : Train Loss = 0.4712, Train R2 = 0.4067
Epoch [2/500] : Batch [2/10] : Train Loss = 0.3433, Train R2 = 0.3603
Epoch [2/500] : Batch [3/10] : Train Loss =

Epoch [11/500] : Batch [10/10] : Train Loss = 0.0810, Train R2 = 0.4165
Epoch [11/500]: Test Loss = 0.3902, Test R2 = 0.0020
Epoch [12/500] : Batch [1/10] : Train Loss = 0.1270, Train R2 = 0.2811
Epoch [12/500] : Batch [2/10] : Train Loss = 0.1085, Train R2 = 0.3469
Epoch [12/500] : Batch [3/10] : Train Loss = 0.0911, Train R2 = 0.3752
Epoch [12/500] : Batch [4/10] : Train Loss = 0.1321, Train R2 = 0.4037
Epoch [12/500] : Batch [5/10] : Train Loss = 0.1211, Train R2 = 0.4161
Epoch [12/500] : Batch [6/10] : Train Loss = 0.1181, Train R2 = 0.4363
Epoch [12/500] : Batch [7/10] : Train Loss = 0.1149, Train R2 = 0.4204
Epoch [12/500] : Batch [8/10] : Train Loss = 0.1123, Train R2 = 0.4266
Epoch [12/500] : Batch [9/10] : Train Loss = 0.1065, Train R2 = 0.4315
Epoch [12/500] : Batch [10/10] : Train Loss = 0.1054, Train R2 = 0.4344
Epoch [12/500]: Test Loss = 0.4043, Test R2 = 0.0000
Epoch [13/500] : Batch [1/10] : Train Loss = 0.1245, Train R2 = 0.2495
Epoch [13/500] : Batch [2/10] : Train Lo

It should be noted that the results shown in this notebook relate to the trial dataset of 80 training data and 16 test data, and **they are not representative of the results illustrated in the paper**, for which a training dataset of 8000 data and a test dataset of 2000 data are required.