# Fully Convolutional Network (FCN) Training (TensorFlow 2.2)

This notebook demonstrates how to train a deep learning (DL) model built on a fully convolutional network (FCN) architecture to predict the column heights (CHs) in high entropy alloys (HEAs) using TensorFlow 2.2.0.

Given the complexity of the problem, parallel computation is mandatory: learning to predict the CHs accurately for each element requires a substantial number of epochs. There are two ways to implement a parallel DL calculation: **data parallelization** and **model parallelization**. Data parallelization is implemented using the **Mirrored Strategy** method from TensorFlow. A detailed explanation of data parallelization using Mirrored Strategy is provided here:

**Mirrored Strategy (Data Parallelization)**:  https://www.tensorflow.org/tutorials/distribute/custom_training
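
The idea behind synchronous data parallelization can be sketched without TensorFlow. The NumPy example below is an illustration only, not the notebook's training code: it uses a hypothetical toy linear model and assumes the global batch is split into equal shards, one per replica. It shows that averaging the per-replica gradients reproduces the gradient of the full batch, which is why every replica can apply the same weight update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = X @ w with a mean-squared-error loss (hypothetical example).
X = rng.normal(size=(8, 3))      # global batch of 8 samples
y = rng.normal(size=8)
w = rng.normal(size=3)           # weights, identical on every replica


def mse_gradient(X_part, y_part, w):
    """Gradient of mean((X_part @ w - y_part)**2) with respect to w."""
    residual = X_part @ w - y_part
    return 2.0 * X_part.T @ residual / len(y_part)


# Reference: gradient computed on the whole batch by a single device.
full_gradient = mse_gradient(X, y, w)

# Data parallelism: split the batch into equal shards, one per replica,
# compute the local gradients, then average them (the all-reduce step).
num_replicas = 4
X_shards = np.split(X, num_replicas)
y_shards = np.split(y, num_replicas)
local_gradients = [mse_gradient(Xs, ys, w) for Xs, ys in zip(X_shards, y_shards)]
averaged_gradient = np.mean(local_gradients, axis=0)

print(np.allclose(full_gradient, averaged_gradient))
```

The equality holds only because the shards have equal size; with uneven shards the average would have to be weighted by the shard sizes.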


Model parallelization is implemented using the **Horovod** library. A detailed explanation of how to use **Horovod** for model parallelization is provided here:

**Horovod (Model Parallelization)**: https://github.com/horovod/horovod

We have also implemented Mixed Precision, a technique that accelerates tensor operations on GPUs with a compute capability of at least 7.0, and Accelerated Linear Algebra (XLA) from TensorFlow, to speed up the computation as much as possible. More information can be found on the related webpages:

**Mixed Precision**: https://www.tensorflow.org/guide/mixed_precision

**XLA**: https://www.tensorflow.org/xla
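
One practical consequence of mixed precision is that small gradient values can underflow in float16, which is why mixed-precision training is normally combined with loss scaling. The NumPy sketch below only illustrates that numerical effect. The gradient value and the scale factor are hypothetical, and this is not the mechanism as implemented inside TensorFlow.

```python
import numpy as np

tiny_gradient = 1e-8                      # hypothetical small gradient value

# Stored directly in float16, the value is below the smallest representable
# magnitude and underflows to zero.
underflowed = np.float16(tiny_gradient)

# Loss scaling: multiply before the float16 cast, then divide back in float32.
loss_scale = 2.0 ** 15                    # hypothetical scale factor
scaled = np.float16(tiny_gradient * loss_scale)
recovered = np.float32(scaled) / np.float32(loss_scale)

print(underflowed, recovered)
```

The recovered value agrees with the original one only up to float16 rounding, which is usually acceptable for gradients.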


We have access to a cluster of 4 NVIDIA V100 GPUs with a compute capability of 7.0. Even so, sufficiently accurate results took at least 5 months (approximately 600 epochs) using data parallelization and at least 3 months (approximately 400 epochs) using model parallelization. Model parallelization is therefore somewhat faster, both in computation and in reaching a sufficiently high performance (fewer epochs required).

In this notebook we illustrate both implementations.


The main files are *training_data-parallelization.py* and *training_model-parallelization.py*. In addition, the file *fcn.py* contains the implementation of the FCN, while *training_utils.py* contains the modules that apply random imaging transformations to the input images, calculate the R^2 between the predicted and true CHs, and plot the input data in a debug folder.
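
For readers unfamiliar with the R^2 metric mentioned above, the following is a minimal sketch of the standard coefficient of determination in NumPy. The function name `r2_score_sketch` and the sample values are hypothetical, and the actual `R2_CHs` in *training_utils.py* may differ in its inputs and details.

```python
import numpy as np


def r2_score_sketch(true_heights, predicted_heights):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    true_heights = np.asarray(true_heights, dtype=np.float64)
    predicted_heights = np.asarray(predicted_heights, dtype=np.float64)
    ss_res = np.sum((true_heights - predicted_heights) ** 2)    # residual sum of squares
    ss_tot = np.sum((true_heights - true_heights.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


true_chs = np.array([1, 2, 3, 4, 5])                 # hypothetical column heights
perfect = r2_score_sketch(true_chs, true_chs)        # exact predictions
mean_only = r2_score_sketch(true_chs, np.full(5, true_chs.mean()))  # always predict the mean
print(perfect, mean_only)
```

The sketch does not guard against the case where all true values are identical (`ss_tot` equal to zero), which a production version would need to handle.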

## Data Parallelization

Here we provide the code to train the FCN using data parallelization in TensorFlow 2.2.

### Step 1: importing the libraries:

- Numpy.

- Tensorflow. In particular, we import the module mixed_precision to implement the mixed precision technique.

- fcn: file containing the architecture of the FCN.

- training_utils: file containing the modules for the calculation of the R^2 (R2_CHs), the implementation of the random transformations on the input images (Random_Imaging) and plotting in debug folder (plot_debug).

- time, datetime: libraries to manage timing, used to calculate the processing speed of the learning process in images/second.

- os, logging, platform: standard-library modules for file-system operations, logging, and platform information.

In [11]:
import numpy as np

import tensorflow as tf
from tensorflow.keras.mixed_precision import experimental as mixed_precision

from fcn import FCN
from training_utils import R2_CHs,Random_Imaging,plot_debug

import os

import time
from datetime import datetime

import logging
import platform
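
As an illustration of how the `time` module can yield a throughput in images/second, here is a minimal sketch. The batch size, the number of batches, and the `time.sleep` call standing in for a training step are all hypothetical.

```python
import time


def images_per_second(num_images, elapsed_seconds):
    """Throughput of the learning process in images per second."""
    return num_images / elapsed_seconds


batch_size = 32      # hypothetical batch size
num_batches = 10     # hypothetical number of batches

start = time.perf_counter()
for _ in range(num_batches):
    time.sleep(0.001)    # stand-in for one training step
elapsed = time.perf_counter() - start

throughput = images_per_second(batch_size * num_batches, elapsed)
print(f"{throughput:.1f} images/second")
```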

### Step 2: defining the directory paths to load data and save results:

- **training_folder_path, test_folder_path**: paths to the training and test data. The data are saved as NumPy arrays (data_1.npy, data_2.npy, etc.), stored as tensors that contain both the images and the label maps.


- **training_results_folder_path, test_results_folder_path**: paths to the parent directories containing the saved training and test results.


- **debug_folder_path**: path to the debug directory, where plots of the input images and labels are saved to check what is going through the network.


- **weights_folder_path**: path to the directory to save the weights of the FCN at each epoch.


- **training_learning_curve_folder_path,test_learning_curve_folder_path**: paths to the directories containing the training and test learning curves.


In [12]:
training_folder_path = '../training_data-try/data/'
test_folder_path = '../test_data-try/data/'

training_results_folder_path = 'results_data-parallelization/training_results/'
debug_folder_path = training_results_folder_path + 'debug/'
weights_folder_path = training_results_folder_path + 'weights/'
training_learning_curve_folder_path = training_results_folder_path + 'train_learning_curve/'

test_results_folder_path = 'results_data-parallelization/test_results/'
test_learning_curve_folder_path = test_results_folder_path + 'test_learning_curve/'


# Create every output directory that does not exist yet.
for folder_path in (training_results_folder_path,
                    debug_folder_path,
                    weights_folder_path,
                    training_learning_curve_folder_path,
                    test_results_folder_path,
                    test_learning_curve_folder_path):
    if folder_path:
        os.makedirs(folder_path, exist_ok=True)
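
To make the data layout more concrete, the sketch below writes two placeholder files named like the notebook's data_1.npy and data_2.npy into a temporary folder, loads them back, and separates images from label maps. The array layout is an assumption made purely for illustration (channel 0 holds the image and the remaining channels hold one label map per chemical element), as are the 64x64 size and the random contents. The real files may be organized differently.

```python
import glob
import os
import tempfile

import numpy as np

num_chemical_elements = 5            # value set later in the notebook
rng = np.random.default_rng(0)

with tempfile.TemporaryDirectory() as folder:
    # Write placeholder files with the assumed layout (height, width, 1 + elements).
    for index in (1, 2):
        sample = rng.random((64, 64, 1 + num_chemical_elements)).astype(np.float32)
        np.save(os.path.join(folder, f"data_{index}.npy"), sample)

    # Collect and load every data_*.npy file in the folder.
    paths = sorted(glob.glob(os.path.join(folder, "data_*.npy")))
    loaded = [np.load(path) for path in paths]

# Assumed split: first channel is the image, the rest are the label maps.
images = [data[..., :1] for data in loaded]
labels = [data[..., 1:] for data in loaded]
print(len(paths), images[0].shape, labels[0].shape)
```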

### Step 3: defining the computing techniques: Mirrored Strategy, Mixed Precision, Config Proto and XLA

 - **Mirrored Strategy**: implementation of data parallelization.
 
 - **Mixed Precision**: mixed precision should be activated (mp = True) only if the code is run on an NVIDIA GPU with a compute capability of at least 7.0. Otherwise, mixed precision actually slows down the calculation.
 
 - **Config Proto**: method to define server parameters for training. In particular:
 
 
   - **allow_soft_placement**: allows TensorFlow to fall back to an available device when the requested device does not exist or does not support the operation.
   
   - **log_device_placement**: printing of device information.
   
   - **gpu_options.allow_growth**: allowing to allocate only the memory required by the process, instead of allocating the full memory of the device where the process runs.
   
   - **gpu_options.force_gpu_compatible**: forces all tensors to be GPU compatible. All CPU tensors will be allocated with CUDA pinned memory.
   
   - **graph_options.optimizer_options.global_jit_level**: XLA activation.
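
The compute capability condition for Mixed Precision could be checked programmatically rather than by hand. The sketch below assumes that a device description string is available, for example the `physical_device_desc` field returned by `tensorflow.python.client.device_lib.list_local_devices()`. The helper names and the example strings are hypothetical, and the parsing relies on the description containing a `compute capability: X.Y` fragment.

```python
import re


def parse_compute_capability(device_description):
    """Return (major, minor) parsed from a device description, or None if absent."""
    match = re.search(r"compute capability: (\d+)\.(\d+)", device_description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_fast_mixed_precision(device_description, minimum=(7, 0)):
    """True if the described GPU reaches the minimum compute capability."""
    capability = parse_compute_capability(device_description)
    return capability is not None and capability >= minimum


# Illustrative description string for a V100 (assumed format).
example_description = (
    "device: 0, name: Tesla V100-SXM2-16GB, "
    "pci bus id: 0000:00:1e.0, compute capability: 7.0"
)
mp = supports_fast_mixed_precision(example_description)
print(mp)
```

With such a check, `mp` would stay `False` on CPU-only machines or older GPUs, where the description carries no suitable compute capability.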
   
   

In [14]:
# Mirrored Strategy (1)
strategy = tf.distribute.MirroredStrategy()


# Mixed Precision (2)
mp = False

if mp:

    policy = mixed_precision.Policy('mixed_float16')

    mixed_precision.set_policy(policy)

# set gpus options
config_proto = tf.compat.v1.ConfigProto()

config_proto.allow_soft_placement = True

config_proto.log_device_placement = True

config_proto.gpu_options.allow_growth = True

config_proto.gpu_options.force_gpu_compatible = True

# XLA (3)
config_proto.graph_options.optimizer_options.global_jit_level = tf.compat.v1.OptimizerOptions.ON_1

# session definition
sess = tf.compat.v1.InteractiveSession(config = config_proto)

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
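
For comparison, most of the options above can also be set through the TensorFlow 2 configuration API instead of a `compat.v1` session. This is a configuration sketch offered as an alternative, not the setup used in this notebook. It is assumed to run at the very start of the program, before `MirroredStrategy` is created, because memory growth cannot be changed once the GPUs have been initialized. No counterpart for `force_gpu_compatible` is included.

```python
import tensorflow as tf

# Fall back to an available device when the requested one cannot be used
# (counterpart of allow_soft_placement).
tf.config.set_soft_device_placement(True)

# Log the device on which each operation is placed
# (counterpart of log_device_placement).
tf.debugging.set_log_device_placement(True)

# Allocate GPU memory on demand instead of reserving it all up front
# (counterpart of gpu_options.allow_growth).
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# Enable XLA just-in-time compilation (counterpart of global_jit_level).
tf.config.optimizer.set_jit(True)
```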



In [None]:
num_chemical_elements = 5