# Training Networks

## 0. Recap

In the [dataset creation](../dataset_management/dataset_generation.ipynb) and [building neural network](../prototype/define_model.ipynb) sections, we built a dataset and a network. It is now time to train the model.


## 1. Load and pre-process dataset

Load the dataset using the tools built so far. To speed up training, we batch the data and prefetch upcoming batches so that input loading overlaps with computation.

In [8]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

from pathlib import Path
import sys

_project_root = Path.cwd().parent
sys.path.append(str(_project_root))

from dataset_management.tools import load_dataset
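import tensorflow as tf

# Use the public alias rather than importing from the private tensorflow.python package
# (also available as tf.data.AUTOTUNE in TF 2.4 and later).
AUTOTUNE = tf.data.experimental.AUTOTUNE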

dataset_dir = _project_root / 'dataset'
train_dir = dataset_dir / 'train'
valid_dir = dataset_dir / 'valid'

trainset = load_dataset(train_dir)
validset = load_dataset(valid_dir)

batch_size = 32

trainset = trainset.batch(batch_size).prefetch(AUTOTUNE)
validset = validset.batch(batch_size).prefetch(AUTOTUNE)

for (history, y, future) in trainset:
    print(f"history shape: {history.shape}")
    print(f"label shape: {y.shape}")
    print(f"future shape: {future.shape}")
    break

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
[INFO] 42300 files founded.
[INFO] 4701 files founded.
history shape: (32, 144, 5)
label shape: (32,)
future shape: (32, 12, 5)
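
The shapes above are for a full batch. As a rough check on how many batches one epoch contains, the arithmetic can be sketched as follows. It assumes that each of the 42300 training files yields exactly one sample, which the log suggests but does not confirm. Since `batch` keeps the remainder by default, the final batch is smaller than 32.

```python
import math

num_train_files = 42300  # taken from the loader log; one sample per file is an assumption
batch_size = 32

# Number of batches per epoch when the incomplete final batch is kept.
steps_per_epoch = math.ceil(num_train_files / batch_size)
# Size of that final, incomplete batch.
last_batch = num_train_files - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch)  # 1322 28
```

Code that relies on a fixed leading dimension of 32 would therefore need `drop_remainder=True` in the `batch` call.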


## 2. Create model

Create a model for training.

In [7]:
from prototype.fcn import build_fcn


fcn = build_fcn(input_size=144)
fcn.summary()

Model: "FCN"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(None, 144, 5)]          0         
_________________________________________________________________
conv1_1 (ConvBn)             (None, 144, 8)            312       
_________________________________________________________________
conv1_2 (ConvBn)             (None, 144, 16)           448       
_________________________________________________________________
pool1_1 (MaxPooling1D)       (None, 72, 16)            0         
_________________________________________________________________
conv2_1 (ConvBn)             (None, 72, 16)            832       
_________________________________________________________________
conv2_2 (ConvBn)             (None, 72, 32)            1664      
_________________________________________________________________
conv3_1 (Inception)          (None, 36, 64)            41888   

## 3. Monitor the training

During training, we want to watch the losses and metrics as they evolve so that problems can be spotted as early as possible. TensorBoard serves this purpose.

In [11]:
import shutil
import tensorflow as tf
from datetime import datetime

from settings import PROJECT_ROOT


class Monitor:
    def __init__(self, caption):
        log_root = PROJECT_ROOT / 'training' / 'logs'
        fullpath = log_root / caption
        try:
            shutil.rmtree(str(fullpath))
        except FileNotFoundError:
            pass
        fullpath.mkdir(exist_ok=True, parents=True)
        self.logdir = fullpath
        self.caption = caption
        train_path = fullpath / 'train'
        valid_path = fullpath / 'valid'
        self.train_writer = tf.summary.create_file_writer(str(train_path))
        self.valid_writer = tf.summary.create_file_writer(str(valid_path))

    def scalar(self, tag, value, step):
        if tag.startswith('train_'):
            writer = self.train_writer
            tag = tag[len('train_'):]
        else:
            writer = self.valid_writer
            if tag.startswith('valid_'):
                tag = tag[len('valid_'):]
        with writer.as_default():
            tf.summary.scalar(tag, data=value, step=step)

    def write_reports(self, results, step, prefix=None):
        tags = results
        if prefix is not None:
            tags = { f"{prefix}{k}": v for k, v in results.items() }
        for key, val in tags.items():
            self.scalar(key, val, step)

    def graph(self, model):
        from tensorflow.python.ops import summary_ops_v2
        from tensorflow.python.keras import backend as K

        with self.train_writer.as_default():
            with summary_ops_v2.always_record_summaries():
                if not model.run_eagerly:
                    summary_ops_v2.graph(K.get_graph(), step=0)


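# Portable timestamp: '%-y' is a platform-specific extension, and ':' is not allowed in Windows paths.
experiment_name = f"FCN@{datetime.now().strftime('%y%m%d-%H%M%S')}"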
monitor = Monitor(experiment_name)
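
The routing rule in `Monitor.scalar` is easy to misread, so here is a small standalone sketch of it. The function name `route_tag` is hypothetical and is not part of the notebook; it only mirrors the prefix handling shown above, without touching TensorFlow.

```python
def route_tag(tag):
    """Mirror the prefix handling of Monitor.scalar.

    Returns (writer_name, stripped_tag).
    """
    if tag.startswith('train_'):
        return 'train', tag[len('train_'):]
    if tag.startswith('valid_'):
        return 'valid', tag[len('valid_'):]
    # Tags without a recognised prefix fall through to the validation writer.
    return 'valid', tag


for tag in ('train_loss', 'valid_loss', 'accuracy'):
    print(tag, '->', route_tag(tag))
```

Because both writers end up logging under the same stripped tag (for example `loss`), TensorBoard overlays the training and validation curves in a single chart. Note that an unprefixed tag is silently written to the validation log.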

## 4. Set up optimizer

Next, create the optimizer, choosing one of the standard learning algorithms (SGD, Adam, etc.).

In [12]:
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)

## 5. Train

One step remains before training: defining how the network processes each batch. This involves converting the integer labels into one-hot vectors, weighting the loss to compensate for the skewed class distribution, and updating the weights with the optimizer.
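
There are several ways to balance a loss over skewed classes. One common option, which may differ from what this notebook actually uses, is inverse-frequency weighting. The sketch below assumes the three classes are labelled -1, 0 and 1 (inferred from `label + 1` with `depth=3`). The function name and the sample labels are made up for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels, classes=(-1, 0, 1)):
    """Return one weight per class, proportional to 1 / class frequency.

    Weights are scaled so that a perfectly balanced dataset gives 1.0 each.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = {}
    for c in classes:
        # Guard against a class that is absent from the sample.
        n = max(counts.get(c, 0), 1)
        weights[c] = total / (len(classes) * n)
    return weights


# Made-up sample in which class 0 dominates.
labels = [0] * 8 + [1] * 1 + [-1] * 1
print(inverse_frequency_weights(labels))
```

Rare classes receive weights above 1 and the dominant class a weight below 1, so each class contributes comparably to the total loss.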

In [None]:
def train_on_batch(model, batch_data, optimizer):
    history, label, _ = batch_data
    y_true = tf.one_hot(label + 1, depth=3)
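
The cell above shows only the start of the batch step. What follows is a minimal sketch of how the remaining steps described in this section might fit together. It is not the notebook's actual implementation. It assumes labels in {-1, 0, 1}, a model that outputs softmax probabilities of shape `(batch, 3)`, and eager execution. The names `train_on_batch_sketch` and `CLASS_WEIGHTS` are hypothetical, and the uniform weights are placeholders.

```python
import tensorflow as tf

# Placeholder class weights; in practice, derive them from the label frequencies.
CLASS_WEIGHTS = tf.constant([1.0, 1.0, 1.0])


def train_on_batch_sketch(model, batch_data, optimizer, class_weights=CLASS_WEIGHTS):
    history, label, _ = batch_data
    # Assumption: labels take values in {-1, 0, 1}; shift them to {0, 1, 2}.
    class_index = tf.cast(label, tf.int32) + 1
    y_true = tf.one_hot(class_index, depth=3)
    # Each sample is weighted by the weight of its class.
    sample_weight = tf.gather(class_weights, class_index)

    with tf.GradientTape() as tape:
        # Assumption: the model returns softmax probabilities.
        y_pred = model(history, training=True)
        per_sample = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        # Weighted mean of the per-sample losses.
        loss = tf.reduce_sum(per_sample * sample_weight) / tf.reduce_sum(sample_weight)

    # Backpropagate and apply one optimizer update.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    hits = tf.equal(tf.argmax(y_pred, axis=-1, output_type=tf.int32), class_index)
    accuracy = tf.reduce_mean(tf.cast(hits, tf.float32))
    return {'loss': float(loss), 'accuracy': float(accuracy)}
```

Returning a plain dictionary of floats keeps the result easy to pass to a reporting helper such as `Monitor.write_reports` with `prefix='train_'`.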

In [None]:
epochs = 20
global_step = 0
for e in range(epochs):
    print(f"epoch #{e + 1}/{epochs} @{datetime.now()}:")
    for local_step, batch_data in enumerate(trainset):
        global_step += 1
        train_report = train_on_batch(fcn, batch_data, optimizer)