# Developing Custom Image Classification Model

In this lab, you will evolve the custom image classification model developed in Lab 1. The model classifies aerial images into the six classes shown below:

Developed | Cultivated | Barren
--------- | ------ | ----------
![Developed](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/developed1.png) | ![Cultivated](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/cultivated1.png) | ![Barren](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/barren1.png)

Forested | Grassland | Shrub
---------| ----------| -----
![Forested](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/forest1.png) | ![Grassland](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/grassland1.png) | ![Shrub](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/shrub1.png)

You will train a custom image classifier using a deep learning technique called *fine-tuning*. *Fine-tuning* is a flavor of transfer learning (demonstrated in Lab 1).

In fine-tuning, you remove the last layer(s) of the pre-trained network (usually the fully connected layers) and replace them with new, untrained layers that match the given ML task. You then re-train the full network (the pre-trained "trunk" and the new top) using images from your custom domain. It is also common practice to freeze the weights of the first few layers of the pre-trained network, because these layers capture universal features such as curves and edges that are also relevant to the new problem. Keeping those weights intact "forces" the network to focus on learning dataset-specific features in the subsequent layers.

Since *fine-tuning* is much more computationally intensive than the transfer learning approach used in Lab 1, we will train the network using the Horovod distributed training framework. Azure Machine Learning provides built-in support for Horovod that simplifies cluster configuration and job scheduling.

![Fine-tuning](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/fine-tune.png)

## Download the lab dataset

In [None]:
%%sh

azcopy --source https://azureailabs.blob.core.windows.net/aerialsmall --destination /tmp/datasets/aerialsmall --recursive

In [None]:
%%sh

ls -l /tmp/datasets/aerialsmall
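
The training scripts later in this lab read images from `train` and `valid` subfolders and pass an explicit list of class names to `flow_from_directory`, which expects one subfolder per class. The following is a minimal sketch of a layout check under that assumption. The helper name `missing_class_dirs` is hypothetical, and the demonstration runs on a temporary directory rather than on the real dataset.

```python
import tempfile
from pathlib import Path

# Class folder names and split names taken from the training scripts in this lab.
CLASSES = ["Barren", "Cultivated", "Developed", "Forest", "Herbaceous", "Shrub"]
SPLITS = ["train", "valid"]


def missing_class_dirs(root, splits=SPLITS, classes=CLASSES):
    """Return the split/class folders that are absent under root (hypothetical helper)."""
    root = Path(root)
    return [
        f"{split}/{cls}"
        for split in splits
        for cls in classes
        if not (root / split / cls).is_dir()
    ]


# Demonstration on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for split in SPLITS:
        for cls in CLASSES:
            if (split, cls) != ("valid", "Shrub"):  # leave one folder out on purpose
                (Path(tmp) / split / cls).mkdir(parents=True)
    demo_missing = missing_class_dirs(tmp)

print(demo_missing)
```

To check the downloaded copy, you could call `missing_class_dirs('/tmp/datasets/aerialsmall')`; an empty list means every expected folder is present.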

## Connect to AML Workspace

In [None]:
# Check core SDK version number
import azureml.core
print("SDK version:", azureml.core.VERSION)

In [None]:
import azureml.core
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

## Upload the dataset to the default Datastore

We will upload the dataset to the default Datastore to make it available to all nodes in the cluster.

In [None]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

ds.upload(src_dir='/tmp/datasets/aerialsmall', target_path='aerialsmall', overwrite=True, show_progress=True)

## Create Azure ML Managed Compute

To run the lab's scripts, we will use Azure ML managed compute resources, specifically an autoscale cluster of *Standard_NC6* VMs (each equipped with a Tesla K80 GPU).

In [None]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os


# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "gpu-cluster")
# environment variables are strings, so convert the node counts to integers
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 1))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))

vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "Standard_NC6")

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use the 'status' property
    print(compute_target.status.serialize())
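
Settings read from environment variables always arrive as strings, while the node counts passed to the cluster configuration are integers. The following is a minimal sketch of a helper that normalizes such values. The helper name `env_int` and the `DEMO_*` variable names are hypothetical and used only for illustration.

```python
import os


def env_int(name, default):
    """Read an integer setting from the environment, falling back to default."""
    raw = os.environ.get(name)
    return default if raw is None else int(raw)


# Hypothetical variable names, used only for this demonstration.
os.environ["DEMO_MAX_NODES"] = "4"
os.environ.pop("DEMO_UNSET_VARIABLE", None)

# When the variable is set, os.environ.get returns a string and ignores the int default.
print(type(os.environ.get("DEMO_MAX_NODES", 4)).__name__)

# The helper returns an int whether or not the variable is set.
print(env_int("DEMO_MAX_NODES", 1))
print(env_int("DEMO_UNSET_VARIABLE", 1))
```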

## Training

### Pre-training

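Before you retrain the weights in the base network, it is recommended to pre-train the new top with all weights in the base network frozen. Otherwise, the large gradients produced by the randomly initialized top can disrupt the pre-trained weights. We will run pre-training for a few epochs on a single node of the cluster.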

#### Create training script

In [None]:
import os
script_folder = './script'
os.makedirs(script_folder, exist_ok=True)

In [None]:
%%writefile $script_folder/pre-train.py

import os
import numpy as np
import random
import h5py

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, Flatten, Input
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.applications import vgg16


from azureml.core import Run


# Create custom callback to track accuracy measures in AML Experiment
class RunCallback(tf.keras.callbacks.Callback):
    def __init__(self, run):
        self.run = run
        
    def on_epoch_end(self, epoch, logs=None):
        # Keras passes the epoch index and a dict of metrics for that epoch
        logs = logs or {}
        self.run.log(name="training_acc", value=float(logs.get('acc')))
        self.run.log(name="validation_acc", value=float(logs.get('val_acc')))


def custom_classifier(input_shape=(224,224,3), units=256, classes=6,  l1=0.01, l2=0.01, optimizer='adadelta'):
    # Create a base vgg16 model
    base_model = vgg16.VGG16(
        weights='imagenet',
        input_shape=input_shape,
        include_top=False,
        pooling='avg')
    # Add new top
    x = base_model.output
    x = Dense(units, activation='relu')(x)
    x = Dropout(0.5)(x)
    y = Dense(classes, activation='softmax', kernel_regularizer=l1_l2(l1=l1, l2=l2))(x)
    model = Model(inputs=base_model.inputs, outputs=y)
    
    return model, base_model
       

def main(argv=None):
    
    
    print("Loading data from:", FLAGS.data_folder)
    # Create training and validation data generators
    train_data_dir = os.path.join(FLAGS.data_folder, 'train')
    valid_data_dir = os.path.join(FLAGS.data_folder, 'valid')
     
    # A hack to mitigate a bug in TF.Keras 1.12
    def preprocess_input_new(x):
        img = vgg16.preprocess_input(image.img_to_array(x))
        return image.array_to_img(img)
    
    batchsize=64
    classes = ["Barren", "Cultivated", "Developed", "Forest", "Herbaceous", "Shrub"]
    
    train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    train_generator = train_datagen.flow_from_directory(
        directory=train_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)

    valid_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    valid_generator = valid_datagen.flow_from_directory(
        directory=valid_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)
    
    print(len(train_generator))
    print(len(valid_generator))
    
    
    # Create a custom model
    model, base_model = custom_classifier()
    
    # freeze all base model layers
    for layer in base_model.layers:
        layer.trainable = False

    # Use adadelta optimizer for pretraining the top layer
    model.compile(loss='categorical_crossentropy',
              optimizer = 'adadelta',
              metrics=['accuracy'])
    
    model.summary()
    
    
    # Configure callbacks to generate Tensorboard and AML logs
    run = Run.get_submitted_run()
    callbacks = [tf.keras.callbacks.TensorBoard(log_dir='./logs'),
                 RunCallback(run)]
    
    # Start training
    model.fit_generator(
        train_generator,
        steps_per_epoch=len(train_generator),
        epochs=FLAGS.epochs,
        callbacks=callbacks,
        validation_data=valid_generator,
        validation_steps=len(valid_generator))
    
    # Save the trained model to outputs which is a standard folder expected by AML
    print("Training completed.")
    os.makedirs('outputs', exist_ok=True)
    model_file = os.path.join('outputs', 'aerial_model_pretrain.h5')
    weights_file = os.path.join('outputs', 'aerial_model_weights_pretrain.h5')
    print("Saving model to: {0}".format(model_file))
    model.save(model_file)
    print("Saving model weights to: {0}".format(weights_file))
    model.save_weights(weights_file)
 

# Default global parameters
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 32, "Number of images per batch")
tf.app.flags.DEFINE_integer('epochs', 10, "Number of epochs to train")
tf.app.flags.DEFINE_integer('units', 512, "Number of units in the hidden dense layer")
tf.app.flags.DEFINE_string('data_folder', 'aerialsmall', "Folder with images")

if __name__ == '__main__':
    tf.app.run()
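
The `RunCallback` class in the script above reads the `acc` and `val_acc` keys from the Keras log dictionary. The names of these keys depend on the Keras version (later releases report `accuracy` and `val_accuracy`), and `float(None)` raises a `TypeError` when a key is missing. The following is a minimal sketch of a more tolerant lookup. The helper name `pick_metric` is hypothetical, and the log values are made up for illustration.

```python
def pick_metric(logs, candidates):
    """Return the first metric present in logs as a float, or None if absent."""
    logs = logs or {}
    for key in candidates:
        value = logs.get(key)
        if value is not None:
            return float(value)
    return None


# Example log dictionaries shaped like the ones Keras passes to on_epoch_end.
# The numbers are illustrative, not results from this lab.
old_style = {"loss": 0.9, "acc": 0.71, "val_acc": 0.65}
new_style = {"loss": 0.9, "accuracy": 0.71, "val_accuracy": 0.65}

print(pick_metric(old_style, ("acc", "accuracy")))
print(pick_metric(new_style, ("acc", "accuracy")))
print(pick_metric({}, ("val_acc", "val_accuracy")))
```

Inside a callback, you would log the value only when the helper returns something other than `None`.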
    

#### Create AML Experiment
We will track pre-training in a dedicated Experiment.

In [None]:
from azureml.core import Experiment
experiment_name = 'aerial-finetune-pretrain'
exp = Experiment(workspace=ws, name=experiment_name)

#### Run a pre-training on a single node of the cluster

We will use the *TensorFlow* estimator, which automatically configures the runtime image with all required prerequisites (TensorFlow, CUDA, etc.).

Due to time limitations of the lab, we will run pre-training for 3 epochs only. Note that in a real production scenario you would want to run pre-training for more epochs.

In [None]:
from azureml.train.dnn import TensorFlow

ds = ws.get_default_datastore()

script_params = {
    '--data_folder': ds.path('aerialsmall').as_mount(),
    '--epochs': 3
}

pip_packages = ['h5py', 'pillow', 'scipy']

est = TensorFlow(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='pre-train.py',
                use_gpu=True,
                pip_packages=pip_packages
                )


In [None]:
tags = {"Run Type": "Top pre-train"}
run = exp.submit(est, tags=tags)
run

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

#### Monitor the run with Tensorboard

Azure Machine Learning has built-in support for **Tensorboard**.

In [None]:
from azureml.contrib.tensorboard import Tensorboard
tb = Tensorboard([run])
tb.start()

To connect to Tensorboard, navigate to `http://<external IP of your DSVM>:6006`. If you have issues connecting, double-check that you opened port 6006 as described in the lab setup instructions.

In [None]:
tb.stop()

You can cancel the run with the `cancel` method.

In [None]:
# run.cancel()

Or block till the run completes.

In [None]:
#run.wait_for_completion(show_output=True)

#### Retrieve weights

The training script saves the weights of the pre-trained model into the `outputs` folder. This folder is automatically copied to the *Experiment* after the run completes.

In [None]:
print(run.get_file_names())

In [None]:
run.download_file('outputs/aerial_model_weights_pretrain.h5', '/tmp/models/aerial_model_weights_pretrain.h5')

In [None]:
%%sh

ls /tmp/models

#### Upload the weights to the default datastore

The fine-tuning script (below) will download the pre-trained weights from the default datastore.

In [None]:
# Upload the pre-trained weights to the default datastore

ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)
ds.upload(src_dir='/tmp/models', target_path='models', overwrite=True, show_progress=True)

### Fine-tuning

We are now ready to run distributed training on the AML Compute cluster.
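
In data-parallel training of the kind Horovod coordinates, each worker computes gradients on its own batches, the gradients are averaged across workers, and every worker applies the same update. The following is a toy, single-process sketch of that idea in plain Python. It is not Horovod code: `allreduce_mean` is a hypothetical stand-in for the averaging step, and the data and model (`y = w * x`) are made up for illustration.

```python
def gradient(w, shard):
    """Gradient of the mean squared error of y = w * x over one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an averaging allreduce: every worker receives the mean."""
    return sum(values) / len(values)


# Toy data for y = 3x, split evenly across three simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 7)]
shards = [data[0:2], data[2:4], data[4:6]]

w, lr = 0.0, 0.01
for step in range(200):
    local_grads = [gradient(w, shard) for shard in shards]  # one gradient per worker
    w -= lr * allreduce_mean(local_grads)                   # identical update on every worker

print(round(w, 3))
```

With equally sized shards, the averaged gradient equals the gradient over the full dataset, which is why the workers stay in sync while each processes only part of the data.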

#### Create training script

The training script uses the Horovod API to coordinate distributed training. In addition to the top layers, we will also re-train the last convolutional block in the base VGG16 network.
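
To see which part of VGG16 the script unfreezes, it helps to write out the layer list and apply the same split index (14) that `fine-tune.py` uses. The following sketch does this in plain Python. The layer names are written out by hand and are an assumption about the Keras VGG16 model built with `include_top=False` and `pooling='avg'`; with a live model you could print `[layer.name for layer in base_model.layers]` to confirm them.

```python
# Layer names assumed for Keras VGG16 with include_top=False and pooling='avg',
# written out by hand for illustration (not read from a live model).
VGG16_LAYERS = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
    "global_average_pooling",
]

SPLIT = 14  # same index as in fine-tune.py

frozen = VGG16_LAYERS[:SPLIT]
trainable = VGG16_LAYERS[SPLIT:]

# Pooling layers hold no weights, so only the conv layers actually get updated.
trainable_with_weights = [name for name in trainable if "conv" in name]

print(len(frozen), "frozen layers, last one:", frozen[-1])
print("layers that will be updated:", trainable_with_weights)
```

Under these assumptions, everything up to and including the fourth convolutional block stays frozen, and only the three convolutions of the fifth block are updated along with the new top.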

In [None]:
import os
script_folder = './script'
os.makedirs(script_folder, exist_ok=True)

In [None]:
%%writefile $script_folder/fine-tune.py

import os
import numpy as np
import random
import h5py

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, Flatten, Input
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.applications import vgg16
from tensorflow.keras import optimizers

import horovod.tensorflow.keras as hvd

from azureml.core import Run

# Create custom callback to track accuracy measures in AML Experiment
class RunCallback(tf.keras.callbacks.Callback):
    def __init__(self, run):
        self.run = run
        
    def on_epoch_end(self, epoch, logs=None):
        # Keras passes the epoch index and a dict of metrics for that epoch
        logs = logs or {}
        self.run.log(name="training_acc", value=float(logs.get('acc')))
        self.run.log(name="validation_acc", value=float(logs.get('val_acc')))


        
def custom_classifier(input_shape=(224,224,3), units=256, classes=6,  l1=0.01, l2=0.01, optimizer='adadelta'):
    # Create a base vgg16 model
    base_model = vgg16.VGG16(
        weights='imagenet',
        input_shape=input_shape,
        include_top=False,
        pooling='avg')
    # Add new top
    x = base_model.output
    x = Dense(units, activation='relu')(x)
    x = Dropout(0.5)(x)
    y = Dense(classes, activation='softmax', kernel_regularizer=l1_l2(l1=l1, l2=l2))(x)
    model = Model(inputs=base_model.inputs, outputs=y)
    
    return model, base_model
    

def main(argv=None):
    
    
    # Initialize Horovod
    hvd.init()
    
    # Horovod: pin GPU to be used to process local rank (one GPU per process)
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.visible_device_list = str(hvd.local_rank())
    tf.keras.backend.set_session(tf.Session(config=config))

    print("Initialized Horovod")
    
    print("Loading data from:", FLAGS.data_folder)
    # Create training and validation data generators
    train_data_dir = os.path.join(FLAGS.data_folder, 'train')
    valid_data_dir = os.path.join(FLAGS.data_folder, 'valid')
    
    # A hack to mitigate a bug in TF.Keras 1.12
    def preprocess_input_new(x):
        img = vgg16.preprocess_input(image.img_to_array(x))
        return image.array_to_img(img)
    
    # Configure training and validation data generators
    batchsize=64
    classes = ["Barren", "Cultivated", "Developed", "Forest", "Herbaceous", "Shrub"]
    
    train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    train_generator = train_datagen.flow_from_directory(
        directory=train_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)

    valid_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    valid_generator = valid_datagen.flow_from_directory(
        directory=valid_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)
    
    print(len(train_generator))
    print(len(valid_generator))
    
    
    # Create a custom model
    model, base_model = custom_classifier()
    
    # Load the weights pretrained in the previous step on the first worker, 
    # which will broadcast them to other workers.
    if hvd.rank() == 0:
        weights_file = os.path.join(FLAGS.weights_folder, FLAGS.weights_filename)
        model.load_weights(weights_file)
        print("------------------------------------")
        print("Loaded pre-trained weights on Rank 0")
    
    # Freeze the first 14 layers and make the last convolutional block trainable
    for layer in base_model.layers[:14]:
        layer.trainable = False
    
    for layer in base_model.layers[14:]:
        layer.trainable = True

    # Wrap an optimizer in Horovod
    # For fine tuning use SGD with a low learning rate
    optimizer = hvd.DistributedOptimizer(optimizers.SGD(lr=1e-4, momentum=0.9))
    
    model.compile(loss='categorical_crossentropy',
              optimizer = optimizer,
              metrics=['accuracy'])
    
    # Configure callbacks
    callbacks = [
        # Horovod: broadcast initial variable states from rank 0 to all other processes.
        # This is necessary to ensure consistent initialization of all workers when
        # training is started with loaded weights.
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),

        # Horovod: average metrics among workers at the end of every epoch.
        #
        # Note: This callback must be in the list before the ReduceLROnPlateau,
        # TensorBoard, or other metrics-based callbacks.
        hvd.callbacks.MetricAverageCallback()

    ]
            
    # Horovod: save checkpoints only on worker 0 to prevent other workers from corrupting them.
    # Configure Tensorboard and Azure ML Tracking
    if hvd.rank() == 0:
        callbacks.append(tf.keras.callbacks.ModelCheckpoint('./checkpoint-{epoch}.h5'))
        callbacks.append(tf.keras.callbacks.TensorBoard(log_dir='./logs'))
        run = Run.get_submitted_run()
        callbacks.append(RunCallback(run))
   
    model.summary()
    
    # Start training
    model.fit_generator(
        train_generator,
        steps_per_epoch=len(train_generator)//hvd.size(),
        epochs=FLAGS.epochs,
        validation_data=valid_generator,
        validation_steps=3*len(valid_generator)//hvd.size(),
        callbacks=callbacks)
    
    # Save the trained model to outputs folder on the first worker
    if hvd.rank() == 0:  
        print("Training completed.")
        os.makedirs('outputs', exist_ok=True)
        model_file = os.path.join('outputs', 'aerial_model_fine_tune.h5')
        model.save(model_file)


# Default global parameters
FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_integer('batch_size', 32, "Number of images per batch")
tf.app.flags.DEFINE_integer('epochs', 10, "Number of epochs to train")
tf.app.flags.DEFINE_integer('units', 512, "Number of units in the hidden dense layer")
tf.app.flags.DEFINE_string('data_folder', 'aerialsmall', "Folder with images")
tf.app.flags.DEFINE_string('weights_folder', 'models', "Folder with model weights")
tf.app.flags.DEFINE_string('weights_filename', 'aerial_model_weights_pretrain.h5', "File name of the pre-trained weights")


if __name__ == '__main__':
 
    tf.app.run()
    

#### Create AML Experiment
We will track the Horovod runs in a dedicated experiment.

In [None]:
from azureml.core import Experiment
experiment_name = 'aerial-finetune-train'
exp = Experiment(workspace=ws, name=experiment_name)

#### Run fine-tuning on the cluster

The *TensorFlow* estimator encapsulates the idiosyncrasies of Horovod cluster configuration and job scheduling.

In [None]:
from azureml.train.dnn import TensorFlow

ds = ws.get_default_datastore()

script_params = {
    '--data_folder': ds.path('aerialsmall').as_mount(),
    '--weights_folder': ds.path('models').as_download(),
    '--epochs': 7
}

# We need to install an up-to-date version of Horovod
# since the version in the standard AML GPU image does not support
# horovod.tensorflow.keras
pip_packages = ['h5py', 'pillow', 'scipy', 'horovod']

est = TensorFlow(source_directory=script_folder,
                      compute_target=compute_target,
                      script_params=script_params,
                      entry_script='fine-tune.py',
                      node_count=3,
                      process_count_per_node=1,
                      distributed_backend='mpi',
                      use_gpu=True,
                      pip_packages=pip_packages)
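
The estimator above starts `node_count * process_count_per_node = 3` Horovod workers, and `fine-tune.py` divides the number of batches per epoch by `hvd.size()` so that the workers together cover roughly one pass over the data. The following is a minimal sketch of that arithmetic. The helper name `per_worker_steps` and the image count of 6000 are hypothetical; the batch size (64) and worker count (3) come from this lab.

```python
import math


def per_worker_steps(num_images, batch_size, num_workers):
    """Batches per epoch for the whole dataset and for each Horovod worker."""
    # A Keras directory iterator reports ceil(num_images / batch_size) batches.
    total_batches = math.ceil(num_images / batch_size)
    return total_batches, total_batches // num_workers


# Hypothetical image count; batch size and worker count are taken from this lab.
total, per_worker = per_worker_steps(num_images=6000, batch_size=64, num_workers=3)
print(total, per_worker)
```

Because of the integer division, a few batches per epoch may go unused when the total is not a multiple of the worker count.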

In [None]:
tags = {"Run Type": "Fine-tune"}
run = exp.submit(est, tags=tags)
run

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()