# Developing a Custom Image Classification Model

In this lab, you will develop a custom image classification model that automatically classifies the type of land shown in aerial images of 224-meter x 224-meter plots. Land use classification models can track urbanization, deforestation, loss of wetlands, and other major environmental trends using periodically collected aerial imagery. The images used in this lab are based on imagery from the U.S. National Land Cover Database, which defines six primary classes of land use: *Developed*, *Barren*, *Forested*, *Grassland*, *Shrub*, and *Cultivated*. Example images from each land use class are shown here:

Developed | Cultivated | Barren
--------- | ------ | ----------
![Developed](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/developed1.png) | ![Cultivated](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/cultivated1.png) | ![Barren](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/barren1.png)

Forested | Grassland | Shrub
---------| ----------| -----
![Forested](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/forest1.png) | ![Grassland](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/grassland1.png) | ![Shrub](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/shrub1.png)

You will develop a custom image classifier using a deep learning technique called *fine-tuning*. Fine-tuning is a flavor of *transfer learning*, one of the fastest ways (in terms of both code and run time) to start using deep learning. Transfer learning reuses knowledge gained while solving one problem to solve a different but related problem. For example, knowledge gained while learning to recognize landmarks and landscapes could apply when trying to recognize aerial land plots. Transfer learning makes it feasible to train very effective ML models on relatively small training data sets.

In fine-tuning, you remove the last layer(s) of the pre-trained network (usually the fully connected layers) and replace them with new, untrained layers that match the given ML task. You then re-train the network using images from your custom domain. It is also common practice to freeze the weights of the first few layers of the pre-trained network, because these layers capture universal features, such as curves and edges, that are also relevant to the new problem. Keeping those weights intact "forces" the network to focus on learning dataset-specific features in the subsequent layers.



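To make the freezing strategy concrete, the plain-Python sketch below lists the layers of a VGG16 convolutional base as `tf.keras` builds it with `include_top=False, pooling='avg'`, and splits them at index 14, the split used by the fine-tuning cells later in this lab. The `blockN_*` names follow the standard VGG16 naming; the names of the input and global pooling layers are illustrative assumptions, since Keras may append numeric suffixes. No TensorFlow is needed to run it.

```python
# Layer order of a tf.keras VGG16 base built with include_top=False, pooling='avg'.
# "input_1" and "global_average_pooling2d" are illustrative names (Keras may add suffixes).
VGG16_BASE_LAYERS = (
    ["input_1"]
    + ["block1_conv1", "block1_conv2", "block1_pool"]
    + ["block2_conv1", "block2_conv2", "block2_pool"]
    + ["block3_conv1", "block3_conv2", "block3_conv3", "block3_pool"]
    + ["block4_conv1", "block4_conv2", "block4_conv3", "block4_pool"]
    + ["block5_conv1", "block5_conv2", "block5_conv3", "block5_pool"]
    + ["global_average_pooling2d"]
)


def split_layers(layers, first_trainable):
    """Return (frozen, trainable) layer names for a given split index."""
    return layers[:first_trainable], layers[first_trainable:]


frozen, trainable = split_layers(VGG16_BASE_LAYERS, 14)

# Only the convolutional layers carry weights; input and pooling layers have none.
trainable_convs = [name for name in trainable if "conv" in name]
print("last frozen layer:", frozen[-1])
print("trainable conv layers:", trainable_convs)
```

With the split at 14, the only trainable weights left in the base are those of the three `block5` convolutions, so fine-tuning adapts just the most task-specific convolutional block.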
## Connect to AML Workspace

In [1]:
import azureml.core
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

Found the config file in: /data/home/demouser/notebooks/aml_config/config.json
jkamlworkshop
jkamlworkshop
southcentralus
952a710c-8d9c-40c1-9fec-f752138cc0b3


## Create Azure ML Managed Compute

To run the lab's scripts, we will use Azure ML managed compute: specifically, an autoscaling cluster of *Standard_NC6* VMs, each equipped with a Tesla K80 GPU.

In [9]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os


# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "gpu-cluster")
# environment variables are strings, so convert the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 1))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 2))

vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "Standard_NC6")
#vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "Standard_NC6s_v3")

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and isinstance(compute_target, AmlCompute):
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use the 'status' property
    print(compute_target.status.serialize())

creating a new compute target...
Creating
Succeeded.........
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-01-13T21:28:25.469000+00:00', 'creationTime': '2019-01-13T21:27:11.053008+00:00', 'currentNodeCount': 1, 'errors': None, 'modifiedTime': '2019-01-13T21:27:39.253893+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 1, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 2, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 1, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


## Pre-train new top

In [None]:
from tensorflow.keras.applications import vgg16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batchsize=64

train_data_dir = '/tmp/datasets/aerialsmall/train'
valid_data_dir = '/tmp/datasets/aerialsmall/valid'
classes = ["Barren", "Cultivated", "Developed", "Forest", "Herbaceous", "Shrub"]

# A hack to mitigate a bug in Keras
def preprocess_input_new(x):
    img = vgg16.preprocess_input(image.img_to_array(x))
    return image.array_to_img(img)

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
train_generator = train_datagen.flow_from_directory(
    directory=train_data_dir,
    target_size=(224, 224),
    classes=classes,
    batch_size=batchsize)

valid_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
valid_generator = valid_datagen.flow_from_directory(
    directory=valid_data_dir,
    target_size=(224, 224),
    classes=classes,
    batch_size=batchsize)


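`vgg16.preprocess_input` uses the Keras "caffe" preprocessing mode: it reorders the channels from RGB to BGR and subtracts the per-channel ImageNet mean, without scaling pixel values to [0, 1]. The plain-Python sketch below applies the same arithmetic to a single pixel to show what the network actually receives; it is an illustration of the transformation, not a replacement for the library function.

```python
# Per-channel ImageNet means in BGR order, as used by the Keras "caffe" mode.
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def caffe_preprocess_pixel(rgb):
    """Convert one (R, G, B) pixel in [0, 255] to a mean-centred (B, G, R) triple."""
    r, g, b = rgb
    bgr = (b, g, r)  # swap channel order from RGB to BGR
    return tuple(round(c - m, 3) for c, m in zip(bgr, IMAGENET_MEAN_BGR))


# A pure red pixel: its intensity ends up in the last (R) channel after the swap.
print(caffe_preprocess_pixel((255, 0, 0)))
print(caffe_preprocess_pixel((128, 128, 128)))
```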
In [None]:
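from tensorflow.keras.applications import vgg16
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l1_l2

# Build the VGG16 base and a new top, as custom_classifier() does in the training script below
base_model = vgg16.VGG16(weights='imagenet', input_shape=(224, 224, 3),
                         include_top=False, pooling='avg')
x = Dense(256, activation='relu')(base_model.output)
x = Dropout(0.5)(x)
y = Dense(len(classes), activation='softmax', kernel_regularizer=l1_l2(l1=0.01, l2=0.01))(x)
model = Model(inputs=base_model.inputs, outputs=y)

# freeze all base model layers
for layer in base_model.layers:
    layer.trainable = False

# Use adadelta optimizer for pretraining the top layer
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
model.summary()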

In [None]:
model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=20,
    validation_data=valid_generator,
    validation_steps=len(valid_generator))
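`steps_per_epoch=len(train_generator)` makes each epoch one full pass over the training images. For a `DirectoryIterator`, `len()` is the number of batches: the image count divided by the batch size, rounded up, so the last batch of an epoch may be smaller than the rest. The sketch below does that arithmetic; the image count is a hypothetical figure, not the size of the lab's dataset.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (the last one may be partial)."""
    return math.ceil(num_images / batch_size)


def last_batch_size(num_images, batch_size):
    """Size of the final batch of an epoch."""
    remainder = num_images % batch_size
    return remainder if remainder else batch_size


# Hypothetical dataset size, used only to illustrate the arithmetic.
num_train_images = 1000
batchsize = 64
print(batches_per_epoch(num_train_images, batchsize))  # 16
print(last_batch_size(num_train_images, batchsize))    # 40
```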

In [None]:
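from tensorflow.keras import optimizers

# Unfreeze the last convolutional block (block5) of VGG16;
# layers[:14] run from the input layer through block4_conv3 and stay frozen
for layer in base_model.layers[:14]:
    layer.trainable = False
    
for layer in base_model.layers[14:]:
    layer.trainable = True

# For fine-tuning, use SGD with a low learning rate
# (the model must be recompiled for the trainable changes to take effect)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])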

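With `momentum=0.9`, Keras's (non-Nesterov) SGD keeps a velocity per weight and updates it as `v = momentum * v - lr * grad`, then `w = w + v`. The plain-Python sketch below runs that rule for a single scalar weight with a constant, made-up gradient, to show why a learning rate of 1e-4 moves the pre-trained weights only slightly per step, while momentum gradually grows the step toward `lr / (1 - momentum)`.

```python
def sgd_momentum(weight, grads, lr=1e-4, momentum=0.9):
    """Apply SGD with (non-Nesterov) momentum; return the weight after every step."""
    velocity = 0.0
    history = [weight]
    for grad in grads:
        velocity = momentum * velocity - lr * grad
        weight = weight + velocity
        history.append(weight)
    return history


# Constant, made-up gradient of 1.0 for five steps.
for step, w in enumerate(sgd_momentum(0.5, [1.0] * 5)):
    print(step, round(w, 6))
```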
In [None]:
model.summary()

In [None]:
model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=50,
    validation_data=valid_generator,
    validation_steps=len(valid_generator))

## Training

### Pre-train the new top layers

#### Create training script

In [10]:
import os
script_folder = './script'
os.makedirs(script_folder, exist_ok=True)

In [47]:
%%writefile $script_folder/train.py

import os
import numpy as np
import random
import h5py

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, Flatten, Input
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.applications import vgg16

from sklearn.model_selection import train_test_split

from azureml.core import Run


# Create custom callback to track accuracy measures in AML Experiment
class RunCallback(tf.keras.callbacks.Callback):
    def __init__(self, run):
        super(RunCallback, self).__init__()
        self.run = run
        
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.run.log(name="training_acc", value=float(logs.get('acc')))
        self.run.log(name="validation_acc", value=float(logs.get('val_acc')))


def custom_classifier(input_shape=(224,224,3), units=256, classes=6,  l1=0.01, l2=0.01, optimizer='adadelta'):
    # Create a base vgg16 model
    base_model = vgg16.VGG16(
        weights='imagenet',
        input_shape=input_shape,
        include_top=False,
        pooling='avg')
    # Add new top
    x = base_model.output
    x = Dense(units, activation='relu')(x)
    x = Dropout(0.5)(x)
    y = Dense(classes, activation='softmax', kernel_regularizer=l1_l2(l1=l1, l2=l2))(x)
    model = Model(inputs=base_model.inputs, outputs=y)
    
    return model, base_model
    
        

FLAGS = tf.app.flags.FLAGS

# Default global parameters
tf.app.flags.DEFINE_integer('batch_size', 64, "Number of images per batch")
tf.app.flags.DEFINE_integer('epochs', 10, "Number of epochs to train")
tf.app.flags.DEFINE_integer('units', 256, "Number of units in the new dense layer")
tf.app.flags.DEFINE_string('data_folder', 'aerialsmall', "Folder with images")


def main(argv=None):
    
    
    print("Loading data from:", FLAGS.data_folder)
    # Create training and validation data generators
    train_data_dir = os.path.join(FLAGS.data_folder, 'train')
    valid_data_dir = os.path.join(FLAGS.data_folder, 'valid')
     
    # A hack to mitigate a bug in TF.Keras 1.12
    def preprocess_input_new(x):
        img = vgg16.preprocess_input(image.img_to_array(x))
        return image.array_to_img(img)
    
    batchsize = FLAGS.batch_size
    classes = ["Barren", "Cultivated", "Developed", "Forest", "Herbaceous", "Shrub"]
    
    train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    train_generator = train_datagen.flow_from_directory(
        directory=train_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)

    valid_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    valid_generator = valid_datagen.flow_from_directory(
        directory=valid_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)
    
    print("Training batches per epoch:", len(train_generator))
    print("Validation batches per epoch:", len(valid_generator))
    
    
    # Create a custom model
    model, base_model = custom_classifier(units=FLAGS.units)
    
    # freeze all base model layers
    for layer in base_model.layers:
        layer.trainable = False

    # Use adadelta optimizer for pretraining the top layer
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    
    model.summary()
    
    
    # Configure callbacks to generate Tensorboard and AML logs
    
    # get hold of the current run
    run = Run.get_submitted_run()
    callbacks = [tf.keras.callbacks.TensorBoard(log_dir='./logs'),
                 RunCallback(run)]
    
    # Start training
    model.fit_generator(
        train_generator,
        steps_per_epoch=len(train_generator),
        epochs=FLAGS.epochs,
        callbacks=callbacks,
        validation_data=valid_generator,
        validation_steps=len(valid_generator))
    
    # Save the trained model to outputs which is a standard folder expected by AML
    print("Training completed.")
    os.makedirs('outputs', exist_ok=True)
    model_file = os.path.join('outputs', 'aerial_model_pretrain.h5')
    weights_file = os.path.join('outputs', 'aerial_model_weights_pretrain.h5')
    print("Saving model to: {0}".format(model_file))
    model.save(model_file)
    print("Saving model weights to: {0}".format(weights_file))
    model.save_weights(weights_file)
 

if __name__ == '__main__':
    tf.app.run()
    

Overwriting ./script/train.py
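The `RunCallback` above reads the `acc` and `val_acc` keys, which is how TensorFlow 1.x (including the 1.12 release installed for this lab) names the accuracy metric when the model is compiled with `metrics=['accuracy']`; TensorFlow 2.x reports it as `accuracy` and `val_accuracy` instead, and `float(None)` would then raise a `TypeError`. The sketch below shows a lookup that tolerates either naming. `FakeRun` is a hypothetical stand-in for `azureml.core.Run`, and the accuracy values are made up, so the logic can be exercised without AML or TensorFlow.

```python
class FakeRun:
    """Hypothetical stand-in for azureml.core.Run that records logged metrics."""

    def __init__(self):
        self.logged = []

    def log(self, name, value):
        self.logged.append((name, value))


def first_present(logs, *keys):
    """Return the value of the first key found in a Keras logs dict, else None."""
    for key in keys:
        if key in logs:
            return logs[key]
    return None


def log_epoch_metrics(run, logs):
    """Log training/validation accuracy under either TF 1.x or TF 2.x key names."""
    train_acc = first_present(logs, "acc", "accuracy")
    val_acc = first_present(logs, "val_acc", "val_accuracy")
    if train_acc is not None:
        run.log(name="training_acc", value=float(train_acc))
    if val_acc is not None:
        run.log(name="validation_acc", value=float(val_acc))


run = FakeRun()
log_epoch_metrics(run, {"acc": 0.81, "val_acc": 0.78})            # TF 1.x key names
log_epoch_metrics(run, {"accuracy": 0.85, "val_accuracy": 0.8})   # TF 2.x key names
print(run.logged)
```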


### Configure datastore

The training images have been uploaded to a public Azure blob storage container. We will register this container as an AML Datastore within our workspace. Before the training script runs, the datastore's content (the training images) will be copied to local storage on the compute node.

The training script saves the trained model and its weights to the `outputs` folder, which AML captures in the run's history. The workspace's default datastore, retrieved below, is used later in the lab to share the pre-trained weights with the fine-tuning job.

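Both training scripts pass `FLAGS.data_folder` to `flow_from_directory`, which expects `train` and `valid` subfolders, each containing one subfolder per class; a missing class folder may otherwise go unnoticed and simply contribute no images. The helper below is a hypothetical sketch, using only the standard library, for checking that layout before training; the image extensions it counts are an assumption about the dataset.

```python
import os
import tempfile

CLASSES = ["Barren", "Cultivated", "Developed", "Forest", "Herbaceous", "Shrub"]
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")  # assumption: the image formats present


def count_images(data_folder, classes=CLASSES, splits=("train", "valid")):
    """Return {split: {class: image_count}}; raise if a class folder is missing."""
    counts = {}
    for split in splits:
        counts[split] = {}
        for cls in classes:
            cls_dir = os.path.join(data_folder, split, cls)
            if not os.path.isdir(cls_dir):
                raise FileNotFoundError("missing class folder: " + cls_dir)
            counts[split][cls] = sum(
                1 for name in os.listdir(cls_dir)
                if name.lower().endswith(IMAGE_EXTENSIONS)
            )
    return counts


# Demo on a throwaway folder with one empty placeholder "image" per class.
with tempfile.TemporaryDirectory() as root:
    for split in ("train", "valid"):
        for cls in CLASSES:
            os.makedirs(os.path.join(root, split, cls))
            open(os.path.join(root, split, cls, "img0.png"), "w").close()
    result = count_images(root)

print(result)
```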
In [73]:
from azureml.core import Datastore

images_account = 'azureailabs'
images_container = 'aerialsmall'
datastore_name = 'training_images'

# Check if the datastore exists. If not, create a new one
try:
    input_ds = Datastore.get(ws, datastore_name)
    print('Found existing datastore for input images:', input_ds.name)
except Exception:
    input_ds = Datastore.register_azure_blob_container(workspace=ws, datastore_name=datastore_name,
                                                       container_name=images_container,
                                                       account_name=images_account)
    print('Creating new datastore for input images')

print(input_ds.name, input_ds.datastore_type, input_ds.account_name, input_ds.container_name)

output_ds = ws.get_default_datastore()
print("Using the default datastore for output: ")
print(output_ds.name, output_ds.datastore_type, output_ds.account_name, output_ds.container_name)


Creating new datastore for input images
training_images AzureBlob azureailabs aerialsmall
Using the default datastore for output: 
workspaceblobstore AzureBlob jkamlworstoragedirzndtd azureml-blobstore-5635fa3a-71d5-4d1b-bc80-b0847d6f842b


### Create AML Experiment
We will track runs of the pre-training script in a dedicated Experiment.

In [13]:
from azureml.core import Experiment
experiment_name = 'aerial-finetune-pretrain'
exp = Experiment(workspace=ws, name=experiment_name)

### Run pre-training on a single node of the cluster

Due to time limitations of the lab, we will run pre-training for 2 epochs only. 

In [31]:
from azureml.train.estimator import Estimator

script_params = {
    '--data_folder': input_ds.as_download(),
    '--epochs': 2
}

pip_packages = ['h5py', 
                'pillow', 
                'scikit-learn', 
                'tqdm', 
                'tensorflow-gpu==1.12']


est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='train.py',
                node_count=1,
                process_count_per_node=1,
                use_gpu=True,
                pip_packages=pip_packages,
                )
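The Estimator hands `script_params` to `train.py` as command-line arguments (`--data_folder <path> --epochs 2`), which `tf.app.flags` then parses inside `tf.app.run()`. The sketch below mimics that round trip with the standard library's `argparse` as a stand-in for `tf.app.flags`; the local path is a hypothetical placeholder for the location where AML downloads the datastore contents.

```python
import argparse


def to_argv(script_params):
    """Flatten a {'--flag': value} dict into a command-line argument list."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


# argparse stands in for tf.app.flags; the defaults mirror train.py.
parser = argparse.ArgumentParser()
parser.add_argument("--data_folder", default="aerialsmall")
parser.add_argument("--epochs", type=int, default=10)

# Hypothetical local path in place of input_ds.as_download().
argv = to_argv({"--data_folder": "/mnt/hypothetical/aerialsmall", "--epochs": 2})
flags = parser.parse_args(argv)
print(argv)
print(flags.data_folder, flags.epochs)
```

Any flag left out of `script_params` keeps its default, which is why `--epochs 2` is enough to shorten the lab run.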


In [48]:
tags = {"Run Type": "Top pre-train"}
run = exp.submit(est, tags=tags)
run

Experiment | Id | Type | Status | Details Page | Docs Page
---------- | -- | ---- | ------ | ------------ | ---------
aerial-finetune-pretrain | aerial-finetune-pretrain_1547419586666 | azureml.scriptrun | Queued | Link to Azure Portal | Link to Documentation


In [49]:
from azureml.widgets import RunDetails
RunDetails(run).show()


### Monitor with Tensorboard

In [51]:
from azureml.contrib.tensorboard import Tensorboard
tb = Tensorboard([run])
tb.start()

http://svcusscds1:6006


'http://svcusscds1:6006'

In [52]:
tb.stop()

You can cancel the run with the `cancel` method.

In [17]:
#run.cancel()

In [None]:
#run.wait_for_completion(show_output=True)

### Retrieve weights

In [54]:
print(run.get_file_names())

['azureml-logs/60_control_log.txt', 'azureml-logs/80_driver_log.txt', 'logs/events.out.tfevents.1547419788.8903cdfb81314734996b299332e698e8000000', 'outputs/aerial_model_pretrain.h5', 'outputs/aerial_model_weights_pretrain.h5', 'driver_log', 'azureml-logs/azureml.log']


In [56]:
os.makedirs('/tmp/models', exist_ok=True)
run.download_file('outputs/aerial_model_weights_pretrain.h5', '/tmp/models/aerial_model_weights_pretrain.h5')

In [58]:
%%sh

ls /tmp/models

aerial_model_weights_pretrain.h5


### Upload weights to the default datastore

In [59]:
# Upload the pre-trained weights to the default datastore

ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)
ds.upload(src_dir='/tmp/models', target_path='models', overwrite=True, show_progress=True)

AzureBlob jkamlworstoragedirzndtd azureml-blobstore-5635fa3a-71d5-4d1b-bc80-b0847d6f842b
Uploading /tmp/models/aerial_model_weights_pretrain.h5
Uploaded /tmp/models/aerial_model_weights_pretrain.h5, 1 files out of an estimated total of 1


$AZUREML_DATAREFERENCE_9bcbdacbd10a460faad8c691e870deb3

### Configure fine-tuning job

#### Create training script

In [None]:
import os
script_folder = './script'
os.makedirs(script_folder, exist_ok=True)

In [82]:
%%writefile $script_folder/fine-tune.py

import os
import numpy as np
import random
import h5py

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, Flatten, Input
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.applications import vgg16
from tensorflow.keras import optimizers

from sklearn.model_selection import train_test_split

from azureml.core import Run


def custom_classifier(input_shape=(224,224,3), units=256, classes=6,  l1=0.01, l2=0.01, optimizer='adadelta'):
    # Create a base vgg16 model
    base_model = vgg16.VGG16(
        weights='imagenet',
        input_shape=input_shape,
        include_top=False,
        pooling='avg')
    # Add new top
    x = base_model.output
    x = Dense(units, activation='relu')(x)
    x = Dropout(0.5)(x)
    y = Dense(classes, activation='softmax', kernel_regularizer=l1_l2(l1=l1, l2=l2))(x)
    model = Model(inputs=base_model.inputs, outputs=y)
    
    return model, base_model
    
        

FLAGS = tf.app.flags.FLAGS

# Default global parameters
tf.app.flags.DEFINE_integer('batch_size', 64, "Number of images per batch")
tf.app.flags.DEFINE_integer('epochs', 10, "Number of epochs to train")
tf.app.flags.DEFINE_integer('units', 256, "Number of units in the new dense layer")
tf.app.flags.DEFINE_string('data_folder', 'aerialsmall', "Folder with images")
tf.app.flags.DEFINE_string('weights_folder', 'models', "Folder with model weights")
tf.app.flags.DEFINE_string('weights_filename', 'aerial_model_weights_pretrain.h5', "Name of the pre-trained weights file")


def main(argv=None):
    
    # get hold of the current run
    run = Run.get_submitted_run()
    
    print("Loading data from:", FLAGS.data_folder)
    # Create training and validation data generators
    train_data_dir = os.path.join(FLAGS.data_folder, 'train')
    valid_data_dir = os.path.join(FLAGS.data_folder, 'valid')
    
    # A hack to mitigate a bug in TF.Keras 1.12
    def preprocess_input_new(x):
        img = vgg16.preprocess_input(image.img_to_array(x))
        return image.array_to_img(img)
    
    batchsize = FLAGS.batch_size
    classes = ["Barren", "Cultivated", "Developed", "Forest", "Herbaceous", "Shrub"]
    
    train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    train_generator = train_datagen.flow_from_directory(
        directory=train_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)

    valid_datagen = ImageDataGenerator(preprocessing_function=preprocess_input_new)
    valid_generator = valid_datagen.flow_from_directory(
        directory=valid_data_dir,
        target_size=(224, 224),
        classes=classes,
        batch_size=batchsize)
    
    print("Training batches per epoch:", len(train_generator))
    print("Validation batches per epoch:", len(valid_generator))
    
    
    # Create a custom model with the same architecture used in pre-training,
    # then load the pre-trained weights into it
    model, base_model = custom_classifier(units=FLAGS.units)
    
    weights_file = os.path.join(FLAGS.weights_folder, FLAGS.weights_filename)
    model.load_weights(weights_file)
    
    # Unfreeze the last convolutional block (block5) of VGG16;
    # layers[:14] run from the input layer through block4_conv3 and stay frozen
    for layer in base_model.layers[:14]:
        layer.trainable = False
    
    for layer in base_model.layers[14:]:
        layer.trainable = True

    # For fine-tuning, use SGD with a low learning rate
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
                  metrics=['accuracy'])
    
    model.summary()
    
    model.fit_generator(
        train_generator,
        steps_per_epoch=len(train_generator),
        epochs=FLAGS.epochs,
        validation_data=valid_generator,
        validation_steps=len(valid_generator))
    
    # Save the fine-tuned model to outputs which is a standard folder expected by AML
    # (the .h5 extension selects the HDF5 format)
    print("Training completed.")
    os.makedirs('outputs', exist_ok=True)
    model_file = os.path.join('outputs', 'aerial_model_finetune.h5')
    weights_file = os.path.join('outputs', 'aerial_model_weights_finetune.h5')
    print("Saving model to: {0}".format(model_file))
    model.save(model_file)
    print("Saving model weights to: {0}".format(weights_file))
    model.save_weights(weights_file)
 

if __name__ == '__main__':
    tf.app.run()
    

Overwriting ./script/fine-tune.py


### Create AML Experiment
We will track runs of the fine-tuning script in a dedicated Experiment.

In [None]:
from azureml.core import Experiment
experiment_name = 'aerial-finetune-train'
exp = Experiment(workspace=ws, name=experiment_name)

### Run fine-tuning on the cluster

Due to time limitations of the lab, we will run fine-tuning for 2 epochs only.

In [75]:
from azureml.train.estimator import Estimator

script_params = {
    '--data_folder': input_ds.as_download(),
    '--weights_folder': ds.path('models').as_download(),
    '--epochs': 2
}

pip_packages = ['h5py', 
                'pillow', 
                'scikit-learn', 
                'tqdm', 
                'tensorflow-gpu==1.12']


est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='fine-tune.py',
                node_count=1,
                process_count_per_node=1,
                use_gpu=True,
                pip_packages=pip_packages,
                )


In [83]:
tags = {"Run Type": "Fine-tune"}
run = exp.submit(est, tags=tags)
run

Experiment | Id | Type | Status | Details Page | Docs Page
---------- | -- | ---- | ------ | ------------ | ---------
aerial-finetune-pretrain | aerial-finetune-pretrain_1547423391085 | azureml.scriptrun | Queued | Link to Azure Portal | Link to Documentation


In [84]:
from azureml.widgets import RunDetails
RunDetails(run).show()


In [78]:
run.cancel()

In [None]:
%%sh

ls /tmp/

7212294

In [None]:
train_data_dir = '/tmp/datasets/aerialsmall/train'
valid_data_dir = '/tmp/datasets/aerialsmall/valid'

train_generator = ImageGenerator(train_data_dir, preprocess_fn=vgg16.preprocess_input)
valid_generator = ImageGenerator(valid_data_dir, preprocess_fn=vgg16.preprocess_input)