# Image Classification Model Training with TensorFlow in SageMaker

This tutorial shows you how to train an image classification model that can be deployed onto our FireFly-DL cameras. We use SageMaker as our training environment, which allows users to train a model on AWS’s cloud platform. In this tutorial we use the Flowers dataset as an example dataset to train a model that can classify five different types of flowers. However, you can choose to upload your own dataset to your S3 bucket and train your classification model on that dataset using this notebook as well.

Using the SageMaker TensorFlow container gives us several key capabilities:
1.	We can use our own training script to specify a model architecture (*TF-Slim MobileNet_V1* in this case) that is compatible with the FireFly-DL cameras. We can also leverage transfer learning by initializing the model with ImageNet weights.
2.	We can pass our training script as an argument to *sagemaker.tensorflow.TensorFlow* to create an estimator object, and then call its .fit method to start the training process.
3.	We obtain a model artifact (the trained model) that you can download and convert directly with our NeuroUtility tool, and then deploy to our FireFly-DL cameras.


## Import Libraries

First, we import SageMaker, TensorFlow, and several other Python libraries needed in this tutorial.

In [None]:
%matplotlib inline

# AWS SageMaker Python SDK
import sagemaker
import tensorflow as tf

# Additional libraries
import numpy as np
import random
import os
from PIL import Image
import matplotlib.pyplot as plt
import tarfile
import urllib.request  # urlretrieve lives in the urllib.request submodule
import boto3


print('Libraries imported')

## Setup
Before starting the workflow, we set up two things:
1.	Create your cloud storage bucket on S3 and assign its name to the variable bucket_name in the code block below.
2.	Get the execution role used to access your AWS resources, such as your S3 bucket and GPU instance.


In [None]:
bucket_name = 'firefly-flowers' # MUST PROVIDE BUCKET NAME

# check if bucket exists
s3 = boto3.client('s3')
response = s3.list_buckets()
buckets = [bucket['Name'] for bucket in response['Buckets']]

if bucket_name not in buckets:
    print(f'S3 bucket "{bucket_name}" does not exist.')
else:
    print(f'S3 bucket "{bucket_name}" found.')

sess = sagemaker.Session() # initialize a SageMaker session
role = sagemaker.get_execution_role() # we are using the notebook instance role for training in this example


---

## Option 1: Flowers Dataset 

Here we provide an example image dataset of five different types of flowers. This section is optional, and if you have your own dataset you can skip ahead to Option 2.

### 1. Download and Extract Flower Dataset
As an example, we use the Flowers dataset to train our model. The dataset can be downloaded from http://download.tensorflow.org/example_images/flower_photos.tgz.
The flower images are labeled by their parent directory name and are split into five classes (folders) according to flower type:
1. Daisy
2. Sunflowers
3. Roses
4. Tulips
5. Dandelion

The following code downloads the flower photos and extracts them to the *'./flower_photos'* directory on your current Jupyter notebook instance.

In [None]:
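def download_and_extract(url, data_dir, download_dir):
    """Download the archive at url into download_dir and extract it into data_dir."""
    target_file = url.split('/')[-1]
    archive_path = os.path.join(download_dir, target_file)
    if target_file not in os.listdir(download_dir):
        print('Downloading', url)
        urllib.request.urlretrieve(url, archive_path)
        # Open the archive from download_dir; avoid the name `tf`, which is the TensorFlow alias
        with tarfile.open(archive_path) as archive:
            archive.extractall(data_dir)
    else:
        print('Already downloaded', url)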

In [None]:
image_dir = 'flower_photos' # Path to the image directory. This must point to the parent directory of the class folders.

flowers_url = 'http://download.tensorflow.org/example_images/flower_photos.tgz'
download_and_extract(flowers_url, './', '.')

### 2. Visualize the Flower Dataset

The code below walks through the downloaded image dataset and displays a random sample of the images.

In [None]:
# Class labels are the sub-folder names of image_dir
class_dirs = [f for f in os.listdir(image_dir) if os.path.isdir(os.path.join(image_dir, f))]
print('list of class labels', class_dirs)


In [None]:
# create list of all images
file_list = list()
for root, dirs, files in os.walk(image_dir):
    for file in files:
        if file.endswith('.jpg'):
            file_list.append(os.path.join(root, file))


fig = plt.figure(figsize=(8, 8))
columns = 2
rows = 2

# show a random sample of images, each titled with its class (parent folder) name
for i in range(1, columns * rows + 1):
    img_path = random.choice(file_list)
    img = Image.open(img_path).convert('RGB')
    ax = fig.add_subplot(rows, columns, i)
    ax.set_title(os.path.basename(os.path.dirname(img_path)))
    ax.imshow(img)


### 3. Upload Training Images to Your S3 bucket

Next, we upload the training images to your S3 cloud storage bucket.


In [None]:
# Get a list of all bucket names from the response
s3 = boto3.client('s3')
response = s3.list_buckets()
buckets = [bucket['Name'] for bucket in response['Buckets']]

if bucket_name in buckets:
    # Check whether any objects already exist under the image_dir prefix
    response = s3.list_objects_v2(
                Bucket=bucket_name,
                Prefix=image_dir,
                MaxKeys=10)
    if 'Contents' not in response:
        print('Uploading data to S3')
        s3_data_path = sess.upload_data(path=image_dir, bucket=bucket_name, key_prefix=image_dir)
        print('Uploaded to', s3_data_path)
    else:
        s3_data_path = f's3://{bucket_name}/{image_dir}'
        print('Data already found at', s3_data_path)
else:
    print(f'S3 bucket "{bucket_name}" does not exist.')


To train the model on the flowers dataset, you can skip Option 2 and go directly to the *Train with TensorFlow Estimator* section.

---

## Option 2: Prepare and Upload Your Own Images

### 1. Collect Your Own Data

First, collect and label the images that you would like to use to train the classification model.
1. Collect training images. 
    * The train.py script only supports the following image formats: *'jpg', 'jpeg', 'png', and 'bmp'*.


2. Label the images into classes using the parent directory name.
    * Each image must be saved into exactly one folder (representing its class).
    * The ground-truth label for each image is taken from the parent directory name.
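
The labeling convention above can be sketched in a few lines of Python. This is a minimal illustration rather than code taken from train.py: the helper name `label_from_path` and the example paths are hypothetical, and the extension set simply mirrors the formats listed above.

```python
import os

# Extensions mirror the formats listed above; the exact check in train.py may differ.
SUPPORTED_EXTENSIONS = {'jpg', 'jpeg', 'png', 'bmp'}

def label_from_path(image_path):
    """Return the class label (parent folder name) for a supported image, else None."""
    extension = os.path.splitext(image_path)[1].lstrip('.').lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return None
    return os.path.basename(os.path.dirname(image_path))

# Hypothetical example paths
print(label_from_path('my_images/class_1/img_001.jpg'))  # class_1
print(label_from_path('my_images/class_1/notes.txt'))    # None
```

Running a check like this over your local folders before uploading can catch files that would otherwise be skipped or mislabeled.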


### 2. Upload Training Images to Your S3 Bucket
 
Next, upload your training images directly to the S3 bucket.
1.	Create a folder (*image_dir*) inside your S3 bucket (*bucket_name*).
2.	Upload all the class folders (e.g. class_1, class_2 ...) that contain the images under the *image_dir* folder.

**Important Note:**
Verify that the *bucket_name* and *image_dir* variables match the names of the S3 bucket and image folder that your images were uploaded to. The above diagram shows the expected S3 image folder and file structure.
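
To make the expected layout concrete, the sketch below builds the object key and URI that an image should have under this convention. The helpers `expected_s3_key` and `expected_s3_uri`, as well as the class and file names, are hypothetical and only illustrate the `image_dir/class_name/file_name` pattern described above.

```python
def expected_s3_key(image_dir, class_name, file_name):
    """Build the S3 object key for an image stored as image_dir/class_name/file_name."""
    return '/'.join([image_dir.strip('/'), class_name, file_name])

def expected_s3_uri(bucket_name, image_dir, class_name, file_name):
    """Build the full s3:// URI for the same image."""
    return f's3://{bucket_name}/{expected_s3_key(image_dir, class_name, file_name)}'

# Hypothetical class folder and file name
print(expected_s3_uri('firefly-flowers', 'flower_photos', 'class_1', 'img_001.jpg'))
# s3://firefly-flowers/flower_photos/class_1/img_001.jpg
```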


In [None]:
image_dir = 'flower_photos' # MUST PROVIDE CORRECT IMAGE FOLDER NAME

s3_data_path = f's3://{bucket_name}/{image_dir}' 
print('s3 image path', s3_data_path)

3.	Run the next code block to check that your S3 bucket folder structure is correct. If it is, the code outputs a list of classes and the number of images per class.

In [None]:

# Sort the object keys of an S3 listing into supported images (grouped by class) and other files
def check_s3_response(response, dic):
    if 'correct_image_format' not in dic.keys() and 'wrong_image_format' not in dic.keys():
        dic = {'correct_image_format':{}, 'wrong_image_format':list()}
        
    for key in response['Contents']:
        # Create file path list
        file_path_list = key['Key'].split('/')
        # check images
        if len(file_path_list) > 2:
            # Use the last dot-separated part so names with extra dots or no extension are handled
            if file_path_list[-1].split('.')[-1].lower() in ['jpg', 'jpeg', 'png','bmp']:
                # check class exists and append image to list
                if file_path_list[-2] not in dic['correct_image_format'].keys():
                    dic['correct_image_format'][file_path_list[-2]] = list()
                dic['correct_image_format'][file_path_list[-2]].append(file_path_list[-1])
            else:
                # unsupported files are collected separately
                dic['wrong_image_format'].append('/'.join(file_path_list))
    return dic

print(f"Scanning S3 bucket '{s3_data_path}' for images \n")

# Get a list of all bucket names from the response
s3 = boto3.client('s3')
response = s3.list_buckets()
buckets = [bucket['Name'] for bucket in response['Buckets']]

if bucket_name in buckets:
    response = s3.list_objects_v2(
                Bucket=bucket_name,
                Prefix =image_dir,
                MaxKeys=1000)
    # print(response)
    if 'Contents' in list(response.keys()):
        dic = {}
        dic = check_s3_response(response, dic)    
        while(response['IsTruncated']):
            response = s3.list_objects_v2(
                    Bucket=bucket_name,
                    Prefix=image_dir,
                    ContinuationToken=response['NextContinuationToken'],
                    MaxKeys=1000)
        #         print(response)         
            dic = check_s3_response(response, dic)
        print(f"Class folders found in {image_dir} {list(dic['correct_image_format'].keys())}")
        print('Number of images found in each class')
        for class_dir in dic['correct_image_format'].keys():
            num_images = len(dic['correct_image_format'][class_dir])
            print(f'{class_dir}: {num_images}')
    else:
        s3_data_path = ''
        print(f"'{image_dir}' does not exist in '{bucket_name}' S3 bucket")
        
else:
    print(f'S3 bucket "{bucket_name}" does not exist.')
    s3_data_path = ''

print('\n')
print(f'S3 image path set to {s3_data_path}')

## TODO: Visualize random samples of the images

---

## Train with TensorFlow Estimator


### 1. train.py Script
The training script used in this tutorial was adapted from the *TensorFlow for Poets* classification example. We have modified it to handle the parameters passed in by SageMaker. 


### 2. Prepare for training job

1.	Import the SageMaker TensorFlow Python library.
2.	Specify the model hyperparameters.


In [None]:
from sagemaker.tensorflow import TensorFlow

hyperparameters = {
    # Model backbone architecture
    'architecture':'mobilenet_0.25_224',
    'feature_vector':'L-2',
    
    # Training parameters
    'epochs': 2000, 
    'learning_rate': 0.001,
    'testing_percentage':10,
    'validation_percentage':10,
    'train_batch_size':128,
    'test_batch_size':-1,
    'validation_batch_size':-1,
    'final_tensor_name':'final_result',
    
    # Image Augmentation 
    'flip_left_right':False,
    'flip_up_down':False,
    'random_rotate':10,
    'random_brightness':10,
    'random_scale':10,   
}


### 3. Create a training job using the TensorFlow estimator
The sagemaker.tensorflow.TensorFlow estimator handles locating the script mode container, uploading your script to your S3 bucket location, and creating a SageMaker training job. Some of the important input arguments for the estimator are described below.
* entry_point is set to 'train.py', which is our training script located in the root directory of the current instance.
* py_version is set to 'py3' to indicate that we are using script mode.
* framework_version is set to '1.13' to indicate which TensorFlow version we are using.
* train_instance_type is set to 'ml.p2.xlarge', a type of GPU instance that we will use to train our model.



In [None]:
output_path = f's3://{bucket_name}/'
blocks_estimator = TensorFlow(
    entry_point='train.py',
    role=role,
    train_instance_type='ml.p2.xlarge',
    train_instance_count=1,
    hyperparameters=hyperparameters,
    framework_version='1.13',
    py_version='py3',
    output_path=output_path
)

print('The trained model will be saved to the following S3 location', output_path)

### 4. Start the training job

We call estimator.fit(s3://...) with the S3 location of the training images as the input argument to start the training job.

When training starts, the TensorFlow container executes the train.py script, passing in the hyperparameters and the S3 path (*model_dir*) argument. The training script prints the training and evaluation accuracies at each epoch. In addition, if you specify a test set percentage, the script reports a test accuracy at the end of training.
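
As a rough illustration of how a script-mode training script can receive these values, the sketch below parses a few hyperparameters from command-line flags. It is not the actual train.py: the function name `parse_args` and the `--image_dir` flag are hypothetical, and reading the data location from the `SM_CHANNEL_TRAINING` environment variable assumes the default `training` channel that is used when a single S3 path is passed to .fit.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse a few of the hyperparameters that SageMaker forwards as command-line flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=2000)
    parser.add_argument('--learning_rate', type=float, default=0.001)
    parser.add_argument('--architecture', type=str, default='mobilenet_0.25_224')
    parser.add_argument('--model_dir', type=str, default='')
    # Assumption: training data is exposed through the default 'training' channel
    parser.add_argument('--image_dir', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAINING', ''))
    # parse_known_args ignores hyperparameters that this sketch does not declare
    args, _ = parser.parse_known_args(argv)
    return args

# Simulate the flags the container might pass for a short run
args = parse_args(['--epochs', '10', '--learning_rate', '0.01', '--flip_left_right', 'False'])
print(args.epochs, args.learning_rate, args.architecture)
```

Using `parse_known_args` keeps the sketch tolerant of extra hyperparameters, such as the augmentation settings, that it does not handle.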



In [None]:
print(f'Training images s3 path: {s3_data_path}')
blocks_estimator.fit(s3_data_path)

## Training Job Cost Estimate

For a typical training job, the total cost is split mainly between:
1.	Data storage and access costs.
2.	Model training cost, which accounts for the majority of the total. For example, an ml.p2.xlarge (GPU) instance costs 1.26 USD per hour.
3.	The cost of running the current notebook instance.

You can use the code block below to estimate the training cost. Copy the *Billable seconds* value that is printed at the end of the training process into the variable *Billable_time_in_seconds*.



In [None]:
Billable_time_in_seconds = 137 # Enter the billable time in seconds
hourly_rate_usd = 1.26 # ml.p2.xlarge price per hour used in this example
print(f'Training cost ${Billable_time_in_seconds * hourly_rate_usd / 3600:.4f}')

## Deploy The TensorFlow Model




### Download the trained model artifact from your S3 bucket.


After training is complete, SageMaker automatically compresses and copies the trained model artifact (*model.tar.gz*) to your S3 bucket. Please note the following:
* The trained model artifact is saved under the following directory path *s3://bucket_name/tensorflow-training-... /output/* (e.g. s3://firefly-flowers/tensorflow-training-2020-07-03-20-50-48-055/output).
* Select the compressed file (*model.tar.gz*) in your S3 console and click the download button to download the file.
* Decompress the file using your preferred file decompression tool. Inside the model folder you should find the trained model protobuf file (.pb).
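
If you prefer to decompress the artifact with Python instead of a desktop tool, a minimal sketch could look like the following. The function name `extract_model_artifact` and the destination folder `model` are hypothetical, and the code assumes that *model.tar.gz* has already been downloaded to the current directory and comes from your own training job.

```python
import os
import tarfile

def extract_model_artifact(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the paths of any .pb files found."""
    with tarfile.open(archive_path, 'r:gz') as archive:
        archive.extractall(dest_dir)
    pb_files = []
    for root, _, files in os.walk(dest_dir):
        for name in files:
            if name.endswith('.pb'):
                pb_files.append(os.path.join(root, name))
    return sorted(pb_files)

# Only runs if the downloaded artifact is present in the current directory
if os.path.exists('model.tar.gz'):
    print(extract_model_artifact('model.tar.gz', 'model'))
```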

### Use NeuroUtility to freeze and convert your model


Please take note of the following model parameters when using NeuroUtility. These settings can be found under the *Tensorflow Network Configuration* tab.
* The Network Input Width and Height depend on the model architecture set before training. In our example we use the mobilenet_0.25_224 model architecture, so we set both the Network Input Width and the Network Input Height to 224.
* The Input Layer Name is defined in the training script as *input* by default.
* The Output Layer Name is defined in the training script as *final_result* by default.
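
The input size can also be read off the architecture string programmatically. The sketch below assumes the name follows the pattern `mobilenet_<width multiplier>_<input size>`, as in the example used in this tutorial; the helper `parse_mobilenet_architecture` is hypothetical and not part of train.py or NeuroUtility.

```python
def parse_mobilenet_architecture(architecture):
    """Split a name such as 'mobilenet_0.25_224' into (width multiplier, input size)."""
    parts = architecture.split('_')
    if len(parts) < 3 or parts[0] != 'mobilenet':
        raise ValueError(f'Unexpected architecture name: {architecture}')
    return float(parts[1]), int(parts[2])

width_multiplier, input_size = parse_mobilenet_architecture('mobilenet_0.25_224')
print(f'Network Input Width and Height: {input_size}')  # 224
```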

