## Prerequisites and Preprocessing

### Permissions and environment variables

Here we set up the linkage and authentication to AWS services. There are three parts to this:

* The roles used to give learning and hosting access to your data. This is obtained automatically from the role used to start the notebook.
* The S3 bucket that you want to use for training and model data.
* The Amazon SageMaker image classification Docker image, which need not be changed.

In [None]:
%%time
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = 'ic-transfer-learning'

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri

training_image = get_image_uri(sess.boto_region_name, 'image-classification', repo_version="latest")

## Preparing data for our model

### Configure where to fetch our training data:

In [None]:
data_bucket_name='s3webcamuploader83a65c76f8384092b63d212639122190' # An S3 Bucket Name
dataset_name = 'objects-home-a' # A path inside the S3 bucket containing sub-folders of images (one per label class)

Set up some environment variables

In [None]:
base_dir='/tmp'

%env BASE_DIR=$base_dir
%env S3_DATA_BUCKET_NAME = $data_bucket_name
%env DATASET_NAME = $dataset_name

# Find im2rec in our environment
import sys,os

suffix='/mxnet/tools/im2rec.py'
im2rec = list(filter( (lambda x: os.path.isfile(x + suffix )), sys.path))[0] + suffix
%env IM2REC=$im2rec

Pull down our data from S3

In [None]:
%%bash
# Pull our images from S3
set -x
aws s3 sync s3://$S3_DATA_BUCKET_NAME/public/$DATASET_NAME $BASE_DIR/$DATASET_NAME --quiet

In order to train our image classifier, SageMaker expects our data in either LST file format or RecordIO format.

We can use the `im2rec.py` tool to create either of these types. Here, we'll use `im2rec.py` to split our data 
into `test` and `train` groups, creating LST files for each of these groups and outputting a file that contains 
a mapping of each class label index to its class.

Then we'll convert the LST files into RecordIO files using the same `im2rec.py` tool.
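
For orientation, each line of an LST file is a tab-separated record: an integer index, a numeric class label, and the image path relative to the dataset root. The snippet below is a minimal sketch of reading such lines. `parse_lst_line` and the sample paths are hypothetical names used only for illustration, and the sketch assumes the single-label layout that `im2rec.py --list` writes.

```python
def parse_lst_line(line):
    """Split one LST record into (index, label, relative_path)."""
    index, label, path = line.rstrip("\n").split("\t")
    # The label is written as a float (e.g. "1.000000"), so parse it via float
    return int(index), int(float(label)), path

# Made-up sample records, not taken from the real dataset
sample_lines = [
    "0\t0.000000\tmug/img_001.jpg",
    "1\t1.000000\tremote/img_007.jpg",
]
records = [parse_lst_line(line) for line in sample_lines]
print(records)
```

Reading the generated `_train.lst` and `_test.lst` files this way can be a quick sanity check that every image landed under the expected label before converting to RecordIO.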

In [None]:
%%bash
set -x
# Use the IM2REC script to convert our images into RecordIO files

cd $BASE_DIR

# First we need to create two LST files (training and test lists), noting the correct label class for each image
# We'll also save the output of the LST files command, since it includes a list of all of our label classes
echo "Creating LST files"
python $IM2REC --list --recursive --pass-through --test-ratio=0.3 --train-ratio=0.7 $DATASET_NAME $DATASET_NAME > ${DATASET_NAME}_classes

echo "Label classes:"
cat ${DATASET_NAME}_classes

# Then we create RecordIO files from the LST files
echo "Creating RecordIO files"
python $IM2REC --num-thread=4 ${DATASET_NAME}_train.lst $DATASET_NAME
python $IM2REC --num-thread=4 ${DATASET_NAME}_test.lst $DATASET_NAME
ls -lh *.rec

Finally, we'll upload our RecordIO files (they end in `.rec`) to an S3 bucket we'll use for training.

In [None]:
# Upload our train and test RecordIO files to S3 in the bucket that our sagemaker session is using

s3train_path = 's3://{}/{}/train/'.format(bucket, prefix)
s3validation_path = 's3://{}/{}/validation/'.format(bucket, prefix)

# Clean up any existing data
!aws s3 rm s3://{bucket}/{prefix}/train --recursive
!aws s3 rm s3://{bucket}/{prefix}/validation --recursive

# Upload the rec files to the train and validation channels
!aws s3 cp /tmp/{dataset_name}_train.rec $s3train_path
!aws s3 cp /tmp/{dataset_name}_test.rec $s3validation_path



### Configuring the data for our model training to use
Set the data type and channels used for training

In [None]:
train_data = sagemaker.session.s3_input(
    s3train_path, 
    distribution='FullyReplicated', 
    content_type='application/x-recordio', 
    s3_data_type='S3Prefix'
)

validation_data = sagemaker.session.s3_input(
    s3validation_path, 
    distribution='FullyReplicated', 
    content_type='application/x-recordio', 
    s3_data_type='S3Prefix'
)

data_channels = {'train': train_data, 'validation': validation_data}

Once we have the data available in the correct format for training, the next step is to train the model using that data. Before training the model, we need to set up the training parameters. The next section explains the parameters in detail.

## Training
Now that we are done with all the setup that is needed, we are ready to train our image classifier. To begin, let us create a ``sagemaker.estimator.Estimator`` object. This estimator will launch the training job.
### Training parameters
There are two kinds of parameters that need to be set for training. The first kind is the parameters for the training job itself. These include:

* **Training instance count**: This is the number of instances on which to run the training. When the number of instances is greater than one, the image classification algorithm runs in a distributed setting.
* **Training instance type**: This indicates the type of machine on which to run the training. Typically, we use GPU instances for this kind of training.
* **Output path**: This is the S3 folder in which the training output is stored.

In [None]:
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
ic = sagemaker.estimator.Estimator(
    training_image,
    role, 
    train_instance_count=1, 
    train_instance_type='ml.p3.2xlarge',
    train_volume_size = 10,
    train_max_run = 360000,
    input_mode= 'File',
    output_path=s3_output_location,
    sagemaker_session=sess
)

Apart from the above set of parameters, there are hyperparameters that are specific to the algorithm. These are:

* **num_layers**: The number of layers (depth) for the network. Values such as 18, 50, and 152 can be used. The cell below does not set it, so the algorithm's default applies.
* **use_pretrained_model**: Set to 1 to use a pretrained model for transfer learning.
* **image_shape**: The input image dimensions, 'num_channels, height, width', for the network. It should be no larger than the actual image size. The number of channels should be the same as in the actual images.
* **num_classes**: The number of output classes for the new dataset. ImageNet was trained with 1000 output classes, but the number of output classes can be changed for fine-tuning. Here it is derived from the number of class sub-folders in our dataset.
* **num_training_samples**: The total number of training samples. Here it is the number of lines in the training LST file.
* **mini_batch_size**: The number of training samples used for each mini batch. In distributed training, the number of training samples used per batch will be N * mini_batch_size, where N is the number of hosts on which training is run.
* **epochs**: Number of training epochs.
* **learning_rate**: Learning rate for training.
* **precision_dtype**: Training datatype precision (default: float32). If set to 'float16', training is done in mixed-precision mode and will be faster than float32 mode.
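
To make the `mini_batch_size` arithmetic concrete, here is a small sketch. The helper name `steps_per_epoch` and the sample numbers are illustrative assumptions, not values from this notebook's dataset. It only restates the N * mini_batch_size relationship described above and assumes every sample is seen once per epoch.

```python
import math

def steps_per_epoch(num_training_samples, mini_batch_size, num_hosts=1):
    """Return (effective batch size, approximate number of batches per epoch)."""
    effective_batch = num_hosts * mini_batch_size
    return effective_batch, math.ceil(num_training_samples / effective_batch)

# Hypothetical dataset of 700 training images with mini_batch_size=5
single = steps_per_epoch(700, 5)                    # one training host
distributed = steps_per_epoch(700, 5, num_hosts=2)  # two training hosts
print(single, distributed)
```

With a small dataset, a small `mini_batch_size` keeps the number of updates per epoch reasonable, which is one reason to treat it as a tunable hyperparameter.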


In [None]:
mod = None

num_classes=! ls -l {base_dir}/{dataset_name} | wc -l
num_classes=int(num_classes[0]) - 1

num_training_samples=! cat {base_dir}/{dataset_name}_train.lst | wc -l
num_training_samples = int(num_training_samples[0])

# Learn more about the SageMaker built-in Image Classifier hyperparameters here: https://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html

# These hyperparameters we won't want to change, as they define things like
# the size of the images we'll be sending for input, the number of training classes we have, etc.
base_hyperparameters=dict(
    use_pretrained_model=1,
    image_shape='3,224,224',
    num_classes=num_classes,
    num_training_samples=num_training_samples,
    precision_dtype='float32',
)

# These are hyperparameters we may want to tune, as they can affect the model training success:
hyperparameters={
    **base_hyperparameters, 
    **dict(
        epochs=30,
        learning_rate=0.001,
        mini_batch_size=5,
    )
}


ic.set_hyperparameters(**hyperparameters)

hyperparameters

### Start the training
Start training by calling the `fit` method on the estimator. This takes some time: SageMaker provisions a new container runtime, trains the model, uploads the trained model to S3, and then shuts the container down.

In [None]:
ic.fit(inputs=data_channels, logs=True)

job = ic.latest_training_job
model_path = f"{base_dir}/{job.name}"

print(f"\n\n Finished training! The model is available for download at: {ic.output_path}/{job.name}/output/model.tar.gz")

### Visualizing the training and validation progress

If we want, we can parse CloudWatch logs to create a chart of the training progress.
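
As a rough illustration of the parsing step, the sketch below extracts accuracy values from log messages with a regular expression. The sample messages are made up and assume an MXNet-style `Epoch[N] Train-accuracy=value` format. `parse_accuracy` is a hypothetical helper, and real log lines may carry extra prefixes.

```python
import re

# Matches e.g. "Train-accuracy=0.81" or "Validation-accuracy=0.75"
ACC_PATTERN = re.compile(r"(Train|Validation)-accuracy=([0-9]*\.?[0-9]+)")

def parse_accuracy(message):
    """Return (kind, value) for an accuracy log line, or None for other lines."""
    match = ACC_PATTERN.search(message)
    if match is None:
        return None
    return match.group(1), float(match.group(2))

# Made-up messages in the assumed format
sample_messages = [
    "[INFO] Epoch[0] Train-accuracy=0.81",
    "[INFO] Epoch[0] Validation-accuracy=0.75",
    "[INFO] Epoch[0] Time cost=12.3",
]
parsed = [parse_accuracy(m) for m in sample_messages]
print(parsed)
```

Matching on the metric name, rather than splitting on `=`, keeps the parser from tripping over messages that contain more than one `=` character.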

In [None]:
# Graphing Training / Validation progress

import boto3
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

client = boto3.client('logs')

lgn='/aws/sagemaker/TrainingJobs'
job = ic.latest_training_job

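# Look up the job's log stream in CloudWatch instead of hard-coding its generated suffix
streams = client.describe_log_streams(logGroupName=lgn, logStreamNamePrefix=job.job_name)
lsn = streams['logStreams'][0]['logStreamName']
log = client.get_log_events(logGroupName=lgn, logStreamName=lsn)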

trn_accs=[]
val_accs=[]
for e in log['events']:
    msg = e['message']
    if 'Validation-accuracy' in msg:
        val_accs.append(float(msg.split("=")[1]))
    if 'Train-accuracy' in msg:
        trn_accs.append(float(msg.split("=")[1]))

print("Maximum validation accuracy: %f " % max(val_accs))
plt.clf()
fig, ax = plt.subplots()
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
# Use the number of parsed points so the plot still works if the log holds fewer entries than epochs
trn_plot, = ax.plot(range(len(trn_accs)), trn_accs, label="Training accuracy")
val_plot, = ax.plot(range(len(val_accs)), val_accs, label="Validation accuracy")
plt.legend(handles=[trn_plot,val_plot])
ax.yaxis.set_ticks(np.arange(0.4, 1.05, 0.05))
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter('%0.2f'))
plt.show()

## Running a model locally
Deploying the model to a new endpoint takes some time, so it can be helpful to first download the trained model and run it locally on this notebook host.
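
The cells below hard-code the epoch snapshot to load. As a hedged alternative, the sketch here infers the checkpoint prefix and the latest saved epoch from the file names, assuming the usual MXNet checkpoint naming of `<prefix>-symbol.json` and `<prefix>-NNNN.params`. `find_checkpoint` is a hypothetical helper, and the demo files are empty placeholders rather than a real model.

```python
import glob
import os
import re
import tempfile

def find_checkpoint(model_dir):
    """Return (prefix_path, latest_epoch) for an MXNet checkpoint directory."""
    symbol_files = glob.glob(os.path.join(model_dir, "*-symbol.json"))
    if not symbol_files:
        raise FileNotFoundError(f"No *-symbol.json file in {model_dir}")
    prefix = symbol_files[0][: -len("-symbol.json")]
    epochs = []
    for path in glob.glob(prefix + "-*.params"):
        match = re.search(r"-(\d{4})\.params$", path)
        if match:
            epochs.append(int(match.group(1)))
    if not epochs:
        raise FileNotFoundError(f"No .params files for prefix {prefix}")
    return prefix, max(epochs)

# Demo with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("image-classification-symbol.json",
                 "image-classification-0029.params",
                 "image-classification-0030.params"):
        open(os.path.join(demo_dir, name), "w").close()
    prefix, epoch = find_checkpoint(demo_dir)
    demo_result = (os.path.basename(prefix), epoch)
print(demo_result)
```

The returned pair could then be passed to `mx.model.load_checkpoint(prefix, epoch)`, which avoids editing the snapshot number by hand whenever `epochs` changes.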

In [None]:
job = ic.latest_training_job
model_path = f"{base_dir}/{job.name}"

# Download the trained model from S3
! mkdir -p {model_path}
! aws s3 cp {ic.output_path}/{job.name}/output/model.tar.gz {model_path}/ --quiet
! cd {model_path} && tar -xzf model.tar.gz && rm model.tar.gz
! echo "Model extracted to {model_path}"

In [None]:
import mxnet as mx

# MXNet wants the filename prefix for the model, which we can infer from prefix of the symbol.json file's name
symbol_filename = ! ls {model_path}/*-symbol.json
symbol_filename = symbol_filename[0].split('/')[-1]
job_name_prefix = symbol_filename.replace('-symbol.json', '')
model_files_prefix = f"{model_path}/{job_name_prefix}"

# Which snapshot to use?
epoch_snapshot_number=30

# Run our model on the CPU
ctx = mx.cpu()

# Initialize our model
sym, arg_params, aux_params = mx.model.load_checkpoint(model_files_prefix, epoch_snapshot_number)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(
    for_training=False, 
    data_shapes=[('data', (1,3,224,224))], 
    label_shapes=mod._label_shapes
)
mod.set_params(arg_params, aux_params, allow_missing=True)



# Load classes from file (the captured output of the lst file creation)
with open(f"{base_dir}/{dataset_name}_classes", 'r') as classes_file:
    classes_lines = classes_file.read().splitlines()
classes=[parts.split(' ')[0] for parts in classes_lines]



# Imports for working with image data
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Define a simple data batch (used for wrapping up an image to submit to the model)
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

def get_image(fname, show=False):
    img = mx.image.imread(fname)
    if img is None:
        return None
    if show:
        plt.imshow(img.asnumpy())
        plt.axis('off')
    # convert into format (batch, channels, height, width)
    img = mx.image.imresize(img, 224, 224) # resize
    img = img.transpose((2, 0, 1)) # Channel first
    img = img.expand_dims(axis=0) # batchify
    return img

def classify_local(img_path, classes):
    img = get_image(img_path, show=False)
    # compute the predict probabilities
    mod.forward(Batch([img]))
    prob = mod.get_outputs()[0].asnumpy()
    # get highest prob index
    best_prob_index = np.argmax(prob)
    return(classes[best_prob_index], np.squeeze(prob)[best_prob_index])


In [None]:
# Clean up our local model to free RAM.
# Run this only once you are finished classifying locally, since classify_local needs `mod`.
mod = None

### Helper function to classify all images in a directory (with one sub-dir per class to test)

In [None]:
def classify_all(imgs_dir, classifier_func, classes, expected_class_from_filename_func):
    results = []

    for root, dirs, files in os.walk(imgs_dir):
        for file in files:
            if file.endswith(".jpg"):
                full_path = os.path.join(root, file)
                (predicted_class, probability) = classifier_func(full_path, classes)
                expected_class = expected_class_from_filename_func(full_path)
                is_correct = predicted_class == expected_class
                result = dict(
                    file=full_path,
                    is_correct=is_correct,
                    predicted_class=predicted_class,
                    expected_class=expected_class,
                    probability=probability
                )
                results.append(result)

    return results

### Classify some new test images never seen by the network

In [None]:
# Download a fresh data set that the model has not seen
test_dataset_name = 'objects-home-b'
! aws s3 sync s3://{data_bucket_name}/public/{test_dataset_name}/ {base_dir}/{test_dataset_name} --quiet
! echo "Synced data to {base_dir}/{test_dataset_name}"


In [None]:
import os

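# Run our unseen data set through the model and calculate the percentage of correct classifications
test_dir = base_dir + '/' + test_dataset_name
# The expected class is the first sub-folder below test_dir, regardless of how deep base_dir is
expected_class_from_filename_func = lambda f: os.path.relpath(f, test_dir).split(os.sep)[0]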

print(f"\nTesting with {test_dir}")

results = classify_all(test_dir, classify_local, classes, expected_class_from_filename_func)
            
rights = [r for r in results if r['is_correct'] is True]
wrongs = [r for r in results if r['is_correct'] is False]
n_right = len(rights) * 1.0
n_wrong = len(wrongs) * 1.0

print("\nUsing Classes: " +str(classes))

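# Guard against an empty result set to avoid dividing by zero
n_total = n_right + n_wrong
print("\nTotal accuracy on new images: " + (str(n_right / n_total) if n_total else "n/a (no images found)"))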


print("\nBreakdown of accuracy by class:")
stats = { 'right': {}, 'wrong': {} }
for r in results:
    right_or_wrong='right' if r['is_correct'] else 'wrong'
    stats[right_or_wrong].setdefault(r['expected_class'], []).append(r)

for cls in classes:
    rights = stats['right'].get(cls, [])
    wrongs = stats['wrong'].get(cls, [])
    n_right = len(rights) * 1.0
    n_wrong = len(wrongs) * 1.0
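    # A class may have no images in the test set, so avoid dividing by zero
    n_total = n_right + n_wrong
    accuracy = str(n_right / n_total) if n_total else "n/a"
    print(f"Class {cls} has {n_total} images with total accuracy of: {accuracy}")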

## Deploy the trained model

In [None]:
%%time
# Deploying a model to an endpoint takes a few minutes. Are you sure you want to do this?
CONFIRM_DEPLOY = False # Change to True to deploy
assert CONFIRM_DEPLOY, "Set CONFIRM_DEPLOY = True to deploy the endpoint"

ic_classifier = ic.deploy(
    initial_instance_count = 1,
    instance_type = 'ml.t2.medium'
)

### Calling a deployed endpoint

In [None]:
import json
import numpy as np
import os

def classify_deployed(file_name, classes):
    payload = None
    with open(file_name, 'rb') as f:
        payload = f.read()
        payload = bytearray(payload)

    ic_classifier.content_type = 'application/x-image'
    result = json.loads(ic_classifier.predict(payload))
    best_prob_index = np.argmax(result)
    return (classes[best_prob_index], result[best_prob_index])



### Clean up

When we're done with the endpoint, we can delete it, which releases the backing instances. Run the following cell to delete the endpoint.

In [None]:
ic_classifier.delete_endpoint()
