# Sock image classification with transfer learning




## Overview

Image classification of socks using transfer learning. This notebook uses the Amazon SageMaker image classification algorithm in transfer learning mode to fine-tune a pre-trained model on the sock image data so that it learns to classify a new dataset. Based on [this example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-transfer-learning-highlevel.ipynb).


In [None]:
%%time
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
sess = sagemaker.Session()
bucket = 'deeplens-sagemaker-socksort'
prefix = 'ic-transfer-learning'

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri

training_image = get_image_uri(sess.boto_region_name, 'image-classification', repo_version="latest")
print(training_image)

## Fine-tuning the Image classification model

The sock dataset consists of images from 7 categories and has 360 images, with about 30 images per category.

The image classification algorithm accepts input in [RecordIO format](https://mxnet.incubator.apache.org/faq/recordio.html).

We will use RecordIO files for both the training split and the validation split.
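The notebook downloads ready-made `.rec` files and does not show how they were built. One common route, assumed here rather than taken from the source, is MXNet's `im2rec.py` tool, which reads a `.lst` file with one tab-separated line per image: an index, a numeric label, and a relative path. The sketch below only builds those lines; the helper name `build_lst_lines` and the file names are hypothetical, and the label of each category is assumed to be its position in the category list.

```python
import random


def build_lst_lines(categories, files_by_category, seed=0):
    """Return .lst lines of the form 'index<TAB>label<TAB>relative_path'.

    The numeric label of a category is its position in `categories`
    (an assumption about how the labels were assigned).
    """
    rows = []
    for label, category in enumerate(categories):
        for path in files_by_category.get(category, []):
            rows.append((label, path))
    # Shuffle so that classes are interleaved in the listing
    random.Random(seed).shuffle(rows)
    return ["{}\t{}\t{}".format(i, label, path)
            for i, (label, path) in enumerate(rows)]


categories = ['confluent', 'databricks', 'github', 'google',
              'mongo', 'streamset', 'running-science']
# Hypothetical file names, for illustration only
example_files = {
    'github': ['github/001.jpg', 'github/002.jpg'],
    'google': ['google/001.jpg'],
}
lst_lines = build_lst_lines(categories, example_files)
for line in lst_lines:
    print(line)
```

Keeping the category order fixed matters because the same order is needed later to map a predicted index back to a name.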

In [None]:
import os
import urllib.request
import boto3

        
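def upload_to_s3(channel, file):
    """Upload a local file to s3://<bucket>/<channel>/<file>."""
    s3 = boto3.resource('s3')
    key = channel + '/' + file
    # Use a context manager so the file handle is closed after the upload
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)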

s3 = boto3.client('s3')
with open('sock-images_rec_val.rec', 'wb') as f:
    s3.download_fileobj(bucket, 'sock-images_rec_val.rec', f)

with open('sock-images_rec_train.rec', 'wb') as f:
    s3.download_fileobj(bucket, 'sock-images_rec_train.rec', f)
  

upload_to_s3('validation', 'sock-images_rec_val.rec')
upload_to_s3('train', 'sock-images_rec_train.rec')


In [None]:
# Two channels: train and validation
s3train = 's3://{}/{}/train/'.format(bucket, prefix)
s3validation = 's3://{}/{}/validation/'.format(bucket, prefix)

# Upload the RecordIO (.rec) files to the train and validation channels
!aws s3 cp sock-images_rec_train.rec $s3train 
!aws s3 cp sock-images_rec_val.rec $s3validation 
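Copying a file with `aws s3 cp` to a destination URI that ends in `/` stores the object under that prefix with its original file name. A small sketch of the resulting layout, using the bucket and prefix from this notebook; the helper names `channel_uri` and `object_key` are hypothetical and exist only for illustration:

```python
def channel_uri(bucket, prefix, channel):
    """S3 URI of a channel folder, e.g. s3://bucket/prefix/train/."""
    return "s3://{}/{}/{}/".format(bucket, prefix, channel)


def object_key(prefix, channel, filename):
    """Object key a file ends up with after being copied into a channel folder."""
    return "{}/{}/{}".format(prefix, channel, filename)


bucket = "deeplens-sagemaker-socksort"
prefix = "ic-transfer-learning"

train_uri = channel_uri(bucket, prefix, "train")
validation_uri = channel_uri(bucket, prefix, "validation")
train_key = object_key(prefix, "train", "sock-images_rec_train.rec")

print(train_uri)
print(validation_uri)
print(train_key)
```

Each channel folder holds exactly one `.rec` file here, so the channel URI alone is enough to identify the training or validation data.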

Once the data is available in the correct format, the next step is to train the model on it. Before training the model, we need to set up the training parameters. The next section explains the parameters in detail.

## Training
Now that we are done with all the setup that is needed, we are ready to train our image classifier. To begin, let us create a `sagemaker.estimator.Estimator` object. This estimator will launch the training job.
### Training parameters
There are two kinds of parameters that need to be set for training. The first kind are the parameters for the training job. These include:

* **Training instance count**: The number of instances on which to run the training. When the number of instances is greater than one, the image classification algorithm runs in a distributed setting.
* **Training instance type**: The type of machine on which to run the training. Typically, GPU instances are used for this kind of training.
* **Output path**: The S3 folder in which the training output is stored.

The second kind are the algorithm's hyperparameters, which are set with `set_hyperparameters` in the cell below.

In [None]:
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)

ic = sagemaker.estimator.Estimator(training_image,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.p2.xlarge',
                                         train_volume_size = 50,
                                         train_max_run = 360000,
                                         input_mode= 'File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)


ic.set_hyperparameters(num_layers=18,
                             use_pretrained_model=1,
                             image_shape = "3,512,512",
                             num_classes=7,
                             num_training_samples=505,
                             mini_batch_size=12,
                             epochs=100,
                            # learning_rate=0.01,
                             learning_rate=0.0005,
                             precision_dtype='float32')                                         
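The `num_training_samples` and `mini_batch_size` hyperparameters above determine how many mini-batches make up one epoch. A quick sketch of that arithmetic, using the values from this notebook. Whether the algorithm drops or keeps the final partial batch is not stated here, so both bounds are shown:

```python
import math

# Values copied from the set_hyperparameters call in this notebook
num_training_samples = 505
mini_batch_size = 12
epochs = 100

# Batches per epoch if the final partial batch is dropped
full_batches = num_training_samples // mini_batch_size
# Batches per epoch if the final partial batch is kept
batches_with_partial = math.ceil(num_training_samples / mini_batch_size)
# Number of samples left over after the full batches
leftover = num_training_samples % mini_batch_size

print("full batches per epoch:", full_batches)
print("batches per epoch including a partial one:", batches_with_partial)
print("samples in the partial batch:", leftover)
print("total batches over all epochs: between",
      full_batches * epochs, "and", batches_with_partial * epochs)
```

With only one leftover sample, the two bounds differ by a single batch per epoch.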


## Input data specification
Set the data type and the channels used for training.

In [None]:
train_data = sagemaker.session.s3_input(s3train, distribution='FullyReplicated', 
                        content_type='application/x-recordio', s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(s3validation, distribution='FullyReplicated', 
                             content_type='application/x-recordio', s3_data_type='S3Prefix')

data_channels = {'train': train_data, 'validation': validation_data}

## Start the training
Start training by calling the estimator's `fit` method.

In [None]:
# Around 5 minutes
ic.fit(inputs=data_channels, logs=True)

In [None]:
# Use to attach to previous model


# import sagemaker
# ic = sagemaker.estimator.Estimator.attach('image-classification-2020-02-09-05-18-19-609')



## Inference

***

A trained model does nothing on its own. We now want to use the model to perform inference, which in this example means predicting the class of an image. You can deploy the trained model by calling the estimator's `deploy` method.

In [None]:
# This takes a really long time

ic_classifier = ic.deploy(initial_instance_count = 1,
                                          instance_type = 'ml.m4.xlarge')

### Evaluation

Run the image through the network for inference. The network outputs class probabilities, and typically one selects the class with the maximum probability as the final class output.

**Note:** The class predicted by the network may not always be accurate in this example. The dataset is small, and accuracy depends on both the amount of training data and the number of epochs (set to 100 in the hyperparameters above).
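Selecting the class with the maximum probability can be sketched in plain Python as follows. The category list is the one used in this notebook's evaluation cell; the probabilities are made up for illustration and are not real model output, and the helper name `top_k` is hypothetical.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


object_categories = ['confluent', 'databricks', 'github', 'google',
                     'mongo', 'streamset', 'running-science']
# Made-up probabilities, one per category, for illustration only
example_probs = [0.02, 0.05, 0.70, 0.10, 0.08, 0.03, 0.02]

best_label, best_prob = top_k(example_probs, object_categories, k=1)[0]
print("Result: label - {}, probability - {}".format(best_label, best_prob))

# Looking at the runners-up shows how confident the prediction is
for label, prob in top_k(example_probs, object_categories, k=3):
    print(label, prob)
```

The list of category names must be in the same order as the label indices used during training, otherwise the index returned by the model maps to the wrong name.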

### Download test image and Evaluate

In [None]:
import os
import urllib.request
import boto3
from IPython.display import Image
import cv2
import json
import numpy as np

# Set these
test_image_bucket = 'deeplens-sagemaker-socksort'
test_image_name = 'testimages/08.jpg'

# No need to set these
tmp_file_name = 'tmp-test-image.jpg'
resized_file_name = 'resized-test-image.jpg'
s3 = boto3.client('s3')
with open(tmp_file_name, 'wb') as f:
    s3.download_fileobj(test_image_bucket, test_image_name, f)

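# Resize the test image to a fixed width, preserving the aspect ratio
target_width = 500
oriimg = cv2.imread(tmp_file_name)
height, width = oriimg.shape[:2]
# Round the scaled height instead of truncating it
new_height = int(round(height * target_width / width))
# cv2.resize expects the target size as (width, height)
newimg = cv2.resize(oriimg, (target_width, new_height))
cv2.imwrite(resized_file_name, newimg)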

with open(resized_file_name, 'rb') as f:
    payload = f.read()
    payload = bytearray(payload)
    
ic_classifier.content_type = 'application/x-image'
result = json.loads(ic_classifier.predict(payload))
# the result will output the probabilities for all classes
# find the class with maximum probability and print the class index
index = np.argmax(result)
object_categories = ['confluent', 'databricks', 'github', 'google', 'mongo', 'streamset', 'running-science']
print("Result: label - " + object_categories[index] + ", probability - " + str(result[index]))
print()
print(result)
print(ic._current_job_name)
Image(resized_file_name) 
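The resize step in the cell above scales the test image to a fixed width while preserving its aspect ratio. A minimal sketch of that arithmetic without OpenCV, so it can be checked in isolation; the helper name `scaled_size` and the example dimensions are hypothetical:

```python
def scaled_size(width, height, target_width=500):
    """Return (new_width, new_height) for an aspect-preserving resize."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Fix the width exactly and round the height, rather than truncating
    # a floating-point product that may land just under the target
    new_height = max(1, int(round(height * target_width / width)))
    return target_width, new_height


# Hypothetical image sizes, for illustration only
print(scaled_size(1920, 1080))
print(scaled_size(4000, 3000))
```

The returned tuple is in (width, height) order, which is the order `cv2.resize` expects for its target size.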


### Clean up

When we're done with the endpoint, we can just delete it and the backing instances will be released.  Run the following cell to delete the endpoint.

In [None]:
ic_classifier.delete_endpoint()