objectdetection-chestxray


Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes an image as input, returns a bounding box around each object of interest, and identifies the type of object each box contains. To build such a solution, we need to acquire and process a training dataset, then create and set up a training job so that the algorithm can learn from the dataset. Finally, we can host the trained model on an endpoint, to which we can send images.




The dataset was labeled with Amazon SageMaker Ground Truth, following the process described in this blog post:
https://aws.amazon.com/blogs/aws/amazon-sagemaker-ground-truth-build-highly-accurate-datasets-and-reduce-labeling-costs-by-up-to-70/

The test data was downloaded from here:
https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia/data#


In [1]:
import sagemaker
from sagemaker import get_execution_role
 
role = get_execution_role()
print(role)
sess = sagemaker.Session()

arn:aws:iam::773208840593:role/my_AmazonSageMakerFullAccess


In [2]:
from sagemaker.amazon.amazon_estimator import get_image_uri
training_image = get_image_uri(sess.boto_region_name, 'object-detection', repo_version="latest")
print (training_image)

'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


813361260812.dkr.ecr.eu-central-1.amazonaws.com/object-detection:latest


In [26]:
bucket = 'raz-sagemaker' 
prefix = 'models/object-detection-chest-xray'

s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
print(s3_output_location)

bucket = 'raz-sagemaker'
prefix = 'annotation/chest_xray/raz-groundtruth-chest-xray-clone/manifests/output/output.manifest'
s3_train_data = 's3://{}/{}'.format(bucket, prefix)
print(s3_train_data)

bucket = 'raz-sagemaker'
prefix = 'annotation/chest_xray/validation/raz-groundtruth-chest-xray-clone-validation/manifests/output/output.manifest'
s3_validation_data = 's3://{}/{}'.format(bucket, prefix)
print(s3_validation_data)

s3://raz-sagemaker/models/object-detection-chest-xray/output
s3://raz-sagemaker/annotation/chest_xray/raz-groundtruth-chest-xray-clone/manifests/output/output.manifest
s3://raz-sagemaker/annotation/chest_xray/validation/raz-groundtruth-chest-xray-clone-validation/manifests/output/output.manifest


In [27]:
import boto3
import tempfile

s3 = boto3.resource('s3', region_name=sess.boto_region_name)
manifest_bucket = s3.Bucket('raz-sagemaker')
# Named manifest_object rather than object to avoid shadowing the Python builtin.
manifest_object = manifest_bucket.Object('annotation/chest_xray/raz-groundtruth-chest-xray-clone/manifests/output/output.manifest')
print(manifest_object)

# Download the manifest to a temporary file and count its lines:
# each line of the output manifest describes one labeled image.
with tempfile.NamedTemporaryFile() as tmp:
    manifest_object.download_fileobj(tmp)
    tmp.seek(0)
    num_training_samples = sum(1 for _ in tmp)

print(num_training_samples)

s3.Object(bucket_name='raz-sagemaker', key='annotation/chest_xray/raz-groundtruth-chest-xray-clone/manifests/output/output.manifest')
107
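
Each line of the output manifest counted above is a JSON record. The following is a minimal sketch of how one such line could be read, assuming the usual Ground Truth bounding-box layout (`source-ref` plus a label attribute named after the labeling job, holding `annotations` and `image_size`). The sample line, its S3 URI, and the box coordinates are made up for illustration, and `boxes_from_manifest_line` is a hypothetical helper, not part of any SDK.

```python
import json

# Label attribute name taken from the labeling job used in this notebook.
LABEL_ATTRIBUTE = "raz-groundtruth-chest-xray-clone"

# Hypothetical manifest line: the URI and the coordinates are invented.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/0001.jpeg",
    LABEL_ATTRIBUTE: {
        "annotations": [
            {"class_id": 0, "left": 120, "top": 80, "width": 200, "height": 150}
        ],
        "image_size": [{"width": 1024, "height": 768, "depth": 3}],
    },
})


def boxes_from_manifest_line(line, label_attribute):
    """Return (image_uri, [(class_id, xmin, ymin, xmax, ymax), ...])."""
    record = json.loads(line)
    boxes = []
    for ann in record[label_attribute]["annotations"]:
        xmin, ymin = ann["left"], ann["top"]
        # Convert left/top/width/height into corner coordinates.
        boxes.append((ann["class_id"], xmin, ymin,
                      xmin + ann["width"], ymin + ann["height"]))
    return record["source-ref"], boxes


image_uri, boxes = boxes_from_manifest_line(sample_line, LABEL_ATTRIBUTE)
print(image_uri, boxes)
```

The two keys read here, `source-ref` and the label attribute, are the same names passed as `attribute_names` when the training channels are defined.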


In [28]:
od_model = sagemaker.estimator.Estimator(training_image,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.p3.2xlarge',
                                         train_volume_size = 50,
                                         train_max_run = 360000,
                                         input_mode = 'Pipe',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)




At its core, the object detection algorithm is the Single Shot MultiBox Detector (SSD). It uses a base network (the base_network hyperparameter), which is typically VGG or ResNet; the Amazon SageMaker object detection algorithm currently supports VGG-16 and ResNet-50. It also exposes many hyperparameters that configure the training job. The next step is to set up these hyperparameters and the data channels for training the model. Consider the following example definition of hyperparameters; see the SageMaker Object Detection documentation for more details.

One of these hyperparameters is epochs. It defines how many passes we make over the dataset and therefore largely determines the training time of the algorithm. For the sake of demonstration, let us run only 30 epochs.

Details here: 
https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html

In [29]:
od_model.set_hyperparameters(base_network='resnet-50',
                             use_pretrained_model=1,
                             num_classes=1,
                             mini_batch_size=16,
                             epochs=30,
                             learning_rate=0.001,
                             lr_scheduler_step='10',
                             lr_scheduler_factor=0.1,
                             optimizer='sgd',
                             momentum=0.9,
                             weight_decay=0.0005,
                             overlap_threshold=0.5,
                             nms_threshold=0.45,
                             image_shape=512,
                             label_width=600,
                             num_training_samples=num_training_samples)
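
To get a feel for what these values imply, the sketch below estimates the number of mini-batches per epoch and the learning rate in effect at a few epochs. It assumes that a partial final batch still counts as one batch and that the reduction by lr_scheduler_factor applies from the epoch listed in lr_scheduler_step onward; the algorithm's exact boundary behavior may differ. `learning_rate_at` is a hypothetical helper written for this illustration.

```python
import math


def learning_rate_at(epoch, base_lr, steps, factor):
    """Learning rate at a given epoch under a step schedule.

    Assumption: each listed step epoch multiplies the rate by `factor`
    from that epoch onward.
    """
    reductions = sum(1 for step in steps if epoch >= step)
    return base_lr * factor ** reductions


# Values taken from the hyperparameter cell and the manifest line count.
num_samples = 107
mini_batch_size = 16
epochs = 30
steps = [int(s) for s in '10'.split(',')]

batches_per_epoch = math.ceil(num_samples / mini_batch_size)
total_batches = batches_per_epoch * epochs
print(batches_per_epoch, total_batches)

for epoch in (0, 9, 10, 29):
    print(epoch, learning_rate_at(epoch, 0.001, steps, 0.1))
```

With only 107 samples, each epoch is just a handful of mini-batches, so 30 epochs is a short run.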

In [34]:
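# Each channel reads an augmented manifest: 'source-ref' points at the image and
# the second attribute holds the Ground Truth labels for that image.
# record_wrapping='RecordIO' is added here on the assumption that it is required
# for image augmented manifests in Pipe mode, as in the AWS example notebooks.
train_data = sagemaker.session.s3_input(
    s3_train_data,
    distribution='FullyReplicated',
    content_type='application/x-recordio',
    record_wrapping='RecordIO',
    s3_data_type='AugmentedManifestFile',
    attribute_names=['source-ref', 'raz-groundtruth-chest-xray-clone'])
validation_data = sagemaker.session.s3_input(
    s3_validation_data,
    distribution='FullyReplicated',
    content_type='application/x-recordio',
    record_wrapping='RecordIO',
    s3_data_type='AugmentedManifestFile',
    attribute_names=['source-ref', 'raz-groundtruth-chest-xray-clone-validation'])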




In [None]:
data_channels = {'train': train_data, 'validation': validation_data}
print(data_channels)
od_model.fit(inputs=data_channels, logs=True)

{'train': <sagemaker.inputs.s3_input object at 0x7f3a65953eb8>, 'validation': <sagemaker.inputs.s3_input object at 0x7f3a659531d0>}
2020-07-10 15:28:20 Starting - Starting the training job...
2020-07-10 15:28:22 Starting - Launching requested ML instances.........

Once training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint, which allows us to make predictions (inference) with the model. Note that we don't have to host on the same instance (or type of instance) that we used to train. Training is a prolonged, compute-heavy job with compute and memory requirements that hosting typically does not have, so we can choose any instance type to host the model. In our case we trained on an ml.p3.2xlarge instance, but we host the model on the less expensive CPU instance ml.m4.xlarge. The endpoint deployment can be accomplished as follows:

In [None]:
object_detector = od_model.deploy(initial_instance_count = 1,
                                 instance_type = 'ml.m4.xlarge')

In [None]:
# Load a test image

In [None]:
import tempfile

import boto3
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

s3 = boto3.resource('s3', region_name='eu-central-1')
bucket = s3.Bucket('raz-sagemaker')
# Named test_object rather than object to avoid shadowing the Python builtin.
test_object = bucket.Object('ultrasound-jpeg/09-41-06_1.jpg')
print(test_object)

# Download the image to a temporary file and flush it before reading it back,
# so that imread never sees a partially written file.
with tempfile.NamedTemporaryFile(suffix='.jpg') as tmp:
    test_object.download_fileobj(tmp)
    tmp.flush()
    img = mpimg.imread(tmp.name)

plt.imshow(img)
plt.show()
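
Once an image like this is sent to the endpoint, the response still has to be interpreted. The sketch below assumes the JSON response format documented for the built-in object detection algorithm: a "prediction" list whose entries are [class_index, confidence, xmin, ymin, xmax, ymax], with coordinates given as fractions of the image width and height. `parse_detections` is a hypothetical helper, and the sample response is invented rather than real endpoint output.

```python
import json


def parse_detections(response_body, image_width, image_height, threshold=0.5):
    """Keep detections scoring at least `threshold` and scale boxes to pixels."""
    detections = json.loads(response_body)["prediction"]
    results = []
    for class_id, score, xmin, ymin, xmax, ymax in detections:
        if score < threshold:
            continue
        results.append({
            "class_id": int(class_id),
            "score": score,
            # Coordinates are assumed to be normalized to the range [0, 1].
            "box": (round(xmin * image_width), round(ymin * image_height),
                    round(xmax * image_width), round(ymax * image_height)),
        })
    return results


# Invented response for illustration only.
sample_response = json.dumps({"prediction": [
    [0.0, 0.9, 0.25, 0.5, 0.75, 1.0],
    [0.0, 0.2, 0.0, 0.0, 0.5, 0.5],
]})

kept = parse_detections(sample_response, image_width=1000, image_height=800)
print(kept)
```

In this notebook the response body would come from the deployed object_detector rather than from a literal string. The threshold of 0.5 is an arbitrary choice for the sketch and would need tuning against the validation data.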