# Amazon SageMaker Object Detection - Satellite Images

## Introduction

Object detection is the process of identifying and localizing objects in an image. A typical object detection solution takes an image as input and outputs a bounding box and a classification for each detected object. Before we have this solution, we need to acquire and process a training dataset, create and set up a training job so that the algorithm can learn from the dataset, and then host the trained model as an endpoint to which we can supply query images.

## Setup

To train the Object Detection algorithm on Amazon SageMaker, we need to set up and authenticate the use of AWS services. To begin with, we need an AWS account role with SageMaker access. This role gives SageMaker access to your data in S3 and is obtained automatically from the role used to start the notebook.

In [36]:
%%time
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
print(role)

sess = sagemaker.Session()

arn:aws:iam::431615879134:role/sagemaker-test-role
CPU times: user 118 ms, sys: 0 ns, total: 118 ms
Wall time: 185 ms


We also need the S3 bucket that you want to use for training and for storing the trained model artifacts. This notebook uses the default bucket that comes with the SageMaker session. To keep the naming clean, you can instead use a custom bucket, which must already exist.

In [18]:
# data bucket
bucket = sess.default_bucket()
prefix = 'XView-ObjectDetection'

In [19]:
# training image location for object detection algorithm 
from sagemaker.amazon.amazon_estimator import get_image_uri
training_image = get_image_uri(sess.boto_region_name, 'object-detection', repo_version="latest")
print (training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/object-detection:latest


## Data

The data for this notebook comes from the xView competition and is described in https://arxiv.org/pdf/1802.07856.pdf. Other interesting image datasets could also be used. For example:

* spacenet
* COCO

In [5]:
# val data
!aws s3 cp s3://random.datasets.sengstacken/xview_satellite_images/val.tgz ./data/

download: s3://random.datasets.sengstacken/xview_satellite_images/val.tgz to data/val.tgz


In [6]:
!tar -xvzf ./data/val.tgz -C ./data

./val_images/
./val_images/1038.tif
./val_images/1040.tif
./val_images/1043.tif
./val_images/1054.tif
./val_images/1060.tif
./val_images/1062.tif
./val_images/1064.tif
./val_images/1066.tif
./val_images/1069.tif
./val_images/1071.tif
./val_images/1073.tif
./val_images/1075.tif
./val_images/108.tif
./val_images/1082.tif
./val_images/1097.tif
./val_images/1098.tif
./val_images/11.tif
./val_images/1102.tif
./val_images/1115.tif
./val_images/1116.tif
./val_images/1117.tif
./val_images/1122.tif
./val_images/1134.tif
./val_images/1138.tif
./val_images/1148.tif
./val_images/1159.tif
./val_images/1161.tif
./val_images/1177.tif
./val_images/1194.tif
./val_images/12.tif
./val_images/1207.tif
./val_images/121.tif
./val_images/1213.tif
./val_images/122.tif
./val_images/1234.tif
./val_images/1235.tif
./val_images/1254.tif
./val_images/1258.tif
./val_images/1263.tif
./val_images/1267.tif
./val_images/1282.tif
./val_images/1308.tif
./val_images/1326.tif
./val_images/1333.tif
./val_images/1358.tif
./v

In [7]:
# train data
!aws s3 cp s3://random.datasets.sengstacken/xview_satellite_images/train.tgz ./data/

^Cmpleted 426.5 MiB/14.4 GiB (157.9 MiB/s) with 1 file(s) remaining
Completed 429.0 MiB/14.4 GiB (149.6 MiB/s) with 1 file(s) remaining

In [8]:
!tar -xvzf ./data/train.tgz -C ./data/

./train_images/
./train_images/10.tif
./train_images/._100.tif
./train_images/100.tif
./train_images/._102.tif
./train_images/102.tif
./train_images/1036.tif
./train_images/1037.tif
./train_images/104.tif
./train_images/1042.tif
./train_images/1044.tif
./train_images/1046.tif
^C


In [9]:
# train labels
!aws s3 cp s3://random.datasets.sengstacken/xview_satellite_images/train_labels.tgz ./data/

download: s3://random.datasets.sengstacken/xview_satellite_images/train_labels.tgz to data/train_labels.tgz


In [10]:
!tar -xvzf ./data/train_labels.tgz -C ./data/

./xView_train.geojson


## Data Preparation

Before using this dataset, we need to perform some data cleaning. The algorithm expects the dataset in a particular JSON format. The xView dataset, while containing annotations in JSON, does not follow our specifications. We will use this as an opportunity to introduce our JSON format by performing this conversion. To begin with, we create appropriate directories for training images, validation images, and the annotation files for both.

### Prepare dataset 

The Amazon SageMaker Object Detection algorithm expects labels to be indexed from `0`. It also expects labels to be unique and successive, without skipping any integers. For instance, if there are ten classes, the algorithm expects the labels to be only in the set `[0,1,2,3,4,5,6,7,8,9]`.
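
As a quick illustration of this requirement, the following minimal sketch checks whether a collection of labels is contiguous and starts at `0`. The helper name `labels_are_contiguous` is hypothetical and not part of SageMaker or the xView utilities.

```python
def labels_are_contiguous(labels):
    """Return True if the distinct labels are exactly 0, 1, ..., n-1."""
    unique = sorted(set(labels))
    return unique == list(range(len(unique)))


# Ten classes indexed 0..9 satisfy the requirement.
print(labels_are_contiguous([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))  # True

# Labels that start above 0 and skip values do not.
print(labels_are_contiguous([11, 12, 13, 15, 17]))  # False
```

A check like this can be run on the mapped class ids before writing the annotation files.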

In the xView dataset, unfortunately, the labels do not satisfy this requirement. Some indices are skipped and the labels do not start from `0` (the lowest class id used in this notebook is `11`). We therefore need a mapper that converts this index system to the one we require. Let us create a generic mapper that could also be used for other datasets that might have nonunique or even string labels. All we need is a dictionary that provides a key-value mapping from each original label to the label that we require.

In [None]:
!mkdir -p ./data/jpeg_images

In [None]:
# Convert TIFF (and BMP) images to JPEG
import os
from PIL import Image

src_dir = "./data/train_images"
dst_dir = "./data/jpeg_images"
img_files = []
for infile in os.listdir(src_dir):
    stem, ext = os.path.splitext(infile)
    # Skip macOS "._*" resource-fork entries and anything that is not .tif/.bmp
    if infile.startswith("._") or ext not in (".tif", ".bmp"):
        continue
    outfile = os.path.join(dst_dir, stem + ".jpeg")
    try:
        im = Image.open(os.path.join(src_dir, infile))
        im.convert("RGB").save(outfile, "JPEG", quality=80)
        img_files.append(outfile)
        print('saved - ', outfile)
    except Exception as err:
        # Report the file and the reason instead of silently swallowing the error
        print('failed - ', infile, err)

In [21]:
import os
img_files = os.listdir('./data/jpeg_images/')

In [37]:
rlist = {11:"Fixed-wing Aircraft",
         12:"Small Aircraft", 
         13:"Passenger/Cargo Plane",
         15:"Helicopter",
         17:"Passenger Vehicle",
         18:"Small Car",
         19:"Bus",
         20:"Pickup Truck",
         21:"Utility Truck",
         23:"Truck",
         24:"Cargo Truck",
         25:"Truck Tractor w/ Box Trailer",
         26:"Truck Tractor",
         27:"Trailer",
         28:"Truck Tractor w/ Flatbed Trailer",
         29:"Truck Tractor w/ Liquid Tank",
         32:"Crane Truck",
         33:"Railway Vehicle",
         34:"Passenger Car",
         35:"Cargo/Container Car",
         36:"Flat Car",
         37:"Tank car",
         38:"Locomotive",
         40:"Maritime Vessel",
         41:"Motorboat",
         42:"Sailboat",
         44:"Tugboat",
         45:"Barge",
         47:"Fishing Vessel",
         49:"Ferry",
         50:"Yacht",
         51:"Container Ship",
         52:"Oil Tanker",
         53:"Engineering Vehicle",
         54:"Tower crane",
         55:"Container Crane",
         56:"Reach Stacker",
         57:"Straddle Carrier",
         59:"Mobile Crane",
         60:"Dump Truck",
         61:"Haul Truck",
         62:"Scraper/Tractor",
         63:"Front loader/Bulldozer",
         64:"Excavator",
         65:"Cement Mixer",
         66:"Ground Grader",
         71:"Hut/Tent",
         72:"Shed",
         73:"Building",
         74:"Aircraft Hangar",
         75:"Unknown",
         76:"Damaged Building",
         77:"Facility",
         79:"Construction Site",
         82:"Unknown",
         83:"Vehicle Lot",
         84:"Helipad",
         86:"Storage Tank",
         89:"Shipping container lot",
         91:"Shipping Container",
         93:"Pylon",
         94:"Tower"}

def get_mapper():
    # Original xView class ids, in ascending order.
    # Note: 22 is listed here but has no name in rlist, so this list has 63 entries.
    original_list = [11, 12, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33,
                    34, 35, 36, 37, 38, 40, 41, 42, 44, 45, 47, 49, 50, 51, 52, 53, 54,
                    55, 56, 57, 59, 60, 61, 62, 63, 64, 65, 66, 71, 72, 73, 74, 75, 76, 77, 79, 82, 83, 84,
                    86, 89, 91, 93, 94]
    # Map each original id to its position, giving contiguous labels starting at 0
    return {orig: idx for idx, orig in enumerate(original_list)}

Let us use this dictionary to create a look-up method, and do so in a way that works with any dictionary.

In [38]:
def get_mapper_fn(mapping):
    # 'mapping' is any dict from original label to required label
    def mapper(in_category):
        return mapping[in_category]
    return mapper

fix_index_mapping = get_mapper_fn(get_mapper())
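
The same idea extends to datasets with string or repeated labels. The sketch below is an illustrative variant, not the notebook's own method: the helper `build_label_mapper` is a hypothetical name, and it assigns indices in first-seen order rather than from a fixed list.

```python
def build_label_mapper(original_labels):
    """Map each distinct label to an index 0..n-1, in first-seen order."""
    mapping = {}
    for label in original_labels:
        if label not in mapping:
            # The next free index equals the number of labels seen so far
            mapping[label] = len(mapping)
    return mapping


demo = build_label_mapper(["bus", "car", "bus", "truck"])
print(demo)  # {'bus': 0, 'car': 1, 'truck': 2}
```

The resulting dictionary can be passed to `get_mapper_fn` in the same way as the output of `get_mapper()`.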

The method `fix_index_mapping` is essentially a look-up method, which we can use to convert labels. Let us now iterate over every annotation in the dataset and prepare our data. Note how the keywords are created and a structure is established. For more information on the JSON format details, refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html).

In [23]:
!pip install tqdm



In [24]:
!pip install -U scikit-image

Collecting scikit-image
  Downloading scikit_image-0.17.2-cp36-cp36m-manylinux1_x86_64.whl (12.4 MB)
Collecting PyWavelets>=1.1.1
  Downloading PyWavelets-1.1.1-cp36-cp36m-manylinux1_x86_64.whl (4.4 MB)
Collecting tifffile>=2019.7.26
  Downloading tifffile-2020.7.4-py3-none-any.whl (138 kB)
Installing collected packages: PyWavelets, tifffile, scikit-image
  Attempting uninstall: PyWavelets
    Found existing installation: PyWavelets 0.5.2
    Uninstalling PyWavelets-0.5.2:
      Successfully uninstalled PyWavelets-0.5.2
  Attempting uninstall: scikit-image
    Found existing installation: scikit-image 0.16.2
    Uninstalling scikit-image-0.16.2:
      Successfully uninstalled scikit-image-0.16.2
Successfully installed PyWavelets-1.1.1 scikit-image-0.17.2 tifffile-2020.7.4


In [16]:
import data_utilities.wv_util as wv
import data_utilities.aug_util as aug

coords, chips, classes = wv.get_labels('./data/xView_train.geojson')




100%|██████████| 601937/601937 [00:02<00:00, 220708.53it/s]


In [None]:
!mkdir ./data/jpeg_chips
!mkdir ./data/generated

In [None]:
import os
import numpy as np
from PIL import Image
import json
from tqdm import trange

for i in trange(len(img_files)):
    if not img_files[i].endswith('.jpeg'):
        continue

    # File stem without the extension, e.g. '1127' for '1127.jpeg'
    stem = os.path.splitext(img_files[i])[0]
    arr = wv.get_image('./data/jpeg_images/' + img_files[i])

    # Labels are keyed by the original .tif file name
    match_name = stem + '.tif'
    c_coords = coords[chips == match_name]
    c_classes = classes[chips == match_name].astype(np.int64)

    # Cut the image into 500x500 chips, with the boxes and classes of each chip
    c_img, c_box, c_cls = wv.chip_image(img=arr, coords=c_coords, classes=c_classes, shape=(500, 500))

    for j in range(len(c_img)):
        # Keep only chips whose first class is non-zero, i.e. chips that carry annotations
        if len(c_cls) >= 1 and c_cls[j][0] != 0:
            chip_stem = stem + "_" + str(j)
            save_name = chip_stem + '.jpeg'
            Image.fromarray(c_img[j]).save('./data/jpeg_chips/' + save_name)

            line = {}
            line['file'] = save_name
            line['image_size'] = [{
                'width': 500,
                'height': 500,
                'depth': 3
            }]
            line['annotations'] = []
            line['categories'] = []

            # Each box is [xmin, ymin, xmax, ymax]; the JSON format uses top, left, width, height
            for k in range(len(c_box[j])):
                class_id = int(fix_index_mapping(c_cls[j][k]))
                line['annotations'].append({
                    'class_id': class_id,
                    'top': int(c_box[j][k][1]),
                    'left': int(c_box[j][k][0]),
                    'width': int(c_box[j][k][2] - c_box[j][k][0]),
                    'height': int(c_box[j][k][3] - c_box[j][k][1])
                })
                line['categories'].append({
                    'class_id': class_id,
                    'name': rlist.get(c_cls[j][k])
                })

            with open(os.path.join('./data/generated', chip_stem + '.json'), 'w') as fp:
                json.dump(line, fp)
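
The box conversion in the loop above can be looked at in isolation. This is a minimal sketch that assumes, as the loop's arithmetic implies, that each box is stored as `[xmin, ymin, xmax, ymax]` in pixels. The helper name `corners_to_annotation` is hypothetical.

```python
def corners_to_annotation(class_id, xmin, ymin, xmax, ymax):
    """Convert corner coordinates to the top/left/width/height annotation entry."""
    return {
        'class_id': int(class_id),
        'top': int(ymin),
        'left': int(xmin),
        'width': int(xmax - xmin),
        'height': int(ymax - ymin),
    }


example = corners_to_annotation(3, 10, 20, 110, 70)
print(example)
# {'class_id': 3, 'top': 20, 'left': 10, 'width': 100, 'height': 50}
```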

In [25]:
len(os.listdir('./data/generated/'))

14673

In [26]:
file_names = []
path = './data/jpeg_chips/'

folder = os.fsencode(path)

for file in os.listdir(folder):
    filename = os.fsdecode(file)
    file_names.append(filename)

After removing the chips without annotations, we have 14,673 annotated chips. Let us split this dataset into training and validation sets, with which our algorithm will train, and copy the files to their respective folders.

In [27]:
len(file_names)

14673

In [28]:
from sklearn.model_selection import train_test_split

train = 0.8
train_list,val_list= train_test_split(file_names,train_size=train,random_state=1234)

In [None]:
!mkdir './data/train/'
!mkdir './data/train_annotation'
!mkdir './data/validation/'
!mkdir './data/validation_annotation/'
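
If the notebook is re-run, `mkdir` reports an error for directories that already exist. A possible alternative, sketched here under the assumption that the four folder names stay as above, is `os.makedirs` with `exist_ok=True`. The helper `make_dataset_dirs` is a hypothetical name, and the demonstration uses a temporary directory instead of `./data`.

```python
import os
import tempfile


def make_dataset_dirs(root):
    """Create the four channel folders under root; safe to call repeatedly."""
    names = ['train', 'train_annotation', 'validation', 'validation_annotation']
    paths = [os.path.join(root, name) for name in names]
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths


# Demonstrated in a temporary directory; in the notebook the root would be './data'.
with tempfile.TemporaryDirectory() as tmp:
    created = make_dataset_dirs(tmp)
    make_dataset_dirs(tmp)  # a second call does not raise
    print([os.path.basename(path) for path in created])
```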

In [39]:
file_names


['1193_15.jpeg',
 '42_0.jpeg',
 '203_28.jpeg',
 '325_15.jpeg',
 '47_4.jpeg',
 '1465_36.jpeg',
 '756_4.jpeg',
 '1910_5.jpeg',
 '2542_14.jpeg',
 '1690_4.jpeg',
 '1095_8.jpeg',
 '5_13.jpeg',
 '2436_26.jpeg',
 '1124_11.jpeg',
 '2011_21.jpeg',
 '2305_10.jpeg',
 '2032_22.jpeg',
 '1831_24.jpeg',
 '145_22.jpeg',
 '871_40.jpeg',
 '1701_11.jpeg',
 '1985_19.jpeg',
 '92_10.jpeg',
 '629_15.jpeg',
 '2398_8.jpeg',
 '2370_14.jpeg',
 '805_4.jpeg',
 '73_17.jpeg',
 '2497_14.jpeg',
 '2012_45.jpeg',
 '1124_6.jpeg',
 '1912_24.jpeg',
 '379_24.jpeg',
 '1795_25.jpeg',
 '1509_29.jpeg',
 '2214_9.jpeg',
 '1284_12.jpeg',
 '128_18.jpeg',
 '2371_14.jpeg',
 '1128_33.jpeg',
 '1932_48.jpeg',
 '2542_19.jpeg',
 '105_17.jpeg',
 '1181_12.jpeg',
 '5_12.jpeg',
 '2519_20.jpeg',
 '112_30.jpeg',
 '2515_12.jpeg',
 '805_8.jpeg',
 '1403_30.jpeg',
 '1431_13.jpeg',
 '1150_23.jpeg',
 '1807_5.jpeg',
 '1807_34.jpeg',
 '1245_20.jpeg',
 '523_7.jpeg',
 '42_13.jpeg',
 '1441_9.jpeg',
 '1061_27.jpeg',
 '1472_10.jpeg',
 '2565_39.jpeg',
 '18_1

In [None]:
import os
import shutil

# Copy training images and their annotations to the training folders
for i in train_list:
    if i.endswith('.jpeg'):
        stem = os.path.splitext(i)[0]
        shutil.copy('./data/jpeg_chips/' + i, './data/train/')
        shutil.copy('./data/generated/' + stem + '.json', './data/train_annotation/')

# Copy validation images and their annotations to the validation folders
for i in val_list:
    if i.endswith('.jpeg'):
        stem = os.path.splitext(i)[0]
        shutil.copy('./data/jpeg_chips/' + i, './data/validation/')
        shutil.copy('./data/generated/' + stem + '.json', './data/validation_annotation/')

In [30]:
len(train_list)

11738

In [31]:
len(os.listdir('./data/train/'))

11739
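
The directory listing shows one more entry (11739) than the training list (11738). The notebook does not say what the extra entry is. A check along the following lines could help find it. This is only a sketch: `find_unpaired` is a hypothetical helper, and the sample names, including `.ipynb_checkpoints`, are purely illustrative.

```python
import os


def find_unpaired(image_names, annotation_names):
    """Report images without annotations, annotations without images, and stray entries."""
    image_stems = {os.path.splitext(n)[0] for n in image_names if n.endswith('.jpeg')}
    annotation_stems = {os.path.splitext(n)[0] for n in annotation_names if n.endswith('.json')}
    return {
        'images_without_annotation': sorted(image_stems - annotation_stems),
        'annotations_without_image': sorted(annotation_stems - image_stems),
        'non_image_entries': sorted(n for n in image_names if not n.endswith('.jpeg')),
    }


# Illustrative listing; in the notebook these would come from os.listdir(...)
report = find_unpaired(['1_0.jpeg', '1_1.jpeg', '.ipynb_checkpoints'], ['1_0.json'])
print(report)
```

In the notebook, the two arguments would be `os.listdir('./data/train/')` and `os.listdir('./data/train_annotation/')`.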

### Upload to S3
The next step is to upload the data to the S3 bucket, from which the algorithm can read it. We do this using multiple channels. Channels are simply directories in the bucket that differentiate between training and validation data. Let us call these directories `train` and `validation`. We therefore need four channels: two for the images and two for the annotations, the latter named with the suffix `_annotation`.

In [40]:
train_channel = prefix + '/train'
validation_channel = prefix + '/validation'
train_annotation_channel = prefix + '/train_annotation'
validation_annotation_channel = prefix + '/validation_annotation'

In [None]:
%%time

sess.upload_data(path='./data/train', bucket=bucket, key_prefix=train_channel)
sess.upload_data(path='./data/validation', bucket=bucket, key_prefix=validation_channel)
sess.upload_data(path='./data/train_annotation', bucket=bucket, key_prefix=train_annotation_channel)
sess.upload_data(path='./data/validation_annotation', bucket=bucket, key_prefix=validation_annotation_channel)

In [41]:
s3_train_data = 's3://{}/{}'.format(bucket, train_channel)
s3_validation_data = 's3://{}/{}'.format(bucket, validation_channel)
s3_train_annotation = 's3://{}/{}'.format(bucket, train_annotation_channel)
s3_validation_annotation = 's3://{}/{}'.format(bucket, validation_annotation_channel)

Next we need to set up an output location in S3, where the model artifacts will be stored. These artifacts are the output of the algorithm's training job.

In [42]:
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)

## Training
Now that we are done with all the setup that is needed, we are ready to train our object detector. To begin, let us create a `sagemaker.estimator.Estimator` object. This estimator will launch the training job.

In [43]:
od_model = sagemaker.estimator.Estimator(training_image,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.p3.16xlarge',
                                         train_volume_size = 50,
                                         train_max_run = 360000, 
                                         input_mode = 'File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)

The object detection algorithm at its core is the [Single-Shot Multi-Box detection algorithm (SSD)](https://arxiv.org/abs/1512.02325). This algorithm uses a `base_network`, which is typically a [VGG](https://arxiv.org/abs/1409.1556) or a [ResNet](https://arxiv.org/abs/1512.03385). The Amazon SageMaker object detection algorithm currently supports VGG-16 and ResNet-50. It also has many hyperparameters that help configure the training job. The next step in our training is to set up these hyperparameters and the data channels for training the model. Consider the following example definition of hyperparameters. See the SageMaker Object Detection [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) for more details on the hyperparameters.

One of the hyperparameters here, for instance, is `epochs`. This defines how many passes over the dataset we make and determines the training time of the algorithm. Here we run `200` epochs; a smaller value shortens the run for demonstration purposes.

In [47]:
od_model.set_hyperparameters(base_network='resnet-50',
                             use_pretrained_model=1,
                             num_classes=62,
                             mini_batch_size=32,
                             epochs=200,
                             learning_rate=0.001,
                             lr_scheduler_step='10',
                             lr_scheduler_factor=0.1,
                             optimizer='rmsprop',
                             momentum=0.9,
                             weight_decay=0.0005,
                             overlap_threshold=0.5,
                             nms_threshold=0.45,
                             image_shape=500,
                             label_width=6000,
                             num_training_samples=11738)
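
To get a feel for how these values interact, the sketch below estimates the number of mini-batches per epoch and the learning rate at a given epoch. It assumes a simple step schedule in which the rate is multiplied by `lr_scheduler_factor` at each epoch listed in `lr_scheduler_step`, and that a final partial batch counts as a batch. Both are assumptions about the algorithm's internals, and the helper names are hypothetical.

```python
import math


def batches_per_epoch(num_training_samples, mini_batch_size):
    """Number of mini-batches in one pass, counting a final partial batch (assumption)."""
    return math.ceil(num_training_samples / mini_batch_size)


def lr_at_epoch(epoch, base_lr, steps, factor):
    """Learning rate under a step schedule: one multiplication per step epoch reached."""
    reductions = sum(1 for step in steps if epoch >= step)
    return base_lr * factor ** reductions


# lr_scheduler_step is a comma-separated string of epochs
steps = [int(s) for s in '10'.split(',')]

print(batches_per_epoch(11738, 32))  # 367
print(lr_at_epoch(5, 0.001, steps, 0.1))   # before the step: base rate
print(lr_at_epoch(10, 0.001, steps, 0.1))  # after the step: roughly a tenth of the base rate
```

With a single step at epoch 10 and 200 epochs, most of the run would use the reduced rate under this assumed schedule.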

Now that the hyperparameters are set up, let us prepare the handshake between our data channels and the algorithm. To do this, we need to create `sagemaker.session.s3_input` objects from our data channels. These objects are then put in a simple dictionary, which the algorithm consumes. Notice that we use `image/jpeg` as the `content_type` for both the image channels and the annotation channels. Notice also that, unlike the [RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb), we use four channels here.

In [45]:
train_data = sagemaker.session.s3_input(s3_train_data, distribution='FullyReplicated', 
                        content_type='image/jpeg', s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(s3_validation_data, distribution='FullyReplicated', 
                             content_type='image/jpeg', s3_data_type='S3Prefix')
train_annotation = sagemaker.session.s3_input(s3_train_annotation, distribution='FullyReplicated', 
                             content_type='image/jpeg', s3_data_type='S3Prefix')
validation_annotation = sagemaker.session.s3_input(s3_validation_annotation, distribution='FullyReplicated', 
                             content_type='image/jpeg', s3_data_type='S3Prefix')

data_channels = {'train': train_data, 'validation': validation_data, 
                 'train_annotation': train_annotation, 'validation_annotation':validation_annotation}

We have our `Estimator` object, we have set its hyperparameters, and we have our data channels linked with the algorithm. The only remaining thing to do is to train the algorithm, which the following cell does. Training involves a few steps. First, the instances that we requested while creating the `Estimator` object are provisioned and set up with the appropriate libraries. Then, the data from our channels is downloaded onto the instances. Once this is done, the training job begins. The provisioning and data downloading take time, depending on the size of the data, so it might be a few minutes before we start getting logs for our training job. The logs also print the mean average precision (mAP) on the validation data, among other losses, after every epoch (one full pass over the dataset). This metric is a proxy for the quality of the algorithm.
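
Metrics such as mAP rest on deciding whether a predicted box matches a ground-truth box, which is commonly done with intersection over union (IoU). The following is a minimal sketch of the usual IoU definition for boxes given as `(xmin, ymin, xmax, ymax)`. It is illustrative only and is not taken from the SageMaker implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping by half: intersection 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```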

Once the job has finished a "Job complete" message will be printed. The trained model can be found in the S3 bucket that was setup as `output_path` in the estimator.

In [None]:
od_model.fit(inputs=data_channels, logs=True)

2020-07-08 20:51:09 Starting - Starting the training job...
2020-07-08 20:51:11 Starting - Launching requested ML instances.........
2020-07-08 20:53:00 Starting - Preparing the instances for training....................................
2020-07-08 20:58:49 Downloading - Downloading input data....................................
2020-07-08 21:05:08 Training - Training image download completed. Training in progress.
Docker entrypoint called with argument(s): train
[07/08/2020 21:05:12 INFO 139773230958400] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/default-input.json: {u'label_width': u'350', u'early_stopping_min_epochs': u'10', u'epochs': u'30', u'overlap_threshold': u'0.5', u'lr_scheduler_factor': u'0.1', u'_num_kv_servers': u'auto', u'weight_decay': u'0.0005', u'mini_batch_size': u'32', u'use_pretrained_model': u'0', u'freeze_layer_pattern': u'', u'lr_scheduler_step': u'', u'early_stopping': u'False', u'early_stopping_patience': u

## Hosting
Once the training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint. This will allow us to make predictions (or inference) from the model. Note that we don't have to host on the same instance (or type of instance) that we used to train. Training is a prolonged, compute-heavy job with compute and memory requirements that hosting typically does not share. We can choose any type of instance we want to host the model. In our case we trained on an `ml.p3.16xlarge` instance, but we host the model on the less expensive CPU instance, `ml.m4.xlarge`. The endpoint deployment can be accomplished as follows:

In [None]:
object_detector = od_model.deploy(initial_instance_count = 1,
                                  instance_type = 'ml.m4.xlarge')

In [None]:
sess.create_endpoint_config_from_existing()

## Inference
Now that the trained model is deployed at an endpoint that is up-and-running, we can use this endpoint for inference. 

In [None]:
# Chip to run inference on (a 500x500 tile cut from image 1127)
file_name = './data/jpeg_chips/1127_22.jpeg'

with open(file_name, 'rb') as image:
    f = image.read()
    b = bytearray(f)

Let us use our endpoint to try to detect objects within this image. Since the image is a `jpeg`, we use the appropriate `content_type` to run the prediction job. The endpoint returns a JSON response that we can simply load and peek into.

In [None]:
import json

object_detector.content_type = 'image/jpeg'
results = object_detector.predict(b)
detections = json.loads(results)
print (detections)

The results are in a format that is similar to the input .lst file (see the [RecordIO Notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb) for more details on the .lst file definition), with the addition of a confidence score for each detected object. The format of the output can be represented as `[class_index, confidence_score, xmin, ymin, xmax, ymax]`. Typically, we don't consider low-confidence predictions.
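
As a small illustration of working with this output, the sketch below drops low-confidence rows and converts the remaining boxes to pixels. It assumes the coordinates are fractions of the image width and height, which is what the visualization code in this notebook implies by multiplying them by the image size. The helper `filter_and_scale` and the sample rows are hypothetical.

```python
def filter_and_scale(predictions, image_width, image_height, thresh=0.5):
    """Keep detections with score >= thresh and convert boxes to pixel coordinates."""
    kept = []
    for class_index, score, x0, y0, x1, y1 in predictions:
        if score < thresh:
            continue
        kept.append({
            'class_index': int(class_index),
            'score': score,
            'box': (int(x0 * image_width), int(y0 * image_height),
                    int(x1 * image_width), int(y1 * image_height)),
        })
    return kept


# Made-up rows in the [class_index, confidence_score, xmin, ymin, xmax, ymax] layout
sample = [[17.0, 0.9, 0.25, 0.5, 0.75, 1.0],
          [18.0, 0.05, 0.0, 0.0, 1.0, 1.0]]
print(filter_and_scale(sample, 500, 500))
# [{'class_index': 17, 'score': 0.9, 'box': (125, 250, 375, 500)}]
```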

We provide additional scripts to easily visualize the detection outputs. You can visualize the high-confidence predictions with bounding boxes by filtering out low-confidence detections using the scripts below:

In [None]:
def visualize_chip(img_file, dets, classes=[]):
        """
        visualize the ground-truth annotations of one chip
        Parameters:
        ----------
        img_file : str
            path to the chip image
        dets : dict
            annotation JSON of the chip; dets['annotations'] is a list of
            dicts with 'class_id', 'top', 'left', 'width' and 'height' in pixels
        classes : tuple or list of str
            class names (currently unused)
        """
        import matplotlib.pyplot as plt
        import matplotlib.image as mpimg

        img = mpimg.imread(img_file)
        plt.figure(figsize=(20,20))
        plt.axis('off')
        plt.imshow(img)
        for det in dets['annotations']:
            print(det)
            # Annotations are stored in pixel coordinates, so no scaling is needed
            x0 = det['left']
            y0 = det['top']
            width = det['width']
            height = det['height']

            rect = plt.Rectangle((x0, y0), width,
                                 height, fill=False,
                                 edgecolor='red',
                                 linewidth=3.5)
            plt.gca().add_patch(rect)

        plt.show()

In [None]:
import matplotlib.pyplot as plt
N=62
# define the colormap
cmap = plt.cm.jet
# extract all colors from the .jet map
cmaplist = [cmap(i) for i in range(cmap.N)]
# create the new map
cmap = cmap.from_list('Custom cmap', cmaplist, cmap.N)

In [None]:
def visualize_detection(img_file, dets, classes=[], thresh=0.5):
        """
        visualize detections in one image
        Parameters:
        ----------
        img_file : str
            path to the image file
        dets : numpy.array
            ssd detections, numpy.array([[id, score, x1, y1, x2, y2]...])
            each row is one object
        classes : tuple or list of str
            class names
        thresh : float
            score threshold
        """
        import random
        import matplotlib.pyplot as plt
        import matplotlib.image as mpimg

        img=mpimg.imread(img_file)
        plt.figure(figsize=(20,20))
        plt.axis('off')
        plt.imshow(img)
        height = img.shape[0]
        width = img.shape[1]
        colors = dict()
        for det in dets:
            print(det)
            (klass, score, x0, y0, x1, y1) = det
            if score < thresh:
                continue
            cls_id = int(klass)
            if cls_id not in colors:
                colors[cls_id] = (random.random(), random.random(), random.random())
                #colors[cls_id] = cmaplist[cls_id][1:]
            xmin = int(x0 * width)
            ymin = int(y0 * height)
            xmax = int(x1 * width)
            ymax = int(y1 * height)
            rect = plt.Rectangle((xmin, ymin), xmax - xmin,
                                 ymax - ymin, fill=False,
                                 edgecolor=colors[cls_id],
                                 linewidth=3.5)
            plt.gca().add_patch(rect)

        plt.show()

In [None]:
def visualize_training(img_file, img_class, img_bbox, classes=[]):
        """
        visualize the ground-truth boxes of one full-size image
        Parameters:
        ----------
        img_file : str
            path to the image file
        img_class : numpy.array
            class id of each box
        img_bbox : numpy.array
            boxes in pixel coordinates, numpy.array([[x1, y1, x2, y2]...])
            each row is one object
        classes : tuple or list of str
            class names
        """
        import random
        import matplotlib.pyplot as plt
        import matplotlib.image as mpimg
        
        img=mpimg.imread(img_file)
        plt.figure(figsize=(20,20))
        plt.axis('off')
        plt.imshow(img)
        height = 1#img.shape[0]
        width = 1#img.shape[1]
        colors = dict()
        
        for p,det in enumerate(img_bbox):
            (x0, y0, x1, y1) = det
            klass = img_class[p]
            cls_id = int(klass)
            if cls_id not in colors:
                colors[cls_id] = (random.random(), random.random(), random.random())
            xmin = int(x0 * width)
            ymin = int(y0 * height)
            xmax = int(x1 * width)
            ymax = int(y1 * height)
            rect = plt.Rectangle((xmin, ymin), xmax - xmin,
                                 ymax - ymin, fill=False,
                                 edgecolor=colors[cls_id],
                                 linewidth=3.5)
            plt.gca().add_patch(rect)

        plt.show()

To achieve better detection results, you can try using more data from the dataset and training the model for more epochs. Tuning the hyperparameters, such as `mini_batch_size`, `learning_rate`, and `optimizer`, also helps to get a better detector.

In [1]:
file_name = './data/jpeg_images/1127.jpeg'
# Labels are keyed by the original .tif file name
tif_name = os.path.splitext(os.path.basename(file_name))[0] + '.tif'
fclasses = classes[chips == tif_name].astype(np.int64)
fbbox = coords[chips == tif_name]
visualize_training(file_name, fclasses, fbbox, list(rlist.values()))


In [None]:
file_name = './data/jpeg_chips/1127_22.jpeg'

import json
with open('./data/generated/1127_22.json') as f:
    chip_data = json.load(f)

visualize_chip(file_name, chip_data, list(rlist.values()))

In [None]:
%matplotlib inline

# Only detections with a confidence score of at least the threshold are plotted.
threshold = 0.16

# Visualize the detections.
visualize_detection(file_name, detections['prediction'], list(rlist.values()), threshold)

## Delete the Endpoint
Having an endpoint running will incur some costs. Therefore as a clean-up job, we should delete the endpoint.

In [None]:
sagemaker.Session().delete_endpoint(object_detector.endpoint)

## Start Training from a saved model

In [None]:
# Given the base estimator, create a new one for incremental training

incr_od_model2 = sagemaker.estimator.Estimator(training_image,
                                         role, 
                                         train_instance_count=2, 
                                         train_instance_type='ml.p3.8xlarge',
                                         train_volume_size = 50,
                                         train_max_run = 360000, 
                                         input_mode = 'File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess,
                                         model_uri=od_model.model_data)  # This parameter will ingest the previous job's model as a new channel

incr_od_model2.set_hyperparameters(base_network='resnet-50',
                             use_pretrained_model=0,
                             num_classes=62,
                             mini_batch_size=8,
                             epochs=100,
                             learning_rate=0.001,
                             lr_scheduler_step='10',
                             lr_scheduler_factor=0.1,
                             optimizer='sgd',
                             momentum=0.9,
                             weight_decay=0.0005,
                             overlap_threshold=0.5,
                             nms_threshold=0.45,
                             image_shape=500,
                             label_width=6000,
                             num_training_samples=11738)

incr_od_model2.fit(inputs=data_channels, logs=True)