# TensorFlow Object Detection API and Amazon SageMaker

In [4]:
#ssd_resnet50_v1_fpn

In this notebook, you will train and evaluate different models using the [TensorFlow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/) and [Amazon SageMaker](https://aws.amazon.com/sagemaker/).

If you ever feel stuck, you can refer to this [tutorial](https://aws.amazon.com/blogs/machine-learning/training-and-deploying-models-using-tensorflow-2-with-the-object-detection-api-on-amazon-sagemaker/).

## Dataset

We are using the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported in the TFRecord format, following the layout described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), the AWS object storage service. The images are saved at a resolution of 640x640.
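To get a feel for what a TFRecord file contains, the sketch below walks the on-disk framing with the standard library only. It is not part of the original notebook: `iter_tfrecord_payloads` and `count_records` are hypothetical helper names, and the checksums are skipped rather than verified. Each record is stored as an 8-byte little-endian length, a 4-byte checksum of that length, the payload, and a 4-byte checksum of the payload; in this dataset each payload would typically be one serialized `tf.train.Example`.

```python
import struct
from typing import BinaryIO, Iterator


def iter_tfrecord_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload of every record in a TFRecord stream.

    The two CRC fields of each record are read and discarded, not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte CRC of the length
        if not header:
            return  # clean end of file
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # 4-byte CRC of the payload
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def count_records(path: str) -> int:
    """Count the records in a local TFRecord file (hypothetical helper)."""
    with open(path, "rb") as f:
        return sum(1 for _ in iter_tfrecord_payloads(f))
```

Counting records this way can be a quick sanity check on a shard copied locally from S3, before a training job is launched on the full dataset.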

In [5]:
%%capture
%pip install tensorflow_io sagemaker -U

In [6]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


Save the IAM role in a variable called `role`. It will be needed when training the model.

In [7]:
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::359346771395:role/service-role/AmazonSageMaker-ExecutionRole-20240516T125452


In [8]:
# The train and val paths below are public S3 buckets created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/', 
          'val': 's3://cd2688-object-detection-tf2/val/'} 

# Insert path of a folder in your personal S3 bucket to store tensorboard logs.
tensorboard_s3_prefix = 's3://udacity-selfdriving-240516-1358/logs/'
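A typo in one of these S3 paths usually only surfaces once the training job has started. As a minimal sketch (not part of the original notebook), the hypothetical helper `split_s3_uri` below splits each URI into bucket and key prefix and rejects anything that is not of the form `s3://bucket/prefix`. It only checks the shape of the string; it does not contact S3.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Same channel names and paths as in the cell above.
inputs = {'train': 's3://cd2688-object-detection-tf2/train/',
          'val': 's3://cd2688-object-detection-tf2/val/'}

for channel, uri in inputs.items():
    bucket, prefix = split_s3_uri(uri)
    print(f"{channel}: bucket={bucket} prefix={prefix}")
```

The same check can be applied to `tensorboard_s3_prefix` before it is passed to the training job.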

## Container

To train the model, you will first need to build a [Docker](https://www.docker.com/) container image with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the TensorFlow models repository
* get the exporter and training scripts from the repository
* build the Docker image and push it
* print the container name

In [9]:
%%bash

# clone the repo (skipped if a previous run already cloned it) and get the scripts
if [ ! -d docker/models ]; then
    git clone https://github.com/tensorflow/models.git docker/models
fi

# get model_main and exporter_main files from TF2 Object Detection GitHub repository
cp docker/models/research/object_detection/exporter_main_v2.py source_dir
cp docker/models/research/object_detection/model_main_tf2.py source_dir


In [None]:
# build and push the docker image. This code can be commented out after being run once.
# This will take around 10 mins.
#image_name = 'tf2-object-detection'
#!sh ./docker/build_and_push.sh $image_name

To verify that the image was correctly pushed to the [Elastic Container Registry](https://aws.amazon.com/ecr/), you can look it up in the AWS Management Console. The example screenshot below shows three different images pushed to ECR; you should see only one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [14]:
# display the container name
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    # the first line holds the full image name; strip the trailing newline
    container = f.readline().strip()

print(container)

359346771395.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20240527120237
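The printed name follows the usual ECR layout `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a small sketch (not part of the original notebook), the hypothetical helper `parse_ecr_image` below splits such a name into its parts. This can help confirm that the image lives in the same region as the training job and that the repository is the expected `tf2-object-detection`.

```python
import re
from typing import NamedTuple


class EcrImage(NamedTuple):
    account: str
    region: str
    repository: str
    tag: str


# Illustrative pattern: 12-digit account ID, region, repository name, tag.
_ECR_RE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:@]+):(?P<tag>[^:@]+)$"
)


def parse_ecr_image(name: str) -> EcrImage:
    """Split a full ECR image name into account, region, repository and tag."""
    match = _ECR_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a tagged ECR image name: {name!r}")
    return EcrImage(**match.groupdict())


image = parse_ecr_image(
    "359346771395.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20240527120237"
)
print(image.region, image.repository, image.tag)
```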


## Pre-trained model from model zoo

As is common practice, we do not train from scratch; instead, we use a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time is limited for this project, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640	
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)	
* Faster R-CNN ResNet50 V1 640x640	
* EfficientDet D1 640x640	
* Faster R-CNN ResNet152 V1 640x640	
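
Each model in the list above maps to one archive on `download.tensorflow.org`. The sketch below, which is not part of the original notebook, builds the `wget` and `tar` command pair for a chosen model. The archive names and the base URL are copied from the download cell later in this notebook; the `ARCHIVES` table, the `download_commands` helper, and the `/tmp/<archive>.tar.gz` scratch path are hypothetical names introduced here for illustration.

```python
from typing import List

BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711"

# Model name (as listed above) -> archive name used in the download cell.
ARCHIVES = {
    "SSD MobileNet V2 FPNLite 640x640": "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8",
    "SSD ResNet50 V1 FPN 640x640 (RetinaNet50)": "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8",
    "Faster R-CNN ResNet50 V1 640x640": "faster_rcnn_resnet50_v1_640x640_coco17_tpu-8",
    "EfficientDet D1 640x640": "efficientdet_d1_coco17_tpu-32",
    "Faster R-CNN ResNet152 V1 640x640": "faster_rcnn_resnet152_v1_640x640_coco17_tpu-8",
}


def download_commands(model: str, dest: str = "source_dir/checkpoint") -> List[str]:
    """Return the shell commands that fetch and unpack one checkpoint."""
    archive = ARCHIVES[model]
    tarball = f"/tmp/{archive}.tar.gz"  # hypothetical scratch location
    return [
        f"wget -O {tarball} {BASE_URL}/{archive}.tar.gz",
        # keep only the files under <archive>/checkpoint, dropping both leading directories
        f"tar -zxvf {tarball} --strip-components 2 --directory {dest} {archive}/checkpoint",
    ]


for command in download_commands("SSD ResNet50 V1 FPN 640x640 (RetinaNet50)"):
    print(command)
```

Keeping the mapping in one place makes it harder to pair the wrong archive with the wrong pipeline config when switching architectures.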

In the code below, the SSD ResNet50 V1 FPN checkpoint is downloaded and extracted; the commands for the other architectures are left commented out. Uncomment the relevant pair of lines to experiment with another architecture.

In [15]:
%%bash
pwd
# -f and -p keep the cell re-runnable whether or not the directories already exist
rm -rf /tmp/checkpoint
rm -rf source_dir/checkpoint
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint

#wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
#tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint

#ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8
#wget -O /tmp/ssdmobilenet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
#tar -zxvf /tmp/ssdmobilenet.tar.gz --strip-components 2 --directory source_dir/checkpoint ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint
# ran out of memory with pipeline_ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.config
# will also test with: SSD_MobileNet_from_udacity_mentor.config

# Faster R-CNN ResNet152 V1 640x640
#wget -O /tmp/fasterrcnn.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.tar.gz
#tar -zxvf /tmp/fasterrcnn.tar.gz --strip-components 2 --directory source_dir/checkpoint faster_rcnn_resnet152_v1_640x640_coco17_tpu-8/checkpoint

#SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
wget -O /tmp/ssdresnet50.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
tar -zxvf /tmp/ssdresnet50.tar.gz --strip-components 2 --directory source_dir/checkpoint ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint

# Faster R-CNN ResNet50 V1 640x640
#wget -O /tmp/fasterrcnn50.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.tar.gz
#tar -zxvf /tmp/fasterrcnn50.tar.gz --strip-components 2 --directory source_dir/checkpoint faster_rcnn_resnet50_v1_640x640_coco17_tpu-8/checkpoint


/home/ec2-user/SageMaker/cd2688-object-detection-in-urban-environment-project/1_model_training


--2024-06-06 11:59:33--  http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.253.115.207, 172.253.62.207, 142.251.167.207, ...
Connecting to download.tensorflow.org (download.tensorflow.org)|172.253.115.207|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 244817203 (233M) [application/x-tar]
Saving to: ‘/tmp/ssdresnet50.tar.gz’

145900K .......... .......... .......... .......... .......... 61%  339M 1s
145950K .......... .......... .......... .......... .......... 61%  328M 1s
146000K .......... .......... .......... .......... .......... 61%  327M 1s
146050K .......... .......... .......... .......... .......... 61%  282M 1s
146100K .......... .......... .......... .......... .......... 61%  308M 1s
146150K ....

223850K .......... .......... .......... .......... .......... 93%  272M 0s
223900K .......... .......... .......... .......... .......... 93%  202M 0s
223950K .......... .......... .......... .......... .......... 93%  280M 0s
224000K .......... .......... .......... .......... .......... 93%  216M 0s
224050K .......... .......... .......... .......... .......... 93%  375M 0s
224100K .......... .......... .......... .......... .......... 93%  208M 0s
224150K .......... .......... .......... .......... .......... 93%  197M 0s
224200K .......... .......... .......... .......... .......... 93%  380M 0s
224250K .......... .......... .......... .......... .......... 93%  217M 0s
224300K .......... .......... .......... .......... .......... 93%  210M 0s
224350K .......... .......... .......... .......... .......... 93%  209M 0s
224400K .......... .......... .......... .......... .......... 93%  314M 0s
224450K .......... .......... .......... .......... .......... 93%  364M 0s
224500K ....

229250K .......... .......... .......... .......... .......... 95%  213M 0s
229300K .......... .......... .......... .......... .......... 95%  207M 0s
229350K .......... .......... .......... .......... .......... 95%  294K 0s
229400K .......... .......... .......... .......... .......... 95%  141M 0s
229450K .......... .......... .......... .......... .......... 95%  168M 0s
229500K .......... .......... .......... .......... .......... 96%  234M 0s
229550K .......... .......... .......... .......... .......... 96%  252M 0s
229600K .......... .......... .......... .......... .......... 96%  197M 0s
229650K .......... .......... .......... .......... .......... 96%  189M 0s
229700K .......... .......... .......... .......... .......... 96%  190M 0s
229750K .......... .......... .......... .......... .......... 96%  188M 0s
229800K .......... .......... .......... .......... .......... 96%  220M 0s
229850K .......... .......... .......... .......... .......... 96%  170M 0s
229900K ....

234650K .......... .......... .......... .......... .......... 98%  153M 0s
234700K .......... .......... .......... .......... .......... 98%  192M 0s
234750K .......... .......... .......... .......... .......... 98%  177M 0s
234800K .......... .......... .......... .......... .......... 98%  186M 0s
234850K .......... .......... .......... .......... .......... 98%  152M 0s
234900K .......... .......... .......... .......... .......... 98%  184M 0s
234950K .......... .......... .......... .......... .......... 98%  164M 0s
235000K .......... .......... .......... .......... .......... 98%  200M 0s
235050K .......... .......... .......... .......... .......... 98%  162M 0s
235100K .......... .......... .......... .......... .......... 98%  212M 0s
235150K .......... .......... .......... .......... .......... 98%  207M 0s
235200K .......... .......... .......... .......... .......... 98%  237M 0s
235250K .......... .......... .......... .......... .......... 98%  186M 0s
235300K ....

ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0.data-00000-of-00001
ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/checkpoint
ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0.index


## Edit pipeline.config file

The [`pipeline.config`](source_dir/pipeline.config) file in the `source_dir` folder must be updated whenever you experiment with a different model. Config files for the supported models are available [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).

>Note: The provided `pipeline.config` file works well with the `EfficientDet` model. You need to modify it when working with other models.
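
As a rough illustration of what editing the config involves, the sketch below rewrites a few scalar fields in a config held as plain text. The excerpt, the helper `set_field`, and the target values (3 classes, batch size 8, the `checkpoint/ckpt-0` path) are assumptions made for this example, not values taken from the project files; check them against your own `pipeline.config`.

```python
import re

# Illustrative excerpt only (assumption): a real pipeline.config has many more fields.
SAMPLE_CONFIG = '''model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 64
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
'''


def set_field(config_text, field, value):
    """Replace the value of the first `field: value` line in a config text.

    Hypothetical helper: it only handles scalar fields written on one line.
    """
    pattern = re.compile(
        r'^([ \t]*' + re.escape(field) + r':[ \t]*).*$', re.MULTILINE
    )
    # A function replacement keeps backslashes in `value` from being interpreted.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value, config_text, count=1
    )
    if count == 0:
        raise KeyError(f"field {field!r} not found")
    return new_text


updated = set_field(SAMPLE_CONFIG, "num_classes", "3")  # assumed class count
updated = set_field(updated, "batch_size", "8")  # assumed batch size
updated = set_field(updated, "fine_tune_checkpoint", '"checkpoint/ckpt-0"')
updated = set_field(updated, "fine_tune_checkpoint_type", '"detection"')
print(updated)
```

A plain-text edit like this is easy to review in a diff, but it does not validate the result against the protobuf schema, so a typo in a field name only surfaces when training starts.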

## Launch Training Job

Now that we have a dataset, a Docker image, and pretrained model weights, we can launch the training job. To do so, we create a [Sagemaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html), in which we specify the container image, the name of the config file, the number of training steps, and so on.

The `run_training.sh` script does the following:
* trains the model for `num_train_steps` steps
* evaluates the model on the val dataset
* exports the trained model

Several metrics are displayed during the evaluation phase, including the mean average precision (mAP). Use these metrics to quantify your model's performance and to compare results across iterations.
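
To compare runs, it can help to pull the evaluation metrics out of the training log into a dictionary. The sketch below assumes the evaluation step logs one metric per line in the form `+ <metric name>: <value>`; that format, the helper `parse_eval_metrics`, and the numeric values in the sample log are assumptions for illustration, so adapt the pattern to what your own log shows.

```python
import re

# Hypothetical log excerpt: the numeric values are made up for illustration
# and are not results from this notebook.
SAMPLE_LOG = """\
INFO:tensorflow:Eval metrics at step 2000
INFO:tensorflow:    + DetectionBoxes_Precision/mAP: 0.100000
INFO:tensorflow:    + DetectionBoxes_Precision/mAP@.50IOU: 0.200000
INFO:tensorflow:    + Loss/total_loss: 1.500000
"""

# Matches "+ <name>: <number>" anywhere in a line (assumed log format).
METRIC_RE = re.compile(r"\+\s+(\S+):\s+([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_eval_metrics(log_text):
    """Return a {metric_name: value} dict for every metric line in the log."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_RE.search(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


metrics = parse_eval_metrics(SAMPLE_LOG)
for name in sorted(metrics):
    print(f"{name}: {metrics[name]:.4f}")
```

Collecting one such dictionary per experiment makes it straightforward to tabulate mAP against the config changes you tried.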

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon Sagemaker dashboard in the Web UI.

In [None]:
# Sync TensorBoard logs written under /opt/training in the container to S3
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        "model_dir": "/opt/training",
        "pipeline_config_path": "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config",
        "num_train_steps": "2000",
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.g5.xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

# Launch the training job with the train and val channels defined above
estimator.fit(inputs)

INFO:sagemaker:Creating training-job with name: tf2-object-detection-2024-06-06-12-00-02-576


2024-06-06 12:00:09 Starting - Starting the training job...
2024-06-06 12:00:25 Starting - Preparing the instances for training...
2024-06-06 12:00:57 Downloading - Downloading input data...
2024-06-06 12:01:17 Downloading - Downloading the training image.........
2024-06-06 12:02:57 Training - Training image download completed. Training in progress..
2024-06-06 12:03:12,074 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-06 12:03:12,110 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-06 12:03:12,146 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-06 12:03:12,162 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train",
        "val": "/opt/ml/input/data/val"
    }

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0606 12:03:18.270278 140205245101888 mirrored_strategy.py:419] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: 2000
I0606 12:03:18.549876 140205245101888 config_util.py:552] Maybe overwriting train_steps: 2000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0606 12:03:18.550029 140205245101888 config_util.py:552] Maybe overwriting use_bfloat16: False
Instructions for updating:
rename to distribute_datasets_from_function
W0606 12:03:18.573611 140205245101888 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version

I0606 12:04:02.368127 140199625475840 api.py:460] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0606 12:04:07.362052 140199625475840 api.py:460] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0606 12:04:12.386094 140199625475840 api.py:460] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]


In [17]:
# DEBUG / TROUBLESHOOT
#ls /tmp/checkpoint
#mkdir source_dir/checkpoint

import os

# Print the current working directory and its base name
current_directory = os.getcwd()
print(current_directory)
print(os.path.basename(current_directory))

# List the pretrained checkpoint files in source_dir/checkpoint
checkpoint_dir = '/home/ec2-user/SageMaker/cd2688-object-detection-in-urban-environment-project/1_model_training/source_dir/checkpoint'
for f in os.listdir(checkpoint_dir):
    print(f)

# List the contents of /opt
print("\n/opt")
for f in os.listdir('/opt/'):
    print(f)

/home/ec2-user/SageMaker/cd2688-object-detection-in-urban-environment-project/1_model_training
1_model_training
ckpt-0.data-00000-of-00001
checkpoint
ckpt-0.index

/opt
aws
rh
containerd
amazon
aws-ofi-nccl
install-rootless-docker.sh
lib
brazil-npm-registry
docker.tgz
rootless.tgz
.sagemaker
ml


You should be able to see your training job in the AWS web console, as shown below:
![ECR Example](../data/example_trainings.png)


## Improve on the initial model

Most likely, this initial experiment did not yield optimal results. However, you can make several changes to the `pipeline.config` file to improve the model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file lists the data augmentation methods available in the TF Object Detection API. Justify your choice of augmentations in the write-up.
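
Augmentations are declared as `data_augmentation_options` entries inside `train_config`. The sketch below renders a few such entries as protobuf text and inserts them into a config string. The helpers `augmentation_block` and `add_augmentations`, the minimal config, and the chosen parameter values are assumptions for illustration; the option names come from `preprocessor.proto`, but confirm each field there before relying on it.

```python
def augmentation_block(name, params=None, indent="  "):
    """Render one `data_augmentation_options` entry in protobuf text format."""
    params = params or {}
    lines = [f"{indent}data_augmentation_options {{", f"{indent}  {name} {{"]
    for key, value in params.items():
        lines.append(f"{indent}    {key}: {value}")
    lines.append(f"{indent}  }}")
    lines.append(f"{indent}}}")
    return "\n".join(lines)


def add_augmentations(config_text, augmentations):
    """Insert augmentation entries right after the `train_config {` line."""
    marker = "train_config {"
    if marker not in config_text:
        raise ValueError("train_config block not found")
    rendered = "\n".join(augmentation_block(n, p) for n, p in augmentations)
    return config_text.replace(marker, marker + "\n" + rendered, 1)


# Example choices (assumptions): tune or replace them for your own experiments.
AUGMENTATIONS = [
    ("random_horizontal_flip", {}),
    ("random_adjust_brightness", {"max_delta": 0.2}),
    ("random_adjust_contrast", {"min_delta": 0.8, "max_delta": 1.25}),
]

# Minimal stand-in for a real pipeline.config (assumption).
MINIMAL_CONFIG = "train_config {\n  batch_size: 8\n}\n"

augmented_config = add_augmentations(MINIMAL_CONFIG, AUGMENTATIONS)
print(augmented_config)
```

Keeping the augmentation list in one place makes it easy to record in the write-up exactly which options each experiment used.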

Keep in mind that the following options are also available:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc.
* experiment with the architecture. The TF Object Detection API model zoo offers many architectures. Each architecture has its own `pipeline.config` file, so you will have to edit the config for the architecture you choose.
* visualize results on the test frames using the `2_deploy_model` notebook available in this repository.
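
When experimenting with the learning rate, it helps to plot or print the schedule before launching a job. The sketch below implements a generic cosine decay with linear warmup in plain Python. The function name and all numeric values are assumptions for illustration, and it is not the API's own implementation.

```python
import math


def cosine_with_warmup(step, base_lr, total_steps, warmup_lr, warmup_steps):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from warmup_lr up to base_lr.
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    if step >= total_steps:
        return 0.0
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Illustrative settings (assumptions), sized to a 2000-step training run.
for step in (0, 100, 200, 1100, 2000):
    lr = cosine_with_warmup(
        step, base_lr=0.04, total_steps=2000, warmup_lr=0.01, warmup_steps=200
    )
    print(f"step {step:5d}: lr = {lr:.5f}")
```

One thing worth checking with a schedule like this: if the schedule's total step count is much larger than `num_train_steps`, training ends while the learning rate is still near its peak, or even still warming up.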

In the cell below, describe every approach you experimented with, why you chose it, and what you would have done with more time and resources. Justify your choices using the TensorBoard visualizations (take screenshots and insert them in your write-up), the metrics on the evaluation set, and the animation you generated with [this tool](../2_run_inference/2_deploy_model.ipynb).

mAP: 
