# TensorFlow Object Detection API and AWS SageMaker

In [1]:
#faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config

In this notebook, you will train and evaluate different models using the [TensorFlow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/) and [AWS SageMaker](https://aws.amazon.com/sagemaker/).

If you ever feel stuck, you can refer to this [tutorial](https://aws.amazon.com/blogs/machine-learning/training-and-deploying-models-using-tensorflow-2-with-the-object-detection-api-on-amazon-sagemaker/).

## Dataset

We are using the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported in the TFRecord format, following the layout described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), the AWS object storage service. The images are saved at a resolution of 640x640.

In [2]:
%%capture
%pip install tensorflow_io sagemaker -U

In [3]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


Save the IAM role in a variable called `role`. It will be needed later when training the model.

In [4]:
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::359346771395:role/service-role/AmazonSageMaker-ExecutionRole-20240516T125452


In [5]:
# The train and val paths below are public S3 buckets created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/', 
          'val': 's3://cd2688-object-detection-tf2/val/'} 

# Insert path of a folder in your personal S3 bucket to store tensorboard logs.
tensorboard_s3_prefix = 's3://udacity-selfdriving-240516-1358/logs/'
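
Because a mistyped S3 path only surfaces once the training job starts, it can help to check the URIs up front. The following is a minimal sketch using only the standard library; the helper name `split_s3_uri` and the bucket `my-example-bucket` are hypothetical and not part of the project code.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into (bucket, prefix).

    Raises ValueError when the string is not an s3:// URI with a bucket name.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The path keeps a leading slash; S3 key prefixes do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical placeholder bucket; substitute the bucket you created.
bucket, prefix = split_s3_uri("s3://my-example-bucket/logs/")
print(bucket, prefix)
```

Running the same check over the values of `inputs` and over `tensorboard_s3_prefix` would catch a missing `s3://` scheme before any job is launched.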

## Container

To train the model, you will first need to build a [Docker](https://www.docker.com/) container with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the TensorFlow models repository
* get the exporter and training scripts from the repository
* build the Docker image and push it
* print the container name

In [6]:
%%bash

# clone the repo and get the scripts
# git clone https://github.com/tensorflow/models.git docker/models

# get model_main and exporter_main files from TF2 Object Detection GitHub repository
# cp docker/models/research/object_detection/exporter_main_v2.py source_dir 
# cp docker/models/research/object_detection/model_main_tf2.py source_dir

In [7]:
# build and push the docker image. This code can be commented out after being run once.
# This will take around 10 mins.

#image_name = 'tf2-object-detection'
#!sh ./docker/build_and_push.sh $image_name

To verify that the image was correctly pushed to the [Elastic Container Registry](https://aws.amazon.com/ecr/) (ECR), look for it in the AWS console. The example screenshot below shows three images in ECR; in your account you should see only one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [8]:
# display the container name
# strip() removes the trailing newline without dropping a character if none is present
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    container = f.readline().strip()

print(container)

359346771395.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20240527120237
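
The printed name follows the usual ECR layout of account ID, region, repository, and tag. As a quick sanity check before passing it to an estimator, the string can be split into those parts. This is a minimal sketch: the pattern is inferred from the output above and the general ECR naming scheme, and `parse_ecr_image` and the account ID `123456789012` are hypothetical.

```python
import re

# Assumed layout: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_IMAGE_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_image(image):
    """Return a dict with the account, region, repository, and tag of an ECR image name."""
    match = ECR_IMAGE_PATTERN.match(image.strip())
    if match is None:
        raise ValueError(f"Unexpected ECR image name: {image!r}")
    return match.groupdict()


# Hypothetical account ID and tag, used only for illustration.
parts = parse_ecr_image("123456789012.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:latest")
print(parts["repository"], parts["tag"])
```

If the repository part is not `tf2-object-detection`, the build script probably ran with a different `image_name`.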


## Pre-trained model from model zoo

As is often the case, we are not training from scratch; we will start from a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time is limited for this project, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640	
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)	
* Faster R-CNN ResNet50 V1 640x640	
* EfficientDet D1 640x640	
* Faster R-CNN ResNet152 V1 640x640	

In the code below, the Faster R-CNN ResNet50 V1 checkpoint is downloaded and extracted; the commands for the other architectures are left commented out. Uncomment the matching pair of `wget` and `tar` lines to experiment with another architecture.
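
The download commands in the cell that follows all share one URL pattern and one archive layout, so switching architectures amounts to changing the model name. The sketch below shows that idea in Python, assuming the archive members are named `<model_name>/checkpoint/...` as the `tar` commands suggest. The helpers `checkpoint_source` and `extract_checkpoint` are hypothetical and are not part of the project code.

```python
import tarfile
from pathlib import Path

# Base URL taken from the wget commands in this notebook.
BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711"


def checkpoint_source(model_name):
    """Return (archive URL, path of the checkpoint folder inside the archive)."""
    return f"{BASE_URL}/{model_name}.tar.gz", f"{model_name}/checkpoint"


def extract_checkpoint(archive_path, model_name, dest_dir):
    """Copy the files under <model_name>/checkpoint/ into dest_dir.

    Dropping the two leading path components mirrors
    `tar --strip-components 2`. Returns the sorted relative file names.
    """
    prefix = f"{model_name}/checkpoint/"
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.startswith(prefix)):
                continue
            relative = member.name[len(prefix):]
            if ".." in Path(relative).parts:
                continue  # never write outside dest_dir
            target = dest / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            with archive.extractfile(member) as src:
                target.write_bytes(src.read())
            extracted.append(relative)
    return sorted(extracted)


url, member_dir = checkpoint_source("faster_rcnn_resnet50_v1_640x640_coco17_tpu-8")
print(url)
print(member_dir)
```

After downloading the archive from `url`, calling `extract_checkpoint(path, model_name, "source_dir/checkpoint")` would fill the same folder that the bash cell populates.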

In [9]:
%%bash
pwd
# -f and -p keep the cell from failing when the folders are missing or already exist
rm -rf /tmp/checkpoint
rm -rf source_dir/checkpoint
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint

#wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
#tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint

#ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8
#wget -O /tmp/ssdmobilenet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
#tar -zxvf /tmp/ssdmobilenet.tar.gz --strip-components 2 --directory source_dir/checkpoint ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint
# result out of memory with pipeline_ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.config

# Faster R-CNN ResNet152 V1 640x640
#wget -O /tmp/fasterrcnn.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.tar.gz
#tar -zxvf /tmp/fasterrcnn.tar.gz --strip-components 2 --directory source_dir/checkpoint faster_rcnn_resnet152_v1_640x640_coco17_tpu-8/checkpoint

#SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
#wget -O /tmp/ssdresnet50.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
#tar -zxvf /tmp/ssdresnet50.tar.gz --strip-components 2 --directory source_dir/checkpoint ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint

#Faster R-CNN ResNet50 V1 640x640
wget -O /tmp/fasterrcnn50.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.tar.gz
tar -zxvf /tmp/fasterrcnn50.tar.gz --strip-components 2 --directory source_dir/checkpoint faster_rcnn_resnet50_v1_640x640_coco17_tpu-8/checkpoint


/home/ec2-user/SageMaker/cd2688-object-detection-in-urban-environment-project/1_model_training


--2024-06-06 13:38:04--  http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 142.251.111.207, 142.251.16.207, 142.251.163.207, ...
Connecting to download.tensorflow.org (download.tensorflow.org)|142.251.111.207|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 211996178 (202M) [application/x-tar]
Saving to: ‘/tmp/fasterrcnn50.tar.gz’

146000K .......... .......... .......... .......... .......... 70%  225M 1s
146050K .......... .......... .......... .......... .......... 70%  267M 1s
146100K .......... .......... .......... .......... .......... 70%  293M 1s
146150K ....

## Edit pipeline.config file

The [`pipeline.config`](source_dir/pipeline.config) file in the `source_dir` folder must be updated whenever you experiment with a different model. Config files for the available models are listed [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).

>Note: The provided `pipeline.config` file works well with the `EfficientDet` model. You will need to modify it when working with other models.
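
As a rough illustration of the kind of edit involved, the sketch below rewrites a few scalar fields in the text of a pipeline config. The helper name `set_config_fields` and all sample values are hypothetical; `num_classes`, `batch_size` and `fine_tune_checkpoint` appear in the TF2 config files linked above, but check them against the file you actually use.

```python
import re


def set_config_fields(config_text, overrides):
    """Replace `name: value` entries in a pipeline config given as text.

    Hypothetical helper for illustration: it replaces every occurrence of
    each field name and does not parse the protobuf structure.
    """
    for name, value in overrides.items():
        # Quote strings the way the text format expects; leave numbers bare.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        pattern = re.compile(rf'^(\s*{re.escape(name)}\s*:\s*).*$', re.MULTILINE)
        config_text, count = pattern.subn(
            lambda match: match.group(1) + rendered, config_text
        )
        if count == 0:
            raise KeyError(f"field {name!r} not found in config")
    return config_text


# Tiny stand-in for a real config file (structure abbreviated).
sample = '''model {
  faster_rcnn {
    num_classes: 90
  }
}
train_config {
  batch_size: 64
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
}
'''

updated = set_config_fields(sample, {
    "num_classes": 3,                             # assumed class count
    "batch_size": 8,                              # assumed single-GPU batch size
    "fine_tune_checkpoint": "checkpoint/ckpt-0",  # assumed relative path
})
print(updated)
```

A plain text substitution like this is only suitable for fields that appear once per file or that should take the same value everywhere; for anything more structured, editing the file by hand is safer.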

## Launch Training Job

Now that we have a dataset, a Docker image and some pretrained model weights, we can launch the training job. To do so, we create a [Sagemaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html), in which we specify the container name, the name of the config file, the number of training steps, and so on.

The `run_training.sh` script does the following:
* trains the model for `num_train_steps` steps
* evaluates it on the val dataset
* exports the model

Several metrics are displayed during the evaluation phase, including the mean average precision. You can use them to quantify your model's performance and to compare it across iterations.
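
Mean average precision is built on the intersection over union (IoU) between predicted and ground-truth boxes: a detection only counts as correct when its IoU with a ground-truth box exceeds a threshold. The sketch below shows that overlap measure, assuming boxes in `(ymin, xmin, ymax, xmax)` order. The function name is illustrative, and this is not the evaluation code the API runs.

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner: 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```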

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon Sagemaker dashboard in the Web UI.
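
The `hyperparameters` dictionary passed to the estimator in the next cell reaches `run_training.sh` as command-line flags, as the command echoed in the job log shows. Here is a rough sketch of that mapping. The function is hypothetical, not the SageMaker toolkit's own code, and the alphabetical ordering is only inferred from that log.

```python
def hyperparameters_to_args(hyperparameters):
    """Turn a hyperparameter dict into a flat list of `--name value` flags."""
    args = []
    for name in sorted(hyperparameters):  # ordering inferred from the job log
        args += [f"--{name}", str(hyperparameters[name])]
    return args


hyperparameters = {
    "model_dir": "/opt/training",
    "pipeline_config_path": "faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config",
    "num_train_steps": "2000",
    "sample_1_of_n_eval_examples": "1",
}

print("./run_training.sh " + " ".join(hyperparameters_to_args(hyperparameters)))
```

Every value arrives as a string, so the script has to convert numeric settings such as `num_train_steps` itself.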

In [10]:
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        # same folder as container_local_output_path above
        "model_dir": "/opt/training",
        # config file expected inside source_dir
        "pipeline_config_path": "faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config",
        "num_train_steps": "2000",
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.g5.xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)

INFO:sagemaker:Creating training-job with name: tf2-object-detection-2024-06-06-13-38-08-759


2024-06-06 13:38:14 Starting - Starting the training job...
2024-06-06 13:38:30 Starting - Preparing the instances for training...
2024-06-06 13:39:04 Downloading - Downloading input data...
2024-06-06 13:39:24 Downloading - Downloading the training image.........
2024-06-06 13:40:59 Training - Training image download completed. Training in progress..
2024-06-06 13:41:16,197 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-06 13:41:16,233 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-06 13:41:16,269 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-06 13:41:16,284 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train",
        "val": "/opt/ml/input/data/val"
    }

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0606 13:41:22.263926 139787173721920 mirrored_strategy.py:419] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: 2000
I0606 13:41:22.544602 139787173721920 config_util.py:552] Maybe overwriting train_steps: 2000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0606 13:41:22.544749 139787173721920 config_util.py:552] Maybe overwriting use_bfloat16: False
Instructions for updating:
rename to distribute_datasets_from_function
W0606 13:41:22.575999 139787173721920 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version

==EVALUATING THE MODEL==
W0606 13:45:09.236361 140264937785152 model_lib_v2.py:1089] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0606 13:45:09.236537 140264937785152 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0606 13:45:09.236620 140264937785152 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0606 13:45:09.236737 140264937785152 config_util.py:552] Maybe overwriting eval_num_epochs: 1
W0606 13:45:09.236850 140264937785152 model_lib_v2.py:1106] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:Reading unweighted datasets: ['/opt/ml/input/data/val/*.tfrecord']


2024-06-06 13:47:09 Uploading - Uploading generated training model
2024-06-06 13:47:09 Failed - Training job failed


UnexpectedStatusException: Error for Training job tf2-object-detection-2024-06-06-13-38-08-759: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/bin/sh -c ./run_training.sh --model_dir /opt/training --num_train_steps 2000 --pipeline_config_path faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config --sample_1_of_n_eval_examples 1", exit code: 1
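
Because `ErrorMessage` is empty in the exception above, the actual cause has to be found in the full job log (for example in CloudWatch). The sketch below is one way to narrow the search, assuming the log has been saved as text. The keyword list and the function name are illustrative choices, not part of any SageMaker API.

```python
import re

# Colour codes such as "[34m" / "[0m", with or without the leading ESC byte.
ANSI_COLOR = re.compile(r"\x1b?\[[0-9;]*m")
KEYWORDS = ("Traceback", "Error", "Exception", "OOM")  # illustrative list


def find_error_lines(log_text):
    """Return (line_number, cleaned_line) pairs that mention a keyword."""
    hits = []
    for number, raw in enumerate(log_text.splitlines(), start=1):
        line = ANSI_COLOR.sub("", raw).strip()
        if any(keyword in line for keyword in KEYWORDS):
            hits.append((number, line))
    return hits


# Three lines taken from the output above, used as a small sample.
sample_log = "\n".join([
    "[34m==EVALUATING THE MODEL==[0m",
    "[34mINFO:tensorflow:Maybe overwriting eval_num_epochs: 1[0m",
    "AlgorithmError: ExecuteUserScriptError:",
])

for number, line in find_error_lines(sample_log):
    print(number, line)
```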

In [None]:
# DEBUG / TROUBLESHOOT
#ls /tmp/checkpoint
#mkdir source_dir/checkpoint

import os

# Print the current working directory and its base name
current_directory = os.getcwd()
print(current_directory)
print(os.path.basename(current_directory))

# List the contents of the checkpoint folder inside source_dir
checkpoint_dir = '/home/ec2-user/SageMaker/cd2688-object-detection-in-urban-environment-project/1_model_training/source_dir/checkpoint'
for f in os.listdir(checkpoint_dir):
    print(f)

# List the contents of /opt on the notebook instance
print("\n/opt")
for f in os.listdir('/opt/'):
    print(f)

# /home/ec2-user/SageMaker/cd2688-object-detection-in-urban-environment-project/1_model_training/source_dir

You should be able to see your model training in the AWS webapp as shown below:
![ECR Example](../data/example_trainings.png)


## Improve on the initial model

Most likely, this initial experiment did not yield optimal results. However, you can make several changes to the `pipeline.config` file to improve the model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file lists the data augmentation methods available in the TF Object Detection API. Justify your choice of augmentations in the write-up.
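
To make the effect of an augmentation concrete, here is a minimal sketch of what a random horizontal flip does to bounding boxes, assuming normalized `(ymin, xmin, ymax, xmax)` coordinates. The function names are illustrative and this is not the API's implementation; in practice the augmentation is enabled through `pipeline.config` rather than written by hand.

```python
import random


def flip_boxes_horizontally(boxes):
    """Mirror normalized (ymin, xmin, ymax, xmax) boxes left to right."""
    # The old right edge becomes the new left edge, and vice versa.
    return [(ymin, 1.0 - xmax, ymax, 1.0 - xmin)
            for ymin, xmin, ymax, xmax in boxes]


def maybe_flip(boxes, probability=0.5, rng=random):
    """Flip the boxes with the given probability; also report whether it did."""
    if rng.random() < probability:
        return flip_boxes_horizontally(boxes), True
    return list(boxes), False


boxes = [(0.25, 0.0, 0.75, 0.25)]  # a box on the left edge of the image
print(flip_boxes_horizontally(boxes))
```

The image pixels must be mirrored with the same decision, which is why the sketch returns the flag alongside the boxes.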

You can also try the following:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc.
* experiment with the architecture. The TF Object Detection API model zoo offers many architectures. Keep in mind that the `pipeline.config` file is specific to each architecture, so you will have to edit it.
* visualize results on the test frames using the `2_deploy_model` notebook available in this repository.
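
When experimenting with the scheduler, it helps to see what a schedule does before spending GPU time on it. The sketch below is a plain-Python version of a cosine decay with linear warmup. The parameter names mirror the fields of the `cosine_decay_learning_rate` block in the TF2 configs, but the function itself and the sample values are assumptions for illustration, not the API's implementation.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base, total_steps,
                             warmup_learning_rate, warmup_steps):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must be larger than warmup_steps")
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Fraction of the decay phase completed, capped at 1.
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


# Assumed values; total_steps matches the 2000 training steps used above.
for step in (0, 100, 200, 1100, 2000):
    rate = cosine_decay_with_warmup(step, learning_rate_base=0.04,
                                    total_steps=2000,
                                    warmup_learning_rate=0.01,
                                    warmup_steps=200)
    print(step, round(rate, 5))
```

Plotting these values against the loss curve in TensorBoard makes it easier to tell whether a plateau comes from the schedule or from the model.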

In the cell below, describe every approach you experimented with, why you chose it, and what you would have done with more time and resources. Justify your choices using the TensorBoard visualizations (take screenshots and insert them in your write-up), the metrics on the evaluation set, and the animation you generated with [this tool](../2_run_inference/2_deploy_model.ipynb).

In [None]:
# your write-up goes here.