# TensorFlow Object Detection API and AWS SageMaker

In this notebook, you will train and evaluate different models using the [TensorFlow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/) and [AWS SageMaker](https://aws.amazon.com/sagemaker/).

If you ever feel stuck, you can refer to this [tutorial](https://aws.amazon.com/blogs/machine-learning/training-and-deploying-models-using-tensorflow-2-with-the-object-detection-api-on-amazon-sagemaker/).

## Dataset

We are using the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported to the TFRecord format, following the layout described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), the AWS object storage service. The images are saved at a resolution of 640x640.

In [3]:
%%capture
%pip install tensorflow_io sagemaker -U

In [4]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

Save the IAM role in a variable called `role`. It will be needed when training the model.

In [5]:
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::862328613582:role/service-role/AmazonSageMaker-ExecutionRole-20230725T104161


In [6]:
# The train and val paths below are public S3 buckets created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/', 
        'val': 's3://cd2688-object-detection-tf2/val/'} 

# Insert path of a folder in your personal S3 bucket to store tensorboard logs.
tensorboard_s3_prefix = 's3://object-detection-mg/logs/'
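Before launching a job, it can help to check that each path is a well-formed S3 URI. The snippet below is a minimal sketch using only the standard library; `split_s3_uri` is a hypothetical helper name, not part of the SageMaker SDK, and it only checks the URI shape, not whether the bucket exists or is readable.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix' into (bucket, prefix); raise on malformed input."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 URI: {uri!r}")
    # urlparse keeps the leading slash of the path, so drop it to get the key prefix
    return parsed.netloc, parsed.path.lstrip("/")


# Same channel layout as the `inputs` dictionary used in this notebook
example_inputs = {
    "train": "s3://cd2688-object-detection-tf2/train/",
    "val": "s3://cd2688-object-detection-tf2/val/",
}

for channel, uri in example_inputs.items():
    bucket, prefix = split_s3_uri(uri)
    print(f"{channel}: bucket={bucket} prefix={prefix}")
```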

## Container

To train the model, you will first need to build a [Docker](https://www.docker.com/) container with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the TensorFlow models repository
* get the exporter and training scripts from the repository
* build the Docker image and push it
* print the container name

In [7]:
%%bash

# clone the repo and get the scripts
git clone https://github.com/tensorflow/models.git docker/models

# get model_main and exporter_main files from TF2 Object Detection GitHub repository
cp docker/models/research/object_detection/exporter_main_v2.py source_dir 
cp docker/models/research/object_detection/model_main_tf2.py source_dir

fatal: destination path 'docker/models' already exists and is not an empty directory.


In [8]:
# Build and push the Docker image. This cell can be commented out after it has been run once.
# This takes around 10 minutes.
image_name = 'tf2-object-detection'
!sh ./docker/build_and_push.sh $image_name

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Building image with name tf2-object-detection
Sending build context to Docker daemon  728.1MB
Step 1/17 : FROM tensorflow/tensorflow:2.9.0-gpu
2.9.0-gpu: Pulling from tensorflow/tensorflow

Digest: sha256:aa9f4a6a7debc976135702118aedfd0d72bf9e495af6ecfd5a31d9714e335426

The following packages will be upgraded:
  dirmngr gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client
  gpg-wks-server gpgconf gpgsm gpgv
11 upgraded, 119 newly installed, 0 to remove and 145 not upgraded.
Need to get 59.6 MB of archives.
After this operation, 632 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg-wks-client amd64 2.2.19-3ubuntu2.2 [97.4 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 dirmngr amd64 2.2.19-3ubuntu2.2 [330 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg-wks-server amd64 2.2.19-3ubuntu2.2 [90.2 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gnupg-utils amd64 2.2.19-3ubuntu2.2 [481 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg-agent amd64 2.2.19-3ubuntu2.2 [232 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg amd64 2.2.19-3ubuntu2.2 [482 kB]
Get:7 http://archive.ubuntu.com/ubuntu

Get:78 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libglx-mesa0 amd64 21.2.6-0ubuntu0.1~20.04.2 [137 kB]
Get:79 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libglx0 amd64 1.3.2-1~ubuntu0.20.04.2 [32.5 kB]
Get:80 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgl1 amd64 1.3.2-1~ubuntu0.20.04.2 [85.8 kB]
Get:81 http://archive.ubuntu.com/ubuntu focal/main amd64 xorg-sgml-doctools all 1:1.11-1 [12.9 kB]
Get:82 http://archive.ubuntu.com/ubuntu focal/main amd64 x11proto-dev all 2019.2-1ubuntu1 [594 kB]
Get:83 http://archive.ubuntu.com/ubuntu focal/main amd64 x11proto-core-dev all 2019.2-1ubuntu1 [2620 B]
Get:84 http://archive.ubuntu.com/ubuntu focal/main amd64 libxau-dev amd64 1:1.0.9-0ubuntu1 [9552 B]
Get:85 http://archive.ubuntu.com/ubuntu focal/main amd64 libxdmcp-dev amd64 1:1.1.3-0ubuntu1 [25.3 kB]
Get:86 http://archive.ubuntu.com/ubuntu focal/main amd64 xtrans-dev all 1.4.0-1 [68.9 kB]
Get:87 http://archive.ubuntu.com/ubuntu focal/main amd64 libp

Selecting previously unselected package ucf.
Preparing to unpack .../004-ucf_3.0038+nmu1_all.deb ...
Moving old data out of the way
Unpacking ucf (3.0038+nmu1) ...
Selecting previously unselected package libcbor0.6:amd64.
Preparing to unpack .../005-libcbor0.6_0.6.0-0ubuntu1_amd64.deb ...
Unpacking libcbor0.6:amd64 (0.6.0-0ubuntu1) ...
Selecting previously unselected package libdrm-common.
Preparing to unpack .../006-libdrm-common_2.4.107-8ubuntu1~20.04.2_all.deb ...
Unpacking libdrm-common (2.4.107-8ubuntu1~20.04.2) ...
Selecting previously unselected package libdrm2:amd64.
Preparing to unpack .../007-libdrm2_2.4.107-8ubuntu1~20.04.2_amd64.deb ...
Unpacking libdrm2:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Selecting previously unselected package libedit2:amd64.
Preparing to unpack .../008-libedit2_3.1-20191231-1_amd64.deb ...
Unpacking libedit2:amd64 (3.1-20191231-1) ...
Selecting previously unselected package libfido2-1:amd64.
Preparing to unpack .../009-libfido2-1_1.3.1-1ubuntu2_amd64.de

Selecting previously unselected package libxcb-dri3-0:amd64.
Preparing to unpack .../051-libxcb-dri3-0_1.14-2_amd64.deb ...
Unpacking libxcb-dri3-0:amd64 (1.14-2) ...
Selecting previously unselected package libxcb-present0:amd64.
Preparing to unpack .../052-libxcb-present0_1.14-2_amd64.deb ...
Unpacking libxcb-present0:amd64 (1.14-2) ...
Selecting previously unselected package libxcb-sync1:amd64.
Preparing to unpack .../053-libxcb-sync1_1.14-2_amd64.deb ...
Unpacking libxcb-sync1:amd64 (1.14-2) ...
Selecting previously unselected package libxcb-xfixes0:amd64.
Preparing to unpack .../054-libxcb-xfixes0_1.14-2_amd64.deb ...
Unpacking libxcb-xfixes0:amd64 (1.14-2) ...
Selecting previously unselected package libxshmfence1:amd64.
Preparing to unpack .../055-libxshmfence1_1.3-1_amd64.deb ...
Unpacking libxshmfence1:amd64 (1.3-1) ...
Selecting previously unselected package libegl-mesa0:amd64.
Preparing to unpack .../056-libegl-mesa0_21.2.6-0ubuntu0.1~20.04.2_amd64.deb ...
Unpacking libegl-mes

Unpacking libprotobuf17:amd64 (3.6.1.3-2ubuntu5.2) ...
Selecting previously unselected package libprotoc17:amd64.
Preparing to unpack .../096-libprotoc17_3.6.1.3-2ubuntu5.2_amd64.deb ...
Unpacking libprotoc17:amd64 (3.6.1.3-2ubuntu5.2) ...
Selecting previously unselected package libwebpdemux2:amd64.
Preparing to unpack .../097-libwebpdemux2_0.6.1-2ubuntu0.20.04.2_amd64.deb ...
Unpacking libwebpdemux2:amd64 (0.6.1-2ubuntu0.20.04.2) ...
Selecting previously unselected package libwebpmux3:amd64.
Preparing to unpack .../098-libwebpmux3_0.6.1-2ubuntu0.20.04.2_amd64.deb ...
Unpacking libwebpmux3:amd64 (0.6.1-2ubuntu0.20.04.2) ...
Selecting previously unselected package libxcb-randr0:amd64.
Preparing to unpack .../099-libxcb-randr0_1.14-2_amd64.deb ...
Unpacking libxcb-randr0:amd64 (1.14-2) ...
Selecting previously unselected package libxslt1.1:amd64.
Preparing to unpack .../100-libxslt1.1_1.1.34-4ubuntu0.20.04.1_amd64.deb ...
Unpacking libxslt1.1:amd64 (1.1.34-4ubuntu0.20.04.1) ...
Selecting

Setting up libllvm12:amd64 (1:12.0.0-3ubuntu1~20.04.5) ...
Setting up git (1:2.25.1-1ubuntu3.11) ...
Setting up python3-xcffib (0.8.1-0.8) ...
Setting up gpg-wks-server (2.2.19-3ubuntu2.2) ...
Setting up libxcb-dri2-0:amd64 (1.14-2) ...
Setting up libdrm2:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Setting up python3-lxml:amd64 (4.5.0-1ubuntu0.5) ...
Setting up libxcb-randr0:amd64 (1.14-2) ...
Setting up libx11-6:amd64 (2:1.6.9-2ubuntu1.5) ...
Setting up libfontconfig1:amd64 (2.13.1-2ubuntu3) ...
Setting up libxmuu1:amd64 (2:1.1.3-0ubuntu1) ...
Setting up libdrm-amdgpu1:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Setting up libxcb-dri3-0:amd64 (1.14-2) ...
Setting up mesa-vulkan-drivers:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Setting up libdrm-nouveau2:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Setting up libxcb1-dev:amd64 (1.14-2) ...
Setting up gpg-wks-client (2.2.19-3ubuntu2.2) ...
Setting up libxrender1:amd64 (1:0.9.10-1) ...
Setting up libgbm1:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Setting up libdrm-rad

Collecting sacrebleu<=2.2.0 (from object-detection==0.1)
  Downloading sacrebleu-2.2.0-py3-none-any.whl (116 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.6/116.6 kB 29.0 MB/s eta 0:00:00
Collecting portalocker (from sacrebleu<=2.2.0->object-detection==0.1)
  Downloading portalocker-2.7.0-py2.py3-none-any.whl (15 kB)
Collecting regex (from sacrebleu<=2.2.0->object-detection==0.1)
  Obtaining dependency information for regex from https://files.pythonhosted.org/packages/c4/3d/d7ed16c298101bc7f5e3a65aef1ab34c1d7e1a89893491a4b1faf20701aa/regex-2023.6.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading regex-2023.6.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.9/40.9 kB 9.4 MB/s eta 0:00:00
Collecting tabulate>=0.8.9 (from sacrebleu<=2.2.0->object-detection==0.1)
  Downloading tabulate-0.9.0-py3-none-any.whl (35 kB)
Collecting colorama (from sacrebleu<=2.2.0->object-detection==

  Preparing metadata (setup.py): finished with status 'done'
Collecting cloudpickle~=2.2.1 (from apache-beam->object-detection==0.1)
  Downloading cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Collecting fastavro<2,>=0.23.6 (from apache-beam->object-detection==0.1)
  Obtaining dependency information for fastavro<2,>=0.23.6 from https://files.pythonhosted.org/packages/bc/24/a0e07113b1f26e7707f0c0a3923b6e2bfda743ad65da2104e9ef8d985aa6/fastavro-1.8.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading fastavro-1.8.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.5 kB)
Collecting fasteners<1.0,>=0.3 (from apache-beam->object-detection==0.1)
  Downloading fasteners-0.18-py3-none-any.whl (18 kB)
Collecting hdfs<3.0.0,>=2.1.0 (from apache-beam->object-detection==0.1)
  Downloading hdfs-2.7.0-py3-none-any.whl (34 kB)
Collecting httplib2<0.23.0,>=0.8 (from apache-beam->object-detection==0.1)
  Downloading httplib2-0.22.0-py3-none-any.whl (96 kB)
  

Collecting bleach (from kaggle>=1.3.9->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading bleach-6.0.0-py3-none-any.whl (162 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 162.5/162.5 kB 30.8 MB/s eta 0:00:00
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo<5.0.0,>=3.8.0->apache-beam->object-detection==0.1)
  Obtaining dependency information for dnspython<3.0.0,>=1.16.0 from https://files.pythonhosted.org/packages/71/30/deee2ffb94194437c730a1c6230d9310ab5f73926a2549cdab91453616bb/dnspython-2.4.1-py3-none-any.whl.metadata
  Downloading dnspython-2.4.1-py3-none-any.whl.metadata (4.9 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0,>=2.24.0->apache-beam->object-detection==0.1)
  Obtaining dependency information for charset-normalizer<4,>=2 from https://files.pythonhosted.org/packages/cb/e7/5e43745003bf1f90668c7be23fc5952b3a2b9c2558f16749411c18039b36/charset_normalizer-3.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading charset_n

Collecting google-auth<3.0.0.dev0,>=1.19.0 (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1)
  Obtaining dependency information for google-auth<3.0.0.dev0,>=1.19.0 from https://files.pythonhosted.org/packages/9c/8d/bff87fc722553a5691d8514da5523c23547f3894189ba03b57592e37bdc2/google_auth-2.22.0-py2.py3-none-any.whl.metadata
  Downloading google_auth-2.22.0-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting joblib>=1.1.1 (from scikit-learn>=0.21.3->seqeval->tf-models-official>=2.5.1->object-detection==0.1)
  Obtaining dependency information for joblib>=1.1.1 from https://files.pythonhosted.org/packages/28/08/9dcdaa5aac4634e4c23af26d92121f7ce445c630efa0d3037881ae2407fb/joblib-1.3.1-py3-none-any.whl.metadata
  Downloading joblib-1.3.1-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=2.0.0 (from scikit-learn>=0.21.3->seqeval->tf-models-official>=2.5.1->object-detection==0.1)
  Obtaining dependency information for threadpoolctl>=2.0.0 from ht

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.5/120.5 kB 18.1 MB/s eta 0:00:00
Downloading google_auth-2.22.0-py2.py3-none-any.whl (181 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 181.8/181.8 kB 31.0 MB/s eta 0:00:00
Downloading scikit_learn-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.1/11.1 MB 67.6 MB/s eta 0:00:00
Downloading grpcio-1.56.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 91.5 MB/s eta 0:00:00
Downloading tensorflow_estimator-2.13.0-py2.py3-none-any.whl (440 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 440.8/440.8 kB 54.8 MB/s eta 0:00:00
Downloading array_record-0.4.0-py38-none-any.whl (3.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 92.4 MB/s eta 0:00:00
Downloading click-8.1.6-py3-none-any.whl (97 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 19.9 MB/s eta 0:00:00
Downloading googl

      Successfully uninstalled numpy-1.22.3
  Attempting uninstall: keras
    Found existing installation: keras 2.9.0
    Uninstalling keras-2.9.0:
      Successfully uninstalled keras-2.9.0
  Attempting uninstall: grpcio
    Found existing installation: grpcio 1.46.1
    Uninstalling grpcio-1.46.1:
      Successfully uninstalled grpcio-1.46.1
  Attempting uninstall: absl-py
    Found existing installation: absl-py 1.0.0
    Uninstalling absl-py-1.0.0:
      Successfully uninstalled absl-py-1.0.0
  Attempting uninstall: requests
    Found existing installation: requests 2.22.0
    Uninstalling requests-2.22.0:
      Successfully uninstalled requests-2.22.0
  Attempting uninstall: google-auth
    Found existing installation: google-auth 2.6.6
    Uninstalling google-auth-2.6.6:
      Successfully uninstalled google-auth-2.6.6
  Attempting uninstall: google-auth-oauthlib
    Found existing installation: google-auth-oauthlib 0.4.6
    Uninstalling google-auth-oauthlib-0.4.6:
      Succes

Collecting greenlet>=2.0.0 (from gevent->sagemaker-training)
  Downloading greenlet-2.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (618 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 618.5/618.5 kB 36.8 MB/s eta 0:00:00
Downloading paramiko-3.3.1-py3-none-any.whl (224 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.8/224.8 kB 37.7 MB/s eta 0:00:00
Downloading boto3-1.28.15-py3-none-any.whl (135 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 135.8/135.8 kB 34.1 MB/s eta 0:00:00
Downloading gevent-23.7.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 44.1 MB/s eta 0:00:00
Downloading botocore-1.31.15-py3-none-any.whl (11.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.1/11.1 MB 44.4 MB/s eta 0:00:00
Downloading cryptography-41.0.2-cp37-abi3-manylinux_2_28_x86_64.whl (4.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 66.2 MB/s eta 0:00:00
Downloading zope.event-5.0-py3-none-an

b153b0a: Pushed   3.505GB/3.469GB

To verify that the image was correctly pushed to the [Elastic Container Registry](https://aws.amazon.com/ecr/), you can check it in the AWS console. The example below shows three different images pushed to ECR; you should see only one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [9]:
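# Display the container name written by the build script.
# strip() removes the trailing newline without cutting a real character if none is present.
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    container = f.readline().strip()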

print(container)

862328613582.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20230731154214
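The image name printed above follows the usual ECR pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a quick sanity check, the sketch below splits such a name into its parts; `parse_ecr_uri` is a hypothetical helper written for this illustration, and the regular expression only covers the standard commercial-region form shown here.

```python
import re

# Pattern for names such as 123456789012.dkr.ecr.us-east-1.amazonaws.com/repo:tag
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_uri(uri):
    """Return the account, region, repository and tag of an ECR image name."""
    match = ECR_URI_PATTERN.match(uri.strip())
    if match is None:
        raise ValueError(f"unexpected ECR image name: {uri!r}")
    return match.groupdict()


parts = parse_ecr_uri(
    "862328613582.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20230731154214"
)
print(parts)
```

If the repository part is not `tf2-object-detection`, the build script most likely received a different `image_name`.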


## Pre-trained model from model zoo

As is common practice, we are not training from scratch: we will use a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time is limited for this project, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
* Faster R-CNN ResNet50 V1 640x640
* EfficientDet D1 640x640
* Faster R-CNN ResNet152 V1 640x640

In the code below, the EfficientDet D1 model is downloaded and extracted. This code should be adjusted if you experiment with other architectures.

In [None]:
%%bash
# -p avoids a "File exists" error when the cell is run more than once
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint
wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint
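When switching architectures, only the archive name changes in the two commands above. The sketch below builds the download URL and the tar member to extract from a model name. It assumes that other model zoo archives follow the same `20200711/<model_name>.tar.gz` layout as the two archives used in this notebook, which should be checked against the model zoo page; `checkpoint_download_plan` is a hypothetical helper name.

```python
# Base URL taken from the wget commands in this notebook
BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711"


def checkpoint_download_plan(model_name):
    """Return the archive URL and the tar member holding the checkpoint files."""
    return {
        "url": f"{BASE_URL}/{model_name}.tar.gz",
        # Passed to `tar ... --strip-components 2` so files land directly in source_dir/checkpoint
        "tar_member": f"{model_name}/checkpoint",
    }


for name in (
    "efficientdet_d1_coco17_tpu-32",
    "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8",
):
    plan = checkpoint_download_plan(name)
    print(plan["url"], plan["tar_member"])
```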

## Edit pipeline.config file

The [`pipeline.config`](source_dir/pipeline.config) in the `source_dir` folder should be updated when you experiment with different models. The different config files are available [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).

>Note: The provided `pipeline.config` file works well with the `EfficientDet` model. You would need to modify it when working with other models.
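The fields that usually need attention when adapting a config from the repository are `num_classes`, `batch_size`, `fine_tune_checkpoint` and `fine_tune_checkpoint_type`. The sketch below shows one way to patch them with plain text substitution. The sample config text, the values (3 classes, batch size 8) and the helper `set_config_field` are illustrative assumptions, not the contents of the provided `pipeline.config`.

```python
import re


def set_config_field(config_text, field, value):
    """Replace the value on every `field: ...` line of a pipeline config text."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(field)}[ \t]*:[ \t]*).*$", flags=re.MULTILINE
    )
    # A function replacement avoids backslash interpretation in `value`
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError(f"field {field!r} not found in config")
    return new_text


# Hypothetical, heavily shortened config used only for this illustration
sample = """model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 128
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
"""

updated = set_config_field(sample, "num_classes", "3")
updated = set_config_field(updated, "batch_size", "8")
updated = set_config_field(updated, "fine_tune_checkpoint", '"checkpoint/ckpt-0"')
updated = set_config_field(updated, "fine_tune_checkpoint_type", '"detection"')
print(updated)
```

Text substitution is fragile for nested or repeated fields; for larger edits, the `config_util` module shipped with the Object Detection API is likely a better fit.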

## Launch Training Job

Now that we have a dataset, a Docker image and some pretrained model weights, we can launch the training job. To do so, we create a [SageMaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html), where we specify the container name, the name of the config file, the number of training steps, etc.

The `run_training.sh` script does the following:
* train the model for `num_train_steps` 
* evaluate over the val dataset
* export the model

Several metrics are displayed during the evaluation phase, including the mean average precision (mAP). Use these metrics to quantify your model's performance and to compare it across iterations.
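Mean average precision is built on the intersection over union (IoU) between predicted and ground-truth boxes: a detection counts as correct only when its IoU with a ground-truth box exceeds a threshold. The sketch below computes IoU for two boxes. It assumes the `[ymin, xmin, ymax, xmax]` ordering used by the TF Object Detection API and is only an illustration, not the evaluation code run by the training script.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping on a 1x1 square: intersection 1, union 7
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```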

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon SageMaker dashboard in the Web UI.

In [None]:
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        "model_dir":"/opt/training",        
        "pipeline_config_path": "pipeline_EfficientDet.config",
        "num_train_steps": "200",    
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.m5.2xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)

You should be able to see your training job in the AWS console, as shown below:
![Training jobs example](../data/example_trainings.png)


## Improve on the initial model

Most likely, this initial experiment did not yield optimal results. However, you can make multiple changes to the `pipeline.config` file to improve this model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file lists the data augmentation methods available in the TF Object Detection API. Justify your choice of augmentations in the writeup.

Keep in mind that the following options are also available:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc.
* experiment with the architecture. The TF Object Detection API model zoo offers many architectures. Keep in mind that the `pipeline.config` file is specific to each architecture, so you will have to edit it.
* visualize results on the test frames using the `2_deploy_model` notebook available in this repository.
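To reason about the scheduler before changing it, it helps to plot or print the learning rate over the training steps. The sketch below reimplements a cosine decay with linear warmup in plain Python, using the parameter names of the `cosine_decay_learning_rate` block found in many model zoo configs. It is an approximation written for illustration, and the values (base rate 0.08, 200 steps, 50 warmup steps) are assumptions, not the settings of the provided config.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base, total_steps,
                             warmup_learning_rate, warmup_steps):
    """Learning rate at `step`: linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Fraction of the decay phase completed, capped at 1 after total_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


for step in (0, 25, 50, 125, 200):
    rate = cosine_decay_with_warmup(step, 0.08, 200, 0.001, 50)
    print(f"step {step:3d}: learning rate {rate:.4f}")
```

With only 200 training steps, a long warmup leaves few steps at a useful learning rate, so `warmup_steps` and `total_steps` should be scaled together with `num_train_steps`.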

In the cell below, write down all the approaches you have experimented with, why you chose them, and what you would have done with more time and resources. Justify your choices using the TensorBoard visualizations (take screenshots and insert them in your writeup), the metrics on the evaluation set, and the animation you generated with [this tool](../2_run_inference/2_deploy_model.ipynb).

Because of major issues and training failures, and according to [this pull request](https://github.com/udacity/cd2688-object-detection-in-urban-environment-project/pull/3), only two more models work: SSD MobileNet V2 FPNLite 640x640 and SSD ResNet50 V1 FPN 640x640 (RetinaNet50).

## SSD MobileNet V2 FPNLite 640x640

In [10]:
%%bash
mkdir /tmp/checkpoint
mkdir source_dir/checkpoint

wget -O /tmp/ssd_mobilenet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
tar -zxvf /tmp/ssd_mobilenet.tar.gz --strip-components 2 --directory source_dir/checkpoint ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint

mkdir: cannot create directory ‘/tmp/checkpoint’: File exists
mkdir: cannot create directory ‘source_dir/checkpoint’: File exists
--2023-07-31 15:54:59--  http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.253.122.128, 2607:f8b0:4004:c09::80
Connecting to download.tensorflow.org (download.tensorflow.org)|172.253.122.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20518283 (20M) [application/x-tar]
Saving to: ‘/tmp/ssd_mobilenet.tar.gz’

ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/ckpt-0.data-00000-of-00001
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/checkpoint
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/ckpt-0.index


In [11]:
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        "model_dir":"/opt/training",        
        "pipeline_config_path": "pipeline_sdd_mobile.config",
        "num_train_steps": "200",    
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.m5.2xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)

Using provided s3_resource


INFO:sagemaker:Creating training-job with name: tf2-object-detection-2023-07-31-15-55-01-640


2023-07-31 15:55:03 Starting - Starting the training job...
2023-07-31 15:55:19 Starting - Preparing the instances for training......
2023-07-31 15:56:19 Downloading - Downloading input data...
2023-07-31 15:56:44 Training - Downloading the training image...............
2023-07-31 15:59:20 Training - Training image download completed. Training in progress.
2023-07-31 15:59:20,044 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-07-31 15:59:20,047 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-07-31 15:59:20,060 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-07-31 15:59:20,063 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-07-31 15:59:20,076 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-07-31 15:59:20,078 sagemaker-training-to

[34mInstructions for updating:[0m
[34mCreate a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.[0m
[34mW0731 15:59:32.070363 140509203113792 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.[0m
[34mInstructions for updating:[0m
[34mCreate a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.[0m
[34mInstructions for updating:[0m
[34m`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.[0m
[34mW0731 15:59:34.642016 140509203113792 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.[0m
[34mInstructions for updating:[0m
[34m`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 i

[34mInstructions for updating:[0m
[34mCreate a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.[0m
[34mW0731 15:59:45.199342 140085602756416 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.[0m
[34mInstructions for updating:[0m
[34mCreate a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.[0m
[34mInstructions for updating:[0m
[34mUse `tf.cast` instead.[0m
[34mW0731 15:59:46.172819 140085602756416 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.[0m
[34mInstructions for updating:[0m
[34mUse `tf.cast` instead.[0m
[34mINFO:tensorflow:Waiting for new checkpoint at /opt/training[0m
[34mI0731 15:59:48.385236 140085602756416 c

UnexpectedStatusException: Error for Training job tf2-object-detection-2023-07-31-15-55-01-640: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/bin/sh -c ./run_training.sh --model_dir /opt/training --num_train_steps 200 --pipeline_config_path pipeline_sdd_mobile.config --sample_1_of_n_eval_examples 1", exit code: 1
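
The error above reports an empty `ErrorMessage`, so the actual cause is not visible in this output; the full CloudWatch logs for the training job are the place to look. One inexpensive thing to rule out before re-running is a file that the job expects but that is absent from `source_dir/`. The sketch below is a hypothetical pre-flight helper (the name `missing_training_files` is not part of the SageMaker SDK), and it assumes that `pipeline_config_path` is resolved relative to `source_dir`, which depends on how `run_training.sh` is written.

```python
import os
from typing import Dict, List


def missing_training_files(
    source_dir: str, entry_point: str, hyperparameters: Dict[str, str]
) -> List[str]:
    """Return the expected files that are not present in source_dir.

    Assumption: both the entry point and the pipeline config are looked up
    relative to source_dir inside the training container.
    """
    expected = [entry_point, hyperparameters["pipeline_config_path"]]
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(source_dir, name))
    ]


# Same values as the estimator cell above; prints the names that are missing.
missing = missing_training_files(
    "source_dir/",
    "run_training.sh",
    {"pipeline_config_path": "pipeline_sdd_mobile.config"},
)
print("Missing files:", missing or "none")
```

An empty result only rules out this one cause; it does not show that the training script itself is correct.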

## SSD ResNet50 V1 FPN 640x640 (RetinaNet50)

SSD stands for Single Shot MultiBox Detector. It is based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. The early network layers are based on a standard architecture used for high-quality image classification (from the [SSD paper](https://arxiv.org/pdf/1512.02325.pdf)).
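
To make the non-maximum suppression step more concrete, here is a minimal pure-Python sketch of greedy NMS. It is an illustration only, not the implementation used by the TF Object Detection API (which configures its post-processing in the pipeline config). The function names, the `(ymin, xmin, ymax, xmax)` box layout, and the 0.5 threshold are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (ymin, xmin, ymax, xmax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of the boxes to keep, highest score first.

    A box is kept only if it does not overlap an already kept,
    higher-scoring box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at the 0.5 threshold, while the third box does not overlap anything and is kept.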

In [None]:
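%%bash
# -p avoids an error if the directories already exist from an earlier run
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint

# Download the pretrained model archive and extract only its checkpoint/ folder into source_dir/checkpoint
wget -O /tmp/ssd_resnet50.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
tar -zxvf /tmp/ssd_resnet50.tar.gz --strip-components 2 --directory source_dir/checkpoint ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint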

In [None]:
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        "model_dir":"/opt/training",        
        "pipeline_config_path": "pipeline_ssd.config",
        "num_train_steps": "200",    
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)