# TensorFlow Object Detection API and AWS SageMaker


## Dataset

We use the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported to the TFRecord format, following the layout described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), the AWS object storage service. The images are saved at a resolution of 640x640.
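To make the TFRecord container format a little less opaque, here is a minimal sketch of how its record framing can be read with the standard library only. It assumes the usual TFRecord layout (a little-endian 64-bit length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload) and skips CRC verification. The helper names `iter_tfrecord_payloads` and `frame_record` are hypothetical, and the demo records use placeholder zero CRCs, so they are not valid files for TensorFlow itself. In practice you would read the files with `tf.data.TFRecordDataset` and decode each payload as a serialized `tf.train.Example`.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_tfrecord_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload of each record in a TFRecord-framed stream.

    Assumed framing per record (little-endian): uint64 length,
    uint32 CRC of the length, `length` payload bytes, uint32 CRC of the payload.
    The CRCs are skipped here, not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte length CRC
        if not header:
            return  # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # payload CRC, ignored in this sketch
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def frame_record(payload: bytes) -> bytes:
    """Frame a payload for the demo below, using placeholder (zero) CRCs."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


# Demo on an in-memory stream holding two fake payloads.
demo_stream = io.BytesIO(frame_record(b"example-1") + frame_record(b"ex-2"))
payloads = list(iter_tfrecord_payloads(demo_stream))
print(payloads)
```

Each payload in the real dataset would be a serialized example holding the encoded image and its bounding boxes, which this sketch does not attempt to decode.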

In [1]:
%%capture
%pip install tensorflow_io sagemaker -U

In [2]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

Save the IAM execution role in a variable called `role`. It is needed later to launch the training job.

In [3]:
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::157430746956:role/service-role/AmazonSageMaker-ExecutionRole-20230329T033942


In [4]:
# The train and val paths below are public S3 buckets created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/', 
        'val': 's3://cd2688-object-detection-tf2/val/'} 

# Insert path of a folder in your personal S3 bucket to store tensorboard logs.
tensorboard_s3_prefix = 's3://object-detection-in-urban-env/logs/'
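Before launching a job, it can help to sanity-check the S3 locations defined above. The following is a small sketch, using only the standard library, that splits an `s3://` URI into its bucket and key prefix. The helper name `split_s3_uri` is hypothetical and not part of the SageMaker SDK.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# The public training channel used in this notebook.
train_bucket, train_prefix = split_s3_uri("s3://cd2688-object-detection-tf2/train/")
print(train_bucket, train_prefix)
```

A check like this catches typos such as a missing `s3://` scheme before they surface as a failed training job.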

## Container

To train the model, you first need to build a [Docker](https://www.docker.com/) container with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the TensorFlow models repository
* get the exporter and training scripts from the repository
* build the Docker image and push it
* print the container name

In [5]:
%%bash

# clone the repo and get the scripts
git clone https://github.com/tensorflow/models.git docker/models

# get model_main and exporter_main files from TF2 Object Detection GitHub repository
cp docker/models/research/object_detection/exporter_main_v2.py source_dir 
cp docker/models/research/object_detection/model_main_tf2.py source_dir

Cloning into 'docker/models'...


In [6]:
# Build and push the Docker image. This code can be commented out after it has been run once.
# This will take around 10 mins.
# image_name = 'tf2-object-detection'
# !sh ./docker/build_and_push.sh $image_name

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Building image with name tf2-object-detection
Sending build context to Docker daemon  723.5MB
Step 1/17 : FROM tensorflow/tensorflow:2.9.0-gpu
2.9.0-gpu: Pulling from tensorflow/tensorflow

17ec1767: Pulling fs layer
9ecd2bff: Pulling fs layer
4ae53552: Pulling fs layer
2d09b8c4: Pulling fs layer
0d530989: Pulling fs layer
81af025b: Pulling fs layer
c129f45e: Pulling fs layer
8fcb70c6: Pulling fs layer
9aa4a247: Pulling fs layer
3100c8d1: Pulling fs layer
3a6b487b: Pulling fs layer
e8773234: Pulling fs layer
36c9476c: Pulling fs layer



  gpg-wks-server gpgconf gpgsm gpgv
11 upgraded, 119 newly installed, 0 to remove and 110 not upgraded.
Need to get 59.5 MB of archives.
After this operation, 631 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg-wks-client amd64 2.2.19-3ubuntu2.2 [97.4 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 dirmngr amd64 2.2.19-3ubuntu2.2 [330 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg-wks-server amd64 2.2.19-3ubuntu2.2 [90.2 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gnupg-utils amd64 2.2.19-3ubuntu2.2 [481 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg-agent amd64 2.2.19-3ubuntu2.2 [232 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpg amd64 2.2.19-3ubuntu2.2 [482 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpgconf amd64 2.2.19-3ubuntu2.2 [124 kB]
Get:8 http://archive.ubuntu.com/ubuntu foc

Get:79 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libglx0 amd64 1.3.2-1~ubuntu0.20.04.2 [32.5 kB]
Get:80 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgl1 amd64 1.3.2-1~ubuntu0.20.04.2 [85.8 kB]
Get:81 http://archive.ubuntu.com/ubuntu focal/main amd64 xorg-sgml-doctools all 1:1.11-1 [12.9 kB]
Get:82 http://archive.ubuntu.com/ubuntu focal/main amd64 x11proto-dev all 2019.2-1ubuntu1 [594 kB]
Get:83 http://archive.ubuntu.com/ubuntu focal/main amd64 x11proto-core-dev all 2019.2-1ubuntu1 [2620 B]
Get:84 http://archive.ubuntu.com/ubuntu focal/main amd64 libxau-dev amd64 1:1.0.9-0ubuntu1 [9552 B]
Get:85 http://archive.ubuntu.com/ubuntu focal/main amd64 libxdmcp-dev amd64 1:1.1.3-0ubuntu1 [25.3 kB]
Get:86 http://archive.ubuntu.com/ubuntu focal/main amd64 xtrans-dev all 1.4.0-1 [68.9 kB]
Get:87 http://archive.ubuntu.com/ubuntu focal/main amd64 libpthread-stubs0-dev amd64 0.4-1 [5384 B]
Get:88 http://archive.ubuntu.com/ubuntu focal/main amd64 libxcb1-dev amd64 1.14-

Selecting previously unselected package libcbor0.6:amd64.
Preparing to unpack .../005-libcbor0.6_0.6.0-0ubuntu1_amd64.deb ...
Unpacking libcbor0.6:amd64 (0.6.0-0ubuntu1) ...
Selecting previously unselected package libdrm-common.
Preparing to unpack .../006-libdrm-common_2.4.107-8ubuntu1~20.04.2_all.deb ...
Unpacking libdrm-common (2.4.107-8ubuntu1~20.04.2) ...
Selecting previously unselected package libdrm2:amd64.
Preparing to unpack .../007-libdrm2_2.4.107-8ubuntu1~20.04.2_amd64.deb ...
Unpacking libdrm2:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Selecting previously unselected package libedit2:amd64.
Preparing to unpack .../008-libedit2_3.1-20191231-1_amd64.deb ...
Unpacking libedit2:amd64 (3.1-20191231-1) ...
Selecting previously unselected package libfido2-1:amd64.
Preparing to unpack .../009-libfido2-1_1.3.1-1ubuntu2_amd64.deb ...
Unpacking libfido2-1:amd64 (1.3.1-1ubuntu2) ...
Selecting previously unselected package libxau6:amd64.
Preparing to unpack .../010-libxau6_1%3a1.0.9-0ubuntu1_

Selecting previously unselected package libxcb-present0:amd64.
Preparing to unpack .../052-libxcb-present0_1.14-2_amd64.deb ...
Unpacking libxcb-present0:amd64 (1.14-2) ...
Selecting previously unselected package libxcb-sync1:amd64.
Preparing to unpack .../053-libxcb-sync1_1.14-2_amd64.deb ...
Unpacking libxcb-sync1:amd64 (1.14-2) ...
Selecting previously unselected package libxcb-xfixes0:amd64.
Preparing to unpack .../054-libxcb-xfixes0_1.14-2_amd64.deb ...
Unpacking libxcb-xfixes0:amd64 (1.14-2) ...
Selecting previously unselected package libxshmfence1:amd64.
Preparing to unpack .../055-libxshmfence1_1.3-1_amd64.deb ...
Unpacking libxshmfence1:amd64 (1.3-1) ...
Selecting previously unselected package libegl-mesa0:amd64.
Preparing to unpack .../056-libegl-mesa0_21.2.6-0ubuntu0.1~20.04.2_amd64.deb ...
Unpacking libegl-mesa0:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Selecting previously unselected package libegl1:amd64.
Preparing to unpack .../057-libegl1_1.3.2-1~ubuntu0.20.04.2_amd64.deb .

Selecting previously unselected package libwebpdemux2:amd64.
Preparing to unpack .../097-libwebpdemux2_0.6.1-2ubuntu0.20.04.1_amd64.deb ...
Unpacking libwebpdemux2:amd64 (0.6.1-2ubuntu0.20.04.1) ...
Selecting previously unselected package libwebpmux3:amd64.
Preparing to unpack .../098-libwebpmux3_0.6.1-2ubuntu0.20.04.1_amd64.deb ...
Unpacking libwebpmux3:amd64 (0.6.1-2ubuntu0.20.04.1) ...
Selecting previously unselected package libxcb-randr0:amd64.
Preparing to unpack .../099-libxcb-randr0_1.14-2_amd64.deb ...
Unpacking libxcb-randr0:amd64 (1.14-2) ...
Selecting previously unselected package libxslt1.1:amd64.
Preparing to unpack .../100-libxslt1.1_1.1.34-4ubuntu0.20.04.1_amd64.deb ...
Unpacking libxslt1.1:amd64 (1.1.34-4ubuntu0.20.04.1) ...
Selecting previously unselected package mesa-vulkan-drivers:amd64.
Preparing to unpack .../101-mesa-vulkan-drivers_21.2.6-0ubuntu0.1~20.04.2_amd64.deb ...
Unpacking mesa-vulkan-drivers:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Selecting previously unsel

Setting up gpg-wks-server (2.2.19-3ubuntu2.2) ...
Setting up libxcb-dri2-0:amd64 (1.14-2) ...
Setting up libdrm2:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Setting up python3-lxml:amd64 (4.5.0-1ubuntu0.5) ...
Setting up libxcb-randr0:amd64 (1.14-2) ...
Setting up libx11-6:amd64 (2:1.6.9-2ubuntu1.2) ...
Setting up libfontconfig1:amd64 (2.13.1-2ubuntu3) ...
Setting up libxmuu1:amd64 (2:1.1.3-0ubuntu1) ...
Setting up libdrm-amdgpu1:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Setting up libxcb-dri3-0:amd64 (1.14-2) ...
Setting up mesa-vulkan-drivers:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Setting up libdrm-nouveau2:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Setting up libxcb1-dev:amd64 (1.14-2) ...
Setting up gpg-wks-client (2.2.19-3ubuntu2.2) ...
Setting up libxrender1:amd64 (1:0.9.10-1) ...
Setting up libgbm1:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Setting up libdrm-radeon1:amd64 (2.4.107-8ubuntu1~20.04.2) ...
Setting up openssh-client (1:8.2p1-4ubuntu0.5) ...
Setting up libdrm-intel1:amd64 (2.4.107-8ubuntu1~2

     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 588.2/588.2 MB 1.2 MB/s eta 0:00:00
Collecting tensorflow-text~=2.11.0
  Downloading tensorflow_text-2.11.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.8/5.8 MB 49.9 MB/s eta 0:00:00
Collecting pyyaml<6.0,>=5.1
  Downloading PyYAML-5.4.1-cp38-cp38-manylinux1_x86_64.whl (662 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 662.4/662.4 kB 61.4 MB/s eta 0:00:00
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 76.9 MB/s eta 0:00:00
Collecting psutil>=5.4.3
  Downloading psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 280.2/280.2 kB 47.5 MB/s eta 0:00:00
Collecting google-api-python-client>=1.6.7
  Downloading google_api_python_cli

Collecting tqdm
  Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.1/77.1 kB 17.1 MB/s eta 0:00:00
Collecting python-slugify
  Downloading python_slugify-8.0.1-py2.py3-none-any.whl (9.7 kB)
Collecting charset-normalizer<4,>=2
  Downloading charset_normalizer-3.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (195 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 195.9/195.9 kB 33.2 MB/s eta 0:00:00
Collecting protobuf<4,>3.12.2
  Downloading protobuf-3.19.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 15.8 MB/s eta 0:00:00
Collecting flatbuffers>=2.0
  Downloading flatbuffers-23.3.3-py2.py3-none-any.whl (26 kB)
Collecting tensorboard<2.12,>=2.11
  Downloading tensorboard-2.11.2-py3-none-any.whl (6.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 82.4 MB/s eta 0:00:00
Collecting tensorflow-estimator<2.12,>=2.11.0
  Downloading tensorflow_

Building wheels for collected packages: object-detection, avro-python3, pycocotools, crcmod, dill, kaggle, seqeval, docopt, promise
  Building wheel for object-detection (setup.py): started
  Building wheel for object-detection (setup.py): finished with status 'done'
  Created wheel for object-detection: filename=object_detection-0.1-py3-none-any.whl size=1466762 sha256=39412f6e32c2a78dc11986d26f4ab6ae71e48257afb3dd6d6d257e6935c6f52e
  Stored in directory: /tmp/pip-ephem-wheel-cache-v9fbijmp/wheels/28/d2/ce/f2754826bc8f50adf45d76a4c3cffa1a58dd936429295e0ddd
  Building wheel for avro-python3 (setup.py): started
  Building wheel for avro-python3 (setup.py): finished with status 'done'
  Created wheel for avro-python3: filename=avro_python3-1.10.2-py3-none-any.whl size=43992 sha256=3044446c44c44cdf51cb3f4abd244c42eb76efb61d1af431b102eef19dfa7974
  Stored in directory: /root/.cache/pip/wheels/bb/73/e9/d273421f5723c4bf544dcf9eb097bda94421ef8d3252699f0a
  Building wheel for pycocotools (pypr

Removing intermediate container 6d5af61682c3
 ---> e248e536b5c7
Step 15/17 : ENV TF_CPP_MIN_LOG_LEVEL 3
 ---> Running in 55b6f20adbe4
Removing intermediate container 55b6f20adbe4
 ---> 276d94013d87
Step 16/17 : RUN pip3 install sagemaker-training
 ---> Running in fb6b0bc430ae
Collecting sagemaker-training
  Downloading sagemaker_training-4.4.8.tar.gz (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.9/57.9 kB 11.3 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting boto3
  Downloading boto3-1.26.101-py3-none-any.whl (135 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 135.5/135.5 kB 25.6 MB/s eta 0:00:00
Collecting retrying>=1.3.3
  Downloading retrying-1.3.4-py3-none-any.whl (11 kB)
Collecting gevent
  Downloading gevent-22.10.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 78.8 MB/s eta 0:00:00
Collecting inotify_simple=


To verify that the image was pushed correctly to the [Elastic Container Registry](https://aws.amazon.com/ecr/) (ECR), look it up in the AWS Management Console. The example below shows three different images pushed to ECR; you should see only one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [7]:
# display the container name
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    # read the first line and strip the trailing newline, if any
    container = f.readline().strip()

print(container)

157430746956.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20230329035020
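The printed container name follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a quick sanity check, the sketch below splits such a name into its parts with a regular expression. The `EcrImage` type and `parse_ecr_image` helper are hypothetical names introduced for illustration, and the pattern is an assumption that covers the name shown above rather than every valid ECR URI.

```python
import re
from typing import NamedTuple


class EcrImage(NamedTuple):
    account_id: str
    region: str
    repository: str
    tag: str


# Assumed shape: <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
_ECR_PATTERN = re.compile(
    r"^(?P<account_id>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_image(uri: str) -> EcrImage:
    """Split an ECR image name into account, region, repository and tag."""
    match = _ECR_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"unexpected ECR image name: {uri!r}")
    return EcrImage(**match.groupdict())


image = parse_ecr_image(
    "157430746956.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20230329035020"
)
print(image.repository, image.tag)
```

Checking that the repository is `tf2-object-detection` and that the region matches your notebook's region is usually enough to catch a stale `ecr_image_fullname.txt`.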


## Pre-trained model from model zoo

As is common practice, we do not train from scratch; instead we use a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time is limited for this project, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640	
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)	
* Faster R-CNN ResNet50 V1 640x640	
* EfficientDet D1 640x640	
* Faster R-CNN ResNet152 V1 640x640	

In the code below, the EfficientDet D1 model is downloaded and extracted. Adjust this code if you experiment with other architectures.

In [8]:
%%bash
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint
wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint

efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.data-00000-of-00001
efficientdet_d1_coco17_tpu-32/checkpoint/checkpoint
efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.index


--2023-03-29 04:12:27--  http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.253.62.128, 2607:f8b0:4004:c08::80
Connecting to download.tensorflow.org (download.tensorflow.org)|172.253.62.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51839363 (49M) [application/x-tar]
Saving to: ‘/tmp/efficientdet.tar.gz’

     0K .......... .......... .......... .......... ..........  0% 11.7M 4s
    50K .......... .......... .......... .......... ..........  0% 21.4M 3s
   100K .......... .......... .......... .......... ..........  0% 22.6M 3s
   150K .......... .......... .......... .......... ..........  0% 74.8M 2s
   200K .......... .......... .......... .......... ..........  0% 61.3M 2s
   250K .......... .......... .......... .......... ..........  0% 55.5M 2s
   300K .......... .......... .......... .......... ..........  0% 70.9M 2s
   350K ..
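To adapt the download step to another architecture, only the archive name should need to change. The sketch below generates the equivalent shell commands for a given archive. The helper `checkpoint_commands` is hypothetical. The EfficientDet D1 archive name comes from the cell above; the other archive names are assumptions based on the model zoo naming scheme and should be checked against the model zoo page before use. The `pipeline.config` would presumably also need to match the chosen architecture, which this sketch does not cover.

```python
from typing import Dict, List

MODEL_ZOO_BASE = "http://download.tensorflow.org/models/object_detection/tf2/20200711"

# Only the EfficientDet entry is taken from this notebook. The others are
# assumed names: verify them on the TF2 detection model zoo page.
ARCHIVE_NAMES: Dict[str, str] = {
    "EfficientDet D1 640x640": "efficientdet_d1_coco17_tpu-32",
    "SSD MobileNet V2 FPNLite 640x640": "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8",
    "SSD ResNet50 V1 FPN 640x640 (RetinaNet50)": "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8",
    "Faster R-CNN ResNet50 V1 640x640": "faster_rcnn_resnet50_v1_640x640_coco17_tpu-8",
    "Faster R-CNN ResNet152 V1 640x640": "faster_rcnn_resnet152_v1_640x640_coco17_tpu-8",
}


def checkpoint_commands(archive_name: str, dest: str = "source_dir/checkpoint") -> List[str]:
    """Return shell commands that download an archive and extract its checkpoint."""
    url = f"{MODEL_ZOO_BASE}/{archive_name}.tar.gz"
    tarball = f"/tmp/{archive_name}.tar.gz"
    return [
        f"mkdir -p {dest}",
        f"wget -O {tarball} {url}",
        f"tar -zxvf {tarball} --strip-components 2 --directory {dest} {archive_name}/checkpoint",
    ]


commands = checkpoint_commands(ARCHIVE_NAMES["EfficientDet D1 640x640"])
print("\n".join(commands))
```

The printed commands can be pasted into a `%%bash` cell in place of the hard-coded EfficientDet lines.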

## Launch Training Job

Now that we have a dataset, a Docker image and pretrained model weights, we can launch the training job. To do so, we create a [SageMaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html) estimator, in which we specify the container name, the name of the config file, the number of training steps, etc.
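Settings such as the config file name and the number of training steps are passed to the job as hyperparameters. The SageMaker training toolkit generally forwards these to the entry point as `--name value` command-line arguments. The sketch below illustrates that mapping; `to_cli_args` is a hypothetical helper, the argument order is not guaranteed to match what the toolkit produces, and how `run_training.sh` consumes the arguments is not shown in this notebook.

```python
from typing import Dict, List


def to_cli_args(hyperparameters: Dict[str, str]) -> List[str]:
    """Flatten a hyperparameter dict into ['--name', 'value', ...]."""
    args: List[str] = []
    for name, value in hyperparameters.items():
        args.extend([f"--{name}", str(value)])
    return args


# Same hyperparameters as the ones used for the training job in this notebook.
hyperparameters = {
    "model_dir": "/opt/training",
    "pipeline_config_path": "pipeline.config",
    "num_train_steps": "2000",
    "sample_1_of_n_eval_examples": "1",
}

cli_args = to_cli_args(hyperparameters)
print(" ".join(cli_args))
```

This also explains why the values are given as strings: they end up as plain command-line tokens.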

The `run_training.sh` script does the following:
* train the model for `num_train_steps` 
* evaluate over the val dataset
* export the model

Several metrics are displayed during the evaluation phase, including the mean average precision (mAP). Use them to quantify your model's performance and to compare it across iterations.

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon SageMaker dashboard in the web UI.
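To compare evaluation metrics across experiments, it helps to pull them out of the job log into a structured form. The sketch below parses COCO-style summary lines of the kind printed during evaluation. The function name `parse_coco_summary` and the regular expression are assumptions written against the log format seen in this notebook, not an official parser.

```python
import re
from typing import Dict, Tuple

# Matches lines such as:
#  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.094
_COCO_LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((?P<metric>AP|AR)\)\s+@\[\s*"
    r"IoU=(?P<iou>[\d.:]+)\s*\|\s*area=\s*(?P<area>\w+)\s*\|\s*"
    r"maxDets=\s*(?P<max_dets>\d+)\s*\]\s*=\s*(?P<value>-?\d+\.\d+)"
)

MetricKey = Tuple[str, str, str, int]  # (metric, IoU range, area, maxDets)


def parse_coco_summary(log_text: str) -> Dict[MetricKey, float]:
    """Extract every AP/AR summary value found in a block of log text."""
    results: Dict[MetricKey, float] = {}
    for match in _COCO_LINE.finditer(log_text):
        key = (match["metric"], match["iou"], match["area"], int(match["max_dets"]))
        results[key] = float(match["value"])
    return results


sample_log = """
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.094
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.225
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.067
"""

metrics = parse_coco_summary(sample_log)
print(metrics[("AP", "0.50:0.95", "all", 100)])
```

The `("AP", "0.50:0.95", "all", 100)` entry is the headline mAP figure, which makes it a convenient single number to track between runs.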

In [9]:
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        "model_dir":"/opt/training",        
        "pipeline_config_path": "pipeline.config",
        "num_train_steps": "2000",    
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)

INFO:sagemaker:Creating training-job with name: tf2-object-detection-2023-03-29-04-14-56-356


2023-03-29 04:15:06 Starting - Starting the training job...
2023-03-29 04:15:32 Starting - Preparing the instances for training......
2023-03-29 04:16:39 Downloading - Downloading input data...
2023-03-29 04:16:59 Training - Downloading the training image...............
2023-03-29 04:19:25 Training - Training image download completed. Training in progress...
2023-03-29 04:19:54,224 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-03-29 04:19:54,257 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-03-29 04:19:54,290 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-03-29 04:19:54,303 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train",
        "val": "/opt/ml/input/data/val

caused by: ['/usr/local/lib/python3.8/dist-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_18SourceLocationImplE']
caused by: ['/usr/local/lib/python3.8/dist-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN3tsl13GcsFileSystemE']
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0329 04:20:02.658517 140681070671680 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: 2000
I0329 04:20:02.662616 140681070671680 config_util.py:552] Maybe overwriting train_steps: 2000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0329 04:20:02.662755 140681070671680 config_util.py:552] Maybe overwriting use_bfloat16: False
I032

Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0329 04:20:13.797564 140681070671680 deprecation.py:350] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
Instructions for updating:
Use `tf.cast` instead.
W0329 04:20:17.847210 140681070671680 deprecation.py:350] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
I0329 04:20:28.229179 140657943369472 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (

INFO:tensorflow:Step 300 per-step time 0.637s
I0329 04:25:45.093753 140681070671680 model_lib_v2.py:705] Step 300 per-step time 0.637s
INFO:tensorflow:{'Loss/classification_loss': 0.27069592,
 'Loss/localization_loss': 0.021240393,
 'Loss/regularization_loss': 0.029550007,
 'Loss/total_loss': 0.32148632,
 'learning_rate': 0.010480001}
I0329 04:25:45.094061 140681070671680 model_lib_v2.py:708] {'Loss/classification_loss': 0.27069592,
 'Loss/localization_loss': 0.021240393,
 'Loss/regularization_loss': 0.029550007,
 'Loss/total_loss': 0.32148632,
 'learning_rate': 0.010480001}
INFO:tensorflow:Step 400 per-step time 0.637s
I0329 04:26:48.805218 140681070671680 model_lib_v2.py:705] Step 400 per-step time 0.637s
INFO:tensorflow:{'Loss/classification_loss': 0.33149824,
 'Loss/localization_loss': 0.017191237,
 'Loss/regularization_loss': 0.029557055,
 'Loss/total_loss': 0.37824652,
 'learning_rate': 0.0136400005}
I0329 04:26:

INFO:tensorflow:Step 1700 per-step time 0.639s
I0329 04:40:39.328361 140681070671680 model_lib_v2.py:705] Step 1700 per-step time 0.639s
INFO:tensorflow:{'Loss/classification_loss': 0.27832377,
 'Loss/localization_loss': 0.015575209,
 'Loss/regularization_loss': 0.030304596,
 'Loss/total_loss': 0.32420358,
 'learning_rate': 0.05472}
I0329 04:40:39.328707 140681070671680 model_lib_v2.py:708] {'Loss/classification_loss': 0.27832377,
 'Loss/localization_loss': 0.015575209,
 'Loss/regularization_loss': 0.030304596,
 'Loss/total_loss': 0.32420358,
 'learning_rate': 0.05472}
INFO:tensorflow:Step 1800 per-step time 0.637s
I0329 04:41:43.066379 140681070671680 model_lib_v2.py:705] Step 1800 per-step time 0.637s
INFO:tensorflow:{'Loss/classification_loss': 0.2398444,
 'Loss/localization_loss': 0.014858757,
 'Loss/regularization_loss': 0.030452704,
 'Loss/total_loss': 0.28515586,
 'learning_rate': 0.05788}
I0329 04:41:43.066723 

Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0329 04:44:09.521589 140039281358656 deprecation.py:350] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
Instructions for updating:
Use `tf.cast` instead.
W0329 04:44:10.973599 140039281358656 deprecation.py:350] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Waiting for new checkpoint at /opt/training
I0329 04:44:13.637825 140039281358656 c

INFO:tensorflow:Waiting for new checkpoint at /opt/training
I0329 04:49:13.733589 140039281358656 checkpoint_utils.py:140] Waiting for new checkpoint at /opt/training
INFO:tensorflow:Timed-out waiting for a checkpoint.
I0329 04:49:22.747088 140039281358656 checkpoint_utils.py:203] Timed-out waiting for a checkpoint.
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.53s).
Accumulating evaluation results...
DONE (t=0.26s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.094
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.225
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.067
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.042
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=me

I0329 04:49:41.917581 140398258190144 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0329 04:49:54.570603 140398258190144 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0329 04:50:03.035307 140398258190144 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
W0329 04:50:06.581950 140398258190144 save_impl.py:66] Skipping full serialization of Keras layer <object_detection.meta_architectures.ssd_meta_arch.SSDMetaArch object at 0x7fb0401d8eb0>, because it is not built.
W0329 04:51:14.382883 140398258190144 save.py:271] Found untraced functions such as WeightSharedConvolutionalBoxPredictor_layer_call_fn, WeightSharedConvolutionalBoxPredictor_layer_call_and_return_conditional_losses, WeightSharedConvolutionalBoxHead_layer_call_fn, WeightSharedConvolutionalBoxHead_layer_call_and_return_conditional_losses, WeightSharedConvolutionalClassH

You should be able to see your training job in the AWS Management Console, as shown below:
![Training jobs example](../data/example_trainings.png)
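The training log above reports a per-step time every 100 steps, which is enough to estimate how long a run will take. The sketch below extracts those values and projects the remaining time; `parse_step_times` and `estimate_remaining_seconds` are hypothetical helpers, and the estimate ignores start-up, evaluation and export time.

```python
import re
from typing import Dict

# Matches e.g. "Step 300 per-step time 0.637s" in either logger's output.
_STEP_LINE = re.compile(r"Step (?P<step>\d+) per-step time (?P<seconds>\d+\.\d+)s")


def parse_step_times(log_text: str) -> Dict[int, float]:
    """Map each reported step to its per-step time (duplicates collapse)."""
    return {int(m["step"]): float(m["seconds"]) for m in _STEP_LINE.finditer(log_text)}


def estimate_remaining_seconds(step_times: Dict[int, float], total_steps: int) -> float:
    """Project the remaining training time from the mean per-step time."""
    if not step_times:
        raise ValueError("no step timings found")
    last_step = max(step_times)
    mean_step_time = sum(step_times.values()) / len(step_times)
    return max(total_steps - last_step, 0) * mean_step_time


sample_log = """
INFO:tensorflow:Step 300 per-step time 0.637s
I0329 04:25:45.093753 140681070671680 model_lib_v2.py:705] Step 300 per-step time 0.637s
INFO:tensorflow:Step 400 per-step time 0.637s
I0329 04:26:48.805218 140681070671680 model_lib_v2.py:705] Step 400 per-step time 0.637s
"""

step_times = parse_step_times(sample_log)
remaining = estimate_remaining_seconds(step_times, total_steps=2000)
print(f"about {remaining / 60:.0f} minutes left")
```

At roughly 0.64 s per step, the full 2000 steps take a little over 20 minutes of pure training time, which is consistent with the timestamps in the log.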
