# Object Detection using TAO DetectNet_v2

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for use on a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/embedded-transfer-learning-toolkit-software-stack-1200x670px.png" width="1080"> 

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained resnet18 model and train a ResNet-18 DetectNet_v2 model on the KITTI dataset
* Prune the trained detectnet_v2 model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Quantize the pruned model using QAT
* Run inference on the trained model
* Export the pruned, quantized, and retrained model to a .etlt file for deployment to DeepStream
* Run inference on the exported .etlt model to verify deployment using TensorRT

### Table of Contents

This notebook shows an example use case of Object Detection using DetectNet_v2 in the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Install the TAO Launcher](#head-1)
1. [Prepare dataset and pre-trained model](#head-2)
    1. [Download the dataset](#head-2-1)
    1. [Verify downloaded dataset](#head-2-2)
    1. [Prepare tfrecords from kitti format dataset](#head-2-3)
    2. [Download pre-trained model](#head-2-4)
2. [Provide training specification](#head-3)
3. [Run TAO training](#head-4)
4. [Evaluate trained models](#head-5)
5. [Prune trained models](#head-6)
6. [Retrain pruned models](#head-7)
7. [Evaluate retrained model](#head-8)
8. [Visualize inferences](#head-9)
9. [Model Export](#head-10)
    1. [Int8 Optimization](#head-10-1)
    2. [Generate TensorRT engine](#head-10-2)
10. [Verify Deployed Model](#head-11)
    1. [Inference using TensorRT engine](#head-11-1)
11. [QAT workflow](#head-12)
    1. [Convert pruned model to QAT and retrain](#head-12-1)
    2. [Evaluate QAT converted model](#head-12-2)
    3. [Export QAT trained model to int8](#head-12-3)
    4. [Evaluate a QAT trained model using the exported TensorRT engine](#head-12-4)
    5. [Inference using QAT engine](#head-12-5)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>
When using the purpose-built pretrained models from NGC, please make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when trying to load them as pretrained models.

This notebook requires the user to set an env variable called `$LOCAL_PROJECT_DIR` as the path to the user's workspace. Please note that the dataset for this notebook is expected to reside in `$LOCAL_PROJECT_DIR/kitti_data`, while the collateral generated by the TAO experiment will be output to `$LOCAL_PROJECT_DIR/detectnet_v2`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

*Note: Please make sure to remove any stray artifacts/files left over from previous experiments in the `$USER_EXPERIMENT_DIR` or `$DATA_DOWNLOAD_DIR` paths mentioned below. Leftover checkpoint files, for example, may interfere with creating a training graph for a new experiment.*

*Note: By default, this notebook is set up to run training on 1 GPU. To use more GPUs, please update the env variable `$NUM_GPUS` accordingly.*
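
As a minimal sketch of the clean-up check described in the first note, the helper below lists files that look like leftovers from an earlier run. The function name `find_stale_artifacts` and the file patterns are assumptions made for illustration; they are not part of TAO, so please adjust them to whatever your previous experiments actually produced. The helper only lists files and never deletes anything.

```python
from pathlib import Path

# Hypothetical patterns for leftovers of an earlier run (not defined by TAO).
STALE_PATTERNS = ("checkpoint", "*.ckpt*", "*.tlt", "events.out.tfevents.*")


def find_stale_artifacts(experiment_dir, patterns=STALE_PATTERNS):
    """Return a sorted list of files under experiment_dir matching any pattern."""
    root = Path(experiment_dir)
    if not root.is_dir():
        # Nothing to clean up if the directory does not exist yet.
        return []
    matches = set()
    for pattern in patterns:
        # rglob searches the directory tree recursively.
        matches.update(p for p in root.rglob(pattern) if p.is_file())
    return sorted(matches)
```

Because the experiment directory is addressed by its host path here, you would typically call it as `find_stale_artifacts(os.environ["LOCAL_EXPERIMENT_DIR"])` once the env variables in the next cell are set.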

In [4]:
# Setting up env variables for cleaner command line commands.
import os

%env KEY=tlt_encode
%env NUM_GPUS=1
%env USER_EXPERIMENT_DIR=/workspace/tao-experiments/detectnet_v2
%env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/kitti_data

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/detectnet_v2

# Please define the local project directory that needs to be mapped to the TAO docker session.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/kitti_data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/detectnet_v2
# !PLEASE MAKE SURE TO UPDATE THIS PATH!

os.environ["LOCAL_PROJECT_DIR"] = "/home/demo/Desktop/workspace/cv_samples_v1.3.0"

os.environ["LOCAL_DATA_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "kitti_data"
)
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "detectnet_v2"
)

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)
%env SPECS_DIR=/workspace/tao-experiments/detectnet_v2/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

env: KEY=tlt_encode
env: NUM_GPUS=1
env: USER_EXPERIMENT_DIR=/workspace/tao-experiments/detectnet_v2
env: DATA_DOWNLOAD_DIR=/workspace/tao-experiments/kitti_data
env: SPECS_DIR=/workspace/tao-experiments/detectnet_v2/specs
total 40
-rw-rw-r-- 1 demo demo  310 Nov 23 18:36 detectnet_v2_tfrecords_kitti_trainval.txt
-rw-rw-r-- 1 demo demo 5557 Nov 23 18:36 detectnet_v2_retrain_resnet18_kitti.txt
-rw-rw-r-- 1 demo demo 5620 Nov 23 18:36 detectnet_v2_retrain_resnet18_kitti_qat.txt
-rw-rw-r-- 1 demo demo 2406 Nov 23 18:36 detectnet_v2_inference_kitti_tlt.txt
-rw-rw-r-- 1 demo demo 2436 Nov 23 18:36 detectnet_v2_inference_kitti_etlt.txt
-rw-rw-r-- 1 demo demo 2445 Nov 23 18:36 detectnet_v2_inference_kitti_etlt_qat.txt
-rw-rw-r-- 1 demo demo 5551 Jan  4 14:12 detectnet_v2_train_resnet18_kitti.txt


The cell below maps the project directory on your local host to a workspace directory in the TAO docker instance, so that the data and the results are shared between the host and the container. For more information, please refer to the [launcher instance](https://docs.nvidia.com/tao/tao-toolkit/tao_launcher.html) section in the user guide.

When running this cell on AWS, update the drive_map entry with the dictionary defined below, so that you don't have permission issues when writing data into folders created by the TAO docker.

```python
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ],
    # Run the container as the current host user to avoid permission issues.
    "DockerOptions": {
        "user": "{}:{}".format(os.getuid(), os.getgid())
    }
}
```

In [5]:
# Mapping up the local directories to the TAO docker.
import json
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

In [6]:
!cat ~/.tao_mounts.json

{
    "Mounts": [
        {
            "source": "/home/demo/Desktop/workspace/cv_samples_v1.3.0",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/demo/Desktop/workspace/cv_samples_v1.3.0/detectnet_v2/specs",
            "destination": "/workspace/tao-experiments/detectnet_v2/specs"
        }
    ]
}
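
To make the mapping concrete, here is a minimal sketch that translates a host path into the path the same file would have inside the container, using the `Mounts` entries shown above. The helper `host_to_container` is hypothetical (it is not part of the TAO launcher), it assumes POSIX-style paths, and picking the longest matching `source` is a choice made for this illustration only.

```python
import os


def host_to_container(host_path, mounts):
    """Map a host path to its container path, or return None if it is not mounted."""
    host_path = os.path.normpath(host_path)
    best_source, best_destination = None, None
    for mount in mounts:
        source = os.path.normpath(mount["source"])
        inside = host_path == source or host_path.startswith(source + os.sep)
        # Prefer the most specific (longest) matching source directory.
        if inside and (best_source is None or len(source) > len(best_source)):
            best_source, best_destination = source, mount["destination"]
    if best_source is None:
        return None
    relative = os.path.relpath(host_path, best_source)
    if relative == os.curdir:
        return best_destination
    return best_destination.rstrip("/") + "/" + relative.replace(os.sep, "/")


mounts = [
    {"source": "/home/demo/Desktop/workspace/cv_samples_v1.3.0",
     "destination": "/workspace/tao-experiments"},
    {"source": "/home/demo/Desktop/workspace/cv_samples_v1.3.0/detectnet_v2/specs",
     "destination": "/workspace/tao-experiments/detectnet_v2/specs"},
]
print(host_to_container(
    "/home/demo/Desktop/workspace/cv_samples_v1.3.0/kitti_data/training", mounts))
# prints /workspace/tao-experiments/kitti_data/training
```

This is why `$LOCAL_DATA_DIR` on the host and `$DATA_DOWNLOAD_DIR` in the container refer to the same files: commands prefixed with `tao` see the container paths, while plain shell commands in the notebook see the host paths.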

## 1. Install the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with Python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of Python to be used in the virtual env by using the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 6 and <= 8

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, please make sure the following software requirements are met:
* python >=3.6.9 < 3.8.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+
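
The Python requirement in the list above can be checked from inside the virtual environment with a short sketch like the one below. The function name `python_supported` is hypothetical, and the bounds simply encode the `>=3.6.9 < 3.8.x` entry from the list, read here as "3.6.9 or newer, but older than 3.8".

```python
import sys


def python_supported(version=None):
    """Return True if the version satisfies >= 3.6.9 and < 3.8."""
    # Default to the interpreter running this notebook.
    major, minor, micro = (version or sys.version_info)[:3]
    return (3, 6, 9) <= (major, minor, micro) < (3, 8, 0)


if not python_supported():
    print("Python {}.{}.{} is outside the range listed above.".format(*sys.version_info[:3]))
```

The other requirements (docker, the NVIDIA container packages, and the driver) are host-level components and need to be checked with their own tools.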

Once you have installed the prerequisites, please log in to the docker registry nvcr.io by running the command below

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.

In [7]:
# SKIP this step IF you have already installed the TAO launcher wheel.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [8]:
# View the versions of the TAO launcher
!tao info

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021


## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

We will be using the KITTI object detection dataset for this example. To find more details, please visit http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Please download both the left color images of the object dataset from [here](http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and the training labels for the object dataset from [here](http://www.cvlibs.net/download.php?file=data_object_label_2.zip), and place the zip files in `$LOCAL_DATA_DIR`.

The data will then be extracted to have
* training images in `$LOCAL_DATA_DIR/training/image_2`
* training labels in `$LOCAL_DATA_DIR/training/label_2`
* testing images in `$LOCAL_DATA_DIR/testing/image_2`

You may use this notebook with your own dataset as well. To use this example with your own dataset, please follow the same directory structure as described above.

*Note: There are no labels for the testing images, therefore we use it just to visualize inferences for the trained model.*

### A. Download the dataset <a class="anchor" id="head-2-1"></a>
Once you have received the download links by email, please populate them in place of `KITTI_IMAGES_DOWNLOAD_URL` and `KITTI_LABELS_DOWNLOAD_URL`. The next cell will download the data and place it in `$LOCAL_DATA_DIR`.

In [None]:
import os
!mkdir -p $LOCAL_DATA_DIR
os.environ["URL_IMAGES"]=KITTI_IMAGES_DOWNLOAD_URL
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then wget $URL_IMAGES -O $LOCAL_DATA_DIR/data_object_image_2.zip; else echo "image archive already downloaded"; fi 
os.environ["URL_LABELS"]=KITTI_LABELS_DOWNLOAD_URL
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then wget $URL_LABELS -O $LOCAL_DATA_DIR/data_object_label_2.zip; else echo "label archive already downloaded"; fi

### B. Verify downloaded dataset <a class="anchor" id="head-2-2"></a>

In [None]:
# Check the dataset is present
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi

In [None]:
# This may take a while: verify integrity of zip files 
!sha256sum $LOCAL_DATA_DIR/data_object_image_2.zip | cut -d ' ' -f 1 | grep -xq '^351c5a2aa0cd9238b50174a3a62b846bc5855da256b82a196431d60ff8d43617$' ; \
if test $? -eq 0; then echo "images OK"; else echo "images corrupt, redownload!" && rm -f $LOCAL_DATA_DIR/data_object_image_2.zip; fi 
!sha256sum $LOCAL_DATA_DIR/data_object_label_2.zip | cut -d ' ' -f 1 | grep -xq '^4efc76220d867e1c31bb980bbf8cbc02599f02a9cb4350effa98dbb04aaed880$' ; \
if test $? -eq 0; then echo "labels OK"; else echo "labels corrupt, redownload!" && rm -f $LOCAL_DATA_DIR/data_object_label_2.zip; fi 
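
If you prefer to run the integrity check in Python rather than with `sha256sum`, the same comparison can be sketched with the standard `hashlib` module. The helper names `sha256_of_file` and `verify_archive` are hypothetical; the expected digests are the ones used in the shell commands above.

```python
import hashlib

# Expected digests, copied from the sha256sum commands above.
EXPECTED_SHA256 = {
    "data_object_image_2.zip": "351c5a2aa0cd9238b50174a3a62b846bc5855da256b82a196431d60ff8d43617",
    "data_object_label_2.zip": "4efc76220d867e1c31bb980bbf8cbc02599f02a9cb4350effa98dbb04aaed880",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Return True if the file at path has the expected SHA-256 digest."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, `verify_archive(os.path.join(os.environ["LOCAL_DATA_DIR"], name), EXPECTED_SHA256[name])` would check one archive. Unlike the shell version, this sketch does not delete a corrupt file.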

In [None]:
# Unpack the downloaded archives into $LOCAL_DATA_DIR (mapped to $DATA_DOWNLOAD_DIR inside the container).
# The training images will be under $LOCAL_DATA_DIR/training/image_2 and
# labels will be under $LOCAL_DATA_DIR/training/label_2.
# The testing images will be under $LOCAL_DATA_DIR/testing/image_2.
!unzip -u $LOCAL_DATA_DIR/data_object_image_2.zip -d $LOCAL_DATA_DIR
!unzip -u $LOCAL_DATA_DIR/data_object_label_2.zip -d $LOCAL_DATA_DIR

In [9]:
# verify
import os

DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
num_training_images = len(os.listdir(os.path.join(DATA_DIR, "training/image_2")))
num_training_labels = len(os.listdir(os.path.join(DATA_DIR, "training/label_2")))
num_testing_images = len(os.listdir(os.path.join(DATA_DIR, "testing/image_2")))
print("Number of images in the train/val set. {}".format(num_training_images))
print("Number of labels in the train/val set. {}".format(num_training_labels))
print("Number of images in the test set. {}".format(num_testing_images))

Number of images in the train/val set. 7481
Number of labels in the train/val set. 7481
Number of images in the test set. 7518


In [10]:
# Sample kitti label.
!cat $LOCAL_DATA_DIR/training/label_2/000110.txt

Car 0.27 0 2.50 862.65 129.39 1241.00 304.96 1.73 1.74 4.71 5.50 1.30 8.19 3.07
Car 0.68 3 -0.76 1184.97 141.54 1241.00 187.84 1.52 1.60 4.42 22.39 0.48 24.57 -0.03
Car 0.00 1 1.73 346.64 175.63 449.93 248.90 1.58 1.76 4.18 -5.13 1.67 17.86 1.46
Car 0.00 0 1.75 420.44 170.72 540.83 256.12 1.65 1.88 4.45 -2.78 1.64 16.30 1.58
Car 0.00 0 -0.35 815.59 143.96 962.82 198.54 1.90 1.78 4.72 10.19 0.90 26.65 0.01
Car 0.00 1 -2.09 966.10 144.74 1039.76 182.96 1.80 1.65 3.55 19.49 0.49 35.99 -1.59
Van 0.00 2 -2.07 1084.26 132.74 1173.25 177.89 2.11 1.75 4.31 26.02 0.24 36.41 -1.45
Car 0.00 2 -2.13 1004.98 144.16 1087.13 178.96 1.64 1.70 3.91 21.91 0.30 36.47 -1.59
Car 0.00 2 1.77 407.73 178.44 487.07 230.28 1.55 1.71 4.50 -5.35 1.76 24.13 1.55
Car 0.00 1 1.45 657.19 166.33 702.65 198.71 1.50 1.71 4.44 3.39 1.22 35.96 1.55
Car 0.00 1 -1.46 599.30 171.76 631.96 197.12 1.58 1.71 3.75 0.39 1.54 47.31 -1.45
Car 0.00 0 -1.02 557.79 165.74 591.61 181.27 1.66 1.65 4.45 -3.89 0.91 80.12 -1.07
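
Each line of a KITTI label file describes one object with 15 whitespace-separated fields: the object type, truncation, occlusion, observation angle (alpha), the 2D bounding box (left, top, right, bottom), the 3D dimensions (height, width, length), the 3D location (x, y, z), and the rotation around the y-axis. The sketch below parses one such line; the names `KittiObject` and `parse_kitti_line` are hypothetical and chosen for this illustration.

```python
from typing import NamedTuple, Tuple


class KittiObject(NamedTuple):
    object_type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, float, float, float]   # left, top, right, bottom (pixels)
    dimensions: Tuple[float, float, float]    # height, width, length (metres)
    location: Tuple[float, float, float]      # x, y, z (camera coordinates)
    rotation_y: float


def parse_kitti_line(line):
    """Parse one line of a KITTI label file into a KittiObject."""
    fields = line.split()
    if len(fields) != 15:
        raise ValueError("expected 15 fields, got {}".format(len(fields)))
    values = [float(value) for value in fields[1:]]
    return KittiObject(
        object_type=fields[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        dimensions=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_y=values[13],
    )


first = parse_kitti_line(
    "Car 0.27 0 2.50 862.65 129.39 1241.00 304.96 1.73 1.74 4.71 5.50 1.30 8.19 3.07")
print(first.object_type, first.bbox)
```

For 2D object detection, the fields of interest are the object type and the bounding box.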

### C. Prepare tfrecords from kitti format dataset <a class="anchor" id="head-2-3"></a>

* Update the tfrecords spec file to take in your kitti format dataset
* Create the tfrecords using the detectnet_v2 dataset_convert

*Note: TFRecords only need to be generated once.*

In [12]:
print("TFrecords conversion spec file for kitti training")
!cat $LOCAL_SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt

TFrecords conversion spec file for kitti training
kitti_config {
  root_directory_path: "/workspace/tao-experiments/kitti_data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/kitti_data/training"
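
As a quick sanity check on the spec above, the arithmetic below shows how `val_split: 14` divides the 7481 training images. It assumes that `val_split` is a percentage and that the validation count is rounded down; this assumption is not taken from the TAO documentation, but it agrees with the dataset sizes (6434 and 1047) reported in the training log later in this notebook. The function name `split_counts` is hypothetical.

```python
def split_counts(num_samples, val_split_percent):
    """Return (num_train, num_val), assuming a percentage split rounded down."""
    num_val = num_samples * val_split_percent // 100
    return num_samples - num_val, num_val


print(split_counts(7481, 14))
# prints (6434, 1047)
```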


In [13]:
!echo $LOCAL_DATA_DIR
!echo $DATA_DIR
!echo $USER_EXPERIMENT_DIR
!echo $DATA_DOWNLOAD_DIR

#FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tao-experiments/data/training/image_2'

/home/demo/Desktop/workspace/cv_samples_v1.3.0/kitti_data
/home/demo/Desktop/workspace/cv_samples_v1.3.0/kitti_data
/workspace/tao-experiments/detectnet_v2
/workspace/tao-experiments/kitti_data


In [14]:
# Creating a new directory for the output tfrecords dump.
print("Converting Tfrecords for kitti trainval dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!tao detectnet_v2 dataset_convert \
                  -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
                  -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Converting Tfrecords for kitti trainval dataset
2022-01-04 14:14:35,538 [INFO] root: Registry: ['nvcr.io']
2022-01-04 14:14:35,583 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/demo/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
2022-01-04 05:14:40,151 [INFO] iva.detectnet_v2.dataio.build_converter: Instantiating a kitti converter
2022-01-04 05:14:40,151 [INFO] root: Instantiating a kitti converter
2022-01-04 05:14:40,151 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: Creating output directory /workspace/tao-experiments/kitti_data/tfrecords/kitti_trainval
2022-01-04 05:14:40,151 [INFO] root: Gen

2022-01-04 14:14:47,747 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


In [15]:
!ls -rlt $LOCAL_DATA_DIR/tfrecords/kitti_trainval/

total 7148
-rw-r--r-- 1 root root  99198 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00000-of-00010
-rw-r--r-- 1 root root 103711 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00001-of-00010
-rw-r--r-- 1 root root 104599 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00002-of-00010
-rw-r--r-- 1 root root 104126 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00003-of-00010
-rw-r--r-- 1 root root  99506 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00004-of-00010
-rw-r--r-- 1 root root  97943 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00005-of-00010
-rw-r--r-- 1 root root 102953 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00006-of-00010
-rw-r--r-- 1 root root 107187 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00007-of-00010
-rw-r--r-- 1 root root  99198 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00008-of-00010
-rw-r--r-- 1 root root 105043 Jan  4 14:14 kitti_trainval-fold-000-of-002-shard-00009-of-00010
-rw-r--r-- 1 root root 62267

### D. Download pre-trained model <a class="anchor" id="head-2-4"></a>
Download the correct pretrained model from the NGC model registry for your experiment. Please note that for DetectNet_v2, the input is expected to be 0-1 normalized with input channels in RGB order. Therefore, for optimum results please download model templates from `nvidia/tao/pretrained_detectnet_v2`. The templates are now organized as version strings. For example, to download a resnet18 model suitable for detectnet please resolve to the ngc object shown as `nvidia/tao/pretrained_detectnet_v2:resnet18`. 

All other models expect input preprocessing with mean subtraction and input channels in BGR order. Using them as pretrained weights may result in suboptimal performance.

You may also use this notebook with the following purpose-built pretrained models 
* [PeopleNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:peoplenet)
* [TrafficCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:trafficcamnet)
* [DashCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:dashcamnet)
* [FaceDetect-IR](https://ngc.nvidia.com/catalog/models/nvidia:tao:facedetectir) 

In [16]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_cat_linux.zip
--2022-01-04 14:15:27--  https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 54.230.169.2, 54.230.169.29, 54.230.169.114, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|54.230.169.2|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25122952 (24M) [application/zip]
Saving to: ‘/home/demo/Desktop/workspace/cv_samples_v1.3.0/ngccli/ngccli_cat_linux.zip’


2022-01-04 14:15:34 (3.23 MB/s) - ‘/home/demo/Desktop/workspace/cv_samples_v1.3.0/ngccli/ngccli_cat_linux.zip’ saved [25122952/25122952]

Archive:  /home/demo/Desktop/workspace/cv_samples_v1.3.0/ngccli/ngccli_cat_linux.zip
  inflating: /home/demo/Desktop/workspace/cv_samples_v1.3.0/ngccli/ngc  
 extracting: /home/demo/Desktop/workspace/cv_samples_v1.3.0/ngccli/ngc.md5  


In [17]:
# List models available in the model registry.
!ngc registry model list nvidia/tao/pretrained_detectnet_v2:*

+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU   | Memor | File  | Statu | Creat |
| on    | acy   | s     | Size  | Model | y Foo | Size  | s     | ed    |
|       |       |       |       |       | tprin |       |       | Date  |
|       |       |       |       |       | t     |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| vgg19 | 82.6  | 80    | 1     | V100  | 153.8 | 153.7 | UPLOA | Aug   |
|       |       |       |       |       |       | 7 MB  | D_COM | 24,   |
|       |       |       |       |       |       |       | PLETE | 2021  |
| vgg16 | 82.2  | 80    | 1     | V100  | 113.2 | 113.2 | UPLOA | Aug   |
|       |       |       |       |       |       | MB    | D_COM | 24,   |
|       |       |       |       |       |       |       | PLETE | 2021  |
| squee | 65.67 | 80    | 1     | V100  | 6.5   | 6.46  | UPLOA | Aug   |
| zenet |       |       |

In [18]:
# Create the target destination to download the model.
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/

In [19]:
# Download the pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet18 \
    --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet18

Downloaded 82.28 MB in 4m 48s, Download speed: 292.24 KB/s               
----------------------------------------------------
Transfer id: pretrained_detectnet_v2_vresnet18 Download status: Completed.
Downloaded local path: /home/demo/Desktop/workspace/cv_samples_v1.3.0/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18
Total files downloaded: 1 
Total downloaded size: 82.28 MB
Started at: 2022-01-04 14:15:50.503494
Completed at: 2022-01-04 14:20:38.803995
Duration taken: 4m 48s
----------------------------------------------------


In [20]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/pretrained_detectnet_v2_vresnet18

total 91160
-rw------- 1 demo demo 93345248 Jan  4 14:20 resnet18.hdf5


## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Tfrecords for the train datasets
    * To use the newly generated tfrecords, update the dataset_config parameter in the spec file at `$SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt` 
    * Update the fold number to use for evaluation. In case of random data split, please use fold `0` only
    * For sequence-wise split, you may use any fold generated from the dataset convert tool
* Pre-trained models
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate, etc.

In [21]:
!cat $LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/kitti_data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/kitti_data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
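
The `target_class_mapping` entries in the spec above fold the KITTI object types into three training classes. The sketch below reproduces that mapping as a plain dictionary so you can see which label types end up in which class. It assumes that type names are compared in lowercase and that types without a mapping entry are not used for training; both are assumptions of this illustration rather than statements about TAO internals. The names `map_class` and `count_target_classes` are hypothetical.

```python
from collections import Counter

# Copied from the target_class_mapping entries in the spec above.
TARGET_CLASS_MAPPING = {
    "car": "car",
    "cyclist": "cyclist",
    "pedestrian": "pedestrian",
    "person_sitting": "pedestrian",
    "van": "car",
}


def map_class(kitti_type, mapping=TARGET_CLASS_MAPPING):
    """Return the training class for a KITTI object type, or None if unmapped."""
    return mapping.get(kitti_type.lower())


def count_target_classes(kitti_types, mapping=TARGET_CLASS_MAPPING):
    """Count objects per training class, skipping unmapped types."""
    mapped = (map_class(kitti_type, mapping) for kitti_type in kitti_types)
    return Counter(name for name in mapped if name is not None)


# The sample label file shown earlier contains 11 "Car" objects and 1 "Van".
print(count_target_classes(["Car"] * 11 + ["Van"]))
# prints Counter({'car': 12})
```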
  

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models

*Note: The training may take hours to complete. Also, the rest of the notebook assumes that the training was done in single-GPU mode. When run in multi-GPU mode, please expect to update the pruning and inference steps with new pruning thresholds and updated parameters in the clusterfile.json accordingly for optimum performance.*

*Detectnet_v2 now supports restart from checkpoint. In case the training job is killed prematurely, you may resume training from the closest checkpoint by simply re-running the **same** command line. Please do make sure to use the <u>**same number of GPUs**</u> when restarting the training.*

*When running the training with `NUM_GPUS` > 1, you may need to modify the `batch_size_per_gpu` and `learning_rate` to get a similar mAP to a 1-GPU training run. In most cases, scaling down the batch size by a factor of `NUM_GPUS` or scaling up the learning rate by a factor of `NUM_GPUS` would be a good place to start.*
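
The two starting points suggested in the note above can be written down as a small sketch. The function `scale_for_multi_gpu` and its `strategy` argument are hypothetical, and the learning rate in the example is a placeholder rather than a value from the spec file; the batch size of 4 per GPU is the one reported in the training log of this notebook.

```python
def scale_for_multi_gpu(batch_size_per_gpu, learning_rate, num_gpus, strategy="batch_size"):
    """Return (batch_size_per_gpu, learning_rate) adjusted for num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if strategy == "batch_size":
        # Keep the effective (global) batch size close to the 1-GPU run.
        return max(1, batch_size_per_gpu // num_gpus), learning_rate
    if strategy == "learning_rate":
        # Keep the per-GPU batch size and scale the learning rate instead.
        return batch_size_per_gpu, learning_rate * num_gpus
    raise ValueError("unknown strategy: {}".format(strategy))


# Placeholder learning rate, for illustration only.
print(scale_for_multi_gpu(4, 5e-4, num_gpus=2))
print(scale_for_multi_gpu(4, 5e-4, num_gpus=2, strategy="learning_rate"))
```

Either result is only a starting point; the values would still need to be written into the training spec file by hand and validated against the mAP of the 1-GPU run.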

In [None]:
!tao detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS

2022-01-04 14:21:04,389 [INFO] root: Registry: ['nvcr.io']
2022-01-04 14:21:04,432 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/demo/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.

2022-01-04 05:21:09,362 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt.
2022-01-04 05:21:09,363 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt
2022-01-04 05:21:09,758 [INFO] __main__: Cannot iterate over exactly 6434 samp

2022-01-04 05:21:16,903 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 3, 384, 1248) 0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 192, 624) 9472        input_1[0][0]                    
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 192, 624) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 192, 624) 0           bn_conv1[0][0]         

2022-01-04 05:21:16,932 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-01-04 05:21:16,932 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-01-04 05:21:16,932 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-01-04 05:21:16,932 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: 4
2022-01-04 05:21:16,932 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 6434, number of sources: 1, batch size per gpu: 4, steps: 1609


2022-01-04 05:21:17,016 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2022-01-04 05:21:17,229 [INFO] modulus.blocks.data_loaders.

2022-01-04 05:21:19,183 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-01-04 05:21:19,183 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-01-04 05:21:19,183 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-01-04 05:21:19,183 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: 4
2022-01-04 05:21:19,183 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 1047, number of sources: 1, batch size per gpu: 4, steps: 262
2022-01-04 05:21:19,205 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2022-01-04 05:21:19,391 [INFO] modulus.blocks.data_loaders.m

INFO:tensorflow:Graph was finalized.
2022-01-04 05:21:21,580 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2022-01-04 05:21:22,921 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2022-01-04 05:21:23,388 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2022-01-04 05:21:28,054 [INFO] tensorflow: Saving checkpoints for step-0.
INFO:tensorflow:epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.12858038, step = 0
2022-01-04 05:21:43,417 [INFO] tensorflow: epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.12858038, step = 0
2022-01-04 05:21:43,419 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 0/120: loss: 0.12858 learning rate: 0.00000 Time taken: 0:00:00 ETA: 0:00:00
2022-01-04 05:21:43,419 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.902
2022-01-04 05:21:47,112 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 12.304
I

INFO:tensorflow:epoch = 0.4934742075823493, learning_rate = 6.042486e-06, loss = 0.0032675501, step = 794 (5.167 sec)
2022-01-04 05:23:06,018 [INFO] tensorflow: epoch = 0.4934742075823493, learning_rate = 6.042486e-06, loss = 0.0032675501, step = 794 (5.167 sec)
2022-01-04 05:23:06,536 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.084
INFO:tensorflow:global_step/sec: 9.59859
2022-01-04 05:23:06,640 [INFO] tensorflow: global_step/sec: 9.59859
2022-01-04 05:23:09,168 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.998
INFO:tensorflow:epoch = 0.5239279055313859, learning_rate = 6.1135206e-06, loss = 0.0037687898, step = 843 (5.150 sec)
2022-01-04 05:23:11,168 [INFO] tensorflow: epoch = 0.5239279055313859, learning_rate = 6.1135206e-06, loss = 0.0037687898, step = 843 (5.150 sec)
2022-01-04 05:23:11,774 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.379
2022-01-04 05:23:14,415 [INFO] modulus.hooks.sample_counter_hook: Train Samples /

2022-01-04 05:24:34,378 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.389
INFO:tensorflow:global_step/sec: 8.6876
2022-01-04 05:24:34,487 [INFO] tensorflow: global_step/sec: 8.6876
68fcce82bbe0:53:95 [0] NCCL INFO Bootstrap : Using lo:127.0.0.1<0>
68fcce82bbe0:53:95 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
68fcce82bbe0:53:95 [0] NCCL INFO NET/IB : No device found.
68fcce82bbe0:53:95 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
68fcce82bbe0:53:95 [0] NCCL INFO Using network Socket
NCCL version 2.9.9+cuda11.3
68fcce82bbe0:53:95 [0] NCCL INFO Channel 00/32 :    0
68fcce82bbe0:53:95 [0] NCCL INFO Channel 01/32 :    0
68fcce82bbe0:53:95 [0] NCCL INFO Channel 02/32 :    0
68fcce82bbe0:53:95 [0] NCCL INFO Channel 03/32 :    0
68fcce82bbe0:53:95 [0] NCCL INFO Channel 04/32 :    0
68fcce82bbe0:53:95 [0] NCCL INFO Channel 05/32 :    0
68fcce82bbe0:53:95 [0] NCCL INFO Channel 06/32 :    0

2022-01-04 05:25:27,334 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.406
2022-01-04 05:25:29,885 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.205
INFO:tensorflow:epoch = 1.321939092604102, learning_rate = 8.3041095e-06, loss = 0.00078938063, step = 2127 (5.130 sec)
2022-01-04 05:25:30,186 [INFO] tensorflow: epoch = 1.321939092604102, learning_rate = 8.3041095e-06, loss = 0.00078938063, step = 2127 (5.130 sec)
2022-01-04 05:25:32,405 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.673
2022-01-04 05:25:34,950 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.296
INFO:tensorflow:epoch = 1.353635798632691, learning_rate = 8.405739e-06, loss = 0.0006961665, step = 2178 (5.167 sec)
2022-01-04 05:25:35,353 [INFO] tensorflow: epoch = 1.353635798632691, learning_rate = 8.405739e-06, loss = 0.0006961665, step = 2178 (5.167 sec)
2022-01-04 05:25:37,472 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.668


2022-01-04 05:26:54,260 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.526
2022-01-04 05:26:56,830 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.922
INFO:tensorflow:epoch = 1.8545680546923553, learning_rate = 1.01874275e-05, loss = 0.0008503259, step = 2984 (5.115 sec)
2022-01-04 05:26:57,840 [INFO] tensorflow: epoch = 1.8545680546923553, learning_rate = 1.01874275e-05, loss = 0.0008503259, step = 2984 (5.115 sec)
2022-01-04 05:26:59,354 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.616
2022-01-04 05:27:01,891 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.415
INFO:tensorflow:epoch = 1.8862647607209446, learning_rate = 1.03121065e-05, loss = 0.00072913466, step = 3035 (5.171 sec)
2022-01-04 05:27:03,011 [INFO] tensorflow: epoch = 1.8862647607209446, learning_rate = 1.03121065e-05, loss = 0.00072913466, step = 3035 (5.171 sec)
INFO:tensorflow:global_step/sec: 9.6179
2022-01-04 05:27:03,523 [INFO] tensorflow: g

2022-01-04 05:28:22,372 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.168
2022-01-04 05:28:24,998 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.085
INFO:tensorflow:epoch = 2.3797389683032937, learning_rate = 1.2462153e-05, loss = 0.0007407041, step = 3829 (5.178 sec)
2022-01-04 05:28:25,513 [INFO] tensorflow: epoch = 2.3797389683032937, learning_rate = 1.2462153e-05, loss = 0.0007407041, step = 3829 (5.178 sec)
INFO:tensorflow:global_step/sec: 9.52619
2022-01-04 05:28:26,653 [INFO] tensorflow: global_step/sec: 9.52619
2022-01-04 05:28:27,591 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.559
2022-01-04 05:28:30,170 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.779
INFO:tensorflow:epoch = 2.410814170292107, learning_rate = 1.2611662e-05, loss = 0.0008344226, step = 3879 (5.163 sec)
2022-01-04 05:28:30,676 [INFO] tensorflow: epoch = 2.410814170292107, learning_rate = 1.2611662e-05, loss = 0.0008344226, step = 

INFO:tensorflow:global_step/sec: 9.71656
2022-01-04 05:29:49,248 [INFO] tensorflow: global_step/sec: 9.71656
2022-01-04 05:29:50,170 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.281
2022-01-04 05:29:52,730 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.058
INFO:tensorflow:epoch = 2.907395898073337, learning_rate = 1.5259342e-05, loss = 0.0011152072, step = 4678 (5.175 sec)
2022-01-04 05:29:53,137 [INFO] tensorflow: epoch = 2.907395898073337, learning_rate = 1.5259342e-05, loss = 0.0011152072, step = 4678 (5.175 sec)
2022-01-04 05:29:55,260 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.528
2022-01-04 05:29:57,851 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.603
INFO:tensorflow:epoch = 2.9384711000621504, learning_rate = 1.5442409e-05, loss = 0.00067722215, step = 4728 (5.148 sec)
2022-01-04 05:29:58,284 [INFO] tensorflow: epoch = 2.9384711000621504, learning_rate = 1.5442409e-05, loss = 0.00067722215, step 

2022-01-04 05:31:15,539 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.618
INFO:tensorflow:epoch = 3.4033561218147916, learning_rate = 1.8458486e-05, loss = 0.0007890933, step = 5476 (5.157 sec)
2022-01-04 05:31:15,740 [INFO] tensorflow: epoch = 3.4033561218147916, learning_rate = 1.8458486e-05, loss = 0.0007890933, step = 5476 (5.157 sec)
2022-01-04 05:31:18,068 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.549
2022-01-04 05:31:20,634 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.972
INFO:tensorflow:epoch = 3.4344313238036044, learning_rate = 1.8679919e-05, loss = 0.000799973, step = 5526 (5.117 sec)
2022-01-04 05:31:20,858 [INFO] tensorflow: epoch = 3.4344313238036044, learning_rate = 1.8679919e-05, loss = 0.000799973, step = 5526 (5.117 sec)
2022-01-04 05:31:23,342 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.929
INFO:tensorflow:epoch = 3.464263517712865, learning_rate = 1.889501e-05, loss = 0.0006905318

2022-01-04 05:32:42,891 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.541
INFO:tensorflow:epoch = 3.9322560596643874, learning_rate = 2.2612361e-05, loss = 0.000859872, step = 6327 (5.143 sec)
2022-01-04 05:32:43,192 [INFO] tensorflow: epoch = 3.9322560596643874, learning_rate = 2.2612361e-05, loss = 0.000859872, step = 6327 (5.143 sec)
2022-01-04 05:32:45,466 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.844
2022-01-04 05:32:48,148 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.280
INFO:tensorflow:epoch = 3.9627097576134243, learning_rate = 2.2878188e-05, loss = 0.0007567009, step = 6376 (5.164 sec)
2022-01-04 05:32:48,356 [INFO] tensorflow: epoch = 3.9627097576134243, learning_rate = 2.2878188e-05, loss = 0.0007567009, step = 6376 (5.164 sec)
2022-01-04 05:32:50,690 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.345
INFO:tensorflow:global_step/sec: 9.75514
2022-01-04 05:32:50,798 [INFO] tensorflow: global_s

2022-01-04 05:34:05,737 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.050
2022-01-04 05:34:08,336 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.482
INFO:tensorflow:epoch = 4.457426973275326, learning_rate = 2.766142e-05, loss = 0.0007254675, step = 7172 (5.133 sec)
2022-01-04 05:34:10,659 [INFO] tensorflow: epoch = 4.457426973275326, learning_rate = 2.766142e-05, loss = 0.0007254675, step = 7172 (5.133 sec)
2022-01-04 05:34:10,865 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.551
2022-01-04 05:34:13,416 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.197
INFO:tensorflow:global_step/sec: 9.70953
2022-01-04 05:34:13,518 [INFO] tensorflow: global_step/sec: 9.70953
INFO:tensorflow:epoch = 4.488502175264139, learning_rate = 2.7993276e-05, loss = 0.0007881722, step = 7222 (5.108 sec)
2022-01-04 05:34:15,767 [INFO] tensorflow: epoch = 4.488502175264139, learning_rate = 2.7993276e-05, loss = 0.0007881722, step = 7222

2022-01-04 05:35:33,711 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.750
2022-01-04 05:35:36,252 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.348
INFO:tensorflow:global_step/sec: 9.91086
2022-01-04 05:35:36,353 [INFO] tensorflow: global_step/sec: 9.91086
INFO:tensorflow:epoch = 4.983840894965817, learning_rate = 3.385401e-05, loss = 0.00067820447, step = 8019 (5.200 sec)
2022-01-04 05:35:38,307 [INFO] tensorflow: epoch = 4.983840894965817, learning_rate = 3.385401e-05, loss = 0.00067820447, step = 8019 (5.200 sec)
2022-01-04 05:35:38,809 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.112
2022-01-04 05:35:41,019 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 5/120: loss: 0.00063 learning rate: 0.00003 Time taken: 0:02:46.445410 ETA: 5:19:01.222097
2022-01-04 05:35:41,446 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.933
INFO:tensorflow:epoch = 5.014294592914854, learning_rate = 3.425199e-

INFO:tensorflow:epoch = 5.474829086389061, learning_rate = 4.087344e-05, loss = 0.0007826708, step = 8809 (5.196 sec)
2022-01-04 05:37:00,760 [INFO] tensorflow: epoch = 5.474829086389061, learning_rate = 4.087344e-05, loss = 0.0007826708, step = 8809 (5.196 sec)
2022-01-04 05:37:02,301 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.034
2022-01-04 05:37:04,862 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.051
INFO:tensorflow:epoch = 5.505904288377874, learning_rate = 4.1363845e-05, loss = 0.00087744533, step = 8859 (5.126 sec)
2022-01-04 05:37:05,885 [INFO] tensorflow: epoch = 5.505904288377874, learning_rate = 4.1363845e-05, loss = 0.00087744533, step = 8859 (5.126 sec)
2022-01-04 05:37:07,428 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.972
2022-01-04 05:37:10,030 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.439
INFO:tensorflow:epoch = 5.536979490366687, learning_rate = 4.1860047e-05, loss = 0.00067466544

2022-01-04 05:38:28,668 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.255
2022-01-04 05:38:29,198 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 6/120: loss: 0.00059 learning rate: 0.00005 Time taken: 0:02:48.182508 ETA: 5:19:32.805938
2022-01-04 05:38:31,378 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.898
INFO:tensorflow:epoch = 6.025481665630826, learning_rate = 5.0491315e-05, loss = 0.00073609577, step = 9695 (5.126 sec)
2022-01-04 05:38:33,591 [INFO] tensorflow: epoch = 6.025481665630826, learning_rate = 5.0491315e-05, loss = 0.00073609577, step = 9695 (5.126 sec)
2022-01-04 05:38:34,010 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.992
2022-01-04 05:38:36,621 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.311
INFO:tensorflow:epoch = 6.055935363579863, learning_rate = 5.1084884e-05, loss = 0.0006182772, step = 9744 (5.103 sec)
2022-01-04 05:38:38,694 [INFO] tensorflow: epoch = 6.0559

2022-01-04 05:39:57,155 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.399
2022-01-04 05:39:59,713 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.104
INFO:tensorflow:epoch = 6.548788067122436, learning_rate = 6.1721235e-05, loss = 0.00066062564, step = 10537 (5.161 sec)
2022-01-04 05:40:01,094 [INFO] tensorflow: epoch = 6.548788067122436, learning_rate = 6.1721235e-05, loss = 0.00066062564, step = 10537 (5.161 sec)
2022-01-04 05:40:02,329 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.232
INFO:tensorflow:global_step/sec: 9.77765
2022-01-04 05:40:03,435 [INFO] tensorflow: global_step/sec: 9.77765
2022-01-04 05:40:04,850 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.656
INFO:tensorflow:epoch = 6.579863269111248, learning_rate = 6.246171e-05, loss = 0.0006067768, step = 10587 (5.125 sec)
2022-01-04 05:40:06,219 [INFO] tensorflow: epoch = 6.579863269111248, learning_rate = 6.246171e-05, loss = 0.0006067768, step =

2022-01-04 05:41:24,953 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.770
INFO:tensorflow:global_step/sec: 9.91565
2022-01-04 05:41:26,064 [INFO] tensorflow: global_step/sec: 9.91565
2022-01-04 05:41:27,482 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.549
INFO:tensorflow:epoch = 7.075823492852703, learning_rate = 7.555684e-05, loss = 0.0005386932, step = 11385 (5.149 sec)
2022-01-04 05:41:28,594 [INFO] tensorflow: epoch = 7.075823492852703, learning_rate = 7.555684e-05, loss = 0.0005386932, step = 11385 (5.149 sec)
2022-01-04 05:41:30,000 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.718
2022-01-04 05:41:32,520 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.680
INFO:tensorflow:epoch = 7.107520198881292, learning_rate = 7.6481534e-05, loss = 0.0005027961, step = 11436 (5.131 sec)
2022-01-04 05:41:33,725 [INFO] tensorflow: epoch = 7.107520198881292, learning_rate = 7.6481534e-05, loss = 0.0005027961, step = 1

2022-01-04 05:42:53,240 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.086
2022-01-04 05:42:55,833 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.578
INFO:tensorflow:epoch = 7.6003729024238655, learning_rate = 9.24057e-05, loss = 0.0004600671, step = 12229 (5.227 sec)
2022-01-04 05:42:56,385 [INFO] tensorflow: epoch = 7.6003729024238655, learning_rate = 9.24057e-05, loss = 0.0004600671, step = 12229 (5.227 sec)
2022-01-04 05:42:58,494 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.575
2022-01-04 05:43:01,113 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.197
INFO:tensorflow:epoch = 7.6308266003729015, learning_rate = 9.349201e-05, loss = 0.000585542, step = 12278 (5.179 sec)
2022-01-04 05:43:01,564 [INFO] tensorflow: epoch = 7.6308266003729015, learning_rate = 9.349201e-05, loss = 0.000585542, step = 12278 (5.179 sec)
2022-01-04 05:43:03,858 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.421


2022-01-04 05:44:19,385 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.983
2022-01-04 05:44:21,997 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.291
INFO:tensorflow:epoch = 8.107520198881293, learning_rate = 0.00011225954, loss = 0.00043406873, step = 13045 (5.108 sec)
2022-01-04 05:44:24,292 [INFO] tensorflow: epoch = 8.107520198881293, learning_rate = 0.00011225954, loss = 0.00043406873, step = 13045 (5.108 sec)
2022-01-04 05:44:24,723 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.682
2022-01-04 05:44:27,323 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.468
INFO:tensorflow:epoch = 8.13797389683033, learning_rate = 0.00011357925, loss = 0.00040979683, step = 13094 (5.144 sec)
2022-01-04 05:44:29,436 [INFO] tensorflow: epoch = 8.13797389683033, learning_rate = 0.00011357925, loss = 0.00040979683, step = 13094 (5.144 sec)
2022-01-04 05:44:29,946 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 3

2022-01-04 05:45:47,936 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.733
2022-01-04 05:45:50,674 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.529
INFO:tensorflow:epoch = 8.614667495338718, learning_rate = 0.0001363791, loss = 0.00046664907, step = 13861 (5.126 sec)
2022-01-04 05:45:52,002 [INFO] tensorflow: epoch = 8.614667495338718, learning_rate = 0.0001363791, loss = 0.00046664907, step = 13861 (5.126 sec)
2022-01-04 05:45:53,418 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.448
2022-01-04 05:45:56,151 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.588
INFO:tensorflow:epoch = 8.643878185208203, learning_rate = 0.00013791656, loss = 0.00052180956, step = 13908 (5.164 sec)
2022-01-04 05:45:57,166 [INFO] tensorflow: epoch = 8.643878185208203, learning_rate = 0.00013791656, loss = 0.00052180956, step = 13908 (5.164 sec)
INFO:tensorflow:global_step/sec: 9.20671
2022-01-04 05:45:58,491 [INFO] tensorflow: glob

INFO:tensorflow:epoch = 9.105655686761963, learning_rate = 0.00016465668, loss = 0.00025089132, step = 14651 (5.145 sec)
2022-01-04 05:47:14,440 [INFO] tensorflow: epoch = 9.105655686761963, learning_rate = 0.00016465668, loss = 0.00025089132, step = 14651 (5.145 sec)
2022-01-04 05:47:16,997 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.182
INFO:tensorflow:epoch = 9.135487880671224, learning_rate = 0.00016655264, loss = 0.00028523157, step = 14699 (5.107 sec)
2022-01-04 05:47:19,547 [INFO] tensorflow: epoch = 9.135487880671224, learning_rate = 0.00016655264, loss = 0.00028523157, step = 14699 (5.107 sec)
2022-01-04 05:47:19,547 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.219
INFO:tensorflow:global_step/sec: 9.45209
2022-01-04 05:47:21,741 [INFO] tensorflow: global_step/sec: 9.45209
2022-01-04 05:47:22,167 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.168
INFO:tensorflow:epoch = 9.165320074580483, learning_rate = 0.0001684702

2022-01-04 05:48:44,728 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 34.621
2022-01-04 05:48:47,602 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 34.800
INFO:tensorflow:epoch = 9.6177750155376, learning_rate = 0.00020041592, loss = 0.00026396138, step = 15475 (5.209 sec)
2022-01-04 05:48:47,715 [INFO] tensorflow: epoch = 9.6177750155376, learning_rate = 0.00020041592, loss = 0.00026396138, step = 15475 (5.209 sec)
2022-01-04 05:48:50,524 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 34.221
INFO:tensorflow:epoch = 9.645121193287755, learning_rate = 0.00020253021, loss = 0.0002169277, step = 15519 (5.292 sec)
2022-01-04 05:48:53,007 [INFO] tensorflow: epoch = 9.645121193287755, learning_rate = 0.00020253021, loss = 0.0002169277, step = 15519 (5.292 sec)
INFO:tensorflow:global_step/sec: 8.53029
2022-01-04 05:48:53,126 [INFO] tensorflow: global_step/sec: 8.53029
2022-01-04 05:48:53,597 [INFO] modulus.hooks.sample_counter_hook: Train Sampl

2022-01-04 05:50:10,913 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.544
2022-01-04 05:50:13,466 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.176
INFO:tensorflow:epoch = 10.09633312616532, learning_rate = 0.00024081973, loss = 0.0001796892, step = 16245 (5.204 sec)
2022-01-04 05:50:15,811 [INFO] tensorflow: epoch = 10.09633312616532, learning_rate = 0.00024081973, loss = 0.0001796892, step = 16245 (5.204 sec)
2022-01-04 05:50:16,225 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.246
2022-01-04 05:50:18,775 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.214
INFO:tensorflow:epoch = 10.127408328154132, learning_rate = 0.00024370887, loss = 0.000276882, step = 16295 (5.117 sec)
2022-01-04 05:50:20,928 [INFO] tensorflow: epoch = 10.127408328154132, learning_rate = 0.00024370887, loss = 0.000276882, step = 16295 (5.117 sec)
2022-01-04 05:50:21,338 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.

2022-01-04 05:51:39,778 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.011
2022-01-04 05:51:42,365 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.659
INFO:tensorflow:epoch = 10.6177750155376, learning_rate = 0.00029417037, loss = 0.00018740774, step = 17084 (5.106 sec)
2022-01-04 05:51:43,442 [INFO] tensorflow: epoch = 10.6177750155376, learning_rate = 0.00029417037, loss = 0.00018740774, step = 17084 (5.106 sec)
2022-01-04 05:51:44,955 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.619
INFO:tensorflow:global_step/sec: 9.75046
2022-01-04 05:51:47,122 [INFO] tensorflow: global_step/sec: 9.75046
2022-01-04 05:51:47,530 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.836
INFO:tensorflow:epoch = 10.648850217526412, learning_rate = 0.00029769956, loss = 0.00021276835, step = 17134 (5.122 sec)
2022-01-04 05:51:48,564 [INFO] tensorflow: epoch = 10.648850217526412, learning_rate = 0.00029769956, loss = 0.00021276835, st

2022-01-04 05:53:06,623 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.024
2022-01-04 05:53:09,223 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.471
INFO:tensorflow:epoch = 11.135487880671224, learning_rate = 0.0003588265, loss = 0.00016754713, step = 17917 (5.170 sec)
2022-01-04 05:53:11,078 [INFO] tensorflow: epoch = 11.135487880671224, learning_rate = 0.0003588265, loss = 0.00016754713, step = 17917 (5.170 sec)
INFO:tensorflow:global_step/sec: 9.68384
2022-01-04 05:53:11,405 [INFO] tensorflow: global_step/sec: 9.68384
2022-01-04 05:53:11,841 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.198
2022-01-04 05:53:14,611 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.099
INFO:tensorflow:epoch = 11.165320074580483, learning_rate = 0.00036295826, loss = 0.00021449826, step = 17965 (5.187 sec)
2022-01-04 05:53:16,265 [INFO] tensorflow: epoch = 11.165320074580483, learning_rate = 0.00036295826, loss = 0.00021449826, 

2022-01-04 05:54:38,615 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 34.768
INFO:tensorflow:epoch = 11.623368551895586, learning_rate = 0.0004327112, loss = 0.0003142274, step = 18702 (5.233 sec)
2022-01-04 05:54:38,963 [INFO] tensorflow: epoch = 11.623368551895586, learning_rate = 0.0004327112, loss = 0.0003142274, step = 18702 (5.233 sec)
INFO:tensorflow:global_step/sec: 8.81679
2022-01-04 05:54:41,030 [INFO] tensorflow: global_step/sec: 8.81679
2022-01-04 05:54:41,491 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 34.772
INFO:tensorflow:epoch = 11.651336233685518, learning_rate = 0.00043738037, loss = 0.00011409603, step = 18747 (5.135 sec)
2022-01-04 05:54:44,098 [INFO] tensorflow: epoch = 11.651336233685518, learning_rate = 0.00043738037, loss = 0.00011409603, step = 18747 (5.135 sec)
2022-01-04 05:54:44,324 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 35.297
2022-01-04 05:54:47,178 [INFO] modulus.hooks.sample_counter_hook: Train

2022-01-04 05:56:04,573 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.548
INFO:tensorflow:epoch = 12.116221255438159, learning_rate = 0.00049999997, loss = 0.0001881721, step = 19495 (5.185 sec)
2022-01-04 05:56:06,875 [INFO] tensorflow: epoch = 12.116221255438159, learning_rate = 0.00049999997, loss = 0.0001881721, step = 19495 (5.185 sec)
2022-01-04 05:56:07,311 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.525
INFO:tensorflow:global_step/sec: 9.22279
2022-01-04 05:56:09,612 [INFO] tensorflow: global_step/sec: 9.22279
2022-01-04 05:56:10,024 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.867
INFO:tensorflow:epoch = 12.14605344934742, learning_rate = 0.00049999997, loss = 0.0002247665, step = 19543 (5.184 sec)
2022-01-04 05:56:12,059 [INFO] tensorflow: epoch = 12.14605344934742, learning_rate = 0.00049999997, loss = 0.0002247665, step = 19543 (5.184 sec)
2022-01-04 05:56:12,719 [INFO] modulus.hooks.sample_counter_hook: Train S

INFO:tensorflow:epoch = 12.602237414543193, learning_rate = 0.00049999997, loss = 0.0002071544, step = 20277 (5.174 sec)
2022-01-04 05:57:34,870 [INFO] tensorflow: epoch = 12.602237414543193, learning_rate = 0.00049999997, loss = 0.0002071544, step = 20277 (5.174 sec)
2022-01-04 05:57:37,420 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 34.717
INFO:tensorflow:global_step/sec: 8.64924
2022-01-04 05:57:39,799 [INFO] tensorflow: global_step/sec: 8.64924
INFO:tensorflow:epoch = 12.630205096333125, learning_rate = 0.00049999997, loss = 0.0001324481, step = 20322 (5.154 sec)
2022-01-04 05:57:40,024 [INFO] tensorflow: epoch = 12.630205096333125, learning_rate = 0.00049999997, loss = 0.0001324481, step = 20322 (5.154 sec)
2022-01-04 05:57:40,244 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 35.413
2022-01-04 05:57:43,043 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 35.735
INFO:tensorflow:epoch = 12.658794282162834, learning_rate = 0.000499999

INFO:tensorflow:epoch = 13.089496581727781, learning_rate = 0.00049999997, loss = 0.00016084677, step = 21061 (5.217 sec)
2022-01-04 05:59:02,733 [INFO] tensorflow: epoch = 13.089496581727781, learning_rate = 0.00049999997, loss = 0.00016084677, step = 21061 (5.217 sec)
2022-01-04 05:59:04,122 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.598
2022-01-04 05:59:06,783 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.579
INFO:tensorflow:epoch = 13.11932877563704, learning_rate = 0.00049999997, loss = 0.00012582232, step = 21109 (5.164 sec)
2022-01-04 05:59:07,897 [INFO] tensorflow: epoch = 13.11932877563704, learning_rate = 0.00049999997, loss = 0.00012582232, step = 21109 (5.164 sec)
INFO:tensorflow:global_step/sec: 9.3893
2022-01-04 05:59:09,072 [INFO] tensorflow: global_step/sec: 9.3893
2022-01-04 05:59:09,502 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.783
2022-01-04 05:59:12,165 [INFO] modulus.hooks.sample_counter_hook: Train

INFO:tensorflow:epoch = 13.606587942821628, learning_rate = 0.00049999997, loss = 0.00011517644, step = 21893 (5.206 sec)
2022-01-04 06:00:30,261 [INFO] tensorflow: epoch = 13.606587942821628, learning_rate = 0.00049999997, loss = 0.00011517644, step = 21893 (5.206 sec)
2022-01-04 06:00:30,898 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.154
INFO:tensorflow:global_step/sec: 9.74419
2022-01-04 06:00:33,020 [INFO] tensorflow: global_step/sec: 9.74419
2022-01-04 06:00:33,425 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.587
INFO:tensorflow:epoch = 13.63766314481044, learning_rate = 0.00049999997, loss = 0.0002052053, step = 21943 (5.209 sec)
2022-01-04 06:00:35,470 [INFO] tensorflow: epoch = 13.63766314481044, learning_rate = 0.00049999997, loss = 0.0002052053, step = 21943 (5.209 sec)
2022-01-04 06:00:36,093 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.477
2022-01-04 06:00:38,647 [INFO] modulus.hooks.sample_counter_hook: Train

2022-01-04 06:01:53,972 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.140
INFO:tensorflow:global_step/sec: 9.50323
2022-01-04 06:01:56,208 [INFO] tensorflow: global_step/sec: 9.50323
2022-01-04 06:01:56,608 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.932
INFO:tensorflow:epoch = 14.131758856432565, learning_rate = 0.00049999997, loss = 0.00022249852, step = 22738 (5.188 sec)
2022-01-04 06:01:58,006 [INFO] tensorflow: epoch = 14.131758856432565, learning_rate = 0.00049999997, loss = 0.00022249852, step = 22738 (5.188 sec)
2022-01-04 06:01:59,107 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.015
2022-01-04 06:02:01,634 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.587
INFO:tensorflow:epoch = 14.163455562461156, learning_rate = 0.00049999997, loss = 0.00023929552, step = 22789 (5.127 sec)
2022-01-04 06:02:03,134 [INFO] tensorflow: epoch = 14.163455562461156, learning_rate = 0.00049999997, loss = 0.00023929552

2022-01-04 06:03:21,643 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.004
2022-01-04 06:03:24,243 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.468
INFO:tensorflow:epoch = 14.65941578620261, learning_rate = 0.00049999997, loss = 0.0002516677, step = 23587 (5.102 sec)
2022-01-04 06:03:25,592 [INFO] tensorflow: epoch = 14.65941578620261, learning_rate = 0.00049999997, loss = 0.0002516677, step = 23587 (5.102 sec)
2022-01-04 06:03:26,833 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.611
2022-01-04 06:03:29,423 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.618
INFO:tensorflow:epoch = 14.690490988191423, learning_rate = 0.00049999997, loss = 0.00012235447, step = 23637 (5.184 sec)
2022-01-04 06:03:30,776 [INFO] tensorflow: epoch = 14.690490988191423, learning_rate = 0.00049999997, loss = 0.00012235447, step = 23637 (5.184 sec)
2022-01-04 06:03:32,063 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec:

INFO:tensorflow:epoch = 15.15102548166563, learning_rate = 0.00049999997, loss = 8.745784e-05, step = 24378 (5.155 sec)
2022-01-04 06:04:48,024 [INFO] tensorflow: epoch = 15.15102548166563, learning_rate = 0.00049999997, loss = 8.745784e-05, step = 24378 (5.155 sec)
2022-01-04 06:04:50,222 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.151
2022-01-04 06:04:52,776 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.167
INFO:tensorflow:epoch = 15.182100683654443, learning_rate = 0.00049999997, loss = 0.000115186755, step = 24428 (5.157 sec)
2022-01-04 06:04:53,180 [INFO] tensorflow: epoch = 15.182100683654443, learning_rate = 0.00049999997, loss = 0.000115186755, step = 24428 (5.157 sec)
2022-01-04 06:04:55,353 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.806
2022-01-04 06:04:57,894 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.356
INFO:tensorflow:epoch = 15.213175885643256, learning_rate = 0.00049999997, loss = 9.

2022-01-04 06:06:14,274 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.567
INFO:tensorflow:epoch = 15.686140459912988, learning_rate = 0.00049999997, loss = 9.3368086e-05, step = 25239 (5.169 sec)
2022-01-04 06:06:15,809 [INFO] tensorflow: epoch = 15.686140459912988, learning_rate = 0.00049999997, loss = 9.3368086e-05, step = 25239 (5.169 sec)
2022-01-04 06:06:16,810 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.444
2022-01-04 06:06:19,324 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.780
INFO:tensorflow:global_step/sec: 9.86074
2022-01-04 06:06:19,924 [INFO] tensorflow: global_step/sec: 9.86074
INFO:tensorflow:epoch = 15.717837165941578, learning_rate = 0.00049999997, loss = 0.00012260432, step = 25290 (5.113 sec)
2022-01-04 06:06:20,923 [INFO] tensorflow: epoch = 15.717837165941578, learning_rate = 0.00049999997, loss = 0.00012260432, step = 25290 (5.113 sec)
2022-01-04 06:06:21,856 [INFO] modulus.hooks.sample_counter_hook: T

2022-01-04 06:07:39,048 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.865
2022-01-04 06:07:41,744 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.093
INFO:tensorflow:global_step/sec: 9.20383
2022-01-04 06:07:42,393 [INFO] tensorflow: global_step/sec: 9.20383
INFO:tensorflow:epoch = 16.214418893722808, learning_rate = 0.00049999997, loss = 0.00021401352, step = 26089 (5.184 sec)
2022-01-04 06:07:43,367 [INFO] tensorflow: epoch = 16.214418893722808, learning_rate = 0.00049999997, loss = 0.00021401352, step = 26089 (5.184 sec)
2022-01-04 06:07:44,460 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.828
2022-01-04 06:07:47,194 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.574
INFO:tensorflow:epoch = 16.24362958359229, learning_rate = 0.00049999997, loss = 0.00020346511, step = 26136 (5.162 sec)
2022-01-04 06:07:48,529 [INFO] tensorflow: epoch = 16.24362958359229, learning_rate = 0.00049999997, loss = 0.00020346511, 

2022-01-04 06:09:06,679 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.228
2022-01-04 06:09:09,195 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.737
INFO:tensorflow:epoch = 16.743318831572402, learning_rate = 0.00049999997, loss = 0.00011751326, step = 26940 (5.117 sec)
2022-01-04 06:09:10,796 [INFO] tensorflow: epoch = 16.743318831572402, learning_rate = 0.00049999997, loss = 0.00011751326, step = 26940 (5.117 sec)
2022-01-04 06:09:11,697 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.984
2022-01-04 06:09:14,197 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.993
INFO:tensorflow:epoch = 16.77563704164077, learning_rate = 0.00049999997, loss = 0.00024427837, step = 26992 (5.199 sec)
2022-01-04 06:09:15,995 [INFO] tensorflow: epoch = 16.77563704164077, learning_rate = 0.00049999997, loss = 0.00024427837, step = 26992 (5.199 sec)
2022-01-04 06:09:16,694 [INFO] modulus.hooks.sample_counter_hook: Train Samples / se

INFO:tensorflow:epoch = 17.24238657551274, learning_rate = 0.00049999997, loss = 0.00017091271, step = 27743 (5.119 sec)
2022-01-04 06:10:33,304 [INFO] tensorflow: epoch = 17.24238657551274, learning_rate = 0.00049999997, loss = 0.00017091271, step = 27743 (5.119 sec)
2022-01-04 06:10:33,903 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.828
2022-01-04 06:10:36,406 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.960
INFO:tensorflow:epoch = 17.274704785581104, learning_rate = 0.00049999997, loss = 0.00012090224, step = 27795 (5.197 sec)
2022-01-04 06:10:38,501 [INFO] tensorflow: epoch = 17.274704785581104, learning_rate = 0.00049999997, loss = 0.00012090224, step = 27795 (5.197 sec)
2022-01-04 06:10:38,900 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.085
2022-01-04 06:10:41,395 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.094
INFO:tensorflow:global_step/sec: 9.98659
2022-01-04 06:10:43,002 [INFO] tensorflow: 

2022-01-04 06:11:59,257 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.275
INFO:tensorflow:epoch = 17.7669359850839, learning_rate = 0.00049999997, loss = 6.318958e-05, step = 28587 (5.125 sec)
2022-01-04 06:12:00,617 [INFO] tensorflow: epoch = 17.7669359850839, learning_rate = 0.00049999997, loss = 6.318958e-05, step = 28587 (5.125 sec)
2022-01-04 06:12:01,870 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.273
2022-01-04 06:12:04,458 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.642
INFO:tensorflow:epoch = 17.79738968303294, learning_rate = 0.00049999997, loss = 9.403723e-05, step = 28636 (5.120 sec)
2022-01-04 06:12:05,736 [INFO] tensorflow: epoch = 17.79738968303294, learning_rate = 0.00049999997, loss = 9.403723e-05, step = 28636 (5.120 sec)
INFO:tensorflow:global_step/sec: 9.52847
2022-01-04 06:12:06,168 [INFO] tensorflow: global_step/sec: 9.52847
2022-01-04 06:12:07,106 [INFO] modulus.hooks.sample_counter_hook: Train Sampl

2022-01-04 06:13:23,791 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.815
2022-01-04 06:13:26,408 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.207
INFO:tensorflow:global_step/sec: 9.68656
2022-01-04 06:13:28,134 [INFO] tensorflow: global_step/sec: 9.68656
INFO:tensorflow:epoch = 18.29770043505283, learning_rate = 0.00049999997, loss = 0.00011874712, step = 29441 (5.146 sec)
2022-01-04 06:13:28,236 [INFO] tensorflow: epoch = 18.29770043505283, learning_rate = 0.00049999997, loss = 0.00011874712, step = 29441 (5.146 sec)
2022-01-04 06:13:29,094 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.233
2022-01-04 06:13:31,767 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.413
INFO:tensorflow:epoch = 18.327532628962086, learning_rate = 0.00049999997, loss = 7.80116e-05, step = 29489 (5.126 sec)
2022-01-04 06:13:33,362 [INFO] tensorflow: epoch = 18.327532628962086, learning_rate = 0.00049999997, loss = 7.80116e-05, step

INFO:tensorflow:global_step/sec: 9.62259
2022-01-04 06:14:51,359 [INFO] tensorflow: global_step/sec: 9.62259
2022-01-04 06:14:52,271 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.828
2022-01-04 06:14:54,790 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.707
INFO:tensorflow:epoch = 18.822871348663764, learning_rate = 0.00049999997, loss = 9.304269e-05, step = 30286 (5.137 sec)
2022-01-04 06:14:55,997 [INFO] tensorflow: epoch = 18.822871348663764, learning_rate = 0.00049999997, loss = 9.304269e-05, step = 30286 (5.137 sec)
2022-01-04 06:14:57,309 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.702
2022-01-04 06:14:59,803 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.096
INFO:tensorflow:epoch = 18.854568054692354, learning_rate = 0.00049999997, loss = 9.116044e-05, step = 30337 (5.118 sec)
2022-01-04 06:15:01,115 [INFO] tensorflow: epoch = 18.854568054692354, learning_rate = 0.00049999997, loss = 9.116044e-05, st

2022-01-04 06:16:16,923 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.040
INFO:tensorflow:epoch = 19.3219390926041, learning_rate = 0.00049999997, loss = 0.00013397774, step = 31089 (5.198 sec)
2022-01-04 06:16:18,424 [INFO] tensorflow: epoch = 19.3219390926041, learning_rate = 0.00049999997, loss = 0.00013397774, step = 31089 (5.198 sec)
2022-01-04 06:16:19,422 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.021
2022-01-04 06:16:21,916 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.091
INFO:tensorflow:epoch = 19.354257302672465, learning_rate = 0.00049999997, loss = 7.273596e-05, step = 31141 (5.190 sec)
2022-01-04 06:16:23,613 [INFO] tensorflow: epoch = 19.354257302672465, learning_rate = 0.00049999997, loss = 7.273596e-05, step = 31141 (5.190 sec)
2022-01-04 06:16:24,467 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.210
2022-01-04 06:16:27,050 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 3

2022-01-04 06:17:41,026 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.822
2022-01-04 06:17:43,718 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.148
INFO:tensorflow:epoch = 19.854568054692354, learning_rate = 0.00049999997, loss = 0.00014720336, step = 31946 (5.185 sec)
2022-01-04 06:17:45,997 [INFO] tensorflow: epoch = 19.854568054692354, learning_rate = 0.00049999997, loss = 0.00014720336, step = 31946 (5.185 sec)
2022-01-04 06:17:46,303 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.687
2022-01-04 06:17:48,837 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.460
INFO:tensorflow:epoch = 19.885643256681167, learning_rate = 0.00049999997, loss = 7.981862e-05, step = 31996 (5.109 sec)
2022-01-04 06:17:51,106 [INFO] tensorflow: epoch = 19.885643256681167, learning_rate = 0.00049999997, loss = 7.981862e-05, step = 31996 (5.109 sec)
2022-01-04 06:17:51,407 [INFO] modulus.hooks.sample_counter_hook: Train Samples / se

2022-01-04 06:19:07,994 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.086
INFO:tensorflow:epoch = 20.3455562461156, learning_rate = 0.00049999997, loss = 6.148519e-05, step = 32736 (5.188 sec)
2022-01-04 06:19:09,223 [INFO] tensorflow: epoch = 20.3455562461156, learning_rate = 0.00049999997, loss = 6.148519e-05, step = 32736 (5.188 sec)
2022-01-04 06:19:10,616 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.137
2022-01-04 06:19:13,202 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.675
INFO:tensorflow:epoch = 20.376009944064634, learning_rate = 0.00049999997, loss = 0.00016835677, step = 32785 (5.124 sec)
2022-01-04 06:19:14,347 [INFO] tensorflow: epoch = 20.376009944064634, learning_rate = 0.00049999997, loss = 0.00016835677, step = 32785 (5.124 sec)
2022-01-04 06:19:15,862 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.597
INFO:tensorflow:global_step/sec: 9.73595
2022-01-04 06:19:15,974 [INFO] tensorflow: glob

2022-01-04 06:20:32,298 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.142
2022-01-04 06:20:35,030 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.593
INFO:tensorflow:epoch = 20.8769422001243, learning_rate = 0.00049999997, loss = 0.000104928244, step = 33591 (5.198 sec)
2022-01-04 06:20:36,878 [INFO] tensorflow: epoch = 20.8769422001243, learning_rate = 0.00049999997, loss = 0.000104928244, step = 33591 (5.198 sec)
2022-01-04 06:20:37,741 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 36.902
INFO:tensorflow:global_step/sec: 9.45839
2022-01-04 06:20:37,844 [INFO] tensorflow: global_step/sec: 9.45839
2022-01-04 06:20:40,374 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.973
INFO:tensorflow:epoch = 20.908017402113114, learning_rate = 0.00049999997, loss = 7.402756e-05, step = 33641 (5.191 sec)
2022-01-04 06:20:42,070 [INFO] tensorflow: epoch = 20.908017402113114, learning_rate = 0.00049999997, loss = 7.402756e-05, st

2022-01-04 06:21:58,260 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.028
INFO:tensorflow:global_step/sec: 10.0003
2022-01-04 06:21:58,363 [INFO] tensorflow: global_step/sec: 10.0003
INFO:tensorflow:epoch = 21.385332504661278, learning_rate = 0.00049999997, loss = 5.52049e-05, step = 34409 (5.101 sec)
2022-01-04 06:21:59,262 [INFO] tensorflow: epoch = 21.385332504661278, learning_rate = 0.00049999997, loss = 5.52049e-05, step = 34409 (5.101 sec)
2022-01-04 06:22:00,761 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.988
2022-01-04 06:22:03,261 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.003
INFO:tensorflow:epoch = 21.417650714729643, learning_rate = 0.00049999997, loss = 8.561801e-05, step = 34461 (5.198 sec)
2022-01-04 06:22:04,461 [INFO] tensorflow: epoch = 21.417650714729643, learning_rate = 0.00049999997, loss = 8.561801e-05, step = 34461 (5.198 sec)
2022-01-04 06:22:05,761 [INFO] modulus.hooks.sample_counter_hook: Train S

2022-01-04 06:23:22,036 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.797
2022-01-04 06:23:24,541 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.924
INFO:tensorflow:epoch = 21.92044748290864, learning_rate = 0.00049999997, loss = 0.00012852134, step = 35270 (5.115 sec)
2022-01-04 06:23:26,650 [INFO] tensorflow: epoch = 21.92044748290864, learning_rate = 0.00049999997, loss = 0.00012852134, step = 35270 (5.115 sec)
2022-01-04 06:23:27,052 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.827
2022-01-04 06:23:29,562 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.850
INFO:tensorflow:epoch = 21.952144188937226, learning_rate = 0.00049999997, loss = 0.00015130722, step = 35321 (5.185 sec)
2022-01-04 06:23:31,835 [INFO] tensorflow: epoch = 21.952144188937226, learning_rate = 0.00049999997, loss = 0.00015130722, step = 35321 (5.185 sec)
2022-01-04 06:23:32,139 [INFO] modulus.hooks.sample_counter_hook: Train Samples / se

2022-01-04 06:24:48,118 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.071
INFO:tensorflow:epoch = 22.428216283405842, learning_rate = 0.00049999997, loss = 0.00015581475, step = 36087 (5.191 sec)
2022-01-04 06:24:49,414 [INFO] tensorflow: epoch = 22.428216283405842, learning_rate = 0.00049999997, loss = 0.00015581475, step = 36087 (5.191 sec)
2022-01-04 06:24:50,610 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.140
2022-01-04 06:24:53,106 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.066
INFO:tensorflow:epoch = 22.460534493474206, learning_rate = 0.00049999997, loss = 8.305487e-05, step = 36139 (5.189 sec)
2022-01-04 06:24:54,604 [INFO] tensorflow: epoch = 22.460534493474206, learning_rate = 0.00049999997, loss = 8.305487e-05, step = 36139 (5.189 sec)
2022-01-04 06:24:55,603 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.056
INFO:tensorflow:global_step/sec: 10.0156
2022-01-04 06:24:56,702 [INFO] tensorflow: 

2022-01-04 06:26:14,087 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.410
2022-01-04 06:26:16,601 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.770
INFO:tensorflow:epoch = 22.967060285891858, learning_rate = 0.00049999997, loss = 6.3721076e-05, step = 36954 (5.136 sec)
2022-01-04 06:26:17,105 [INFO] tensorflow: epoch = 22.967060285891858, learning_rate = 0.00049999997, loss = 6.3721076e-05, step = 36954 (5.136 sec)
INFO:tensorflow:global_step/sec: 9.83021
2022-01-04 06:26:17,711 [INFO] tensorflow: global_step/sec: 9.83021
2022-01-04 06:26:19,118 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.739
2022-01-04 06:26:21,706 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.638
INFO:tensorflow:epoch = 22.99813548788067, learning_rate = 0.00049999997, loss = 8.490724e-05, step = 37004 (5.110 sec)
2022-01-04 06:26:22,214 [INFO] tensorflow: epoch = 22.99813548788067, learning_rate = 0.00049999997, loss = 8.490724e-05, st

INFO:tensorflow:global_step/sec: 9.97813
2022-01-04 06:27:38,850 [INFO] tensorflow: global_step/sec: 9.97813
INFO:tensorflow:epoch = 23.471100062150402, learning_rate = 0.00049999997, loss = 0.00013611728, step = 37765 (5.119 sec)
2022-01-04 06:27:39,349 [INFO] tensorflow: epoch = 23.471100062150402, learning_rate = 0.00049999997, loss = 0.00013611728, step = 37765 (5.119 sec)
2022-01-04 06:27:40,252 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.850
2022-01-04 06:27:42,753 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.986
INFO:tensorflow:epoch = 23.502796768178992, learning_rate = 0.00049999997, loss = 7.640719e-05, step = 37816 (5.104 sec)
2022-01-04 06:27:44,453 [INFO] tensorflow: epoch = 23.502796768178992, learning_rate = 0.00049999997, loss = 7.640719e-05, step = 37816 (5.104 sec)
2022-01-04 06:27:45,251 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.042
2022-01-04 06:27:47,746 [INFO] modulus.hooks.sample_counter_hook: Tra

2022-01-04 06:29:02,348 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.856
2022-01-04 06:29:04,902 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.149
2022-01-04 06:29:06,652 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 24/120: loss: 0.00007 learning rate: 0.00050 Time taken: 0:02:44.121952 ETA: 4:22:35.707397
INFO:tensorflow:epoch = 24.001864512119326, learning_rate = 0.00049999997, loss = 0.0001895073, step = 38619 (5.109 sec)
2022-01-04 06:29:06,951 [INFO] tensorflow: epoch = 24.001864512119326, learning_rate = 0.00049999997, loss = 0.0001895073, step = 38619 (5.109 sec)
2022-01-04 06:29:07,512 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.318
2022-01-04 06:29:10,110 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.500
INFO:tensorflow:epoch = 24.032318210068365, learning_rate = 0.00049999997, loss = 9.246566e-05, step = 38668 (5.147 sec)
2022-01-04 06:29:12,098 [INFO] tensorflow: epoch = 2

2022-01-04 06:30:29,210 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.661
INFO:tensorflow:epoch = 24.503418272218767, learning_rate = 0.00049999997, loss = 0.00011900715, step = 39426 (5.145 sec)
2022-01-04 06:30:29,411 [INFO] tensorflow: epoch = 24.503418272218767, learning_rate = 0.00049999997, loss = 0.00011900715, step = 39426 (5.145 sec)
2022-01-04 06:30:31,721 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.839
2022-01-04 06:30:34,217 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.066
INFO:tensorflow:epoch = 24.535114978247357, learning_rate = 0.00049999997, loss = 8.747232e-05, step = 39477 (5.104 sec)
2022-01-04 06:30:34,515 [INFO] tensorflow: epoch = 24.535114978247357, learning_rate = 0.00049999997, loss = 8.747232e-05, step = 39477 (5.104 sec)
2022-01-04 06:30:36,706 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 40.179
INFO:tensorflow:global_step/sec: 9.89212
2022-01-04 06:30:38,922 [INFO] tensorflow: 

2022-01-04 06:31:52,207 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 25/120: loss: 0.00007 learning rate: 0.00050 Time taken: 0:02:45.554001 ETA: 4:22:07.630081
2022-01-04 06:31:54,693 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.600
INFO:tensorflow:epoch = 25.029210689869483, learning_rate = 0.00049999997, loss = 6.5037966e-05, step = 40272 (5.159 sec)
2022-01-04 06:31:57,155 [INFO] tensorflow: epoch = 25.029210689869483, learning_rate = 0.00049999997, loss = 6.5037966e-05, step = 40272 (5.159 sec)
2022-01-04 06:31:57,368 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 37.386
2022-01-04 06:31:59,901 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.490
INFO:tensorflow:global_step/sec: 9.65032
2022-01-04 06:32:02,004 [INFO] tensorflow: global_step/sec: 9.65032
INFO:tensorflow:epoch = 25.06090739589807, learning_rate = 0.00049999997, loss = 7.897613e-05, step = 40323 (5.149 sec)
2022-01-04 06:32:02,303 [INFO] tenso

INFO:tensorflow:epoch = 25.53635798632691, learning_rate = 0.00049999997, loss = 7.689424e-05, step = 41088 (5.141 sec)
2022-01-04 06:33:19,237 [INFO] tensorflow: epoch = 25.53635798632691, learning_rate = 0.00049999997, loss = 7.689424e-05, step = 41088 (5.141 sec)
2022-01-04 06:33:20,345 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.703
INFO:tensorflow:global_step/sec: 9.94041
2022-01-04 06:33:22,482 [INFO] tensorflow: global_step/sec: 9.94041
2022-01-04 06:33:22,889 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.321
INFO:tensorflow:epoch = 25.5680546923555, learning_rate = 0.00049999997, loss = 7.2239825e-05, step = 41139 (5.154 sec)
2022-01-04 06:33:24,391 [INFO] tensorflow: epoch = 25.5680546923555, learning_rate = 0.00049999997, loss = 7.2239825e-05, step = 41139 (5.154 sec)
2022-01-04 06:33:25,405 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.738
2022-01-04 06:33:27,941 [INFO] modulus.hooks.sample_counter_hook: Train Sam

2022-01-04 06:34:42,105 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.562
INFO:tensorflow:global_step/sec: 9.84997
2022-01-04 06:34:44,221 [INFO] tensorflow: global_step/sec: 9.84997
2022-01-04 06:34:44,622 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.732
INFO:tensorflow:epoch = 26.070229956494714, learning_rate = 0.00049999997, loss = 5.7034493e-05, step = 41947 (5.166 sec)
2022-01-04 06:34:46,969 [INFO] tensorflow: epoch = 26.070229956494714, learning_rate = 0.00049999997, loss = 5.7034493e-05, step = 41947 (5.166 sec)
2022-01-04 06:34:47,196 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 38.844
2022-01-04 06:34:49,759 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.021
INFO:tensorflow:epoch = 26.100683654443753, learning_rate = 0.00049999997, loss = 7.085952e-05, step = 41996 (5.112 sec)
2022-01-04 06:34:52,081 [INFO] tensorflow: epoch = 26.100683654443753, learning_rate = 0.00049999997, loss = 7.085952e-05, 

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights
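
The training log above repeats each step summary twice (once from the `INFO:tensorflow` logger and once with a timestamp), which makes it hard to follow the loss by eye. The following is a minimal sketch, not part of TAO, that pulls the `epoch`, `learning_rate`, `loss` and `step` fields out of such lines with a regular expression. The helper name `parse_step_lines` and the sample text are illustrative assumptions; the field layout is taken from the log lines shown above.

```python
import re

# Field layout copied from log lines such as:
# "... tensorflow: epoch = 15.71, learning_rate = 0.0005, loss = 0.00012, step = 25290 (5.113 sec)"
STEP_PATTERN = re.compile(
    r"epoch = (?P<epoch>[\d.]+), "
    r"learning_rate = (?P<lr>[\d.eE+-]+), "
    r"loss = (?P<loss>[\d.eE+-]+), "
    r"step = (?P<step>\d+)"
)


def parse_step_lines(log_text):
    """Return one record per unique training step found in the log text."""
    records = {}
    for line in log_text.splitlines():
        match = STEP_PATTERN.search(line)
        if match is None:
            continue  # Throughput lines and truncated lines are skipped.
        step = int(match.group("step"))
        # Duplicate lines for the same step overwrite each other.
        records[step] = {
            "step": step,
            "epoch": float(match.group("epoch")),
            "learning_rate": float(match.group("lr")),
            "loss": float(match.group("loss")),
        }
    return [records[step] for step in sorted(records)]


sample_log = """\
INFO:tensorflow:epoch = 15.717837165941578, learning_rate = 0.00049999997, loss = 0.00012260432, step = 25290 (5.113 sec)
2022-01-04 06:06:20,923 [INFO] tensorflow: epoch = 15.717837165941578, learning_rate = 0.00049999997, loss = 0.00012260432, step = 25290 (5.113 sec)
2022-01-04 06:06:19,324 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 39.780
"""

for record in parse_step_lines(sample_log):
    print(record["step"], record["epoch"], record["loss"])
```

The resulting records can be passed to any plotting tool to check that the loss is still decreasing before moving on to evaluation.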

## 5. Evaluate the trained model <a class="anchor" id="head-5"></a>

In [None]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                           -k $KEY

## 6. Prune the trained model <a class="anchor" id="head-6"></a>
The `prune` command takes the following inputs:

* The trained model to prune
* Equalization criterion (applicable to ResNets and MobileNets)
* Threshold for pruning
* A key to save and load the model
* Output directory to store the model

*Usually, you only need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus higher inference speed) but worse accuracy. The right threshold depends on the dataset. The `pth` value of `5.2e-6` used below is just a starting point. If the retrained accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.*

*In some internal studies, we have noticed that a `pth` value of 0.01 is a good starting point for detectnet_v2 models.*
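
To build some intuition for how the threshold trades size against accuracy, here is a toy sketch. It is not TAO's pruning implementation. It assumes a simple rule in which each output channel is scored by its L2 norm divided by the largest norm in the layer, and a channel is kept only if that score reaches the threshold. The function name `kept_channels` and the weights are made up for illustration.

```python
import math


def kept_channels(channel_weights, threshold):
    """Return indices of channels whose normalized L2 norm reaches the threshold.

    Assumed rule (illustrative only): norm of the channel divided by the
    largest channel norm in the same layer.
    """
    norms = [math.sqrt(sum(w * w for w in channel)) for channel in channel_weights]
    max_norm = max(norms)
    if max_norm == 0.0:
        return []
    return [i for i, norm in enumerate(norms) if norm / max_norm >= threshold]


# Hypothetical layer with four output channels (made-up weights).
layer = [
    [0.9, -0.8, 0.7],
    [0.05, 0.02, -0.01],
    [0.3, 0.4, 0.0],
    [0.001, 0.0, 0.002],
]

for pth in (5.2e-6, 0.01, 0.1):
    print("pth =", pth, "keeps channels", kept_channels(layer, pth))
```

With these made-up weights, the lowest threshold keeps all four channels, `0.01` drops the weakest one, and `0.1` keeps only two. This mirrors the guidance above: raise the threshold for a smaller model, lower it if retraining cannot recover the accuracy.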

In [None]:
# Create an output directory if it doesn't exist.
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tao detectnet_v2 prune \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                  -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.tlt \
                  -eq union \
                  -pth 0.0000052 \
                  -k $KEY

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/

## 7. Retrain the pruned model <a class="anchor" id="head-7"></a>
* The model needs to be retrained to recover the accuracy lost during pruning.
* Specify a retraining specification that uses the pruned model as the pretrained weights.

*Note: For retraining, please set the `load_graph` option to `true` in the model_config to load the pruned model graph. Also, if after retraining, the model shows some decrease in mAP, it could be that the originally trained model was pruned a little too much. Please try reducing the pruning threshold (thereby reducing the pruning ratio) and use the new model to retrain.*

*Note: DetectNet_v2 now supports Quantization Aware Training (QAT) to help with optimizing the model. By default, the training in the cell below doesn't run the model with QAT enabled. For information on training a model with QAT, please refer to the cells under the [QAT workflow](#head-12) section.*

In [None]:
# Printing the retrain experiment file.
# Note: We have updated the experiment file to use the
# newly pruned model as the pretrained weights, and the
# load_graph option is set to true.
!cat $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt

In [None]:
# Retraining using the pruned model as pretrained weights 
!tao detectnet_v2 train -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
                        -k $KEY \
                        -n resnet18_detector_pruned \
                        --gpus $NUM_GPUS

In [None]:
# Listing the newly retrained model.
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights

## 8. Evaluate the retrained model <a class="anchor" id="head-8"></a>

This section evaluates the pruned and retrained model, using the `evaluate` command.

In [None]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                           -k $KEY

## 9. Visualize inferences <a class="anchor" id="head-9"></a>
In this section, we run the `inference` tool to generate inferences on the trained models. To render bboxes from more classes, please edit the spec file `detectnet_v2_inference_kitti_tlt.txt` to include all the classes you would like to visualize and edit the rest of the file accordingly.

In [None]:
# Running inference for detection on n images
!tao detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                            -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                            -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                            -k $KEY

The `inference` tool produces two outputs:
1. Images with the detected bboxes overlaid, in `$USER_EXPERIMENT_DIR/tlt_infer_testing/images_annotated`
2. Frame-by-frame bbox labels in KITTI format, in `$USER_EXPERIMENT_DIR/tlt_infer_testing/labels`

*Note: To run inferences for a single image, simply replace the path to the -i flag in `inference` command with the path to the image.*
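
The label files can also be post-processed directly. The sketch below assumes the standard KITTI object label layout (15 whitespace-separated columns: type, truncated, occluded, alpha, four bbox coordinates, three dimensions, three location values, rotation_y), with an optional trailing score column. Whether and where the `inference` tool writes the confidence is not stated in this notebook, so treat the `score` handling as an assumption and check it against one of your own label files. The `KittiBox` type, the `parse_kitti_line` helper and the sample line are illustrative only.

```python
from typing import NamedTuple, Optional


class KittiBox(NamedTuple):
    label: str
    left: float
    top: float
    right: float
    bottom: float
    score: Optional[float]


def parse_kitti_line(line):
    """Parse one KITTI-format label line into a KittiBox."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected at least 15 columns, got %d" % len(fields))
    # Columns 4..7 hold the 2D box as left, top, right, bottom (pixels).
    left, top, right, bottom = (float(value) for value in fields[4:8])
    # Assumption: a 16th column, if present, is the detection score.
    score = float(fields[15]) if len(fields) > 15 else None
    return KittiBox(fields[0], left, top, right, bottom, score)


# Made-up example line, not real tool output.
sample = "car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.92"
box = parse_kitti_line(sample)
print(box.label, box.right - box.left, box.bottom - box.top, box.score)
```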

In [None]:
# Simple grid visualizer
!pip3 install matplotlib==3.3.3
%matplotlib inline
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    # image_dir is resolved relative to $LOCAL_EXPERIMENT_DIR.
    output_path = os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Sort the file names so that the same images are shown on every run.
    a = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the first 12 images.
OUTPUT_PATH = 'tlt_infer_testing/images_annotated' # relative path from $LOCAL_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 10. Model Export <a class="anchor" id="head-10"></a>

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_final
# Remove any pre-existing copy of the .etlt file.
import os
output_file=os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'],
                         "experiment_dir_final/resnet18_detector.etlt")
if os.path.exists(output_file):
    os.remove(output_file)
!tao detectnet_v2 export \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                  -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
                  -k $KEY

In [None]:
print('Exported model:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_final

### A. Int8 Optimization <a class="anchor" id="head-10-1"></a>
The DetectNet_v2 model supports int8 inference mode in TensorRT.
To use int8 mode, we must first calibrate the model for 8-bit inference:

* Generate a calibration tensorfile from the training data using `tao detectnet_v2 calibration_tensorfile`.
* Use `tao detectnet_v2 export` to generate the int8 calibration table.

*Note: For this example, we generate a calibration tensorfile containing 10 batches of training data.
Ideally, it is best to use at least 10-20% of the training data to do so. The more data provided during calibration, the closer int8 inferences are to fp32 inferences.*
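
As a rough guide to what "10-20% of the training data" means in batches, the following sketch does the arithmetic. The image count of 7,481 assumes the full KITTI object detection training set and the batch size of 4 is taken from the `export` cell below; substitute your own values, since your training split and spec file may differ. The helper name `calibration_batches` is hypothetical.

```python
import math


def calibration_batches(num_training_images, fraction, batch_size):
    """Number of batches needed to cover `fraction` of the training images."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_training_images * fraction / batch_size)


NUM_TRAINING_IMAGES = 7481  # Assumption: full KITTI training set.
BATCH_SIZE = 4              # Assumption: matches --batch_size in the export cell.

for fraction in (0.10, 0.20):
    batches = calibration_batches(NUM_TRAINING_IMAGES, fraction, BATCH_SIZE)
    print("{:.0%} of the data -> {} batches".format(fraction, batches))
```

Under these assumptions the 10 batches used in this example cover well under 1% of the data, which is why the note treats them as a quick demonstration rather than a recommended setting.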

*Note: If the model was trained with QAT nodes available, please refrain from using the post training int8 optimization as mentioned below. Please export the model in int8 mode (using the arg `--data_type int8`) with just the path to the calibration cache file (using the argument `--cal_cache_file`)*

In [None]:
!tao detectnet_v2 calibration_tensorfile -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                                         -m 10 \
                                         -o $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor

In [None]:
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!tao detectnet_v2 export \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                  -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
                  -k $KEY  \
                  --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
                  --data_type int8 \
                  --batches 10 \
                  --batch_size 4 \
                  --max_batch_size 4\
                  --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
                  --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
                  --verbose

### B. Generate TensorRT engine <a class="anchor" id="head-10-2"></a>
Verify engine generation using the `tao-converter` utility included with the docker.

The `tao-converter` produces optimized TensorRT engines for the platform that it resides on. Therefore, to get maximum performance, please instantiate this docker and execute the `tao-converter` command, with the exported `.etlt` file and calibration cache (for int8 mode), on your target device. The `tao-converter` utility included in this docker only works on x86 devices with discrete NVIDIA GPUs.

For Jetson devices, please download the `tao-converter` for Jetson from the dev zone link [here](https://developer.nvidia.com/tao-converter).

If you choose to integrate your model into DeepStream directly, you may do so by simply copying the exported `.etlt` file along with the calibration cache to the target device and updating the spec file that configures the `gst-nvinfer` element to point to this newly exported model. Usually this file is called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.
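
As a sketch of what "updating the spec file" can look like, the snippet below generates the model-related entries of a `[property]` section. The property names (`tlt-encoded-model`, `tlt-model-key`, `int8-calib-file`, `network-mode`, `output-blob-names`) are the ones commonly seen in DeepStream sample configs for TAO models, but they are an assumption here and should be checked against the Gst-nvinfer documentation for your DeepStream version. The helper name and the placeholder key are hypothetical, and a complete config needs further entries (labels, input dimensions, clustering settings) that are not shown.

```python
import configparser
import io


def build_nvinfer_properties(etlt_path, key, calib_path):
    """Return the text of a minimal [property] section for gst-nvinfer."""
    # interpolation=None so that '%' characters in a key are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["property"] = {
        "tlt-encoded-model": etlt_path,
        "tlt-model-key": key,
        "int8-calib-file": calib_path,
        # Assumption: 0 = FP32, 1 = INT8, 2 = FP16.
        "network-mode": "1",
        # Output node names taken from the tao converter cell in this notebook.
        "output-blob-names": "output_cov/Sigmoid;output_bbox/BiasAdd",
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(build_nvinfer_properties(
    "resnet18_detector.etlt",   # exported model copied to the target device
    "<your-model-key>",         # placeholder for the value of $KEY
    "calibration.bin",          # calibration cache from the export step
))
```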

In [None]:
!tao converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
               -k $KEY \
               -c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
               -o output_cov/Sigmoid,output_bbox/BiasAdd \
               -d 3,384,1248 \
               -i nchw \
               -m 64 \
               -t int8 \
               -e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
               -b 4

## 11. Verify Deployed Model <a class="anchor" id="head-11"></a>
Verify the exported model by visualizing inferences on TensorRT.
In addition to running inference on a `.tlt` model in [step 9](#head-9), the `inference` tool is also capable of consuming the converted `TensorRT engine` from [step 10.B](#head-10-2).

*If the accuracy of the int8 inferences seems to degrade after int8 calibration, it could be because there wasn't enough data in the calibration tensorfile used to calibrate the model, or because the training data is not entirely representative of your test images, in which case the calibration may be incorrect. You may then either regenerate the calibration tensorfile with more batches of the training data and recalibrate the model, or calibrate the model on a few images from the test set. The latter may be done using the `--cal_image_dir` flag in the `export` tool. For more information, please follow the instructions in the User Guide.*

### A. Inference using TensorRT engine <a class="anchor" id="head-11-1"></a>

In [None]:
!tao detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_etlt.txt \
                            -o $USER_EXPERIMENT_DIR/etlt_infer_testing \
                            -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                            -k $KEY

In [None]:
# Visualizing the first 12 inferred images.
OUTPUT_PATH = 'etlt_infer_testing/images_annotated' # relative path from $LOCAL_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 12. QAT workflow <a class="anchor" id="head-12"></a>
This section covers the Quantization Aware Training (QAT) feature of DetectNet_v2. The workflow below takes the pruned model from section [6](#head-6), enables QAT, and retrains the model while accounting for the noise that quantization introduces in the forward pass.
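
To illustrate what "noise introduced due to quantization in the forward pass" refers to, here is a generic fake-quantization sketch: values are rounded to a signed 8-bit grid and mapped straight back to floats, so the network sees the rounding error during training. This is a textbook symmetric, per-tensor scheme chosen for illustration. It is not TAO's QAT implementation, and the function name `fake_quantize` and the sample values are made up.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to signed integers and map back to floats (symmetric, per-tensor)."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / q_max  # Float value represented by one integer step.
    quantized = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return [q * scale for q in quantized]


# Made-up activations for illustration.
activations = [0.013, -0.42, 0.27, 1.0]
simulated = fake_quantize(activations)
errors = [abs(a - s) for a, s in zip(activations, simulated)]

print("simulated int8 values:", simulated)
print("largest rounding error:", max(errors))
```

The rounding error of each value is at most half a quantization step. Retraining with this error present lets the weights adapt to it, which is the motivation for QAT over post-training calibration.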

### A. Convert pruned model to QAT and retrain <a class="anchor" id="head-12-1"></a>
All detectnet models, both unpruned and pruned, can be converted to QAT models by setting the `enable_qat` parameter in the `training_config` component of the spec file to `true`.

In [None]:
# Printing the retrain experiment file. 
# Note: We have updated the experiment file to convert the
# pretrained model to qat mode by setting the enable_qat
# parameter.
!cat $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt

In [None]:
!tao detectnet_v2 train -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat \
                        -k $KEY \
                        -n resnet18_detector_pruned_qat \
                        --gpus $NUM_GPUS

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights

### B. Evaluate QAT converted model <a class="anchor" id="head-12-2"></a>
This section evaluates the QAT-enabled, pruned and retrained model. The mAP of this model should be comparable to that of the pruned, retrained model without QAT. However, quantization can occasionally cause a drop in mAP on certain datasets.

In [None]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
                           -k $KEY \
                           -f tlt

### C. Export QAT trained model to int8 <a class="anchor" id="head-12-3"></a>
Export the QAT-trained model to a TensorRT-parsable model. This command generates an .etlt file from the trained model and serializes the corresponding int8 scales as a TensorRT-readable calibration cache file.

In [None]:
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.etlt
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin
!tao detectnet_v2 export \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
                  -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.etlt \
                  -k $KEY  \
                  --data_type int8 \
                  --batch_size 64 \
                  --max_batch_size 64 \
                  --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.trt.int8 \
                  --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin \
                  --verbose
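
After the export finishes, a quick check like the following can confirm that the three files named in the command above were written. It is a small sketch with a hypothetical helper name. It assumes `LOCAL_EXPERIMENT_DIR` is set in the notebook environment and that `$LOCAL_EXPERIMENT_DIR/experiment_dir_final` on the host maps to `$USER_EXPERIMENT_DIR/experiment_dir_final` in the container, as the `rm` commands in the cell above suggest.

```python
import os


def missing_export_artifacts(export_dir, names):
    """Return the expected export files that are absent from export_dir."""
    return [
        name for name in names
        if not os.path.isfile(os.path.join(export_dir, name))
    ]


# File names taken from the export command in this section.
EXPECTED = [
    "resnet18_detector_qat.etlt",
    "calibration_qat.bin",
    "resnet18_detector_qat.trt.int8",
]

# The "." fallback is for illustration only, in case the variable is unset.
export_dir = os.path.join(
    os.environ.get("LOCAL_EXPERIMENT_DIR", "."), "experiment_dir_final"
)
print(missing_export_artifacts(export_dir, EXPECTED))
```

An empty list means all three artifacts are present. Any name that is printed points to an export step that did not complete.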

### D. Evaluate a QAT trained model using the exported TensorRT engine <a class="anchor" id="head-12-4"></a>
This section evaluates the QAT-enabled, pruned and retrained model using the TensorRT int8 engine exported in [Section C](#head-12-3). Note that the mAP may differ slightly (~0.1-0.5%) from [Section B](#head-12-2), owing to differences in how TensorRT implements quantization.

*Note: The TensorRT evaluator might be slightly slower than the TAO evaluator here, because the evaluation dataloader is pinned to the CPU to avoid any clashes between TensorRT and TAO instances in the GPU. Please note that this tool was not intended and has not been developed for profiling the model. It is just a means to qualitatively analyse the model.*

*Please use native TensorRT or DeepStream for the most optimized inferences.*

In [None]:
!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.trt.int8 \
                           -f tensorrt
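
To compare the numbers printed here with those from Section B, a small helper such as the one below can flag classes whose AP moved by more than the expected margin. It is a sketch only. The class names and AP values are placeholders to be replaced with the figures printed by the two evaluate cells, and treating the 0.5% figure as absolute percentage points is an assumption.

```python
def ap_deltas(reference, candidate):
    """Per-class AP change (candidate minus reference), in percentage points."""
    return {name: candidate[name] - reference[name] for name in reference}


def classes_beyond(deltas, tolerance=0.5):
    """Classes whose AP moved by more than `tolerance` percentage points."""
    return sorted(name for name, delta in deltas.items() if abs(delta) > tolerance)


# Placeholder numbers for illustration only, not results from this notebook.
tlt_ap = {"car": 80.0, "cyclist": 70.0, "pedestrian": 60.0}
trt_ap = {"car": 79.8, "cyclist": 69.0, "pedestrian": 60.1}
print(classes_beyond(ap_deltas(tlt_ap, trt_ap)))  # ['cyclist']
```

A class flagged this way is a candidate for closer inspection, for example by reviewing its detections in the inference step that follows.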

### E. Inference using QAT engine <a class="anchor" id="head-12-5"></a>
Run inference and visualize detections on test images, using the exported TensorRT engine from [Section C](#head-12-3).

In [None]:
!tao detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_etlt_qat.txt \
                            -o $USER_EXPERIMENT_DIR/tlt_infer_testing_qat \
                            -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                            -k $KEY

In [None]:
# visualize the first 12 inferenced images.
OUTPUT_PATH = 'tlt_infer_testing_qat/images_annotated' # relative path from $USER_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)