# Distributed Training of Mask-RCNN in Amazon SageMaker using S3

This notebook is a step-by-step tutorial on distributed training of [Mask R-CNN](https://arxiv.org/abs/1703.06870) implemented in the [TensorFlow](https://www.tensorflow.org/) framework. Mask R-CNN is a heavyweight object detection model and is part of [MLPerf](https://www.mlperf.org/training-results-0-6/).

Concretely, we describe the steps for training [TensorPack Faster-RCNN/Mask-RCNN](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN) and [AWS Samples Mask R-CNN](https://github.com/aws-samples/mask-rcnn-tensorflow) in [Amazon SageMaker](https://aws.amazon.com/sagemaker/) using [Amazon S3](https://aws.amazon.com/s3/) as the data source.

The outline of steps is as follows:

1. Stage COCO 2017 dataset in [Amazon S3](https://aws.amazon.com/s3/)
2. Build SageMaker training image and push it to [Amazon ECR](https://aws.amazon.com/ecr/)
3. Configure data input channels
4. Configure hyperparameters
5. Define training metrics
6. Define training job and start training

Before we get started, let us initialize two Python variables, ```aws_region``` and ```s3_bucket```, that we will use throughout the notebook:

In [1]:
aws_region =  "ap-southeast-1"
s3_bucket  =  "smart-invoice"
! $(aws ecr get-login --no-include-email --region ap-southeast-1  --registry-ids 393782509758)

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
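
The login cell above relies on `aws ecr get-login`, which exists only in AWS CLI version 1; version 2 replaced it with `aws ecr get-login-password`. As a CLI-independent alternative, the sketch below decodes an ECR authorization token in Python. It assumes `boto3` is installed and that the notebook role is allowed to call `ecr:GetAuthorizationToken`. The helper names `decode_ecr_token` and `docker_login_args` are hypothetical and not part of this repository.

```python
import base64


def decode_ecr_token(token_b64):
    """Split a base64-encoded ECR authorization token into (user, password)."""
    decoded = base64.b64decode(token_b64).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


def docker_login_args(aws_region, registry_id):
    """Fetch an ECR token with boto3 and return (user, password, endpoint).

    Requires AWS credentials, so nothing here runs until it is called.
    """
    import boto3  # imported lazily so decode_ecr_token works without boto3

    ecr = boto3.client("ecr", region_name=aws_region)
    response = ecr.get_authorization_token(registryIds=[registry_id])
    auth = response["authorizationData"][0]
    user, password = decode_ecr_token(auth["authorizationToken"])
    return user, password, auth["proxyEndpoint"]
```

The returned values can be passed to `docker login --username <user> --password-stdin <endpoint>`, with the password supplied on standard input.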


## Stage COCO 2017 dataset in Amazon S3

We use the [COCO 2017 dataset](http://cocodataset.org/#home) for training. We download the COCO 2017 training and validation datasets to this notebook instance, extract the files from the dataset archives, and upload the extracted files to your Amazon [S3 bucket](https://docs.aws.amazon.com/en_pv/AmazonS3/latest/gsg/CreatingABucket.html) under the prefix ```mask-rcnn/sagemaker/input/train```. The ```prepare-s3-bucket.sh``` script executes this step.
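
According to a comment in the staging script, the stage directory needs about 100 GB of free space on the EBS volume. A minimal sketch of a pre-flight check, assuming the stage directory lives under `~/SageMaker` as in the script; the helper name `has_free_space` is hypothetical:

```python
import os
import shutil


def has_free_space(path, required_gb=100):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


# The script stages under $HOME/SageMaker; fall back to the current directory
# when that folder does not exist (for example, outside a notebook instance).
stage_root = os.path.expanduser("~/SageMaker")
if not os.path.isdir(stage_root):
    stage_root = "."

print("Enough space for staging:", has_free_space(stage_root))
```

If the check prints `False`, consider enlarging the notebook instance volume before running the staging script.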


In [14]:
!cat ./prepare-s3-bucket.sh

#!/bin/bash

set -e

if [ "$#" -ne 1 ]; then
    echo "usage: $0 <s3-bucket-name>"
    exit 1
fi

S3_BUCKET=$1
S3_PREFIX="mask-rcnn/sagemaker/input"

# Stage directory must be on EBS volume with 100 GB available space
STAGE_DIR=$HOME/SageMaker/coco-2017-$(date +"%Y-%m-%d-%H-%M-%S")


echo "Create stage directory: $STAGE_DIR"
mkdir -p $STAGE_DIR

wget -O $STAGE_DIR/train2017.zip http://images.cocodataset.org/zips/train2017.zip
echo "Extracting $STAGE_DIR/train2017.zip"
unzip -o $STAGE_DIR/train2017.zip  -d $STAGE_DIR | awk 'BEGIN {ORS="="} {if(NR%1000==0)print "="}'
echo "Done."
rm $STAGE_DIR/train2017.zip

wget -O $STAGE_DIR/val2017.zip http://images.cocodataset.org/zips/val2017.zip
echo "Extracting $STAGE_DIR/val2017.zip"
unzip -o $STAGE_DIR/val2017.zip -d $STAGE_DIR | awk 'BEGIN {ORS="="} {if(NR%1000==0)print "="}'
echo "Done."
rm $STAGE_DIR/val2017.zip

wget -O $STAGE_DIR/annotations_trainval2017.zip http://images.cocodataset.org/annotations/annotation

Using your *Amazon S3 bucket* as argument, run the cell below. If you have already uploaded the COCO 2017 dataset to your Amazon S3 bucket *in this AWS region*, you may skip this step. The expected time to execute this step is 20 minutes.

In [None]:
%%time
!./prepare-s3-bucket.sh {s3_bucket}
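
After the upload finishes, it can be useful to confirm that objects actually landed under the expected prefix. The sketch below counts them with the S3 `list_objects_v2` paginator. It assumes `boto3` is available and that the data sits under `mask-rcnn/sagemaker/input/train` as described above; the helper name `count_objects` is hypothetical.

```python
def count_objects(s3_client, bucket, prefix):
    """Count objects under s3://bucket/prefix using the list_objects_v2 paginator."""
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # KeyCount is the number of keys returned in this page.
        total += page.get("KeyCount", 0)
    return total


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# s3 = boto3.client("s3", region_name=aws_region)
# print(count_objects(s3, s3_bucket, "mask-rcnn/sagemaker/input/train"))
```

A count of zero suggests the upload did not run or used a different bucket or prefix.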

## Build and push SageMaker training images

For this step, the [IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) attached to this notebook instance needs full access to the Amazon ECR service. If you created this notebook instance using the ```./stack-sm.sh``` script in this repository, the IAM Role attached to this notebook instance is already set up with full access to the ECR service.

Below, we have a choice of two different implementations:

1. [TensorPack Faster-RCNN/Mask-RCNN](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN) implementation supports a maximum per-GPU batch size of 1, and does not support mixed precision. It can be used with mainstream TensorFlow releases.

2. [AWS Samples Mask R-CNN](https://github.com/aws-samples/mask-rcnn-tensorflow) is an optimized implementation that supports a maximum per-GPU batch size of 4 and supports mixed precision. This implementation uses TensorFlow base version 1.13 augmented with custom TensorFlow ops.

It is recommended that you build and push both SageMaker training images and use either image for training later.
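
The per-GPU batch size limits above translate into a global batch size once a job is spread across GPUs and instances. The arithmetic can be sketched as follows, assuming synchronous data-parallel training with one worker per GPU. The figures of 8 GPUs per instance and 2 instances are illustrative assumptions, not values taken from this notebook.

```python
def global_batch_size(per_gpu_batch, gpus_per_instance, instance_count):
    """Images processed per training step across all GPUs."""
    return per_gpu_batch * gpus_per_instance * instance_count


# Maximum per-GPU batch sizes stated above for each implementation.
implementations = [("TensorPack", 1), ("AWS Samples", 4)]

for name, per_gpu_batch in implementations:
    total = global_batch_size(per_gpu_batch, gpus_per_instance=8, instance_count=2)
    print(f"{name}: global batch size {total}")
```

With these assumed values, the optimized implementation processes four times as many images per step as the TensorPack implementation.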


### TensorPack Faster-RCNN/Mask-RCNN

Use the ```./container/build_tools/build_and_push.sh``` script to build and push the TensorPack Faster-RCNN/Mask-RCNN training image to Amazon ECR.

In [None]:
!cat ./container/build_tools/build_and_push.sh

Using your *AWS region* as argument, run the cell below.

In [15]:
%%time
! ./container/build_tools/build_and_push.sh {aws_region}

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
^C
CPU times: user 92.1 ms, sys: 29.1 ms, total: 121 ms
Wall time: 5.34 s


Set ```tensorpack_image``` below to the Amazon ECR URI of the image you pushed above.

In [16]:
tensorpack_image = "393782509758.dkr.ecr.ap-southeast-1.amazonaws.com/mask-rcnn-tensorpack-sagemaker:tf1.13-tp26664c3"
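
An ECR image URI has the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`, the same pattern the build script listed later in this notebook assembles in its `fullname` variable. Rather than typing the URI by hand, it can be composed from its parts. This is a minimal sketch; the helper name `ecr_image_uri` is hypothetical.

```python
def ecr_image_uri(account, region, repository, tag):
    """Compose an Amazon ECR image URI from its components."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Reproduces the URI used in the cell above.
example_uri = ecr_image_uri(
    account="393782509758",
    region="ap-southeast-1",
    repository="mask-rcnn-tensorpack-sagemaker",
    tag="tf1.13-tp26664c3",
)
print(example_uri)
```

In your own account, substitute your account ID, region, repository name, and tag.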

### AWS Samples Mask R-CNN
Use the ```./container-optimized/build_tools/build_and_push.sh``` script to build and push the AWS Samples Mask R-CNN training image to Amazon ECR.
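
The build script checks whether the ECR repository exists and, per its own comment, creates it if it does not. The same check-then-create step can be sketched in Python. This assumes a `boto3` ECR client is passed in and is not the script's actual code; the helper name `ensure_repository` is hypothetical.

```python
def ensure_repository(ecr_client, repository_name):
    """Create the ECR repository if it is missing. Return True if it was created."""
    try:
        ecr_client.describe_repositories(repositoryNames=[repository_name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=repository_name)
        return True


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# ecr = boto3.client("ecr", region_name=aws_region)
# ensure_repository(ecr, "my-repository-name")  # placeholder repository name
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub instead of a live AWS account.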

In [2]:
!cat ./container-optimized/build_tools/build_and_push.sh

#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The argument to this script is the image name. This will be used as the image on the local
# machine and combined with the account and region to form the repository name for ECR.

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
source $DIR/set_env.sh

# set region
region=
if [ "$#" -eq 1 ]; then
    region=$1
else
    echo "usage: $0 <aws-region>"
    exit 1
fi
  

image=$IMAGE_NAME
tag=$IMAGE_TAG

# Get the account number associated with the current IAM credentials
account=$(aws sts get-caller-identity --query Account --output text)

if [ $? -ne 0 ]
then
    exit 255
fi


fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:${tag}"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --region ${region} --repository-names "${image}" > /dev/null 2>&1
if [ $? -ne 0 ]; then
   

Using your *AWS region* as argument, run the cell below.

In [3]:
%%time
! ./container-optimized/build_tools/build_and_push.sh {aws_region}

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Sending build context to Docker daemon  20.48kB
Step 1/25 : FROM 393782509758.dkr.ecr.ap-southeast-1.amazonaws.com/invoice-extraction:tensorflow-base-1.13.1-gpu-py36-ubuntu-16.04
tensorflow-base-1.13.1-gpu-py36-ubuntu-16.04: Pulling from invoice-extraction

Status: Downloaded newer image for 393782509758.dkr.ecr.ap-southeast-1.amazonaws.com/invoice-extraction:tensorflow-base-1.13.1-gpu-py36-ubuntu-16.04
 ---> 678566651313
Step 2/25 : ENV HOROVOD_VERSION=0.18.1
 ---> Running in e1ac4bdc6304
Removing intermediate container e1ac4bdc6304
 ---> 18c79b9912d5
Step 3/25 : RUN pip install --upgrade pip
 ---> Running in 99acb05fccdc
Requirement already up-to-date: pip in /usr/local/lib/python3.6/site-packages (20.0.2)
Removing intermediate container 99acb05fccdc
 ---> 0760138a0b97
Step 4/25 : RUN pip uninstall -y tensorflow tensorboard tensorflow-estimator keras h5py horovod numpy
 ---> Running in 33a1526ffd64
Found existing installation: tensorflow 1.13.1
Uninstalling tensorflow-1.13.1:
  Successfully uninstalled tensorflow-1.13.1
Found existing installation: tensorflow-estimator 1.13.0
Uninstalling tensorflow-estimator-1.13.0:
  Successfully uninstalled tensorflow-estimator-1.13.0
Found existing installation: Keras 2.2.4
Uninstalling Keras-2.

104750K .......... ..[0m[91m........ ........[0m[91m.. .......... ..[0m[91m..[0m[91m...... 80% 11.5M 4s
104800K ..........[0m[91m ........[0m[91m.[0m[91m. ......[0m[91m.... .......... ..[0m[91m........ 80% 45.8M 4s[0m[91m
104850K ........[0m[91m.. .......... ....[0m[91m...... ..[0m[91m........[0m[91m ........[0m[91m.. 80% 54.7M 4s
104900K ......[0m[91m.... ....[0m[91m...... ..[0m[91m........ ........[0m[91m.. .......... 80% 35.2M 4s
104950K ....[0m[91m...... ..[0m[91m........ ..

109150K ....[0m[91m...... ..[0m[91m........[0m[91m ........[0m[91m.. ......[0m[91m.... ....[0m[91m.[0m[91m..... 83%  247K 4s
109200K .......... .......... .......... .......... .......... 83%  107M 4s
109250K .......... .......... .......... .......... ........[0m[91m.. 83%  172M 4s
109300K ......[0m[91m.... .......... ..[0m[91m........[0m[91m .......... ......[0m[91m.... 83% 78.8M 4s
109350K ....[0m[91m...... ..........[0m[91m .......... ......[0m[91m.... .......... 84% 7.68M 4s
109400K ..[0m[91m........ ........[0m[91m.. ....[0m[91m.[0m[91m.[0m[91m.... ....[0m[91m...... ..[0m[91m........ 84%[0m[91m 33.1M 4s[0m[91m
109450K ........[0m[91m.. .......... ....[0m[91m...... ..[0m[91m........ [0m[91m........[0m[91m.. 84% 99.7M 4s
109500K .......... ....[0m[91m...... ..........[0m[91m .......... ......[0m[91m....[0m[91m 84%[0m[91m 29.6M 4s
109550K .......... ..[0m[91m........[0m[91m ........[0m[91m.. ......[0m[91m...

113800K ..[0m[91m........ ........[0m[91m.. ......[0m[91m.... ....[0m[91m...... ..[0m[91m........ 87%  247K 3s[0m[91m
113850K .......... ......[0m[91m.... .......... ..[0m[91m........ ........[0m[91m.. 87%  144M 3s
113900K .......... ....[0m[91m...... ..........[0m[91m ........[0m[91m.. ......[0m[91m.... 87%  154M 3s
113950K ....[0m[91m...... ..[0m[91m........ ........[0m[91m.. ......[0m[91m.... ....[0m[91m...... 87% 8.67M 3s
114000K ..........[0m[91m .......... ......[0m[91m.... .......... ..[0m[91m........ 87% 31.9M 3s
114050K .......[0m[91m.[0m[91m.[0m[91m. ......[0m[91m....[0m[91m ....[0m[91m...... ..[0m[91m........[0m[91m ........[0m[91m.. 87%[0m[91m 25.7M 3s
114100K ..[0m[91m....[0m[91m.... ....[0m[91m......[0m[91m [0m[91m.[0m[91m.[0m[91m........ ........[0m[91m.. ......[0m[91m.... 87% 44.8M 3s
114150K ....[0m[91m...... ..........[0m[91m .......[0m[91m... ......[0m[91m.... .......... 87%  159M

118400K ..[0m[91m........[0m[91m ........[0m[91m.. ......[0m[91m.... .......... ..[0m[91m........ 90%  247K 2s
118450K ........[0m[91m.. .......... ....[0m[91m...... ..........[0m[91m .......... 91% 88.7M 2s
118500K ....[0m[91m..[0m[91m.... .......... ..[0m[91m........[0m[91m ........[0m[91m.. .......... 91%  171M 2s
118550K .......... ..[0m[91m........ .......... ......[0m[91m.... .......... 91% 20.3M 2s
118600K ..[0m[91m........ ........[0m[91m.. .......[0m[91m... ....[0m[91m...... .......... 91% 9.42M 2s[0m[91m
118650K ........[0m[91m.. ......[0m[91m.... ....[0m[91m...... ..[0m[91m........[0m[91m ........[0m[91m.. 91% 37.1M 2s
118700K ......[0m[91m.... ....[0m[91m...... ..........[0m[91m .......... ......[0m[91m.... 91% 47.5M 2s[0m[91m
118750K ....[0m[91m...... ..[0m[91m........ ........[0m[91m.. ......[0m[91m.... ....[0m[91m...... 91% 76.6M 2s
118800K ..........[0m[91m .......... ......[0m[91m.... .........

122550K .......... .......... .......... ......[0m[91m.... ....[0m[91m...... 94%  174M 1s
122600K .......... ........[0m[91m.. .......... ....[0m[91m...... .......... 94%  129M 1s[0m[91m
122650K ..........[0m[91m ......[0m[91m.... .......... ..[0m[91m........[0m[91m ........[0m[91m.. 94% 34.6M 1s
122700K ......[0m[91m.... ....[0m[91m...... ..........[0m[91m .......... .......... 94% 71.2M 1s
122750K .......... .......... .......... .......... .......... 94% 90.8M 1s
122800K .......... .......... .......... .......... .......... 94% 73.6M 1s
122850K .......... .......... .......... .......... .......... 94%  248M 1s
122900K .......... .......... .......... ........[0m[91m.. .......... 94% 84.0M 1s
122950K ....[0m[91m...... ..........[0m[91m ........[0m[91m.. .......... ....[0m[91m...... 94% 64.8M 1s
123000K ..[0m[91m........ ........[0m[91m.. .......... ....[0m[91m...... .......... 94%  247K 1s
123050K ........[0m[91m.. .......... .......... .

127000K ..[0m[91m........[0m[91m ........[0m[91m.. .......... ....[0m[91m...... .......... 97% 63.6M 1s[0m[91m
127050K .......... ......[0m[91m.... .........[0m[91m. ..[0m[91m........ ....[0m[91m....[0m[91m.. 97% 16.1M 1s
127100K ......[0m[91m.... ....[0m[91m...... ..[0m[91m........ .......... ..[0m[91m....[0m[91m.... 97% 79.8M 1s
127150K ....[0m[91m...... ..[0m[91m........ .......... .......... ....[0m[91m...... 97% 63.4M 1s
127200K ..........[0m[91m ........[0m[91m.. ......[0m[91m.... ....[0m[91m...... ..[0m[91m........ 97%  153M 0s
127250K .......... .........[0m[91m. ....[0m[91m...... ..[0m[91m........[0m[91m .......... 97% 55.0M 0s
127300K ......[0m[91m.... .......... ..[0m[91m........ ........[0m[91m.. .......... 97% 80.1M 0s
127350K ....[0m[91m...... ..[0m[91m........[0m[91m .......... ......[0m[91m.... ..........[0m[91m 97%  113M 0s
127400K ..[0m[91m........ ........[0m[91m.. ......[0m[91m.... ....[0m

Collecting google-pasta>=0.1.1
  Downloading google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting tensorboard<1.13.0,>=1.12.0
  Downloading tensorboard-1.12.2-py3-none-any.whl (3.0 MB)
Collecting tensorflow-estimator>=1.13.0rc0
  Downloading tensorflow_estimator-2.2.0-py2.py3-none-any.whl (454 kB)
Collecting markdown>=2.6.8
  Downloading Markdown-3.2.1-py2.py3-none-any.whl (88 kB)
Collecting h5py
  Downloading h5py-2.10.0-cp36-cp36m-manylinux1_x86_64.whl (2.9 MB)
ERROR: sagemaker-containers 2.8.6 has requirement scipy>=1.2.2, but you'll have scipy 1.2.1 which is incompatible.
Installing collected packages: numpy, google-pasta, markdown, tensorboard, tensorflow-estimator, tensorflow, h5py
Successfully installed google-pasta-0.2.0 h5py-2.10.0 markdown-3.2.1 numpy-1.18.3 tensorboard-1.12.2 tensorflow-1.13.0 tensorflow-estimator-2.2.0
Collecting tensorflow-estimator==1.13.0
  Downloading tensorflow_estimator-1.13.0-py2.py3-none-any.whl (367 kB)
Installing collected packages: ten

  Attempting uninstall: s3transfer
    Found existing installation: s3transfer 0.3.3
    Uninstalling s3transfer-0.3.3:
      Successfully uninstalled s3transfer-0.3.3
  Attempting uninstall: PyYAML
    Found existing installation: PyYAML 5.3.1
    Uninstalling PyYAML-5.3.1:
      Successfully uninstalled PyYAML-5.3.1
Successfully installed PyYAML-3.13 botocore-1.12.120 s3transfer-0.2.1
Removing intermediate container 896aff06b902
 ---> 4bde9a079584
Step 11/25 : RUN pip install boto3
 ---> Running in 9c19ba28e430
Collecting s3transfer<0.4.0,>=0.3.0
  Downloading s3transfer-0.3.3-py2.py3-none-any.whl (69 kB)
Collecting botocore<1.16.0,>=1.15.34
  Downloading botocore-1.15.43-py2.py3-none-any.whl (6.1 MB)
ERROR: awscli 1.16.130 has requirement botocore==1.12.120, but you'll have botocore 1.15.43 which is incompatible.
ERROR: awscli 1.16.130 has requirement s3transfer<0.3.0,>=0.2.0, but you'll have s3transfer 0.3.3 which is incompatible.
Installing collected packages: bo

Collecting tabulate>=0.7.7
  Downloading tabulate-0.8.7-py3-none-any.whl (24 kB)
Collecting tqdm>4.11.1
  Downloading tqdm-4.45.0-py2.py3-none-any.whl (60 kB)
Collecting msgpack>=0.5.2
  Downloading msgpack-1.0.0-cp36-cp36m-manylinux1_x86_64.whl (274 kB)
Collecting msgpack-numpy>=0.4.4.2
  Downloading msgpack_numpy-0.4.5-py2.py3-none-any.whl (6.1 kB)
Collecting pyzmq>=16
  Downloading pyzmq-19.0.0-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB)
Installing collected packages: tabulate, tqdm, msgpack, msgpack-numpy, pyzmq, tensorpack
  Running setup.py develop for tensorpack
Successfully installed msgpack-1.0.0 msgpack-numpy-0.4.5 pyzmq-19.0.0 tabulate-0.8.7 tensorpack tqdm-4.45.0
Removing intermediate container 75da6b9a62a8
 ---> 0887ca90ca0b
Step 24/25 : COPY resources/train.py /opt/ml/code/train.py
 ---> 646685b4366f
Step 25/25 : ENV SAGEMAKER_PROGRAM train.py
 ---> Running in 654676defdd4
Removing intermediate container 654676defdd4
 ---> 7c504364a538
Successfully built 7c504364a538
Succes

Set ```aws_samples_image``` below to the Amazon ECR URI of the image you pushed above.

In [17]:
aws_samples_image = "393782509758.dkr.ecr.ap-southeast-1.amazonaws.com/mask-rcnn-tensorflow-sagemaker:tf1.13-153442b"

## SageMaker Initialization 
We have staged the data, and we have built and pushed the training Docker image to Amazon ECR. Now we are ready to start using Amazon SageMaker.

In [28]:
%%time
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator

role = get_execution_role() # provide a pre-existing role ARN as an alternative to creating a new role
print(f'SageMaker Execution Role:{role}')

client = boto3.client('sts')
account = client.get_caller_identity()['Account']
print(f'AWS account:{account}')

session = boto3.session.Session()
region = session.region_name
print(f'AWS region:{region}')

SageMaker Execution Role:arn:aws:iam::393782509758:role/service-role/AmazonSageMaker-ExecutionRole-20191115T140457
AWS account:393782509758
AWS region:ap-southeast-1
CPU times: user 108 ms, sys: 2.12 ms, total: 110 ms
Wall time: 1.08 s


Next, we set ```training_image``` to the Amazon ECR image URI you saved in a previous step. 

In [29]:
training_image = aws_samples_image  # set to tensorpack_image or aws_samples_image
print(f'Training image: {training_image}')

Training image: 393782509758.dkr.ecr.ap-southeast-1.amazonaws.com/mask-rcnn-tensorflow-sagemaker:tf1.13-153442b
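
The image URI printed above follows the usual Amazon ECR layout of account, region, repository and tag. As a minimal sketch, it could be assembled from those parts rather than pasted by hand. The helper name `ecr_image_uri` is hypothetical, and the repository and tag values are simply copied from the URI shown in this notebook.

```python
def ecr_image_uri(account: str, region: str, repository: str, tag: str) -> str:
    """Build an Amazon ECR image URI from its parts (hypothetical helper)."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Values copied from the notebook output above.
example_uri = ecr_image_uri(
    account="393782509758",
    region="ap-southeast-1",
    repository="mask-rcnn-tensorflow-sagemaker",
    tag="tf1.13-153442b",
)
print(example_uri)
```

Building the URI this way keeps the account and region in one place, which may help when the notebook is moved to another account or region.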


## Define SageMaker Data Channels
In this step, we define the SageMaker *train* data channel.

In [30]:
prefix = "mask-rcnn/sagemaker"  # prefix in your S3 bucket

# S3 location of the training dataset
s3train = f's3://{s3_bucket}/{prefix}/input/invoice_dataset_400'

# Replicate the full dataset to every training instance
train = sagemaker.session.s3_input(s3train, distribution='FullyReplicated',
                                   content_type='application/tfrecord', s3_data_type='S3Prefix')


data_channels = {'train': train}
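
With `File` input mode, the contents of a channel's S3 prefix are copied into the training container under `/opt/ml/input/data/<channel name>`; the training log later in this notebook shows files being read from `/opt/ml/input/data/train`. The following is a small sketch of that mapping. The helper `channel_paths` is hypothetical and only builds strings; it does not call any AWS API.

```python
import posixpath


def channel_paths(bucket: str, prefix: str, dataset: str, channel: str = "train"):
    """Return the S3 URI of a channel and its directory inside the container.

    Hypothetical helper for illustration only.
    """
    s3_uri = f"s3://{bucket}/{prefix}/input/{dataset}"
    local_dir = posixpath.join("/opt/ml/input/data", channel)
    return s3_uri, local_dir


s3_uri, local_dir = channel_paths(
    "smart-invoice", "mask-rcnn/sagemaker", "invoice_dataset_400"
)
print(s3_uri)
print(local_dir)
```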

Next, we define the model output location in the S3 bucket.

In [31]:
s3_output_location = f's3://{s3_bucket}/{prefix}/output'

## Configure Hyper-parameters
Next, we define the hyper-parameters.

Note that some hyper-parameters differ between the two implementations. The batch size per GPU in TensorPack Faster-RCNN/Mask-RCNN is fixed at 1, but is configurable in AWS Samples Mask-RCNN. The learning rate schedule is specified in units of steps in TensorPack Faster-RCNN/Mask-RCNN, but in epochs in AWS Samples Mask-RCNN.

The default learning rate schedule values shown below correspond to training for a total of 24 epochs, at 120,000 images per epoch.

<table align='left'>
    <caption>TensorPack Faster-RCNN/Mask-RCNN  Hyper-parameters</caption>
    <tr>
    <th style="text-align:center">Hyper-parameter</th>
    <th style="text-align:center">Description</th>
    <th style="text-align:center">Default</th>
    </tr>
    <tr>
        <td style="text-align:center">mode_fpn</td>
        <td style="text-align:left">Flag to indicate use of Feature Pyramid Network (FPN) in the Mask R-CNN model backbone</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">mode_mask</td>
        <td style="text-align:left">A value of "False" means Faster-RCNN model, "True" means Mask R-CNN model</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">eval_period</td>
        <td style="text-align:left">Evaluation period during training, in epochs</td>
        <td style="text-align:center">1</td>
    </tr>
    <tr>
        <td style="text-align:center">lr_schedule</td>
        <td style="text-align:left">Learning rate schedule in training steps</td>
        <td style="text-align:center">'[240000, 320000, 360000]'</td>
    </tr>
    <tr>
        <td style="text-align:center">batch_norm</td>
        <td style="text-align:left">Batch normalization option ('FreezeBN', 'SyncBN', 'GN', 'None') </td>
        <td style="text-align:center">'FreezeBN'</td>
    </tr>
    <tr>
        <td style="text-align:center">images_per_epoch</td>
        <td style="text-align:left">Images per epoch </td>
        <td style="text-align:center">120000</td>
    </tr>
    <tr>
        <td style="text-align:center">data_train</td>
        <td style="text-align:left">Training data under data directory</td>
        <td style="text-align:center">'coco_train2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">data_val</td>
        <td style="text-align:left">Validation data under data directory</td>
        <td style="text-align:center">'coco_val2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">resnet_arch</td>
        <td style="text-align:left">Must be 'resnet50' or 'resnet101'</td>
        <td style="text-align:center">'resnet50'</td>
    </tr>
    <tr>
        <td style="text-align:center">backbone_weights</td>
        <td style="text-align:left">Pre-trained model weights</td>
        <td style="text-align:center">'ImageNet-R50-AlignPadding.npz'</td>
    </tr>
</table>

    
<table align='left'>
    <caption>AWS Samples Mask-RCNN  Hyper-parameters</caption>
    <tr>
    <th style="text-align:center">Hyper-parameter</th>
    <th style="text-align:center">Description</th>
    <th style="text-align:center">Default</th>
    </tr>
    <tr>
        <td style="text-align:center">mode_fpn</td>
        <td style="text-align:left">Flag to indicate use of Feature Pyramid Network (FPN) in the Mask R-CNN model backbone</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">mode_mask</td>
        <td style="text-align:left">A value of "False" means Faster-RCNN model, "True" means Mask R-CNN model</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">eval_period</td>
        <td style="text-align:left">Evaluation period during training, in epochs</td>
        <td style="text-align:center">1</td>
    </tr>
    <tr>
        <td style="text-align:center">lr_epoch_schedule</td>
        <td style="text-align:left">Learning rate schedule in epochs</td>
        <td style="text-align:center">'[(16, 0.1), (20, 0.01), (24, None)]'</td>
    </tr>
    <tr>
        <td style="text-align:center">batch_size_per_gpu</td>
        <td style="text-align:left">Batch size per GPU (minimum 1, maximum 4)</td>
        <td style="text-align:center">4</td>
    </tr>
    <tr>
        <td style="text-align:center">batch_norm</td>
        <td style="text-align:left">Batch normalization option ('FreezeBN', 'SyncBN', 'GN', 'None') </td>
        <td style="text-align:center">'FreezeBN'</td>
    </tr>
    <tr>
        <td style="text-align:center">images_per_epoch</td>
        <td style="text-align:left">Images per epoch </td>
        <td style="text-align:center">120000</td>
    </tr>
    <tr>
        <td style="text-align:center">data_train</td>
        <td style="text-align:left">Training data under data directory</td>
        <td style="text-align:center">'train2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">data_val</td>
        <td style="text-align:left">Validation data under data directory</td>
        <td style="text-align:center">'val2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">resnet_arch</td>
        <td style="text-align:left">Must be 'resnet50' or 'resnet101'</td>
        <td style="text-align:center">'resnet50'</td>
    </tr>
    <tr>
        <td style="text-align:center">backbone_weights</td>
        <td style="text-align:left">Pre-trained model weights</td>
        <td style="text-align:center">'ImageNet-R50-AlignPadding.npz'</td>
    </tr>
</table>
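
The two default schedules in the tables above use different units, so it can help to put them on the same scale. The sketch below assumes that the last TensorPack step boundary (360,000 steps) marks the same end point as the last epoch boundary (24 epochs); the number of images per step that follows from that assumption is derived here and is not stated in the tables.

```python
IMAGES_PER_EPOCH = 120_000                   # default from the tables above
STEP_SCHEDULE = [240_000, 320_000, 360_000]  # TensorPack lr_schedule default
EPOCH_SCHEDULE = [16, 20, 24]                # epoch boundaries in lr_epoch_schedule

total_images = EPOCH_SCHEDULE[-1] * IMAGES_PER_EPOCH

# Images consumed per step if both schedules end at the same point (assumption).
implied_images_per_step = total_images / STEP_SCHEDULE[-1]

# Step boundaries re-expressed in epochs under that assumption.
step_boundaries_in_epochs = [
    steps * implied_images_per_step / IMAGES_PER_EPOCH for steps in STEP_SCHEDULE
]

print(total_images)
print(implied_images_per_step)
print(step_boundaries_in_epochs)
```

Under this assumption the first and last boundaries line up with epochs 16 and 24, while the middle step boundary falls slightly after epoch 21 rather than at epoch 20, so the two defaults are close but not identical schedules.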

In [32]:
hyperparameters = {
                    "mode_fpn": "True",
                    "mode_mask": "True",
                    "eval_period": 10,
                    "batch_norm": "FreezeBN",
                    "batch_size_per_gpu": 1,
                    "images_per_epoch": 100,
                    "lr_epoch_schedule": '[(160, 0.1), (200, 0.01), (240, None)]'
                  }
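
To see what these values imply for the length of the run, the schedule string can be parsed and combined with the batch size. This is a rough sketch: the use of `ast.literal_eval` is an illustrative choice and not necessarily how the training script parses the value, and `num_gpus = 1` is an assumption that matches a single-GPU instance.

```python
import ast

example_hyperparameters = {
    "batch_size_per_gpu": 1,
    "images_per_epoch": 100,
    "lr_epoch_schedule": "[(160, 0.1), (200, 0.01), (240, None)]",
}
num_gpus = 1  # assumption: one GPU in total

# Parse the schedule string into a list of (epoch, multiplier) tuples.
schedule = ast.literal_eval(example_hyperparameters["lr_epoch_schedule"])
total_epochs = schedule[-1][0]

images_per_step = example_hyperparameters["batch_size_per_gpu"] * num_gpus
steps_per_epoch = example_hyperparameters["images_per_epoch"] // images_per_step
total_steps = total_epochs * steps_per_epoch

print(total_epochs, steps_per_epoch, total_steps)
```

The training log further down reports `Epoch 3 (global_step 300)`, which is consistent with 100 steps per epoch.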

## Define Training Metrics
Next, we define the regular expressions that SageMaker uses to extract algorithm metrics from training logs and send them to [Amazon CloudWatch metrics](https://docs.aws.amazon.com/en_pv/AmazonCloudWatch/latest/monitoring/working_with_metrics.html). These algorithm metrics are visualized in the SageMaker console.

In [33]:
metric_definitions=[
             {
                "Name": "fastrcnn_losses/box_loss",
                "Regex": ".*fastrcnn_losses/box_loss:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_loss",
                "Regex": ".*fastrcnn_losses/label_loss:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_metrics/accuracy",
                "Regex": ".*fastrcnn_losses/label_metrics/accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_metrics/false_negative",
                "Regex": ".*fastrcnn_losses/label_metrics/false_negative:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_metrics/fg_accuracy",
                "Regex": ".*fastrcnn_losses/label_metrics/fg_accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/num_fg_label",
                "Regex": ".*fastrcnn_losses/num_fg_label:\\s*(\\S+).*"
            },
             {
                "Name": "maskrcnn_loss/accuracy",
                "Regex": ".*maskrcnn_loss/accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "maskrcnn_loss/fg_pixel_ratio",
                "Regex": ".*maskrcnn_loss/fg_pixel_ratio:\\s*(\\S+).*"
            },
            {
                "Name": "maskrcnn_loss/maskrcnn_loss",
                "Regex": ".*maskrcnn_loss/maskrcnn_loss:\\s*(\\S+).*"
            },
            {
                "Name": "maskrcnn_loss/pos_accuracy",
                "Regex": ".*maskrcnn_loss/pos_accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/IoU=0.5",
                "Regex": ".*mAP\\(bbox\\)/IoU=0\\.5:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/IoU=0.5:0.95",
                "Regex": ".*mAP\\(bbox\\)/IoU=0\\.5:0\\.95:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/IoU=0.75",
                "Regex": ".*mAP\\(bbox\\)/IoU=0\\.75:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/large",
                "Regex": ".*mAP\\(bbox\\)/large:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/medium",
                "Regex": ".*mAP\\(bbox\\)/medium:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/small",
                "Regex": ".*mAP\\(bbox\\)/small:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(segm)/IoU=0.5",
                "Regex": ".*mAP\\(segm\\)/IoU=0\\.5:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(segm)/IoU=0.5:0.95",
                "Regex": ".*mAP\\(segm\\)/IoU=0\\.5:0\\.95:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(segm)/IoU=0.75",
                "Regex": ".*mAP\\(segm\\)/IoU=0\\.75:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(segm)/large",
                "Regex": ".*mAP\\(segm\\)/large:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(segm)/medium",
                "Regex": ".*mAP\\(segm\\)/medium:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(segm)/small",
                "Regex": ".*mAP\\(segm\\)/small:\\s*(\\S+).*"
            }  
            
    ]
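
Before launching a job, it can be useful to try a few of these patterns locally with Python's `re` module. The sketch below copies two patterns from the list above and applies them to sample lines. The sample lines and their values are hypothetical, shaped like `name: value` log output, and how SageMaker itself applies the patterns is not covered here.

```python
import re

# Two patterns copied from metric_definitions; raw strings replace the
# doubled backslashes.
box_loss_re = re.compile(r".*fastrcnn_losses/box_loss:\s*(\S+).*")
map_bbox_50_re = re.compile(r".*mAP\(bbox\)/IoU=0\.5:\s*(\S+).*")

# Hypothetical log lines for illustration.
sample_lines = [
    "[1,0]<stdout>:[0405 15:22:20 @monitor.py:469] fastrcnn_losses/box_loss: 0.0421",
    "[1,0]<stdout>:[0405 15:22:20 @monitor.py:469] mAP(bbox)/IoU=0.5: 0.512",
    "[1,0]<stdout>:[0405 15:22:20 @monitor.py:469] mAP(bbox)/IoU=0.5:0.95: 0.287",
]

box_loss = box_loss_re.match(sample_lines[0]).group(1)
map_50 = map_bbox_50_re.match(sample_lines[1]).group(1)

# The IoU=0.5 pattern also matches the IoU=0.5:0.95 line, because \s* may
# match nothing; the captured group is then "0.95:" rather than a number.
overlap = map_bbox_50_re.match(sample_lines[2]).group(1)

print(box_loss, map_50, overlap)
```

If the log always has whitespace after the colon, using `\s+` instead of `\s*` in the `IoU=0.5` patterns would be one way to avoid this overlap.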

## Define SageMaker Training Job

Next, we use the SageMaker [Estimator](https://sagemaker.readthedocs.io/en/stable/estimators.html) API to define a SageMaker Training Job.

We recommend using 32 GPUs, so we set ```train_instance_count=4``` and ```train_instance_type='ml.p3.16xlarge'```, because there are 8 Tesla V100 GPUs per ```ml.p3.16xlarge``` instance. We recommend using a 100 GB [Amazon EBS](https://aws.amazon.com/ebs/) storage volume with each training instance, so we set ```train_volume_size = 100```. We want to replicate training data to each training instance, so we set ```input_mode= 'File'```. Note that the cell below, as run here, uses a single ```ml.p3.2xlarge``` instance with ```train_use_spot_instances=True``` rather than the recommended configuration.

We run the training job in your private VPC, so we need to set the ```subnets``` and ```security_group_ids``` prior to running the cell below. You may specify multiple subnet ids in the ```subnets``` list. The subnets included in the ```subnets``` list must be part of the output of the ```./stack-sm.sh``` CloudFormation stack script used to create this notebook instance. Specify only one security group id in the ```security_group_ids``` list. The security group id must be part of the output of the ```./stack-sm.sh``` script.

For ```train_instance_type``` below, you have the option to use ```ml.p3.16xlarge``` with 16 GB per-GPU memory and 25 Gbps network interconnectivity, or ```ml.p3dn.24xlarge``` with 32 GB per-GPU memory and 100 Gbps network interconnectivity. The ```ml.p3dn.24xlarge``` instance type offers significantly better performance than ```ml.p3.16xlarge``` for Mask R-CNN distributed TensorFlow training.
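
The GPU arithmetic behind these recommendations can be written down as a short sketch. The helper `total_gpus` is hypothetical; the per-instance GPU counts are those of the instance types named in this notebook.

```python
# GPUs per instance for the instance types mentioned in this notebook.
GPUS_PER_INSTANCE = {
    "ml.p3.2xlarge": 1,
    "ml.p3.16xlarge": 8,
    "ml.p3dn.24xlarge": 8,
}


def total_gpus(instance_type: str, instance_count: int) -> int:
    """Return the number of GPUs across all training instances."""
    return GPUS_PER_INSTANCE[instance_type] * instance_count


print(total_gpus("ml.p3.16xlarge", 4))  # recommended configuration
print(total_gpus("ml.p3.2xlarge", 1))   # single-instance configuration
```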

In [34]:
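# To run in your VPC, uncomment and set the two lists below, then pass them
# to the Estimator as subnets=subnets and security_group_ids=security_group_ids.
# security_group_ids = ['sg-xxxxxxxx']
# subnets = ['subnet-xxxxxxx']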
sagemaker_session = sagemaker.session.Session(boto_session=session)
mask_rcnn_estimator = Estimator(training_image,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.p3.2xlarge',
                                         #train_instance_type='local_gpu',
                                         train_volume_size = 100,
                                         train_max_run = 10000,
                                         train_max_wait = 10000,
                                         input_mode= 'File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sagemaker_session, 
                                         hyperparameters = hyperparameters,
                                         metric_definitions = metric_definitions,
                                         train_use_spot_instances=True,
                                         base_job_name="mask-rcnn-s3")
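
Because the estimator above uses managed spot training, the maximum wait time has to be at least as long as the maximum run time. A small pre-flight check along these lines may catch a mismatch before the job is submitted. The function `check_spot_settings` is a hypothetical helper, not part of the SageMaker SDK.

```python
from typing import Optional


def check_spot_settings(use_spot: bool, max_run: int, max_wait: Optional[int] = None) -> None:
    """Raise ValueError if the spot settings look inconsistent (illustrative check)."""
    if not use_spot:
        return
    if max_wait is None:
        raise ValueError("max_wait should be set when spot instances are used")
    if max_wait < max_run:
        raise ValueError("max_wait should be at least max_run")


# Values from the estimator definition above, in seconds.
check_spot_settings(use_spot=True, max_run=10000, max_wait=10000)
print("spot settings look consistent")
```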

Finally, we launch the SageMaker training job. 

The estimated time for downloading data to all the training instances is 20 minutes. The time to complete the training depends on type and number of training instances, and the training image used for training.

In [35]:
mask_rcnn_estimator.fit(inputs=data_channels, logs="All")

2020-04-05 15:14:52 Starting - Starting the training job......
2020-04-05 15:15:23 Starting - Launching requested ML instances......
2020-04-05 15:16:24 Starting - Preparing the instances for training.........
2020-04-05 15:18:09 Downloading - Downloading input data
2020-04-05 15:18:09 Training - Downloading the training image............
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Limited tf.compat.v2.summary API due to missing TensorBoard installation
2020-04-05 15:20:18,404 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2020-04-05 15:20:18,405 sagemaker-containers INFO     Failed to parse hyperparameter lr_epoch_schedule value [(160, 0

[1,0]<stdout>:[0405 15:20:29 @registry.py:135] group3/block1/conv2 output: [None, 512, None, None]
[1,0]<stdout>:[0405 15:20:29 @registry.py:127] group3/block1/conv3 input: [None, 512, None, None]
[1,0]<stdout>:[0405 15:20:29 @batch_norm.py:166] WRN [BatchNorm] Using moving_mean/moving_variance in training.
[1,0]<stdout>:[0405 15:20:29 @registry.py:135] group3/block1/conv3 output: [None, 2048, None, None]
[1,0]<stdout>:[0405 15:20:29 @registry.py:127] group3/block2/conv1 input: [None, 2048, None, None]
[1,0]<stdout>:[0405 15:20:29 @batch_norm.py:166] WRN [BatchNorm] Using moving_mean/moving_variance in training.
[1,0]<stdout>:[0405 15:20:29 @registry.py:135] group3/block2/conv1 output: [None, 512, None, None]
[1,0]<stdout>:[0405 15:20:29 @registry.py:127] gr

[1,0]<stdout>:[0405 15:20:45 @collection.py:153] Size of these collections were changed in tower-pred-0: (tf.GraphKeys.MODEL_VARIABLES: 183->238)
[1,0]<stdout>:loading annotations into memory...
[1,0]<stdout>:Done (t=0.00s)
[1,0]<stdout>:creating index...
[1,0]<stdout>:index created!
[1,0]<stdout>:[0405 15:20:45 @dataset.py:50] Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.
[1,0]<stdout>:[0405 15:20:45 @timer.py:50] Load Groundtruth Boxes for val2017 finished, time:0.0096sec.
[1,0]<stdout>:[0405 15:20:45 @summary.py:48] [MovingAverageSummary] 27 operations in collection 'MOVING_SUMMARY_OPS' will be run with session hooks.
[1,0]<stdout>:[0405 15:20:45 @summary.py:95] Summarizing collection 'summaries' of size 30.
[1,0]<stdout>:[0405 15:20:48 @base.py:231] Creating the session ..

[1,0]<stdout>:[0405 15:21:03 @varmanip.py:104] WRN Variable group0/block0/conv2/W has dtype <dtype: 'float16'> but was given a value of dtype float32. Load it after downcasting!
[1,0]<stdout>:[0405 15:21:03 @varmanip.py:104] WRN Variable conv0/W has dtype <dtype: 'float16'> but was given a value of dtype float32. Load it after downcasting!
[1,0]<stdout>:[0405 15:21:04 @varmanip.py:104] WRN Variable group0/block0/conv1/W has dtype <dtype: 'float16'> but was given a value of dtype float32. Load it after downcasting!
[1,0]<stdout>:[0405 15:21:05 @varmanip.py:104] WRN Variable group0/block2/conv1/W has dtype <dtype: 'float16'> but was given a value of dtype float32. Load it after downcasting!
[1,0]<stdout>:[0405 15:21:08 @varmanip.py:104] WRN Variable group0/block0/co

[1,0]<stdout>:[0405 15:22:20 @base.py:286] Epoch 3 (global_step 300) finished, time:9.65 seconds.
[1,0]<stdout>:[0405 15:22:20 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:22:20 @misc.py:111] Estimated Time Left: 38 minutes 55 seconds
[1,0]<stdout>:[0405 15:22:20 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.37
[1,0]<stdout>:[0405 15:22:20 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.65
[1,0]<stdout>:[0405 15:22:20 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:22:20 @monitor.py:469] PeakMemory(MB)/gpu:0: 4270.1
[1,0]<stdout>:[0405 15:22:20 @monitor.py:469] QueueInput/queue_size: 50
[1,0]<stdout>:[0405 15:22:20 @monitor.p

[1,0]<stdout>:[0405 15:22:39 @base.py:286] Epoch 5 (global_step 500) finished, time:9.71 seconds.
[1,0]<stdout>:[0405 15:22:39 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:22:40 @misc.py:111] Estimated Time Left: 38 minutes 28 seconds
[1,0]<stdout>:[0405 15:22:40 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.30
[1,0]<stdout>:[0405 15:22:40 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.71
[1,0]<stdout>:[0405 15:22:40 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:22:40 @monitor.py:469] PeakMemory(MB)/gpu:0: 4270.1
[1,0]<stdout>:[0405 15:22:40 @monitor.py:469] QueueInput/queue_size: 50
[1,0]<stdout>:[0405 15:22:40 @monitor.p

[1,0]<stdout>:[0405 15:22:59 @base.py:286] Epoch 7 (global_step 700) finished, time:9.67 seconds.
[1,0]<stdout>:[0405 15:22:59 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:22:59 @misc.py:111] Estimated Time Left: 37 minutes 57 seconds
[1,0]<stdout>:[0405 15:22:59 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.34
[1,0]<stdout>:[0405 15:22:59 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.67
[1,0]<stdout>:[0405 15:22:59 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:22:59 @monitor.py:469] PeakMemory(MB)/gpu:0: 4270.1
[1,0]<stdout>:[0405 15:22:59 @monitor.py:469] QueueInput/queue_size: 50
[1,0]<stdout>:[0405 15:22:59 @monitor.p

[1,0]<stdout>:[0405 15:23:19 @base.py:286] Epoch 9 (global_step 900) finished, time:9.66 seconds.
[1,0]<stdout>:[0405 15:23:19 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:23:19 @misc.py:111] Estimated Time Left: 37 minutes 30 seconds
[1,0]<stdout>:[0405 15:23:19 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.35
[1,0]<stdout>:[0405 15:23:19 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.67
[34m[1,0]<stdout>:#033[32m[0405 15:23:19 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:19 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 4270.1[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:19 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:19 @monitor.p

[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @base.py:286]#033[0m Epoch 11 (global_step 1100) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @misc.py:111]#033[0m Estimated Time Left: 37 minutes 8 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:23:50 @monitor.

[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @base.py:286]#033[0m Epoch 13 (global_step 1300) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @misc.py:111]#033[0m Estimated Time Left: 36 minutes 42 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:08 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @base.py:286]#033[0m Epoch 15 (global_step 1500) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @misc.py:111]#033[0m Estimated Time Left: 34 minutes 17 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:26 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @base.py:286]#033[0m Epoch 17 (global_step 1700) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @misc.py:111]#033[0m Estimated Time Left: 33 minutes 55 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:24:45 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @base.py:286]#033[0m Epoch 19 (global_step 1900) finished, time:9.1 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @misc.py:111]#033[0m Estimated Time Left: 33 minutes 34 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:03 @monitor.

[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @base.py:286]#033[0m Epoch 21 (global_step 2100) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @misc.py:111]#033[0m Estimated Time Left: 33 minutes 27 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:29 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @base.py:286]#033[0m Epoch 23 (global_step 2300) finished, time:9.17 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @misc.py:111]#033[0m Estimated Time Left: 33 minutes 24 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.90[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.17[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:25:48 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @base.py:286]#033[0m Epoch 25 (global_step 2500) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @misc.py:111]#033[0m Estimated Time Left: 32 minutes 49 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:06 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @base.py:286]#033[0m Epoch 27 (global_step 2700) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @misc.py:111]#033[0m Estimated Time Left: 32 minutes 27 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:24 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:26:42 @base.py:286]#033[0m Epoch 29 (global_step 2900) finished, time:9.1 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:42 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:43 @misc.py:111]#033[0m Estimated Time Left: 32 minutes 12 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:43 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:43 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:43 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:43 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:43 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:26:43 @monitor.

[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @base.py:286]#033[0m Epoch 32 (global_step 3200) finished, time:9.11 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @misc.py:111]#033[0m Estimated Time Left: 31 minutes 47 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.97[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.11[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:16 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @base.py:286]#033[0m Epoch 34 (global_step 3400) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @misc.py:111]#033[0m Estimated Time Left: 31 minutes 24 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:34 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @base.py:286]#033[0m Epoch 36 (global_step 3600) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @misc.py:111]#033[0m Estimated Time Left: 31 minutes 4 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:27:52 @monitor.

[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @base.py:286]#033[0m Epoch 38 (global_step 3800) finished, time:9.1 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @misc.py:111]#033[0m Estimated Time Left: 30 minutes 49 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:11 @monitor.

[34m[1,0]<stdout>:#033[32m[0405 15:28:29 @base.py:286]#033[0m Epoch 40 (global_step 4000) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:29 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:29 @saver.py:81]#033[0m Model saved to /opt/ml/model/model-4000.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:29 @misc.py:111]#033[0m Estimated Time Left: 30 minutes 31 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:29 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:36 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m

[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @base.py:286]#033[0m Epoch 41 (global_step 4100) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @misc.py:111]#033[0m Estimated Time Left: 30 minutes 24 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:28:45 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @base.py:286]#033[0m Epoch 43 (global_step 4300) finished, time:9.16 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @misc.py:111]#033[0m Estimated Time Left: 30 minutes 18 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.91[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.16[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:04 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @base.py:286]#033[0m Epoch 45 (global_step 4500) finished, time:9.04 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @misc.py:111]#033[0m Estimated Time Left: 29 minutes 50 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:22 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @base.py:286]#033[0m Epoch 47 (global_step 4700) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @misc.py:111]#033[0m Estimated Time Left: 29 minutes 24 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:40 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @base.py:286]#033[0m Epoch 49 (global_step 4900) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @misc.py:111]#033[0m Estimated Time Left: 29 minutes 3 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:29:59 @monitor.

[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @base.py:286]#033[0m Epoch 51 (global_step 5100) finished, time:9.12 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @misc.py:111]#033[0m Estimated Time Left: 28 minutes 45 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.97[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.12[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:23 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @base.py:286]#033[0m Epoch 53 (global_step 5300) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @misc.py:111]#033[0m Estimated Time Left: 28 minutes 27 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:30:41 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @base.py:286]#033[0m Epoch 56 (global_step 5600) finished, time:9.04 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @misc.py:111]#033[0m Estimated Time Left: 27 minutes 57 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:09 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @base.py:286]#033[0m Epoch 58 (global_step 5800) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @misc.py:111]#033[0m Estimated Time Left: 27 minutes 40 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:27 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:31:45 @base.py:286]#033[0m Epoch 60 (global_step 6000) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:45 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:46 @saver.py:81]#033[0m Model saved to /opt/ml/model/model-6000.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:46 @misc.py:111]#033[0m Estimated Time Left: 27 minutes 21 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:46 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:31:51 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m

[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @base.py:286]#033[0m Epoch 62 (global_step 6200) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @misc.py:111]#033[0m Estimated Time Left: 27 minutes 6 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:10 @monitor.

[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @base.py:286]#033[0m Epoch 64 (global_step 6400) finished, time:9.11 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @misc.py:111]#033[0m Estimated Time Left: 26 minutes 59 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.97[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.11[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:28 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @base.py:286]#033[0m Epoch 66 (global_step 6600) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @misc.py:111]#033[0m Estimated Time Left: 26 minutes 36 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:32:46 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @base.py:286]#033[0m Epoch 68 (global_step 6800) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @misc.py:111]#033[0m Estimated Time Left: 26 minutes 15 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:05 @monitor

[34m[1,0]<stdout>:#033[32m[0405 15:33:23 @base.py:286]#033[0m Epoch 70 (global_step 7000) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:23 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:23 @misc.py:111]#033[0m Estimated Time Left: 25 minutes 53 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:23 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:28 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:Running per image evaluation...[0m

[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @base.py:286]#033[0m Epoch 71 (global_step 7100) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @misc.py:111]#033[0m Estimated Time Left: 25 minutes 46 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:38 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @base.py:286]#033[0m Epoch 73 (global_step 7300) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @misc.py:111]#033[0m Estimated Time Left: 25 minutes 27 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:33:56 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @base.py:286]#033[0m Epoch 75 (global_step 7500) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @misc.py:111]#033[0m Estimated Time Left: 25 minutes 6 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:14 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @base.py:286]#033[0m Epoch 78 (global_step 7800) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @misc.py:111]#033[0m Estimated Time Left: 24 minutes 38 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:34:42 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:35:00 @base.py:286]#033[0m Epoch 80 (global_step 8000) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:00 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:00 @saver.py:81]#033[0m Model saved to /opt/ml/model/model-8000.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:00 @misc.py:111]#033[0m Estimated Time Left: 24 minutes 20 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:00 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:06 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m

[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @base.py:286]#033[0m Epoch 81 (global_step 8100) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @misc.py:111]#033[0m Estimated Time Left: 24 minutes 12 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:15 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @base.py:286]#033[0m Epoch 84 (global_step 8400) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @misc.py:111]#033[0m Estimated Time Left: 23 minutes 57 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:35:43 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @base.py:286]#033[0m Epoch 86 (global_step 8600) finished, time:9.04 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @misc.py:111]#033[0m Estimated Time Left: 23 minutes 26 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:01 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @base.py:286]#033[0m Epoch 88 (global_step 8800) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @misc.py:111]#033[0m Estimated Time Left: 23 minutes 7 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:19 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:36:37 @base.py:286]#033[0m Epoch 90 (global_step 9000) finished, time:9.03 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:37 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:37 @misc.py:111]#033[0m Estimated Time Left: 22 minutes 47 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:37 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:43 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:Running per image evaluation...[0m

[34m[1,0]<stdout>:#033[32m[0405 15:36:52 @base.py:286]#033[0m Epoch 91 (global_step 9100) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:52 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:53 @misc.py:111]#033[0m Estimated Time Left: 22 minutes 40 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:53 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:53 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:53 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:53 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:36:53 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @base.py:286]#033[0m Epoch 93 (global_step 9300) finished, time:9.1 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @misc.py:111]#033[0m Estimated Time Left: 22 minutes 20 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:11 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @base.py:286]#033[0m Epoch 95 (global_step 9500) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @misc.py:111]#033[0m Estimated Time Left: 22 minutes 8 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:29 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @base.py:286]#033[0m Epoch 97 (global_step 9700) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @misc.py:111]#033[0m Estimated Time Left: 21 minutes 49 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:37:47 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @base.py:286]#033[0m Epoch 99 (global_step 9900) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @misc.py:111]#033[0m Estimated Time Left: 21 minutes 29 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:06 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @base.py:286]#033[0m Epoch 101 (global_step 10100) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @misc.py:111]#033[0m Estimated Time Left: 21 minutes 11 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:30 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @base.py:286]#033[0m Epoch 103 (global_step 10300) finished, time:9.14 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @misc.py:111]#033[0m Estimated Time Left: 21 minutes 3 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.94[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.14[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:38:48 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @base.py:286]#033[0m Epoch 105 (global_step 10500) finished, time:9.11 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @misc.py:111]#033[0m Estimated Time Left: 20 minutes 40 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.97[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.11[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:06 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @base.py:286]#033[0m Epoch 107 (global_step 10700) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @misc.py:111]#033[0m Estimated Time Left: 20 minutes 19 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:25 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @base.py:286]#033[0m Epoch 109 (global_step 10900) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @misc.py:111]#033[0m Estimated Time Left: 19 minutes 59 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:39:43 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @base.py:286]#033[0m Epoch 111 (global_step 11100) finished, time:9.02 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @misc.py:111]#033[0m Estimated Time Left: 19 minutes 40 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:07 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @base.py:286]#033[0m Epoch 113 (global_step 11300) finished, time:9.02 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @misc.py:111]#033[0m Estimated Time Left: 19 minutes 22 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:25 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @base.py:286]#033[0m Epoch 115 (global_step 11500) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @misc.py:111]#033[0m Estimated Time Left: 19 minutes 4 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:40:43 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:41:01 @base.py:286]#033[0m Epoch 117 (global_step 11700) finished, time:9.04 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:01 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:01 @misc.py:111]#033[0m Estimated Time Left: 18 minutes 44 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:01 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:01 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:01 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:02 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:02 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:41:29 @base.py:286]#033[0m Epoch 120 (global_step 12000) finished, time:9.03 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:29 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:29 @saver.py:81]#033[0m Model saved to /opt/ml/model/model-12000.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:29 @misc.py:111]#033[0m Estimated Time Left: 18 minutes 17 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:29 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:35 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[

[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @base.py:286]#033[0m Epoch 121 (global_step 12100) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @misc.py:111]#033[0m Estimated Time Left: 18 minutes 11 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:41:44 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @base.py:286]#033[0m Epoch 123 (global_step 12300) finished, time:9.16 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @misc.py:111]#033[0m Estimated Time Left: 18 minutes 1 second[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.92[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.16[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:02 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @base.py:286]#033[0m Epoch 126 (global_step 12600) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @misc.py:111]#033[0m Estimated Time Left: 17 minutes 23 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:30 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @base.py:286]#033[0m Epoch 128 (global_step 12800) finished, time:9.1 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @misc.py:111]#033[0m Estimated Time Left: 17 minutes 5 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:42:48 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:43:06 @base.py:286]#033[0m Epoch 130 (global_step 13000) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:06 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:06 @misc.py:111]#033[0m Estimated Time Left: 16 minutes 48 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:06 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:12 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:Running per image evaluation...[0m

[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @base.py:286]#033[0m Epoch 131 (global_step 13100) finished, time:9.11 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @misc.py:111]#033[0m Estimated Time Left: 16 minutes 38 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.98[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.11[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:21 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @base.py:286]#033[0m Epoch 133 (global_step 13300) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @misc.py:111]#033[0m Estimated Time Left: 16 minutes 20 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:39 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:43:57 @base.py:286]#033[0m Epoch 135 (global_step 13500) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:57 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:58 @misc.py:111]#033[0m Estimated Time Left: 16 minutes 2 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:58 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:58 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:58 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:58 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:43:58 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @base.py:286]#033[0m Epoch 137 (global_step 13700) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @misc.py:111]#033[0m Estimated Time Left: 15 minutes 43 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:16 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @base.py:286]#033[0m Epoch 139 (global_step 13900) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @misc.py:111]#033[0m Estimated Time Left: 15 minutes 23 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:34 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @base.py:286]#033[0m Epoch 141 (global_step 14100) finished, time:9.04 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @misc.py:111]#033[0m Estimated Time Left: 15 minutes 6 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:44:58 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @base.py:286]#033[0m Epoch 143 (global_step 14300) finished, time:9.14 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @misc.py:111]#033[0m Estimated Time Left: 14 minutes 54 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.95[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.14[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:17 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @base.py:286]#033[0m Epoch 146 (global_step 14600) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @misc.py:111]#033[0m Estimated Time Left: 14 minutes 17 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:45:44 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @base.py:286]#033[0m Epoch 148 (global_step 14800) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @misc.py:111]#033[0m Estimated Time Left: 13 minutes 59 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:02 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:46:21 @base.py:286]#033[0m Epoch 150 (global_step 15000) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:21 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:21 @misc.py:111]#033[0m Estimated Time Left: 13 minutes 41 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:21 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:26 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.[0m
[34m[1,0]<stdout>:Loading and preparing results...[0m
[34m[1,0]<stdout>:DONE (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:Running per image evaluation...[0m

[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @base.py:286]#033[0m Epoch 151 (global_step 15100) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @misc.py:111]#033[0m Estimated Time Left: 13 minutes 35 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:35 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @base.py:286]#033[0m Epoch 153 (global_step 15300) finished, time:9.04 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @misc.py:111]#033[0m Estimated Time Left: 13 minutes 17 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:46:53 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @base.py:286]#033[0m Epoch 155 (global_step 15500) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @misc.py:111]#033[0m Estimated Time Left: 12 minutes 59 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:12 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @base.py:286]#033[0m Epoch 157 (global_step 15700) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @misc.py:111]#033[0m Estimated Time Left: 12 minutes 40 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:30 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:47:58 @base.py:286]#033[0m Epoch 160 (global_step 16000) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:58 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:58 @saver.py:81]#033[0m Model saved to /opt/ml/model/model-16000.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:58 @param.py:163]#033[0m [HyperParamSetter] At global_step=16000, learning_rate changes from 0.001250 to 0.000125[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:58 @misc.py:111]#033[0m Estimated Time Left: 12 minutes 12 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:47:58 @eval.py:414]#033[0m Running evaluation ...[0m
[34m[1,0]<stdout>:loading annotations into memory...[0m
[34m[1,0]<stdout>:Done (t=0.00s)[0m
[34m[1,0]<stdout>:creating index...[0m
[34m[1,0]<stdout>:index created![0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:03 @dataset.py:50]#033[0m Instances loaded from /opt/ml/input/data/train/annotations/instance

[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @base.py:286]#033[0m Epoch 161 (global_step 16100) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @misc.py:111]#033[0m Estimated Time Left: 12 minutes 4 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:12 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @base.py:286]#033[0m Epoch 163 (global_step 16300) finished, time:9.15 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @misc.py:111]#033[0m Estimated Time Left: 11 minutes 51 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.93[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.15[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:31 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @base.py:286]#033[0m Epoch 165 (global_step 16500) finished, time:9.08 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @misc.py:111]#033[0m Estimated Time Left: 11 minutes 27 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.08[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:48:49 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @base.py:286]#033[0m Epoch 167 (global_step 16700) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @misc.py:111]#033[0m Estimated Time Left: 11 minutes 9 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:07 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @base.py:286]#033[0m Epoch 169 (global_step 16900) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @misc.py:111]#033[0m Estimated Time Left: 10 minutes 50 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:26 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @base.py:286]#033[0m Epoch 171 (global_step 17100) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @misc.py:111]#033[0m Estimated Time Left: 10 minutes 31 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:49:49 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @base.py:286]#033[0m Epoch 173 (global_step 17300) finished, time:9.01 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @misc.py:111]#033[0m Estimated Time Left: 10 minutes 12 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.01[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:07 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @base.py:286]#033[0m Epoch 175 (global_step 17500) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @misc.py:111]#033[0m Estimated Time Left: 9 minutes 54 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:26 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @base.py:286]#033[0m Epoch 177 (global_step 17700) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @misc.py:111]#033[0m Estimated Time Left: 9 minutes 35 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:50:44 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @base.py:286]#033[0m Epoch 179 (global_step 17900) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @misc.py:111]#033[0m Estimated Time Left: 9 minutes 18 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:02 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @base.py:286]#033[0m Epoch 181 (global_step 18100) finished, time:9.06 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @misc.py:111]#033[0m Estimated Time Left: 9 minutes[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:26 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @base.py:286]#033[0m Epoch 183 (global_step 18300) finished, time:9.16 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @misc.py:111]#033[0m Estimated Time Left: 8 minutes 47 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.92[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.16[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:51:44 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @base.py:286]#033[0m Epoch 185 (global_step 18500) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @misc.py:111]#033[0m Estimated Time Left: 8 minutes 25 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:03 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @base.py:286]#033[0m Epoch 187 (global_step 18700) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @misc.py:111]#033[0m Estimated Time Left: 8 minutes 6 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:21 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @base.py:286]#033[0m Epoch 189 (global_step 18900) finished, time:9.07 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @misc.py:111]#033[0m Estimated Time Left: 7 minutes 47 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.02[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:52:39 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @base.py:286]#033[0m Epoch 191 (global_step 19100) finished, time:9.05 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @misc.py:111]#033[0m Estimated Time Left: 7 minutes 28 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:03 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @base.py:286]#033[0m Epoch 193 (global_step 19300) finished, time:9.09 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @misc.py:111]#033[0m Estimated Time Left: 7 minutes 10 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:21 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @base.py:286]#033[0m Epoch 195 (global_step 19500) finished, time:9.04 seconds.[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @trainers.py:419]#033[0m Running horovod broadcast ...[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @misc.py:111]#033[0m Estimated Time Left: 6 minutes 51 seconds[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @performance.py:179]#033[0m [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @performance.py:180]#033[0m [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @performance.py:181]#033[0m [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @monitor.py:469]#033[0m PeakMemory(MB)/gpu:0: 5050.4[0m
[34m[1,0]<stdout>:#033[32m[0405 15:53:40 @monitor.py:469]#033[0m QueueInput/queue_size: 50[0m

[1,0]<stdout>:[0405 15:53:58 @base.py:286] Epoch 197 (global_step 19700) finished, time:9.11 seconds.
[1,0]<stdout>:[0405 15:53:58 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:53:58 @misc.py:111] Estimated Time Left: 6 minutes 33 seconds
[1,0]<stdout>:[0405 15:53:58 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.97
[1,0]<stdout>:[0405 15:53:58 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.11
[1,0]<stdout>:[0405 15:53:58 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:53:58 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:53:58 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:54:16 @base.py:286] Epoch 199 (global_step 19900) finished, time:9.11 seconds.
[1,0]<stdout>:[0405 15:54:16 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:54:16 @misc.py:111] Estimated Time Left: 6 minutes 15 seconds
[1,0]<stdout>:[0405 15:54:16 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.97
[1,0]<stdout>:[0405 15:54:16 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.11
[1,0]<stdout>:[0405 15:54:16 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:54:16 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:54:16 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:54:40 @base.py:286] Epoch 201 (global_step 20100) finished, time:9.05 seconds.
[1,0]<stdout>:[0405 15:54:40 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:54:40 @misc.py:111] Estimated Time Left: 5 minutes 59 seconds
[1,0]<stdout>:[0405 15:54:40 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05
[1,0]<stdout>:[0405 15:54:40 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05
[1,0]<stdout>:[0405 15:54:40 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:54:40 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:54:40 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:55:08 @base.py:286] Epoch 204 (global_step 20400) finished, time:9.06 seconds.
[1,0]<stdout>:[0405 15:55:08 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:55:08 @misc.py:111] Estimated Time Left: 5 minutes 30 seconds
[1,0]<stdout>:[0405 15:55:08 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04
[1,0]<stdout>:[0405 15:55:08 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06
[1,0]<stdout>:[0405 15:55:08 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:55:08 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:55:08 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:55:26 @base.py:286] Epoch 206 (global_step 20600) finished, time:9.05 seconds.
[1,0]<stdout>:[0405 15:55:26 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:55:26 @misc.py:111] Estimated Time Left: 5 minutes 11 seconds
[1,0]<stdout>:[0405 15:55:26 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04
[1,0]<stdout>:[0405 15:55:26 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05
[1,0]<stdout>:[0405 15:55:26 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:55:26 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:55:26 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:55:44 @base.py:286] Epoch 208 (global_step 20800) finished, time:9.04 seconds.
[1,0]<stdout>:[0405 15:55:44 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:55:44 @misc.py:111] Estimated Time Left: 4 minutes 52 seconds
[1,0]<stdout>:[0405 15:55:44 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06
[1,0]<stdout>:[0405 15:55:44 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04
[1,0]<stdout>:[0405 15:55:44 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:55:44 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:55:44 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:56:03 @base.py:286] Epoch 210 (global_step 21000) finished, time:9.07 seconds.
[1,0]<stdout>:[0405 15:56:03 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:56:03 @misc.py:111] Estimated Time Left: 4 minutes 34 seconds
[1,0]<stdout>:[0405 15:56:03 @eval.py:414] Running evaluation ...
[1,0]<stdout>:loading annotations into memory...
[1,0]<stdout>:Done (t=0.00s)
[1,0]<stdout>:creating index...
[1,0]<stdout>:index created!
[1,0]<stdout>:[0405 15:56:08 @dataset.py:50] Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.
[1,0]<stdout>:Loading and preparing results...
[1,0]<stdout>:DONE (t=0.00s)
[1,0]<stdout>:creating index...
[1,0]<stdout>:index created!
[1,0]<stdout>:Running per image evaluation...

[1,0]<stdout>:[0405 15:56:17 @base.py:286] Epoch 211 (global_step 21100) finished, time:9.04 seconds.
[1,0]<stdout>:[0405 15:56:17 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:56:17 @misc.py:111] Estimated Time Left: 4 minutes 25 seconds
[1,0]<stdout>:[0405 15:56:17 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06
[1,0]<stdout>:[0405 15:56:17 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04
[1,0]<stdout>:[0405 15:56:17 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:56:17 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:56:17 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:56:35 @base.py:286] Epoch 213 (global_step 21300) finished, time:9.09 seconds.
[1,0]<stdout>:[0405 15:56:35 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:56:36 @misc.py:111] Estimated Time Left: 4 minutes 7 seconds
[1,0]<stdout>:[0405 15:56:36 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.00
[1,0]<stdout>:[0405 15:56:36 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.09
[1,0]<stdout>:[0405 15:56:36 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:56:36 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:56:36 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:56:54 @base.py:286] Epoch 215 (global_step 21500) finished, time:9.1 seconds.
[1,0]<stdout>:[0405 15:56:54 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:56:54 @misc.py:111] Estimated Time Left: 3 minutes 49 seconds
[1,0]<stdout>:[0405 15:56:54 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99
[1,0]<stdout>:[0405 15:56:54 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10
[1,0]<stdout>:[0405 15:56:54 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:56:54 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:56:54 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:57:12 @base.py:286] Epoch 217 (global_step 21700) finished, time:9.04 seconds.
[1,0]<stdout>:[0405 15:57:12 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:57:12 @misc.py:111] Estimated Time Left: 3 minutes 30 seconds
[1,0]<stdout>:[0405 15:57:12 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.06
[1,0]<stdout>:[0405 15:57:12 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.04
[1,0]<stdout>:[0405 15:57:12 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:57:12 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:57:12 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:57:30 @base.py:286] Epoch 219 (global_step 21900) finished, time:9.06 seconds.
[1,0]<stdout>:[0405 15:57:30 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:57:30 @misc.py:111] Estimated Time Left: 3 minutes 12 seconds
[1,0]<stdout>:[0405 15:57:30 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04
[1,0]<stdout>:[0405 15:57:30 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06
[1,0]<stdout>:[0405 15:57:30 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:57:30 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:57:30 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:57:54 @base.py:286] Epoch 221 (global_step 22100) finished, time:9.06 seconds.
[1,0]<stdout>:[0405 15:57:54 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:57:54 @misc.py:111] Estimated Time Left: 2 minutes 53 seconds
[1,0]<stdout>:[0405 15:57:54 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04
[1,0]<stdout>:[0405 15:57:54 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06
[1,0]<stdout>:[0405 15:57:54 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:57:54 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:57:54 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:58:13 @base.py:286] Epoch 223 (global_step 22300) finished, time:9.11 seconds.
[1,0]<stdout>:[0405 15:58:13 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:58:13 @misc.py:111] Estimated Time Left: 2 minutes 36 seconds
[1,0]<stdout>:[0405 15:58:13 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.98
[1,0]<stdout>:[0405 15:58:13 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.11
[1,0]<stdout>:[0405 15:58:13 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:58:13 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:58:13 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:58:40 @base.py:286] Epoch 226 (global_step 22600) finished, time:9.06 seconds.
[1,0]<stdout>:[0405 15:58:40 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:58:40 @misc.py:111] Estimated Time Left: 2 minutes 8 seconds
[1,0]<stdout>:[0405 15:58:40 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.04
[1,0]<stdout>:[0405 15:58:40 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06
[1,0]<stdout>:[0405 15:58:40 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:58:40 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:58:40 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:58:59 @base.py:286] Epoch 228 (global_step 22800) finished, time:9.07 seconds.
[1,0]<stdout>:[0405 15:58:59 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:58:59 @misc.py:111] Estimated Time Left: 1 minute 49 seconds
[1,0]<stdout>:[0405 15:58:59 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03
[1,0]<stdout>:[0405 15:58:59 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07
[1,0]<stdout>:[0405 15:58:59 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:58:59 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:58:59 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:59:17 @base.py:286] Epoch 230 (global_step 23000) finished, time:9.04 seconds.
[1,0]<stdout>:[0405 15:59:17 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:59:17 @misc.py:111] Estimated Time Left: 1 minute 31 seconds
[1,0]<stdout>:[0405 15:59:17 @eval.py:414] Running evaluation ...
[1,0]<stdout>:loading annotations into memory...
[1,0]<stdout>:Done (t=0.00s)
[1,0]<stdout>:creating index...
[1,0]<stdout>:index created!
[1,0]<stdout>:[0405 15:59:22 @dataset.py:50] Instances loaded from /opt/ml/input/data/train/annotations/instances_val2017.json.
[1,0]<stdout>:Loading and preparing results...
[1,0]<stdout>:DONE (t=0.00s)
[1,0]<stdout>:creating index...
[1,0]<stdout>:index created!
[1,0]<stdout>:Running per image evaluation...

[1,0]<stdout>:[0405 15:59:31 @base.py:286] Epoch 231 (global_step 23100) finished, time:9.06 seconds.
[1,0]<stdout>:[0405 15:59:31 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:59:31 @misc.py:111] Estimated Time Left: 1 minute 22 seconds
[1,0]<stdout>:[0405 15:59:31 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03
[1,0]<stdout>:[0405 15:59:31 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.06
[1,0]<stdout>:[0405 15:59:31 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:59:31 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:59:31 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 15:59:50 @base.py:286] Epoch 233 (global_step 23300) finished, time:9.1 seconds.
[1,0]<stdout>:[0405 15:59:50 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 15:59:50 @misc.py:111] Estimated Time Left: 1 minute 4 seconds
[1,0]<stdout>:[0405 15:59:50 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99
[1,0]<stdout>:[0405 15:59:50 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10
[1,0]<stdout>:[0405 15:59:50 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 15:59:50 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 15:59:50 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 16:00:08 @base.py:286] Epoch 235 (global_step 23500) finished, time:9.1 seconds.
[1,0]<stdout>:[0405 16:00:08 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 16:00:08 @misc.py:111] Estimated Time Left: 46 seconds
[1,0]<stdout>:[0405 16:00:08 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 10.99
[1,0]<stdout>:[0405 16:00:08 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.10
[1,0]<stdout>:[0405 16:00:08 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 16:00:08 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 16:00:08 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 16:00:26 @base.py:286] Epoch 237 (global_step 23700) finished, time:9.05 seconds.
[1,0]<stdout>:[0405 16:00:26 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 16:00:26 @misc.py:111] Estimated Time Left: 27.5 seconds
[1,0]<stdout>:[0405 16:00:26 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.05
[1,0]<stdout>:[0405 16:00:26 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.05
[1,0]<stdout>:[0405 16:00:26 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 16:00:26 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 16:00:26 @monitor.py:469] QueueInput/queue_size: 50

[1,0]<stdout>:[0405 16:00:45 @base.py:286] Epoch 239 (global_step 23900) finished, time:9.07 seconds.
[1,0]<stdout>:[0405 16:00:45 @trainers.py:419] Running horovod broadcast ...
[1,0]<stdout>:[0405 16:00:45 @misc.py:111] Estimated Time Left: 9.16 seconds
[1,0]<stdout>:[0405 16:00:45 @performance.py:179] [ThroughputTracker] Over last epoch, MeanEpochThroughput: 11.03
[1,0]<stdout>:[0405 16:00:45 @performance.py:180] [ThroughputTracker] Over last epoch, EpochWallClockDuration: 9.07
[1,0]<stdout>:[0405 16:00:45 @performance.py:181] [ThroughputTracker] Over last epoch, CallbackOverheadDuration: 0.00
[1,0]<stdout>:[0405 16:00:45 @monitor.py:469] PeakMemory(MB)/gpu:0: 5050.4
[1,0]<stdout>:[0405 16:00:45 @monitor.py:469] QueueInput/queue_size: 50


2020-04-05 16:08:25 Completed - Training job completed
Training seconds: 3027
Billable seconds: 772
Managed Spot Training savings: 74.5%
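
The savings figure in the summary above can be reproduced from the two durations it reports. The following is a minimal sketch, assuming the savings percentage is one minus the ratio of billable seconds to training seconds; the helper name `spot_savings_percent` is hypothetical and not part of the SageMaker SDK.

```python
def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Return the share of training time that was not billed, as a percentage.

    Assumption: savings = (1 - billable / training) * 100.
    """
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Values taken from the job summary printed above.
savings = spot_savings_percent(training_seconds=3027, billable_seconds=772)
print(f"Managed Spot Training savings: {savings:.1f}%")
```

With 3027 training seconds and 772 billable seconds, the ratio is about 0.255, which matches the 74.5% reported by the job.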
