# Training a model in Google Cloud

The container running this notebook has the Google Cloud SDK installed.
To train a model on a cluster in Google Cloud, you must first log in. To do so, run the following command from the console:

```
docker exec -it <container name> gcloud init
```

## Config variables

Name of the Cloud Storage bucket where the files are saved

In [1]:
GCS_BUCKET='es_kiff'

Config file name

In [2]:
#CONFIG_FILE = 'rfcn_resnet101GCP.config'
#CONFIG_FILE = 'ssd_mobilenetGCP.config'
CONFIG_FILE = 'faster_rcnn_inception_resnetGCP.config'

Pretrained model name

In [3]:
#PRETRAINING_MODEL = 'faster_rcnn_resnet101_coco_11_06_2017'
#PRETRAINING_MODEL = 'ssd_mobilenet_v1_coco_2018_01_28'
PRETRAINING_MODEL = 'faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28'

Paths to the train, test and label map files (container paths)

In [4]:
trainRecordPath = '/u01/notebooks/TFM/DatasetCreator/out/train.record'
testRecordPath = '/u01/notebooks/TFM/DatasetCreator/out/test.record'
labelsPath = '/u01/notebooks/TFM/DatasetCreator/out/label_map.pbtxt'
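
Before uploading, it can help to confirm that the three files actually exist inside the container. The following is a minimal sketch; the helper name `missing_files` is hypothetical and the paths simply repeat the values set in the cell above.

```python
import os

def missing_files(paths):
    """Return the entries of `paths` that are not existing regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# Same values as trainRecordPath, testRecordPath and labelsPath above.
input_paths = [
    '/u01/notebooks/TFM/DatasetCreator/out/train.record',
    '/u01/notebooks/TFM/DatasetCreator/out/test.record',
    '/u01/notebooks/TFM/DatasetCreator/out/label_map.pbtxt',
]

missing = missing_files(input_paths)
if missing:
    print('Missing input files:', missing)
else:
    print('All input files found.')
```

If any path is reported as missing, the upload step further down would fail for that file, so it is cheaper to catch it here.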

## Data and configuration preparation

Upload files to Cloud Storage

In [30]:
!gsutil cp $trainRecordPath gs://$GCS_BUCKET/data/
!gsutil cp $testRecordPath gs://$GCS_BUCKET/data/
!gsutil cp $labelsPath gs://$GCS_BUCKET/data/label_map.pbtxt

Copying file:///u01/notebooks/TFM/DatasetCreator/out/train.record [Content-Type=application/octet-stream]...
==> NOTE: You are uploading one or more large file(s), which would run          
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

- [1 files][246.7 MiB/246.7 MiB]    4.3 MiB/s                                   
Operation completed over 1 objects/246.7 MiB.                                    
Copying file:///u01/notebooks/TFM/DatasetC
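
Because the first two `gsutil cp` destinations end in `/`, each file keeps its base name inside the `data/` prefix. The sketch below only computes the resulting object URIs and uploads nothing; `upload_destination` is a hypothetical helper, not part of gsutil.

```python
import posixpath

def upload_destination(local_path, bucket, prefix='data'):
    """Return the object URI that `gsutil cp local_path gs://bucket/prefix/` writes to."""
    filename = posixpath.basename(local_path)
    return 'gs://{}/{}/{}'.format(bucket, prefix, filename)

for path in [
    '/u01/notebooks/TFM/DatasetCreator/out/train.record',
    '/u01/notebooks/TFM/DatasetCreator/out/test.record',
    '/u01/notebooks/TFM/DatasetCreator/out/label_map.pbtxt',
]:
    print(upload_destination(path, 'es_kiff'))
```

These URIs are the ones the pipeline config has to reference once its placeholder paths have been replaced.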

Download the pretrained model and upload it to Cloud Storage

In [17]:
!wget http://download.tensorflow.org/models/object_detection/{PRETRAINING_MODEL}.tar.gz
!tar -xvf {PRETRAINING_MODEL}.tar.gz
!gsutil cp {PRETRAINING_MODEL}/model.ckpt.* gs://$GCS_BUCKET/data/

--2019-05-25 21:18:52--  http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.217.168.176, 2a00:1450:4003:80a::2010
Connecting to download.tensorflow.org (download.tensorflow.org)|172.217.168.176|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 672221478 (641M) [application/x-tar]
Saving to: ‘faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz’


2019-05-25 21:19:27 (18.6 MB/s) - ‘faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz’ saved [672221478/672221478]

faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/
faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/model.ckpt.index
faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/checkpoint
faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/pipeline.config
faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/model.ckpt.data-00000-of-00001
faster_r
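
The cell above derives the download URL from `PRETRAINING_MODEL` and then uploads only the files matching `model.ckpt.*`. A small sketch of both steps follows, using the file names visible in the tar listing; the helper names `model_archive_url` and `checkpoint_files` are hypothetical.

```python
import fnmatch

BASE_URL = 'http://download.tensorflow.org/models/object_detection'

def model_archive_url(model_name):
    """Build the archive URL used by the wget command above."""
    return '{}/{}.tar.gz'.format(BASE_URL, model_name)

def checkpoint_files(filenames, pattern='model.ckpt.*'):
    """Select the names that the shell glob model.ckpt.* would match."""
    return fnmatch.filter(filenames, pattern)

# Base names taken from the (truncated) tar output above.
extracted = [
    'model.ckpt.index',
    'checkpoint',
    'pipeline.config',
    'model.ckpt.data-00000-of-00001',
]

print(model_archive_url('faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28'))
print(checkpoint_files(extracted))
```

Note that `checkpoint` and `pipeline.config` do not match the pattern, so they are not copied to the bucket.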

Replace the placeholder paths in the config file and upload it to Cloud Storage

In [7]:
!sed -i "s|PATH_TO_BE_CONFIGURED|gs://$GCS_BUCKET/data|g" /u01/notebooks/TFM/Configs/$CONFIG_FILE
!gsutil cp /u01/notebooks/TFM/Configs/$CONFIG_FILE gs://$GCS_BUCKET/data/$CONFIG_FILE

Copying file:///u01/notebooks/TFM/Configs/faster_rcnn_inception_resnetGCP.config [Content-Type=application/octet-stream]...
- [1 files][  3.1 KiB/  3.1 KiB]                                                
Operation completed over 1 objects/3.1 KiB.                                      
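
The `sed` command replaces every occurrence of the `PATH_TO_BE_CONFIGURED` placeholder with the bucket's `data` prefix. The same substitution can be sketched in Python as below; `configure_paths` is a hypothetical helper, and the two config lines are illustrative rather than copied from the actual config file.

```python
def configure_paths(config_text, bucket, placeholder='PATH_TO_BE_CONFIGURED'):
    """Replace every placeholder occurrence with gs://<bucket>/data."""
    return config_text.replace(placeholder, 'gs://{}/data'.format(bucket))

# Illustrative excerpt only; the real config contains many more fields.
sample = (
    'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"\n'
    'input_path: "PATH_TO_BE_CONFIGURED/train.record"\n'
)

print(configure_paths(sample, 'es_kiff'))
```

Since `sed -i` edits the file in place, the placeholder is gone after the first run. Changing `GCS_BUCKET` later and rerunning the cell will not update the paths unless the config file is restored first.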


Change to the TensorFlow research folder

In [9]:
cd /u01/notebooks/models/research/

/u01/notebooks/models/research


Package the code to run in Cloud ML

In [9]:
!bash object_detection/dataset_tools/create_pycocotools_package.sh /tmp/pycocotools
!python setup.py sdist
!(cd slim && python setup.py sdist)

Cloning into 'cocoapi'...
remote: Enumerating objects: 953, done.
remote: Total 953 (delta 0), reused 0 (delta 0), pack-reused 953
Receiving objects: 100% (953/953), 11.70 MiB | 9.83 MiB/s, done.
Resolving deltas: 100% (565/565), done.
running sdist
running egg_info
writing object_detection.egg-info/PKG-INFO
writing dependency_links to object_detection.egg-info/dependency_links.txt
writing requirements to object_detection.egg-info/requires.txt
writing top-level names to object_detection.egg-info/top_level.txt
reading manifest file 'object_detection.egg-info/SOURCES.txt'
writing manifest file 'object_detection.egg-info/SOURCES.txt'
running check


creating object_detection-0.1
creating object_detection-0.1/object_detection
creating object_detection-0.1/object_detection.egg-info
creating object_detection-0.1/object_detection/anchor_generators
creating object_detection-0.1/object_detection/box_coders
creating object_detection-0.1/object_detection/builders
creating object_detection-0

copying object_detection/builders/model_builder_test.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/optimizer_builder.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/optimizer_builder_test.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/post_processing_builder.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/post_processing_builder_test.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/preprocessor_builder.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/preprocessor_builder_test.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/region_similarity_calculator_builder.py -> object_detection-0.1/object_detection/builders
copying object_detection/builders/region_similarity_calculator_builder_test.py -> object_detection-0.1/obje

copying object_detection/meta_architectures/ssd_meta_arch_test.py -> object_detection-0.1/object_detection/meta_architectures
copying object_detection/meta_architectures/ssd_meta_arch_test_lib.py -> object_detection-0.1/object_detection/meta_architectures
copying object_detection/metrics/__init__.py -> object_detection-0.1/object_detection/metrics
copying object_detection/metrics/calibration_evaluation.py -> object_detection-0.1/object_detection/metrics
copying object_detection/metrics/calibration_evaluation_test.py -> object_detection-0.1/object_detection/metrics
copying object_detection/metrics/calibration_metrics.py -> object_detection-0.1/object_detection/metrics
copying object_detection/metrics/calibration_metrics_test.py -> object_detection-0.1/object_detection/metrics
copying object_detection/metrics/coco_evaluation.py -> object_detection-0.1/object_detection/metrics
copying object_detection/metrics/coco_evaluation_test.py -> object_detection-0.1/object_detection/metrics
copying

copying object_detection/utils/category_util_test.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/config_util.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/config_util_test.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/context_manager.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/context_manager_test.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/dataset_util.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/dataset_util_test.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/json_utils.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/json_utils_test.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/label_map_util.py -> object_detection-0.1/object_detection/utils
copying object_detection/utils/l

copying nets/nasnet/nasnet_test.py -> slim-0.1/nets/nasnet
copying nets/nasnet/nasnet_utils.py -> slim-0.1/nets/nasnet
copying nets/nasnet/nasnet_utils_test.py -> slim-0.1/nets/nasnet
copying nets/nasnet/pnasnet.py -> slim-0.1/nets/nasnet
copying nets/nasnet/pnasnet_test.py -> slim-0.1/nets/nasnet
copying preprocessing/__init__.py -> slim-0.1/preprocessing
copying preprocessing/cifarnet_preprocessing.py -> slim-0.1/preprocessing
copying preprocessing/inception_preprocessing.py -> slim-0.1/preprocessing
copying preprocessing/lenet_preprocessing.py -> slim-0.1/preprocessing
copying preprocessing/preprocessing_factory.py -> slim-0.1/preprocessing
copying preprocessing/vgg_preprocessing.py -> slim-0.1/preprocessing
copying slim.egg-info/PKG-INFO -> slim-0.1/slim.egg-info
copying slim.egg-info/SOURCES.txt -> slim-0.1/slim.egg-info
copying slim.egg-info/dependency_links.txt -> slim-0.1/slim.egg-info
copying slim.egg-info/top_level.txt -> slim-0.1/slim.egg-info
Writing slim-0.1/setup.cfg
Crea

Run training and validation in Cloud ML

In [10]:
!gcloud ai-platform jobs submit training `whoami`_object_detection_diagrams_`date +%m_%d_%Y_%H_%M_%S` \
    --runtime-version 1.12 \
    --job-dir=gs://$GCS_BUCKET/model_dir \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
    --module-name object_detection.model_main \
    --region us-central1 \
    --config /u01/notebooks/TFM/Configs/cloud.yml \
    -- \
    --model_dir=gs://$GCS_BUCKET/model_dir \
    --pipeline_config_path=gs://$GCS_BUCKET/data/$CONFIG_FILE

Job [root_object_detection_diagrams_05_27_2019_23_27_17] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe root_object_detection_diagrams_05_27_2019_23_27_17

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs root_object_detection_diagrams_05_27_2019_23_27_17
jobId: root_object_detection_diagrams_05_27_2019_23_27_17
state: QUEUED


## Show training process

TensorBoard uses Application Default Credentials to authenticate to Google Cloud. To log in, run the following command from the console:
```
docker exec -it <container name> gcloud auth application-default login
```

Launch TensorBoard to view the progress of the training and eval jobs on Google Cloud

In [12]:
!tensorboard --logdir=gs://$GCS_BUCKET/model_dir

TensorBoard 1.13.1 at http://134aafc9ad33:6006 (Press CTRL+C to quit)
^C


While the command above is running, the TensorBoard page is available at [http://localhost:6006](http://localhost:6006)

Finally, to export the model for inference, run

In [10]:
!python object_detection/export_inference_graph.py \
    --input_type=image_tensor  \
    --pipeline_config_path=gs://$GCS_BUCKET/data/$CONFIG_FILE  \
    --trained_checkpoint_prefix=gs://$GCS_BUCKET/model_dir/model.ckpt-47041  \
    --output_directory=gs://$GCS_BUCKET/outputmodel

Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use keras.layers.flatten instead.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
Instructions for updating:
Use `tf.profiler.profile(graph, run_meta, op_log, cmd, options)`. Build `options` with `tf.profiler.ProfileOptionBuilder`. See README.md for details
Instructions for updating:
Use tf.compat.v1.graph_util.remove_training_nodes
642 ops no flops stats due to incomplete shapes.
Parsing Inputs...
Incomplete shape.

-max_depth                  10000
-min_bytes                  0
-min_peak_bytes             0
-min_residual_bytes         0
-min_output_bytes           0
-min_micros                 0
-min_accelerator_micros     0
-min_cpu_micros             0
-min_params                 0
-min_float_ops              0
-min_occurrence             0
-step              

642 ops no flops stats due to incomplete shapes.
Parsing Inputs...
Incomplete shape.

-max_depth                  10000
-min_bytes                  0
-min_peak_bytes             0
-min_residual_bytes         0
-min_output_bytes           0
-min_micros                 0
-min_accelerator_micros     0
-min_cpu_micros             0
-min_params                 0
-min_float_ops              1
-min_occurrence             0
-step                       -1
-order_by                   float_ops
-account_type_regexes       .*
-start_name_regexes         .*
-trim_name_regexes          .*BatchNorm.*,.*Initializer.*,.*Regularizer.*,.*BiasAdd.*
-show_name_regexes          .*
-hide_name_regexes          
-account_displayed_op_only  true
-select                     float_ops
-output                     stdout:

Incomplete shape.

Doc:
scope: The nodes in the model graph are organized by their names, which is hierarchical like filesystem.
flops: Number of float operations. Note: Please read the implement

2019-06-02 05:05:13.775637: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-02 05:05:13.800324: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1976240000 Hz
2019-06-02 05:05:13.801194: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55ff4bf018e0 executing computations on platform Host. Devices:
2019-06-02 05:05:13.801241: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_mod

The exported model is stored in
```
gs://$GCS_BUCKET/outputmodel
```
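
The `--trained_checkpoint_prefix` passed to `export_inference_graph.py` ends in a step number (`47041` in the run above), which has to match a checkpoint actually present in `model_dir`. The sketch below picks the highest step from a list of file names. `latest_checkpoint_prefix` is a hypothetical helper, and apart from step `47041` the file names in the example are made up for illustration.

```python
import re

def latest_checkpoint_prefix(filenames):
    """Return 'model.ckpt-<step>' for the highest step with an .index file, or None."""
    steps = []
    for name in filenames:
        match = re.match(r'model\.ckpt-(\d+)\.index$', name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return 'model.ckpt-{}'.format(max(steps))

# Hypothetical listing of model_dir; only step 47041 comes from the notebook.
listing = [
    'checkpoint',
    'model.ckpt-9000.index',
    'model.ckpt-47041.index',
    'model.ckpt-47041.meta',
]

print(latest_checkpoint_prefix(listing))
```

Steps are compared as integers, so `47041` correctly ranks above `9000`; a plain string sort would get this wrong. TensorFlow's own `tf.train.latest_checkpoint` offers similar behaviour by reading the `checkpoint` file in the directory.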