<a href="https://colab.research.google.com/github/hailusong/colab-god-idclass/blob/master/god_idclass_gcs_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GCS Setup: Detection Model

Set up the environment variables.<br>
The **TensorFlow runtime version list** is available [here](https://cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list).

In [0]:
DEFAULT_HOME='/content'
TF_RT_VERSION='1.13'
PYTHON_VERSION='3.5'

YOUR_GCS_BUCKET='id-norm'
YOUR_PROJECT='orbital-purpose-130316'

Select the right model from [this official list](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md):

| model | dataset | datetime | notes |
| --- | --- | --- | --- |
| ssd_inception_v2 | coco | 2018_01_28 | |
| ~~ssd_inception_v3~~ | ~~pets~~ | ~~11_06_2017~~ | |
| ssd_mobilenet_v2 | coco | 2018_03_29 | |
| faster_rcnn_resnet101 | coco | 11_06_2017 | |

In [0]:
MODEL_NAME = 'ssd_mobilenet_v2'
PRETRAINED_DATASET = 'coco'
PRETRAINED_TS = '2018_03_29'
PRETRAINED_MODEL_NAME = f'{MODEL_NAME}_{PRETRAINED_DATASET}_{PRETRAINED_TS}'
PIPELINE_CONFIG_NAME = f'pipeline_{MODEL_NAME}'
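
The names set above are reused by later cells to build the download URL, the GCS data directory and the pipeline config file name. The sketch below shows how they compose; `derived_names` is a hypothetical helper that is not part of the notebook, and the values are copied from the configuration cells.

```python
# Values copied from the configuration cells above.
MODEL_NAME = 'ssd_mobilenet_v2'
PRETRAINED_DATASET = 'coco'
PRETRAINED_TS = '2018_03_29'
YOUR_GCS_BUCKET = 'id-norm'


def derived_names(model, dataset, timestamp, bucket):
    """Return the names that later cells build from the model selection.

    Hypothetical helper for illustration only.
    """
    pretrained = f'{model}_{dataset}_{timestamp}'
    return {
        'archive_url': 'http://storage.googleapis.com/download.tensorflow.org/'
                       f'models/object_detection/{pretrained}.tar.gz',
        'gcs_data_dir': f'gs://{bucket}/data_{model}/',
        'pipeline_config': f'pipeline_{model}.config',
    }


names = derived_names(MODEL_NAME, PRETRAINED_DATASET, PRETRAINED_TS, YOUR_GCS_BUCKET)
for key, value in names.items():
    print(f'{key}: {value}')
```

Switching to another row of the table only requires changing the three selection variables; every derived name follows from them.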

## Session and Environment Verification (Destination - Local)

Establish an authenticated session with Google Cloud.

In [0]:
from google.colab import auth
auth.authenticate_user()


################# RE-RUN THE CELLS ABOVE IF YOU NEED TO RESTART THE RUNTIME #################

Verify versions: TensorFlow, Python, IPython and prompt_toolkit (these two need compatible versions), and protoc.

In [4]:
import tensorflow as tf
print(tf.__version__)
assert tf.__version__.startswith(TF_RT_VERSION + '.'), f'tf.__version__ {tf.__version__} does not match the specified TF runtime version {TF_RT_VERSION}'

1.13.1
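
The assertion above appends a dot to the runtime version before comparing prefixes. A small sketch of why that matters follows; `matches_runtime` is a hypothetical helper, and accepting an exact match without a patch component is an addition that the notebook's own assert does not make.

```python
def matches_runtime(installed, runtime):
    """True when `installed` (e.g. '1.13.1') belongs to release line `runtime` (e.g. '1.13').

    Hypothetical helper; the trailing dot keeps '1.130.0' from matching '1.13'.
    """
    return installed == runtime or installed.startswith(runtime + '.')


print(matches_runtime('1.13.1', '1.13'))   # True
print(matches_runtime('1.130.0', '1.13'))  # False
```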


In [0]:
!python -V
!ipython --version
!pip show prompt_toolkit
!protoc --version

Python 3.6.7
5.5.0
Name: prompt-toolkit
Version: 1.0.15
Summary: Library for building powerful interactive command lines in Python
Home-page: https://github.com/jonathanslenders/python-prompt-toolkit
Author: Jonathan Slenders
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/local/lib/python3.6/dist-packages
Requires: wcwidth, six
Required-by: jupyter-console, ipython
libprotoc 3.0.0
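
The configuration cell sets `PYTHON_VERSION='3.5'`, while the output above reports Python 3.6.7, and nothing in the notebook compares the two. A minimal sketch of such a comparison, using a hypothetical `python_matches` helper:

```python
import sys


def python_matches(expected, version_info=sys.version_info):
    """Compare a 'major.minor' string with an interpreter version tuple.

    Hypothetical helper; `version_info` defaults to the running interpreter.
    """
    major, minor = (int(part) for part in expected.split('.')[:2])
    return (version_info[0], version_info[1]) == (major, minor)


print(python_matches('3.5', (3, 6, 7)))  # False
print(python_matches('3.6', (3, 6, 7)))  # True
```

Whether the mismatch matters depends on where `PYTHON_VERSION` is consumed later, which this section does not show.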


## Install Google Object Detection API in Colab
Reference is https://colab.research.google.com/drive/1kHEQK2uk35xXZ_bzMUgLkoysJIWwznYr


### Downgrade prompt-toolkit to 1.0.15 (Destination - Local)
Run this **ONLY** if the installation does not work.

In [0]:
# !pip install 'prompt-toolkit==1.0.15'

### Google Object Detection API Installation (Destination - Local)

In [10]:
!apt-get install -y -qq protobuf-compiler python-pil python-lxml
![ ! -e {DEFAULT_HOME}/models ] && git clone --depth=1 --quiet https://github.com/tensorflow/models.git {DEFAULT_HOME}/models
!ls {DEFAULT_HOME}/models

AUTHORS     CONTRIBUTING.md    LICENSE	 README.md  samples    WORKSPACE
CODEOWNERS  ISSUE_TEMPLATE.md  official  research   tutorials


In [11]:
import os
os.chdir(f'{DEFAULT_HOME}/models/research')
!pwd

/content/models/research


*From Wikipedia ...*: 

**protocol buffers** are a language-neutral, platform-neutral extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. 

You define how you want your data to be structured once, then you can **use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages**.

Remember: **.proto files define the structured data**, and **protoc generates the source code** that serializes and deserializes it.

In [12]:
!protoc object_detection/protos/*.proto --python_out=.
# !ls object_detection/protos/*.proto
# !cat object_detection/protos/anchor_generator.proto
!ls {DEFAULT_HOME}/models/research/object_detection/builders/anchor*

/content/models/research/object_detection/builders/anchor_generator_builder.py
/content/models/research/object_detection/builders/anchor_generator_builder_test.py


#### Add Google Object Detection API into System Path

In [0]:
import sys
sys.path.append(f'{DEFAULT_HOME}/models/research')
sys.path.append(f'{DEFAULT_HOME}/models/research/slim')

Note that `!` calls out to a shell (in a **NEW** process), while `%` affects the **SAME** process associated with the notebook.

Since we append paths to sys.path, we **HAVE TO** use a % command to run the Python scripts.

It is also **IMPORTANT** to run **%matplotlib inline** first; otherwise %run model_builder_test.py will **cause a function attribute error** when accessing matplotlib.pyplot attributes from **IPython's run_line_magic**.
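
The difference between the two process models can be reproduced outside the notebook. The sketch below appends a path to `sys.path` and then asks a child interpreter, which is roughly what `!python ...` starts, whether it sees that path. It assumes `PYTHONPATH` does not already contain the directory.

```python
import subprocess
import sys

EXTRA_PATH = '/content/models/research'  # the same path the notebook appends

# This only changes the current interpreter's module search path.
sys.path.append(EXTRA_PATH)

# A child interpreter builds its own sys.path from scratch.
child = subprocess.run(
    [sys.executable, '-c', f'import sys; print({EXTRA_PATH!r} in sys.path)'],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)

seen_in_notebook = EXTRA_PATH in sys.path
seen_in_child = child.stdout.strip() == 'True'
print(seen_in_notebook, seen_in_child)
```

`%run` executes the script inside the notebook's own interpreter, so the appended paths remain visible to it.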

In [0]:
# !find . -name 'inception*' -print
%matplotlib inline

In [17]:
# If you see the error "'function' object has no attribute 'called'", re-run the %matplotlib cell and then this cell
%run object_detection/builders/model_builder_test.py

import os
os.chdir(f'{DEFAULT_HOME}')

............s...
----------------------------------------------------------------------
Ran 16 tests in 0.106s

OK (skipped=1)


### Pre-trained Data Preparation (Destination - GCS)
e.g. pre-trained model weights

Download, unzip and move COCO-pretrained weights data to GCS<br>

In [0]:
import os
os.chdir(f'{DEFAULT_HOME}')
!wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/{PRETRAINED_MODEL_NAME}.tar.gz

--2019-04-09 14:43:53--  http://storage.googleapis.com/download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.141.128, 2607:f8b0:400c:c06::80
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.141.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 187925923 (179M) [application/x-tar]
Saving to: ‘ssd_mobilenet_v2_coco_2018_03_29.tar.gz’


2019-04-09 14:43:54 (179 MB/s) - ‘ssd_mobilenet_v2_coco_2018_03_29.tar.gz’ saved [187925923/187925923]



In [0]:
!ls {DEFAULT_HOME}/{PRETRAINED_MODEL_NAME}.tar.gz

/content/ssd_mobilenet_v2_coco_2018_03_29.tar.gz


In [0]:
![ ! -e {PRETRAINED_MODEL_NAME} ] && tar -xvf {PRETRAINED_MODEL_NAME}.tar.gz
!gsutil cp {PRETRAINED_MODEL_NAME}/model.ckpt.* gs://{YOUR_GCS_BUCKET}/data_{MODEL_NAME}/

ssd_mobilenet_v2_coco_2018_03_29/checkpoint
ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.meta
ssd_mobilenet_v2_coco_2018_03_29/pipeline.config
ssd_mobilenet_v2_coco_2018_03_29/saved_model/saved_model.pb
ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb
ssd_mobilenet_v2_coco_2018_03_29/saved_model/
ssd_mobilenet_v2_coco_2018_03_29/saved_model/variables/
ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.index
ssd_mobilenet_v2_coco_2018_03_29/
ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.data-00000-of-00001
Copying file://ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.data-00000-of-00001 [Content-Type=application/octet-stream]...
Copying file://ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.index [Content-Type=application/octet-stream]...
Copying file://ssd_mobilenet_v2_coco_2018_03_29/model.ckpt.meta [Content-Type=application/octet-stream]...
\ [3 files][ 67.7 MiB/ 67.7 MiB]                                                
Operation completed over 3 objects/67.7 MiB.                           
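
The glob `model.ckpt.*` selects only the three checkpoint files; the `checkpoint` index file and `pipeline.config` are left out, which matches the "3 files" reported above. A sketch of the same selection done with `tarfile` and `fnmatch`, run on a small in-memory archive that mimics the real layout with empty files (`checkpoint_members` is a hypothetical helper):

```python
import fnmatch
import io
import posixpath
import tarfile


def checkpoint_members(archive):
    """Names of regular files in `archive` whose basename matches model.ckpt.*"""
    return sorted(
        member.name
        for member in archive.getmembers()
        if member.isfile()
        and fnmatch.fnmatch(posixpath.basename(member.name), 'model.ckpt.*')
    )


# Build a tiny archive with empty files named like the real ones.
ROOT = 'ssd_mobilenet_v2_coco_2018_03_29'
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w:gz') as archive:
    for name in ('checkpoint', 'model.ckpt.index', 'model.ckpt.meta', 'pipeline.config'):
        archive.addfile(tarfile.TarInfo(f'{ROOT}/{name}'))  # size 0, no payload

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode='r:gz') as archive:
    found = checkpoint_members(archive)

print(found)
```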

## Configuring the Object Detection Pipeline (Destination - GCS)

In [5]:
![ -e {DEFAULT_HOME}/colab-god-idclass ] && git -C {DEFAULT_HOME}/colab-god-idclass pull
![ ! -e {DEFAULT_HOME}/colab-god-idclass ] && git clone --depth=1 https://github.com/hailusong/colab-god-idclass.git {DEFAULT_HOME}/colab-god-idclass

Cloning into '/content/colab-god-idclass'...
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 26 (delta 8), reused 8 (delta 0), pack-reused 0
Unpacking objects: 100% (26/26), done.
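
The two shell lines above implement "pull if the checkout exists, otherwise clone". The same decision can be expressed in Python, as sketched below. `sync_command` is a hypothetical helper that only builds the git argument list; running it (for example with `subprocess.run(cmd, check=True)`) is left to the caller.

```python
import os
import tempfile


def sync_command(repo_url, dest):
    """Return the git argv that brings `dest` up to date with `repo_url`."""
    if os.path.exists(dest):
        return ['git', '-C', dest, 'pull']
    return ['git', 'clone', '--depth=1', repo_url, dest]


REPO = 'https://github.com/hailusong/colab-god-idclass.git'

# Demo: an existing directory yields a pull, a missing one yields a clone.
with tempfile.TemporaryDirectory() as existing:
    pull_cmd = sync_command(REPO, existing)
    clone_cmd = sync_command(REPO, os.path.join(existing, 'absent'))

print(pull_cmd[-1], clone_cmd[1])  # pull clone
```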


In [0]:
!ls -al {DEFAULT_HOME}/colab-god-idclass/configs/{PIPELINE_CONFIG_NAME}.config
!sed 's/..YOUR_GCS_BUCKET./{YOUR_GCS_BUCKET}/g' < {DEFAULT_HOME}/colab-god-idclass/configs/{PIPELINE_CONFIG_NAME}.config > {DEFAULT_HOME}/colab-god-idclass/configs/{PIPELINE_CONFIG_NAME}_processed.config
!gsutil cp {DEFAULT_HOME}/colab-god-idclass/configs/{PIPELINE_CONFIG_NAME}_processed.config \
           {DEFAULT_HOME}/colab-god-idclass/configs/label_map.pbtxt \
           gs://{YOUR_GCS_BUCKET}/data_{MODEL_NAME}

-rw-r--r-- 1 root root 5339 Apr  9 14:37 /content/colab-god-idclass/configs/pipeline_ssd_mobilenet_v2.config
Copying file:///content/colab-god-idclass/configs/pipeline_ssd_mobilenet_v2_processed.config [Content-Type=application/octet-stream]...
Copying file:///content/colab-god-idclass/configs/label_map.pbtxt [Content-Type=application/octet-stream]...
/ [2 files][  5.3 KiB/  5.3 KiB]                                                
Operation completed over 2 objects/5.3 KiB.                                      
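
In the `sed` call, IPython first interpolates `{YOUR_GCS_BUCKET}` on the replacement side, and the pattern `..YOUR_GCS_BUCKET.` matches the variable name with any two characters before it and any one character after it. The sketch below performs a stricter substitution in Python, assuming the placeholder in the config template is spelled `${YOUR_GCS_BUCKET}`; the template itself is not shown here, so that spelling and the sample line are assumptions.

```python
import re

# Assumed placeholder spelling; the sed pattern above would also match variants.
PLACEHOLDER = re.compile(r'\$\{YOUR_GCS_BUCKET\}')


def fill_bucket(config_text, bucket):
    """Replace every ${YOUR_GCS_BUCKET} placeholder with the bucket name."""
    # A callable replacement avoids backslash processing of the bucket string.
    return PLACEHOLDER.sub(lambda _match: bucket, config_text)


sample = 'input_path: "gs://${YOUR_GCS_BUCKET}/data/train.record"'  # hypothetical line
print(fill_bucket(sample, 'id-norm'))
```

Matching the literal `${` and `}` avoids rewriting unrelated text that merely contains `YOUR_GCS_BUCKET`.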


## Convert Our Label CSV Data to TF Record and Upload
Source code is based on https://github.com/datitran/raccoon_dataset/blob/master/generate_tfrecord.py

In [6]:
%pdb

Automatic pdb calling has been turned ON


In [7]:
# Download CSV files first
!gsutil cp gs://{YOUR_GCS_BUCKET}/train-merged.csv {DEFAULT_HOME}/train-merged.csv
!gsutil cp gs://{YOUR_GCS_BUCKET}/valid-merged.csv {DEFAULT_HOME}/valid-merged.csv

Copying gs://id-norm/train-merged.csv...
/ [1 files][ 48.2 KiB/ 48.2 KiB]                                                
Operation completed over 1 objects/48.2 KiB.                                     
Copying gs://id-norm/valid-merged.csv...
/ [1 files][ 11.9 KiB/ 11.9 KiB]                                                
Operation completed over 1 objects/11.9 KiB.                                     
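
The CSV files reference images by relative path (for example `generated/Train/non-id1/0.png`). Before converting, it can help to check that every referenced image is reachable from the working directory. A missing image is one plausible, unconfirmed cause of the `NotFoundError` that the conversion cell reports further down. The sketch uses a hypothetical `missing_images` helper and a temporary directory with two sample rows.

```python
import csv
import os
import tempfile


def missing_images(csv_path, image_root):
    """Return CSV `filename` values that do not exist under `image_root`."""
    with open(csv_path, newline='') as handle:
        names = {row['filename'] for row in csv.DictReader(handle)}
    return sorted(
        name for name in names
        if not os.path.exists(os.path.join(image_root, name))
    )


# Demo: two rows in the CSV, only the first image exists on disk.
with tempfile.TemporaryDirectory() as root:
    csv_path = os.path.join(root, 'train-merged.csv')
    with open(csv_path, 'w', newline='') as handle:
        handle.write('filename,bbox1_x1,bbox1_y1,bbox1_x2,bbox1_y2,label\n'
                     'generated/Train/non-id1/0.png,10,5,143,93,UNKNOWN\n'
                     'generated/Train/non-id1/1.png,15,0,126,74,UNKNOWN\n')
    present = os.path.join(root, 'generated', 'Train', 'non-id1', '0.png')
    os.makedirs(os.path.dirname(present))
    open(present, 'wb').close()
    absent = missing_images(csv_path, root)

print(absent)  # ['generated/Train/non-id1/1.png']
```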


In [0]:
import os
os.chdir(f'{DEFAULT_HOME}')

!head {DEFAULT_HOME}/train-merged.csv
!mkdir -p {DEFAULT_HOME}/coversion
!git -C {DEFAULT_HOME}/colab-god-idclass pull

# Train records first
%run {DEFAULT_HOME}/colab-god-idclass/src/generate_tfrecord.py --csv_input={DEFAULT_HOME}/train-merged.csv --output_path={DEFAULT_HOME}/coversion/train.record

filename,bbox1_x1,bbox1_y1,bbox1_x2,bbox1_y2,label
generated/Train/non-id1/0.png,10,5,143,93,UNKNOWN
generated/Train/non-id1/1.png,15,0,126,74,UNKNOWN
generated/Train/non-id1/2.png,40,23,119,76,UNKNOWN
generated/Train/non-id1/3.png,20,51,246,202,UNKNOWN
generated/Train/non-id1/4.png,15,33,129,109,UNKNOWN
generated/Train/non-id1/5.png,38,43,114,94,UNKNOWN
generated/Train/non-id1/6.png,51,10,223,125,UNKNOWN
generated/Train/non-id1/7.png,38,48,198,155,UNKNOWN
generated/Train/non-id1/8.png,38,33,255,178,UNKNOWN
Already up to date.
removing attribute logtostderr
removing attribute alsologtostderr
removing attribute log_dir
removing attribute v
removing attribute verbosity
removing attribute stderrthreshold
removing attribute showprefixforinfo
removing attribute run_with_pdb
removing attribute pdb_post_mortem
removing attribute run_with_profiling
removing attribute profile_file
removing attribute use_cprofile_for_profiling
removing attribute only_check_args
removing attribute test_random_see

NotFoundError: ignored

> /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py(533)__exit__()
    530     # as there is a reference to status from this from the traceback due to
    531     # raise.
    532     finally:
--> 533       del self.status
    534     return False  # False values do not suppress exceptions


In [0]:
# Validation records second
!head {DEFAULT_HOME}/valid-merged.csv
%run {DEFAULT_HOME}/colab-god-idclass/src/generate_tfrecord.py --csv_input={DEFAULT_HOME}/valid-merged.csv --output_path={DEFAULT_HOME}/coversion/test.record

In [0]:
!gsutil cp {DEFAULT_HOME}/coversion/train.record {DEFAULT_HOME}/coversion/test.record gs://{YOUR_GCS_BUCKET}/data_{MODEL_NAME}
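
After this upload, the `data_{MODEL_NAME}` prefix should hold the label map, the three checkpoint files, the processed pipeline config and both record files. The sketch below checks a bucket listing against that expectation. `missing_patterns` is a hypothetical helper, and the sample listing deliberately lacks the record files to show what a failed conversion would look like.

```python
import fnmatch

# Everything the cells above copy into the data prefix.
REQUIRED = ['label_map.pbtxt', 'model.ckpt.*', '*_processed.config',
            'train.record', 'test.record']


def missing_patterns(listing, prefix, required=REQUIRED):
    """Return the patterns in `required` that match no object under `prefix`."""
    names = [url[len(prefix):] for url in listing if url.startswith(prefix)]
    return [pattern for pattern in required if not fnmatch.filter(names, pattern)]


PREFIX = 'gs://id-norm/data_ssd_mobilenet_v2/'
sample_listing = [
    PREFIX + 'label_map.pbtxt',
    PREFIX + 'model.ckpt.data-00000-of-00001',
    PREFIX + 'model.ckpt.index',
    PREFIX + 'model.ckpt.meta',
    PREFIX + 'pipeline_ssd_mobilenet_v2_processed.config',
]
still_missing = missing_patterns(sample_listing, PREFIX)
print(still_missing)  # ['train.record', 'test.record']
```

In the notebook, the listing could come from the lines printed by `gsutil ls` for the same prefix.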

## Checking Your Google Cloud Storage Bucket

In [0]:
!gsutil ls gs://{YOUR_GCS_BUCKET}/data_{MODEL_NAME}/
!gsutil ls gs://{YOUR_GCS_BUCKET}/generated

gs://id-norm/data_ssd_mobilenet_v2/label_map.pbtxt
gs://id-norm/data_ssd_mobilenet_v2/model.ckpt.data-00000-of-00001
gs://id-norm/data_ssd_mobilenet_v2/model.ckpt.index
gs://id-norm/data_ssd_mobilenet_v2/model.ckpt.meta
gs://id-norm/data_ssd_mobilenet_v2/pipeline_ssd_mobilenet_v2_processed.config
gs://id-norm/generated/bbox-train-non-id1.csv
gs://id-norm/generated/bbox-train-non-id2.csv
gs://id-norm/generated/bbox-train-non-id3.csv
gs://id-norm/generated/bbox-train-on-dl.csv
gs://id-norm/generated/bbox-train-on-hc.csv
gs://id-norm/generated/bbox-valid-non-id1.csv
gs://id-norm/generated/bbox-valid-non-id2.csv
gs://id-norm/generated/bbox-valid-non-id3.csv
gs://id-norm/generated/bbox-valid-on-dl.csv
gs://id-norm/generated/bbox-valid-on-hc.csv
gs://id-norm/generated/pnts-train-non-id1.csv
gs://id-norm/generated/pnts-train-non-id2.csv
gs://id-norm/generated/pnts-train-non-id3.csv
gs://id-norm/generated/pnts-train-on-dl.csv
gs://id-norm/generated/pnts-train-on-hc.csv
gs://id-norm/generated/p