<a href="https://colab.research.google.com/github/hailusong/colab-god-idclass/blob/master/god_idclass.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Custom-Train the Google Object Detection API to Detect ID Bounding Boxes

In [0]:
DEFAULT_HOME='/content'

## Session and Environment Verification

Establish an authenticated session with Google Cloud.

In [0]:
from google.colab import auth
auth.authenticate_user()

YOUR_GCS_BUCKET='id-norm'

################# RE-RUN ABOVE CELLS IF NEED TO RESTART RUNTIME #################

Verify the versions of TensorFlow, Python, IPython, prompt_toolkit, and protoc. IPython and prompt_toolkit must be at mutually compatible versions.

In [3]:
import tensorflow as tf
print(tf.__version__)

1.13.1


In [15]:
!python -V
!ipython --version
!pip show prompt_toolkit
!protoc --version

Python 3.6.7
5.5.0
Name: prompt-toolkit
Version: 1.0.15
Summary: Library for building powerful interactive command lines in Python
Home-page: https://github.com/jonathanslenders/python-prompt-toolkit
Author: Jonathan Slenders
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/local/lib/python3.6/dist-packages
Requires: wcwidth, six
Required-by: jupyter-console, ipython
libprotoc 3.0.0
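
The version check above can also be done in code rather than by eye. The following is a minimal sketch, not part of the original notebook: `parse_version` and `at_least` are hypothetical helper names, and any minimum you pass in is your own assumption about what the installation needs.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.13.1' into a tuple of ints: (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(found, minimum):
    """Return True when `found` is the same as or newer than `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '1.13' and '1.13.0' compare equal.
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


# Versions copied from the cell outputs above.
observed = {
    "tensorflow": "1.13.1",
    "python": "3.6.7",
    "ipython": "5.5.0",
    "prompt_toolkit": "1.0.15",
    "protoc": "3.0.0",
}

# Example: confirm prompt_toolkit is at least the version pinned in this notebook.
print(at_least(observed["prompt_toolkit"], "1.0.15"))
```

In a live session you would pass `tf.__version__` and similar values instead of the hard-coded strings.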


## Install Google Object Detection API in Colab
Reference is https://colab.research.google.com/drive/1kHEQK2uk35xXZ_bzMUgLkoysJIWwznYr


### Downgrade prompt-toolkit to 1.0.15
Run this **ONLY** if the installation below does not work.

In [0]:
!pip install 'prompt-toolkit==1.0.15'

### Google Object Detection API Installation

In [16]:
!apt-get install -y -qq protobuf-compiler python-pil python-lxml
![ ! -e {DEFAULT_HOME}/models ] && git clone --depth=1 --quiet https://github.com/tensorflow/models.git {DEFAULT_HOME}/models
!ls {DEFAULT_HOME}/models

AUTHORS     CONTRIBUTING.md    LICENSE	 README.md  samples    WORKSPACE
CODEOWNERS  ISSUE_TEMPLATE.md  official  research   tutorials


In [17]:
import os
os.chdir(f'{DEFAULT_HOME}/models/research')
!pwd

/content/models/research


*From Wikipedia ...*: 

**protocol buffers** are a language-neutral, platform-neutral extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. 

You define how you want your data to be structured once, then you can **use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages**.

Remember: **.proto defines the structured data** and **protoc generates the source code** that serializes and deserializes it.

In [18]:
!protoc object_detection/protos/*.proto --python_out=.
# !ls object_detection/protos/*.proto
# !cat object_detection/protos/anchor_generator.proto
!ls {DEFAULT_HOME}/models/research/object_detection/builders/anchor*

/content/models/research/object_detection/builders/anchor_generator_builder.py
/content/models/research/object_detection/builders/anchor_generator_builder_test.py
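
Because protoc writes one `*_pb2.py` module next to each `.proto` file when called with `--python_out=.` as above, a quick way to confirm the compile step worked is to look for any `.proto` without its generated sibling. This is a small sketch under that assumption; `missing_pb2_modules` is a hypothetical helper, not part of the Object Detection API.

```python
from pathlib import Path


def missing_pb2_modules(proto_dir):
    """Return the .proto files in proto_dir that have no generated *_pb2.py sibling."""
    proto_dir = Path(proto_dir)
    missing = []
    for proto in sorted(proto_dir.glob("*.proto")):
        generated = proto.with_name(proto.stem + "_pb2.py")
        if not generated.exists():
            missing.append(proto.name)
    return missing
```

Run from `models/research`, `missing_pb2_modules("object_detection/protos")` should return an empty list once protoc has finished.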


In [0]:
import sys
sys.path.append(f'{DEFAULT_HOME}/models/research')
sys.path.append(f'{DEFAULT_HOME}/models/research/slim')

Note that ! runs a command in a shell, in a **NEW** process, while % runs in the **SAME** process as the notebook kernel.

Because we appended paths to sys.path in this process, we **HAVE TO** use the % magic to run the Python script; a script started with ! would not see those paths.

It is also **IMPORTANT** to run **%matplotlib inline** first; otherwise %run model_builder_test.py **fails with a function attribute error** when matplotlib.pyplot attributes are accessed through **IPython's run_line_magic**.
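
The difference between the two processes can be demonstrated with the standard library alone. This sketch is an illustration, not part of the original notebook; the marker path is made up, and `child_sees_path` is a hypothetical helper that starts a fresh interpreter, which is roughly what `!python ...` does.

```python
import subprocess
import sys


def child_sees_path(path):
    """Start a new Python process and report whether `path` is on its sys.path."""
    code = "import sys; print(sys.argv[1] in sys.path)"
    result = subprocess.run(
        [sys.executable, "-c", code, path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip() == "True"


# Hypothetical path, used only to mark the current process.
marker = "/tmp/hypothetical-path-added-by-the-notebook"
sys.path.append(marker)

print(marker in sys.path)       # checked in the current process
print(child_sees_path(marker))  # checked in a freshly started process
```

The appended entry lives only in the memory of the process that added it, which is why the test script is launched with %run rather than !python.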

In [0]:
# !find . -name 'inception*' -print
%matplotlib inline

In [24]:
# If you see the error "'function' object has no attribute 'called'", re-run the %matplotlib cell and then this cell AGAIN
%run object_detection/builders/model_builder_test.py

............s...
----------------------------------------------------------------------
Ran 16 tests in 0.137s

OK (skipped=1)


### Pre-trained Data Preparation
e.g. pre-trained model weights

Download, unzip and move COCO-pretrained weights data to GCS

In [25]:
import os
os.chdir(f'{DEFAULT_HOME}')
!wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz

--2019-03-18 20:10:01--  http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.141.128, 2607:f8b0:400c:c06::80
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.141.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 595490113 (568M) [application/x-tar]
Saving to: ‘faster_rcnn_resnet101_coco_11_06_2017.tar.gz’


2019-03-18 20:10:05 (179 MB/s) - ‘faster_rcnn_resnet101_coco_11_06_2017.tar.gz’ saved [595490113/595490113]



In [26]:
![ ! -e faster_rcnn_resnet101_coco_11_06_2017 ] && tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
!gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://{YOUR_GCS_BUCKET}/data/

faster_rcnn_resnet101_coco_11_06_2017/
faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.index
faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.meta
faster_rcnn_resnet101_coco_11_06_2017/frozen_inference_graph.pb
faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.data-00000-of-00001
faster_rcnn_resnet101_coco_11_06_2017/graph.pbtxt
Copying file://faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.data-00000-of-00001 [Content-Type=application/octet-stream]...
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
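
Only the `model.ckpt.*` files are copied to the bucket, so the archive does not need to be unpacked in full. As a rough sketch of that idea using the standard `tarfile` module (the helper name `extract_checkpoint` and its `prefix` parameter are assumptions for illustration, not something the notebook defines):

```python
import os
import tarfile


def extract_checkpoint(archive_path, dest, prefix="model.ckpt"):
    """Extract only the files whose base name starts with `prefix`.

    Returns the sorted member names that were extracted.
    """
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as archive:
        wanted = [
            member
            for member in archive.getmembers()
            if member.isfile() and os.path.basename(member.name).startswith(prefix)
        ]
        archive.extractall(dest, members=wanted)
    return sorted(member.name for member in wanted)
```

For example, `extract_checkpoint("faster_rcnn_resnet101_coco_11_06_2017.tar.gz", ".")` would skip `frozen_inference_graph.pb` and `graph.pbtxt`.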

## Configuring the Object Detection Pipeline

In [5]:
![ -e {DEFAULT_HOME}/colab-god-idclass ] && git -C {DEFAULT_HOME}/colab-god-idclass pull
![ ! -e {DEFAULT_HOME}/colab-god-idclass ] && git clone --depth=1 https://github.com/hailusong/colab-god-idclass.git {DEFAULT_HOME}/colab-god-idclass
!ls -al {DEFAULT_HOME}/colab-god-idclass/configs/faster_rcnn_resnet101_pets.config

remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 2), reused 4 (delta 2), pack-reused 0
Unpacking objects: 100% (4/4), done.
From https://github.com/hailusong/colab-god-idclass
   3786b9f..b97b73e  master     -> origin/master
Updating 3786b9f..b97b73e
Fast-forward
 src/generate_tfrecord.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
-rw-r--r-- 1 root root 3735 Mar 18 20:10 /cont

In [28]:
!gsutil cp {DEFAULT_HOME}/colab-god-idclass/configs/faster_rcnn_resnet101_pets.config gs://{YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config

Copying file:///content/colab-god-idclass/configs/faster_rcnn_resnet101_pets.config [Content-Type=application/octet-stream]...
/ [1 files][  3.7 KiB/  3.7 KiB]                                                
Operation completed over 1 objects/3.7 KiB.                                      
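
The sample pipeline configs shipped with tensorflow/models use the placeholder PATH_TO_BE_CONFIGURED for checkpoint and data paths. Whether the config in this repository still contains that placeholder is an assumption; if it does, a substitution along these lines points it at the bucket's data directory before upload. `fill_config_paths` is a hypothetical helper.

```python
def fill_config_paths(config_text, bucket, placeholder="PATH_TO_BE_CONFIGURED"):
    """Replace every placeholder occurrence with the bucket's data directory."""
    return config_text.replace(placeholder, f"gs://{bucket}/data")


# A one-line excerpt in the style of the sample configs.
line = 'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"'
print(fill_config_paths(line, "id-norm"))
```

If the config already holds concrete gs:// paths, the function returns the text unchanged.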


### Checking Your Google Cloud Storage Bucket

In [29]:
!gsutil ls gs://{YOUR_GCS_BUCKET}/data
!gsutil ls gs://{YOUR_GCS_BUCKET}/generated

gs://id-norm/data/faster_rcnn_resnet101_pets.config
gs://id-norm/data/model.ckpt.data-00000-of-00001
gs://id-norm/data/model.ckpt.index
gs://id-norm/data/model.ckpt.meta
gs://id-norm/generated/bbox-train-non-id1.csv
gs://id-norm/generated/bbox-train-non-id2.csv
gs://id-norm/generated/bbox-train-non-id3.csv
gs://id-norm/generated/bbox-train-on-dl.csv
gs://id-norm/generated/bbox-train-on-hc.csv
gs://id-norm/generated/bbox-valid-non-id1.csv
gs://id-norm/generated/bbox-valid-non-id2.csv
gs://id-norm/generated/bbox-valid-non-id3.csv
gs://id-norm/generated/bbox-valid-on-dl.csv
gs://id-norm/generated/bbox-valid-on-hc.csv
gs://id-norm/generated/pnts-train-non-id1.csv
gs://id-norm/generated/pnts-train-non-id2.csv
gs://id-norm/generated/pnts-train-non-id3.csv
gs://id-norm/generated/pnts-train-on-dl.csv
gs://id-norm/generated/pnts-train-on-hc.csv
gs://id-norm/generated/pnts-valid-non-id1.csv
gs://id-norm/generated/pnts-valid-non-id2.csv
gs://id-norm/generated/pnts-valid-non-id3.csv
gs://id-norm/g

## Prepare Our Own Data: Download, Convert and Upload

Use Google Cloud SDK gsutil to download the data file

In [30]:
# Download the file.
!gsutil cp gs://{YOUR_GCS_BUCKET}/generated.tar.gz /tmp/generated.tar.gz
!ls /tmp/*gz

Copying gs://id-norm/generated.tar.gz...
\ [1 files][129.7 MiB/129.7 MiB]                                                
Operation completed over 1 objects/129.7 MiB.                                    
/tmp/generated.tar.gz


Prepare the data file (unzip, untar)

In [6]:
![[ ! -f /tmp/generated.tar && -f /tmp/generated.tar.gz ]] && gunzip /tmp/generated.tar.gz
![[ ! -e ./generated && -f /tmp/generated.tar ]] && tar xf /tmp/generated.tar
!ls generated

bbox-train-non-id1.csv	bbox-valid-on-dl.csv	pnts-valid-non-id1.csv
bbox-train-non-id2.csv	bbox-valid-on-hc.csv	pnts-valid-non-id2.csv
bbox-train-non-id3.csv	merged.csv		pnts-valid-non-id3.csv
bbox-train-on-dl.csv	pnts-train-non-id1.csv	pnts-valid-on-dl.csv
bbox-train-on-hc.csv	pnts-train-non-id2.csv	pnts-valid-on-hc.csv
bbox-valid-non-id1.csv	pnts-train-non-id3.csv	Train
bbox-valid-non-id2.csv	pnts-train-on-dl.csv	Valid
bbox-valid-non-id3.csv	pnts-train-on-hc.csv


In [10]:
!head -1 {DEFAULT_HOME}/generated/bbox-train-on-dl.csv | sed 's/^,/filename,/' > {DEFAULT_HOME}/merged.csv
!tail -q --lines=+2 {DEFAULT_HOME}/generated/bbox-*.csv | sed 's/\\/\//g' >> {DEFAULT_HOME}/merged.csv
!ls {DEFAULT_HOME}/generated
!head {DEFAULT_HOME}/merged.csv

bbox-train-non-id1.csv	bbox-valid-on-dl.csv	pnts-valid-non-id1.csv
bbox-train-non-id2.csv	bbox-valid-on-hc.csv	pnts-valid-non-id2.csv
bbox-train-non-id3.csv	merged.csv		pnts-valid-non-id3.csv
bbox-train-on-dl.csv	pnts-train-non-id1.csv	pnts-valid-on-dl.csv
bbox-train-on-hc.csv	pnts-train-non-id2.csv	pnts-valid-on-hc.csv
bbox-valid-non-id1.csv	pnts-train-non-id3.csv	Train
bbox-valid-non-id2.csv	pnts-train-on-dl.csv	Valid
bbox-valid-non-id3.csv	pnts-train-on-hc.csv
filename,bbox1_x1,bbox1_y1,bbox1_x2,bbox1_y2,label
generated/Train/non-id1/0.png,89,71,288,309,UNKNOWN
generated/Train/non-id1/1.png,147,127,427,455,UNKNOWN
generated/Train/non-id1/2.png,91,50,293,266,UNKNOWN
generated/Train/non-id1/3.png,60,133,235,439,UNKNOWN
generated/Train/non-id1/4.png,33,55,134,212,UNKNOWN
generated/Train/non-id1/5.png,45,115,160,334,UNKNOWN
generated/Train/non-id1/6.png,72,26,196,130,UNKNOWN
generated/Train/non-id1/7.png,55,83,195,295,UNKNOWN
generated/Train/non-id1/8.png,34,76,192,337,UNKNOWN
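
The head/tail/sed pipeline above does three things: it names the empty first header column `filename`, drops the header of every other file, and turns backslashes in paths into forward slashes. A plain-Python sketch of the same steps follows. `merge_bbox_csvs` is a hypothetical helper, and unlike the shell version it takes the header from the first non-empty file in the list rather than from bbox-train-on-dl.csv specifically.

```python
def merge_bbox_csvs(paths, header_name="filename"):
    """Merge CSV files that share one header into a single list of lines."""
    merged = []
    for path in paths:
        with open(path, newline="") as handle:
            lines = handle.read().splitlines()
        if not lines:
            continue  # skip empty files
        header, rows = lines[0], lines[1:]
        if not merged:
            # Equivalent of sed 's/^,/filename,/': name the empty first column.
            if header.startswith(","):
                header = header_name + header
            merged.append(header)
        # Equivalent of sed 's/\\/\//g': normalize Windows-style path separators.
        merged.extend(row.replace("\\", "/") for row in rows)
    return merged
```

Writing the result is then `open(out_path, "w").write("\n".join(lines) + "\n")`, with `paths` built from `sorted(glob.glob("generated/bbox-*.csv"))`.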


Upload the merged CSV file to the GCS bucket. (For many files, gsutil -m cp would copy in parallel; a single file does not need it.)

In [11]:
!gsutil cp {DEFAULT_HOME}/merged.csv gs://{YOUR_GCS_BUCKET}

Copying file:///content/merged.csv [Content-Type=text/csv]...
/ [1 files][ 62.4 KiB/ 62.4 KiB]                                                
Operation completed over 1 objects/62.4 KiB.                                     


### Convert Our Label CSV Data to TFRecord
Source code is based on https://github.com/datitran/raccoon_dataset/blob/master/generate_tfrecord.py

In [6]:
%pdb

Automatic pdb calling has been turned ON


In [7]:
import os
os.chdir(f'{DEFAULT_HOME}/models/research')

!head {DEFAULT_HOME}/merged.csv
!mkdir -p {DEFAULT_HOME}/coversion
!git -C {DEFAULT_HOME}/colab-god-idclass pull
%run {DEFAULT_HOME}/colab-god-idclass/src/generate_tfrecord.py --csv_input={DEFAULT_HOME}/merged.csv --output_path={DEFAULT_HOME}/coversion/train.record

filename,bbox1_x1,bbox1_y1,bbox1_x2,bbox1_y2,label
generated/Train/non-id1/0.png,89,71,288,309,UNKNOWN
generated/Train/non-id1/1.png,147,127,427,455,UNKNOWN
generated/Train/non-id1/2.png,91,50,293,266,UNKNOWN
generated/Train/non-id1/3.png,60,133,235,439,UNKNOWN
generated/Train/non-id1/4.png,33,55,134,212,UNKNOWN
generated/Train/non-id1/5.png,45,115,160,334,UNKNOWN
generated/Train/non-id1/6.png,72,26,196,130,UNKNOWN
generated/Train/non-id1/7.png,55,83,195,295,UNKNOWN
generated/Train/non-id1/8.png,34,76,192,337,UNKNOWN
Already up to date.
Successfully created the TFRecords: /content/coversion/train.record
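
The Object Detection API expects box coordinates in a TFRecord to be normalized to the [0, 1] range (the `image/object/bbox/xmin` family of features), whereas merged.csv stores pixel corners. The conversion script itself is not shown here, so the following is only a sketch of that normalization step. `normalize_bbox` is a hypothetical helper, and since the CSV has no size columns, the image width and height are assumed to come from reading the image file.

```python
def normalize_bbox(x1, y1, x2, y2, width, height):
    """Scale pixel corners to [0, 1] fractions of the image size."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    return {
        "image/object/bbox/xmin": x1 / width,
        "image/object/bbox/ymin": y1 / height,
        "image/object/bbox/xmax": x2 / width,
        "image/object/bbox/ymax": y2 / height,
    }


# First data row of merged.csv, with a made-up 400x400 image size.
box = normalize_bbox(89, 71, 288, 309, width=400, height=400)
print(box)
```

Each value would then be wrapped in a float-list feature when the `tf.train.Example` is built.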


In [8]:
!gsutil cp {DEFAULT_HOME}/coversion/train.record gs://{YOUR_GCS_BUCKET}/coversion/train.record

Copying file:///content/coversion/train.record [Content-Type=application/octet-stream]...
\ [1 files][130.4 MiB/130.4 MiB]                                                
Operation completed over 1 objects/130.4 MiB.                                    
