<a href="https://colab.research.google.com/github/hailusong/colab-god-idclass/blob/master/god_idclass.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Custom-Training the Google Object Detection API to Detect ID Bounding Boxes

In [0]:
DEFAULT_HOME='/content'

## Environment Preparation

Authenticate with Google Cloud so the notebook can access the GCS bucket

In [0]:
from google.colab import auth
auth.authenticate_user()

Use the Google Cloud SDK tool gsutil to download the data file

In [3]:
# Download the file.
YOUR_GCS_BUCKET='id-norm'
!gsutil cp gs://{YOUR_GCS_BUCKET}/generated.tar.gz /tmp/generated.tar.gz
!ls /tmp/*gz

Copying gs://id-norm/generated.tar.gz...
- [1 files][129.7 MiB/129.7 MiB]                                                
Operation completed over 1 objects/129.7 MiB.                                    
/tmp/generated.tar.gz


Prepare the data file (gunzip, then untar); each step is skipped if its output already exists

In [4]:
![[ ! -f /tmp/generated.tar && -f /tmp/generated.tar.gz ]] && gunzip /tmp/generated.tar.gz
![[ ! -e ./generated && -f /tmp/generated.tar ]] && tar xf /tmp/generated.tar
!ls generated

bbox-train-non-id1.csv	bbox-valid-on-dl.csv	pnts-valid-non-id2.csv
bbox-train-non-id2.csv	bbox-valid-on-hc.csv	pnts-valid-non-id3.csv
bbox-train-non-id3.csv	pnts-train-non-id1.csv	pnts-valid-on-dl.csv
bbox-train-on-dl.csv	pnts-train-non-id2.csv	pnts-valid-on-hc.csv
bbox-train-on-hc.csv	pnts-train-non-id3.csv	Train
bbox-valid-non-id1.csv	pnts-train-on-dl.csv	Valid
bbox-valid-non-id2.csv	pnts-train-on-hc.csv
bbox-valid-non-id3.csv	pnts-valid-non-id1.csv
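
The same guarded extraction can be written in Python. This is a minimal sketch rather than the notebook's own code: `extract_archive` is a hypothetical helper name, and it reads the `.tar.gz` directly instead of running `gunzip` first as the shell lines do.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir=".", marker="generated"):
    """Extract archive_path into dest_dir unless dest_dir/marker already exists.

    Returns True when extraction ran and False when it was skipped.
    """
    target = os.path.join(dest_dir, marker)
    if os.path.exists(target):
        # Already extracted: mirrors the `[[ ! -e ./generated ]]` guard.
        return False
    # Mode "r:*" lets tarfile detect gzip (or no) compression on its own.
    with tarfile.open(archive_path, mode="r:*") as tar:
        tar.extractall(dest_dir)
    return True
```

Under these assumptions, `extract_archive('/tmp/generated.tar.gz')` would create `./generated` on the first call and do nothing on later calls.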


Verify the versions of TensorFlow, Python, IPython, prompt_toolkit, and protoc. IPython and prompt_toolkit need to be mutually compatible versions.

In [5]:
import tensorflow as tf
print(tf.__version__)

1.13.1


In [6]:
!python -V
!ipython --version
!pip show prompt_toolkit
!protoc --version

Python 3.6.7
5.5.0
Name: prompt-toolkit
Version: 1.0.15
Summary: Library for building powerful interactive command lines in Python
Home-page: https://github.com/jonathanslenders/python-prompt-toolkit
Author: Jonathan Slenders
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/local/lib/python3.6/dist-packages
Requires: wcwidth, six
Required-by: jupyter-console, ipython
libprotoc 3.0.0
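
The IPython and prompt_toolkit check can be automated. The sketch below is illustrative only: the helper names are hypothetical, and the rule it encodes (IPython 5.x paired with a prompt_toolkit 1.x release) is an assumption based on the versions printed above, so confirm the exact constraint against the IPython release you use.

```python
import re


def parse_version(text):
    """Turn a string such as '1.0.15' into a tuple of ints, e.g. (1, 0, 15).

    Only the leading digits of each dotted piece are used.
    """
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def ipython5_compatible(ipython_version, prompt_toolkit_version):
    """Assumed rule: IPython 5.x works with prompt_toolkit 1.x."""
    ipy = parse_version(ipython_version)
    ptk = parse_version(prompt_toolkit_version)
    if ipy[:1] != (5,):
        raise ValueError("this sketch only covers IPython 5.x")
    return ptk[:1] == (1,)


# Versions reported by the cell above.
print(ipython5_compatible("5.5.0", "1.0.15"))
```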


## Install Google Object Detection API in Colab
Reference: https://colab.research.google.com/drive/1kHEQK2uk35xXZ_bzMUgLkoysJIWwznYr


### Downgrade prompt-toolkit to 1.0.15
Run this **ONLY** if the installation below does not work

In [0]:
!pip install 'prompt-toolkit==1.0.15'

### Google Object Detection API Installation

In [11]:
!apt-get install -y -qq protobuf-compiler python-pil python-lxml
![ ! -e {DEFAULT_HOME}/models ] && git clone --depth=1 --quiet https://github.com/tensorflow/models.git {DEFAULT_HOME}/models
!ls {DEFAULT_HOME}/models

AUTHORS     CONTRIBUTING.md    LICENSE	 README.md  samples    WORKSPACE
CODEOWNERS  ISSUE_TEMPLATE.md  official  research   tutorials


In [12]:
import os
os.chdir(f'{DEFAULT_HOME}/models/research')
!pwd

/content/models/research


*From the Protocol Buffers documentation ...*: 

**Protocol buffers** are a language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. 

You define how you want your data to be structured once, then you can **use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages**.

Remember: a **.proto file defines the structured data**, and **protoc generates the source code** that serializes and deserializes it.

In [13]:
!protoc object_detection/protos/*.proto --python_out=.
# !ls object_detection/protos/*.proto
# !cat object_detection/protos/anchor_generator.proto
!ls {DEFAULT_HOME}/models/research/object_detection/builders/anchor*

/content/models/research/object_detection/builders/anchor_generator_builder.py
/content/models/research/object_detection/builders/anchor_generator_builder_test.py
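
The `protoc` call above relies on the shell to expand `*.proto`. A possible Python equivalent is sketched below; `build_protoc_command` and `compile_protos` are hypothetical helper names, and the sketch assumes `protoc` is on the `PATH` and that the working directory is `models/research`, as in the cell above.

```python
import glob
import subprocess


def build_protoc_command(proto_files, python_out="."):
    """Return the argv list equivalent to `protoc <files> --python_out=<dir>`."""
    if not proto_files:
        raise ValueError("no .proto files given")
    return ["protoc", *sorted(proto_files), f"--python_out={python_out}"]


def compile_protos(pattern="object_detection/protos/*.proto", python_out="."):
    """Expand the glob in Python (no shell involved) and run protoc."""
    command = build_protoc_command(glob.glob(pattern), python_out)
    # check=True raises CalledProcessError if protoc exits with a non-zero status.
    subprocess.run(command, check=True)
    return command
```

Calling `compile_protos()` should then write one generated `*_pb2.py` module next to each `.proto` file, and a failing compile surfaces as an exception instead of passing silently.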


In [0]:
import sys
sys.path.append(f'{DEFAULT_HOME}/models/research')
sys.path.append(f'{DEFAULT_HOME}/models/research/slim')

Note that ! runs a command in a shell (a **NEW** process), while % runs in the **SAME** process as the notebook.

Since we appended paths to sys.path, we **HAVE TO** use a % command (%run) to run the Python script; a new process would not see those paths.

It is also **IMPORTANT** to run **%matplotlib inline** first; otherwise %run model_builder_test.py will **raise an attribute error** when matplotlib.pyplot attributes are accessed from **IPython's run_line_magic**.
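
The process distinction can be checked directly. The following sketch is an illustration, not part of the original notebook: `child_sees_path` is a hypothetical helper that starts a fresh interpreter, much as `!python` would, and asks whether a given entry is on its `sys.path`.

```python
import subprocess
import sys


def child_sees_path(path):
    """Return True if a freshly started Python process has `path` on sys.path."""
    code = "import sys; print({!r} in sys.path)".format(path)
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip() == "True"


extra = "/content/models/research"  # same path the notebook appends
if extra not in sys.path:
    sys.path.append(extra)

# The notebook's own process sees the new entry ...
print(extra in sys.path)
# ... while a new process does not inherit it (unless PYTHONPATH already lists it).
print(child_sees_path(extra))
```

To make the paths visible to `!` commands as well, one option is to export them through the `PYTHONPATH` environment variable rather than `sys.path`.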

In [0]:
# !find . -name 'inception*' -print

In [26]:
%matplotlib inline
%run object_detection/builders/model_builder_test.py

.....................s
----------------------------------------------------------------------
Ran 22 tests in 0.207s

OK (skipped=1)


### Prepare Extra Data
e.g. pre-trained model weights

Download and unpack the COCO-pretrained weights, then copy the checkpoint files to GCS

In [28]:
import os
os.chdir(f'{DEFAULT_HOME}')
!wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz

/content
--2019-03-05 20:24:54--  http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.141.128, 2607:f8b0:400c:c06::80
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.141.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 595490113 (568M) [application/x-tar]
Saving to: ‘faster_rcnn_resnet101_coco_11_06_2017.tar.gz’


2019-03-05 20:24:57 (181 MB/s) - ‘faster_rcnn_resnet101_coco_11_06_2017.tar.gz’ saved [595490113/595490113]



In [31]:
![ ! -e faster_rcnn_resnet101_coco_11_06_2017 ] && tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
!gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://{YOUR_GCS_BUCKET}/data/

Copying file://faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.data-00000-of-00001 [Content-Type=application/octet-stream]...
/ [0 files][    0.0 B/425.2 MiB]                                                ==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

Copying file://faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.index [Content-Type=application/octet-stream]...
Cop
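
Before uploading, it can help to confirm that the `model.ckpt.*` wildcard matches the three checkpoint files that later appear in the bucket listing (`.data-00000-of-00001`, `.index`, `.meta`). A small sketch with hypothetical helper names:

```python
import glob
import os


def checkpoint_files(model_dir, prefix="model.ckpt"):
    """Return the sorted base names of files matching <model_dir>/<prefix>.*"""
    pattern = os.path.join(model_dir, prefix + ".*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def missing_suffixes(model_dir, prefix="model.ckpt",
                     required=(".data-00000-of-00001", ".index", ".meta")):
    """List the required suffixes that have no matching file in model_dir."""
    present = checkpoint_files(model_dir, prefix)
    return [s for s in required if prefix + s not in present]
```

An empty list from `missing_suffixes('faster_rcnn_resnet101_coco_11_06_2017')` would suggest the archive unpacked completely.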

## Configuring the Object Detection Pipeline

In [35]:
![ ! -e {DEFAULT_HOME}/configs ] && git clone --depth=1 https://github.com/hailusong/colab-god-idclass.git {DEFAULT_HOME}/configs
!ls -al {DEFAULT_HOME}/configs/configs/faster_rcnn_resnet101_pets.config

-rw-r--r-- 1 root root 3735 Mar  5 20:43 /content/configs/configs/faster_rcnn_resnet101_pets.config
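
The sample configs shipped with the Object Detection API mark unset paths with the placeholder `PATH_TO_BE_CONFIGURED`. Whether the copy in this repository still contains that placeholder is an assumption; if it does, a substitution along these lines (with `fill_config_paths` as a hypothetical helper) points the config at the bucket's `data` folder:

```python
def fill_config_paths(config_text, data_root, placeholder="PATH_TO_BE_CONFIGURED"):
    """Replace every placeholder occurrence with data_root (trailing slash removed)."""
    return config_text.replace(placeholder, data_root.rstrip("/"))


# One line in the style of the upstream sample config.
sample = 'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"'
print(fill_config_paths(sample, "gs://id-norm/data/"))
# prints: fine_tune_checkpoint: "gs://id-norm/data/model.ckpt"
```

Text without the placeholder passes through unchanged, so running the helper on an already configured file is harmless.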


In [37]:
!gsutil cp {DEFAULT_HOME}/configs/configs/faster_rcnn_resnet101_pets.config gs://{YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config

Copying file:///content/configs/configs/faster_rcnn_resnet101_pets.config [Content-Type=application/octet-stream]...
/ [1 files][  3.7 KiB/  3.7 KiB]                                                
Operation completed over 1 objects/3.7 KiB.                                      


### Checking Your Google Cloud Storage Bucket

In [40]:
!gsutil ls gs://{YOUR_GCS_BUCKET}/data

gs://id-norm/data/faster_rcnn_resnet101_pets.config
gs://id-norm/data/model.ckpt.data-00000-of-00001
gs://id-norm/data/model.ckpt.index
gs://id-norm/data/model.ckpt.meta
