# Tensorflow Object Detection API Preparations

Here we will be preparing the folder for the next notebook. We will:
- Create the folder structure
- Download the model and dataset
- Fix Dataset XML Paths
- Download the config file
- Modify the config file
- Generate CSV files from XML files
- Generate the record file 

In [None]:
import os
from google.colab import drive

### Default Parameters configuration

Below is the list of default parameters. Leave the folder and file names as they are, because the notebooks in this series are built around this folder structure. Please modify only the training configuration parameters.

You can also browse the [Tensorflow Model Zoo here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) to explore other models.

The [model configurations are here](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs). To download one you need the URL of its raw content; for example, see the [embedded mobilenet file](https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/samples/configs/embedded_ssd_mobilenet_v1_coco.config).
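As a small illustration of the "raw content" point, here is a minimal sketch that turns a regular GitHub file page URL into its raw form. The helper name `to_raw_url` is hypothetical and not part of this notebook; it assumes the usual `https://github.com/<owner>/<repo>/blob/<branch>/<path>` layout.

```python
def to_raw_url(github_url: str) -> str:
    """Convert a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not github_url.startswith(prefix) or "/blob/" not in github_url:
        raise ValueError("expected a https://github.com/<owner>/<repo>/blob/... URL")
    # Split "<owner>/<repo>" from "<branch>/<path>" at the "/blob/" marker.
    owner_repo, path = github_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{path}"


page_url = (
    "https://github.com/tensorflow/models/blob/master/research/"
    "object_detection/samples/configs/embedded_ssd_mobilenet_v1_coco.config"
)
raw_url = to_raw_url(page_url)
print(raw_url)
```

The printed URL is the one to download from; the regular page URL would return an HTML page instead of the config text.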

In [None]:
# Training configuration
NUM_CLASSES    =    2
BATCH_SIZE     =    5
NUM_STEPS      = 1000
DECAY_STEPS    =  800
NUM_EXAMPLES   =  236
MAX_EVALS      =    5
NUM_READERS    =    2
TYPE_DESC_NAME = 'ssd_mobilenet_v1'

# Folders
DIR_NAME = 'TensorflowObjectDetectionAPI'                 # The root folder name
DIR_ROOT_FLD       = 'TFOD_API'                           # The root folder name (used to build BASE_PATH)
WORKON_FLD         = 'object_detection'                   # The folder name that we will be working
TF_RECORD_FLD      = 'records'                            # The tensorflow records folder name
CSV_FLD            = 'annotations'                        # The annotations folder name
GITHUB_FLD         = 'ObjectDetection_SSD_TFOD_API'       # Name of the repo to download
MODEL_FLD          = 'model'                              # Name of the model folder
CONFIG_FLD         = 'config'                             # Name of the configuration folder
LABELS_FLD         = ''                                   # Name of the labels folder

# Files
DATASET_NAME         = 'Dataset'                          # The name of the dataset file to download
MODEL_NAME           = 'ssd_mobilenet_v1_coco_11_06_2017' # The name of the model (on the zoo)
CONFIG_NAME          = 'ssd_mobilenet_v1_coco'            # The pipeline configuration file
TF_RECORD_TRAIN_NAME = 'train'                            # Name only of the training record
TF_RECORD_TEST_NAME  = 'test'                             # Name only of the test record
CSV_TRAIN_INPUT_NAME = 'train_annotations'                # Name only of the csv train annotations file
CSV_TEST_INPUT_NAME  = 'test_annotations'                 # Name only of the csv test annotations file
CKPT_NAME            = 'model'                            # Name only of the checkpoint initial weights file
LABELS_NAME          = 'ssd_fox_badger'                   # Name only of the labels file

In [None]:
# The root folder, e.g. /content/drive/My Drive/TFOD_API
# (raw string: the backslash escapes the space for the shell commands used later)
BASE_PATH           = r'/content/drive/My\ Drive/' + DIR_ROOT_FLD

# The folder where all files will be, i.e. /content/drive/My Drive/TFOD_API/object_detection
WORKON_HOME_FLD     = os.path.join(BASE_PATH, WORKON_FLD) 

# Download the dataset from my GitHub repo
DATASET_FILE        =  DATASET_NAME + '.zip'
DATASET_PATH        = 'https://github.com/issaiass/ObjectDetection_Retinanet/raw/master/' + DATASET_FILE

# Download the object detection model, e.g. from
# http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
MODEL_FILE          = MODEL_NAME + '.tar.gz'
MODEL_DOWNLOAD_PATH = 'http://download.tensorflow.org/models/object_detection/' + MODEL_FILE

# Download the configuration pipeline file, i.e. from
# https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config 
CONFIG_FILE           = CONFIG_NAME + '.config'
CONFIG_PATH           = os.path.join(CONFIG_FLD, CONFIG_FILE)
#CONFIG_FULL_PATH      = os.path.join(WORKON_HOME_FLD, CONFIG_PATH)
CONFIG_DOWNLOAD_PATH  = 'https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/samples/configs/' + CONFIG_FILE

# The relative paths of the train and test annotation files, e.g. annotations/train_annotations.csv
CSV_TRAIN_INPUT_FILE = CSV_TRAIN_INPUT_NAME + '.csv'
CSV_TEST_INPUT_FILE  = CSV_TEST_INPUT_NAME + '.csv'
CSV_TRAIN_INPUT_PATH = os.path.join(CSV_FLD, CSV_TRAIN_INPUT_FILE)
CSV_TEST_INPUT_PATH  = os.path.join(CSV_FLD, CSV_TEST_INPUT_FILE)

# The tf record name, i.e.  train.record
TF_RECORD_TRAIN_FILE     = TF_RECORD_TRAIN_NAME + '.record'
TF_RECORD_TEST_FILE      = TF_RECORD_TEST_NAME + '.record'

# The relative path of the record file, e.g. records/train.record
TF_RECORD_TRAIN_OUTPUT_PATH  = os.path.join(TF_RECORD_FLD, TF_RECORD_TRAIN_FILE)
TF_RECORD_TEST_OUTPUT_PATH   = os.path.join(TF_RECORD_FLD, TF_RECORD_TEST_FILE)

# The github repository of the dataset
GITHUB_BASE = 'https://github.com/issaiass/'
GITHUB_REPO = GITHUB_BASE + GITHUB_FLD 

# The name and paths of the ckpt file to modify
CKPT_FILE      = CKPT_NAME + '.ckpt'
CKPT_DFLT_FILE = CKPT_FILE + '.data-00000-of-00001'
CKPT_DFLT_PATH = os.path.join(MODEL_FLD, CKPT_DFLT_FILE)
CKPT_PATH      = os.path.join(MODEL_FLD, CKPT_FILE)
CKPT_FLD       = MODEL_FLD                          

# Labels
LABELS_FILE = LABELS_NAME + '.pbtxt'
LABELS_PATH = os.path.join(LABELS_FLD, LABELS_FILE)
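The backslash in `My\ Drive` is there so that the path survives shell expansion in the `!` and `%` commands. A minimal sketch of an alternative, assuming you would rather keep a plain path in Python and quote it only when handing it to the shell; the names `drive_root` and `shell_safe` are illustrative and not used elsewhere in this notebook.

```python
import os
import shlex

# Plain Python string: no backslash needed for the space.
drive_root = "/content/drive/My Drive"
base_path = os.path.join(drive_root, "TFOD_API")
workon = os.path.join(base_path, "object_detection")

# Quote only at the point where the path is passed to a shell command.
shell_safe = shlex.quote(workon)

print(workon)
print(shell_safe)
```

With this approach the same variable works for `os.path` functions, while the quoted form is the one to interpolate into shell commands.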

### Mounting the drive

We mount Google Drive to use it as persistent storage for the generated data.

In [None]:
drive.mount('/content/drive')

Mounted at /content/drive


### Make the storage folder

Here we create the main folder on the drive so that we do not clutter the rest of the file structure in Colab.

In [None]:
%mkdir $BASE_PATH
%cd $BASE_PATH

/content/drive/My Drive/TFOD_API
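Re-running `%mkdir` on an existing folder reports an error. A minimal sketch of an idempotent alternative in plain Python, assuming a hypothetical helper `ensure_dir`; the demo uses a temporary directory instead of the real drive so it can run anywhere.

```python
import os
import tempfile


def ensure_dir(path: str) -> str:
    """Create the folder (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Stand-in for /content/drive, so the sketch does not touch a real drive.
demo_root = tempfile.mkdtemp()
target = ensure_dir(os.path.join(demo_root, "My Drive", "TFOD_API"))
ensure_dir(target)  # a second call is harmless

print(os.path.isdir(target))
```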


### Download the repository

Here we clone the GitHub repository that contains some files we need and change into that folder.

Tensorflow includes other frameworks besides Object Detection, so it is good to keep this work in its own folder.

Finally, we enter the folder, which I named **object_detection** by default.

In [None]:
!git clone $GITHUB_REPO
!mv $GITHUB_FLD $WORKON_FLD # change name
%cd $WORKON_FLD

Cloning into 'ObjectDetection_SSD_TFOD_API'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 19 (delta 5), reused 19 (delta 5), pack-reused 0
Unpacking objects: 100% (19/19), done.
/content/drive/My Drive/TFOD_API/object_detection


### Download Dataset

We will be using the fox and badger dataset from the Retinanet repository.

In [None]:
# Pick up the dataset and unzip
!wget $DATASET_PATH
!unzip $DATASET_FILE
!rm -rf $DATASET_FILE

--2020-06-28 16:15:08--  https://github.com/issaiass/ObjectDetection_Retinanet/raw/master/Dataset.zip
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/issaiass/ObjectDetection_Retinanet/master/Dataset.zip [following]
--2020-06-28 16:15:09--  https://raw.githubusercontent.com/issaiass/ObjectDetection_Retinanet/master/Dataset.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20516736 (20M) [application/zip]
Saving to: ‘Dataset.zip’


2020-06-28 16:15:09 (54.2 MB/s) - ‘Dataset.zip’ saved [20516736/20516736]

Archive:  Dataset.zip
   creating: Dataset/
   creating: Dataset/Test/
  inflating: Dataset

### Fetch Tensorflow Base Model

Download the Tensorflow model. By default I use mobilenet and exclude some unnecessary files.

In [None]:
# Pick up the model and unzip
!wget $MODEL_DOWNLOAD_PATH

!tar xvzf $MODEL_FILE --exclude='saved_model' --exclude='pipeline.config' --exclude='graph.pbtxt'
!mv $MODEL_NAME $MODEL_FLD
!mv $CKPT_DFLT_PATH $CKPT_PATH
!rm -rf $MODEL_FILE

--2020-06-28 16:15:22--  http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 74.125.20.128, 2607:f8b0:400e:c07::80
Connecting to download.tensorflow.org (download.tensorflow.org)|74.125.20.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 128048406 (122M) [application/x-tar]
Saving to: ‘ssd_mobilenet_v1_coco_11_06_2017.tar.gz’


2020-06-28 16:15:26 (35.8 MB/s) - ‘ssd_mobilenet_v1_coco_11_06_2017.tar.gz’ saved [128048406/128048406]

ssd_mobilenet_v1_coco_11_06_2017/
ssd_mobilenet_v1_coco_11_06_2017/model.ckpt.index
ssd_mobilenet_v1_coco_11_06_2017/model.ckpt.meta
ssd_mobilenet_v1_coco_11_06_2017/frozen_inference_graph.pb
ssd_mobilenet_v1_coco_11_06_2017/model.ckpt.data-00000-of-00001
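The `--exclude` flags above skip files that are not needed for fine-tuning. A minimal sketch of the same idea with Python's `tarfile` module, assuming a hypothetical helper `extract_without`; the demo builds a tiny placeholder archive rather than downloading the real model.

```python
import io
import os
import tarfile
import tempfile

EXCLUDED = ("saved_model", "pipeline.config", "graph.pbtxt")


def extract_without(archive_path, dest, excluded=EXCLUDED):
    """Extract a .tar.gz, skipping members whose path contains an excluded name."""
    kept = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            if any(name in parts for name in excluded):
                continue  # skip the excluded file or anything under an excluded folder
            tar.extract(member, dest)
            kept.append(member.name)
    return kept


# Build a small placeholder archive that mimics the layout of a model tarball.
work = tempfile.mkdtemp()
archive = os.path.join(work, "demo_model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for name in (
        "demo_model/model.ckpt.index",
        "demo_model/pipeline.config",
        "demo_model/saved_model/saved_model.pb",
    ):
        data = b"placeholder"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

kept = extract_without(archive, work)
print(kept)
```

Only the checkpoint index file is extracted here; the config and the `saved_model` folder are skipped, which mirrors what the shell command does.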


### Download the configuration file

Here we download the pipeline configuration file into the config folder. See the links above for other configurations.

In [None]:
# Pick up the config file
!curl $CONFIG_DOWNLOAD_PATH --create-dirs -o $CONFIG_PATH

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  4713  100  4713    0     0  21819      0 --:--:-- --:--:-- --:--:-- 21920


### Check the configuration file

We print the file to see what we need to change.

In [None]:
!cat $CONFIG_PATH

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 90
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect

### Change specific lines

Here are some examples of lines to change. Comment out the ones you want to skip.

In [None]:
# After executing this cell...
# Run the next cell (grep <name>) to check that each change was applied...
# Or run the cat cell above again to see the full file

!sed -i 's/num_classes: 90/num_classes: {NUM_CLASSES}/' $CONFIG_PATH      # classes to detect
!sed -i 's/ssd_mobilenet_v1/{TYPE_DESC_NAME}/' $CONFIG_PATH               # type descriptor name
!sed -i 's/batch_size: 24/batch_size: {BATCH_SIZE}/' $CONFIG_PATH         # minibatch size for the trainer
!sed -i 's/PATH_TO_BE_CONFIGURED\/model.ckpt/{CKPT_FLD}\/{CKPT_FILE}/' $CONFIG_PATH
!sed -i 's/num_steps: 200000/num_steps: {NUM_STEPS}/' $CONFIG_PATH        # total training steps (one batch per step)
!sed -i 's/decay_steps: 800720/decay_steps: {DECAY_STEPS}/' $CONFIG_PATH  # about 80% to 90% of num_steps
!sed -i '175s/PATH_TO_BE_CONFIGURED\/mscoco_train.record-?????-of-00100/{TF_RECORD_FLD}\/{TF_RECORD_TRAIN_FILE}/' $CONFIG_PATH
!sed -i 's/PATH_TO_BE_CONFIGURED\/mscoco_label_map.pbtxt/{LABELS_FILE}/' $CONFIG_PATH
!sed -i '189s/PATH_TO_BE_CONFIGURED\/mscoco_val.record-?????-of-00010/{TF_RECORD_FLD}\/{TF_RECORD_TEST_FILE}/' $CONFIG_PATH
!sed -i 's/num_examples: 8000/num_examples: {NUM_EXAMPLES}/' $CONFIG_PATH # number of evaluation samples
!sed -i 's/max_evals: 10/max_evals: {MAX_EVALS}/' $CONFIG_PATH            # number of evaluation rounds
!sed -i 's/num_readers: 1/num_readers: {NUM_READERS}/' $CONFIG_PATH       # number of readers
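The `sed` commands above depend on the exact default values in the downloaded file. A minimal sketch of the same kind of substitution in Python, assuming a hypothetical helper `patch_config` that matches any numeric value and warns when a pattern is not found. The sample text is a shortened stand-in, not the real config, and unlike `sed` without the `g` flag this replaces every match rather than the first one per line.

```python
import re


def patch_config(text, replacements):
    """Apply (pattern, value) substitutions and warn about patterns that never match."""
    for pattern, value in replacements:
        text, count = re.subn(pattern, value, text)
        if count == 0:
            print(f"warning: pattern not found: {pattern}")
    return text


# Shortened stand-in for the pipeline config file.
sample = '''model {
  ssd {
    num_classes: 90
  }
}
train_config: {
  batch_size: 24
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  num_steps: 200000
}
'''

patched = patch_config(sample, [
    (r"num_classes: \d+", "num_classes: 2"),
    (r"batch_size: \d+", "batch_size: 5"),
    (r"PATH_TO_BE_CONFIGURED/model\.ckpt", "model/model.ckpt"),
    (r"num_steps: \d+", "num_steps: 1000"),
])
print(patched)
```

Matching `\d+` instead of a fixed number means the substitution still works if a different config from the zoo ships with other defaults.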

### Verify writing

Print out the specific lines so we can view the changes.

In [None]:
# Enable what you want to verify

!cat $CONFIG_PATH | grep num_classes
!cat $CONFIG_PATH | grep type
!cat $CONFIG_PATH | grep batch_size
!cat $CONFIG_PATH | grep fine_tune_checkpoint
!cat $CONFIG_PATH | grep num_steps
!cat $CONFIG_PATH | grep decay_steps
!cat $CONFIG_PATH | grep input_path     # first is the train record, second is the test record
!cat $CONFIG_PATH | grep label_map_path # first is the train label map, second is the test label map (they are always the same)
!cat $CONFIG_PATH | grep num_examples
!cat $CONFIG_PATH | grep max_evals
!cat $CONFIG_PATH | grep num_readers

    num_classes: 2
      type: 'ssd_mobilenet_v1'
        loss_type: CLASSIFICATION
  batch_size: 5
# Users should configure the fine_tune_checkpoint field in the train config as
  fine_tune_checkpoint: "model/model.ckpt"
  num_steps: 1000
          decay_steps: 800
# well as the label_map_path and input_path fields in the train_input_reader and
    input_path: "records/train.record"
    input_path: "records/test.record"
# well as the label_map_path and input_path fields in the train_input_reader and
  label_map_path: "ssd_fox_badger.pbtxt"
  label_map_path: "ssd_fox_badger.pbtxt"
  num_examples: 236
  max_evals: 5
  num_readers: 2


### Create the labels file

This script is hard-coded for the badger and fox dataset. It creates the *.pbtxt labels file.

In [None]:
!python helpers/create_pbtxt_labels_file_2.py
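The helper script itself is not shown here. As a rough sketch of what a labels file in the Object Detection API format looks like, the hypothetical function `make_label_map` below renders one `item` block per class. The class order (badger first, fox second) is an assumption and may differ from what the helper writes.

```python
def make_label_map(class_names):
    """Render class names as label-map text; ids start at 1 (0 is reserved for background)."""
    items = []
    for idx, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)


# Assumed class order, for illustration only.
label_map_text = make_label_map(["badger", "fox"])
print(label_map_text)
```

Whatever the order is, the ids in this file must match the class ids written into the record files.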

### Change XML Path

Your XML files probably contain the absolute paths from the machine where the images were annotated. The next step rewrites them so that they point to the correct location.
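A minimal sketch of such a path fix, assuming Pascal VOC style XML with `<filename>` and `<path>` elements. The function `fix_xml_path` and the sample annotation are hypothetical; the actual helper script may work differently.

```python
import xml.etree.ElementTree as ET


def fix_xml_path(xml_text, image_dir):
    """Point <path> at image_dir/<filename>, whatever absolute path it held before."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    if filename is None:
        raise ValueError("annotation has no <filename> element")
    path_node = root.find("path")
    if path_node is None:
        path_node = ET.SubElement(root, "path")
    path_node.text = image_dir + "/" + filename
    return ET.tostring(root, encoding="unicode")


# Made-up annotation with an absolute path from another machine.
sample_xml = """<annotation>
  <filename>baby-fox.jpg</filename>
  <path>/home/someone/Pictures/baby-fox.jpg</path>
</annotation>"""

fixed = fix_xml_path(sample_xml, "Dataset/Train")
print(fixed)
```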

### Create the annotations file

Some call these the labels files, but to me they are the complete class/box annotation files. By default I store them in the annotations folder.
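To make the idea concrete, here is a minimal sketch that flattens one Pascal VOC style XML annotation into CSV rows, one row per bounding box. The column names and the function `xml_to_rows` are assumptions for illustration; the helper script used in the next cell may use a different layout.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column layout, one row per bounding box.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_text):
    """Return one row per <object> found in a single annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


# Made-up annotation used only for this demo.
sample_xml = """<annotation>
  <filename>baby-fox.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>fox</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""

rows = xml_to_rows(sample_xml)

# Write the header and rows to an in-memory CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```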

In [None]:
!python helpers/create_csv_from_xml_3.py

[INFO] - Gathering XML Training Files
[INFO] - Gathering XML Testing Files
[INFO] - Make the array of XML Files
[INFO] - Fixing the paths of the Train XML Files
[INFO] - New path of Train set is = Dataset/Train/asian-badger.jpg
[INFO] - New path of Train set is = Dataset/Train/baby-badger-drinking-figurine.jpg
[INFO] - New path of Train set is = Dataset/Train/baby-fox.jpg
[INFO] - New path of Train set is = Dataset/Train/badger (1).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (10).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (11).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (12).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (2).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (3).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (4).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (5).jpg
[INFO] - New path of Train set is = Dataset/Train/badger (6).jpg
[INFO] - New path of Train set is = D

### Compile tools

Install and compile the tools needed to use the Tensorflow Object Detection API.

In [None]:
%cd /content
!apt-get install protobuf-compiler python-pil python-lxml python-tk
!pip install cython jupyter matplotlib tf_slim
!git clone https://github.com/tensorflow/models.git
%cd /content/models/research
!protoc object_detection/protos/*.proto --python_out=.
!python setup.py build
!python setup.py install
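# Add the research and slim folders to PYTHONPATH (%set_env does not expand `pwd`)
os.environ['PYTHONPATH'] = os.environ.get('PYTHONPATH', '') + ':/content/models/research:/content/models/research/slim'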
!python object_detection/builders/model_builder_test.py
%cd $WORKON_HOME_FLD

/content
Reading package lists... Done
Building dependency tree       
Reading state information... Done
protobuf-compiler is already the newest version (3.0.0-9.1ubuntu1).
python-tk is already the newest version (2.7.17-1~18.04).
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  python-bs4 python-chardet python-html5lib python-olefile
  python-pkg-resources python-six python-webencodings
Suggested packages:
  python-genshi python-lxml-dbg python-lxml-doc python-pil-doc python-pil-dbg
  python-setuptools
The following NEW packages will be installed:
  python-bs4 python-chardet python-html5lib python-lxml python-olefile
  python-pil python-pkg-resources python-six python-webencodings
0 upgraded, 9 newly installed, 0 to remove and 59 not upgraded.
Need to get 1,818 kB of archives.
After this operation, 7,685 kB of additional disk space will be used.



### Creating the records file

Here we create the Tensorflow record files that will be used for training in the next notebook.

In [None]:
!python helpers/generate_tfrecord_4.py --csv_input=$CSV_TRAIN_INPUT_PATH --output_path=$TF_RECORD_TRAIN_FILE
!python helpers/generate_tfrecord_4.py --csv_input=$CSV_TEST_INPUT_PATH --output_path=$TF_RECORD_TEST_FILE

2020-06-28 16:17:01.559042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Successfully created the TFRecords: /content/drive/My Drive/TFOD_API/object_detection/train.record
2020-06-28 16:17:05.255247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Successfully created the TFRecords: /content/drive/My Drive/TFOD_API/object_detection/test.record
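The record generator is not shown in this notebook. Two steps such scripts usually perform before writing each example are grouping the CSV rows by image and scaling pixel boxes to the 0 to 1 range that the Object Detection API expects. A minimal sketch of those two steps in plain Python follows; the function names and sample rows are hypothetical, and the actual serialization to a record file is left out.

```python
from collections import defaultdict


def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Scale pixel coordinates to the 0..1 range relative to the image size."""
    return (xmin / width, ymin / height, xmax / width, ymax / height)


def group_by_image(rows):
    """Collect all annotation rows that belong to the same image file."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["filename"]].append(row)
    return dict(grouped)


# Made-up rows in the shape of an annotations CSV.
rows = [
    {"filename": "a.jpg", "width": 640, "height": 480, "class": "fox",
     "xmin": 160, "ymin": 120, "xmax": 480, "ymax": 360},
    {"filename": "a.jpg", "width": 640, "height": 480, "class": "badger",
     "xmin": 0, "ymin": 0, "xmax": 320, "ymax": 240},
    {"filename": "b.jpg", "width": 800, "height": 600, "class": "fox",
     "xmin": 200, "ymin": 150, "xmax": 400, "ymax": 300},
]

grouped = group_by_image(rows)
first = rows[0]
box = normalize_box(first["xmin"], first["ymin"], first["xmax"], first["ymax"],
                    first["width"], first["height"])
print(sorted(grouped), box)
```

Each group then becomes one training example holding the image and all of its boxes.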


### Conclusion

Check the project tree: you now have the complete structure needed for the next notebook, **Tensorflow Object Detection API Training**.