# Introduction

We will take the following steps to implement EfficientDet-D0 on our custom data:
* Install TensorFlow2 Object Detection Dependencies
* Write Custom TensorFlow2 Object Detection Training Configuration
* Train Custom TensorFlow2 Object Detection Model
* Export Custom TensorFlow2 Object Detection Weights

The inference made with our model can be found in the inference folder.

# Step 1: Install TensorFlow2 Object Detection Dependencies

In [1]:
!pip install gitpython

You should consider upgrading via the '/anaconda/envs/py38_default/bin/python -m pip install --upgrade pip' command.


In [2]:
import os
import pathlib
from git import Repo
import logging

logging.basicConfig(level=logging.INFO)

models_folder = os.path.join(os.getcwd(), "models")

# Clone the tensorflow models repository if it doesn't already exist in this folder
if "models" in pathlib.Path.cwd().parts:
    while "models" in pathlib.Path.cwd().parts:
        os.chdir('..')

    logging.info("The models have already been uploaded. Change working directory to the models folder.")

elif not pathlib.Path('models').exists():
    os.mkdir("./models")
    repo = Repo.clone_from(
        'https://github.com/tensorflow/models.git',
        models_folder,
        depth=1,
        branch='master',
    )

    logging.info("The models have now been loaded from the tensorflow/models.git repo.")
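The `while ... os.chdir('..')` loop above walks up until the working directory is no longer inside a `models` folder. As a minimal sketch, the same target folder can be computed on a path value without changing directory. The helper name `parent_outside` and the example path are hypothetical, not part of the notebook.

```python
from pathlib import PurePosixPath


def parent_outside(path: PurePosixPath, folder_name: str) -> PurePosixPath:
    """Return the closest ancestor of `path` with no component named `folder_name`."""
    current = path
    # Same idea as the chdir loop: step up one level at a time
    while folder_name in current.parts:
        current = current.parent
    return current


# Hypothetical path, for illustration only
start = PurePosixPath("/work/training/models/research")
print(parent_outside(start, "models"))
```

Computing the path first makes it easier to log or inspect the destination before calling `os.chdir` on it.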

## Step 1.1: PyCoco library

In [3]:
pycoco_folder = os.path.join(os.getcwd(), "pycoco")

# Clone the pycoco repository if it doesn't exist. It is needed to avoid clashes with the TF2API
if "pycoco" in pathlib.Path.cwd().parts:
    while "pycoco" in pathlib.Path.cwd().parts:
        os.chdir('..')

    logging.info("The pycoco repository is already present. Moved the working directory out of the pycoco folder.")

elif not pathlib.Path('pycoco').exists():
    os.mkdir("./pycoco")
    repo = Repo.clone_from(
        'https://github.com/cocodataset/cocoapi.git',
        pycoco_folder, 
        branch="master"
    )

    logging.info("The cocoapi repository has now been cloned into the pycoco folder.")

The following steps avoid build problems with pycocotools:

1. Clone the official repository.
2. Navigate to the PythonAPI folder and open the setup.py file.
3. Edit line 12 to read extra_compile_args=[]. This removes the Clang-specific arguments, which do not work with MSVC.
4. Run the command in the next cell.

That command builds the extension in place inside the PythonAPI folder. To test whether the build succeeded, start Python from that folder and run: import pycocotools.
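The manual edit of `extra_compile_args` described in the list above could also be scripted. The following is a minimal sketch, assuming the argument is written as a single list literal in setup.py. The function name `clear_extra_compile_args` and the sample line are illustrative, so check the real file before overwriting it.

```python
import re


def clear_extra_compile_args(setup_source: str) -> str:
    """Replace whatever list is assigned to extra_compile_args with an empty list."""
    pattern = r"extra_compile_args\s*=\s*\[[^\]]*\]"
    return re.sub(pattern, "extra_compile_args=[]", setup_source)


# Illustrative line; the actual contents of setup.py may differ
sample = "extra_compile_args=['-Wno-cpp', '-Wno-unused-function', '-std=c99'],"
print(clear_extra_compile_args(sample))
```

To apply it, read `setup.py` into a string, pass it through the function, and write the result back.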

In [4]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/pycoco/PythonAPI
!python setup.py build_ext --inplace

/home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/pycoco/PythonAPI
running build_ext
skipping 'pycocotools/_mask.c' Cython extension (up-to-date)
copying build/lib.linux-x86_64-3.8/pycocotools/_mask.cpython-38-x86_64-linux-gnu.so -> pycocotools


## Step 1.2: Changes to the Models folder

1. Navigate to “./research/object_detection/packages/tf2/” and edit the setup.py file. From the REQUIRED_PACKAGES list, delete the pycocotools reference (line 20). This change will prevent the installation process from trying to reinstall pycocotools from pip, which would fail and abort the whole process.
2. Copy this setup.py file to the “./research” folder, replacing the setup.py that was already there.
3. Once you're done, run the following line.

In [5]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research
!protoc object_detection/protos/*.proto --python_out=.
%cp /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research/object_detection/packages/tf2/setup.py .
!pip install .

/home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research
Processing /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.


Collecting h5py~=3.1.0
  Using cached h5py-3.1.0-cp38-cp38-manylinux1_x86_64.whl (4.4 MB)
Collecting numpy>=1.15.4
  Using cached numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB)


Building wheels for collected packages: object-detection
  Building wheel for object-detection (setup.py) ... done
  Created wheel for object-detection: filename=object_detection-0.1-py3-none-any.whl size=1679343 sha256=1b5e3f8b92dcd941c211464759d37ca68a41226b59cb5b0653353cb51b251ed9
  Stored in directory: /tmp/pip-ephem-wheel-cache-bhu658bj/wheels/a4/27/31/b41a2f9b118ebb35237b34adc3f408b0c60bd7f122d0a7eb79
Successfully built object-detection
Installing collected packages: numpy, h5py, object-detection
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.4
    Uninstalling numpy-1.21.4:
      Successfully uninstalled numpy-1.21.4
  Attempting uninstall: h5py
    Found existing installation: h5py 2.9.0
    Uninstalling h5py-2.9.0:
      Successfully uninstalled h5py-2.9.0
  Attempting uninstall: object-detection
    Found existing installation: object-detection 0.1
    Uninstalling object-detection-0.1:
      Successfully uninstalled object-detection-0.1
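Step 1 of this section (deleting the pycocotools entry from REQUIRED_PACKAGES) can be sketched as a small text filter. This assumes each requirement sits on its own line inside the list. The helper `drop_requirement` and the sample text are hypothetical and are not copied from the repository.

```python
import re


def drop_requirement(setup_source: str, package: str) -> str:
    """Remove lines whose only content is a quoted requirement for `package`."""
    kept = []
    for line in setup_source.splitlines(keepends=True):
        # Strip indentation, the trailing comma and the surrounding quotes
        entry = line.strip().strip(",").strip("'\"")
        # Compare the name before any version specifier (e.g. pkg>=1.0)
        name = re.split(r"[<>=!~\s]", entry, maxsplit=1)[0]
        if name == package:
            continue
        kept.append(line)
    return "".join(kept)


# Illustrative snippet, not the real file contents
sample_setup = "REQUIRED_PACKAGES = [\n    'pillow',\n    'pycocotools',\n    'lvis',\n]\n"
print(drop_requirement(sample_setup, "pycocotools"))
```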

## Step 1.3: Uninstall and install h5py

Uninstall h5py and reinstall version 2.9; otherwise model training may run into problems.
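To confirm which h5py release is active once the cells below have run, a check along these lines may help. It is a minimal sketch using the standard-library `importlib.metadata`; the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_release(version: Optional[str], release: str) -> bool:
    """True when `version` equals `release` or is a patch of it (2.9 matches 2.9.0)."""
    return version is not None and (
        version == release or version.startswith(release + ".")
    )


print("h5py 2.9 active:", matches_release(installed_version("h5py"), "2.9"))
```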

In [6]:
!pip uninstall h5py -y

Found existing installation: h5py 3.1.0
Uninstalling h5py-3.1.0:
  Successfully uninstalled h5py-3.1.0


In [7]:
!pip install h5py==2.9

Collecting h5py==2.9
  Using cached h5py-2.9.0-cp38-cp38-manylinux1_x86_64.whl (2.8 MB)
Installing collected packages: h5py
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.6.1 requires h5py~=3.1.0, but you have h5py 2.9.0 which is incompatible.
tensorflow-gpu 2.5.0 requires grpcio~=1.34.0, but you have grpcio 1.41.1 which is incompatible.
tensorflow-gpu 2.5.0 requires h5py~=3.1.0, but you have h5py 2.9.0 which is incompatible.
tensorflow-gpu 2.5.0 requires tensorflow-estimator<2.6.0,>=2.5.0rc0, but you have tensorflow-estimator 2.2.0 which is incompatible.
Successfully installed h5py-2.9.0


# Step 2: Prepare the model for training

Once everything is installed, import all the libraries that are needed and launch a sample training to check that everything works smoothly. 

In [8]:
import matplotlib
import matplotlib.pyplot as plt

import os
import random
import io
import imageio
import glob
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
from IPython.display import display, Javascript
from IPython.display import Image as IPyImage

import tensorflow as tf

from models.research.object_detection.utils import label_map_util
from models.research.object_detection.utils import config_util
from models.research.object_detection.utils import visualization_utils as viz_utils
from models.research.object_detection.builders import model_builder

%matplotlib inline

Run a pip freeze to see whether tensorflow-gpu is installed, and run the test to see everything works smoothly.
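The pip freeze check can be sketched in Python as follows. The parsing helper `tensorflow_packages` is hypothetical; it only handles the usual `name==version` lines and skips anything else.

```python
import subprocess
import sys


def tensorflow_packages(freeze_output: str) -> dict:
    """Collect name -> version for packages whose name starts with 'tensorflow'."""
    found = {}
    for line in freeze_output.splitlines():
        name, sep, version = line.partition("==")
        if sep and name.lower().startswith("tensorflow"):
            found[name] = version
    return found


def pip_freeze() -> str:
    """Run `pip freeze` with the current interpreter and return its output."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `tensorflow_packages(pip_freeze())` in a cell returns a dictionary that should contain a `tensorflow-gpu` key if that package is installed.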

In [None]:
#run model builder test
!python /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research/object_detection/builders/model_builder_tf2_test.py

## Step 2.1: Import the data

Change the current directory before building paths. This directory change recurs throughout the notebook so that relative paths resolve consistently.

Also remember to change the file names so that they match your own data.

In [None]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training

In [None]:
import zipfile

# Set the pictures directory to be the one containing your train and validation folders of interest
picture_files_directory = "/home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/pictures/"

# Set the names of the folders to respect the right path to the TFRecords
test_record_fname = os.path.join(picture_files_directory,"output_tfrecords_v2/valid/merged_logos.tfrecord")
train_record_fname = os.path.join(picture_files_directory,"output_tfrecords_v2/train/merged_logos.tfrecord")
label_map_pbtxt_fname = os.path.join(picture_files_directory, "output_tfrecords_v2/train/logos_label_map.pbtxt")

print(train_record_fname,label_map_pbtxt_fname, sep="\n")

## Step 2.2: Configure Custom TensorFlow2 Object Detection Training Configuration

To be able to use different models, we populated the file ModelSettings.py with the models we thought would be good to train.

From the Model Zoo, pick the pretrained_checkpoint (a tar.gz archive) and the model_name, which must match the name of that archive:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md

The *raw* version of the Model Zoo page on GitHub shows the names of the tar.gz archives.

The base configuration files are listed here:
https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2
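The contents of ModelSettings.py are not shown in this notebook. As a hedged sketch, one entry might look like the following. The four keys are the ones read later in the notebook, while the values (archive name, config file name, batch size) are assumptions that should be verified against the two pages linked above.

```python
def Model_Setting() -> dict:
    """Return per-model settings keyed by a short model identifier."""
    return {
        "efficientdet-d0": {
            # Assumed values: verify the names against the Model Zoo and configs pages
            "model_name": "efficientdet_d0_coco17_tpu-32",
            "base_pipeline_file": "ssd_efficientdet_d0_512x512_coco17_tpu-8.config",
            "pretrained_checkpoint": "efficientdet_d0_coco17_tpu-32.tar.gz",
            "batch_size": 16,
        },
    }


settings = Model_Setting()["efficientdet-d0"]
print(settings["model_name"], settings["pretrained_checkpoint"])
```

Keeping model_name equal to the archive name without its extension is what lets the later cells locate the extracted checkpoint folder.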


In [None]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/

In [None]:
!pip install prettyprinter

In [None]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/

In [None]:
# Step 1: import model settings
# For each model, this file returns important info to actually use the model

from ModelSettings import Model_Setting
from prettyprinter import pprint

MODELS_CONFIG = Model_Setting()

pprint(MODELS_CONFIG)

In [None]:
# Step 2: choose the model and extract relevant info

chosen_model = 'efficientdet-d0'

model_name = MODELS_CONFIG[chosen_model]['model_name']
pretrained_checkpoint = MODELS_CONFIG[chosen_model]['pretrained_checkpoint']
base_pipeline_file = MODELS_CONFIG[chosen_model]['base_pipeline_file']
batch_size = MODELS_CONFIG[chosen_model]['batch_size'] #if you can fit a large batch in memory, it may speed up your training

In [None]:
# The more steps, the longer the training. 
# Increase if your loss function is still decreasing and validation metrics are increasing. 
num_steps = 500000

#Perform evaluation after so many steps
num_eval_steps = 3000 

#### EXTRA: Deploy folder structure

Inside the "research" folder, we will create a "deploy" folder in which we will dump all the data related to the model used and its specific configuration. <br>
For this reason, the structure of the deploy folder is as follows:

Deploy:<br>
&nbsp;&nbsp;&nbsp;&nbsp;Model A:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Config 1<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Config 2<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...<br>

This means that, once we choose a model:
1. If there is no folder within "deploy" with the model name, create it and create the Config 1 folder inside it.
2. If there is a folder with the name of the model, check whether the configuration of the current model matches one already stored in that folder. If not, create a new Config folder.
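The two rules above boil down to a lookup followed by a fallback. Here is a minimal sketch of that decision, separated from any file-system work. The function name `pick_config_folder` and the `config_N` naming are assumptions made for illustration.

```python
from typing import Dict, List


def pick_config_folder(existing: Dict[str, List[str]], current: List[str]) -> str:
    """Return the folder whose stored values match `current`, else the next free name."""
    for folder, values in existing.items():
        if values == current:
            return folder  # configuration already used: reuse its folder
    return f"config_{len(existing) + 1}"  # new configuration: next numbered folder


# Values are [batch_size, num_steps, num_classes] as strings (illustrative numbers)
known = {"config_1": ["16", "500000", "3"]}
print(pick_config_folder(known, ["16", "500000", "3"]))
print(pick_config_folder(known, ["8", "500000", "3"]))
```

An empty dictionary covers the first rule, since a model that was never used has no stored configurations.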

In [None]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research

In [None]:
# If it does not exist already, create the 'deploy' folder inside training/models/research

main_deploy_folder = '/home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research/deploy'

if "deploy" not in os.listdir(os.getcwd()):
    os.mkdir(main_deploy_folder)

In [None]:
import re

def extract_configs_for_model(chosen_model):
    
    # This is a dict with config folder names as keys and values of the config as values
    folder_to_values = dict()
    
    model_path = os.path.join(main_deploy_folder, chosen_model)
    
    for config_folder in os.listdir(model_path):
        if not config_folder == ".ipynb_checkpoints":
            print(config_folder)
            config_path = os.path.join(model_path, config_folder)

            config_file = os.path.join(config_path,r'pipeline_file.config')
            config_values = list()

            with open(config_file) as f:
                file = f.read()

                # Extract all values except the path of the data
                # This means that if we train the same config of a model on a different version of the data, this will overwrite the results
                # TODO: add path of the data as well?
                # TODO: add fine tune check points?
                config_values.append(re.search('batch_size: [0-9]+', file).group()[len('batch_size: '):])
                config_values.append(re.search('num_steps: [0-9]+', file).group()[len('num_steps: '):])
                config_values.append(re.search('num_classes: [0-9]+', file).group()[len('num_classes: '):])

            folder_to_values[config_folder] = config_values
        
    return folder_to_values

In [None]:
def get_num_classes(pbtxt_fname):
    from object_detection.utils import label_map_util
    label_map = label_map_util.load_labelmap(pbtxt_fname)
    categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=90, use_display_name=True)
    category_index = label_map_util.create_category_index(categories)
    return len(category_index.keys())

In [None]:
num_classes = get_num_classes(label_map_pbtxt_fname)
print(num_classes)

In [None]:
current_config = [
    str(batch_size),
    str(num_steps),
    str(num_classes)
]

current_config

In [None]:
chosen_model

In [None]:
# If the model has never been used, then create folder for the model and for the current config, the latter inside the former

def update_repo_structure(chosen_model):
    
    model_folder = main_deploy_folder + '/' + chosen_model
    
    # TODO: it has to be folder, not file
    if chosen_model not in os.listdir(main_deploy_folder):
        # Case 1: model never used
        os.mkdir(model_folder)

        config_folder = model_folder + '/config_1'
        os.mkdir(config_folder)

        print('case1')
        print(config_folder)

    else:
        # Case 2: model already used

        list_configs = extract_configs_for_model(chosen_model)
        print(list_configs)

        if current_config in list(list_configs.values()):
            
            # Case A: Specifics configs per model already used
            for key in list(list_configs.keys()):
                if list_configs[key] == current_config:
                    config_folder = key
                    print('case a')
                    print(config_folder)

        else:
            # Case B: new configs
            config_folder = model_folder + f'/config_{len(list_configs)+1}'
            os.mkdir(config_folder)
            print('case b')
            print(config_folder)
            
    return config_folder

In [None]:
# Obtain the proper config folder to use in the next cells 

config_subfolder = update_repo_structure(chosen_model)

In [None]:
config_folder = os.path.join(os.path.join(main_deploy_folder, chosen_model),config_subfolder)
config_folder

In [None]:
# Step 3.a: using info from step 2, download the weights of the model

import tarfile
import requests

download_tar = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/' + pretrained_checkpoint

file_to_be_opened = os.path.join(config_folder, pretrained_checkpoint)

# Download the tar.gz archive; stop early if the request fails
response = requests.get(download_tar, stream=True)
response.raise_for_status()
with open(file_to_be_opened, 'wb') as f:
    f.write(response.raw.read())

# Extract the archive into the config folder
with tarfile.open(file_to_be_opened) as tar:
    tar.extractall(path=config_folder)

# TODO: once the tar has been extracted, delete the tar file
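One way to address the TODO above is to extract and delete in a single helper. This is a minimal sketch; the function name `extract_and_remove` is hypothetical, and it assumes the archive comes from a trusted source.

```python
import os
import tarfile


def extract_and_remove(archive_path: str, destination: str) -> list:
    """Extract a tar archive into `destination`, delete it, and return member names."""
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        tar.extractall(path=destination)
    # Only reached if extraction did not raise, so a failed run keeps the archive
    os.remove(archive_path)
    return names
```

In the notebook it would be called as `extract_and_remove(file_to_be_opened, config_folder)` in place of the extraction step.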

In [None]:
# Step 3.b: using info from step 2, download base training configuration file

download_config = 'https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/configs/tf2/' + base_pipeline_file

abrir = os.path.join(config_folder, base_pipeline_file)

response = requests.get(download_config, stream=True)
if response.status_code == 200:
    with open(abrir, 'wb') as f:
        f.write(response.content)

In [None]:
pipeline_fname = os.path.join(config_folder, base_pipeline_file)
print(pipeline_fname)

fine_tune_checkpoint = os.path.join(config_folder, model_name,"checkpoint", "ckpt-0")
print(fine_tune_checkpoint)

In [None]:
# Write custom configuration file by slotting our dataset, model checkpoint, and training parameters into 
# the base pipeline file

import re

print('writing custom configuration file')

with open(pipeline_fname) as f:
    s = f.read()

with open(os.path.join(config_folder, r'pipeline_file.config'), 'w') as f:
    
    # fine_tune_checkpoint
    s = re.sub('fine_tune_checkpoint: ".*?"',
               f'fine_tune_checkpoint: "{fine_tune_checkpoint}"', s)
    
    logging.info("Written fine tune checkpoint")
    
    # tfrecord files train and test.
    s = re.sub(
        '(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")', f'input_path: "{train_record_fname}"', s)
    s = re.sub(
        '(input_path: ".*?)(PATH_TO_BE_CONFIGURED/val)(.*?")', f'input_path: "{test_record_fname}"', s)
    
    logging.info("Written input path")

    # label_map_path
    s = re.sub(
        'label_map_path: ".*?"', f'label_map_path: "{label_map_pbtxt_fname}"', s)
    
    logging.info("Written label map")

    # Set training batch_size.
    s = re.sub('batch_size: [0-9]+',
               f'batch_size: {batch_size}', s)

    # Set training steps, num_steps
    s = re.sub('num_steps: [0-9]+',
               f'num_steps: {num_steps}', s)
    
    # Set number of classes num_classes.
    s = re.sub('num_classes: [0-9]+',
               f'num_classes: {num_classes}', s)
    
    # Set the base learning rate, learning_rate_base.
    s = re.sub('learning_rate_base: [a-z.0-9-]+',
               f'learning_rate_base: 0.08', s)
    
    # Set the warmup learning rate, warmup_learning_rate.
    s = re.sub('warmup_learning_rate: [a-z.0-9-]+',
               f'warmup_learning_rate: 0.001', s)
    
    #fine-tune checkpoint type
    s = re.sub(
        'fine_tune_checkpoint_type: "classification"', 
        'fine_tune_checkpoint_type: "{}"'.format('detection'), s)
    
    f.write(s)
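The substitutions in the cell above all follow one pattern: find `field: <old value>` and rewrite it. A minimal sketch on a toy config string shows the effect. The helper `set_numeric_field` and the sample values are illustrative only.

```python
import re


def set_numeric_field(config_text: str, field: str, value: int) -> str:
    """Replace every `field: <integer>` occurrence with the given value."""
    pattern = rf"{re.escape(field)}: [0-9]+"
    return re.sub(pattern, f"{field}: {value}", config_text)


# Toy stand-in for a pipeline config file
base = "num_classes: 90\nbatch_size: 128\nnum_steps: 300000\n"
custom = set_numeric_field(base, "batch_size", 16)
print(custom)
```

Because `re.sub` replaces every match, a field that appears more than once in the file (for example in both the training and evaluation sections) is changed everywhere.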

In [None]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/

In [None]:
pipeline_file = os.path.join(config_folder, 'pipeline_file.config')
pipeline_file

In [None]:
# Create the TENSOR_RESULTS directory, to store our models

if "TENSOR_RESULTS" not in os.listdir(os.getcwd()):
    os.mkdir(os.path.join(os.getcwd(),"TENSOR_RESULTS"))
    logging.info("Creating the directory TENSOR_RESULTS because it did not exist") 
else:
    logging.info("The directory TENSOR_RESULTS is already present, files will be stored there") 

In [None]:
tensor_results_directory = '/home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/TENSOR_RESULTS'

model_run_directory = os.path.join('/home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/TENSOR_RESULTS',
                                  chosen_model)

# Similarly to what we did for the configurations, we generate model results subdirectories inside TENSOR_RESULTS to then
# be able to restore the already-trained checkpoints

if chosen_model not in os.listdir(tensor_results_directory):
    try:
        os.mkdir(os.path.join(tensor_results_directory, chosen_model))
        logging.info(f"The folder model_run_directory is set to be: \n {model_run_directory}")
    except FileExistsError:
        logging.info(f"FILEEXISTSERROR: The folder model_run_directory is set to be: \n {model_run_directory}")
else:
    logging.info(f"The folder model_run_directory WAS ALREADY PRESENT and is set to be: \n {model_run_directory}")

model_dir = os.path.join(model_run_directory, config_subfolder.split("/")[-1])

if config_subfolder.split("/")[-1] not in os.listdir(model_run_directory):
    try:
        os.mkdir(model_dir)
        logging.info(f"The folder model_dir is set to be: \n {model_dir}")
    except FileExistsError:
        logging.info(f"FILEEXISTSERROR: The folder model_dir WAS ALREADY PRESENT and is set to be: \n {model_dir}")
else:
    logging.info(f"The folder model_dir WAS ALREADY PRESENT and is set to be: \n {model_dir}")
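The nested existence checks above could be condensed with `pathlib`. This is a minimal sketch, assuming it is acceptable to create all missing parent folders in one call. The helper name `ensure_dir` is hypothetical.

```python
from pathlib import Path


def ensure_dir(*parts: str) -> Path:
    """Create the directory (and any missing parents) if needed and return its path."""
    directory = Path(*parts)
    # exist_ok=True makes the call safe to repeat across notebook runs
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

For instance, `ensure_dir(tensor_results_directory, chosen_model, "config_1")` would create the model and config folders in one step.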

# Step 3: Train Custom TF2 Object Detector

With this information, we can start training the model:

* pipeline_file: the custom training configuration written above
* model_dir: the location where TensorBoard logs and model checkpoints are saved
* num_train_steps: how long to train for
* num_eval_steps: perform eval on validation set after this many steps
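To see how these four values end up on the command line, the training call can be assembled as an argument list. This is a sketch only: the flags mirror the training cell in Step 3.1, while the function name and the example paths are hypothetical.

```python
from typing import List


def build_train_command(script: str, pipeline_file: str, model_dir: str,
                        num_steps: int, num_eval_steps: int) -> List[str]:
    """Assemble the training command as a list suitable for subprocess.run."""
    return [
        "python", "-u", script,
        f"--pipeline_config_path={pipeline_file}",
        f"--model_dir={model_dir}",
        "--alsologtostderr",
        f"--num_train_steps={num_steps}",
        "--sample_1_of_n_eval_examples=1",
        f"--num_eval_steps={num_eval_steps}",
    ]


# Placeholder paths, for illustration only
command = build_train_command(
    "model_main_tf2.py", "pipeline_file.config", "TENSOR_RESULTS/run", 500000, 3000
)
print(" ".join(command))
```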

In [None]:
print("PIPELINE FILE: " + str(pipeline_file), 
      "MODEL DIRECTORY: " + str(model_dir), 
      "NUMBER OF STEPS: " + str(num_steps), 
      "NUMBER OF EVALUATION STEPS: " + str(num_eval_steps), 
      sep="\n\n")

In [None]:
!pip install --upgrade numpy

In [None]:
# Check for GPUs and enable memory growth so TensorFlow allocates GPU memory on demand

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

## Step 3.1: Fire the training

In [None]:
!python -u /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path={pipeline_file} \
    --model_dir={model_dir} \
    --alsologtostderr \
    --num_train_steps={num_steps} \
    --sample_1_of_n_eval_examples=1 \
    --num_eval_steps={num_eval_steps} 2>&1 | sed -e "/nan/q9";echo $? > exitcode

## Step 3.2: Fire the evaluation

In [None]:
#run model evaluation to obtain performance metrics

!python /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path={pipeline_file} \
    --model_dir={model_dir} \
    --checkpoint_dir={model_dir}

#Not yet implemented for EfficientDet

In [None]:
current_training_directory = os.path.join(model_dir, "train")

# Step 4: Exporting a Trained Inference Graph
We can now export the model.

In [None]:
#see where our model saved weights
%ls $model_dir

In [None]:
%cd /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/

In [None]:
model_dir

In [None]:
#run conversion script
import re
import numpy as np

fine_tuned_directory = '/home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/FINE_TUNED_MODEL'

if "FINE_TUNED_MODEL" not in os.listdir(os.getcwd()):
    os.mkdir(fine_tuned_directory)
    logging.info("Creating the directory FINE_TUNED_MODEL because it did not exist") 
else:
    logging.info("The directory FINE_TUNED_MODEL is already present, files will be stored there")
    
model_fine_tuned_directory = os.path.join(fine_tuned_directory, chosen_model)

if chosen_model not in os.listdir(fine_tuned_directory):
    try:
        os.mkdir(model_fine_tuned_directory)
        logging.info(f"The folder model_fine_tuned_directory is set to be: \n {model_fine_tuned_directory}")
    except FileExistsError:
        logging.info(f"FILEEXISTSERROR: The folder model_fine_tuned_directory is set to be: \n {model_fine_tuned_directory}")
else:
    logging.info(f"The folder model_fine_tuned_directory WAS ALREADY PRESENT and is set to be: \n {model_fine_tuned_directory}")

output_directory = os.path.join(model_fine_tuned_directory, config_subfolder.split("/")[-1])

if config_subfolder.split("/")[-1] not in os.listdir(model_fine_tuned_directory):
    try:
        os.mkdir(output_directory)
        logging.info(f"The folder output_directory is set to be: \n {output_directory}")
    except FileExistsError:
        logging.info(f"FILEEXISTSERROR: The folder output_directory WAS ALREADY PRESENT and is set to be: \n {output_directory}")

# Place the model weights you would like to export here
last_model_path = model_dir
print(last_model_path)
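Before exporting, it can be useful to check which checkpoint in `last_model_path` is the newest. The sketch below assumes checkpoints are saved with names of the form `ckpt-N.index`, matching the `ckpt-0` prefix used for the fine-tune checkpoint. The helper name is hypothetical.

```python
import re
from typing import Iterable, Optional


def latest_checkpoint_prefix(filenames: Iterable[str]) -> Optional[str]:
    """Return the `ckpt-N` prefix with the highest N, or None if there is none."""
    numbers = []
    for name in filenames:
        match = re.fullmatch(r"ckpt-(\d+)\.index", name)
        if match:
            numbers.append(int(match.group(1)))
    # Compare numerically so that ckpt-10 ranks above ckpt-9
    return f"ckpt-{max(numbers)}" if numbers else None
```

In the notebook it could be called as `latest_checkpoint_prefix(os.listdir(last_model_path))`.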

In [None]:
!python /home/labuser/LogoDet/LogoDetection_DSBAProject/training_process/training/models/research/object_detection/exporter_main_v2.py \
    --trained_checkpoint_dir {last_model_path} \
    --output_directory {output_directory} \
    --pipeline_config_path {pipeline_file}

In [None]:
saved_model_directory = os.path.join(output_directory, "saved_model")

In [None]:
%ls $saved_model_directory