## Training

Let's prepare our dataset and do the training!

The code below runs the *xmlconversion.py* script; the *--verbose* flag prints details about the results.

In [1]:
!python xmlconversion.py --verbose

images/Pica_pica_506.xml
images/periparus_ater_337.xml
images/ErithacusRubecula0060.xml
images/periparus_ater_3.xml
images/Pica_pica_197.xml
images/periparus_ater_469.xml
images/Pica_pica_674.xml
images/periparus_ater_157.xml
images/Pica_pica_398.xml
images/ErithacusRubecula0099.xml
images/Pica_pica_271.xml
images/periparus_ater_402.xml
images/Pica_pica_697.xml
images/Pica_pica_456.xml
images/periparus_ater_218.xml
images/ErithacusRubecula0219.xml
images/ErithacusRubecula0290.xml
images/ErithacusRubecula0621.xml
images/periparus_ater_41.xml
images/periparus_ater_99.xml
images/periparus_ater_72.xml
images/ErithacusRubecula0759.xml
images/Pica_pica_39.xml
images/periparus_ater_78.xml
images/Pica_pica_799.xml
images/ErithacusRubecula0719.xml
images/periparus_ater_220.xml
images/Pica_pica_513.xml
images/Pica_pica_341.xml
images/Pica_pica_518.xml
images/periparus_ater_577.xml
images/ErithacusRubecula0229.xml
images/Pica_pica_155.xml
images/Pica_pica_788.xml
images/Pica_pica_172.xml
images/P

### Doing train/test split
The code below runs the *partition_dataset.py* script, which splits the dataset in the *images* folder into train and test sets with a 90/10 split (set here by *-r 0.1*).

In [2]:
!python partition_dataset.py -x -i ./images -r 0.1
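
To make the 90/10 split concrete, here is a minimal sketch of a shuffled partition. It is not the code of *partition_dataset.py*; the function name `split_dataset`, the fixed seed, and the file names are assumptions made for illustration only.

```python
import random

def split_dataset(filenames, test_ratio=0.1, seed=42):
    """Shuffle the file names and split them into (train, test) lists."""
    files = sorted(filenames)              # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(files)     # seeded shuffle, independent of global state
    n_test = int(round(len(files) * test_ratio))
    return files[n_test:], files[:n_test]  # train, test

# Hypothetical file names, for illustration only
images = [f"bird_{i}.jpg" for i in range(100)]
train, test = split_dataset(images, test_ratio=0.1)
print(len(train), len(test))  # 90 10
```

Shuffling before slicing matters here because the images are grouped by species in their file names, so an unshuffled split could leave one species out of the test set.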

The code below opens the *label_map.pbtxt* file, where we add each label with its unique id.

In [3]:
!code './data/label_map.pbtxt'
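
As a rough guide to what goes into the label map, the sketch below builds the text for three `item` entries. The class names are an assumption inferred from the annotation file names listed earlier; the real names must match the labels used inside the XML annotations. Ids start at 1 because the Object Detection API reserves 0 for the background class.

```python
def build_label_map(class_names):
    """Return label map text in pbtxt form, with ids starting at 1."""
    items = []
    for idx, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
    return "\n".join(items)

# Assumed class names (inferred from the file names, not confirmed by the annotations)
classes = ["Pica_pica", "periparus_ater", "ErithacusRubecula"]
label_map_text = build_label_map(classes)
print(label_map_text)
```

The printed text can be pasted into *label_map.pbtxt* by hand, which keeps the ids and names in one place.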

### Creating TFRecord
The code below uses the *generate_tfrecord.py* script to generate TFRecord files in the *data* folder for the train and test sets partitioned above.<br>
Converting to the TFRecord format improves efficiency: the records can take up less space than the original data and allow fast I/O, which is especially important when training on GPU or TPU devices.

In [4]:
!python generate_tfrecord.py -x images/train -l data/label_map.pbtxt -o data/train.record

Successfully created the TFRecord file: data/train.record


In [5]:
!python generate_tfrecord.py -x images/test -l data/label_map.pbtxt -o data/test.record

Successfully created the TFRecord file: data/test.record
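
Before the boxes are written into a TFRecord, each XML annotation has to be read and its pixel coordinates scaled to the [0, 1] range. The sketch below shows that step only, using the standard library. It assumes Pascal VOC style XML (as produced by tools such as labelImg); the sample annotation and the function name `parse_annotation` are made up for illustration and are not taken from *generate_tfrecord.py*.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in Pascal VOC style, for illustration only
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>1024</width><height>768</height></size>
  <object>
    <name>Pica_pica</name>
    <bndbox><xmin>256</xmin><ymin>192</ymin><xmax>512</xmax><ymax>384</ymax></bndbox>
  </object>
</annotation>
"""

def parse_annotation(xml_text):
    """Return (filename, boxes) with box coordinates normalized by image size."""
    root = ET.fromstring(xml_text.strip())
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        boxes.append({
            "label": obj.find("name").text,
            "xmin": int(box.find("xmin").text) / width,
            "ymin": int(box.find("ymin").text) / height,
            "xmax": int(box.find("xmax").text) / width,
            "ymax": int(box.find("ymax").text) / height,
        })
    return root.find("filename").text, boxes

filename, boxes = parse_annotation(SAMPLE_XML)
print(filename, boxes)
```

Normalized coordinates keep the annotations valid when the input pipeline resizes the images.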


### Setting model path
The code below sets the name of the model that we are going to use. <br>

I chose **Faster R-CNN** with ResNet101 as the base network for feature extraction rather than VGG because ResNet uses fewer kernels and improves network performance as more layers are added. I use a 1024x1024 image size because the birds we are identifying take up only a small portion of each image, so reducing the image size could lose many of their valuable details.

In [3]:
PATH_TO_MODEL="faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8"

### Configuring the config file
The code below opens the *pipeline.config* file, where we can set the parameter values required for training our model. Here I have changed the number of classes to 3.<br>
I tried batch sizes of 64, 32, 16, and 10 to improve model performance, but each time the training stalled at one point for a very long time and never started, so I set the batch size to 4.

In [7]:
!code ./training/TF2/training/{PATH_TO_MODEL}/pipeline.config
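
The two changes described above (number of classes and batch size) can also be applied programmatically. The sketch below does this with a plain text substitution on a shortened, made-up config excerpt; the helper `set_field` is hypothetical and is not part of the Object Detection API. It replaces only the first match of a field, so a real config file should be checked afterwards.

```python
import re

# A shortened, made-up excerpt in the style of a pipeline.config file
SAMPLE_CONFIG = """model {
  faster_rcnn {
    num_classes: 90
  }
}
train_config {
  batch_size: 64
}
"""

def set_field(config_text, field, value):
    """Replace the first `field: <number>` occurrence with the new value."""
    pattern = rf"(\b{re.escape(field)}:\s*)\d+"
    new_text, count = re.subn(pattern, rf"\g<1>{value}", config_text, count=1)
    if count == 0:
        raise KeyError(f"{field} not found in config")
    return new_text

updated = set_field(SAMPLE_CONFIG, "num_classes", 3)
updated = set_field(updated, "batch_size", 4)
print(updated)
```

Editing the file by hand, as done in this notebook, is just as valid; a scripted edit mainly helps when several batch sizes are being compared.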

### Training the model
Let's train our model with the *model_main_tf2.py* script, passing it the model directory and the path to the *pipeline.config* file that holds our parameter values.<br>

I started with 5000 steps and gradually increased to 20000 before settling on 10000 steps, which gave good performance and accuracy. Above 10000 steps the model performance was nearly identical; below it, the loss was not satisfactory according to the TensorBoard graphs. At 5000 steps in particular, the model performed reasonably well but failed to detect one of the birds during inference.

In [13]:
# Train the model; initially started with 5k steps to see how it performs
!python model_main_tf2.py --model_dir=training/TF2/training/{PATH_TO_MODEL} --pipeline_config_path=training/TF2/training/{PATH_TO_MODEL}/pipeline.config --num_train_steps=10000 --alsologtostderr

2022-01-04 17:54:05.337935: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-04 17:54:06.962906: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-01-04 17:54:06.985180: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-04 17:54:06.985665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.755GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-01-04 17:54:06.985686: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-04 17:54:06.987664: I tensorflow/stream_executor/platfor

### Exporting trained inference graph
The code below exports the trained model to a directory using the *exporter_main_v2.py* script; the exported model will be used to perform object detection later.

In [14]:
!python exporter_main_v2.py --input_type image_tensor --pipeline_config_path ./training/TF2/training/{PATH_TO_MODEL}/pipeline.config --trained_checkpoint_dir ./training/TF2/training/{PATH_TO_MODEL}/ --output_directory ./training/TF2/training/{PATH_TO_MODEL}/saved_model/

2022-01-04 19:39:22.749210: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-04 19:39:24.431568: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-01-04 19:39:24.452685: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-04 19:39:24.453165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.755GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2022-01-04 19:39:24.453186: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-04 19:39:24.455115: I tensorflow/stream_executor/platfor