### Introduction

#### This tutorial shows how to run darknet end to end (data pre-processing and training) within a container

### Data Pre-processing
Darknet uses its own annotation format for image object detection. The formats most commonly used in the open source community are pascalVOC and COCO. This demonstration uses a pascalVOC example and converts its annotations into darknet format. The folder containing our training data is shown below.

In [1]:
!ls /data/train

VOCdevkit


This folder holds the contents of pascalVOC 2008. We will convert the pascalVOC annotations in the **Annotations** folder into a darknet annotation folder called **labels** (every folder except labels comes from the pascalVOC dataset).

In [2]:
!ls /data/train/VOCdevkit/VOC2008

Annotations  JPEGImages		SegmentationObject
ImageSets    SegmentationClass	labels


We will now run a script that converts the annotations and creates the darknet **labels** folder. It can be run either with Python from the command line or through the Jupyter interface.

In [11]:
# Command Line
!python3 pascalvoc_to_yolo.py -h # for argument help
!python3 pascalvoc_to_yolo.py -n ../data/cfg/voc.names -d ../data/train/VOCdevkit/VOC2008/ -t ../data/cfg/train.txt # -rt /data/train/VOCdevkit/VOC2008/

usage: pascalvoc_to_yolo.py [-h] -n NAMES [-d DIR] [-s SUBDIR] [-t TEXTFILE]
                            [-rt TEXTFILEROOT]

Convert pascalVOC to darknet labels

optional arguments:
  -h, --help            show this help message and exit
  -n NAMES, --names NAMES
                        A darknet .names file containing class names
  -d DIR, --dir DIR     Other main directory
  -s SUBDIR, --subdir SUBDIR
                        Specify multiple sub-directories of pascalVOC folders.
                        NOT IN USE
  -t TEXTFILE, --textfile TEXTFILE
                        Path to save text file of images for training. Default
                        does not generate
  -rt TEXTFILEROOT, --textfileroot TEXTFILEROOT
                        Use only if running preprocesing in host machine but
                        running training in container, so that textfile image
                        directory mapping will have correct container path.
                        Defaults to -d direc

In [8]:
# OR python interface
%load_ext autoreload
%autoreload 2

from pascalvoc_to_yolo import main
main(argnames='../data/cfg/voc.names', argdir='../data/train/VOCdevkit/VOC2008/', argtextfile='../data/cfg/train.txt')

Conversion Completed.
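
For reference, the per-box arithmetic behind this kind of conversion can be sketched as follows. This is a minimal illustration and not the code of pascalvoc_to_yolo.py: the function names are hypothetical, and it assumes the usual darknet label layout of `<class_id> <x_center> <y_center> <width> <height>` with all four box values normalised by the image size. It also ignores the 1-pixel offset that some converters subtract from pascalVOC coordinates.

```python
def voc_box_to_darknet(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute pascalVOC corners to a normalised centre/size box."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / float(img_w)
    height = (ymax - ymin) / float(img_h)
    return x_center, y_center, width, height


def darknet_label_line(class_id, box):
    """Format one line of a label file: '<class_id> <x> <y> <w> <h>'."""
    return "%d %.6f %.6f %.6f %.6f" % ((class_id,) + tuple(box))


# Hypothetical example: a 200 x 200 pixel box inside a 400 x 500 image.
box = voc_box_to_darknet(100, 50, 300, 250, img_w=400, img_h=500)
print(darknet_label_line(14, box))
```

The class id is the zero-based position of the class name in the .names file, which is why the converter needs that file as an argument.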


### Model configuration / tuning
We will now configure the model to fine-tune its architecture for our dataset. This is an iterative process of experimentation; we will walk through one example for our pascalVOC use case.

### *Step 1 - Recalculate Anchor Boxes*
One of the pre-processing steps for the darknet YOLO model is to recalculate the anchor boxes for your dataset. Anchors that match the typical box shapes in your data make it easier for the model to regress bounding boxes from the pre-defined anchors to a tight fit around each target.

In [15]:
!/darknet/darknet detector calc_anchors /data/cfg/voc.data -num_of_clusters 9 -width 416 -height 416
# You may have to interrupt the kernel once the anchors have been printed.

 CUDA-version: 10020 (10020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 4.3.0

 num_of_clusters = 9, width = 416, height = 416 
 read labels from 5096 images 


 loaded 	 image: 1 	 box: 1 loaded 	 image: 1 	 box: 2 loaded 	 image: 2 	 box: 3 loaded 	 image: 3 	 box: 4 loaded 	 image: 4 	 box: 5 loaded 	 image: 5 	 box: 6 loaded 	 image: 5 	 box: 7 loaded 	 image: 6 	 box: 8 loaded 	 image: 6 	 box: 9 loaded 	 image: 6 	 box: 10 loaded 	 image: 7 	 box: 11 loaded 	 image: 8 	 box: 12 loaded 	 image: 8 	 box: 13 loaded 	 image: 8 	 box: 14 loaded 	 image: 9 	 box: 15 loaded 	 image: 10 	 box: 16 loaded 	 image: 10 	 box: 17 loaded 	 image: 11 	 box: 18 loaded 	 image: 11 	 box: 19 loaded 	 image: 11 	 box: 20 loaded 	 image: 12 	 box: 21 loaded 	 image: 13 	 box: 22 loaded 	 image: 14 	 box: 23 loaded 	 image: 15 	 box: 24 loaded 	 image: 15 	 box: 25 loaded 	 image: 16 	 box: 26 loaded 	 image: 16 	 box: 27 loaded 	 image: 17 	 box: 28 loaded 	 image: 17 	 box: 29 loaded 	 image: 18 	 box: 30 loaded 	 image: 19 	 box: 31 loaded 	 image: 19 	 box: 32 loaded 	 image: 19 	 box: 33 loaded 	 image: 19 	 box: 34 lo


 iterations = 70 


counters_per_class = 362, 311, 526, 388, 527, 174, 928, 427, 712, 184, 136, 533, 335, 322, 4783, 443, 210, 177, 210, 329

 avg IoU = 68.80 % 

Saving anchors to the file: anchors.txt 
anchors =  33, 48,  49,115, 120, 94,  78,204, 175,177, 125,298, 324,200, 217,328, 359,362
^C
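
To give an intuition for what anchor recalculation does, here is a minimal sketch of k-means clustering over box widths and heights, using IoU as the similarity measure. It is an illustration of the general technique only: darknet's `calc_anchors` may differ in initialisation, stopping criteria and other details, and the function names and the toy boxes are assumptions made for this example.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), both placed at the origin."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / float(union)


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs; each box joins the centroid with highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:
                # Keep the previous centroid when a cluster ends up empty.
                new_centroids.append(centroids[i])
                continue
            mean_w = sum(m[0] for m in members) / float(len(members))
            mean_h = sum(m[1] for m in members) / float(len(members))
            new_centroids.append((mean_w, mean_h))
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    # Sort by area, small to large.
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Toy data: box sizes in pixels, already scaled to the network input size.
toy_boxes = [(10, 10), (12, 12), (11, 11), (100, 100), (110, 110), (105, 105)]
print(kmeans_anchors(toy_boxes, k=2))
```

In practice the box sizes would come from the **labels** folder, scaled to the `-width` and `-height` passed to `calc_anchors`.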


### *Step 2 - Configure model architecture, parameters (& change new anchor boxes)*
You may want to edit your model architecture here to use the new **anchor boxes**, or change other parameters, e.g.
- **batch size** (`batch`)
- **subdivisions** (increase if the GPU runs out of memory)
- **input image size** (stick to your calc_anchors width & height; to process images at a higher resolution, set a larger value that is a multiple of 32)
- **augmentation methods** (e.g. image rotation, saturation, hue, exposure)
- **neural network layer configurations**, e.g.
    - stopbackward=1 at the appropriate layer so that earlier layers receive no gradient updates (transfer learning)
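
As a quick sanity check on the batch and input-size settings above, the following sketch computes how many images reach the GPU at once and whether an input size is a multiple of 32. The helper names are hypothetical, and the divisibility check on the batch is a convenience assumed for this example rather than something darknet enforces.

```python
def mini_batch_size(batch, subdivisions):
    """Images processed on the GPU at once: the batch split into subdivisions."""
    if batch % subdivisions != 0:
        # Stricter than necessary; chosen here to avoid silent rounding.
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions


def is_valid_input_size(width, height, stride=32):
    """True when both dimensions are multiples of the stride (32 in this tutorial)."""
    return width % stride == 0 and height % stride == 0


print(mini_batch_size(64, 32))        # a larger subdivisions value lowers GPU memory use
print(is_valid_input_size(416, 416))
print(is_valid_input_size(500, 500))
```

The training log later in this tutorial reports `mini_batch = 2, batch = 64`, which would be consistent with 32 subdivisions.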
    

Open your model cfg file (yolov3-voc.cfg in this example) in a new browser tab:

```
http://localhost:8888/edit/data/cfg/yolov3-voc.cfg
http://<your_browser_url>/edit/<path_to_file>/<config_file_name>
```

Go to the bottom to find the **[yolo]** layers and replace the old anchor boxes with yours.
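
If you prefer to script this edit instead of doing it by hand, a minimal sketch is shown below. It assumes that every `anchors = ...` line in the cfg should receive the same list, which is how the stock yolov3 cfg files are laid out; the function name is hypothetical, and you should review the resulting file before training.

```python
import re


def replace_anchors(cfg_text, anchors):
    """Return (new_text, count) with every 'anchors = ...' line replaced."""
    new_line = "anchors = " + anchors.strip()
    pattern = re.compile(r"^[ \t]*anchors[ \t]*=.*$", flags=re.MULTILINE)
    # A function is used as the replacement so the anchors string is inserted literally.
    return pattern.subn(lambda _match: new_line, cfg_text)


# Example with a shortened, made-up cfg fragment.
cfg = "[yolo]\nmask = 6,7,8\nanchors = 10,13, 16,30\nclasses=20\n"
new_cfg, replaced = replace_anchors(cfg, "33, 48,  49,115")
print(replaced)
print(new_cfg)
```

The anchors string can be copied from the `anchors =` line printed by `calc_anchors`, or read from the anchors.txt file it saves.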

### *Step 3 - Configure data file*
Include a data file (.data) that tells darknet where to find the training data and where to save the model.
```
http://localhost:8888/edit/data/cfg/voc.data
```

Your data file should have the following attributes 
- **classes** = 20 (this should be equal to the number of classes in voc.names)
- **train**  = /data/cfg/train.txt (the file generated by the pascalvoc_to_yolo.py converter)
- **valid**  = /data/cfg/train.txt (replace with the path to your valid.txt if you have one)
- **names** = /data/cfg/voc.names
- **backup** = /data/weights (the location where the model is saved; make sure this folder is bind mounted to the host, otherwise the data will be lost)
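
A small consistency check can catch a mismatch between the data file and the names file before training starts. The sketch below parses the `key = value` lines and compares **classes** with the number of non-empty lines in the names file. The function names are hypothetical, and the set of required keys is an assumption based on the list above.

```python
def parse_data_file(text):
    """Parse 'key = value' lines of a .data file into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def count_class_names(names_text):
    """Count the non-empty lines of a .names file."""
    return sum(1 for line in names_text.splitlines() if line.strip())


def check_data_file(data_text, names_text):
    """Return a list of problems; an empty list means the check passed."""
    options = parse_data_file(data_text)
    required = ("classes", "train", "names", "backup")
    problems = ["missing key: " + key for key in required if key not in options]
    if "classes" in options:
        if int(options["classes"]) != count_class_names(names_text):
            problems.append("classes does not match the number of names")
    return problems


# Usage with the files from this tutorial (paths as configured above):
# with open("/data/cfg/voc.data") as d, open("/data/cfg/voc.names") as n:
#     print(check_data_file(d.read(), n.read()))
```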

### Run training instance
Training with darknet differs from other frameworks in that it is executed from a binary. The model architecture, the data configuration and any pre-trained model are passed as arguments to the darknet binary.


```darknet detector train <config.data> <model.cfg> <pretrained.model> <-gpus 0,1,2,3> -dont_show```
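
When training is launched from a script rather than a notebook cell, the same command can be assembled as an argument list. This is a minimal sketch: the function name and its defaults are hypothetical, and only the arguments shown in the template above are used.

```python
def build_train_command(data_file, cfg_file, weights=None, gpus=None,
                        darknet_bin="/darknet/darknet"):
    """Assemble the 'darknet detector train' command as a list of arguments."""
    cmd = [darknet_bin, "detector", "train", data_file, cfg_file]
    if weights:
        cmd.append(weights)  # optional pre-trained model
    if gpus:
        cmd += ["-gpus", ",".join(str(g) for g in gpus)]
    cmd.append("-dont_show")
    return cmd


cmd = build_train_command(
    "/data/cfg/voc.data",
    "/data/cfg/yolov3-voc.cfg",
    weights="/data/pre-trained/darknet53.conv.74",
)
print(" ".join(cmd))

# To actually start training from Python:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.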

In [6]:
!/darknet/darknet detector train /data/cfg/voc.data /data/cfg/yolov3-voc.cfg /data/pre-trained/darknet53.conv.74 -dont_show

 CUDA-version: 10020 (10020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 4.3.0
yolov3-voc
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070 SUPER 
net.optimized_memory = 0 
mini_batch = 2, batch = 64, time_steps = 1, train = 1 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   3 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF
   4 Shortcut Layer: 1,  wt = 0, wn = 0, outputs: 208 x 208 x  64 0.003 BF
   5 conv    128       3 x 3/ 2    208 x 208 x  64 ->  104 x 104 x 128 1.595 BF
   6 conv     64       1 x 1/ 1    104 x 104 x 128 ->  104 x 104 x  64 0.177 BF
   7 conv    128       3 x 3/ 1    104 x 104 x  64 ->  104 x 104 x 128 1.595 BF
   8 Shortcut Laye

Done! Loaded 75 layers from weights-file 
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing, random_coef = 1.40 

 608 x 608 
 Create 6 permanent cpu-threads 
 try to allocate additional workspace_size = 73.23 MB 
 CUDA allocate done! 
Loaded: 0.000020 seconds
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.591211, GIOU: 0.527301), Class: 0.736924, Obj: 0.669970, No Obj: 0.514060, .5R: 1.000000, .75R: 0.000000, count: 1, class_loss = 309.983368, iou_loss = 0.348602, total_loss = 310.331970 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 94 Avg (IOU: 0.052685, GIOU: 0.052685), Class: 0.392326, Obj: 0.185039, No Obj: 0.487571, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 1104.566040, iou_loss = 5.264648, total_loss = 1109.830688 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 106 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.523474, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 4985.49169

v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.514634, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 308.309906, iou_loss = 0.000000, total_loss = 308.309906 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 94 Avg (IOU: 0.204883, GIOU: 0.204883), Class: 0.449454, Obj: 0.499593, No Obj: 0.486873, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = 1105.508789, iou_loss = 9.733276, total_loss = 1115.242065 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 106 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.524105, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 5004.094238, iou_loss = 0.000000, total_loss = 5004.094238 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.138691, GIOU: -0.079765), Class: 0.442767, Obj: 0.572435, No Obj: 0.513962, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = 312.544312, iou_loss

v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.253805, GIOU: -0.030786), Class: 0.438750, Obj: 0.535848, No Obj: 0.513716, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = 313.899780, iou_loss = 4.700134, total_loss = 318.599915 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 94 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.487959, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 1104.346924, iou_loss = 0.000000, total_loss = 1104.346924 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 106 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.523885, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 4993.142090, iou_loss = 0.000000, total_loss = 4993.142090 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.317061, GIOU: 0.016371), Class: 0.615247, Obj: 0.464126, No Obj: 0.515006, .5R: 0.200000, .75R: 0.200000, count: 5, class_loss = 319.891510, iou_loss

v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.373437, GIOU: 0.303882), Class: 0.367839, Obj: 0.584393, No Obj: 0.513341, .5R: 0.333333, .75R: 0.000000, count: 3, class_loss = 312.025635, iou_loss = 4.167053, total_loss = 316.192688 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 94 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.486817, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 1097.424561, iou_loss = 0.000000, total_loss = 1097.424561 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 106 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.523997, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 5000.360352, iou_loss = 0.000000, total_loss = 5000.360352 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.477968, GIOU: 0.407233), Class: 0.581417, Obj: 0.548777, No Obj: 0.513485, .5R: 0.200000, .75R: 0.000000, count: 5, class_loss = 314.848877, iou_loss 

v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.375074, GIOU: 0.322097), Class: 0.427333, Obj: 0.415088, No Obj: 0.514890, .5R: 0.000000, .75R: 0.000000, count: 3, class_loss = 315.129364, iou_loss = 1.471436, total_loss = 316.600800 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 94 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.487060, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 1100.247559, iou_loss = 0.000000, total_loss = 1100.247559 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 106 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.523519, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 4992.655762, iou_loss = 0.000000, total_loss = 4992.655762 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.313324, GIOU: 0.065679), Class: 0.410755, Obj: 0.518051, No Obj: 0.514384, .5R: 0.000000, .75R: 0.000000, count: 3, class_loss = 312.625031, iou_loss 

v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.378091, GIOU: 0.277144), Class: 0.605008, Obj: 0.491408, No Obj: 0.514255, .5R: 0.333333, .75R: 0.000000, count: 3, class_loss = 313.056152, iou_loss = 2.787506, total_loss = 315.843658 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 94 Avg (IOU: 0.337594, GIOU: 0.102829), Class: 0.321337, Obj: 0.212274, No Obj: 0.485917, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 1098.666016, iou_loss = 1.186768, total_loss = 1099.852783 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 106 Avg (IOU: 0.356356, GIOU: 0.297010), Class: 0.331168, Obj: 0.469510, No Obj: 0.524370, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 5007.582520, iou_loss = 1.142578, total_loss = 5008.725098 
v3 (mse loss, Normalizer: (iou: 0.75, cls: 1.00) Region 82 Avg (IOU: 0.455291, GIOU: 0.420133), Class: 0.546727, Obj: 0.499771, No Obj: 0.514313, .5R: 0.500000, .75R: 0.000000, count: 4, class_loss = 316.531860, iou_loss 

### Run validation
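Validate your results on a validation set by running the ```map``` command, replacing the model argument with your final trained model. You should **optimize and select your model based on its validation score**. If the score is unsatisfactory, you may want to change model parameters or refine your dataset. The example run below passes the pre-trained darknet53.conv.74 weights rather than a trained model, which likely explains why it reports detections_count = 0.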

```darknet detector map <config.data> <model.cfg> <trained.model>```
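
To compare runs programmatically, the per-class lines that the ```map``` command prints can be parsed from captured output. The sketch below relies on the line layout seen in this tutorial's output (`class_id = ..., name = ..., ap = ...% (TP = ..., FP = ...)`); that layout may change between darknet versions, and the function names are hypothetical.

```python
import re

AP_LINE = re.compile(
    r"class_id = (\d+), name = (.+?), ap = ([\d.]+)%\s+\(TP = (\d+), FP = (\d+)\)"
)


def parse_class_ap(output):
    """Map each class name to its id, average precision (%) and TP/FP counts."""
    results = {}
    for match in AP_LINE.finditer(output):
        class_id, name, ap, tp, fp = match.groups()
        results[name] = {
            "class_id": int(class_id),
            "ap": float(ap),
            "tp": int(tp),
            "fp": int(fp),
        }
    return results


def mean_ap(results):
    """Average the parsed per-class AP values; 0.0 when nothing was parsed."""
    if not results:
        return 0.0
    return sum(r["ap"] for r in results.values()) / len(results)


# Usage, assuming `captured` holds the text printed by the map command:
# results = parse_class_ap(captured)
# print(mean_ap(results))
```

The averaged value is computed only from the parsed lines and is meant for quick comparisons between runs.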

In [20]:
!/darknet/darknet detector map /data/cfg/voc.data /data/cfg/yolov3-voc.cfg /data/pre-trained/darknet53.conv.74

 CUDA-version: 10020 (10020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 4.3.0
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070 SUPER 
net.optimized_memory = 0 
mini_batch = 1, batch = 32, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   3 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF
   4 Shortcut Layer: 1,  wt = 0, wn = 0, outputs: 208 x 208 x  64 0.003 BF
   5 conv    128       3 x 3/ 2    208 x 208 x  64 ->  104 x 104 x 128 1.595 BF
   6 conv     64       1 x 1/ 1    104 x 104 x 128 ->  104 x 104 x  64 0.177 BF
   7 conv    128       3 x 3/ 1    104 x 104 x  64 ->  104 x 104 x 128 1.595 BF
   8 Shortcut Layer: 5,  wt =

Done! Loaded 75 layers from weights-file 

 calculation mAP (mean average precision)...
5096
 detections_count = 0, unique_truth_count = 12017  
class_id = 0, name = aeroplane, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 1, name = bicycle, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 2, name = bird, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 3, name = boat, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 4, name = bottle, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 5, name = bus, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 6, name = car, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 7, name = cat, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 8, name = chair, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 9, name = cow, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 10, name = diningtable, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 11, name = dog, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 12, name = horse, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 13, name = motorbike, ap = 0.00%   	 (TP = 0, F

### Run inference
```darknet detector test <config.data> <model.cfg> <trained.model> <img_path>```