# Training YOLOv6 on VOC dataset

### Step 1: Prepare VOC dataset

| dataset | url | size | images |
| :----: | :----: | :----: | :----: |
| VOC2007 trainval | [download tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) | 446MB | 5011 |
| VOC2007 test | [download tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar) | 438MB | 4952 |
| VOC2012 trainval | [download tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) | 1.95GB | 17125 |

Download the VOC archives and extract them. The resulting directory layout looks like this:
```
VOCdevkit
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   ├── JPEGImages
│   ├── SegmentationClass
│   └── SegmentationObject
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    └── SegmentationObject
```
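
Before converting, it can help to confirm that extraction produced the layout above. The following is a minimal sketch, not part of YOLOv6: the helper name `missing_voc_dirs` is hypothetical, and it assumes that only `Annotations`, `ImageSets`, and `JPEGImages` matter for detection.

```python
from pathlib import Path

# Assumption: these are the sub-directories a detection workflow needs;
# the segmentation folders shown in the tree are not checked.
REQUIRED_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")


def missing_voc_dirs(devkit_root, years=("VOC2007", "VOC2012")):
    """Return the expected directories that are absent under devkit_root."""
    root = Path(devkit_root)
    missing = []
    for year in years:
        for sub in REQUIRED_SUBDIRS:
            path = root / year / sub
            if not path.is_dir():
                # Report paths relative to the devkit root, with "/" separators.
                missing.append(path.relative_to(root).as_posix())
    return missing
```

An empty list from `missing_voc_dirs("your_path/to/VOCdevkit")` means all expected directories are present.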

### Step 2: Convert VOC dataset to YOLO format

VOC annotations are stored as XML files, one per image, as in the example below (see the [VOC2007 guidelines](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/guidelines.html)).
```
<annotation>
	<folder>VOC2007</folder>
	<filename>000007.jpg</filename>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>194179466</flickrid>
	</source>
	<owner>
		<flickrid>monsieurrompu</flickrid>
		<name>Thom Zemanek</name>
	</owner>
	<size>
		<width>500</width>
		<height>333</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>car</name>
		<pose>Unspecified</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>141</xmin>
			<ymin>50</ymin>
			<xmax>500</xmax>
			<ymax>330</ymax>
		</bndbox>
	</object>
</annotation>
```
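
The mapping from this XML to YOLO label rows can be sketched as follows. This is a minimal illustration and not the code in `voc2yolo.py`. It assumes the common YOLO label convention (`class cx cy w h`, with coordinates normalized by image width and height), uses the class order from the dataset config in Step 3, and ignores the `difficult` flag. The function name `voc_xml_to_yolo` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Class order as listed under `names` in the dataset config (Step 3).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def voc_xml_to_yolo(xml_text, class_names=VOC_CLASSES):
    """Convert one VOC annotation to a list of (class_id, cx, cy, w, h) rows."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    rows = []
    for obj in root.findall("object"):
        name = obj.find("name").text.strip()
        if name not in class_names:
            continue  # skip classes outside the configured list
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Box center and size, normalized to the [0, 1] range.
        cx = (xmin + xmax) / 2.0 / img_w
        cy = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        rows.append((class_names.index(name), cx, cy, w, h))
    return rows
```

Under these assumptions, the `car` object in the annotation above becomes class 6 with a normalized center x of 0.641 and a normalized width of 0.718.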

Run the following command to convert the VOC dataset to YOLO format:

&ensp;&ensp;`python yolov6/data/voc2yolo.py --voc_path your_path/to/VOCdevkit`

We follow the `07+12` training setting: the VOC2007 and VOC2012 train+val splits (5011 + 11540 = 16551 images) form the training set, and the VOC2007 test split (4952 images) serves as both the validation set and the test set.
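
The split described above can be sketched in a few lines. This is an illustration, not the logic of `voc2yolo.py`. It assumes the standard VOC layout in which each split is listed in `ImageSets/Main/<split>.txt`, one image id per line. The helper names `read_image_ids` and `build_07_12_splits` are hypothetical.

```python
from pathlib import Path


def read_image_ids(devkit_root, year, split):
    """Read image ids from VOC<year>/ImageSets/Main/<split>.txt."""
    list_file = Path(devkit_root) / f"VOC{year}" / "ImageSets" / "Main" / f"{split}.txt"
    lines = list_file.read_text().splitlines()
    # Keep the first token of each non-empty line (the image id).
    return [line.split()[0] for line in lines if line.strip()]


def build_07_12_splits(devkit_root):
    """Return (train, val) lists of (year, image_id) pairs for the 07+12 setting."""
    train = [("2007", i) for i in read_image_ids(devkit_root, "2007", "trainval")]
    train += [("2012", i) for i in read_image_ids(devkit_root, "2012", "trainval")]
    # VOC2007 test doubles as validation and test set.
    val = [("2007", i) for i in read_image_ids(devkit_root, "2007", "test")]
    return train, val
```

On the full dataset, `len(train)` and `len(val)` should match the counts quoted above.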

After conversion, the directory looks like this:
```
VOCdevkit
├── images
├── labels
├── voc_07_12
│   ├── images
│   │   ├── train
│   │   └── val
│   └── labels
│       ├── train
│       └── val
├── VOC2007
└── VOC2012
```
Here, `voc_07_12` is the converted YOLO-format dataset.

#### Visualize YOLO-format dataset (optional)

To check if your dataset is correct, run the following command:

&ensp;&ensp;`python yolov6/data/vis_dataset.py --img_dir your_path/to/VOCdevkit/voc_07_12/images/train --label_dir your_path/to/VOCdevkit/voc_07_12/labels/train`
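
To spot-check a single label by hand, a YOLO row can be mapped back to pixel coordinates. The sketch below assumes the usual `class cx cy w h` row format with normalized values. It is not taken from `vis_dataset.py`, and the function name `yolo_line_to_pixel_box` is hypothetical.

```python
def yolo_line_to_pixel_box(line, img_w, img_h):
    """Convert one 'class cx cy w h' label row to (class_id, (xmin, ymin, xmax, ymax))."""
    cls, cx, cy, w, h = line.split()
    # Scale normalized values back to pixels.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    box = (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
    return int(cls), box
```

The recovered corners should agree with the `bndbox` values in the original XML annotation, up to rounding.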

### Step 3: Create dataset config file

Create `data/voc.yaml` with the following content:

```
# Please ensure that your dataset is placed in the same parent directory as YOLOv6_DIR
train: your_path/to/VOCdevkit/voc_07_12/images/train # train images
val: your_path/to/VOCdevkit/voc_07_12/images/val # val images
test: your_path/to/VOCdevkit/voc_07_12/images/val # test images (optional)

# Whether this is the COCO dataset; set to True only for COCO.
is_coco: False
# Classes
nc: 20  # number of classes
names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
        'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']  # class names
```
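
A frequent mistake in such a config is an `nc` value that does not match the length of `names`. The sketch below checks a config that has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`, which is an assumption about your environment). The helper name `check_dataset_config` is hypothetical and not part of YOLOv6.

```python
def check_dataset_config(cfg):
    """Return a list of human-readable problems found in a dataset config dict."""
    problems = []
    names = cfg.get("names") or []

    if cfg.get("nc") != len(names):
        problems.append(f"nc is {cfg.get('nc')} but names has {len(names)} entries")
    if len(set(names)) != len(names):
        problems.append("names contains duplicate class names")
    # `test` is optional in the config above, so only train and val are required.
    for key in ("train", "val"):
        if not cfg.get(key):
            problems.append(f"missing '{key}' path")
    return problems
```

An empty list means the basic consistency checks passed. The function does not verify that the paths exist on disk.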

### Step 4: Training

Use one of the following commands to start training:
- Multiple GPUs (DDP mode recommended)

&ensp;&ensp;`python -m torch.distributed.launch --nproc_per_node 4 --master_port=23456 tools/train.py --batch 256 --conf configs/yolov6n_finetune.py --data data/voc.yaml --device 0,1,2,3`

- Single GPU

&ensp;&ensp;`python tools/train.py --batch 256 --conf configs/yolov6n_finetune.py --data data/voc.yaml --device 0`

#### TensorBoard
Use TensorBoard to visualize the training batches, the validation predictions, and the loss/mAP curves:

&ensp;&ensp;`tensorboard --logdir=your_path/to/log`

![Train batch](../assets/train_batch.jpg 'Train batch')

![Training loss/mAP curve](../assets/voc_loss_curve.jpg 'Training loss/mAP curve')

#### Evaluation
When training finishes, the model is automatically evaluated on the test set. The output metrics look like this:
```
DONE (t=4.21s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.632
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.854
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.702
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.272
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.473
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.518
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.737
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.751
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.554
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.656
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.791
Epoch: 399 | mAP@0.5: 0.8542516455615079 | mAP@0.50:0.95: 0.6315693468708705

Training completed in 9.206 hours.
```
You can also evaluate a model manually on your dataset:

&ensp;&ensp;`python tools/eval.py --data data/voc.yaml --weights your_path/to/weights/best_ckpt.pt --device 0`
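
To track results across runs, the summary line from the training log shown above (`Epoch: ... | mAP@0.5: ... | mAP@0.50:0.95: ...`) can be parsed with a regular expression. This is a minimal sketch that assumes the line format stays exactly as in that log. The name `parse_summary` and the dictionary keys are hypothetical.

```python
import re

# Matches lines such as:
# "Epoch: 399 | mAP@0.5: 0.8542516455615079 | mAP@0.50:0.95: 0.6315693468708705"
SUMMARY_RE = re.compile(
    r"Epoch:\s*(\d+)\s*\|\s*mAP@0\.5:\s*([0-9.]+)\s*\|\s*mAP@0\.50:0\.95:\s*([0-9.]+)"
)


def parse_summary(line):
    """Return epoch and mAP values from a summary line, or None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match.group(1)),
        "map50": float(match.group(2)),
        "map50_95": float(match.group(3)),
    }
```

Lines that do not match, such as the per-metric COCO-style rows, yield `None` and can be skipped.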

### Step 5: Inference

&ensp;&ensp;`python tools/infer.py --weights your_path/to/weights/best_ckpt.pt --yaml data/voc.yaml --source data/images/image3.jpg --device 0`

![image3.jpg](../assets/image3.jpg)

### Step 6: Deployment

&ensp;&ensp;`python deploy/ONNX/export_onnx.py --weights your_path/to/weights/best_ckpt.pt --device 0`