# Using FSOCO

In this notebook, we'll start looking into using the [FSOCO dataset](https://www.fsoco-dataset.com/).

## Object Localisation

Let's start with object localisation. Given an image, the aim is to localise and classify every object within it. For the image below, that means drawing a bounding box around each object and assigning it a class.

<img alt="Object Localisation Example" src="./media/object-localisation-example.png" style="max-height: 300px;">

As you can see, each object in the image has a bounding box drawn around it, and the colour of the box shows its class: human in <font color='cyan'>cyan</font>, sheep in <font color='blue'>blue</font>, and dog in <font color='red'>red</font>.

## FSOCO

FSOCO is a dataset collected by the community of teams that take part in the Formula Student challenge, and it includes a set of images to train and test models on. You can download the dataset [here](https://www.fsoco-dataset.com/download). Note that the download is 24GB because the images are so large!

Once you have done so, you will see the following directory structure:
```
.
├── ampera
│   ├── ann
│   └── img
├── amz
│   ├── ann
│   └── img
├── aristurtle
│   ├── ann
│   └── img
├── asurt
│   ├── ann
│   └── img
├── baltic
│   ├── ann
│   └── img
├── bauman
│   ├── ann
│   └── img
...
├── uop
│   ├── ann
│   └── img
└── wfm
    ├── ann
    └── img

124 directories

```
Each subdirectory holds the data collected by a different university or team, and each contains an annotations folder (`ann`) and an images folder (`img`). Note that these images are in their original resolution and are thus very large!
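
To get a feel for this layout, the sketch below walks the dataset root and pairs each image with its annotation file. It assumes that an annotation is named after its image with `.json` appended (for example `img/foo.jpg` and `ann/foo.jpg.json`); that naming is an assumption here, so check it against your own download. The function name `pair_images_with_annotations` is hypothetical.

```python
from pathlib import Path


def pair_images_with_annotations(root):
    """Return (image_path, annotation_path) pairs found under the dataset root.

    Assumes each team directory contains `img/` and `ann/`, and that the
    annotation for `img/<name>` is `ann/<name>.json` (an assumption to verify).
    """
    pairs = []
    for team_dir in sorted(Path(root).iterdir()):
        img_dir, ann_dir = team_dir / "img", team_dir / "ann"
        if not (img_dir.is_dir() and ann_dir.is_dir()):
            continue  # skip stray files or unexpected folders
        for image_path in sorted(img_dir.iterdir()):
            ann_path = ann_dir / (image_path.name + ".json")
            if ann_path.is_file():
                pairs.append((image_path, ann_path))
    return pairs
```

Calling `pair_images_with_annotations("path/to/FSOCO")` should then give one pair per annotated image, which is a quick way to check that nothing went missing during the download.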

## YOLO V7

YOLO, short for You Only Look Once, is a well-known family of object detection models that has reached version 11 at the time of writing. It became famous for combining real-time speed with good accuracy.

That speed comes from feeding the entire image through the convolutional neural network in one go, instead of applying the network to different parts of the image at different scales.

In this initial exploration, we're going to use YOLO V7 to see how it performs on the FSOCO images.

---

You can download the YOLO v7 model [here](https://github.com/WongKinYiu/yolov7?tab=readme-ov-file), either by cloning the repository or by downloading the repo as a zip.

The next step is to set up an environment to run the code. If you are familiar with Docker, then the following steps (copied from the README) will be suitable:

```
# create the docker container, you can change the share memory size if you have more.
nvidia-docker run --name yolov7 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolov7 --shm-size=64g nvcr.io/nvidia/pytorch:21.08-py3

# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx

# pip install required packages
pip install seaborn thop

# go to code folder
cd /yolov7
```

Otherwise, I would recommend using [conda](https://www.anaconda.com/download) to set up a new environment. You can find more information [here](https://docs.conda.io/docs/user-guide/tasks/manage-environments.html).

On my installation, I had to install the following (note there may be more packages you need to install!):
```
conda install -y pytorch::pytorch torchvision torchaudio 
conda install -y -c pytorch conda-forge::opencv 
conda install -y pandas tqdm matplotlib seaborn scipy
```

---

## Testing YOLO

You can now run and test YOLO v7 using the following command. Replace `<path_to_FSOCO_image>` with the path to an image in the FSOCO dataset that you want to evaluate:

```python3 detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source <path_to_FSOCO_image>```

**What do you see?**

Well, as YOLO v7 was never trained on cones, it is never able to classify objects as cones. However, in most cases it still finds the cones; it just misclassifies them as other objects, notably humans.

Now, from a self-driving car point of view, you *probably* don't want to hit anything, but we should still try to improve the model beyond its base performance. We can do this via a process called fine-tuning: taking a model that has already been trained and training it further on new data. Fine-tuning is a form of transfer learning.

## Fine-Tuning YOLO

The codebase allows you to fine-tune the model for a new dataset, which we can do with the following command:
```
python train.py --workers 8 --device 0 --batch-size 32 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-custom.yaml --weights 'yolov7_training.pt' --name yolov7-custom --hyp data/hyp.scratch.custom.yaml
```

However, this command refers to three YAML files that we have not looked at yet: `data/custom.yaml` (the dataset), `cfg/training/yolov7-custom.yaml` (the config of the model), and `data/hyp.scratch.custom.yaml` (the hyperparameters to use).

### Model Config
This can be the same as the base model config. We do not need to change the model by default, so we can use the config given by `./cfg/training/yolov7.yaml`.

### Hyperparameters
Again, this can be the same as the base model to start with: `./data/hyp.scratch.custom.yaml`. You may need to look into this later if the model isn't fine-tuning very well.

### Dataset YAML and .txt files
This is where we need to do some work. The code requires the dataset to be in a specific format, which you can read about [here](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/#option-2-create-a-manual-dataset). The `data/custom.yaml` file will simply be:
```yaml
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
# Paths are written out in full because YOLO v7's own data files (e.g. data/coco.yaml) have no separate `path` key.
train: ../datasets/FSOCO/images/train # train images
val: ../datasets/FSOCO/images/val # val images
test: ../datasets/FSOCO/images/test # test images

# Number of classes (read by YOLO v7's train.py)
nc: 1

# Class names (1 FSOCO class)
names: ['cone']
```
Note: we will start by just training a single cone class, but in the future this can be expanded to different colours of cone.
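
The YAML above expects separate train, validation and test image folders, whereas FSOCO is organised by team. One possible way to produce the three subsets is a seeded random split, sketched below. The 80/10/10 ratio, the seed and the function name `split_dataset` are assumptions for illustration, not something prescribed by FSOCO or YOLO.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed gives the same split every run
    n_val = int(len(items) * val_fraction)
    n_test = int(len(items) * test_fraction)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]  # everything left over is used for training
    return {"train": train, "val": val, "test": test}
```

Each image in a split would then be copied into `images/<split>`, with its label file going into a parallel `labels/<split>` folder as described in the format guide linked above. Depending on your goals, you may prefer to split by team instead, so that very similar images do not end up in both the training and validation sets.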

Creating the .txt files will be a bit more involved. Each image requires one .txt file with the following format:
* One row per object
* Each row is in `class x_centre y_centre width height` format.
* Box coordinates must be in normalised xywh format (from 0 to 1). If your boxes are in pixels, divide x_centre and width by the image width, and y_centre and height by the image height.
* Class numbers are zero-indexed (start from 0).
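
As a minimal sketch of the format described above, the helper below turns boxes given in pixels (centre x, centre y, width, height) into the rows of a label file. The function name `to_label_rows` and the six-decimal precision are assumptions for illustration.

```python
def to_label_rows(boxes, image_width, image_height):
    """Convert (class_id, x_centre, y_centre, width, height) pixel boxes to label rows."""
    rows = []
    for class_id, x_centre, y_centre, width, height in boxes:
        # Horizontal values are divided by image width, vertical ones by image height.
        values = (
            x_centre / image_width,
            y_centre / image_height,
            width / image_width,
            height / image_height,
        )
        rows.append(f"{class_id} " + " ".join(f"{v:.6f}" for v in values))
    return rows


# One cone (class 0) centred at (320, 240), 64x128 pixels, in a 640x480 image.
print("\n".join(to_label_rows([(0, 320, 240, 64, 128)], 640, 480)))
# prints: 0 0.500000 0.500000 0.100000 0.266667
```

Writing these rows to a .txt file with the same base name as the image gives one label file per image.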

You will need to write a script (e.g. in Python) to convert the JSON annotations into this format. Note that the co-ordinates inside FSOCO refer to the top-left and bottom-right corners of the bounding box, instead of the normalised x_centre, y_centre, width, height format. An example of the differences can be seen below:

<img alt="Object Localisation Example" src="./media/annotation_diffs.png" style="max-height: 300px;">
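
The conversion between the two formats can be sketched as follows. The arithmetic in `corners_to_yolo` follows directly from the description above, but the JSON layout read by `annotation_to_rows` (a `size` object with `width` and `height`, and an `objects` list whose entries store two corner points under `points.exterior`) is an assumption about the annotation files, so inspect one of your own files before relying on it. Both function names are hypothetical, and every object is mapped to class 0 to match the single cone class used here.

```python
import json


def corners_to_yolo(x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert pixel corner coordinates to normalised (x_centre, y_centre, width, height)."""
    x_centre = (x_min + x_max) / 2 / image_width
    y_centre = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return x_centre, y_centre, width, height


def annotation_to_rows(annotation, class_id=0):
    """Build label rows from one parsed annotation (assumed JSON layout)."""
    image_width = annotation["size"]["width"]
    image_height = annotation["size"]["height"]
    rows = []
    for obj in annotation["objects"]:
        (x1, y1), (x2, y2) = obj["points"]["exterior"]
        # Sort the corners in case they are not stored top-left first.
        box = corners_to_yolo(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2),
                              image_width, image_height)
        rows.append(f"{class_id} " + " ".join(f"{v:.6f}" for v in box))
    return rows


# Hypothetical annotation with a single box from (100, 200) to (300, 600).
example = json.loads("""
{"size": {"width": 1000, "height": 800},
 "objects": [{"classTitle": "blue_cone",
              "points": {"exterior": [[100, 200], [300, 600]]}}]}
""")
print(annotation_to_rows(example))
# prints: ['0 0.200000 0.500000 0.200000 0.500000']
```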


**Important:** The code requires images at 640x640 resolution, which means that you will need to rescale the downloaded images to this size. One way to do this is with [`torchvision.transforms.Resize`](https://pytorch.org/vision/main/generated/torchvision.transforms.Resize.html).
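
As one possible way to do the rescaling offline, the sketch below uses Pillow's `Image.resize` rather than torchvision, simply to keep the dependencies small. The function names, the mirrored output folder and the choice of bilinear resampling are all assumptions for illustration.

```python
from pathlib import Path


def resized_path(image_path, source_root, target_root):
    """Mirror an image's location under source_root into target_root."""
    return Path(target_root) / Path(image_path).relative_to(source_root)


def resize_image(image_path, output_path, size=640):
    """Stretch one image to size x size pixels and save it (requires Pillow)."""
    from PIL import Image  # imported here so the path helper works without Pillow

    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(image_path) as image:
        image.resize((size, size), Image.BILINEAR).save(output_path)
```

For example, `resize_image(p, resized_path(p, "FSOCO", "FSOCO_640"))` would write a 640x640 copy of image `p` into a parallel `FSOCO_640` tree. Stretching to a square changes the aspect ratio, but normalised labels stay valid because they are expressed as fractions of the image width and height.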