# Manual for Setting up YOLO with Strong Compute's ISC
This guide was created to help set up YOLO with Strong Compute's ISC for our Capstone unit in Semester 2, 2024. I also explain how to access the dataset provided by the client and use it to train a model.

**Disclaimer:** Some of the following code was adapted from a template provided by Martin Ong on the #strong-isc channel on Discord.

## Step 1: Mounting the Dataset
Before starting the container on Strong Compute's website, make sure that you have mounted the dataset so that you can access it. The guide [here](https://strong-compute.gitbook.io/developer-docs/basics/datasets) explains both how to mount the data and how to access it from your training script.

**NOTE**: Don't worry if you can't see the `/data` folder in your workspace. It sits under a different parent directory from the `/root` directory that we are all working in. A later step explains how we can access it for training and validation.
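
If you want to confirm that the mount worked before going further, a quick check from Python inside the container can help. The following is only a minimal sketch: `describe_mount` is a hypothetical helper name, and it assumes the dataset is reachable at the absolute path `/data` as described in this guide.

```python
import os


def describe_mount(path, limit=5):
    """Return whether `path` is a directory and up to `limit` of its entries."""
    if not os.path.isdir(path):
        return {"path": path, "mounted": False, "entries": []}
    entries = sorted(os.listdir(path))[:limit]
    return {"path": path, "mounted": True, "entries": entries}


if __name__ == "__main__":
    # "/data" is the mount point mentioned in this guide; adjust if yours differs.
    print(describe_mount("/data"))
```

If `mounted` comes back as `False`, it is worth revisiting the mounting guide before starting the container again.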

## Step 2: Set Up a New Virtual Env
Before we start writing any code, it's important that we set up a new virtual environment to work in.

Once you've activated the virtual environment, you need to install the relevant Python packages that we'll be using for our YOLO model. These are:
* Ultralytics; and
* PyTorch.
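
Both packages are normally installed with pip inside the activated environment (`pip install ultralytics torch`). To double-check that the install landed in the environment you are actually using, something like the sketch below can be run afterwards. The helper name `missing_packages` is made up for this example; `ultralytics` and `torch` are the import names of the two packages listed above.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["ultralytics", "torch"])
    if missing:
        print(f"Missing packages: {missing}")
    else:
        print("All required packages were found.")
```

Running this with the wrong interpreter (for example, outside the virtual environment) is a common reason for packages showing up as missing.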

## Step 3: YOLO Script
Now that you've activated your virtual environment and installed the required packages, it's time to write our Python `train.py` script and the relevant `.isc` file, which contains all the information that the ISC needs to run it.

### Step 3.1 Training Script
Create a new file called `train.py`. This will contain all the code for our training script. Copy and paste the following code, remembering to **replace the directories with your own**.

```python
'''
Training script 'train.py' to be used in isc
'''
import os
import torch
from ultralytics import YOLO

def main():
    '''
    main function which sets up training paths and starts model training
    '''
    # if os.environ.get('OUTPUT_PATH'):
    #     output_path = f"{os.environ.get('OUTPUT_PATH')}/log.txt"
    #     output_file = open(output_path, "w")
    # else:
    #     output_path = None

    # Replace with your own project directory
    os.chdir('/root/seb-test-run')

    # Replace with the locations of your own model weights and test image
    path = os.getcwd()
    model_path = os.path.join(path, "models", "yolov8n.pt")
    image_path = os.path.join(path, "misc", "test_image.jpg")

    # Load the model and move it to the GPU if one is available
    model = YOLO(model_path)
    if torch.cuda.is_available():
        device = torch.device("cuda:0")
    else:
        device = torch.device("cpu")

    model.to(device)

    # Start training
    # Replace "data.yaml" with the dataset .yaml file you set up in Step 4
    results = model.train(data="data.yaml", epochs=3, imgsz=1024, device=device)

    results = model(image_path)  # predict on an image
    for result in results:
        result.show()  # display to screen (needs a display to be available)


if __name__ == "__main__":
    main()
```

### Step 3.2 Updating the TOML File
A TOML file communicates important information about your experiment to the ISC. Create a file called `test-run.isc` and set the fields listed below in it. It's important that you replace:
* **isc_project_id** with your own project ID. This is found on the Strong Compute dashboard.
* **experiment_name [Optional]** with whatever name you want to call your experiment.
* **dataset_id** with the dataset ID. This is also found on the Strong Compute dashboard.
* **command** with the directories to your virtual environment and training script, e.g. `command = "source ~/<virtual env>/venv/bin/activate && python3 ~/<path to training script>"`.
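
As a rough illustration of how these four fields look as TOML lines, the sketch below renders them from Python. It is not a complete ISC configuration: only the four keys named above come from this guide, the real `.isc` file may need further fields (check the Strong Compute docs), and `render_isc` plus the placeholder values are hypothetical.

```python
import json


def render_isc(project_id, experiment_name, dataset_id, command):
    """Render the four fields listed above as TOML `key = "value"` lines."""
    fields = {
        "isc_project_id": project_id,
        "experiment_name": experiment_name,
        "dataset_id": dataset_id,
        "command": command,
    }
    # For ordinary ASCII values, json.dumps yields a double-quoted, escaped
    # string that is also a valid TOML basic string.
    return "".join(f"{key} = {json.dumps(value)}\n" for key, value in fields.items())


if __name__ == "__main__":
    text = render_isc(
        project_id="<your project ID>",      # placeholder, from the dashboard
        experiment_name="test-run",          # any name you like
        dataset_id="<your dataset ID>",      # placeholder, from the dashboard
        command="source ~/<virtual env>/venv/bin/activate && python3 ~/<path to training script>",
    )
    print(text)  # copy the output into test-run.isc and fill in the placeholders
```

Writing the file by hand works just as well; the point is only that each field is a plain `key = "value"` line.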

## Step 4: Accessing the Data
Now that we have the training script and the relevant `.isc` file set up, we need to make sure that our YOLO model can actually access the data for training. As I mentioned earlier, the `/data` folder is located in a different parent directory which we can't edit, so we need to create a separate `fmg_data.yaml` file in our own directory which tells YOLO where to look for the data. This is necessary because the original `fmg_data.yaml` file inside the `/data/fmg_data` directory tells YOLO to look in `/home/calvin/fmg_data/` for the training and validation images, which is incorrect.

To fix this issue, copy the `fmg_data.yaml` file into the same directory as your Python training script and set its `path` variable to `/data/fmg_data`. The training script from Step 3.1 can then be pointed at this copy of the `.yaml` file so that it correctly locates the data. **[Optional]** You can also specify the directory in which you want YOLO to store the output files.
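
If you prefer to make this edit from a script rather than by hand, a small text rewrite is enough. The sketch below uses only the standard library and assumes the `path` entry sits on its own unindented line; `set_dataset_root` is a hypothetical helper, and the sample file contents in the usage example are invented for illustration (your real `fmg_data.yaml` will contain different entries).

```python
def set_dataset_root(yaml_text, new_root):
    """Return `yaml_text` with its top-level `path:` entry set to `new_root`.

    Only lines that start with `path:` (no indentation) are rewritten.
    If there is no such line, a new `path:` line is added at the top.
    """
    lines = yaml_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith("path:"):
            lines[i] = f"path: {new_root}"
            replaced = True
    if not replaced:
        lines.insert(0, f"path: {new_root}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample contents; only the `path` value is taken from this guide.
    original = "path: /home/calvin/fmg_data/\ntrain: images/train\nval: images/val\n"
    print(set_dataset_root(original, "/data/fmg_data"))
```

In practice you would read your copied `fmg_data.yaml`, pass its text through the function, and write the result back to the same file.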

You may also need to update the directory that YOLO looks in for the data. When the ultralytics package is installed using pip, it creates a settings file at `/root/.config/Ultralytics/settings.yaml` which tells the package where to look for datasets. The problem is that by default it points to a `datasets` folder inside the directory it was installed in. That directory doesn't exist for us, so if you try to run the training script as it is, you might receive an error saying that the dataset could not be found.

To fix it, open the `settings.yaml` file located at `/root/.config/Ultralytics/settings.yaml` (if you're using VSCode, you can do this by clicking on the search bar at the top, choosing 'Go to File' and pasting in this path). Find the entry whose value is `/root/seb-test-run/datasets` and replace that value with the directory containing the data: `/data/fmg_data/`.
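
The same change can usually be made from Python instead of editing the file by hand, since Ultralytics exposes a dict-like `settings` object. The sketch below is written under two assumptions that you should verify against your installed version: that the entry is called `datasets_dir`, and that `settings.update(...)` persists the change. `point_datasets_dir` is a hypothetical helper name.

```python
def point_datasets_dir(settings, new_dir):
    """Set the `datasets_dir` entry on a dict-like settings object and return it."""
    settings.update({"datasets_dir": new_dir})
    return settings["datasets_dir"]


if __name__ == "__main__":
    try:
        # Assumption: the installed ultralytics version exports `settings`.
        from ultralytics import settings
    except ImportError:
        print("ultralytics is not installed in this environment")
    else:
        print(point_datasets_dir(settings, "/data/fmg_data"))
```

Either approach should leave the settings file pointing at `/data/fmg_data`, so it is worth opening the file afterwards to confirm.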

## Step 5: Run an Experiment
Now that we have set everything up and YOLO knows where to look for the data, we can train the model.

Launch your experiment from the terminal using the ISC command-line tool, passing it your `test-run.isc` file (see the Strong Compute developer docs for the exact command).

You can also track the status of your experiment from the terminal.

## Step 6: Accessing Results
To view the results of each experiment, check the log from the ISC experiment, where YOLO reports the location in which it stored the training results. By default this should be a folder named `runs`, which contains subdirectories such as `train1`, `train2` and so on.
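
Once a few experiments have run, it can be handy to find the newest results folder without scrolling through the log. The sketch below walks a `runs` folder and returns the most recently modified `train*` directory. It assumes the folder naming described above; `latest_run_dir` is a hypothetical helper, and the `"runs"` path in the usage example is relative to wherever YOLO wrote its output.

```python
import os


def latest_run_dir(runs_root):
    """Return the most recently modified `train*` directory under `runs_root`, or None."""
    candidates = []
    for dirpath, dirnames, _ in os.walk(runs_root):
        for name in dirnames:
            if name.startswith("train"):
                candidates.append(os.path.join(dirpath, name))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    # Adjust "runs" to the location reported in your experiment log.
    print(latest_run_dir("runs"))
```

The function returns `None` when the folder does not exist or contains no training runs yet.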