<img align="left" src="https://panoptes-uploads.zooniverse.org/project_avatar/86c23ca7-bbaa-4e84-8d8a-876819551431.png" type="image/png" height=100 width=100>
</img>
<h1 align="right">Train ML models</h1>
<h3 align="right"><a href="https://colab.research.google.com/github/ocean-data-factory-sweden/kso/blob/main/notebooks/analyse/Train_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a></h3>
<h3 align="right">Written by the KSO Team</h3>

This notebook takes you through the process of importing a baseline model, training it on a dataset and evaluating the quality of the model. If you do not have a project with us yet, you can run the template project to get a taste of how it all works. This notebook assumes that you have already prepared the dataset for model training; see Tutorial #8 for details on the required setup.

🔴 <span style="color:red">&nbsp;NOTE: To run this notebook, you need a Weights and Biases account. If you want to become a member of our Koster team on Weights and Biases, you can request access by contacting jurie.germishuys@combine.se. Team membership is not necessary to run the template project. </span>

# Set up KSO requirements

### Install requirements and load KSO modules

Installing the requirements in Google Colab takes ~4 mins and might automatically crash/restart the session. Please run this cell until you get the "KSO successfully imported!" message.

In [None]:
%matplotlib inline
import os
import sys


def initiate_dev_version():
    kso_path = os.path.abspath(os.path.join(os.getcwd(), "../.."))
    if os.path.isdir(os.path.join(kso_path, "kso_utils")):
        sys.path.insert(0, kso_path)
        %load_ext autoreload
        %autoreload 2
        print("Development mode ON - kso-utils added to the system.")
    else:
        raise FileNotFoundError("kso_utils directory not found in the expected path.")


def install_kso_utils():
    !pip install -q kso-utils
    # Temporary workaround to install panoptes from the source (avoid requests incompatibility)
    !pip install git+https://github.com/zooniverse/panoptes-python-client.git
    print("Restarting runtime to apply package changes...")
    os.kill(os.getpid(), 9)


try:
    import kso_utils.widgets as kso_widgets
    import kso_utils.project_utils as p_utils
    import kso_utils.server_utils as s_utils
    import kso_utils.yolo_utils as y_utils
    from kso_utils.project import ProjectProcessor, MLProjectProcessor
    from ipyfilechooser import FileChooser
    from IPython.display import display
    
    print("KSO successfully imported!")
except Exception as e:
    print(f"Error importing kso modules: {e}")
    try:
        # Fall back to a local development checkout of kso_utils
        initiate_dev_version()
        import kso_utils.widgets as kso_widgets
        import kso_utils.project_utils as p_utils
        import kso_utils.server_utils as s_utils
        import kso_utils.yolo_utils as y_utils
        from kso_utils.project import ProjectProcessor, MLProjectProcessor
        from ipyfilechooser import FileChooser
        from IPython.display import display

        print("KSO successfully imported!")
    except Exception:
        # No local checkout either: install the package and restart the runtime
        install_kso_utils()

### Choose your project

In [None]:
project_name = kso_widgets.choose_project()

### Initiate project's database

In [None]:
# Find project
project = p_utils.find_project(project_name=project_name.value)
# Initialise pp
pp = ProjectProcessor(project)

In [None]:
# Initiate mlp
mlp = MLProjectProcessor(pp)

In [None]:
# Only for Template Project (downloading prepared data)
s_utils.get_ml_data(project)

# Train the model

### Configure data paths

If you are running the Template project, the output_folder that you want to select is the ml-template-data. The path to this folder is printed in the cell above. For any other project, it is the folder where you have saved your data.

In [None]:
# Specify path containing the images and labels folders.
mlp.output_path = kso_widgets.choose_folder(
    project.photo_folder if not project.photo_folder == "None" else ".", "output"
)

🔴 <span style="color:red">&nbsp;NOTE: Each model type requires a specific folder structure to be in place. To be able to train your own Object Detection models, your data_path must contain a yml file for data and hyperparameters. See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data#11-create-datasetyaml. For image classification models, there should be 3 folders (train, val, test) each containing images in class_name folders. For segmentation models, polygon coordinates are also required. </span>
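
As an illustration of the image classification layout described in the note above, the following minimal sketch checks that a data folder contains `train`, `val` and `test` folders, each with at least one class sub-folder. The helper `check_classification_layout` and the class name `class_a` are hypothetical and not part of `kso_utils`; the check only looks at folder names, not at the images inside them.

```python
from pathlib import Path
import tempfile


def check_classification_layout(data_path, splits=("train", "val", "test")):
    """Return a list of problems found in a train/val/test folder layout."""
    root = Path(data_path)
    problems = []
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}")
            continue
        # Each split should contain one sub-folder per class.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in: {split}")
    return problems


# Build a tiny, deliberately incomplete layout in a temporary directory.
# "class_a" is a made-up class name used only for this example.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        (Path(tmp) / split / "class_a").mkdir(parents=True)
    layout_problems = check_classification_layout(tmp)

print(layout_problems)
```

An empty list means that the expected folders are present. In this example the `test` folder is reported as missing.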

In [None]:
# Fix important paths
mlp.setup_paths()

### Choose a suitable experiment name

In [None]:
exp_name = kso_widgets.choose_experiment_name()

### Choose model to use for training

In the next cell you will specify the folder (can be any folder of choice) where you want to download the baseline model to, which you will select in the cell after. This baseline model will be used as the starting point for the training.

In [None]:
# Specify path to download baseline model
download_folder = kso_widgets.choose_folder(
    project.photo_folder if not project.photo_folder == "None" else ".",
    "model download",
)

In [None]:
weights = mlp.choose_baseline_model(download_folder.selected)

### Train model with given configuration

The cell below asks for the batch size and the number of epochs to use during training. There are no strict rules for these settings: the best values depend on the GPU and on some run-to-run randomness that we have encountered while training models, so expect some trial and error. As a starting point, we advise a batch size of 8. For smaller datasets, 50-100 epochs have in our experience been sufficient to reach good performance (metrics that have reached a plateau) without overfitting to the training set.
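
To get a feel for how these two settings interact, the sketch below counts how many batches are processed over a full training run. The dataset size of 1,000 images is an assumed value for illustration only, and `total_batches` is a hypothetical helper, not a `kso_utils` function.

```python
import math


def total_batches(num_train_images, batch_size, epochs):
    """Number of batches processed over a full training run."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    batches_per_epoch = math.ceil(num_train_images / batch_size)
    return batches_per_epoch * epochs


# Assumed dataset size, used only for this example.
num_train_images = 1000

for batch_size in (8, 16):
    for epochs in (50, 100):
        n = total_batches(num_train_images, batch_size, epochs)
        print(f"batch_size={batch_size:>2}, epochs={epochs:>3}: {n} batches")
```

Doubling the batch size roughly halves the number of batches per epoch but needs more GPU memory per batch, which is one reason the best settings depend on the GPU.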

In [None]:
batch_size, epochs, img_h, img_w = mlp.choose_train_params()

In [None]:
# Give your WandB username, or the team name you want to send the runs to.
# If you are part of the koster project, you can keep the default 'koster'.
entity = mlp.choose_entity(alt_name=False)

In [None]:
mlp.train_yolo(
    exp_name=exp_name.value,
    weights=weights.artifact_path,
    project=mlp.project_name,
    epochs=epochs.value,
    batch_size=batch_size.value,
    img_size=img_h.value,  # this requires an int
)

# Evaluate model performance

The model is now done with training. To see the loss, precision, recall and some other parameters per training epoch, click on the link in the previous cell. Here you can see your run in Weights and Biases. To evaluate the resulting model, please run the cells below. These execute the standard evaluation process from YOLO.

For a biological evaluation of the model, please see Notebook 6.

In [None]:
conf_thres = kso_widgets.choose_eval_params()

In [None]:
# Choose model: The folder you want to select for eval_model is the folder with your experiment_name.
eval_model = FileChooser(".")
display(eval_model)

When you run the cell below, you will get some numbers logged on the screen, and 3 files that are stored in the folder 'your_experiment_name'_val.

The numbers logged on the screen represent the following:
* The first 7 numbers are: the mean precision, the mean recall, the mean average precision calculated at an IOU threshold of 0.5 (map@0.5), the mean average precision averaged over IOU thresholds from 0.5 to 0.95 in steps of 0.05 (map@0.5:0.95), and then 3 losses for predicting the box, object and class.
* The array gives the ap@0.5 per class.
* The last 3 numbers are the same as the numbers that are already printed in a line above, where it says 'Speed: … ms per....'
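
The IOU thresholds mentioned in the list above can be illustrated with a small sketch. It computes the intersection over union (IOU) of one predicted box and one ground-truth box, and lists the thresholds from 0.5 to 0.95 at which that prediction would still count as a match. The box coordinates are made up, and `box_iou` is a hypothetical helper written for this example; the evaluation itself is done by YOLO, not by this code.

```python
def box_iou(box_a, box_b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# The ten IOU thresholds behind map@0.5:0.95: 0.50, 0.55, ..., 0.95
iou_thresholds = [(50 + 5 * i) / 100 for i in range(10)]

# Made-up boxes in pixel coordinates.
predicted = (10, 10, 50, 50)
ground_truth = (18, 10, 58, 50)

iou = box_iou(predicted, ground_truth)
matched_at = [t for t in iou_thresholds if iou >= t]

print(f"IOU = {iou:.3f}")
print(f"Counts as a match at thresholds: {matched_at}")
```

A prediction that overlaps the ground truth only loosely is accepted at 0.5 but rejected at the stricter thresholds, which is why map@0.5:0.95 is usually lower than map@0.5.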

In [None]:
# Evaluate YOLO Model on Unseen Test data
mlp.eval_yolo(exp_name=exp_name.value, conf_thres=conf_thres.value)

# (Optional) : Enhance annotations using trained model

Enhancement uses the trained model to increase the number of annotations in the training data. This should only be done when it is absolutely necessary, because bad predictions that end up in the training data lead to worse predictions in the next iteration of the model.


🔴 <span style="color:red">&nbsp;NOTE: We recommend using a relatively high confidence threshold when enhancing annotations, as low-confidence predictions could significantly reduce the quality of your annotated data. Enhancement is currently only available for object detection models.  </span>
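
To show what a confidence threshold does to a set of predictions, here is a minimal sketch that keeps only the predictions at or above the threshold. It assumes each prediction is a text line of the form `class x_center y_center width height confidence`; that format, the example values and the helper `filter_by_confidence` are assumptions for illustration and may not match how `mlp.enhance_yolo` handles predictions internally.

```python
def filter_by_confidence(label_lines, conf_thres):
    """Keep only label lines whose last field (confidence) is >= conf_thres."""
    kept = []
    for line in label_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank lines or lines without a confidence field
        if float(fields[5]) >= conf_thres:
            kept.append(line)
    return kept


# Made-up predictions: class, x_center, y_center, width, height, confidence
predictions = [
    "0 0.512 0.430 0.120 0.090 0.91",
    "1 0.250 0.660 0.080 0.070 0.35",
    "0 0.770 0.210 0.150 0.110 0.78",
]

high_confidence = filter_by_confidence(predictions, conf_thres=0.75)

for line in high_confidence:
    print(line)
```

With a threshold of 0.75, the low-confidence second prediction is dropped and would not be added to the training data.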

In [None]:
eh_conf_thres = kso_widgets.choose_eval_params()

In [None]:
# Choose an input path
input_path = FileChooser(mlp.project_name)
display(input_path)

In [None]:
# Find the project path
project_path = FileChooser(mlp.project_name)
display(project_path)

In [None]:
mlp.enhance_yolo(
    in_path=input_path.selected,
    project_path=project_path.selected,
    conf_thres=eh_conf_thres.value,
    img_size=[640, 640],
)

### Choose run to use as enhanced annotations

In [None]:
runs = FileChooser(".")
display(runs)

In [None]:
# Move enhanced annotations to original run folder (NB: This will replace the original annotations)
mlp.enhance_replace(runs.selected)

#### Once you have moved the new labels to the original label location, you can return to Step 2 and train your model again.

In [None]:
# END