<a href="https://colab.research.google.com/github/matthewleechen/digitize_woodcroft_patents/blob/main/fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook was designed for training in Google Colab Pro and is **not** recommended for the Colab free plan. The training loop was originally run on Colab Pro with one Nvidia A100 (40GB) GPU. Training is both compute- and time-intensive: it consumed approximately 15-20 compute credits per hour, and the Faster-RCNN and Mask-RCNN models (ResNet-50 backbone) took approximately 12 hours to train to 100,000 iterations. Expect this to vary significantly with the size and quality of the input images, which must be loaded into GPU memory; I compressed mine to 20% of their original size.

The notebook uses the Detectron2 library from Facebook AI Research for object detection and instance segmentation (https://github.com/facebookresearch/detectron2).

**Prepare labelled data and directories**

In [None]:
%%capture
# Clone forked layout-model-training repo from Layout-Parser
! git clone https://github.com/matthewleechen/layout-model-training

# Clone forked cocosplit repo from akarazniewicz
! git clone https://github.com/matthewleechen/cocosplit

In [None]:
%%capture
# Install all dependencies 
! cd /content/layout-model-training/ && pip install -r requirements.txt
! cd /content/cocosplit/ && pip install -r requirements.txt
! pip install -e git+https://github.com/matthewleechen/layout-parser.git#egg=layoutparser
! pip install torchvision && pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"
! pip install google-cloud-vision 

In [None]:
# Change working directory
%cd /content/layout-model-training/
!pwd

/content/layout-model-training
/content/layout-model-training


Restart the runtime before proceeding. Then upload the COCO annotations archive (`annotations.zip`) to the directory `/content/layout-model-training/`.

In [None]:
import os
import zipfile

In [None]:
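# Use absolute paths: restarting the runtime resets the working directory to /content
base_dir = "/content/layout-model-training"
zip_file = os.path.join(base_dir, "annotations.zip")

# Create the data folder (no error if it already exists)
output_folder = os.path.join(base_dir, "data")
os.makedirs(output_folder, exist_ok=True)

# Extract the annotations into the data folder, skipping macOS metadata
# (the "__MACOSX/" folder and "._*" resource-fork files)
with zipfile.ZipFile(zip_file, 'r') as zip_ref:
    for member in zip_ref.namelist():
        name = os.path.basename(member)
        if member.startswith('__MACOSX/') or name.startswith('._'):
            continue
        zip_ref.extract(member, output_folder)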


In [None]:
# Create outputs folder (-p avoids an error if it already exists)
! mkdir -p /content/layout-model-training/outputs/

The outputs folder will contain evaluation data, checkpoint information and model weights following training.

**Split the data into training and test sets**

The code below allocates 80% of the data to the training set and 20% to the test set. To change the ratio, edit the value passed to `-s` (currently 0.8).

In [None]:
# Split the data 
! python /content/cocosplit/cocosplit.py --having-annotations --multi-class -s 0.8 \
/content/layout-model-training/data/new_results.json /content/layout-model-training/data/train.json \
/content/layout-model-training/data/test.json

Saved 9016 entries in /content/layout-model-training/data/train.json and 2254 in /content/layout-model-training/data/test.json
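
As a quick sanity check on the split, the sketch below counts the entries in a COCO JSON file and computes the share that went to training. The helper names `summarize_coco` and `train_share` are hypothetical and are not part of cocosplit; the counts in the last line are the ones reported in the output above.

```python
import json

def summarize_coco(path):
    """Return (images, annotations, categories) counts for a COCO JSON file."""
    with open(path) as f:
        coco = json.load(f)
    return (
        len(coco.get("images", [])),
        len(coco.get("annotations", [])),
        len(coco.get("categories", [])),
    )

def train_share(train_count, test_count):
    """Fraction of all entries that ended up in the training set."""
    return train_count / (train_count + test_count)

# Counts reported by cocosplit in the cell output above
print(round(train_share(9016, 2254), 3))  # 0.8
```

Calling `summarize_coco("/content/layout-model-training/data/train.json")` and the same for `test.json` after the split shows how many images, annotations and categories each file holds.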


**Training Detectron2 vision models**

***Continue training from last checkpoint***

Upload the `last_checkpoint` file and the model weights file (`model_{number of iterations}.pth`) to the outputs folder.

***Start training from default pre-trained model weights***

Ensure the outputs folder is empty. 

***Evaluation only***

Pass the `--eval-only MODEL.WEIGHTS /content/layout-model-training/outputs/last_checkpoint` argument to the `train_annotations.sh` file.
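
Before launching a run, it can help to confirm that the outputs folder matches the case you intend. The sketch below is a minimal check based only on the rules described above (an empty folder for a fresh start; a `last_checkpoint` file plus a `model_*.pth` weights file for resuming). The function name `training_mode` is hypothetical, and this is not the check the training script itself performs.

```python
import os

def training_mode(outputs_dir):
    """Guess how training would start, given the contents of the outputs folder."""
    if not os.path.isdir(outputs_dir):
        return "missing"        # folder has not been created yet
    names = os.listdir(outputs_dir)
    if not names:
        return "fresh"          # empty folder: start from pre-trained weights
    has_pointer = "last_checkpoint" in names
    has_weights = any(n.startswith("model_") and n.endswith(".pth") for n in names)
    if has_pointer and has_weights:
        return "resume"         # both files present: continue from checkpoint
    return "inconsistent"       # leftover files without a usable checkpoint

print(training_mode("/content/layout-model-training/outputs"))
```

A result of `"inconsistent"` suggests either clearing the folder or uploading the missing checkpoint file before training.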


Note: The default model in `train_annotations.sh` is Fast-RCNN with a ResNet-50 backbone and a feature pyramid network (config file: `fast_rcnn_R_50_FPN_3x.yaml`). Mask-RCNN with the same backbone and feature pyramid network is also available (config file: `mask_rcnn_R_50_FPN_3x.yaml`). Because Mask-RCNN is an instance segmentation model, it needs a COCO dataset with segmentation masks; without them, training fails with an attribute error. You can also try other models from the Detectron2 library (https://github.com/facebookresearch/detectron2/tree/main/configs).
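
To find out ahead of time whether a dataset is suitable for Mask-RCNN, you can look for annotations that lack a mask. The sketch below assumes the standard COCO layout, in which each annotation carries a `segmentation` field; the helper names and the tiny `example` dataset are made up for illustration.

```python
import json

def load_coco(path):
    """Read a COCO JSON file into a dict."""
    with open(path) as f:
        return json.load(f)

def annotations_missing_masks(coco):
    """Return ids of annotations whose `segmentation` field is absent or empty."""
    return [
        ann.get("id")
        for ann in coco.get("annotations", [])
        if not ann.get("segmentation")
    ]

# Hypothetical three-annotation dataset: only the first one has a polygon mask
example = {"annotations": [
    {"id": 1, "bbox": [0, 0, 10, 10], "segmentation": [[0, 0, 10, 0, 10, 10, 0, 10]]},
    {"id": 2, "bbox": [5, 5, 20, 20], "segmentation": []},
    {"id": 3, "bbox": [1, 1, 2, 2]},
]}
print(annotations_missing_masks(example))  # [2, 3]
```

On real data, `annotations_missing_masks(load_coco("/content/layout-model-training/data/train.json"))` should return an empty list before you switch to the Mask-RCNN config.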

Hyperparameters can be adjusted directly in the configuration files. If training diverges, you will likely need to reduce the base learning rate (`BASE_LR` in the config file). I used the hyperparameters from the config files in the cloned repository, which correspond to the Detectron2 defaults, with two exceptions: the base learning rate for Mask-RCNN was halved from 0.02 to 0.01 because training diverged at 0.02, and the maximum number of iterations was raised from the default 60,000 to 100,000 for all models.
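
As an alternative to editing the YAML files, Detectron2's stock training scripts accept trailing `KEY VALUE` pairs that override config entries, the same mechanism as the `MODEL.WEIGHTS` argument in the evaluation-only case. The sketch below builds such a string for the two values mentioned here. `SOLVER.BASE_LR` and `SOLVER.MAX_ITER` are Detectron2 config keys; whether `train_annotations.sh` forwards extra arguments to the training script is an assumption you would need to verify, and `to_opts` is a hypothetical helper.

```python
def to_opts(overrides):
    """Flatten {"KEY": value} into ["KEY", "value", ...] command-line tokens."""
    tokens = []
    for key, value in overrides.items():
        tokens.extend([key, str(value)])
    return tokens

# Values taken from the text: halved learning rate, 100,000 iterations
overrides = {"SOLVER.BASE_LR": 0.01, "SOLVER.MAX_ITER": 100000}
print(" ".join(to_opts(overrides)))
# SOLVER.BASE_LR 0.01 SOLVER.MAX_ITER 100000
```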

I trained Fast-RCNN using a subset of the annotations (from 1853 only), and both Faster-RCNN and Mask-RCNN using the full set of annotations.

In [None]:
# Training loop
! bash /content/layout-model-training/scripts/train_annotations.sh

Once you stop training, you **must** download the contents of the outputs folder to your own machine, because Colab deletes the runtime's files once the runtime is deleted.
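
One way to do this is to bundle the folder into a single zip archive first. The sketch below uses only the standard library; `archive_outputs` and the archive name are hypothetical, and the commented-out lines show how the archive could then be fetched with Colab's `files.download` helper.

```python
import os
import shutil

def archive_outputs(outputs_dir, archive_base):
    """Zip `outputs_dir` into `<archive_base>.zip` and return the archive path."""
    if not os.path.isdir(outputs_dir):
        raise FileNotFoundError(f"No such folder: {outputs_dir}")
    # Keep archive_base outside outputs_dir so the archive does not include itself
    return shutil.make_archive(archive_base, "zip", root_dir=outputs_dir)

# Example call in Colab (the archive name is an assumption):
# path = archive_outputs("/content/layout-model-training/outputs",
#                        "/content/outputs_backup")
# from google.colab import files
# files.download(path)
```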