# Training a segmentation model for layout recognition with YALTAi and YOLOv8

This example was used to produce an object classification model for images of books printed between the 16th and 18th centuries. The aim is to facilitate image occlusion while preserving the semantic value of the layout.

Resources:
- [**YALTAi**](https://github.com/ponteineptique/yaltai): CLI that adapts the kraken engine to the YOLOv8 API
- [**YOLOv8**](https://docs.ultralytics.com/): Deep learning vision engine for object detection and image segmentation, provided through the Ultralytics API
- [**SegmOnto**](https://segmonto.github.io/): Controlled vocabulary for describing the content and layout of printed and manuscript books

Dataset :
- Gallic(orpor)a : [link](https://github.com/Gallicorpora)
- FoNDUE : [link](https://github.com/FoNDUE-HTR)
- SETAF-Pierre-de-Vingle : [link](https://github.com/SETAFDH/HTR-SETAF-Pierre-de-Vingle)

*The computations were performed at University of Geneva using Baobab HPC service.*

## Producing a classification vision model

### Prepare dataset

You need to group your XML-ALTO data in a single folder so that you can transform it into the YOLO format used by the Ultralytics API. Check that your zone names are consistent, so as not to multiply the number of classes, thereby reducing model quality. Ideally, your dataset should be structured as follows: `dataset/[BookId]/[IdImg]/`.
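To check zone names before conversion, you can count the labels found in your ALTO files. The sketch below is only an illustration: it assumes that zone types are stored as `OtherTag` entries referenced through the `TAGREFS` attribute of `TextBlock` elements (as in eScriptorium exports), and the helper names `zone_labels` and `dataset_labels` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def zone_labels(alto_xml):
    """Count zone labels in one ALTO document (str or bytes)."""
    root = ET.fromstring(alto_xml)
    # Map tag IDs to their labels, e.g. "BT1" -> "MainZone" (assumed layout)
    tags = {el.get("ID"): el.get("LABEL")
            for el in root.iter() if local(el.tag) == "OtherTag"}
    counts = Counter()
    for el in root.iter():
        if local(el.tag) == "TextBlock":
            for ref in (el.get("TAGREFS") or "").split():
                # Fall back to the raw reference if the tag is not declared
                counts[tags.get(ref, ref)] += 1
    return counts


def dataset_labels(folder):
    """Sum the label counts of every XML file found under `folder`."""
    total = Counter()
    for path in Path(folder).rglob("*.xml"):
        total += zone_labels(path.read_bytes())
    return total
```

Calling `dataset_labels(dataset)` and reading the resulting counter makes near-duplicates such as `MainZone` and `mainzone` easy to spot, since each spelling would become a separate class.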

In [1]:
dataset = 'PATH/TO/DATA/'

Install YALTAi in a dedicated virtual environment:
`pip install yaltai`

In [2]:
# convert alto data to yolo data

!yaltai alto-to-yolo PATH/TO/ALTOorPAGE/*.xml my-dataset --shuffle .1 --segmonto region



You'll find your new dataset, converted to YOLO format, in `my-dataset` or under the name you specified.

If you move your dataset to train the model elsewhere, for example on a compute server, you may need to modify some parameters: change the values of `train` and `val` in `config.yml` so that they point to the new absolute paths.
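As an illustration, the path update can be scripted. This is a minimal sketch using only the standard library; it assumes that `train` and `val` are written as unquoted `key: path` lines in `config.yml`, and the function name `relocate_config` and the example paths are hypothetical.

```python
import re


def relocate_config(config_text, old_root, new_root):
    """Return the config text with `train`/`val` paths moved from old_root to new_root."""
    pattern = re.compile(r"^(train|val):\s*(.+)$")
    out = []
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            # Only rewrite paths that actually start with the old root
            if value.startswith(old_root):
                line = f"{key}: {new_root}{value[len(old_root):]}"
        out.append(line)
    return "\n".join(out) + "\n"


example = "train: /home/me/my-dataset/images/train\nval: /home/me/my-dataset/images/val\n"
print(relocate_config(example, "/home/me", "/srv/hpc"))
```

To apply it to a real file, read `config.yml` as text, pass it through the function, and write the result back. All other lines are left unchanged.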

###  Training a model

You can use the YALTAi library CLI to facilitate model training. In our case, we've chosen to use the ultralytics API directly, to go a step further in refining the model.

If you wish to use this method, we recommend that you install ultralytics in a new environment to ensure that you have the latest updates.

`pip install ultralytics`

#### Dashboard

You can use the *Comet* dashboard to track your training progress. The integration is built into the ultralytics API, so all you need to do is install the library.
Here, we pass the credentials directly to `comet_ml.init`, although they can also be provided as environment variables.
For documentation, please refer to https://www.comet.com/docs/v2/

`pip install comet-ml`

In [None]:
import comet_ml

comet_ml.init(api_key='API-KEY', project_name='NOM_PROJET', workspace='PSEUDO')
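If you prefer environment variables, for example to keep the API key out of the notebook, a quick check like the one below can confirm they are set before training starts. This is a sketch: `COMET_API_KEY`, `COMET_PROJECT_NAME` and `COMET_WORKSPACE` are the variable names documented by Comet, while the helper `missing_comet_vars` is hypothetical.

```python
import os

# Environment variables read by Comet for credentials and project settings
COMET_VARS = ("COMET_API_KEY", "COMET_PROJECT_NAME", "COMET_WORKSPACE")


def missing_comet_vars(env=os.environ):
    """Return the names of the Comet variables that are unset or empty."""
    return [name for name in COMET_VARS if not env.get(name)]


missing = missing_comet_vars()
if missing:
    print("Comet variables not set:", ", ".join(missing))
```

The variables themselves would be exported in your shell or job script before the notebook or training script is launched.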

#### TRAINING

Here are the parameters we used to produce the model.
Training ran on the following resources:
- GPU: 2 × RTX Titan (24 GB of memory)
- CPU: 12 cores
- RAM: 25 GB

Training time: 08:55:47

For users of the University of Geneva HPC cluster, an example `SBATCH` launch script is available here: [FoNDUE](https://github.com/FoNDUE-HTR/Documentation/blob/master/CLUSTERS.md)

In [None]:
import os
import torch
from ultralytics import YOLO

# empty CUDA cache
torch.cuda.empty_cache()

# load pretrained model
model = YOLO('yolov8x.pt')  # the largest YOLOv8 model; a smaller variant (e.g. yolov8n.pt) also works
model.to('cuda')

# get config path
dataset_path = os.path.join(os.getcwd(), dataset, 'config.yml')  # join handles missing or duplicate separators


#train
model.train(data=dataset_path,
            epochs=300,
            patience=150,   # stop early after 150 epochs without improvement
            imgsz=896,      # input image size in pixels
            batch=32,       # large batch; reduce it if your GPU memory is limited
            cache=True,     # cache images in RAM
            device=[0, 1])  # train on two GPUs; for a single GPU, pass its index (usually 0)
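The `device` value depends on the machine. As a sketch, it can be derived from the number of visible GPUs; the helper `pick_device` is hypothetical, and it returns the forms that `model.train` accepts for `device` (`'cpu'`, a single index, or a list of indices).

```python
def pick_device(gpu_count):
    """Map a GPU count to a value usable as the `device` argument."""
    if gpu_count <= 0:
        return "cpu"
    if gpu_count == 1:
        return 0
    return list(range(gpu_count))


try:
    import torch
    n_gpus = torch.cuda.device_count()
except ImportError:  # torch is not installed in this environment
    n_gpus = 0

device = pick_device(n_gpus)
print(device)
```

The result can then be passed as `device=device` in the training call instead of a hard-coded list.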

# VAL/PREDICT/Benchmark

Once training is complete, you will find the model weights in the following directory: `runs/detect/train/weights/`.
Unless you saved checkpoints at regular intervals, you'll have two models. `best.pt` is the best-performing model according to ultralytics (based on its validation metrics). `last.pt` is the model from the last epoch.
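When you have launched several trainings, the run folders accumulate (`train`, `train2`, and so on). The sketch below looks for the most recently written `best.pt`; the helper name `latest_weights` is hypothetical and it assumes the default `runs/detect/<run>/weights/` layout.

```python
from pathlib import Path


def latest_weights(runs_dir="runs/detect", name="best.pt"):
    """Return the most recently modified weights file, or None if there is none."""
    candidates = list(Path(runs_dir).glob(f"*/weights/{name}"))
    if not candidates:
        return None
    # Pick the file with the latest modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(latest_weights())
```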

In [1]:
!ls models/

best.pt


In [None]:
from ultralytics import YOLO

#path_model = "PATH/TO/WEIGHTS/best.pt"
path_model = "models/best.pt"

model = YOLO(path_model)
model.to('cuda') # to use GPU
model.info()
#model.fuse()    # Fuse the PyTorch Conv2d and BatchNorm2d layers to speed up inference.
# These two layers are generally the cause of high RAM usage with large batches. Whether fusing helps depends on your situation.

## Eval

You can evaluate the quality of your model directly with its `val()` method, passing your annotated dataset in the `data` parameter. You can reuse your initial dataset to get results on its validation split, but it's best to test on a third-party dataset, so you can quickly spot biases and, in particular, signs of overfitting.
If the `plots` option is enabled, visualizations are saved under `runs/detect/val` (then `val2`, `val3`, and so on for later runs).

In [None]:
dataset = 'PATH/TO/DATASET/config.yml'

metrics = model.val(data=dataset, plots=True)  # evaluate on the dataset described by config.yml
metrics.box.map    # mAP50-95
metrics.box.map50  # mAP50
metrics.box.map75  # mAP75
metrics.box.maps   # mAP50-95 for each class

To better understand the results:
- `Class`: objects to be detected, in this case Segmonto zones
- `Images`: number of images used for evaluation
- `Box(P)`: Precision of the predicted boxes for the class. Among the zones the model assigned to this class, what proportion is correct?
- `R`: Recall for the class. Among the annotated zones of this class, what proportion did the model find? A low recall means the model misses many zones, even if its precision is high.
- `mAP50`: Mean Average Precision at an IoU (Intersection over Union) threshold of 0.50. A prediction counts as correct when its box overlaps the ground-truth box with an IoU of at least 0.50.
- `mAP50-95`: Mean Average Precision averaged over IoU thresholds from 0.50 to 0.95, in steps of 0.05. A high value indicates that the model still locates zones accurately when the overlap requirement becomes stricter.
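To make these metrics concrete, here is a small worked example of IoU, precision and recall. It is a simplified illustration, not the evaluation code used by ultralytics; boxes are assumed to be `(x1, y1, x2, y2)` tuples and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truth = (0, 0, 100, 100)   # annotated zone
pred = (0, 0, 100, 50)     # predicted zone covering the top half only
print(iou(truth, pred))    # 0.5

# 8 correct detections, 2 wrong detections, 4 missed zones
print(precision_recall(8, 2, 4))
```

With an IoU of 0.5, this prediction counts as correct for `mAP50` but would be rejected at the stricter thresholds included in `mAP50-95`.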

## Predict

Script for making predictions and visualizing the model's capabilities on web images, especially images served over IIIF.

In [None]:
import requests
from io import BytesIO
from PIL import Image

def get_img(url: str):
    """Web image download function"""
    # download image
    response = requests.get(url)
    # open image with PIL library
    image = Image.open(BytesIO(response.content))
    return image
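The Gallica URLs used in the cells below follow the IIIF Image API pattern `region/size/rotation/quality.format`: `full` requests the whole page, `,640` sets the height to 640 pixels with a proportional width, `0` means no rotation, and `native.jpg` requests the default quality as JPEG. A small helper can build them; the function name `gallica_iiif_url` is hypothetical and the URL layout is simply copied from the examples in this notebook.

```python
def gallica_iiif_url(ark, page, height=640):
    """Build a Gallica IIIF image URL for one page of a document."""
    return (
        "https://gallica.bnf.fr/iiif/ark:/12148/"
        f"{ark}/f{page}/full/,{height}/0/native.jpg"
    )


print(gallica_iiif_url("bpt6k97850416", 11))
```

The returned string can be passed directly to `get_img`.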

##### One image

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

img = get_img("https://gallica.bnf.fr/iiif/ark:/12148/bpt6k97850416/f11/full/,640/0/native.jpg")
results = model(img) # Prediction

# Print Result
for r in results:
    im_array = r.plot(conf=True)  # plot a BGR numpy array of predictions
    im = Image.fromarray(im_array[..., ::-1])  # RGB PIL image
    plt.imshow(im)
    plt.show()

##### Multiple images

In [None]:
img = get_img("https://gallica.bnf.fr/iiif/ark:/12148/bpt6k9767960q/f24/full/,640/0/native.jpg")
img1 = get_img("https://gallica.bnf.fr/iiif/ark:/12148/bpt6k9767960q/f11/full/,640/0/native.jpg")
img2 = get_img("https://gallica.bnf.fr/iiif/ark:/12148/bpt6k9767960q/f28/full/,640/0/native.jpg")
img3 = get_img("https://gallica.bnf.fr/iiif/ark:/12148/bpt6k9767960q/f24/full/,640/0/native.jpg")

# Run inference on the list of images
results = model([img, img1, img2, img3])  # results list
#results[0].boxes.data  # to check tensor

# Show the results
for r in results:
    im_array = r.plot(conf=True)  # plot a BGR numpy array of predictions
    im = Image.fromarray(im_array[..., ::-1])  # RGB PIL image
    plt.imshow(im)
    plt.show()
    #im.save('results.jpg')  # save image
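Beyond plotting, the predictions can be summarized as counts per zone type. The sketch below assumes each result exposes `boxes.cls`, `boxes.conf` and `names` as ultralytics `Results` objects do; the helper `count_zones` and its `min_conf` threshold are hypothetical.

```python
from collections import Counter


def count_zones(result, min_conf=0.25):
    """Count detected zones per class name for one prediction result."""
    classes = result.boxes.cls.tolist()   # class index of each detected box
    scores = result.boxes.conf.tolist()   # confidence score of each detected box
    return Counter(
        result.names[int(c)]
        for c, s in zip(classes, scores)
        if s >= min_conf
    )

# Usage with the results computed above:
# for r in results:
#     print(count_zones(r))
```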

## EXPORT ONNX

Convert the model to the ONNX (Open Neural Network Exchange) format to make it interoperable.
You can then optimize the exported model further, notably its inference time, using [Nvidia TensorRT](https://developer.nvidia.com/tensorrt).

In [None]:
model.export(format="onnx")