  
  
This script extracts the image from the .zip file downloaded from Planet; prepares the image by creating a .png from the .tif file so that it can be annotated manually with LabelMe; segments the labelled image; describes the segmented images so that their suitability for training can be evaluated; culls them to meet the model training requirements; and finally trains the model. 
  
  

#### 1. 
We define the path of the project and point Python to the scripts containing the functions that the following steps call. 

In [None]:
#| code-fold: show

import sys
import os
import yaml

# Add the project root to sys.path (adjust as needed)
sys.path.append(os.path.abspath("Waterholes_project/WaterholeDetection_UN-Handbook"))
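
If the path above is wrong, the later imports fail with an error that does not mention the path. A minimal sketch of a stricter variant follows; `add_project_root` is a hypothetical helper, not part of the project code, and it simply checks that the folder exists before adding it.

```python
import os
import sys


def add_project_root(root: str) -> str:
    """Append an existing project folder to sys.path once and return its absolute path."""
    root = os.path.abspath(root)
    if not os.path.isdir(root):
        raise FileNotFoundError(f"Project root not found: {root}")
    if root not in sys.path:
        sys.path.append(root)
    return root
```

It would be called as `add_project_root("Waterholes_project/WaterholeDetection_UN-Handbook")`, with the path adjusted to your own setup.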


#### 2. 
If we were to follow the full procedure and download the zip file from Planet, we would use the extract_zip function as demonstrated in the following code cell. It extracts the file called "composite.tif", delivered with the Planet order, from the zip file in our "raw_images" folder and saves it outside the zip file, renamed as date_aoi.tif.   
For this tutorial, however, the "Getting started" section provides a link to download the example image. This image is already in .tif format and is processed further in step 3. 

In [None]:
import os
import yaml
import counting_wh.wh_utils.planet_utils

# Define the path to your zip file
zip_path = "images/raw_images" 
# AFUN: should this path be training/raw_images? 
# If so, we need to be careful where we download the image zip file. Might need to adapt this. 

# Run extraction
counting_wh.wh_utils.planet_utils.extract_zip(zip_path)
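
For readers curious about what such an extraction involves, here is a rough sketch using only the standard library. It is not the package's `extract_zip`: the function name `extract_composite` is hypothetical, and because the source does not show how the date and AOI are derived, they are passed in explicitly here.

```python
import shutil
import zipfile
from pathlib import Path


def extract_composite(zip_file: str, out_dir: str, date: str, aoi: str) -> Path:
    """Copy composite.tif out of a Planet order zip and save it as <date>_<aoi>.tif."""
    out_path = Path(out_dir) / f"{date}_{aoi}.tif"
    with zipfile.ZipFile(zip_file) as archive:
        # The tif may sit in a sub-folder of the archive, so match on the file name only.
        members = [m for m in archive.namelist() if Path(m).name == "composite.tif"]
        if not members:
            raise FileNotFoundError(f"No composite.tif inside {zip_file}")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with archive.open(members[0]) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```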

#### 3. 
We now transform the example raw tif image into a png that the later steps can use. This step pads the image so that its size is exactly divisible by the stride and tile size.   
Note that you need to set up the config file called "config_train_Drive_UN" as instructed, so that it matches your paths and everything runs smoothly. 

In [None]:
#| code-fold: show
 
import os
import yaml
import counting_wh.train

# Prepare the tif files: rename them and convert them to png. 
counting_wh.train.prepare("config_train_Drive_UN.yaml")
 

Doing gdal work...
Done with gdal work for 20240101_mimal_test.tif
New Width:  14976 New Height:  12064
Processed 1/1 images
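
To illustrate the padding idea, the sketch below computes the smallest size that a sliding window covers exactly. It is not the code inside `prepare`. The tile size of 416 is borrowed from the `--img 416` flag used for training later on, while the stride and the raw dimensions are assumed values chosen for illustration.

```python
def padded_size(size: int, tile: int, stride: int) -> int:
    """Smallest length >= size that a tile-wide window covers exactly in stride-sized steps."""
    if size <= tile:
        return tile
    steps = -(-(size - tile) // stride)  # ceiling division without floats
    return tile + steps * stride


# Hypothetical raw dimensions; tile and stride are assumed values.
print(padded_size(14900, tile=416, stride=416))  # 14976
print(padded_size(12000, tile=416, stride=416))  # 12064
```

Both padded values are whole multiples of 416 (36 and 29 tiles respectively), which is the property the padding is meant to guarantee.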


#### 4. 
Once the .png image is created, we need to label it, since it will be used to train the model. To do so, use LabelMe.  
LabelMe is launched from your terminal by typing "labelme".   
XXXXXXX insert image of label me opening, to make sure they can open it via the terminal? 

You then have to annotate the waterholes on your padded png image. This produces a .json file containing all of your bounding box definitions. These labels are needed in the next step, which segments the image and the corresponding labels for training. 
XXXXXXXX insert gif of manual label me annotation?
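
To give an idea of what the .json file holds, the sketch below reads LabelMe rectangle annotations and converts them to the normalised `class cx cy w h` rows that YOLO expects. It is an illustration under assumptions, not the project's own conversion: `labelme_to_yolo` and the label name "waterhole" are hypothetical, and only rectangle shapes are handled.

```python
import json


def labelme_to_yolo(annotation: dict, class_ids: dict) -> list:
    """Convert LabelMe rectangle shapes to YOLO rows: class cx cy w h, normalised to 0-1."""
    img_w = annotation["imageWidth"]
    img_h = annotation["imageHeight"]
    rows = []
    for shape in annotation["shapes"]:
        if shape.get("shape_type") != "rectangle":
            continue  # this sketch handles bounding boxes only
        (x1, y1), (x2, y2) = shape["points"]
        x_min, x_max = sorted((x1, x2))
        y_min, y_max = sorted((y1, y2))
        rows.append((
            class_ids[shape["label"]],
            (x_min + x_max) / 2 / img_w,
            (y_min + y_max) / 2 / img_h,
            (x_max - x_min) / img_w,
            (y_max - y_min) / img_h,
        ))
    return rows


# A hand-written stand-in for what json.load() returns on a LabelMe file.
example = json.loads("""
{"imageWidth": 1000, "imageHeight": 500,
 "shapes": [{"label": "waterhole", "shape_type": "rectangle",
             "points": [[100, 50], [300, 150]]}]}
""")
print(labelme_to_yolo(example, {"waterhole": 0}))
```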


#### 5. 

Once the whole manual annotation is done, save the outputs, and come back to this script to run the segmentation of the padded png image you just labelled.

In [None]:
#| code-fold: show

import os
import yaml
import counting_wh.train

#segment the png images
counting_wh.train.segment("config_train_Drive_UN.yaml", train_val_split=0.8)
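
As a rough picture of what segmentation with `train_val_split=0.8` means, the sketch below lists the tile positions of a padded image and splits them 80/20. How `segment` actually tiles and splits is not shown in this notebook, so the random split, the helper names, and the stride are all assumptions.

```python
import random


def tile_origins(width: int, height: int, tile: int, stride: int) -> list:
    """Top-left corners of every tile that fits inside a padded image."""
    xs = range(0, width - tile + 1, stride)
    ys = range(0, height - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


def split_train_val(items: list, train_fraction: float, seed: int = 0):
    """Shuffle a copy of items and cut it into train and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


tiles = tile_origins(14976, 12064, tile=416, stride=416)  # stride is an assumed value
train, val = split_train_val(tiles, train_fraction=0.8)
print(len(tiles), len(train), len(val))
```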


#### 6. 
After the segmentation, we evaluate its result, and the material produced to train the model, using the "train.describe" function. 
Run the cell below to describe the segmented images. 

In [None]:
import sys
import os
import yaml
import counting_wh.train

#describe the created segmented images: 
counting_wh.train.describe("config_train_Drive_UN.yaml")


Config path: D:\Waterholes_project\counting_waterholes\training_v4\output\images


KeyboardInterrupt: 
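
Since the run above was interrupted, the output of `describe` is not shown here. As an independent check, the sketch below counts the label files in a folder, how many of them are empty, and how many boxes they contain. `summarise_labels` is a hypothetical helper and makes no claim about what `describe` reports.

```python
from pathlib import Path


def summarise_labels(label_dir: str) -> dict:
    """Count YOLO label files, how many are empty, and how many boxes they hold in total."""
    files = empty = boxes = 0
    for label_file in Path(label_dir).glob("*.txt"):
        rows = [line for line in label_file.read_text().splitlines() if line.strip()]
        files += 1
        boxes += len(rows)
        if not rows:
            empty += 1
    return {"files": files, "empty": empty, "boxes": boxes}
```

Pointing it at the `labels/train` folder of your output directory gives the counts that matter for the culling step.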

#### 7. 
Before training the model, we need to apply the cull command, which removes images with no labels until they make up 10% of the training set. This is a procedure recommended by YOLO to keep model training efficient. It has to be done after segmentation, as we do not know the number of unlabelled images beforehand.  

In [None]:
import sys
import os
import yaml
import random
import shutil
from pathlib import Path 
import counting_wh.train

# Cull images with empty labels until they make up at most 10% of the training set: 
counting_wh.train.cull_AF("config_train_Drive_UN.yaml")


Analyzing label files in: D:\Waterholes_project\counting_waterholes\training_v4\output\labels\train
Looking for corresponding images in: D:\Waterholes_project\counting_waterholes\training_v4\output\images\train
Total label files found: 13565
Empty label files found: 6063
Non-empty label files: 7502
Moving 5230 empty label files to maintain 10% ratio

--- SUMMARY ---
Total label files moved: 5230
Total image files moved: 5230
Remaining total label files: 8335
Remaining empty label files: 833
Empty labels now make up 9.99% of the dataset
Empty labels moved to: D:\Waterholes_project\counting_waterholes\training_v4\output\labels\moved_empty_labels
Corresponding images moved to: D:\Waterholes_project\counting_waterholes\training_v4\output\images\moved_empty_images

SUCCESS: Empty labels now make up 10% or less of the dataset.
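
The numbers in the summary above can be reproduced with a little arithmetic. The sketch below is one way to arrive at them; whether `cull_AF` computes the target this way internally is an assumption, and `empties_to_keep` is a hypothetical helper.

```python
def empties_to_keep(non_empty: int, max_empty_percent: int = 10) -> int:
    """Largest number of empty label files that stays within max_empty_percent of the total."""
    # keep / (non_empty + keep) <= p / 100  rearranges to  keep <= p * non_empty / (100 - p)
    return (max_empty_percent * non_empty) // (100 - max_empty_percent)


non_empty, empty = 7502, 6063  # counts taken from the log above
keep = empties_to_keep(non_empty)
print(empty - keep)                               # 5230 files to move
print(keep, non_empty + keep)                     # 833 8335
print(round(100 * keep / (non_empty + keep), 2))  # 9.99
```

Integer division is used so that the result never tips over the 10% limit because of rounding.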


#### 8. 
Now that no more than 10% of the images are unlabelled, we can train the model. First, let's reorganise the folders into the layout the model expects. 

In [None]:
import sys
import os
import yaml
import counting_wh.train

counting_wh.train.reorganize_folders("config_train_Drive_UN.yaml")

Copying from D:\Waterholes_project\counting_waterholes\training_v4\output\images\val to D:\Waterholes_project\counting_waterholes\training_v4\train\val\images
Successfully copied to D:\Waterholes_project\counting_waterholes\training_v4\train\val\images
Copying from D:\Waterholes_project\counting_waterholes\training_v4\output\images\train to D:\Waterholes_project\counting_waterholes\training_v4\train\train\images
Successfully copied to D:\Waterholes_project\counting_waterholes\training_v4\train\train\images
Copying from D:\Waterholes_project\counting_waterholes\training_v4\output\labels\val to D:\Waterholes_project\counting_waterholes\training_v4\train\val\labels
Successfully copied to D:\Waterholes_project\counting_waterholes\training_v4\train\val\labels
Copying from D:\Waterholes_project\counting_waterholes\training_v4\output\labels\train to D:\Waterholes_project\counting_waterholes\training_v4\train\train\labels
Successfully copied to D:\Waterholes_project\counting_waterholes\trainin
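
The log above shows the mapping: `output/<images|labels>/<train|val>` is copied to `train/<train|val>/<images|labels>`. A minimal sketch of that copy, assuming nothing beyond the folder layout visible in the log, could look like this (`reorganise` is a hypothetical name, not the project function):

```python
import shutil
from pathlib import Path


def reorganise(output_dir: str, train_dir: str) -> list:
    """Copy output/<kind>/<split> to train/<split>/<kind> for images and labels."""
    copied = []
    for kind in ("images", "labels"):
        for split in ("train", "val"):
            src = Path(output_dir) / kind / split
            dst = Path(train_dir) / split / kind
            shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
            copied.append(dst)
    return copied
```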

#### 9. 
The following code gives you the command to run in YOLOv5 in order to train a neural network model. Before running it, modify the YOLO model training path in the config file called "config_train_Drive_UN.yaml". 

Then run this code, which prints the command to execute in the cmd terminal of YOLOv5.  
XXXXXXXXXXXXXXX insert screenshot of the YOLO terminal and what it looks like when it's training? Maybe a screenshot and a gif when it trains?   

In [None]:
#| code-fold: show

import sys
import os
import yaml
import counting_wh.train

# Print the YOLOv5 training command built from the config file: 
counting_wh.train.train("config_train_Drive_UN.yaml")

python C:/Users/fossatia/Documents/Waterholes_project/yolov5/train.py --device cuda:0 --img 416 --batch 8 --workers 6 --epochs 100 --data config_train_Drive.yaml --weights C:/Users/fossatia/Documents/Waterholes_project/yolov5/runs/train/exp3/weights/best.pt --save-period 50


Comment from AF (08.09): We need to run it and check what it actually prints, because I trained with the line below. We also need to settle what we advise the user in terms of batch size, img, workers, epochs, etc. 

In [None]:
python train.py --workers 2 --img 416 --batch 8 --epochs 150 --data config_train_Drive_UN.yaml --weights yolov5s.pt --cache disk

It seems that the train function actually launches the training automatically, but we do not see any progress. 
For now, we prefer to run it manually from cmd in the yolov5 folder.  
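
One possible way to see progress when launching a long command from Python is to read its output line by line while it runs. The sketch below shows the general technique with the standard library; it is not how the `train` function works, `run_and_stream` is a hypothetical helper, and progress bars that redraw with carriage returns may still appear in bursts.

```python
import subprocess
import sys


def run_and_stream(command: list) -> int:
    """Run a command, echo each output line as soon as it arrives, and return the exit code."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # progress bars are usually written to stderr
        text=True,
        bufsize=1,                 # line-buffered
    )
    for line in process.stdout:
        print(line, end="", flush=True)
    return process.wait()


# Harmless stand-in for the training command, so the sketch runs anywhere.
exit_code = run_and_stream([sys.executable, "-c", "print('epoch 1 done')"])
```

For real training, the list would hold the pieces of the command printed above, one argument per element.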

We pointed the cache to the SSD drive we are using, as local storage is limited. 
We also set the project to the SSD, so the outputs are saved there and do not increase C: storage usage. 

You need to create D:/temp and D:/yolo_runs on your drive or external directory beforehand. 

In [None]:
set TMPDIR=D:/temp
set TEMP=D:/temp
set TMP=D:/temp
set KMP_DUPLICATE_LIB_OK=TRUE 
python C:/Users/fossatia/Documents/Waterholes_project/yolov5/train.py --device cuda:0 --img 416 --batch 4 --workers 2 --epochs 50 --data C:\Users\fossatia\Documents\Waterholes_project\counting_waterholes\config_train_Drive.yaml --weights C:/Users/fossatia/Documents/Waterholes_project/yolov5/runs/train/exp3/weights/best.pt --cache False --project D:/yolo_runs
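
The folders and temporary-directory variables used above can also be prepared from Python before launching the training. This is a sketch under assumptions: `training_env` is a hypothetical helper, and it only covers the three temp variables, leaving `KMP_DUPLICATE_LIB_OK` out.

```python
import os
from pathlib import Path


def training_env(temp_dir: str, project_dir: str) -> dict:
    """Create the temp and project folders and return an environment that points at temp_dir."""
    for folder in (temp_dir, project_dir):
        Path(folder).mkdir(parents=True, exist_ok=True)
    env = os.environ.copy()  # leave the current process environment untouched
    for name in ("TMPDIR", "TEMP", "TMP"):
        env[name] = str(temp_dir)
    return env
```

With the paths from this section it would be called as `env = training_env("D:/temp", "D:/yolo_runs")`, and the result passed as `env=env` to `subprocess.run` or `subprocess.Popen`.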

The error message itself flags set KMP_DUPLICATE_LIB_OK=TRUE as not recommended. From what I found online, the alternative seems to be forcing an install of nomkl with 'conda install nomkl --channel conda-forge'. 
However, doing so might alter dependencies. To be checked. 

End of this script. 