<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Preparing-Dataset" data-toc-modified-id="Preparing-Dataset-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Preparing Dataset</a></span></li><li><span><a href="#Data-Annotation" data-toc-modified-id="Data-Annotation-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Data Annotation</a></span></li><li><span><a href="#Traning-YOLO-for-Droplet/Intruder-Detection" data-toc-modified-id="Traning-YOLO-for-Droplet/Intruder-Detection-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Training YOLO for Droplet/Intruder Detection</a></span></li><li><span><a href="#Droplet/Intruder-Tracking" data-toc-modified-id="Droplet/Intruder-Tracking-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Droplet/Intruder Tracking</a></span></li><li><span><a href="#Inspecting-Results-and-Some-Post-Processing" data-toc-modified-id="Inspecting-Results-and-Some-Post-Processing-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Inspecting Results and Some Post-Processing</a></span></li><li><span><a href="#How-Many-Training-Images-Do-I-Need?" data-toc-modified-id="How-Many-Training-Images-Do-I-Need?-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>How Many Training Images Do I Need?</a></span></li><li><span><a href="#Tracking-with-StrongSORT" data-toc-modified-id="Tracking-with-StrongSORT-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Tracking with StrongSORT</a></span></li></ul></div>

In [None]:
import os
import numpy as np
import torch
import pandas as pd
import matplotlib.pyplot as plt
from myutils import GetVideoInfo
from myutils import CaptureFrames
from myutils import TrackDroplet
from myutils import TrainValidTestSplit
from myutils import OptimumTrainImages
from myutils import TrackMultipleExperiments
%matplotlib inline

## Preparing Dataset

- Below are some global variables used throughout this tutorial. We demonstrate every step on the walking-droplet experiment with three walkers; you can train a model for your own experiment by following the same steps.


- **To create your own model with this tutorial, create your own project folder instead of working in the "tutorial_data" folder. Otherwise, your results will overwrite the data shipped with the repository**.

In [None]:
# name of this project (also used for output folder and file names)
project_name = "three_droplet"

#root_dir for all tutorial data
project_dir = "tutorial_data"

#images to be used for YOLO model
raw_image_dir = "tutorial_data/raw_images/"

#image/label pairs for train/valid/test
data_dir = "tutorial_data/annotations/"

#all experiment videos are here
video_root_dir = "datasets/videos/"

#three-droplet experiment video
video_path = "datasets/videos/three_droplet.mp4"

- Assuming you have the experiment video, the first step is to save some sample frames from it. It is a good idea to keep only the relevant parts of the experiment, such as the corral and the walkers, so you may want to crop the video before creating the training data. You can use [Avidemux](https://avidemux.sourceforge.net/download.html), a free video editor, for this.


- First, call the **"GetVideoInfo"** helper function to get an overview of the video source.

In [None]:
video_path = "datasets/videos/three_droplet.mp4"
GetVideoInfo(video_path)

- Based on the information above, decide how many frames to capture. For example, if we aim for 180 frames in total, we need to capture one frame every (total_frames/180)/frame_rate ≈ 1.7 seconds.


- The **"CaptureFrames"** function captures one frame every *save_interval* seconds and saves it to *image_dir*.
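The interval arithmetic above, and which frames it ends up selecting, can be sketched in plain Python. The helper names `capture_interval` and `capture_indices` are hypothetical, and the 9180-frame, 30 fps video is an illustrative assumption rather than real output of **GetVideoInfo**; we have not checked how **CaptureFrames** rounds fractional frame positions.

```python
from __future__ import annotations


def capture_interval(total_frames: int, fps: float, n_target: int) -> float:
    """Seconds between saved frames so that about n_target frames are captured."""
    # frames between two captures, converted to seconds with the frame rate
    return (total_frames / n_target) / fps


def capture_indices(total_frames: int, fps: float, save_interval: float) -> list[int]:
    """Indices of the frames saved when capturing one frame every save_interval s."""
    step = save_interval * fps  # frames between captures (may be fractional)
    if step <= 0:
        raise ValueError("save_interval and fps must be positive")
    indices: list[int] = []
    k = 0
    while round(k * step) < total_frames:
        indices.append(round(k * step))
        k += 1
    return indices


# Hypothetical video: 9180 frames at 30 fps (a 306 s recording)
interval = capture_interval(total_frames=9180, fps=30, n_target=180)
print(round(interval, 2))  # 1.7
print(len(capture_indices(9180, 30, interval)))  # 180
```

Replace the hypothetical numbers with the frame count and frame rate reported for your own video.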

In [None]:
CaptureFrames(video_path=video_path,image_dir = raw_image_dir,save_interval=1.7)

- Next, we create training/validation/testing datasets from these frames. A common split is 70/20/10.


- The **TrainValidTestSplit** helper function does this job. It first creates the following folder structure, where X is train, valid, or test:

        root_dir/X/images and root_dir/X/labels

- It then distributes the images from *image_dir* into these folders according to the given ratios.
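For readers who want to see what such a split involves, here is a minimal sketch that produces the same folder layout. `split_images` is a hypothetical stand-in, not the implementation of **TrainValidTestSplit**; it assumes the images are copied (not moved), shuffled with a fixed seed, and that any rounding remainder goes to the test set.

```python
import random
import shutil
from pathlib import Path


def split_images(image_dir, root_dir, train_ratio=0.7, valid_ratio=0.2,
                 test_ratio=0.1, seed=0, exts=(".jpg", ".jpeg", ".png")):
    """Copy images into root_dir/{train,valid,test}/images and create empty
    root_dir/{train,valid,test}/labels folders for the annotations."""
    if abs(train_ratio + valid_ratio + test_ratio - 1.0) > 1e-6:
        raise ValueError("ratios must sum to 1")
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in exts)
    random.Random(seed).shuffle(images)  # reproducible shuffle

    n_train = int(round(train_ratio * len(images)))
    n_valid = int(round(valid_ratio * len(images)))
    splits = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # remainder goes to test
    }
    for split, files in splits.items():
        img_out = Path(root_dir) / split / "images"
        img_out.mkdir(parents=True, exist_ok=True)
        (Path(root_dir) / split / "labels").mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, img_out / f.name)
    # number of images that ended up in each split
    return {k: len(v) for k, v in splits.items()}
```

With 10 frames and the default ratios, the function returns `{'train': 7, 'valid': 2, 'test': 1}`.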

In [None]:
TrainValidTestSplit(image_dir=raw_image_dir, 
                    root_dir=data_dir,train_ratio=0.7, valid_ratio=0.2, test_ratio=0.1)

## Data Annotation

- Now we are ready to annotate all the images to create the training, validation, and testing data for our YOLO model. For this we use **LabelImg**, a free, open-source annotation tool.


- It is easy to use; here is a quick [video tutorial](https://www.youtube.com/watch?v=VsZvT69Ssbs). Make sure to switch the save format to YOLO before you start. Annotate every image in the *root_dir/X/images* folders (train, valid, and test) and save the labels to the corresponding *root_dir/X/labels* folder. You can simply use "droplet" as the default label name in the app.


- Run the following cell to open its user interface. In the app, zoom in on the droplets to draw tight, high-quality bounding boxes. We have already annotated the tutorial images, so you can inspect the folders inside each directory. Notice that each text file in the *"labels"* folder has exactly the same name as its corresponding image file in the *"images"* folder, apart from the extension.


- If you have multiple experiments to carry out, see Section 6 (How Many Training Images Do I Need?) for an estimate of how many training images you need. This can save quite a bit of annotation time.
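LabelImg's YOLO mode writes one text line per bounding box: the class index followed by the box centre (x, y), width, and height, each normalised by the image width or height. The sketch below converts a pixel-space box to that format and back; the helper names and the 640x480 frame are hypothetical.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corner coordinates) to one YOLO label line."""
    x_c = (x_min + x_max) / 2 / img_w   # box centre, normalised to [0, 1]
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w         # box size, normalised to [0, 1]
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Parse a YOLO line back into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    cls, x_c, y_c, w, h = line.split()
    x_c, w = float(x_c) * img_w, float(w) * img_w
    y_c, h = float(y_c) * img_h, float(h) * img_h
    return int(cls), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2


# A hypothetical 40x40 px droplet in the middle of a 640x480 frame
line = to_yolo_line(0, 300, 220, 340, 260, img_w=640, img_h=480)
print(line)  # 0 0.500000 0.500000 0.062500 0.083333
```

Because the coordinates are normalised, the same label file stays valid if the frames are later resized.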

In [None]:
%run -i labelImg/labelImg.py
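After annotating, a quick check that every image has a label file with the same base name can catch missed frames. The sketch assumes the *root_dir/X/images* and *root_dir/X/labels* layout from above and skips the *classes.txt* file that LabelImg also writes into the labels folder; `check_pairs` is a hypothetical helper. An image without a label file is not necessarily an error, since YOLOv5 treats such images as background, so an unexpected entry is simply worth a second look.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def check_pairs(root_dir, splits=("train", "valid", "test")):
    """Report images without a label file and label files without an image."""
    problems = {}
    for split in splits:
        img_dir = Path(root_dir) / split / "images"
        lbl_dir = Path(root_dir) / split / "labels"
        images = {p.stem for p in img_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS}
        # classes.txt lists the label names and is not an annotation file
        labels = {p.stem for p in lbl_dir.glob("*.txt") if p.name != "classes.txt"}
        problems[split] = {
            "missing_labels": sorted(images - labels),
            "orphan_labels": sorted(labels - images),
        }
    return problems


# usage on the tutorial data (path as defined at the top of the notebook):
# for split, issues in check_pairs("tutorial_data/annotations/").items():
#     print(split, issues)
```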

## Training YOLO for Droplet/Intruder Detection

- Now we are ready for model training. First, open the *"tutorial_data.yml"* file in the *"yolov5/custom_data"* folder and set its path variable to the directory containing our train/valid/test folders. Unless this file is set up properly, YOLO cannot find our data.


- Once that is done, run the following cell to train the model. Running the command from a terminal is better practice, but the cell does the job.


- Briefly, this trains the "yolov5s" architecture for 150 epochs (set by `epoch` below) with the Adam optimizer on our data and saves the results in the *"project_dir/project_name"* folder.


- The *mAP50* values show that the model learns very quickly; values near 1 indicate near-perfect detection.


- **The best model will be located at *"project_dir/project_name/weights/best.pt"* . Training results can be found at *"project_dir/project_name/results.csv"***. 
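As a rough guide to what the dataset file needs, the sketch below builds the text of a YOLOv5-style dataset file for the *root_dir/{train,valid,test}/images* layout. The key names (`path`, `train`, `val`, `test`, `nc`, `names`) follow the example dataset files shipped with YOLOv5, but check them against your copy of *"tutorial_data.yml"* and your YOLOv5 version; `make_data_yaml` is a hypothetical helper. The paired *labels* folders need no entry because YOLOv5 derives each label path by replacing *images* with *labels* in the image path.

```python
from pathlib import Path


def make_data_yaml(dataset_root, class_names=("droplet",)):
    """Build the text of a YOLOv5-style dataset file for the
    root/{train,valid,test}/images layout used in this tutorial."""
    names = ", ".join(f"'{n}'" for n in class_names)
    return "\n".join([
        f"path: {Path(dataset_root).resolve()}",  # absolute dataset root
        "train: train/images",  # the following paths are relative to `path`
        "val: valid/images",
        "test: test/images",
        f"nc: {len(class_names)}",  # number of classes
        f"names: [{names}]",
        "",
    ])


print(make_data_yaml("tutorial_data/annotations"))
```

Writing the returned text to a `.yml` file and passing that file to `--data` is one way to avoid editing paths by hand.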

In [None]:
data = "yolov5/custom_data/tutorial_data.yml"
epoch = 150
project_dir = project_dir
optimizer = "Adam"
batch_size = 32  # YOLOv5's train.py defaults to 16
os.system(f"python yolov5/train.py --data {data} --weights yolov5/yolov5s.pt \
          --epochs {epoch} --optimizer {optimizer} --batch-size {batch_size} \
          --project {project_dir} --name {project_name} \
          --cache --exist-ok --seed 0")

- Now let's evaluate the best model on our test data. The *mAP50* value looks very good, which suggests that we will most likely obtain a very high detection rate when we process the actual experiment video in the next section.

- **If the *mAP50* value on the test data is well below 0.90, the model has not learned enough and fails to generalize to unseen data. In our experience, start by increasing the number of epochs to a high value, say 400. If the behaviour does not change, adding more training data may help: add 20-30 images/labels to the training set. You can also increase the batch size**.
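mAP50 is the mean average precision when a predicted box counts as correct if its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5; mAP50-95 averages the same quantity over IoU thresholds from 0.5 to 0.95. The IoU itself is simple to compute, as this small sketch with hypothetical box coordinates shows:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # corners of the overlapping region (may be empty)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


truth = (300, 220, 340, 260)    # hypothetical ground-truth droplet box
shifted = (310, 220, 350, 260)  # prediction shifted 10 px to the right
print(iou(truth, shifted))  # 0.6, above 0.5, so it would count as a hit for mAP50
```

A 10 px offset on a 40 px droplet already drops the IoU to 0.6, which is why tight boxes during annotation matter for the stricter mAP50-95 score.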

In [None]:
model_path = f"{project_dir}/{project_name}/weights/best.pt"
os.system(f"python yolov5/val.py --data {data}  --weights {model_path} --task test")

## Droplet/Intruder Tracking

- Once the model is trained, you will find the best model at *"tutorial_data/three_droplet/weights/best.pt"* (that is, *"project_dir/project_name/weights/best.pt"*), as noted above.


- To visualize and save the tracking results, modify the following cell as needed. **TrackDroplet** returns the number of detected frames, the total number of frames, and a dataframe with the positions, velocities, and confidence scores of each droplet/intruder in the video. The dataframe is saved in **real time** to *"save_dir"* as *"save_name.csv"*, and the tracking video is saved in the same directory. You can ignore the "QObject::moveToThread" error if it appears.


- Make sure to set the number of objects (droplets/intruders), *video_path*, *model_path*, etc. correctly. If you spot any false positives, try increasing the confidence threshold slightly. You can interrupt the run at any time by pressing "q" on your keyboard; all information is saved in real time.

In [None]:
#number of droplet(s)/intruder(s)
nd = 3

#best YOLO Pytorch model path
model_path = f"{project_dir}/{project_name}/weights/best.pt"
model = torch.hub.load('ultralytics/yolov5', 'custom', path=model_path)

# accept a frame only if all detections are above this threshold
conf_thresold = 0.45


#False: show only bounding box, True: show trajectory
show_trace = True

detected,total_frame,df = TrackDroplet(model= model,conf_thresold=conf_thresold,nd=nd,
                                    video_path=video_path, save_dir=project_dir,
                                    save_name=project_name, show_trace=show_trace)
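The first two return values give the frame detection rate, `detected / total_frame`. We have not reproduced the exact rule **TrackDroplet** uses to call a frame "detected"; the sketch below assumes, based on the threshold comment in the cell above, that a frame counts when exactly `nd` boxes clear the confidence threshold. `frame_detection_rate` is a hypothetical helper and the per-frame scores are made up for illustration.

```python
def frame_detection_rate(frame_confidences, nd, conf_threshold=0.45):
    """Count frames in which exactly `nd` boxes clear the threshold.

    frame_confidences: one list of box confidence scores per video frame.
    Returns (detected_frames, total_frames, rate).
    """
    detected = sum(
        1 for confs in frame_confidences
        if sum(c >= conf_threshold for c in confs) == nd
    )
    total = len(frame_confidences)
    return detected, total, (detected / total if total else 0.0)


# Hypothetical scores for four frames of a three-droplet video
frames = [
    [0.91, 0.88, 0.76],  # all three droplets found
    [0.93, 0.40, 0.81],  # one box below the threshold
    [0.90, 0.85],        # one droplet missed entirely
    [0.95, 0.89, 0.70],
]
print(frame_detection_rate(frames, nd=3))  # (2, 4, 0.5)
```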

- If you have a model trained for multiple experiments, you can track all of those experiments at once with the **"TrackMultipleExperiments"** function. As outlined in the paper, **"best_droplet.pt"** is the model we trained for all droplet experiments. The code below shows how to use it to track multiple experiments. At the end, the frame detection rate of each experiment is saved to *save_dir/name.csv*. You can interrupt the run at any time by pressing "q" on your keyboard.



- Make sure the keys in *exp_dict* exactly match the video names in *video_root_dir*, and set each value to the number of droplets/intruders in the corresponding video.
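A quick pre-flight check can catch a misspelled key before a long run. The sketch assumes each video is stored as *<experiment name>.mp4* in *video_root_dir*, which matches the naming of *three_droplet.mp4* but is otherwise an assumption; `missing_videos` is a hypothetical helper.

```python
from pathlib import Path


def missing_videos(exp_dict, video_root_dir, ext=".mp4"):
    """Return the experiment names in exp_dict that have no <name><ext> video."""
    available = {p.stem for p in Path(video_root_dir).glob(f"*{ext}")}
    return sorted(name for name in exp_dict if name not in available)


# usage (assumes videos are stored as <experiment name>.mp4):
# missing = missing_videos({'control': 1, 'lights_off': 1}, "datasets/videos/")
# if missing:
#     raise FileNotFoundError(f"no video for: {missing}")
```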

In [None]:
#experiment names and object number(s)
exp_dict = {'control':1, 'lights_off':1}

#root directory having all the videos
video_root_dir = video_root_dir

#best YOLO Pytorch model path
model = torch.hub.load('ultralytics/yolov5', 'custom', path="best_droplet.pt")


#detect above this value
conf_thresold = 0.45

#save dir
save_dir = project_dir


#save name for dataframe
name = "frame_detection_rates"

#True shows the trajectory
show_trace = False

#track exps in exp_dict
TrackMultipleExperiments(exp_dict=exp_dict, video_root_dir=video_root_dir, model=model,
                         conf_thresold=conf_thresold,
                         save_dir=save_dir, name=name, show_trace=show_trace
                         )

## Inspecting Results and Some Post-Processing

- Let's inspect the results of our original experiment. We start by loading the dataframe we saved; alternatively, you can use the dataframe returned by the **TrackDroplet** function directly. Producing this data was the main goal of this tutorial, and you can analyze it however you wish.


- The **frame_id, time, x, y, c, dx, dy, speed** columns hold the frame number, time stamp (s), x-position, y-position, confidence score, x-velocity, y-velocity, and speed of each tracked droplet/intruder. Per-object columns carry the object index as a suffix (e.g. *x1, x2, x3*).



- For example, the plotting function below overlays each object's trajectory with arrows showing its direction of motion.
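The notebook does not state how **TrackDroplet** computes *dx*, *dy*, and *speed*, or in which units. One common approach, shown here only as an assumption, is to differentiate the positions with respect to time and take the magnitude of the resulting velocity; `finite_difference_velocity` is a hypothetical helper and the example trajectory is synthetic.

```python
import numpy as np


def finite_difference_velocity(t, x, y):
    """Velocity components and speed from sampled positions.

    t, x, y: 1-D arrays of equal length (time in seconds, positions in pixels).
    Uses np.gradient: central differences inside, one-sided at the ends.
    """
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))
    dx = np.gradient(x, t)    # x-velocity, pixels per second
    dy = np.gradient(y, t)    # y-velocity, pixels per second
    speed = np.hypot(dx, dy)  # magnitude of the velocity vector
    return dx, dy, speed


# a droplet moving 3 px right and 4 px down per frame at 30 fps
t = np.arange(5) / 30
x = 100 + 3 * np.arange(5)
y = 50 + 4 * np.arange(5)
dx, dy, speed = finite_difference_velocity(t, x, y)
print(speed)  # approximately 150 px/s in every frame (5 px per frame * 30 fps)
```

Comparing this against the saved *speed1* column is a simple way to check which convention the tracker uses.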

In [None]:
df = pd.read_csv(f"{project_dir}/{project_name}.csv")
df.head()

In [None]:
def GetDynamics(df,nd):    
    xlist = [f"x{i}" for i in range(1,nd+1)]
    dxlist = [f"dx{i}" for i in range(1,nd+1)]

    ylist = [f"y{i}" for i in range(1,nd+1)]
    dylist = [f"dy{i}" for i in range(1,nd+1)]

    slist = [f"speed{i}" for i in range(1,nd+1)]

    #get stuff column-wise
    t = df["time"].to_numpy()
    xc = df[xlist].to_numpy()
    dx = df[dxlist].to_numpy()

    yc = df[ylist].to_numpy()
    dy = df[dylist].to_numpy()

    speed = df[slist].to_numpy()
    
    return t, xc, yc, dx, dy, speed

def PlotFlow(nd,df,sample_interval=5):
    _, xc, yc, _, _, _ = GetDynamics(df,nd)
    # one panel per object; squeeze=False keeps `axes` 2-D even when nd == 1
    fig,axes = plt.subplots(nd,1,figsize=(8,6*nd),sharex=True,squeeze=False)
    for i in range(nd):
        ax = axes[i,0]
        X = xc[:,i]
        Y = yc[:,i]
        # subsample the trajectory so the arrows stay readable
        x = X[0:-1:sample_interval]
        y = Y[0:-1:sample_interval]
        dx = np.diff(x)
        dy = np.diff(y)
        dx1 = np.append(dx, 0)
        # quiver draws arrows in screen space, so flip dy to match the inverted y-axis
        dy1 = -np.append(dy, 0)
        ax.plot(X, Y, linestyle='-', color='tomato')
        ax.quiver(x, y, dx1, dy1, color='blue', units='width')
        ax.plot(x[0], y[0], 'ks', label='initial point', markersize=15)
        ax.plot(x[-1], y[-1], 'ko', label='terminal point', markersize=15)
        ax.invert_yaxis()
        ax.legend(loc='upper left',labelspacing = 1.5)
    plt.show()
    
# If no figures appear, restart the kernel and run again: YOLO has a minor conflict with plt.show()
%matplotlib inline
PlotFlow(nd=3,df=df)

## How Many Training Images Do I Need?

- We cannot give a definitive answer to this question, but we can run a quick experiment.


- The folder *data_dir* ("tutorial_data/annotations") contains all of our training data. We do not know whether this number of training images is adequate to create an effective model, or whether it is more than we need. The latter matters if we are going to repeat similar experiments many times. The idea is as follows:


- Train the model with an increasing number of training images, keeping the 70/20 train/valid ratio but using the same test images each time, and monitor the mAP scores against the number of training images. Say our original dataset has 70/20/10 train/valid/test images. We can train the model with 20/6/10, 40/12/10, and 60/18/10 partitions and check how the mAP scores change in each case.


- We are looking for the point where mAP@0.5 levels off. Picking a number of images slightly past the start of that plateau gives a good estimate of how many images our dataset needs, so we do not have to spend a lot of time annotating images.


- Enough talking: **OptimumTrainImages** runs this experiment. Set *max_image_number* as an upper bound; it should not exceed the number of training images you initially prepared. Each case between *start_image_num* and *final_image_num* is tested with *num_interval* intervals. Pick a reasonable *epoch* number for each cycle. Recall that this is not the actual training; we only want a quick estimate, so 50 epochs should be enough. The outcome is a CSV file *"project_dir/save_name.csv"* with the columns *num_train_image*, *mAP@0.5*, and *mAP@0.5..0.95*. The function plots the results, but you can also use this file directly, as in the plotting cell below. The simulation should take around 25 minutes.


- The final plot indicates that slightly more than 60 training images give essentially the same performance as the full set.
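Reading the plateau off the plot can also be done numerically. The sketch below picks the smallest training-set size whose score is within a tolerance `tol` of the best score; `smallest_sufficient_size` is a hypothetical helper, and both the rule and the example scores are illustrative assumptions, not outputs of **OptimumTrainImages**. With a single run per size the curve can be noisy, so treat the result as a rough guide.

```python
def smallest_sufficient_size(num_images, scores, tol=0.01):
    """Smallest training-set size whose score is within `tol` of the best score."""
    pairs = sorted(zip(num_images, scores))  # sort by number of images
    if not pairs:
        raise ValueError("need at least one (num_images, score) pair")
    best = max(s for _, s in pairs)
    for n, s in pairs:
        if s >= best - tol:
            return n


sizes = [20, 40, 60, 80, 100]
map50 = [0.71, 0.90, 0.975, 0.98, 0.982]  # hypothetical mAP@0.5 scores
print(smallest_sufficient_size(sizes, map50, tol=0.01))  # 60
```

Applied to the `test_scores` CSV loaded in the cell below, the call would be `smallest_sufficient_size(df['num_train_image'], df['mAP@0.5'])`.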

In [None]:
data_dir = data_dir
project_dir = project_dir
max_image_number = 120
start_image_num = 5
final_imag_num = max_image_number
num_interval = 5
epoch  = 50
save_name = "test_scores"
OptimumTrainImages(data_dir=data_dir,project_dir=project_dir,
                   max_image_number=max_image_number,start_image_num=start_image_num,
                   final_image_num=final_imag_num, 
                   num_interval=num_interval,epoch=epoch,save_name=save_name)


In [None]:
df = pd.read_csv(f"{project_dir}/{save_name}.csv")
num_train = df['num_train_image']
mAP_05 = df['mAP@0.5']
mAP_0595 = df['mAP@0.5..0.95']
fig,ax = plt.subplots(figsize=(8,6))
ax.plot(num_train, mAP_05, label='mAP@0.5', linestyle='--',marker='o')
ax.plot(num_train, mAP_0595, label='mAP@0.5:0.95',marker='o')
ax.set_xlabel('#training images')
ax.set_ylabel('score')
ax.grid(True)
ax.legend()
plt.show()

## Tracking with StrongSORT

- We also discussed tracking with StrongSORT in our paper. Once the YOLO model is ready, we can simply run the following cell with the usual arguments. Notice that StrongSORT suffers from multiple ID switches. 



- This notebook is only a demonstration. We highly recommend running the following command from a terminal, as it is considerably faster.

        python Yolov5_StrongSORT_OSNet/track_erdi.py --yolo-weights "best_droplet.pt" --source "datasets/videos/three_droplet.mp4" --conf-thres 0.45 --show-vid --config-strongsort "Yolov5_StrongSORT_OSNet/strong_sort/configs/strong_sort.yaml"


        
- The cell below saves the tracking video and the tracks in MOT format to the *project_dir/sort_dir* folder (here *"tutorial_data/SORT_tracks"*).
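Once the MOT file exists, a few lines of Python can load it for analysis. The sketch assumes MOT-style rows whose first six fields are frame, track id, left, top, width, and height (comma- or space-separated); the exact file name and location depend on your version of *track.py*, and `load_mot_tracks` is a hypothetical helper. Counting the distinct track IDs is a quick way to spot the ID switches mentioned above: more IDs than objects means at least one track was broken or reassigned.

```python
from collections import defaultdict


def load_mot_tracks(text):
    """Parse MOT-style rows: frame, id, bb_left, bb_top, bb_width, bb_height, ...

    Accepts comma- or whitespace-separated values and returns
    {track_id: [(frame, x_centre, y_centre), ...]}.
    """
    tracks = defaultdict(list)
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        frame, tid = int(float(fields[0])), int(float(fields[1]))
        left, top, w, h = map(float, fields[2:6])
        tracks[tid].append((frame, left + w / 2, top + h / 2))
    return dict(tracks)


# Hypothetical output for two droplets: the second one changes ID in frame 2
sample = """1 1 10 20 8 8 -1 -1 -1 0
1 2 50 60 8 8 -1 -1 -1 0
2 1 12 21 8 8 -1 -1 -1 0
2 3 51 62 8 8 -1 -1 -1 0"""
tracks = load_mot_tracks(sample)
print(sorted(tracks))  # [1, 2, 3]: three IDs for two droplets
```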

In [None]:
model_path = f"{project_dir}/{project_name}/weights/best.pt"
video_dir = video_path
conf_thresold = 0.45
project_dir = project_dir
sort_dir = "SORT_tracks"

os.system(f"python Yolov5_StrongSORT_OSNet/track.py --yolo-weights {model_path} \
      --source {video_dir} --conf-thres {conf_thresold} \
      --project {project_dir} --name {sort_dir} \
      --show-vid --save-txt --save-vid \
      --config-strongsort Yolov5_StrongSORT_OSNet/strong_sort/configs/strong_sort.yaml\
      ")

                                               THANK YOU FOR CHECKING IT OUT!