# Introduction

This work compares the performance of three trackers in the _detection and tracking_ category: [SORT](https://github.com/abewley/sort) (Simple Online and Realtime Tracking), Deep SORT, and ByteTrack. The trackers are tested on two datasets: [pNEUMA](https://open-traffic.epfl.ch/), which contains 30-minute videos of roads in Greece captured by a set of drones, and a custom dataset named _city\_above_, which contains footage from a drone flying over a road with car traffic.

Two convolutional neural networks for object detection, trained on the [VSAI](https://www.kaggle.com/datasets/dronevision/vsaiv1) dataset, are also tested: _[YOLOv7](https://github.com/WongKinYiu/yolov7)_, a state-of-the-art object detector, and _YOLOv7-tiny_, a version of _YOLOv7_ with fewer parameters that trades accuracy for speed. Each network was trained 5 times, starting from weights pretrained on the [COCO](https://cocodataset.org/#home) dataset, with 640x640 pixel inputs.

A round of tests was also run with _Faster R-CNN_, for comparison with the detector originally used in the _SORT_ and _Deep SORT_ papers.

All the code used to generate the results is available on [GitHub](https://github.com/samsvp/deep-learning-tracking), and the weights are on [Google Drive](https://drive.google.com/uc?id=1xRwxB8QUFRx6wt5HSEJySCUhbUr3X_j0).

# Methodology

The trackers were tested using the official [SORT](https://github.com/abewley/sort) and [ByteTrack](https://github.com/ifzhang/ByteTrack) repositories and the following implementation of [Deep SORT](https://github.com/levan92/deep_sort_realtime).

For the detectors, we used the official _[YOLOv7](https://github.com/WongKinYiu/yolov7)_ repository and the following implementation of _[Faster R-CNN](https://github.com/sovit-123/fasterrcnn-pytorch-training-pipeline)_. For _YOLOv7_ we used the COCO [pretrained weights](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt) available in the repository. The weights for the _tiny_ version can be found [here](https://github.com/WongKinYiu/yolov7). We trained 5 instances of the _tiny_ version for 50 epochs and 5 instances of the full version for 100 epochs. Each instance used a different train/test split, with 5240 training images and 2315 test images.

_Faster R-CNN_ was initialized from the `fasterrcnn_resnet50_fpn` weights and trained on the VSAI dataset for 10 iterations on a P100, with the repository's default hyperparameters and the same settings used for _YOLO_.

For Deep SORT, MobileNetV2 trained on the ImageNet dataset was used as the feature extractor.

The dataset used was [VSAI](https://www.kaggle.com/datasets/dronevision/vsaiv1), containing 5240 training images, 2315 test images, and 1520 validation images. It consists of aerial images, captured by a drone in several parts of China, of 49,712 vehicles: 47,519 small vehicles (e.g. cars) and 2,193 large vehicles (e.g. buses and trucks). The dataset labels had to be converted to the YOLO and Faster R-CNN formats. The dataset already converted to the YOLO format can be found on [Drive](https://drive.google.com/uc?id=1xRwxB8QUFRx6wt5HSEJySCUhbUr3X_j0). The code for converting to the Faster R-CNN format is the following (run it from inside the Faster R-CNN repo):

In [65]:
%%script false --no-raise-error

import os
from PIL import Image


def create_xml(data: str, img_w: int, img_h: int, image_path: str) -> str:
    text = f"""
    <annotation>
	<folder></folder>
	<filename>{image_path}</filename>
	<path>{image_path}</path>
	<source>
		<database>VSAI</database>
	</source>
	<size>
		<width>{img_w}</width>
		<height>{img_h}</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
    """
    # Each annotation line holds the 8 polygon coordinates
    # (x1 y1 x2 y2 x3 y3 x4 y4) followed by the class name.
    for line in data.splitlines():
        fields = line.split()
        if not fields:
            continue

        values = [int(float(v)) for v in fields[:8]]
        name = fields[8]
        xs, ys = values[0::2], values[1::2]
        # Axis-aligned box enclosing the polygon; the minimum
        # coordinates are clamped so they never fall below 0.
        xmin, xmax = max(0, min(xs)), max(xs)
        ymin, ymax = max(0, min(ys)), max(ys)
        text += f"""
        <object>
		<name>{name}</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<occluded>0</occluded>
		<bndbox>
			<xmin>{xmin}</xmin>
			<xmax>{xmax}</xmax>
			<ymin>{ymin}</ymin>
			<ymax>{ymax}</ymax>
		</bndbox>
	</object>
        """
    text += "</annotation>"
    return text

if __name__ == "__main__":
    basedir = "vsaiv1/VSAIv1/split_ss_444_lsv/"
    folders = ["test", "train", "val"]
    for folder in folders:
        # Output folder for this split's XML annotations.
        os.makedirs(folder, exist_ok=True)
        files = os.listdir(os.path.join(basedir, folder, "images"))
        for i, img_path in enumerate(files):
            full_path = os.path.join(basedir, folder, "images", img_path)
            with open(full_path.replace(".png", ".txt").replace("images", "annfiles")) as fp:
                data = fp.read()
            width, height = Image.open(full_path).size
            
            t = create_xml(data, width, height, full_path)
            with open(f"{folder}/{img_path.replace('.png', '.xml')}", 'w') as fp:
                fp.write(t)
            if i % 200 == 0:
                print(f"[{folder}] {i} out of {len(files)} completed")
    
    print("Done preprocessing")

The `data_configs/custom_data.yaml` file used for training is the following:
```yaml
# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: 'vsaiv1/VSAIv1/split_ss_444_lsv/train/images'
TRAIN_DIR_LABELS: 'train'
VALID_DIR_IMAGES: 'vsaiv1/VSAIv1/split_ss_444_lsv/test/images'
VALID_DIR_LABELS: 'test'

# Class names.
CLASSES: [
    '__background__',
    'small-vehicle', 'large-vehicle'
]

# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 3

# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True
```
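The YOLO-format conversion is not listed in this notebook. Below is a minimal sketch of what it could look like, assuming the same annotation layout as in the script above (8 polygon coordinates followed by the class name) and the usual YOLO label line `class cx cy w h`, normalized by the image size. The function name `polygon_to_yolo` and the class-id mapping are hypothetical and are not taken from the repository.

```python
# Hypothetical class-id mapping; the ids used by the authors are not stated.
CLASS_IDS = {"small-vehicle": 0, "large-vehicle": 1}


def polygon_to_yolo(line: str, img_w: int, img_h: int) -> str:
    """Convert one 'x1 y1 ... x4 y4 name' annotation line to a YOLO label line."""
    fields = line.split()
    coords = [float(v) for v in fields[:8]]
    name = fields[8]
    xs, ys = coords[0::2], coords[1::2]
    # Axis-aligned box around the polygon, clamped to the image.
    xmin, xmax = max(0.0, min(xs)), min(float(img_w), max(xs))
    ymin, ymax = max(0.0, min(ys)), min(float(img_h), max(ys))
    # YOLO stores the box center and size as fractions of the image size.
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{CLASS_IDS[name]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(polygon_to_yolo("100 200 300 200 300 400 100 400 small-vehicle", 1000, 1000))
```

Unlike the XML script, this sketch also clamps the maximum coordinates, since normalized YOLO values are expected to stay within the image.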

Two datasets were used to test tracking. The first is the extension of [pNEUMA](https://open-traffic.epfl.ch/), which contains more than 1900 4K images of vehicles seen from a drone in a region of Greece, annotated with the id and bounding box of each vehicle. The images were captured at approximately 2 frames per second. Because of the image size, we ran tracking on a 1028x512 pixel section of the image. Since this region contains no objects that could occlude the vehicles on the road (e.g. a tree), a custom 24 FPS video with some occlusion regions, called _city\_above_, was also annotated.
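Restricting tracking to a section of the frame also means restricting the annotations to that section. Below is a minimal sketch of one way to do this, assuming boxes are given as `(x, y, w, h)` in full-frame pixels. The helper `crop_box`, the visibility threshold, and the region origin are illustrative and are not the values used in this work.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), top-left origin


def crop_box(box: Box, roi: Box, min_visible: float = 0.5) -> Optional[Box]:
    """Express `box` in ROI coordinates, or return None if too little is visible."""
    x, y, w, h = box
    rx, ry, rw, rh = roi
    # Intersection of the box with the region of interest.
    x1, y1 = max(x, rx), max(y, ry)
    x2, y2 = min(x + w, rx + rw), min(y + h, ry + rh)
    if x2 <= x1 or y2 <= y1:
        return None
    visible = (x2 - x1) * (y2 - y1) / (w * h)
    if visible < min_visible:
        return None
    # Shift so that the ROI's top-left corner becomes the origin.
    return (x1 - rx, y1 - ry, x2 - x1, y2 - y1)


roi = (1000.0, 800.0, 1028.0, 512.0)  # hypothetical origin, 1028x512 section
print(crop_box((1100.0, 900.0, 40.0, 20.0), roi))  # box fully inside the section
print(crop_box((10.0, 10.0, 40.0, 20.0), roi))     # box outside the section
```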

To annotate the custom video, we built a tool that helps generate data in the MOT format. Given a file with partial tracks, it can delete tracks, annotate boxes manually, interpolate box position and size, and join fragmented tracks. The source code is available in [this repo](https://github.com/samsvp/tracker-annotation-helper). In total, approximately 400 frames were annotated.
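The interpolation step can be illustrated with a short sketch. It assumes linear interpolation between two manually annotated keyframes and boxes stored as `(x, y, w, h)`. The function `interpolate_boxes` is illustrative and is not taken from the tool's source code.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def interpolate_boxes(f0: int, b0: Box, f1: int, b1: Box) -> Dict[int, Box]:
    """Linearly interpolate position and size for the frames between f0 and f1."""
    if f1 <= f0:
        raise ValueError("f1 must come after f0")
    boxes = {}
    for frame in range(f0 + 1, f1):
        t = (frame - f0) / (f1 - f0)  # 0 at f0, 1 at f1
        boxes[frame] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    return boxes


filled = interpolate_boxes(10, (100.0, 50.0, 40.0, 20.0), 14, (120.0, 58.0, 44.0, 20.0))
for frame, box in filled.items():
    print(frame, box)
```

Only the frames strictly between the two keyframes are generated, so the manually annotated boxes are never overwritten.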

The trained weights are available on [Drive](https://drive.google.com/drive/folders/1cxW4QY-Zf4UjMVH_-As_XZzAOU9y03pL?usp=sharing) (the YOLO weights are in the file `weights.7z` and the Faster R-CNN weights in `fastrcnn_best_model.pth`).

# Evaluation Metrics

Tracking was evaluated with the [py-motmetrics](https://github.com/cheind/py-motmetrics) package. The following metrics are used to compare the trackers and detectors:

- IDP: ID precision, the fraction of predicted detections that are matched to the correct identity.
- IDR: ID recall, the fraction of ground-truth detections that are matched to the correct identity.
- IDF1: the F1 score of the detection and tracking task, combining IDP and IDR.
    - IDF1 emphasizes association accuracy over detection accuracy.
- Recall: the proportion of actual positives that the model correctly identifies as positive.
- Precision: the proportion of predicted positives that are actually positive.

- Multiple Object Tracking Accuracy (MOTA): measures the overall performance of the detection and tracking task.
    - MOTA accounts for three types of tracking error: false positives (FP), false negatives (FN), and ID switches (IDSW).
    - MOTA does not include a localization error term, and detection performance weighs significantly more than association performance.
- Multiple Object Tracking Precision (MOTP): measures the average overlap between the ground truth and the predicted bounding boxes.
    - MOTP measures localization accuracy, averaging the overlap over all correctly matched predictions and their ground truth.
    - MOTP mainly quantifies the localization accuracy of the detector and therefore says little about the actual performance of the tracker.
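The count-based metrics above follow directly from the totals reported by the evaluation. The sketch below uses the standard CLEAR MOT and F1 definitions. The function names are illustrative, and the example numbers are taken from the `idm-tracker-pNEUMA15-8-tiny.mot` row of the results table in this notebook.

```python
def mota(fp: int, fn: int, idsw: int, num_objects: int) -> float:
    """MOTA = 1 - (FP + FN + IDSW) / number of ground-truth objects."""
    return 1.0 - (fp + fn + idsw) / num_objects


def recall(fn: int, num_objects: int) -> float:
    """Fraction of ground-truth objects that were detected."""
    return (num_objects - fn) / num_objects


def precision(fp: int, fn: int, num_objects: int) -> float:
    """Fraction of predictions that match a ground-truth object."""
    tp = num_objects - fn
    return tp / (tp + fp)


def f1(p: float, r: float) -> float:
    """Harmonic mean, used to combine IDP and IDR into IDF1."""
    return 2 * p * r / (p + r)


fp, fn, idsw, gt = 5238, 8691, 133, 31654
print(f"MOTA      {mota(fp, fn, idsw, gt):.2f}")
print(f"Recall    {recall(fn, gt):.2f}")
print(f"Precision {precision(fp, fn, gt):.2f}")
print(f"IDF1      {f1(0.78, 0.69):.2f}")  # from the rounded IDP and IDR of the same row
```

Because the three error counts are summed, MOTA becomes negative whenever FP + FN + IDSW exceeds the number of ground-truth objects, which explains the negative values in some of the pNEUMA13 results.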

# Running the Trackers

Since several Python libraries were used, sometimes with conflicting requirements, each tracker has its own virtual environment. A Dockerfile for the base program is also provided to help generate the detections in the correct format.

1. SORT: Activate the virtual environment with `. sort/bin/activate` and place a file with the detections at `data/val/<name>/det/det.txt`. Then run `python3 sort.py --phase val` to run SORT on your detections. The results are saved in `output/<name>`.
2. ByteTrack: Activate the virtual environment with `. byte-env/bin/activate` and run `python3 main.py -v <video-base> -f <detections> -n output/<dataset-name>-<tracker-name>.mot`.
3. Deep SORT: Activate the virtual environment with `. deep-sort/bin/activate` and run `python3 main.py -v <video-base> -f <detections> -n output/<dataset-name>-<tracker-name>.mot`.
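The `det.txt` file read by the upstream SORT script follows the MOTChallenge detection layout, one comma-separated line per detection: `frame,id,x,y,w,h,score,-1,-1,-1`, with 1-based frame numbers and `id` set to `-1`. Below is a minimal sketch of a writer for that layout, assuming detections are available as `(frame, x1, y1, x2, y2, score)` tuples. The helper name `to_mot_det_lines` is illustrative and is not part of any of the repositories.

```python
from typing import Iterable, List, Tuple

# frame, x1, y1, x2, y2, score
Detection = Tuple[int, float, float, float, float, float]


def to_mot_det_lines(detections: Iterable[Detection]) -> List[str]:
    """Format corner-style boxes as MOTChallenge detection lines."""
    lines = []
    for frame, x1, y1, x2, y2, score in sorted(detections):
        w, h = x2 - x1, y2 - y1
        # id is -1 (unknown) and the trailing world coordinates are unused.
        lines.append(f"{frame},-1,{x1:.2f},{y1:.2f},{w:.2f},{h:.2f},{score:.4f},-1,-1,-1")
    return lines


dets = [(2, 15.0, 22.0, 65.0, 52.0, 0.88), (1, 10.0, 20.0, 60.0, 50.0, 0.91)]
print("\n".join(to_mot_det_lines(dets)))
```

Whether the `main.py` scripts used for ByteTrack and Deep SORT expect exactly the same layout is not stated here, so this sketch should be checked against them before use.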

To obtain the evaluation metrics, run the `evaluate-all.py` script. It automatically collects the files in the `<tracker>/output` folders and computes the evaluation metrics. The output is saved in `mots/val/`.

In [66]:
import os
import pandas as pd

In [67]:
OUTPUTS_DIR = "mots/eval"

files = [os.path.join(OUTPUTS_DIR, f) for f in os.listdir(OUTPUTS_DIR)]
files.sort()
df = pd.DataFrame(columns=[
                         "mota","motp", "num_frames","idf1","idp",
                         "idr","recall","precision",
                         "num_objects","mostly_tracked","partially_tracked",
                         "mostly_lost","num_false_positives","num_misses",
                         "num_switches","num_fragmentations",])
for f in files:
    row = pd.read_csv(f)
    df = pd.concat([df, row])



In [68]:
from IPython.display import display
df.index = [f.split('/')[-1] for f in files]
df = df.drop(df.columns[-1], axis=1)
df \
    .sort_values(by=['mota'], ascending=False) \
    .style.format("{:.2f}") \
    .set_sticky(axis="index")

run,mota,motp,num_frames,idf1,idp,idr,recall,precision,num_objects,mostly_tracked,partially_tracked,mostly_lost,num_false_positives,num_misses,num_switches,num_fragmentations
idm-tracker-pNEUMA10-9-tiny.eval,0.56,0.29,1907.0,0.68,0.73,0.63,0.72,0.83,52790.0,187.0,210.0,25.0,7959.0,14831.0,347.0,2721.0
idm-tracker-pNEUMA15-8-tiny.mot,0.56,0.27,1907.0,0.73,0.78,0.69,0.73,0.81,31654.0,277.0,203.0,70.0,5238.0,8691.0,133.0,912.0
idm-tracker-pNEUMA10-7-tiny.eval,0.56,0.28,1907.0,0.65,0.73,0.59,0.68,0.85,52790.0,152.0,243.0,27.0,6226.0,16810.0,446.0,2656.0
idm-tracker-pNEUMA10-8-tiny.eval,0.55,0.28,1907.0,0.68,0.75,0.62,0.69,0.84,52790.0,174.0,223.0,25.0,6876.0,16345.0,379.0,2568.0
idm-tracker-pNEUMA15-7-tiny.mot,0.55,0.27,1907.0,0.74,0.79,0.69,0.71,0.82,31654.0,275.0,199.0,76.0,4973.0,9166.0,109.0,864.0
idm-tracker-pNEUMA15-10-tiny.mot,0.55,0.27,1907.0,0.74,0.77,0.71,0.73,0.8,31654.0,278.0,204.0,68.0,5745.0,8422.0,122.0,858.0
sort-acc-vel-pNEUMA15_10-tiny-vel.eval,0.52,0.26,1907.0,0.65,0.72,0.59,0.67,0.83,31654.0,233.0,184.0,133.0,4506.0,10339.0,269.0,483.0
idm-tracker-pNEUMA15-11-tiny.mot,0.52,0.28,1907.0,0.73,0.76,0.71,0.73,0.78,31654.0,285.0,196.0,69.0,6447.0,8573.0,111.0,807.0
sort-acc-vel-pNEUMA15_8-tiny-vel.eval,0.52,0.26,1907.0,0.64,0.72,0.58,0.66,0.83,31654.0,226.0,182.0,142.0,4201.0,10654.0,298.0,537.0
sort-pNEUMA10_9-tiny.eval,0.52,0.3,1907.0,0.59,0.66,0.54,0.67,0.83,52790.0,129.0,250.0,43.0,7423.0,17394.0,551.0,1256.0


# Tracking Results

We take the data for the two datasets (pNEUMA and city_above) and compute the mean and standard deviation of the tracking metrics across the trained models.

In [69]:
namemap = {
    'mota': 'MOTA', 'motp': 'MOTP','idf1': 'IDF1', 
    'idp': 'IDP', 'idr': 'IDR', 'recall': 'Recall',
    'precision': 'Precision', 'num_objects': 'Num Objects',
    'mostly_tracked': 'Mostly Tracked', 'partially_tracked': 'Partially Tracked',
    'mostly_lost': 'Mostly Lost', 'num_false_positives': 'FP',
    'num_misses': 'FN', 'num_switches': '# Switches', 'num_frames': 'Frames',
    'num_fragmentations': 'Fragmentations', 
}


data_pNEUMA = {}
indexes = []
for tracker in ["idm-tracker", "sort-acc-acc", "sort-acc-vel", "sort-acc-adp", "deepsort", "ByteTrack"]:
    tracker_df = df[df.index.str.startswith(tracker)].sort_values(by='mota', ascending=False)
    for model in ["full", "tiny", "fastrcnn"]:
        tracker_df_model = tracker_df[tracker_df.index.str.contains(model)]

        for n in [2, 6, 9, 10, 13, 15]:
            dataset = f"pNEUMA{n}"
            res = tracker_df_model[tracker_df_model.index.str.contains(dataset)]
            data = {}
            for idx, mean, std in zip(res.columns, res.mean(), res.std()):
                name = namemap[idx]
                data[name] = mean
                data[f"{name}-std"] = std
            
            vals = data_pNEUMA.get(n, []) 
            vals.append(data)
            data_pNEUMA[n] = vals
        indexes.append(f"{tracker}-{model}")



# pNEUMA Results Table

In [70]:
for n in [2, 6, 9, 10, 13, 15]:
    df_pNEUMA = pd.DataFrame(data_pNEUMA[n]) \
        .drop(columns=["Frames", "Mostly Lost", "Mostly Lost-std", 
                       "Num Objects", "Frames-std", "Num Objects-std",
                       "Mostly Tracked", "Mostly Tracked-std", 
                       "Partially Tracked", "Partially Tracked-std",
                      ]) \
        .round(2)
    
    df_pNEUMA.index = indexes
    
    __df = df_pNEUMA \
        .dropna() \
        .sort_values(by=['MOTA'], ascending=False) \
        .style.format("{:.2f}") \
        .set_sticky(axis="index") \
        .highlight_max(axis=0, 
                       subset=["IDF1", "IDP", "IDR", "Recall", "Precision", "MOTA"], 
                       props='font-weight: bold') \
        .highlight_min(axis=0,
                       subset=["# Switches", "Fragmentations", "FP", "FN", "MOTP"],
                       props='font-weight: bold')
    print(f"DF {n}")
    display(__df)


DF 2


tracker-detector,MOTA,MOTA-std,MOTP,MOTP-std,IDF1,IDF1-std,IDP,IDP-std,IDR,IDR-std,Recall,Recall-std,Precision,Precision-std,FP,FP-std,FN,FN-std,# Switches,# Switches-std,Fragmentations,Fragmentations-std
idm-tracker-tiny,0.26,0.05,0.31,0.01,0.61,0.02,0.64,0.03,0.59,0.02,0.59,0.02,0.64,0.03,2647.0,432.52,3310.75,128.18,5.0,1.83,135.5,9.26
sort-acc-vel-tiny,0.25,0.03,0.32,0.01,0.5,0.01,0.63,0.02,0.41,0.02,0.46,0.01,0.69,0.03,1652.0,288.34,4395.75,102.57,40.75,5.5,91.5,13.0
ByteTrack-tiny,0.2,0.04,0.32,0.01,0.51,0.01,0.63,0.04,0.43,0.01,0.45,0.01,0.66,0.04,1920.25,368.9,4460.0,87.46,58.5,11.12,137.5,17.14
sort-acc-acc-tiny,0.17,0.02,0.31,0.01,0.44,0.01,0.65,0.02,0.33,0.02,0.35,0.01,0.67,0.03,1386.0,262.71,5290.0,113.97,13.5,4.65,64.5,11.39
sort-acc-adp-tiny,0.17,0.02,0.31,0.01,0.44,0.01,0.64,0.03,0.33,0.02,0.35,0.01,0.67,0.03,1404.0,267.07,5283.25,113.89,14.0,4.08,65.5,11.7
deepsort-tiny,0.06,0.08,0.33,0.0,0.46,0.02,0.51,0.04,0.42,0.01,0.44,0.01,0.54,0.05,3068.25,661.91,4486.0,101.83,29.25,4.5,84.75,8.69


DF 6


tracker-detector,MOTA,MOTA-std,MOTP,MOTP-std,IDF1,IDF1-std,IDP,IDP-std,IDR,IDR-std,Recall,Recall-std,Precision,Precision-std,FP,FP-std,FN,FN-std,# Switches,# Switches-std,Fragmentations,Fragmentations-std
idm-tracker-tiny,0.44,0.02,0.27,0.0,0.57,0.02,0.62,0.03,0.53,0.01,0.65,0.02,0.76,0.02,11758.0,1605.84,20098.4,1173.72,366.8,85.32,2737.2,323.78
sort-acc-vel-tiny,0.43,0.01,0.25,0.0,0.54,0.02,0.6,0.02,0.49,0.03,0.63,0.02,0.77,0.02,10711.4,1584.11,21263.6,1415.03,592.2,41.48,1112.6,63.72
ByteTrack-tiny,0.41,0.02,0.26,0.0,0.61,0.01,0.66,0.02,0.56,0.02,0.64,0.02,0.75,0.02,12221.0,1962.33,20490.8,1287.09,1131.8,140.08,1463.0,132.18
sort-acc-acc-tiny,0.37,0.01,0.25,0.0,0.53,0.03,0.64,0.03,0.45,0.04,0.54,0.02,0.77,0.02,9529.8,1567.08,26432.2,1432.73,266.2,31.82,723.0,68.19
sort-acc-adp-tiny,0.37,0.01,0.25,0.0,0.53,0.03,0.64,0.03,0.46,0.04,0.54,0.03,0.76,0.02,9685.4,1551.07,26131.8,1437.67,269.4,31.74,762.4,64.29
deepsort-tiny,0.22,0.03,0.3,0.0,0.5,0.02,0.5,0.03,0.49,0.03,0.61,0.02,0.62,0.02,21852.8,2732.41,22209.8,1104.36,532.2,34.81,1167.0,36.82


DF 9


tracker-detector,MOTA,MOTA-std,MOTP,MOTP-std,IDF1,IDF1-std,IDP,IDP-std,IDR,IDR-std,Recall,Recall-std,Precision,Precision-std,FP,FP-std,FN,FN-std,# Switches,# Switches-std,Fragmentations,Fragmentations-std
idm-tracker-tiny,0.31,0.05,0.32,0.01,0.62,0.03,0.64,0.04,0.6,0.02,0.62,0.02,0.67,0.03,8741.6,1136.86,10685.4,499.97,84.0,20.87,869.0,119.13
sort-acc-vel-tiny,0.26,0.04,0.33,0.01,0.53,0.02,0.61,0.03,0.47,0.02,0.52,0.02,0.67,0.03,7257.6,939.31,13487.4,453.92,149.8,19.11,490.8,38.08
sort-acc-acc-tiny,0.21,0.04,0.31,0.01,0.49,0.02,0.63,0.04,0.4,0.02,0.42,0.01,0.67,0.04,5901.4,975.23,16531.4,401.95,69.8,12.19,291.4,21.51
sort-acc-adp-tiny,0.2,0.04,0.32,0.01,0.49,0.02,0.63,0.04,0.4,0.02,0.42,0.02,0.66,0.04,6048.0,986.81,16529.2,425.93,66.0,13.62,329.6,18.12
ByteTrack-tiny,0.2,0.05,0.33,0.01,0.51,0.03,0.58,0.04,0.45,0.02,0.49,0.02,0.63,0.04,8059.2,1178.52,14479.2,439.42,228.2,21.99,690.4,30.44
deepsort-tiny,0.03,0.07,0.37,0.0,0.46,0.03,0.48,0.04,0.44,0.02,0.48,0.02,0.52,0.03,12662.0,1704.61,14723.6,478.95,116.6,14.64,581.6,13.41


DF 10


tracker-detector,MOTA,MOTA-std,MOTP,MOTP-std,IDF1,IDF1-std,IDP,IDP-std,IDR,IDR-std,Recall,Recall-std,Precision,Precision-std,FP,FP-std,FN,FN-std,# Switches,# Switches-std,Fragmentations,Fragmentations-std
idm-tracker-tiny,0.53,0.03,0.28,0.0,0.65,0.03,0.71,0.04,0.6,0.03,0.69,0.02,0.82,0.03,7918.6,1620.05,16291.0,860.11,414.8,48.67,2867.2,337.16
sort-acc-vel-tiny,0.48,0.04,0.3,0.0,0.55,0.04,0.63,0.05,0.49,0.03,0.64,0.02,0.82,0.04,7553.4,2270.78,19110.0,1269.45,615.6,69.87,1352.2,77.86
ByteTrack-tiny,0.47,0.04,0.3,0.0,0.59,0.03,0.66,0.05,0.54,0.02,0.65,0.02,0.8,0.04,8789.0,2562.22,18331.6,1202.77,989.4,49.08,1941.6,106.89
sort-acc-acc-tiny,0.43,0.04,0.28,0.0,0.54,0.03,0.66,0.06,0.46,0.03,0.56,0.02,0.82,0.04,6585.6,2245.38,23042.8,1177.35,405.4,44.33,1032.6,69.15
sort-acc-adp-tiny,0.43,0.04,0.28,0.0,0.54,0.03,0.67,0.05,0.46,0.03,0.57,0.02,0.82,0.04,6725.2,2249.37,22863.2,1203.3,404.4,41.67,1080.2,72.91
deepsort-tiny,0.28,0.06,0.34,0.0,0.5,0.03,0.51,0.05,0.5,0.02,0.64,0.02,0.65,0.04,18235.2,3500.72,19040.6,1117.59,610.2,52.03,1639.8,64.56


DF 13


tracker-detector,MOTA,MOTA-std,MOTP,MOTP-std,IDF1,IDF1-std,IDP,IDP-std,IDR,IDR-std,Recall,Recall-std,Precision,Precision-std,FP,FP-std,FN,FN-std,# Switches,# Switches-std,Fragmentations,Fragmentations-std
idm-tracker-tiny,0.15,0.05,0.31,0.01,0.46,0.03,0.55,0.04,0.4,0.06,0.45,0.05,0.62,0.05,5693.2,1747.03,11006.0,1038.38,156.2,35.76,660.2,33.99
sort-acc-vel-tiny,0.04,0.06,0.3,0.01,0.34,0.03,0.48,0.04,0.27,0.04,0.31,0.05,0.55,0.05,5205.4,1955.05,13784.2,908.75,60.2,4.66,184.4,15.52
ByteTrack-tiny,-0.02,0.07,0.29,0.01,0.3,0.03,0.42,0.04,0.23,0.04,0.27,0.04,0.49,0.05,5769.6,2050.63,14422.8,834.23,142.2,17.46,248.8,18.9
sort-acc-acc-tiny,-0.05,0.07,0.29,0.01,0.23,0.02,0.39,0.05,0.17,0.03,0.2,0.03,0.46,0.05,4846.8,1837.34,15958.4,536.24,26.0,4.3,87.4,11.46
sort-acc-adp-tiny,-0.05,0.07,0.29,0.01,0.23,0.02,0.39,0.05,0.17,0.03,0.2,0.03,0.46,0.05,4857.0,1841.56,15933.4,531.71,25.4,3.97,92.0,13.77
deepsort-tiny,-0.08,0.09,0.32,0.01,0.32,0.02,0.38,0.04,0.27,0.04,0.32,0.04,0.45,0.05,7833.4,2333.76,13607.4,706.44,66.8,9.6,150.6,25.39


DF 15


tracker-detector,MOTA,MOTA-std,MOTP,MOTP-std,IDF1,IDF1-std,IDP,IDP-std,IDR,IDR-std,Recall,Recall-std,Precision,Precision-std,FP,FP-std,FN,FN-std,# Switches,# Switches-std,Fragmentations,Fragmentations-std
idm-tracker-tiny,0.53,0.03,0.27,0.01,0.73,0.01,0.77,0.02,0.69,0.01,0.72,0.01,0.79,0.03,5916.0,901.42,8835.4,390.39,119.4,9.71,888.6,73.5
sort-acc-vel-tiny,0.5,0.02,0.27,0.0,0.63,0.01,0.71,0.02,0.57,0.01,0.66,0.01,0.82,0.02,4638.6,749.52,10799.6,359.21,288.0,22.7,521.2,23.82
sort-acc-acc-tiny,0.46,0.02,0.26,0.0,0.64,0.01,0.76,0.03,0.55,0.01,0.6,0.01,0.82,0.02,4127.8,707.03,12809.6,313.0,150.2,20.49,325.6,20.28
sort-acc-adp-tiny,0.46,0.02,0.26,0.0,0.65,0.01,0.76,0.02,0.56,0.01,0.6,0.01,0.82,0.02,4210.4,719.33,12608.2,322.42,140.2,22.66,352.2,20.69
ByteTrack-tiny,0.43,0.03,0.28,0.0,0.62,0.02,0.68,0.03,0.56,0.02,0.65,0.01,0.78,0.03,5868.0,960.15,11193.8,344.09,905.8,114.53,836.2,37.57
deepsort-tiny,0.23,0.04,0.33,0.0,0.56,0.02,0.57,0.03,0.55,0.01,0.6,0.01,0.62,0.03,11554.8,1352.03,12648.4,281.28,216.6,14.72,531.2,26.19


In [81]:
for n in [2, 6, 9, 10, 13, 15]:
    df_pNEUMA = pd.DataFrame(data_pNEUMA[n]) \
        .drop(columns=["Frames", "Mostly Lost", "Mostly Lost-std", 
                       "Num Objects", "Frames-std", "Num Objects-std",
                       "Mostly Tracked", "Mostly Tracked-std", 
                       "Partially Tracked", "Partially Tracked-std",
                       "MOTA-std", "MOTP-std", "IDF1-std", "IDP-std",
                       "IDR-std", "Recall-std", "Precision-std", "FP-std",
                       "FN-std", "# Switches-std", "Fragmentations-std",
                      ]) \
        .round(2)
    df_pNEUMA.index = indexes
    print("N =", n)
    _df = df_pNEUMA \
          .dropna() \
          .sort_values(by=['MOTA'], ascending=False) \
          .style.format("{:.2f}") \
          .set_sticky(axis="index") \
          .highlight_max(axis=0, 
                       subset=["IDF1", "IDP", "IDR", "Recall", "Precision", "MOTA", "MOTP"], 
                       props='font-weight: bold') \
          .highlight_min(axis=0,
                       subset=["# Switches", "Fragmentations", "FP", "FN"],
                       props='font-weight: bold')
    print(_df.to_latex())
    

N = 2
\begin{table}
\begin{tabular}{lrrrrrrrrrrr}
 & MOTA & MOTP & IDF1 & IDP & IDR & Recall & Precision & FP & FN & \# Switches & Fragmentations \\
idm-tracker-tiny & \bfseries 0.26 & 0.31 & \bfseries 0.61 & 0.64 & \bfseries 0.59 & \bfseries 0.59 & 0.64 & 2647.00 & \bfseries 3310.75 & \bfseries 5.00 & 135.50 \\
sort-acc-vel-tiny & 0.25 & 0.32 & 0.50 & 0.63 & 0.41 & 0.46 & \bfseries 0.69 & 1652.00 & 4395.75 & 40.75 & 91.50 \\
ByteTrack-tiny & 0.20 & 0.32 & 0.51 & 0.63 & 0.43 & 0.45 & 0.66 & 1920.25 & 4460.00 & 58.50 & 137.50 \\
sort-acc-acc-tiny & 0.17 & 0.31 & 0.44 & \bfseries 0.65 & 0.33 & 0.35 & 0.67 & \bfseries 1386.00 & 5290.00 & 13.50 & \bfseries 64.50 \\
sort-acc-adp-tiny & 0.17 & 0.31 & 0.44 & 0.64 & 0.33 & 0.35 & 0.67 & 1404.00 & 5283.25 & 14.00 & 65.50 \\
deepsort-tiny & 0.06 & \bfseries 0.33 & 0.46 & 0.51 & 0.42 & 0.44 & 0.54 & 3068.25 & 4486.00 & 29.25 & 84.75 \\
\end{tabular}
\end{table}

In [27]:
from IPython.display import HTML

HTML("""
<video width="800" controls>
  <source src="videos/pneuma10-tiny-bt.mp4" type="video/mp4">
</video>
""")

In [30]:
HTML("""
<video width="800" controls>
  <source src="videos/pneuma10-tiny-deepsort.mp4" type="video/mp4">
</video>
""")

In [28]:
HTML("""
<video width="800" controls>
  <source src="videos/pneuma10-tiny-sort.mp4" type="video/mp4">
</video>
""")

We observe that there was little difference between the tiny and full YOLOv7 models. The full YOLO model has probably not fully converged yet, given the limited number of iterations, whereas the tiny version, being smaller, is at a more advanced stage of convergence.

We also note that Deep SORT performed worse than SORT and ByteTrack. The results would probably improve with a feature extractor trained specifically for vehicles. Even so, Deep SORT paired with YOLO achieves better metrics than SORT and ByteTrack using Faster R-CNN as the detector, probably because of the gap in detection quality.

ByteTrack was better than or equal to SORT on all metrics, especially ID switches and trajectory fragmentations. It also shows a lower standard deviation than the other algorithms on most metrics.

The pNEUMA dataset, despite having no occlusion at all, proved more challenging than the city_above dataset. This is probably due to its low frame rate and to the distance between the camera and the vehicles we want to detect.
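The effect of the frame rate can be made concrete with a rough calculation. Assuming, purely for illustration, a vehicle moving at 50 km/h (this speed is not taken from either dataset), the distance it covers between consecutive frames is:

```python
def displacement_per_frame(speed_kmh: float, fps: float) -> float:
    """Distance in meters travelled between two consecutive frames."""
    return speed_kmh / 3.6 / fps


# Frame rates as reported for each dataset; the 50 km/h speed is an assumption.
for name, fps in [("pNEUMA", 2), ("city_above", 24)]:
    print(f"{name}: {displacement_per_frame(50, fps):.2f} m per frame")
```

Under this assumption the per-frame displacement at 2 FPS is twelve times larger than at 24 FPS, so the boxes of the same vehicle in consecutive frames overlap far less, which makes overlap-based association harder.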

_Faster R-CNN_ did well compared to _YOLO_ on _recall_ and _IDR_, but not as well on _MOTA_. Its number of false positives is significantly higher than that of both _YOLO_ versions, while its number of false negatives is significantly lower. A more robust statistical analysis could be done by training the model more times and for more iterations.

It is also important to note that _Faster R-CNN_ is significantly slower than _YOLO_, taking more than 1 second to process a frame on a P100.

We can see that, in the detection and tracking category, lighter models can be used satisfactorily in real-time applications.

# Future Work

The full version of YOLOv7 and Faster R-CNN need to be trained for more iterations so that their performance can be judged more accurately, and Faster R-CNN should be retrained more times. Another important point to take into account is the per-frame processing time of each model, which is essential for real-time applications. Better datasets for validating vehicle tracking could also be created, containing more occlusions and more diverse situations, such as videos recorded at night or in overcast weather and roads that cross over one another.
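Per-frame processing time can be measured with a small harness like the one below. It is a sketch, not the benchmark used in this work: `detect` stands for any callable that processes one frame, and the dummy detector exists only to make the example runnable.

```python
import time
from statistics import mean
from typing import Callable, Iterable


def seconds_per_frame(detect: Callable[[object], object],
                      frames: Iterable[object], warmup: int = 2) -> float:
    """Average wall-clock time per frame, ignoring the first `warmup` frames."""
    timings = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        detect(frame)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return mean(timings)


def dummy_detect(frame: object) -> list:
    time.sleep(0.01)  # stand-in for model inference
    return []


spf = seconds_per_frame(dummy_detect, range(7))
print(f"{spf * 1000:.1f} ms per frame, {1 / spf:.1f} FPS")
```

For models running on a GPU, the device would also need to be synchronized before each clock reading, otherwise asynchronous kernel launches make the measured time look shorter than it is.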