# Introduction

This work compares the performance of three trackers in the _detection and tracking_ category: [SORT](https://github.com/abewley/sort) (Simple Online and Realtime Tracking), Deep SORT, and ByteTrack. The trackers are evaluated on two datasets: [pNEUMA](https://open-traffic.epfl.ch/), which contains 30-minute videos of roads in Greece captured by a set of drones, and a custom dataset named _city above_, which contains footage from a drone flying over a road with car traffic.

Two convolutional neural networks for object detection, both trained on the [VSAI](https://www.kaggle.com/datasets/dronevision/vsaiv1) dataset, are also tested: _[YOLOv7](https://github.com/WongKinYiu/yolov7)_, a state-of-the-art object detector, and _YOLOv7-tiny_, a version of _YOLOv7_ with fewer parameters that trades accuracy for speed. The networks were trained 5 times, starting from weights pretrained on the [COCO](https://cocodataset.org/#home) dataset.

A round of tests was also run with _fastrcnn_, for comparison with the detector originally used in the _SORT_ and _Deep SORT_ papers.

In [137]:
import os
import pandas as pd

In [152]:
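OUTPUTS_DIR = "mots/eval"

# Collect every evaluation file, sorted by name so the row order is deterministic.
files = [os.path.join(OUTPUTS_DIR, f) for f in os.listdir(OUTPUTS_DIR)]
files.sort()
# Start from an empty frame so the metric columns come first, in this order.
df = pd.DataFrame(columns=[
                         "mota","motp", "num_frames","idf1","idp",
                         "idr","recall","precision",
                         "num_objects","mostly_tracked","partially_tracked",
                         "mostly_lost","num_false_positives","num_misses",
                         "num_switches","num_fragmentations",])
# Stack the contents of every file into a single table.
for f in files:
    row = pd.read_csv(f)
    df = pd.concat([df, row])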

In [153]:
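# Label each row with its file name, e.g. "ByteTrack-city_10-tiny.eval".
df.index = [os.path.basename(f) for f in files]
# Drop the last column: an extra, non-metric column that the CSV files add
# beyond the 16 metric columns declared above.
df = df.drop(df.columns[-1], axis=1)
df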

| | mota | motp | num_frames | idf1 | idp | idr | recall | precision | num_objects | mostly_tracked | partially_tracked | mostly_lost | num_false_positives | num_misses | num_switches | num_fragmentations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ByteTrack-city_10-tiny.eval | 0.741336 | 0.265613 | 317 | 0.846943 | 0.828234 | 0.866286 | 0.895100 | 0.856000 | 7531 | 49 | 8 | 8 | 1134 | 790 | 24 | 117 |
| ByteTrack-city_11-tiny.eval | 0.780507 | 0.286773 | 317 | 0.881699 | 0.889445 | 0.873855 | 0.882220 | 0.898202 | 7531 | 43 | 13 | 9 | 753 | 887 | 13 | 132 |
| ByteTrack-city_5-full.eval | 0.760324 | 0.233030 | 317 | 0.863771 | 0.973198 | 0.776258 | 0.779445 | 0.977519 | 7531 | 37 | 20 | 8 | 135 | 1661 | 9 | 113 |
| ByteTrack-city_6-full.eval | 0.725534 | 0.254789 | 317 | 0.848454 | 0.943577 | 0.770548 | 0.771212 | 0.944697 | 7531 | 34 | 19 | 12 | 340 | 1723 | 4 | 114 |
| ByteTrack-city_7-full.eval | 0.754481 | 0.240069 | 317 | 0.866025 | 0.942491 | 0.800823 | 0.802284 | 0.944505 | 7531 | 39 | 16 | 10 | 355 | 1489 | 5 | 107 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| sort-pNEUMA10_7-tiny.eval | 0.501440 | 0.294939 | 1907 | 0.549876 | 0.653034 | 0.474863 | 0.619947 | 0.852554 | 52790 | 93 | 283 | 46 | 5660 | 20063 | 596 | 1314 |
| sort-pNEUMA10_8-full.eval | 0.382876 | 0.290757 | 1907 | 0.494872 | 0.641932 | 0.402633 | 0.509699 | 0.812630 | 52790 | 71 | 273 | 78 | 6204 | 25883 | 491 | 1063 |
| sort-pNEUMA10_8-tiny.eval | 0.497822 | 0.297413 | 1907 | 0.566864 | 0.667300 | 0.492707 | 0.623338 | 0.844220 | 52790 | 108 | 270 | 44 | 6072 | 19884 | 554 | 1346 |
| sort-pNEUMA10_9-full.eval | 0.408543 | 0.283686 | 1907 | 0.543189 | 0.628071 | 0.478519 | 0.590453 | 0.774988 | 52790 | 100 | 263 | 59 | 9050 | 21620 | 553 | 1172 |


# Tracking results

We take the results for both datasets (pNEUMA and city_above) and compute the mean and standard deviation of the tracking metrics across the trained models.

In [154]:
# Display names for the metric columns.
namemap = {
    'mota': 'MOTA', 'motp': 'MOTP', 'idf1': 'IDF1',
    'idp': 'IDP', 'idr': 'IDR', 'recall': 'Recall',
    'precision': 'Precision', 'num_objects': 'Num Objects',
    'mostly_tracked': 'Mostly Tracked', 'partially_tracked': 'Partially Tracked',
    'mostly_lost': 'Mostly Lost', 'num_false_positives': 'FP',
    'num_misses': 'FN', 'num_switches': '# Switches', 'num_frames': 'Frames',
    'num_fragmentations': 'Fragmentations',
}

# Columns left out of the summary tables.
DROP_COLUMNS = [
    "Frames", "Frames-std", "Num Objects", "Num Objects-std",
    "Mostly Tracked", "Mostly Tracked-std",
    "Partially Tracked", "Partially Tracked-std",
    "Mostly Lost", "Mostly Lost-std",
]

data_city = []
data_pNEUMA = []
indexes = []
for tracker in ["sort", "deep-sort", "ByteTrack"]:
    # Caveat: "sort" is also a substring of "deep-sort", so the "sort" pass
    # selects the Deep SORT rows as well as the SORT rows.
    tracker_df = df[df.index.str.contains(tracker)].sort_values(by='mota', ascending=False)
    for model in ["full", "tiny", "fastrcnn"]:
        tracker_df_model = tracker_df[tracker_df.index.str.contains(model)]

        for dataset in ["pNEUMA", "city"]:
            res = tracker_df_model[tracker_df_model.index.str.contains(dataset)]
            # Mean and standard deviation of every metric over the matching runs.
            data = {}
            for idx, mean, std in zip(res.columns, res.mean(), res.std()):
                name = namemap[idx]
                data[name] = mean
                data[f"{name}-std"] = std

            if dataset == "city":
                data_city.append(data)
            else:
                data_pNEUMA.append(data)
        # One label per (tracker, model) pair, shared by both summary tables.
        indexes.append(f"{tracker}-{model}")

df_pNEUMA = pd.DataFrame(data_pNEUMA).drop(columns=DROP_COLUMNS).round(2)
df_city = pd.DataFrame(data_city).drop(columns=DROP_COLUMNS).round(2)

df_pNEUMA.index = indexes
df_city.index = indexes
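
The selection above matches trackers by substring. The following is a minimal sketch on a toy frame, not part of the original analysis: the MOTA values are made up, the `deep-sort-...` file name is an assumed pattern modelled on the names shown earlier, and `parse_name` is a hypothetical helper. It illustrates how a prefix match, or a split of the file name from the right, behaves differently from `str.contains`.

```python
import os
import pandas as pd

# Toy data: MOTA values are made up, and the "deep-sort-..." name is an
# assumed pattern modelled on the file names shown in this notebook.
names = [
    "sort-city_1-full.eval",
    "sort-city_2-full.eval",
    "deep-sort-city_1-full.eval",
    "ByteTrack-city_1-full.eval",
]
toy = pd.DataFrame({"mota": [0.70, 0.72, 0.00, 0.75]}, index=names)

# Substring match: "sort" also selects the deep-sort row.
by_substring = toy[toy.index.str.contains("sort")]
# Prefix match with the trailing dash selects only the SORT rows.
by_prefix = toy[toy.index.str.startswith("sort-")]
print(len(by_substring), len(by_prefix))  # 3 2


def parse_name(name):
    """Split '<tracker>-<video>-<model>.eval' from the right (hypothetical helper)."""
    stem = os.path.splitext(name)[0]
    tracker, video, model = stem.rsplit("-", 2)
    return tracker, video, model


# One column per name component, aligned with the toy frame's index.
keys = pd.DataFrame(
    [parse_name(n) for n in toy.index],
    columns=["tracker", "video", "model"],
    index=toy.index,
)
# Mean and standard deviation of MOTA per (tracker, model) group.
summary = toy.join(keys).groupby(["tracker", "model"])["mota"].agg(["mean", "std"])
print(summary)
```

Splitting from the right keeps the dash inside `deep-sort` intact, so each tracker ends up in its own group. Groups with a single run get `NaN` as their standard deviation.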

# pNEUMA results table

In [155]:
df_pNEUMA \
    .style.format("{:.2f}") \
    .set_sticky(axis="index") \
    .highlight_max(axis=0, 
                   subset=["IDF1", "IDP", "IDR", "Recall", "Precision", "MOTA", "MOTP"], 
                   props='font-weight: bold') \
    .highlight_min(axis=0,
                   subset=["# Switches", "Fragmentations", "FP", "FN"],
                   props='font-weight: bold')


| | MOTA | MOTA-std | MOTP | MOTP-std | IDF1 | IDF1-std | IDP | IDP-std | IDR | IDR-std | Recall | Recall-std | Precision | Precision-std | FP | FP-std | FN | FN-std | # Switches | # Switches-std | Fragmentations | Fragmentations-std |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sort-full | 0.19 | 0.22 | 0.28 | 0.01 | 0.27 | 0.25 | 0.35 | 0.29 | 0.22 | 0.21 | 0.4 | 0.15 | 0.8 | 0.03 | 5379.2 | 2237.58 | 31831.9 | 8104.67 | 5553.9 | 5347.52 | 3787.8 | 2815.16 |
| sort-tiny | 0.23 | 0.26 | 0.29 | 0.01 | 0.29 | 0.27 | 0.35 | 0.29 | 0.26 | 0.24 | 0.48 | 0.17 | 0.82 | 0.03 | 5751.4 | 2539.49 | 27370.1 | 8794.58 | 7298.1 | 7103.95 | 4660.5 | 3496.19 |
| sort-fastrcnn | 0.08 | 0.44 | 0.28 | 0.01 | 0.32 | 0.4 | 0.31 | 0.38 | 0.33 | 0.42 | 0.54 | 0.28 | 0.62 | 0.09 | 16201.5 | 2363.86 | 24193.0 | 14938.34 | 8210.0 | 10900.76 | 5253.0 | 6054.25 |
| deep-sort-full | -0.01 | 0.02 | 0.27 | 0.0 | 0.04 | 0.0 | 0.08 | 0.01 | 0.03 | 0.0 | 0.25 | 0.03 | 0.79 | 0.03 | 3521.6 | 877.93 | 39340.2 | 1458.84 | 10560.0 | 1298.3 | 6436.0 | 542.03 |
| deep-sort-tiny | -0.01 | 0.02 | 0.28 | 0.0 | 0.04 | 0.0 | 0.07 | 0.01 | 0.03 | 0.0 | 0.33 | 0.03 | 0.82 | 0.03 | 3949.4 | 1111.66 | 35630.2 | 1356.72 | 13980.6 | 1379.93 | 7968.8 | 366.43 |
| deep-sort-fastrcnn | -0.24 | NaN | 0.28 | NaN | 0.04 | NaN | 0.05 | NaN | 0.03 | NaN | 0.34 | NaN | 0.55 | NaN | 14530.0 | NaN | 34756.0 | NaN | 15918.0 | NaN | 9534.0 | NaN |
| ByteTrack-full | 0.36 | 0.05 | 0.29 | 0.0 | 0.48 | 0.04 | 0.64 | 0.04 | 0.38 | 0.04 | 0.48 | 0.05 | 0.82 | 0.03 | 5803.2 | 1253.59 | 27250.4 | 2483.67 | 746.4 | 55.38 | 1258.0 | 78.39 |
| ByteTrack-tiny | 0.47 | 0.04 | 0.3 | 0.0 | 0.59 | 0.03 | 0.66 | 0.05 | 0.54 | 0.02 | 0.65 | 0.02 | 0.8 | 0.04 | 8789.0 | 2562.22 | 18331.6 | 1202.77 | 989.4 | 49.08 | 1941.6 | 106.89 |
| ByteTrack-fastrcnn | 0.28 | NaN | 0.29 | NaN | 0.61 | NaN | 0.55 | NaN | 0.68 | NaN | 0.77 | NaN | 0.62 | NaN | 24335.0 | NaN | 12369.0 | NaN | 1192.0 | NaN | 1360.0 | NaN |


# City Above results table

In [156]:
df_city \
    .style.format("{:.2f}") \
    .set_sticky(axis="index") \
    .highlight_max(axis=0, 
                   subset=["IDF1", "IDP", "IDR", "Recall", "Precision", "MOTA", "MOTP"], 
                   props='font-weight: bold') \
    .highlight_min(axis=0,
                   subset=["# Switches", "Fragmentations"],
                   props='font-weight: bold')

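| | MOTA | MOTA-std | MOTP | MOTP-std | IDF1 | IDF1-std | IDP | IDP-std | IDR | IDR-std | Recall | Recall-std | Precision | Precision-std | FP | FP-std | FN | FN-std | # Switches | # Switches-std | Fragmentations | Fragmentations-std |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sort-full | 0.39 | 0.38 | 0.25 | 0.01 | 0.46 | 0.39 | 0.51 | 0.41 | 0.42 | 0.36 | 0.67 | 0.14 | 0.9 | 0.05 | 530.0 | 187.0 | 2503.6 | 1078.15 | 1555.5 | 1610.17 | 527.2 | 477.54 |
| sort-tiny | 0.36 | 0.44 | 0.27 | 0.01 | 0.46 | 0.4 | 0.46 | 0.39 | 0.46 | 0.41 | 0.73 | 0.18 | 0.83 | 0.05 | 1050.8 | 208.57 | 2031.7 | 1359.23 | 1757.7 | 1820.17 | 638.1 | 579.89 |
| sort-fastrcnn | 0.21 | 0.66 | 0.24 | 0.0 | 0.43 | 0.54 | 0.41 | 0.49 | 0.46 | 0.58 | 0.67 | 0.34 | 0.68 | 0.16 | 2108.5 | 490.02 | 2473.0 | 2544.17 | 1401.5 | 1936.77 | 750.5 | 965.2 |
| deep-sort-full | 0.03 | 0.01 | 0.25 | 0.01 | 0.1 | 0.01 | 0.12 | 0.0 | 0.08 | 0.01 | 0.53 | 0.02 | 0.85 | 0.01 | 688.6 | 71.81 | 3515.2 | 148.94 | 3082.2 | 79.19 | 980.0 | 19.66 |
| deep-sort-tiny | -0.06 | 0.02 | 0.27 | 0.01 | 0.08 | 0.0 | 0.09 | 0.0 | 0.07 | 0.0 | 0.56 | 0.02 | 0.78 | 0.02 | 1165.2 | 135.35 | 3315.4 | 182.57 | 3480.6 | 182.4 | 1187.6 | 40.46 |
| deep-sort-fastrcnn | -0.26 | NaN | 0.24 | NaN | 0.05 | NaN | 0.06 | NaN | 0.04 | NaN | 0.43 | NaN | 0.57 | NaN | 2455.0 | NaN | 4272.0 | NaN | 2771.0 | NaN | 1433.0 | NaN |
| ByteTrack-full | 0.75 | 0.03 | 0.25 | 0.01 | 0.86 | 0.02 | 0.95 | 0.01 | 0.79 | 0.03 | 0.79 | 0.03 | 0.95 | 0.01 | 302.6 | 94.8 | 1587.8 | 192.34 | 7.2 | 2.59 | 106.2 | 7.73 |
| ByteTrack-tiny | 0.75 | 0.03 | 0.27 | 0.01 | 0.86 | 0.02 | 0.85 | 0.03 | 0.87 | 0.01 | 0.89 | 0.01 | 0.87 | 0.03 | 1026.4 | 266.81 | 816.0 | 47.88 | 18.4 | 4.04 | 113.2 | 13.95 |
| ByteTrack-fastrcnn | 0.48 | NaN | 0.25 | NaN | 0.75 | NaN | 0.65 | NaN | 0.88 | NaN | 0.91 | NaN | 0.68 | NaN | 3227.0 | NaN | 645.0 | NaN | 48.0 | NaN | 89.0 | NaN |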


We observe little difference between the tiny and full YOLOv7 models. The full model has probably not fully converged, given the limited number of training iterations, whereas the tiny version, being smaller, is at a more advanced stage of convergence.

Deep SORT also performed relatively poorly compared with SORT and ByteTrack. This is probably due to poor convergence of the Siamese network used as the feature extractor; a new network needs to be trained, or the _Deep SORT_ algorithm reimplemented.

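ByteTrack matched or outperformed SORT on nearly every metric, most clearly on ID switches and trajectory fragmentations; the main exception is the false-positive count on pNEUMA, where the `sort-full` and `sort-tiny` rows are lower. It also shows a lower standard deviation than the other algorithms on most metrics. Note that the `sort-*` rows are selected with `str.contains("sort")`, which also matches the Deep SORT files, so they pool runs from both trackers and their large standard deviations partly reflect that mix.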

The pNEUMA dataset, despite having no occlusion at all, proved more challenging than city_above. This probably stems from its low frame rate and from the distance between the camera and the vehicles we want to detect.

_fastrcnn_ did well against _YOLO_ on _recall_ and _IDR_, but not on _MOTA_. Its false-positive count is significantly higher than that of either _YOLO_ version, while its false-negative count is significantly lower. A more robust statistical analysis would require training the model more times and for more iterations.
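
The gap between recall and _MOTA_ follows from how the two metrics count errors. The sketch below assumes the usual CLEAR MOT definition, MOTA = 1 - (FN + FP + ID switches) / ground-truth objects, and that the evaluation tool used here follows it; the function names are illustrative. The counts come from the `ByteTrack-city_10-tiny.eval` row of the raw results table.

```python
def mota(num_misses, num_false_positives, num_switches, num_objects):
    """MOTA = 1 - (FN + FP + ID switches) / ground-truth objects (assumed definition)."""
    errors = num_misses + num_false_positives + num_switches
    return 1.0 - errors / num_objects


def recall(num_misses, num_objects):
    """Fraction of ground-truth objects that were detected; FP does not appear."""
    return (num_objects - num_misses) / num_objects


def precision(num_misses, num_false_positives, num_objects):
    """Fraction of detections that correspond to a ground-truth object."""
    true_positives = num_objects - num_misses
    return true_positives / (true_positives + num_false_positives)


# Counts from the ByteTrack-city_10-tiny.eval row.
num_objects, fp, fn, switches = 7531, 1134, 790, 24

print(f"{mota(fn, fp, switches, num_objects):.6f}")  # 0.741336
print(f"{recall(fn, num_objects):.6f}")              # 0.895100
print(f"{precision(fn, fp, num_objects):.6f}")       # 0.856000
```

Under this definition, false positives lower _MOTA_ and precision but leave recall untouched, which is consistent with a detector that has many false positives scoring high recall and low _MOTA_. It also means _MOTA_ drops below zero once the summed errors exceed the number of ground-truth objects.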

It is also worth noting that _fastrcnn_ is significantly slower than _YOLO_, taking more than 1 second to process a frame on a P100 GPU.