#### Imports

In [4]:
from src.data_utils import *
import pandas as pd


# Data Exploration

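In this section, we explore the data used throughout the project: the ground-truth bounding-box annotations and the video frames of the three sequences (two training/validation sequences and one test sequence). This exploration forms the foundation of our approach to the task.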

## Ground truth file paths

In [1]:
gt_train_1_path = 'data/1_train-val_1min_aalesund_from_start/gt/gt.txt'
gt_train_2_path = 'data/2_train-val_1min_after_goal/gt/gt.txt'
gt_test_3_path = 'data/3_test_1min_hamkam_from_start/gt/gt.txt'

To analyse the ground truth data, we first convert each `gt.txt` file to CSV format with a header row. This is done using the **convert_gt_to_csv** function in the **src/data_utils.py** file.

In [5]:
gt1_csv_path = 'data/1_train-val_1min_aalesund_from_start/gt/gt.csv' 
gt2_csv_path = 'data/2_train-val_1min_after_goal/gt/gt.csv'
gt3_csv_path = 'data/3_test_1min_hamkam_from_start/gt/gt.csv'

convert_gt_to_csv(gt_train_1_path, gt1_csv_path)

convert_gt_to_csv(gt_train_2_path, gt2_csv_path)

convert_gt_to_csv(gt_test_3_path, gt3_csv_path)

Converted data has been written to data/1_train-val_1min_aalesund_from_start/gt/gt.csv
Converted data has been written to data/2_train-val_1min_after_goal/gt/gt.csv
Converted data has been written to data/3_test_1min_hamkam_from_start/gt/gt.csv
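The body of **convert_gt_to_csv** is not shown in this notebook. The sketch below illustrates what such a conversion could look like, assuming `gt.txt` is a header-less, comma-separated file with nine fields per line (the layout appears to follow the MOTChallenge ground-truth format) and reusing the column names seen when the CSV files are loaded. `convert_gt_to_csv_sketch` and `GT_COLUMNS` are hypothetical names, not the project's code.

```python
import pandas as pd

# Column names as they appear in the converted CSV files (assumed field order).
GT_COLUMNS = [
    "frame_id", "object_id", "x", "y", "width", "height",
    "unknown1", "player/ball", "unknown2",
]


def convert_gt_to_csv_sketch(gt_txt_path: str, csv_path: str) -> pd.DataFrame:
    """Hypothetical re-implementation: read a header-less gt.txt and write it with a header."""
    df = pd.read_csv(gt_txt_path, header=None, names=GT_COLUMNS)
    # index=False so reading the CSV back does not add an extra 'Unnamed: 0' column.
    df.to_csv(csv_path, index=False)
    print(f"Converted data has been written to {csv_path}")
    return df
```

Writing without the index keeps the CSV a one-to-one copy of `gt.txt` plus a header, so `pd.read_csv` recovers exactly the nine annotation columns.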


To analyse the data, we will load them into pandas dataframes. We will then display the first few rows of the dataframes to get a sense of the data.

In [None]:
pd_train_csv_1 = pd.read_csv(gt1_csv_path)
pd_train_csv_2 = pd.read_csv(gt2_csv_path)
pd_test_csv_3 = pd.read_csv(gt3_csv_path)

display('Head of ground truth data for train-val set 1', pd_train_csv_1.head())
display('Head of ground truth data for train-val set 2', pd_train_csv_2.head())
display('Head of ground truth data for test set 3', pd_test_csv_3.head())

'Head of ground truth data for train-val set 1'

| | frame_id | object_id | x | y | width | height | unknown1 | player/ball | unknown2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1018.0 | 517.0 | 39.0 | 79.0 | 1 | 2 | 1.0 |
| 1 | 1 | 2 | 175.0 | 568.0 | 42.0 | 90.0 | 1 | 2 | 1.0 |
| 2 | 1 | 3 | 921.0 | 601.0 | 42.0 | 84.0 | 1 | 2 | 1.0 |
| 3 | 1 | 4 | 562.0 | 551.0 | 38.0 | 82.0 | 1 | 2 | 1.0 |
| 4 | 1 | 5 | 1659.0 | 795.0 | 49.0 | 111.0 | 1 | 2 | 1.0 |


'Head of ground truth data for train-val set 2'

| | frame_id | object_id | x | y | width | height | unknown1 | player/ball | unknown2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 324.0 | 271.0 | 37.0 | 70.0 | 1 | 2 | 1.0 |
| 1 | 1 | 2 | 647.0 | 523.0 | 33.0 | 94.0 | 1 | 2 | 1.0 |
| 2 | 1 | 3 | 1628.0 | 253.0 | 26.0 | 70.0 | 1 | 2 | 1.0 |
| 3 | 1 | 4 | 440.0 | 299.0 | 33.0 | 69.0 | 1 | 2 | 1.0 |
| 4 | 1 | 5 | 1542.0 | 615.0 | 29.0 | 92.0 | 1 | 2 | 1.0 |


'Head of ground truth data for test set 3'

| | frame_id | object_id | x | y | width | height | unknown1 | player/ball | unknown2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1350.0 | 325.0 | 20.0 | 78.0 | 1 | 2 | 1.0 |
| 1 | 1 | 2 | 764.0 | 855.0 | 66.0 | 112.0 | 1 | 2 | 1.0 |
| 2 | 1 | 3 | 1649.0 | 371.0 | 35.0 | 86.0 | 1 | 2 | 1.0 |
| 3 | 1 | 4 | 50.0 | 232.0 | 28.0 | 68.0 | 1 | 2 | 1.0 |
| 4 | 1 | 5 | 797.0 | 519.0 | 32.0 | 106.0 | 1 | 2 | 1.0 |


All three files share the same column layout. Next, we summarise each column to understand the data better.
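The format comparison above was done by eye. One way to make it an explicit check is to compare the column names and dtypes of the three DataFrames. The sketch below assumes that "same format" means identical column order and dtypes; `same_layout` is a hypothetical helper.

```python
import pandas as pd


def same_layout(*frames: pd.DataFrame) -> bool:
    """Return True if every DataFrame has the same column names (in order) and dtypes."""
    first = frames[0]
    return all(
        list(f.columns) == list(first.columns) and f.dtypes.equals(first.dtypes)
        for f in frames[1:]
    )
```

A call such as `same_layout(pd_train_csv_1, pd_train_csv_2, pd_test_csv_3)` would return a single boolean. Because dtypes are compared, the check fails if, for example, one file's `x` column is parsed as integers and another's as floats.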

In [None]:
summary_train_csv_1 = dataframe_summary(pd_train_csv_1)
summary_train_csv_2 = dataframe_summary(pd_train_csv_2)
summary_test_csv_3 = dataframe_summary(pd_test_csv_3)

display('Summary of ground truth data for train-val set 1', summary_train_csv_1)
display('Summary of ground truth data for train-val set 2', summary_train_csv_2)
display('Summary of ground truth data for test set 3', summary_test_csv_3)

'Summary of ground truth data for train-val set 1'

| | Min | Mean | Mode | Unique Values |
|---|---|---|---|---|
| frame_id | 1.0 | 900.639238 | 433.0 | 1802 |
| object_id | 1.0 | 12.146735 | 1.0 | 25 |
| x | 0.0 | 888.886917 | 813.0 | 2469 |
| y | 40.0 | 360.236755 | 305.0 | 1525 |
| width | 4.7 | 28.465113 | 23.0 | 583 |
| height | 4.49 | 62.264168 | 50.0 | 633 |
| unknown1 | 1.0 | 1.0 | 1.0 | 1 |
| player/ball | 1.0 | 1.969367 | 2.0 | 2 |
| unknown2 | 1.0 | 1.0 | 1.0 | 1 |


'Summary of ground truth data for train-val set 2'

| | Min | Mean | Mode | Unique Values |
|---|---|---|---|---|
| frame_id | 1.0 | 897.897951 | 892.0 | 1802 |
| object_id | 1.0 | 11.870911 | 1.0 | 25 |
| x | -1.0 | 908.151442 | 1044.0 | 2279 |
| y | 0.0 | 393.445757 | 322.0 | 1378 |
| width | 5.0 | 29.095123 | 24.0 | 318 |
| height | 6.0 | 67.709166 | 53.0 | 441 |
| unknown1 | 1.0 | 1.0 | 1.0 | 1 |
| player/ball | 1.0 | 1.972544 | 2.0 | 2 |
| unknown2 | 1.0 | 1.0 | 1.0 | 1 |


'Summary of ground truth data for test set 3'

| | Min | Mean | Mode | Unique Values |
|---|---|---|---|---|
| frame_id | 1.0 | 914.376209 | 402.0 | 1802 |
| object_id | 1.0 | 12.429797 | 1.0 | 26 |
| x | -3.0 | 913.568779 | 0.0 | 2985 |
| y | 25.3 | 425.077927 | 362.0 | 1933 |
| width | 5.0 | 30.125108 | 23.0 | 865 |
| height | 6.83 | 60.229825 | 61.0 | 886 |
| unknown1 | 1.0 | 1.0 | 1.0 | 1 |
| player/ball | 1.0 | 1.966534 | 2.0 | 2 |
| unknown2 | 1.0 | 1.0 | 1.0 | 1 |
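The implementation of **dataframe_summary** is also not shown. The sketch below reproduces the four statistics in the tables with standard pandas reductions, assuming every column is numeric and that ties in the mode are resolved by taking the smallest value. `dataframe_summary_sketch` is a hypothetical name.

```python
import pandas as pd


def dataframe_summary_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical version of dataframe_summary: one row per column of df."""
    return pd.DataFrame(
        {
            "Min": df.min(),
            "Mean": df.mean(),
            # DataFrame.mode() lists all tied modes in ascending order; keep the first one.
            "Mode": df.mode().iloc[0],
            "Unique Values": df.nunique(),
        }
    )
```

Since `player/ball` only takes the values 1 and 2, its mean also gives the share of boxes labelled 2. With a fraction p of 2s, the mean is (1 - p) + 2p = 1 + p, so p = 1.969367 - 1 ≈ 0.97 for set 1, and about 0.97 for sets 2 and 3 as well. That label 2 is the large majority suggests it marks players and label 1 the ball. This is an interpretation, not a documented convention.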


We can also visualise the ground truth by overlaying the annotations on the video frames.

In [7]:
image_train_path_1 = 'data/1_train-val_1min_aalesund_from_start/img1'
image_train_path_2 = 'data/2_train-val_1min_after_goal/img1'
image_test_path_3 = 'data/3_test_1min_hamkam_from_start/img1'

annotated_image_path_1 = 'data/1_train-val_1min_aalesund_from_start/annotated_img1'
annotated_image_path_2 = 'data/2_train-val_1min_after_goal/annotated_img1'
annotated_image_path_3 = 'data/3_test_1min_hamkam_from_start/annotated_img1'
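All paths defined so far follow one pattern under each sequence folder: `gt/gt.txt`, `gt/gt.csv`, `img1` and `annotated_img1`. The sketch below derives them from the folder name with `pathlib`, so adding a sequence only means adding one name. `SEQUENCES` and `sequence_paths` are hypothetical and are not part of `src/data_utils.py`.

```python
from pathlib import Path

SEQUENCES = [
    "1_train-val_1min_aalesund_from_start",
    "2_train-val_1min_after_goal",
    "3_test_1min_hamkam_from_start",
]


def sequence_paths(name: str, root: str = "data") -> dict:
    """Build the ground-truth and image paths used in this notebook for one sequence folder."""
    base = Path(root) / name
    return {
        "gt_txt": base / "gt" / "gt.txt",
        "gt_csv": base / "gt" / "gt.csv",
        "images": base / "img1",
        "annotated": base / "annotated_img1",
    }


# One entry per sequence, e.g. paths["2_train-val_1min_after_goal"]["gt_csv"].
paths = {name: sequence_paths(name) for name in SEQUENCES}
```

If the helper functions expect plain strings, wrap each value in `str(...)` before passing it on.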


In [None]:

annotate_images(image_train_path_1, gt1_csv_path, annotated_image_path_1)

annotate_images(image_train_path_2, gt2_csv_path, annotated_image_path_2)

annotate_images(image_test_path_3, gt3_csv_path, annotated_image_path_3)

First annotated frame, train-val set 1 (Aalesund, from start):

![](data/1_train-val_1min_aalesund_from_start/annotated_img1/annotated_000001.jpg)

First annotated frame, train-val set 2 (after goal):

![](data/2_train-val_1min_after_goal/annotated_img1/annotated_000001.jpg)

First annotated frame, test set 3 (HamKam, from start):

![](data/3_test_1min_hamkam_from_start/annotated_img1/annotated_000001.jpg)
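The body of **annotate_images** is not shown either. The sketch below uses Pillow to illustrate the idea and rests on several assumptions. Frames are assumed to be named `000001.jpg`, `000002.jpg`, and so on, which the `annotated_000001.jpg` output names above suggest. `(x, y)` is taken to be the top-left corner of each box. Label 1 is drawn in a different colour. The function name and colours are hypothetical.

```python
import os

import pandas as pd
from PIL import Image, ImageDraw

BOX_COLUMNS = ["x", "y", "width", "height", "player/ball"]


def annotate_images_sketch(image_dir: str, gt_csv_path: str, out_dir: str) -> int:
    """Draw every ground-truth box onto its frame; return the number of frames written."""
    gt = pd.read_csv(gt_csv_path)
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    for frame_id, boxes in gt.groupby("frame_id"):
        name = f"{int(frame_id):06d}.jpg"  # assumed naming: 000001.jpg, 000002.jpg, ...
        src = os.path.join(image_dir, name)
        if not os.path.exists(src):
            continue  # skip annotations whose frame image is missing
        with Image.open(src) as frame:
            canvas = frame.convert("RGB")  # independent copy, usable after the file closes
        draw = ImageDraw.Draw(canvas)
        # name=None yields plain tuples, which sidesteps the 'player/ball' column name.
        for x, y, w, h, label in boxes[BOX_COLUMNS].itertuples(index=False, name=None):
            # Assumption: (x, y) is the top-left corner; label 1 (minority class) in red.
            colour = "red" if label == 1 else "lime"
            draw.rectangle([int(x), int(y), int(x + w), int(y + h)], outline=colour, width=2)
        canvas.save(os.path.join(out_dir, f"annotated_{name}"))
        written += 1
    return written
```

Grouping by `frame_id` opens and saves each image once, no matter how many boxes it contains.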

## Image dimensions
The **get_image_dimensions** function returns the set of unique image dimensions, as (width, height) pairs, found in an image folder.
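The sketch below shows one way to compute this set with Pillow. It assumes the folder holds JPEG or PNG frames and uses the fact that `Image.open` reads only the file header until pixel data is needed. `get_image_dimensions_sketch` is a hypothetical name, not the project's implementation.

```python
import os

from PIL import Image


def get_image_dimensions_sketch(image_dir: str) -> set:
    """Return the set of unique (width, height) pairs over the image files in image_dir."""
    sizes = set()
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue  # ignore non-image files
        # Image.open is lazy, so reading .size does not decode the full frame.
        with Image.open(os.path.join(image_dir, name)) as img:
            sizes.add(img.size)  # Pillow reports (width, height)
    return sizes
```

Returning a set makes a single-element result a quick confirmation that every frame in the folder has the same size.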

In [8]:
image_train_dimensions_1 = get_image_dimensions(image_train_path_1)
image_train_dimensions_2 = get_image_dimensions(image_train_path_2)
image_test_dimensions_3 = get_image_dimensions(image_test_path_3)

display('Image dimensions for train-val set 1', image_train_dimensions_1)
display('Image dimensions for train-val set 2', image_train_dimensions_2)
display('Image dimensions for test set 3', image_test_dimensions_3)

'Image dimensions for train-val set 1'

{(1920, 1080)}

'Image dimensions for train-val set 2'

{(1920, 1080)}

'Image dimensions for test set 3'

{(1920, 1080)}
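Every frame is 1920 × 1080 pixels, so the box coordinates can be checked against the frame boundaries. The summary tables show negative minimum `x` values (-1.0 in train-val set 2 and -3.0 in test set 3), which means some boxes start slightly outside the left edge. The sketch below flags such boxes and clips them to the frame. It assumes `x` and `y` are the top-left corner in pixels. Whether to clip, drop, or keep these boxes is a modelling choice, and `out_of_frame` and `clip_boxes` are hypothetical helpers.

```python
import pandas as pd

FRAME_W, FRAME_H = 1920, 1080  # frame size reported by get_image_dimensions


def out_of_frame(gt: pd.DataFrame, w: int = FRAME_W, h: int = FRAME_H) -> pd.Series:
    """Boolean mask of boxes that extend past any edge of a w x h frame."""
    return (
        (gt["x"] < 0)
        | (gt["y"] < 0)
        | (gt["x"] + gt["width"] > w)
        | (gt["y"] + gt["height"] > h)
    )


def clip_boxes(gt: pd.DataFrame, w: int = FRAME_W, h: int = FRAME_H) -> pd.DataFrame:
    """Return a copy with every box clipped to the frame; width and height are recomputed."""
    out = gt.copy()
    x1 = gt["x"].clip(lower=0, upper=w)
    y1 = gt["y"].clip(lower=0, upper=h)
    x2 = (gt["x"] + gt["width"]).clip(lower=0, upper=w)
    y2 = (gt["y"] + gt["height"]).clip(lower=0, upper=h)
    out["x"], out["y"] = x1, y1
    out["width"], out["height"] = x2 - x1, y2 - y1
    return out
```

For instance, `out_of_frame(pd_test_csv_3).sum()` would count the affected boxes in the test set before deciding how to handle them.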