# Visualisation of collected data

## 1. Introduction

The purpose of this notebook is to compute summary statistics on our gathered data before proceeding to train the detector models. Visualising the statistics of the dataset is a crucial step in any machine learning workflow, as a model is only as good as its data.

This process helps us verify annotation correctness, the balance of images across the dataset, and the dataset's structural properties. It can also uncover class imbalances, annotation errors, and bounding boxes that are wrongly placed or wrongly scaled.

During the collection of the images used in the dataset, the following factors were taken into consideration:
- `Lighting`
- `Angle`
- `Quality`


#### Import Libraries and Load Annotations

In [None]:
import json
import random
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

from pycocotools.coco import COCO
from pathlib import Path
import skimage.io as io

sns.set_theme(style="whitegrid")

data_dir = Path("Outputs")
coco = COCO(data_dir / "train.json")



## 2. Dataset Overview

In [None]:
print(f"Number of images: {len(coco.imgs)}")
print(f"Number of annotations: {len(coco.anns)}")
print(f"Number of classes: {len(coco.cats)}")
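
Beyond the raw totals, it can help to check how annotations are spread over images. The sketch below is a minimal illustration on a hypothetical in-memory COCO-style dictionary (the `toy` data and the `annotation_coverage` helper are assumptions, not part of the dataset). It lists images that have no annotations and annotations that refer to a missing image.

```python
from collections import Counter

# Hypothetical miniature COCO-style dataset, used only for illustration.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 1},
        {"id": 12, "image_id": 2},
        {"id": 13, "image_id": 99},  # points at an image that does not exist
    ],
}


def annotation_coverage(dataset):
    """Return (ids of images without annotations, ids of orphan annotations)."""
    image_ids = {img["id"] for img in dataset["images"]}
    per_image = Counter(ann["image_id"] for ann in dataset["annotations"])
    # Images that never appear as an annotation's image_id
    unannotated = sorted(image_ids - set(per_image))
    # Annotations whose image_id is not in the image list
    orphans = sorted(
        ann["id"] for ann in dataset["annotations"]
        if ann["image_id"] not in image_ids
    )
    return unannotated, orphans


unannotated, orphans = annotation_coverage(toy)
print(f"Images without annotations: {unannotated}")
print(f"Orphan annotations: {orphans}")
```

With the real data, the same helper could presumably be called as `annotation_coverage(coco.dataset)`, since pycocotools keeps the loaded JSON in `coco.dataset`.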

## 3. Class Distribution

Distribution of instances across all 6 traffic sign classes, plotted first as raw counts and then as percentages.

In [None]:
cat_ids = coco.getCatIds()
cats = coco.loadCats(cat_ids)

# Count annotations per category, keyed by category name
cat_counts = {
    cat['name']: len(coco.getAnnIds(catIds=[cat['id']]))
    for cat in cats
}

plt.figure(figsize=(10,5))
sns.barplot(x=list(cat_counts.keys()), y=list(cat_counts.values()))
plt.title("Number of Instances per Class")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.show()


total = sum(cat_counts.values())
cat_percentages = {k: v/total*100 for k,v in cat_counts.items()}

plt.figure(figsize=(10,5))
sns.barplot(x=list(cat_percentages.keys()), y=list(cat_percentages.values()))
plt.title("Percentage of Instances per Class")
plt.ylabel("Percentage (%)")
plt.xticks(rotation=45)
plt.show()
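
The bar charts show imbalance visually; a single number can make it easier to compare dataset versions. The sketch below is one possible summary, assuming a `cat_counts`-style dictionary of class name to instance count. The class names and counts in `toy_counts` are hypothetical, and `imbalance_ratio` is an illustrative helper rather than an established metric of this project.

```python
def imbalance_ratio(counts):
    """Ratio of the largest class count to the smallest non-zero class count."""
    non_zero = [c for c in counts.values() if c > 0]
    if not non_zero:
        raise ValueError("no class has any instances")
    return max(non_zero) / min(non_zero)


# Hypothetical counts, for illustration only
toy_counts = {"class_a": 120, "class_b": 60, "class_c": 30, "class_d": 0}

ratio = imbalance_ratio(toy_counts)
# Classes with no instances at all are reported separately
empty_classes = [name for name, c in toy_counts.items() if c == 0]

print(f"Imbalance ratio (max/min): {ratio:.1f}")
print(f"Classes without instances: {empty_classes}")
```

A ratio close to 1 indicates evenly represented classes; what counts as too high is a judgment call that depends on the detector and the training setup.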



## 4. Attribute Distribution


Plotting the distribution of instances across the annotation attributes. Of the 4 attributes, the cell below covers `Viewing Angle` and `Mounting Type`.

In [None]:
from collections import Counter

viewing_angles = []
mounting_types = []

for ann in coco.anns.values():
    attrs = ann.get('attributes', {})
    if 'Viewing Angle' in attrs:
        viewing_angles.append(attrs['Viewing Angle'])
    if 'Mounting Type' in attrs:
        mounting_types.append(attrs['Mounting Type'])

va_counts = Counter(viewing_angles)
mt_counts = Counter(mounting_types)

plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
sns.barplot(x=list(va_counts.keys()), y=list(va_counts.values()))
plt.title("Viewing Angle Distribution")

plt.subplot(1,2,2)
sns.barplot(x=list(mt_counts.keys()), y=list(mt_counts.values()))
plt.title("Mounting Type Distribution")

plt.show()
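
The cell above names each attribute by hand. A more general variant, sketched below, counts every attribute key it finds, so any further attributes are picked up without extra code. It assumes each annotation carries an optional `attributes` dictionary with hashable values, as in the cell above; the attribute values in `toy_anns` are hypothetical.

```python
from collections import Counter, defaultdict


def count_attributes(annotations):
    """Map each attribute name to a Counter of its observed values."""
    counts = defaultdict(Counter)
    for ann in annotations:
        # Annotations without an 'attributes' field are skipped silently
        for key, value in ann.get("attributes", {}).items():
            counts[key][value] += 1
    return counts


# Hypothetical annotations, for illustration only
toy_anns = [
    {"attributes": {"Viewing Angle": "Front", "Mounting Type": "Pole"}},
    {"attributes": {"Viewing Angle": "Side", "Mounting Type": "Pole"}},
    {"attributes": {"Viewing Angle": "Front"}},
    {},  # annotation with no attributes
]

attr_counts = count_attributes(toy_anns)
for name, counter in attr_counts.items():
    print(name, dict(counter))
```

On the real annotations this could be called as `count_attributes(coco.anns.values())`, and each resulting `Counter` plotted with the same `sns.barplot` pattern.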


## 5. Analysing Bounding Boxes

In [None]:
widths, heights, areas = [], [], []

for ann in coco.anns.values():
    w, h = ann['bbox'][2], ann['bbox'][3]
    widths.append(w)
    heights.append(h)
    areas.append(w * h)

plt.figure(figsize=(6,6))
plt.scatter(widths, heights, alpha=0.4)
plt.xlabel("Bounding Box Width (pixels)")
plt.ylabel("Bounding Box Height (pixels)")
plt.title("Bounding Box Size Distribution")
plt.show()

plt.figure(figsize=(6,4))
sns.histplot(areas, bins=50)
plt.title("Bounding Box Area Distribution")
plt.xlabel("Area (pixels²)")
plt.show()
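
To complement the histogram, box areas can be grouped into size buckets. The sketch below uses the 32² and 96² pixel thresholds from the COCO evaluation convention for small, medium, and large objects. Two things are assumptions here: the area is taken as width × height of the box, and the handling of values exactly on a threshold is an arbitrary choice. The boxes in `toy_bboxes` are hypothetical.

```python
from collections import Counter

SMALL_MAX = 32 ** 2    # 1024 px², upper bound of the "small" bucket
MEDIUM_MAX = 96 ** 2   # 9216 px², upper bound of the "medium" bucket


def size_bucket(area):
    """Classify a box area (in pixels²) as small, medium, or large."""
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


# Hypothetical boxes in COCO [x, y, width, height] format
toy_bboxes = [
    [0, 0, 20, 20],     # area 400
    [0, 0, 50, 40],     # area 2000
    [0, 0, 100, 100],   # area 10000
    [0, 0, 32, 32],     # area 1024, exactly on the small/medium threshold
]

bucket_counts = Counter(size_bucket(w * h) for _, _, w, h in toy_bboxes)
print(dict(bucket_counts))
```

A dataset dominated by the small bucket may call for a higher input resolution or different augmentation, though that depends on the detector being trained.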


## 6. Visual Manual Check

In [None]:
# Draw four randomly chosen images with their annotations overlaid
img_ids = coco.getImgIds()
fig, axs = plt.subplots(2, 2, figsize=(10, 10))

for ax in axs.flatten():
    img_id = random.choice(img_ids)
    img_data = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_data['id'])
    anns = coco.loadAnns(ann_ids)
    I = io.imread(f"Outputs/images/{img_data['file_name']}")
    ax.imshow(I)
    plt.sca(ax)  # showAnns draws on the current axes, so select this subplot first
    coco.showAnns(anns, draw_bbox=True)
    ax.set_title(img_data['file_name'])
    ax.axis('off')

plt.show()
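
A manual look at random samples can miss rare errors, so an automated pass over all boxes is a useful complement. The sketch below flags boxes with non-positive size and boxes that extend past the image border. It assumes the standard COCO layout, where each image entry has `width` and `height` and each `bbox` is `[x, y, width, height]`; the `toy` data and the `find_invalid_boxes` helper are hypothetical.

```python
def find_invalid_boxes(dataset):
    """Return (annotation id, reason) pairs for boxes that look wrong."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in dataset["images"]}
    problems = []
    for ann in dataset["annotations"]:
        x, y, w, h = ann["bbox"]
        img_w, img_h = sizes[ann["image_id"]]
        if w <= 0 or h <= 0:
            problems.append((ann["id"], "non-positive size"))
        elif x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            problems.append((ann["id"], "outside image"))
    return problems


# Hypothetical miniature dataset, for illustration only
toy = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "bbox": [10, 10, 50, 50]},    # valid
        {"id": 2, "image_id": 1, "bbox": [600, 100, 80, 40]},  # crosses right edge
        {"id": 3, "image_id": 1, "bbox": [5, 5, 0, 20]},       # zero width
    ],
}

problems = find_invalid_boxes(toy)
for ann_id, reason in problems:
    print(f"annotation {ann_id}: {reason}")
```

Any annotation ids reported this way can then be inspected visually with the plotting cell above.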


## 7. Analysing and Discussing Visualisation Results


TODO: summarise and discuss the results of the visualisations above.