# Exploratory Data Analysis for Incident Detection System

In this notebook, we perform exploratory data analysis (EDA) on the annotation data used to train the YOLOv8 model for incident detection. The goal is to understand the dataset before training: how incidents are distributed across types, how long they last, and where values are missing.

In [None]:
# Import necessary libraries
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')

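# Load the annotations data (sorted so the row order is reproducible)
annotations_path = '../data/annotations/'
annotations_files = sorted(f for f in os.listdir(annotations_path) if f.endswith('.csv'))

# Fail early with a clear message; pd.concat raises an opaque error on an empty list
if not annotations_files:
    raise FileNotFoundError(f'No CSV files found in {annotations_path}')

# Combine all annotation files into a single DataFrame
annotations = pd.concat(
    [pd.read_csv(os.path.join(annotations_path, f)) for f in annotations_files],
    ignore_index=True,
)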

# Display the first few rows of the annotations DataFrame
annotations.head()
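
The later cells read the `incident_type` and `duration` columns, so it may be worth confirming they exist before plotting. The following is a minimal sketch of such a check; the helper name `check_schema` and the tiny example frame (with made-up values and a made-up `frame_id` column) are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Columns the later cells rely on; adjust if the real schema differs.
REQUIRED_COLUMNS = ("incident_type", "duration")


def check_schema(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are missing from df."""
    return [col for col in required if col not in df.columns]


# Stand-in frame for illustration only (not real annotation data).
example = pd.DataFrame({"incident_type": ["a", "b"], "frame_id": [1, 2]})
missing_cols = check_schema(example)
print(missing_cols)  # ['duration']
```

In the notebook itself, `check_schema(annotations)` would return an empty list when both columns are present.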

In [None]:
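# Visualize the distribution of incident types, most frequent first
plt.figure(figsize=(12, 6))
type_order = annotations['incident_type'].value_counts().index
sns.countplot(data=annotations, x='incident_type', order=type_order)
plt.title('Distribution of Incident Types')
plt.xlabel('Incident Type')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')  # right-align so rotated labels sit under their bars
plt.tight_layout()
plt.show()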
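
A bar chart shows class imbalance visually, but a numeric summary is easier to compare across dataset versions. The sketch below computes per-class counts and shares plus a simple max-to-min ratio; the helper names `class_balance` and `imbalance_ratio` and the toy labels are assumptions made for illustration.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Count and share of each class, most frequent first (NaN labels are ignored)."""
    counts = labels.value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def imbalance_ratio(labels: pd.Series) -> float:
    """Ratio of the most frequent class count to the least frequent one."""
    counts = labels.value_counts()
    return float(counts.max() / counts.min())


# Toy labels for illustration only (not real annotation data).
example = pd.Series(["a", "a", "a", "b"], name="incident_type")
summary = class_balance(example)
ratio = imbalance_ratio(example)
print(ratio)  # 3.0
```

Applied to the notebook's data, this would be `class_balance(annotations['incident_type'])`. A large ratio suggests that rare incident types may need resampling or class weighting during training.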

In [None]:
# Plot the distribution of incident durations (histogram with a KDE overlay)
plt.figure(figsize=(12, 6))
sns.histplot(annotations['duration'], bins=30, kde=True)
plt.title('Distribution of Incident Durations')
plt.xlabel('Duration (seconds)')
plt.ylabel('Frequency')
plt.show()
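
Histograms of durations are often dominated by a long tail, so flagging extreme values can help separate genuine long incidents from annotation errors. The following sketch uses the common 1.5 × IQR rule; this rule is an assumption chosen for illustration, not a method the dataset prescribes, and the helper name `iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Sample durations in seconds, for illustration only (not real data).
example = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
mask = iqr_outliers(example)
print(example[mask].tolist())  # [100.0]
```

In the notebook, `annotations[iqr_outliers(annotations['duration'])]` would list the flagged rows for manual review. Missing durations compare as `False` and are therefore never flagged.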

In [None]:
# Check for missing values in the dataset
missing_values = annotations.isnull().sum()
missing_values[missing_values > 0]
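
Raw missing counts are hard to judge without the dataset size, so a percentage column can make the result easier to read. This is a minimal sketch under that assumption; the helper name `missing_report` and the example frame are illustrative only.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing count and percentage, limited to columns with gaps."""
    report = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "pct_missing": 100 * df.isna().mean(),
    })
    return report[report["n_missing"] > 0].sort_values("n_missing", ascending=False)


# Stand-in frame for illustration only (not real annotation data).
example = pd.DataFrame({
    "incident_type": ["a", None, "b", "c"],
    "duration": [1.0, 2.0, 3.0, 4.0],
})
report = missing_report(example)
print(report)
```

Calling `missing_report(annotations)` in the notebook would return an empty frame when no column has missing values.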

## Conclusion

In this exploratory analysis, we visualized the distribution of incident types and durations and checked the dataset for missing values. These findings inform how the YOLOv8 incident detection model is trained and evaluated, for example when deciding how to handle underrepresented incident types or incomplete annotations.