# Problem Statement

We are a surveillance and CCTV company. We have a standard CCTV system that we want to upgrade into a smart CCTV system. We want a solution that enables our cameras to detect human patterns, count objects, provide analysis on a 24-hour cycle, and correctly select the proper alert action for the system.

# Data Science Methodology

Let's apply the Data Science Methodology. 

Important note: it is hard to mimic a real-world project. The points below describe the assumptions I make and the differences between a personal project or research and a real-world project.

- **Raw Data**: In my personal project, I am going to collect video and image data from more than one source. Strictly speaking, however, I cannot call it raw data, because most sources provide data that is already labeled (annotated). **Annotation** consumes heavy resources from the team. In addition to annotation, the ETL process will be missing from my personal project. In real cases, data is stored in a database and collecting it requires queries (assuming there is no platform built specifically for the firm's purposes). For example, I have worked at a company where data was collected through a platform, so as a data scientist I did not need any ETL techniques or queries, nor did I need to ask a Data Engineer for help with the ETL process. In other words, the data engineering and ETL work might be done once, enabling analysts and scientists to perform ETL easily and on their own in the future.

## Surveillance and Alarm Management Business Understanding and Problem Definition

From the problem statement, I can conclude that the company's CCTVs are designed for **standard monitoring activities** and capture ordinary video. This means the video and image data used to train the model should consist of standard CCTV footage, not specialized imagery such as thermal, medical, or microbial captures.

However, the problem statement asks for more than anomaly detection. Stakeholders also want smart cameras that provide daily analysis, such as counting objects and identifying categories. We are therefore aiming for the following goals:

- Enable the cameras to accurately detect true alarms while minimizing false and missed alarms. I discuss this point further in the model evaluation stage.

- Integrate the model with the CCTV system to upgrade it to an intelligent level. The new features to add are:
    - Provide daily object counts. This requires training the model on a variety of categories so it can correctly identify objects in real time. I will add a *New* category so that whenever the model encounters a class it does not know, a data scientist can label those captures manually and train a new model on them. This is an iterative process and part of optimizing the models on an almost daily basis.
    - Include a section for counting patterns in the daily analysis dashboards. That is, in addition to labeling anomalies, I will introduce normal patterns, for example (but not limited to) paying in a supermarket, walking, driving/parking, gender prediction, and other ideas that could lead to insights. In a real project, this should be discussed with stakeholders at least once, and ideally twice, per week.
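The daily counting feature and the *New* category described above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the `Detection` type, the class names in `KNOWN_CLASSES`, and the 0.5 confidence threshold are all hypothetical, and sending low-confidence detections to *New* is an assumption about how unknown classes might be surfaced for manual labeling.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
KNOWN_CLASSES = {"person", "car", "bicycle"}
CONFIDENCE_THRESHOLD = 0.5


@dataclass
class Detection:
    """One object detected in a frame (hypothetical structure)."""
    label: str
    confidence: float


def daily_counts(detections, known_classes=KNOWN_CLASSES,
                 threshold=CONFIDENCE_THRESHOLD):
    """Count detections per class; unknown or uncertain ones go to 'New'."""
    counts = Counter()
    for det in detections:
        if det.label in known_classes and det.confidence >= threshold:
            counts[det.label] += 1
        else:
            # Candidates for manual labeling and the next retraining round.
            counts["New"] += 1
    return counts


example = [
    Detection("person", 0.90),
    Detection("car", 0.80),
    Detection("person", 0.70),
    Detection("scooter", 0.95),  # class the model was never trained on
    Detection("car", 0.30),      # known class, but below the threshold
]
counts = daily_counts(example)
print(dict(counts))
```

In this sketch, the size of the *New* bucket is the signal that a relabeling and retraining iteration may be due.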




## Data Collection Criteria and Strategy

During my graduate degree, I implemented Computer Vision projects mainly to learn the theory behind Deep Learning algorithms. In this project, however, I will use raw data that has not been annotated.

### VIRAT Video Dataset Overview

The VIRAT dataset is a popular benchmark for human and vehicle activity detection in surveillance videos. It includes realistic scenes recorded from various camera angles across different locations in the U.S., with over 40 labeled activities.

Video quality varies: some clips are high-resolution and others are low-resolution, people appear at different sizes, and frame rates range from 2 to 30 FPS. The dataset is designed to reflect real-world conditions, which makes it well suited for testing models on practical surveillance data.
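Because frame rates differ so much between clips, one possible preprocessing step is to subsample every clip to a common frame rate before training. The sketch below is an assumption on my side, not something the dataset prescribes; the function name and the target rate are hypothetical.

```python
def sample_frame_indices(num_frames, source_fps, target_fps):
    """Return the frame indices to keep when reducing source_fps to target_fps.

    If the clip is already at or below the target rate, every frame is kept.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(num_frames))

    step = source_fps / target_fps  # source frames per kept frame
    indices = []
    i = 0
    while True:
        idx = int(i * step)
        if idx >= num_frames:
            break
        indices.append(idx)
        i += 1
    return indices


# A 2-second clip at 30 FPS reduced to 2 FPS keeps one frame every 15.
print(sample_frame_indices(num_frames=60, source_fps=30, target_fps=2))
```

Clips recorded at the low end of the range (2 FPS) pass through unchanged, so the target rate effectively caps the frame rate rather than upsampling anything.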

Access requires approval via the official VIRAT website. Once granted, the full dataset (videos + annotations) can be downloaded through provided links.

#### Reason for Exclusion

Although the dataset is high quality, I won’t be using it because of its size (over **550 GB**), which is too large to handle efficiently within my current resources.

#### Alternative: TinyVIRAT Dataset

As a smaller option, I explored **TinyVIRAT**, a scaled-down dataset (~3.6 GB) made for quick experiments. It includes short, low-resolution video clips with multiple action labels, ideal for lightweight testing.

The GitHub repo ([Tiny-VIRAT](https://github.com/UgurDemir/Tiny-VIRAT)) provides helper scripts and documentation. To get the actual videos, you can directly download them from:

- [https://www.crcv.ucf.edu/tiny-actions-challenge-cvpr2021/data/TinyVIRAT-v2.zip](https://www.crcv.ucf.edu/tiny-actions-challenge-cvpr2021/data/TinyVIRAT-v2.zip)
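The download and extraction step could be scripted roughly like this, using only the Python standard library. The function name, the local archive name `dataset.zip`, and the `data/tinyvirat` folder are hypothetical choices; the URL is the one listed above. The call at the bottom is left commented out because it would fetch several gigabytes.

```python
import shutil
import zipfile
from pathlib import Path
from urllib.request import urlopen

TINYVIRAT_URL = (
    "https://www.crcv.ucf.edu/tiny-actions-challenge-cvpr2021/"
    "data/TinyVIRAT-v2.zip"
)


def download_and_extract(url, dest_dir):
    """Download a zip archive to dest_dir and extract it next to the archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / "dataset.zip"

    # Skip the download if the archive is already on disk.
    if not archive_path.exists():
        with urlopen(url) as response, open(archive_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)  # stream, do not load in RAM

    extract_dir = dest / "extracted"
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(extract_dir)
    return extract_dir


# Example (large download, so not executed here):
# download_and_extract(TINYVIRAT_URL, "data/tinyvirat")
```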

TinyVIRAT is helpful for prototyping but doesn't fully replace the coverage or complexity of the full VIRAT dataset.

#### References

- VIRAT Dataset Official Website: [https://www.viratdata.org/](https://www.viratdata.org/)
- Tiny-VIRAT GitHub: [https://github.com/UgurDemir/Tiny-VIRAT](https://github.com/UgurDemir/Tiny-VIRAT)
- Tiny-VIRAT-v2 Download: [https://www.crcv.ucf.edu/tiny-actions-challenge-cvpr2021/data/TinyVIRAT-v2.zip](https://www.crcv.ucf.edu/tiny-actions-challenge-cvpr2021/data/TinyVIRAT-v2.zip)
- Roboflow Hosted Subset: [https://universe.roboflow.com/radoslaw-kawczak/virat-v2](https://universe.roboflow.com/radoslaw-kawczak/virat-v2)

### Data Management

To manage datasets more effectively, I will store and organize my data on Hugging Face. I plan to create dataset repositories and use the Python libraries provided by Hugging Face for uploads and version control.
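As a rough sketch of that workflow, the snippet below uses `HfApi.create_repo` and `HfApi.upload_folder` from the `huggingface_hub` package. It assumes the package is installed and that you are already authenticated (for example via `huggingface-cli login`). The function name, the repository id, the folder path, and the choice of a private repository are placeholders, not decisions made in this project.

```python
from pathlib import Path


def push_dataset_folder(folder, repo_id, commit_message, api=None):
    """Create a dataset repo if needed and upload a local folder to it.

    `api` can be injected for testing; by default a real HfApi client is used.
    """
    folder_path = Path(folder)
    if not folder_path.is_dir():
        raise NotADirectoryError(str(folder_path))

    if api is None:
        # Imported lazily so the module loads without huggingface_hub installed.
        from huggingface_hub import HfApi
        api = HfApi()

    # exist_ok=True makes the call safe to repeat on later uploads.
    api.create_repo(repo_id=repo_id, repo_type="dataset",
                    private=True, exist_ok=True)

    # Each upload becomes a commit, which gives the dataset a version history.
    api.upload_folder(folder_path=str(folder_path), repo_id=repo_id,
                      repo_type="dataset", commit_message=commit_message)


# Example with placeholder names (requires network access and a token):
# push_dataset_folder("data/tinyvirat/extracted",
#                     "your-username/tinyvirat-subset",
#                     "Add initial TinyVIRAT clips")
```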















## Model Evaluation - Definitions in the Context of Surveillance and Alarm Management



### Confusion Matrix

- **TP (true positives)**: the number of positive cases that were predicted correctly. In our project, this is the number of real threats that were correctly predicted as threats.

- **TN (true negatives)**: the number of negative cases that were predicted correctly. In our project, this is the number of normal activities that were correctly predicted as normal (no threat detected).

- **FP (false positives)**: the number of negative cases that were wrongly predicted as positive. In our project, this is the number of normal activities wrongly predicted as real threats. In alarm management, this is called a `False Alarm`.

- **FN (false negatives)**: the number of positive cases that were wrongly predicted as negative. In our project, this is the number of real threats wrongly predicted as normal activity. In alarm management, this is a real threat that went undetected. This is the most dangerous type of error, so we want the model to keep it as low as possible, with zero missed threats as the target we aim for.
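To make these definitions concrete, here is a small sketch that counts the four outcomes from binary labels (1 = threat, 0 = normal activity) and derives the rates most relevant to alarm management. The labels are made-up illustration data, not project results, and the function names are my own.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 means 'threat'."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1  # false alarm
        else:
            fn += 1  # missed threat
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Avoid division by zero when a class is absent from the data."""
    return numerator / denominator if denominator else 0.0


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
recall = safe_ratio(tp, tp + fn)            # share of real threats caught
precision = safe_ratio(tp, tp + fp)         # share of alarms that were real
false_alarm_rate = safe_ratio(fp, fp + tn)  # share of normal events flagged
miss_rate = safe_ratio(fn, fn + tp)         # share of real threats missed

print(tp, tn, fp, fn)
print(recall, precision, false_alarm_rate, miss_rate)
```

Since missed threats (FN) are the most costly error here, recall (or equivalently the miss rate) is the number to watch first, with the false alarm rate as the trade-off to keep in check.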
