* **[Phase 1 - Data Collection and Visualization](#phase1)**
   - [Imports & Installations](#import)
   - [Data Loading](#step01)   
   - [Data Parsing & Cleaning](#step02)  
   - [Data Flattening](#step03)  
   - [Exporting Data to Excel](#step04)  

* **[Phase 2 - Data Storage and Processing Pipeline](#phase2)**
* **[Phase 3 - Final Conclusions and Model Development](#phase3)**


<a id='phase1'></a>
## **Phase 1 - Data Collection and Visualization**

<a id='import'></a>
### **Imports & Installations**

In [21]:
# !pip install kagglehub
# !pip install openpyxl

In [22]:
import xml.etree.ElementTree as ET
import kagglehub
import os
import pandas as pd

<a id='step01'></a>
### **Data Loading**

In [23]:
# Path to a local copy of the dataset.
path_to_dataset = "content"

# If the dataset is not on your local machine, uncomment the line below
# to download it with kagglehub:
# path_to_dataset = kagglehub.dataset_download("andrewmvd/car-plate-detection")

annotation_folder = os.path.join(path_to_dataset, "annotations")
image_folder = os.path.join(path_to_dataset, "images")
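
Before parsing, it can help to confirm that both folders actually exist, since a wrong `path_to_dataset` would otherwise only surface later as a `FileNotFoundError`. The following is a minimal sketch; `missing_dirs` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
import os


def missing_dirs(*paths):
    """Return the given paths that are not existing directories (hypothetical helper)."""
    return [p for p in paths if not os.path.isdir(p)]


# Same layout as the cell above; "content" is the notebook's local dataset path.
path_to_dataset = "content"
annotation_folder = os.path.join(path_to_dataset, "annotations")
image_folder = os.path.join(path_to_dataset, "images")

missing = missing_dirs(annotation_folder, image_folder)
if missing:
    print(f"Missing dataset folders: {missing}")
```

If anything is reported as missing, switching to the commented `kagglehub.dataset_download` line is the fallback the notebook already suggests.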

<a id='step02'></a>
### **Data Parsing & Cleaning**

In [28]:
def parse_xml(xml_path):
    """Parse one annotation XML file into a nested dict."""
    tree = ET.parse(xml_path)
    root = tree.getroot()

    # Image dimensions; kept as strings here and cast to int when flattening.
    size = {
        'width': root.find('size/width').text,
        'height': root.find('size/height').text
    }

    # One entry per annotated object (`obj` avoids shadowing the built-in `object`).
    objects = [{
            'bndbox': {
                'xmin': obj.find('bndbox/xmin').text,
                'ymin': obj.find('bndbox/ymin').text,
                'xmax': obj.find('bndbox/xmax').text,
                'ymax': obj.find('bndbox/ymax').text,
            }
        } for obj in root.findall('object')]

    extracted_info = {
        'folder': root.find('folder').text,
        'filename': root.find('filename').text,
        'size': size,
        'objects': objects
    }

    return extracted_info
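
The element paths used in `parse_xml` imply a particular XML layout. The sketch below makes that layout explicit with an in-memory annotation. The sample values (file name, dimensions, coordinates) are made up for illustration; only the element structure is taken from the lookups in `parse_xml`.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation: the values are invented, only the element layout
# mirrors the paths that parse_xml looks up.
sample_xml = """
<annotation>
    <folder>images</folder>
    <filename>example.png</filename>
    <size><width>500</width><height>300</height></size>
    <object>
        <bndbox>
            <xmin>100</xmin><ymin>120</ymin><xmax>200</xmax><ymax>160</ymax>
        </bndbox>
    </object>
</annotation>
""".strip()

root = ET.fromstring(sample_xml)

# Same relative paths as in parse_xml, with the text cast to int right away.
width = int(root.find('size/width').text)
height = int(root.find('size/height').text)
boxes = [
    {tag: int(obj.find(f'bndbox/{tag}').text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')}
    for obj in root.findall('object')
]

print(root.find('filename').text, width, height, boxes)
```

Note that `root.find(...)` returns `None` when an element is absent, so a file that deviates from this layout would raise an `AttributeError` on `.text`.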


<a id='step03'></a>
### **Data Flattening**

In [29]:
# Build one flat row per bounding box, so an image with several plates
# yields several rows.
records = []
# sorted() keeps the row order reproducible; os.listdir() returns entries in arbitrary order.
for file in sorted(os.listdir(annotation_folder)):
    if file.endswith(".xml"):
        data = parse_xml(os.path.join(annotation_folder, file))
        for obj in data["objects"]:
            row = {
                "filename": data["filename"],
                "folder": data["folder"],
                "width": int(data["size"]["width"]),
                "height": int(data["size"]["height"]),
                "xmin": int(obj["bndbox"]["xmin"]),
                "ymin": int(obj["bndbox"]["ymin"]),
                "xmax": int(obj["bndbox"]["xmax"]),
                "ymax": int(obj["bndbox"]["ymax"])
            }
            records.append(row)
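
Once the rows are flat, a simple consistency check is easy to express. The sketch below is one possible cleaning rule, not something the original notebook applies: it assumes a box is valid when it has positive area and lies within the image bounds. `is_valid_box` and `sample_records` are hypothetical names, and the sample rows are invented to show one passing and one failing case.

```python
def is_valid_box(row):
    """True if the box has positive area and lies inside the image (assumed rule)."""
    return (
        0 <= row["xmin"] < row["xmax"] <= row["width"]
        and 0 <= row["ymin"] < row["ymax"] <= row["height"]
    )


# Invented rows with the same keys as the flattened records above.
sample_records = [
    {"filename": "a.png", "width": 500, "height": 300,
     "xmin": 100, "ymin": 120, "xmax": 200, "ymax": 160},
    # xmax exceeds the image width, so this row should be rejected.
    {"filename": "b.png", "width": 400, "height": 250,
     "xmin": 350, "ymin": 50, "xmax": 420, "ymax": 90},
]

valid = [r for r in sample_records if is_valid_box(r)]
rejected = [r["filename"] for r in sample_records if not is_valid_box(r)]
print(f"kept {len(valid)} rows, rejected: {rejected}")
```

Applied to the real data, the same filter could be run over `records` before building the DataFrame.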


<a id='step04'></a>
### **Exporting Data to Excel**

In [30]:
df = pd.DataFrame(records)
excel_path = "car_plate_annotations.xlsx"
df.to_excel(excel_path, index=False)

print(f"Data exported successfully to {excel_path}")

Data exported successfully to car_plate_annotations.xlsx


<a id='phase2'></a>
## **Phase 2 - Data Storage and Processing Pipeline**

<a id='phase3'></a>
## **Phase 3 - Final Conclusions and Model Development**