# Road-Pole/Stick Detection on Edge Devices

a research paper by Sindre L. Øyen

---
## 1 Introduction
In this paper, I will investigate the feasibility of using edge devices for road-pole detection, applying modern Visual Intelligence (VI) methods. First, the data preparation phase will be explained, and the dataset will be visualized and discussed. Second, the model selection phase will be documented, weighing the available options against one another based on performance and accuracy. Third, a model will be trained on the dataset and monitored using performance metrics. Finally, three sections will document (1) testing and evaluation with relevant metrics, (2) the feasibility of deploying the model on, e.g., an edge device, and (3) the sustainability of training and inference, discussed with examples and comparisons of energy consumption.

Road poles, commonly referred to as "brøytestikker" in Norwegian, are central to navigation during the winter, especially for showing snowplow drivers where the road edge is on snow-covered roads. Recently, research on Cooperative, Connected and Automated Mobility (CCAM) (i.e., self-driving vehicles) has investigated the feasibility of using VI technology to detect road poles and thereby navigate Norwegian roads more efficiently under diverse conditions. Road poles, and datasets collected of them, may therefore become increasingly important to Norwegian infrastructure, which is the broader motivation for this research paper.

The end goal is to produce an accurate model that could potentially run real-time inference on a hardware-restricted edge device, such as a smartphone.

---
## 2 Data preparation


### 2.1 NAPLab LiDAR Image Dataset

NAPLab has collected and annotated a dataset of LiDAR images of road poles, which has been made available for this project. As mentioned in the repository README, the dataset may not be redistributed. However, with access to the NTNU Idun cluster, the dataset can be found under `/cluster/projects/vc/data/ad/open/Poles`.

#### 2.1.1 Discussion of the NAPLab Dataset

The dataset is annotated with a single class, `pole`, which refers to a road pole. At the time of writing, the dataset comprises a total of 2 259 images, divided into a training split of 1 809 images and a validation split of 450 images. An example image from the dataset is shown in the figure below. The dataset has a total of 4 230 annotations, an average of about 1.9 annotations per image, which matches the expectation that road poles usually come in pairs, with some exceptions. The images have a median resolution of `1024 x 128` pixels (an 8:1 aspect ratio) and are thus of an ultra-wide format. The training split contains 3 378 annotations and the validation split contains 852. Currently, no test set is present.

<img src="assets/roadpoles_example.png" alt="An example of an entry in the dataset" width="1000"/>

###### © NAPLab 
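The split statistics above (image counts, annotation counts and annotations per image) can be recomputed with a short script. This is a minimal sketch, assuming the labels are stored in YOLO text format (one `class x_center y_center width height` line per box) with one label file per image, named after the image; the directory layout and the function name `split_stats` are assumptions, not part of the dataset documentation.

```python
from pathlib import Path

# Image extensions to count (assumption: the dataset uses common raster formats)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def split_stats(images_dir: str, labels_dir: str) -> dict:
    """Count images and YOLO-format annotations (one box per non-empty line) in one split."""
    images = [p for p in Path(images_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    n_annotations = 0
    for image in images:
        label_file = Path(labels_dir) / f"{image.stem}.txt"
        if label_file.exists():  # images without poles may have no label file at all
            lines = label_file.read_text().splitlines()
            n_annotations += sum(1 for line in lines if line.strip())
    n_images = len(images)
    return {
        "images": n_images,
        "annotations": n_annotations,
        "per_image": n_annotations / n_images if n_images else 0.0,
    }


# Sanity check against the totals reported above: 4 230 annotations over 2 259 images
print(round(4230 / 2259, 2))  # → 1.87
```

Averaging over image files rather than label files matters here: if images without any poles have no label file, dividing by the number of label files would overstate the annotations per image.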

Notably, as mentioned above and as visible in the example, the dataset consists of LiDAR images. On the one hand, this reduces the feasibility of using a potential result on, e.g., a smartphone and increases the hardware cost of real-time inference. Modern smartphones such as the newer iPhone Pro and Pro Max models do have LiDAR sensors, but with a limited range of around 5 meters. On the other hand, LiDAR images, while costly, are more robust to the varying lighting conditions found in Norway. Nonetheless, the exploration in this project is based on the LiDAR images in this dataset and evaluated on its potential performance on a hypothetical hardware-restricted edge device.

#### 2.1.2 The Train/Val Split in this Paper

Since the dataset does not include a test split, the original training split will be used to create a new train/validation split. The original validation split will then act as the test set, on which the final result is compared with other solutions.
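A minimal sketch of such a re-split, assuming the image file names of the original training split can be listed. The 80/20 ratio, the seed, and the names `resplit`, `data_config`, `train_new.txt` and `val_new.txt` are illustrative assumptions, not values used in this project. Ultralytics' `data.yaml` accepts a directory or a text file of image paths for `train`, `val` and `test`, so the original validation split can be referenced directly as `test`.

```python
import random


def resplit(train_images: list[str], val_fraction: float = 0.2, seed: int = 0) -> tuple[list[str], list[str]]:
    """Reproducibly split the original training images into a new (train, val) pair."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = sorted(train_images)  # sort first so the split does not depend on directory listing order
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch the global random state
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def data_config(root: str, original_val_images: str) -> dict:
    """Hypothetical data.yaml contents: new train/val lists, original validation split as test."""
    return {
        "path": root,
        "train": "train_new.txt",  # one image path per line (hypothetical file name)
        "val": "val_new.txt",      # hypothetical file name
        "test": original_val_images,
        "names": {0: "pole"},
    }


# Placeholder names standing in for the 1 809 original training images
train_new, val_new = resplit([f"img_{i:04d}.png" for i in range(1809)], val_fraction=0.2, seed=42)
print(len(train_new), len(val_new))  # → 1447 362
```

A purely random split assumes the images are independent. If consecutive frames from the same drive are near-duplicates, splitting by sequence instead would avoid leakage between the new training and validation sets.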

#### 2.1.3 BibTeX Reference to the Dataset

```biblatex
    @misc{
        polpolpol-pol-det_dataset,
        title = { polpolpol pol det Dataset },
        type = { Open Source Dataset },
        author = { DURGA PRASAD },
        howpublished = { \url{ https://universe.roboflow.com/durga-prasad-hmjq7/polpolpol-pol-det } },
        url = { https://universe.roboflow.com/durga-prasad-hmjq7/polpolpol-pol-det },
        journal = { Roboflow Universe },
        publisher = { Roboflow },
        year = { 2024 },
        month = { oct },
        note = { visited on 2024-11-23 },
    }
```

---
## 3 Model Selection

As the end goal is to produce an accurate model capable of real-time inference on a hardware-restricted edge device, a lightweight model is preferred. In my previous exploratory study in the TDT17 Visual Intelligence course, I compiled the following comparative overview of performance metrics for lightweight Object Detection (OD) models:

<img src="assets/yolo-comparison.png" alt="A comparison between YOLO models and EfficientDet" width="1000"/>

###### Overview by Sindre Øyen, based on data from Ultralytics and EfficientDet (Google)

The table above highlights the performance of different real-time OD models on the Microsoft Common Objects in Context (COCO) dataset, a common benchmark for evaluating OD models. EfficientDet-D1 has low computational requirements while maintaining reasonable accuracy, giving it a favorable precision-to-computation ratio for edge-device deployment. However, its accuracy (mAP) is only slightly better than that of the smallest YOLO model, YOLO11n. For tasks requiring higher precision, the computational cost of the larger EfficientDet models (e.g., EfficientDet-D2, D3) rises rapidly, making EfficientDet less suitable for edge-device deployment.

In contrast, the computational demands of the YOLO11 series grow more slowly with model size. Among the lightweight models, YOLO11s offers the best precision-to-computation trade-off, with higher precision than, e.g., the EfficientDet-D3 model while requiring 14% fewer floating-point operations (FLOPs). Moreover, the YOLO11 series is specifically designed for real-time use, with optimized inference speeds that are essential for edge deployment. YOLO11s in particular combines a compact architecture with solid detection performance and is thus a good starting point for this project, which focuses on reliable, real-time pole detection on edge devices.
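The two ratios used in this comparison can be made explicit with small helpers. The FLOP counts below are assumptions taken from the published figures (about 21.5 GFLOPs for YOLO11s according to Ultralytics, and about 25 billion FLOPs for EfficientDet-D3 in the EfficientDet paper); the exact values should be read off the comparison table. Dividing COCO mAP by GFLOPs is only one crude way of expressing a precision-to-computation ratio, and the function names are hypothetical.

```python
def relative_reduction(candidate: float, reference: float) -> float:
    """Fraction by which `candidate` is smaller than `reference` (0.14 means 14 % fewer)."""
    return 1.0 - candidate / reference


def accuracy_per_gflop(map_50_95: float, gflops: float) -> float:
    """A crude precision-to-computation ratio: COCO mAP points per GFLOP."""
    return map_50_95 / gflops


# Assumed FLOP counts in billions (see the lead-in); check against the table above.
YOLO11S_GFLOPS = 21.5
EFFICIENTDET_D3_GFLOPS = 25.0

print(f"{relative_reduction(YOLO11S_GFLOPS, EFFICIENTDET_D3_GFLOPS):.0%}")  # → 14%
```

Note that FLOPs count operations per forward pass; they are not a rate, and actual latency on a given edge device also depends on memory bandwidth and operator support.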

In summary, the hypothesis is that YOLO11s will deliver the best precision-to-efficiency ratio for this use case, enabling effective detection of road poles within the computational limits of edge devices. Further exploration may refine this initial choice of model, but based on benchmark results and architectural properties, YOLO11s provides a strong foundation.

---
## 4 Training

```python
import os

# Path to the dataset (YOLO format, described by data.yaml)
dataset_path = os.path.join("..", "data", "Poles")
data_yaml = os.path.join(dataset_path, "data.yaml")

# Path to the YOLO11s model with pre-trained weights
pretrained_model = os.path.join("..", "models", "pre-trained", "yolo11s.pt")
```
```python
from ultralytics import YOLO
```

```
ImportError: generic_type: type "_InterpolationType" is already registered!
```

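```python
# Load YOLO11s with its COCO pre-trained weights and fine-tune it on the pole dataset.
pretrained = YOLO(pretrained_model)

# For training, Ultralytics expects a single integer imgsz (the longest image side);
# an [h, w] list is only accepted for prediction and export.
result = pretrained.train(data=data_yaml, epochs=100, imgsz=1024)
```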

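Because the images are 1024 x 128 pixels (8:1), a square network input is mostly padding. The sketch below computes a generic letterbox (scale the longest side to the target size, then pad to a square). It is a simplification, not Ultralytics' exact preprocessing, which also applies augmentations such as mosaic during training and offers rectangular batches through the `rect=True` training argument; the function name `letterbox` here is hypothetical.

```python
def letterbox(width: int, height: int, target: int) -> tuple[float, int, int, float]:
    """Scale an image so its longest side equals `target`, then pad it to target x target.

    Returns (scale, pad_x, pad_y, padded_fraction), where padded_fraction is the share
    of the square input that contains padding rather than image pixels.
    """
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    padded_fraction = 1.0 - (new_w * new_h) / (target * target)
    return scale, pad_x, pad_y, padded_fraction


print(letterbox(1024, 128, 1024))  # → (1.0, 0, 896, 0.875)
print(letterbox(1024, 128, 640))   # → (0.625, 0, 560, 0.875)
```

Under this simplification, 87.5% of a square input is padding regardless of its size, and at a 640-pixel input the image height shrinks to 80 pixels, which matters for thin objects such as poles.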

---

## 5 Testing and Evaluation

---
## 6 Methods / Models

--- 
## 7 Results

---
## 8 Discussion


### 8.2 Key Learning Points

---
## References

