# COMP9444 Group Project


## 1. Introduction, Motivation, and/or Problem Statement

### Introduction
Our project aims to use computer vision neural networks to improve object detection in images captured in both daytime and nighttime environments. The ability to accurately detect and recognize objects under varying lighting conditions has become crucial for many modern applications, such as autonomous vehicles and surveillance and security systems.

Consider the two images below. Everything in the left image is easy to identify, and the contrast with the image on the right highlights how much harder it is to identify objects in low-luminosity scenes.
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="https://www.exposit.com/wp-content/webp-express/webp-images/doc-root/wp-content/uploads/2021/04/Illumination_conditions_as_a_challenge_of_comp.width-800.jpg.webp" width="500"/>
</div>

### Motivation
Modern computer vision neural networks often perform poorly at nighttime object detection, that is, they detect objects inaccurately in low-luminosity environments. Nighttime factors such as shadows, limited luminosity, and reduced visibility make it challenging for a network to classify objects. This problem can hinder the effectiveness and safety of existing computer vision applications such as surveillance, which requires all-day monitoring.

Researchers have made advancements in enhancing accuracy for low-light detection. An example is the REDI low-light enhancement algorithm, which effectively filters noise in low-light conditions and performs detection on the resulting image.
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/lowlight.png" />
</div>

Here, (a) through (d) are the stages of REDI algorithm filtering. However, this algorithm has several downsides, such as loss of detail, over-correction, and high computational cost. These would add extra complexity and computational load to existing models.

Solving day/night object detection would bring significant real-world improvements, particularly in autonomous driving and in surveillance and security systems. This is not only an exciting technical challenge for researchers, but it also has the potential to open up new possibilities for computer vision with neural networks.

### Problem Statements
Key challenges that our models need to address are:
1. Handling varying levels of brightness within an image.
2. Removing noise from nighttime images, as images taken at night tend to contain more noise.

## 2. Exploration Analysis or Data or RL Tasks


## 3. Models and/or Methods

### 2DPASS 
Link to paper: https://arxiv.org/pdf/2210.04208.pdf

#### Model Introduction
2DPASS (2D Priors Assisted Semantic Segmentation) is a method that boosts representation learning on point clouds. A notable advantage of this model is that it does not require strictly paired data alignment between the camera and LiDAR data.

The 2DPASS method leverages auxiliary modal fusion and multi-scale fusion-to-single knowledge distillation (MSFSKD) to acquire richer semantic and structural information from the multi-modal data. This is a significant improvement over baseline models that use only the point cloud.
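
To make the idea of knowledge distillation more concrete, the following is a minimal, generic sketch and not the paper's exact MSFSKD formulation. It assumes the fused (image plus point cloud) prediction acts as a teacher and the point-cloud-only prediction as a student, and it measures their disagreement for a single point with a KL divergence. All function names and logit values are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits):
    """Penalise the student for deviating from the teacher's distribution."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))

# Hypothetical per-point logits over three classes.
fused_logits = [2.0, 0.5, -1.0]   # assumed teacher: fused 2D + 3D features
lidar_logits = [1.0, 1.0, 0.0]    # assumed student: 3D features only

loss = distillation_loss(fused_logits, lidar_logits)
same = distillation_loss(fused_logits, fused_logits)
print(f"loss when predictions differ: {loss:.4f}")
print(f"loss when predictions match:  {same:.4f}")
```

Minimising a loss of this kind pushes the point-cloud branch towards the richer fused prediction, which is the general mechanism that distillation-based methods rely on.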



## 4. Results

### 2DPASS Results

#### 2DPASS Trained on Mini-Dataset
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/mini.png" />
</div>

#### 2DPASS Pretrained Model
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/pretrained.png" />
</div>

#### Model Results
| Model                | mIoU | Accuracy |
|----------------------|------|----------|
| 2DPASS (Mini-dataset)| 36%  | 56%      |
| 2DPASS (Pretrained)  | 81%  | 63%      |

Both accuracy and mIoU improve substantially for the pretrained model, which was trained on the full dataset. Note that this result is still worse than the one reported in the paper, as their model was trained with the additional validation set and with instance-level augmentation.
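
For readers unfamiliar with the two metrics in the table, the sketch below shows how mIoU and accuracy can be computed from a confusion matrix. It assumes that "Accuracy" means overall point-wise accuracy, which may differ from the metric the 2DPASS code actually reports, and the labels are toy values rather than project data.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

def overall_accuracy(cm):
    """Fraction of all points whose predicted class matches the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0

# Toy labels for six points and three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)

print(f"mIoU:     {mean_iou(cm):.3f}")
print(f"accuracy: {overall_accuracy(cm):.3f}")
```

Because IoU also penalises false positives, the two metrics can diverge, which is why both are reported.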

#### Epoch Training Steps
NOTE: The x-axis is the number of epochs.
##### mIoU vs Epoch 
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/miou_r.png" width="700px" />
</div>

##### Best mIoU vs Epoch 
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/miou.png" width="700px" />
</div>

From the mIoU curve and the smoothed best-mIoU curve, we see no significant improvement in mIoU after around epoch 8000. This suggests that further training beyond this point does not improve the model and could lead to overfitting on the existing data.

##### Accuracy vs Epoch 
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/accuracy.png" width="700px" />
</div>

The accuracy during training behaves similarly to the mIoU curve, with the best accuracy reached at around epoch 8000.

## 5. Discussion
### 2DPASS Discussion
#### System Performance:
System Specifications:
We trained the 2DPASS model on an NVIDIA 4060 laptop graphics card with 16 gigabytes of RAM.

#### Dataset:
In the interest of time, we used the nuScenes mini training dataset, which is around 6 gigabytes, compared to 80 gigabytes for the full dataset.

#### Training Specifications
The training batch size had to be limited to 1, as any larger batch size caused out-of-memory errors.
The training parameters were pre-tuned by the developers as:
- Learning Rate: 0.24
- Optimizer: SGD
- Momentum: 0.9
- Weight Decay: 1.0e-4
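
To illustrate how these hyperparameters interact, here is a minimal sketch of one SGD update with momentum and L2 weight decay, written in plain Python. It follows a common formulation in which weight decay is added to the gradient before the momentum update. This is an assumption for illustration, not a description of the 2DPASS training code, and the parameter and gradient values are hypothetical.

```python
def sgd_step(params, grads, velocity, lr=0.24, momentum=0.9, weight_decay=1.0e-4):
    """Apply one SGD-with-momentum update and return new params and velocity."""
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p   # L2 weight decay folded into the gradient
        v = momentum * v + g       # accumulate a running direction
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity

# Hypothetical two-parameter model with zero initial velocity.
params = [1.0, -2.0]
velocity = [0.0, 0.0]
grads = [0.5, -0.5]

params, velocity = sgd_step(params, grads, velocity)
print(params)
print(velocity)
```

With a learning rate as large as 0.24, each step moves the weights considerably, so the momentum and weight decay terms matter for keeping training stable.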
	
#### Model Architecture
2DPASS improves on image-only computer vision neural networks by combining LiDAR point clouds with camera images. Because LiDAR does not depend on ambient light, this helps the model detect and classify objects more accurately even in low-luminosity environments.

#### Training Time
Training our model on the mini-dataset took approximately 5 hours, largely because our computer’s limited memory could only handle a training batch size of one. Also, due to the limited variety in the mini-dataset, we observed that val/mIoU failed to improve over the last 50 records, which shows that much of the computation towards the end of training achieved no notable performance improvement.
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/train_time.png" />
</div>

#### Challenges and Solutions
Running the model on the whole 80-gigabyte dataset required too much computational power and time, so we resorted to the mini training set instead, which was much faster to train.

Training on a much smaller dataset could lead to overfitting and inaccurate results, so we used the authors' pre-trained model to compare results before drawing conclusions.
<div style="margin-top: 20px; margin-bottom: 20px;">
<img src="./images/overfit.png" />
</div>

The above is the result of testing the model trained on the mini-dataset. Here we can clearly see a case of overfitting: all vehicle-like objects are recognised as cars, which explains the high accuracy for car predictions and the near-0% accuracy for all other vehicle classes.
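
The kind of collapse described above can be checked with per-class accuracy computed from a confusion matrix. The sketch below uses a made-up confusion matrix with three hypothetical vehicle classes. The numbers are purely illustrative and are not the project's results.

```python
def per_class_accuracy(cm, class_names):
    """Fraction of each true class that was predicted correctly (rows = true class)."""
    result = {}
    for i, name in enumerate(class_names):
        total = sum(cm[i])
        result[name] = cm[i][i] / total if total else 0.0
    return result

class_names = ["car", "truck", "bus"]

# Toy confusion matrix: almost every vehicle is predicted as "car" (column 0).
toy_cm = [
    [95, 3, 2],   # true car
    [48, 2, 0],   # true truck
    [29, 0, 1],   # true bus
]

acc = per_class_accuracy(toy_cm, class_names)
for name in class_names:
    print(f"{name:>5}: {acc[name]:.2%}")
```

A high score for one class alongside near-zero scores for similar classes is the signature of a model that has learned to predict only the dominant class.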

Our main challenge was our limited ability to modify the model, as training took up to five hours even on a much smaller dataset. To tackle this problem, we introduced early stopping: if we did not see a noticeable improvement in the mIoU (mean Intersection over Union) value over five epochs of training, we manually exited the training. However, finding a suitable threshold for what counts as an improvement was difficult. Moreover, as training also depends on the distribution of the dataset, it is uncertain how much the model will learn from different data.
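
The early-stopping rule described above was applied manually, but it could be automated along the following lines. This is a minimal sketch: the `EarlyStopping` class, the `min_delta` threshold, and the validation mIoU history are all hypothetical, with `patience=5` matching the five-epoch window mentioned in the text.

```python
class EarlyStopping:
    """Signal a stop when validation mIoU has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest gain that counts as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_miou):
        """Record one epoch's validation mIoU; return True if training should stop."""
        if val_miou > self.best + self.min_delta:
            self.best = val_miou
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical validation mIoU per epoch (not measured values).
history = [0.20, 0.28, 0.33, 0.35, 0.36, 0.36, 0.359, 0.3605, 0.36, 0.358, 0.36, 0.37]

stopper = EarlyStopping(patience=5, min_delta=0.001)
stopped_at = None
for epoch, miou in enumerate(history):
    if stopper.step(miou):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best mIoU {stopper.best:.3f}")
```

The `min_delta` parameter corresponds to the threshold that was hard to tune: too small and noise counts as progress, too large and training stops before a late improvement such as the final value in this toy history.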


### References
https://www.exposit.com/blog/computer-vision-object-detection-challenges-faced/
