Resource Constrained Object Detection


A research project supervised by Dr David Thomas (Imperial College) on a lightweight convolutional neural network solution for detecting obstacle balls with computer vision. The models are trained with PyTorch / TFLite and deployed on the following resource-constrained environments: the Intel DE10-Lite FPGA and the Raspberry Pi 3.

Demos: hard-coded ver. and DL ver.

Limitations / Constraints

  • Limited dataset of about 500 images (approx. 100 per class)
  • Dataset includes images of small objects
  • Limited compute power / resources
  • Constrained by a low target inference time (~30 ms)
  • Constrained by a high target frame rate (30-60 fps)

Datasets

Manually captured images are split into three categories by lighting, and thus brightness level: dark, normal and bright. This is mainly to ensure robust inference of the NNs under varying light settings. Each image contains a single ball in one of five colours: red, green, blue, yellow and pink.

The dataset contains the raw images (1920x1080) and a label CSV file with the bounding-box dimensions and the colour of the ball in each image. The dataset is published with a DOI via Zenodo.

Ball Dataset

Implementation and Model Training

Input and output formats

The input tensor to the CNN is an RGB image of size 320x240. The output tensors are a tensor of size [1x4] for the object bounding-box regression and a tensor of size [1x5] for the classification scores of each ball class.
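As a purely illustrative sketch of these shapes in PyTorch (the NCHW layout and variable names here are assumptions, not the project's code):

```python
import torch

# One RGB frame of size 320x240 in the NCHW layout expected by PyTorch conv layers.
image = torch.rand(1, 3, 240, 320)

# Shapes produced by the two output heads:
bbox = torch.zeros(1, 4)    # bounding-box regression output
scores = torch.zeros(1, 5)  # classification scores, one per ball colour
```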

Image data augmentation

To make the most of a limited dataset and to prevent the model from overfitting to the training data, the input images are augmented, i.e. flipped, cropped, etc., so that the training images vary slightly between passes and the model is never fed exactly the same tensors.
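A minimal sketch of such a pipeline with torchvision (the exact transforms and parameters are assumptions; geometric transforms such as flips and crops must also be applied to the bounding-box labels):

```python
import torchvision.transforms as T

augment = T.Compose([
    T.ColorJitter(brightness=0.3, contrast=0.2),  # vary lighting, echoing the dark/normal/bright split
    T.RandomHorizontalFlip(p=0.5),                # flip; the bbox label must be mirrored accordingly
    T.ToTensor(),                                 # PIL image -> float tensor in [0, 1]
])
```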

Loss functions

Also known as the cost function, the loss is what training minimises, so for the dual-output CNN (bbox regression & classification) an appropriate loss function is necessary to train the NNs at all and to reach the desired performance. Cross-entropy loss was used for the classification task; for the bbox regression task, L1 loss (mean absolute error), L2 loss (mean squared error) or IoU loss (intersection over union) was used.
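A sketch of one way to combine the two losses for the dual-output network (the weighting factor `alpha` is an assumption, not taken from the project):

```python
import torch.nn.functional as F

def detection_loss(pred_bbox, pred_scores, true_bbox, true_class, alpha=1.0):
    cls_loss = F.cross_entropy(pred_scores, true_class)  # classification head
    reg_loss = F.l1_loss(pred_bbox, true_bbox)           # L1 / MAE; F.mse_loss would give L2
    return cls_loss + alpha * reg_loss
```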

Simple CNN

The initial attempt was to design a CNN architecture with a few conv2d layers, each followed by an activation and max pooling, and fully connected layers at the end.
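A minimal sketch of this kind of architecture (the layer sizes are assumptions, not the project's exact configuration):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 60 * 80, 128), nn.ReLU(),  # 240x320 input halved twice -> 60x80
        )
        self.bbox_head = nn.Linear(128, 4)  # bounding-box regression
        self.cls_head = nn.Linear(128, 5)   # ball-colour scores

    def forward(self, x):
        h = self.fc(self.features(x))
        return self.bbox_head(h), self.cls_head(h)

model = SimpleCNN()
bbox, scores = model(torch.rand(1, 3, 240, 320))
assert bbox.shape == (1, 4) and scores.shape == (1, 5)
```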

Results:

  • Inference time of ~30 ms
  • Very low robustness and accuracy; fitted to the training set only
  • Does not work on images with different backgrounds (high variance)

Transfer learning with pre-trained state-of-the-art CNNs

Some state-of-the-art CNNs, such as the ResNets and MobileNets, were trained on the dataset with their pre-trained early layers frozen (they are great feature extractors) and trainable fully connected layers at the end. Training, and the progress in validation loss over the number of epochs, was much better than for a simple CNN (just a few conv2d layers with fc layers). However, when the Torch model was converted to a TFLite model for deployment on the Raspberry Pi, the inference time was over 3000 ms due to the limited compute power. Although transfer learning is beneficial given a limited, small dataset, the computational cost is too high for a CPU to run in real time with low inference time.
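A sketch of this setup using torchvision's MobileNetV2 (the choice of backbone and the head sizes are assumptions):

```python
import torch.nn as nn
from torchvision import models

backbone = models.mobilenet_v2(pretrained=True)
for p in backbone.features.parameters():
    p.requires_grad = False  # freeze the pre-trained feature extractor

# Replace the classifier with a trainable head for this task;
# the 9 outputs (4 bbox values + 5 class scores) are split downstream.
backbone.classifier = nn.Sequential(
    nn.Dropout(0.2),
    nn.Linear(backbone.last_channel, 9),
)
```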

EfficientDet

Google's EfficientDet, with EfficientNet as its backbone and BiFPN as its feature network, uses compound scaling and was able to achieve both high accuracy and low inference time as a deep learning object detector.

  • EfficientDet-Lite0: ~1000 ms
  • EfficientDet-Lite2: ~2700 ms
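The EfficientDet-Lite family can be trained with Google's TFLite Model Maker; a sketch, assuming the labels are in Model Maker's CSV format (`labels.csv` is a placeholder):

```python
from tflite_model_maker import object_detector

spec = object_detector.EfficientDetLite0Spec()
train_data, val_data, test_data = object_detector.DataLoader.from_csv('labels.csv')
model = object_detector.create(train_data, model_spec=spec,
                               validation_data=val_data, epochs=50)
model.export(export_dir='./model')  # writes a .tflite file for deployment
```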

Optimizations

Evaluation Metrics

  • Accuracy and robustness metrics
  • Speed metric
  • Resource utilisation metric

Quantization Techniques

  • Post-training quantization (a sketch follows this list)
  • Quantization-aware training
  • Torch to TFLite conversion

To convert a PyTorch model to TFLite, run the following command in a terminal:

python3 torch_to_tflite.py --torch ./trained_model/CNN2.pt --tflite ./model/CNN2.tflite
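For post-training quantization specifically, the TFLite converter can quantize the model at conversion time; a minimal sketch (paths are placeholders):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('./model/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_model = converter.convert()
with open('./model/CNN2_quant.tflite', 'wb') as f:
    f.write(tflite_model)
```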

Room for improvement

Biased dataset

I started this project by collecting the ball dataset, drawing purely on my experience with hard-coded computer vision, which works (only in the optimal light setting) directly on individual pixel values that are Gaussian-filtered to minimise noise. I thought that collecting images in fixed categories of light settings (dark, normal, bright) would help the neural networks generalise better, but it turns out that was not the case despite image augmentation, possibly due to this nature of CNNs.

Importance of neural network accelerators

When deploying the trained models on the Raspberry Pi 3, only a minimal CNN, e.g. 2 conv2d layers followed by 2 fc layers, was just about able to satisfy the resource constraints and the target performance in terms of frame rate and inference time. This points to how the current processors on edge devices are not designed solely for DL inference, and how the software stack and compilers on top of the hardware adapt CNNs to optimise for the CPU. Inference performance can be improved by using an AI accelerator, such as Google's Edge TPU, alongside the CPU on edge devices.
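For reference, the per-frame inference time on the Pi can be measured with the TFLite interpreter; a sketch using the lightweight tflite_runtime package (the model path follows the repo layout above; the random frame is a stand-in for camera input):

```python
import time
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path='./model/CNN2.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

frame = np.random.rand(*inp['shape']).astype(np.float32)  # dummy input frame
interpreter.set_tensor(inp['index'], frame)

start = time.perf_counter()
interpreter.invoke()
print(f"inference time: {(time.perf_counter() - start) * 1e3:.1f} ms")
```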

Related papers
