This project aims to put into practice concepts and techniques for developing deep neural networks able to perform three main computer vision tasks: object detection, segmentation and recognition. These concepts are applied in a specific environment: autonomous vehicles.
Authors | Emails | Github |
---|---|---|
Guillem Delgado | guillem.delgado@gmail.com | guillemdelgado |
Francisco Roldan | fran.roldans@gmail.com | franroldans |
Jordi Gené | jordigenemola.1@gmail.com | Jordi-Gene-Mola |
Supervised by robertbenavente and lluisgomez.
The final report, which details and summarizes the work done weekly in the project, can be found here.
The final slides for the presentations, detailing and summarizing the weekly work, can be found here.
We have manually inspected the data we have worked with to facilitate the interpretation of the results obtained. The dataset analysis can be found here.
The weights of the different models can be found here. As the weight files are very large, only the most successful experiments for each dataset and network are included. However, if the weights of an experiment you are interested in are missing, just open an issue and we will update this Google Drive with your request ASAP.
In order to choose a well-performing object recognition network for our system, we have tested several CNNs with different architectures. By changing parameters in code/config/tt100k_classif.py we were able to test different datasets and different networks. In addition, we implemented and tested a deep network with stochastic depth based on residual blocks, which can be found in code/models/stochastic_depth.py.
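The idea behind stochastic depth can be sketched in a few lines: during training, each residual block's branch is kept with a survival probability (otherwise only the identity path is used), while at inference the branch output is scaled by that probability. The sketch below is a minimal NumPy illustration of the technique from Huang et al., not the project's actual Keras implementation in code/models/stochastic_depth.py; all names are illustrative.

```python
import numpy as np

def stochastic_depth_block(x, residual_fn, survival_prob=0.8,
                           training=True, rng=None):
    """Residual block whose residual branch is randomly dropped in training.

    During training the branch survives with probability `survival_prob`;
    at inference the branch output is scaled by `survival_prob` instead,
    so the expected output matches the training-time behaviour.
    """
    rng = rng if rng is not None else np.random.default_rng()
    if training:
        if rng.random() < survival_prob:
            return x + residual_fn(x)   # branch kept
        return x                        # branch dropped: identity only
    return x + survival_prob * residual_fn(x)
```

In the real model `residual_fn` would be the block's convolutional branch; here any callable works, which makes the train/inference behaviour easy to check.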
The framework's code is divided as follows:
- callbacks/ : Folder that handles all the different callbacks involved during training.
- config/ : Folder that contains all configuration files for the different experiments done and handles them.
- initializations/ : Useful tools for weights initialization.
- layers/ : Folder that contains layers not present in Keras such as Deconvolution.
- metrics/ : Tools for model evaluation.
- models/ : Folder that handles all the different models involved in the project.
- tools/ : Useful tools to manage deep learning projects.
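As an illustration of how the config/ folder drives experiments, a configuration file such as code/config/tt100k_classif.py might contain entries like the following. This is a hypothetical excerpt: the framework's actual key names and values may differ.

```python
# Hypothetical excerpt in the style of a file under code/config/.
# Key names are illustrative, not the framework's actual API.
dataset_name       = 'TT100K_trafficSigns'  # dataset to train and evaluate on
model_name         = 'vgg16'                # network from models/ to build
weights_file       = 'imagenet'             # pretrained weights, or None for scratch
batch_size_train   = 32                     # images per training batch
learning_rate      = 1e-4                   # initial learning rate
optimizer          = 'adam'                 # optimizer to compare against others
da_horizontal_flip = True                   # data-augmentation switch
```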
Results of the different experiments.
See this README to know how to run the code and run the experiments.
- Testing the framework:
- Analyze the dataset; a summary can be found in the Datasets Analysis section.
- Calculate the accuracy on train and test sets.
- Evaluate different techniques in the configuration file.
- Transfer learning to another dataset.
- Understand the configuration file.
- Train networks on different datasets:
- VGG model from scratch.
- VGG model fine-tuning with ImageNet weights.
- Implementing a new Neural Network:
- Integrate the new model into the framework.
- Evaluate the new model on TT100K dataset.
- Boost performance:
- Data Augmentation.
- Data Preprocessing.
- Comparison of optimizers.
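The data augmentation and preprocessing steps listed above can be sketched together: randomly mirror each image and subtract the per-channel dataset mean before feeding it to the network. This is a minimal NumPy illustration of the idea; the function name is ours, and the framework's actual augmentation pipeline (and its parameters) may differ.

```python
import numpy as np

def augment_and_preprocess(img, mean, rng=None, flip_prob=0.5):
    """Randomly flip an (H, W, C) image horizontally, then subtract the
    per-channel dataset mean — a common augmentation + preprocessing combo."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < flip_prob:
        img = img[:, ::-1, :]  # reverse the width axis: horizontal mirror
    return img.astype(np.float64) - mean
```

Mean subtraction centers the inputs, which tends to stabilize training; the random flip effectively doubles the variety of training views at no labeling cost.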
For object detection we have considered two single-shot models: You Only Look Once (YOLO), using its smaller variant Tiny-YOLO, and the Single Shot MultiBox Detector (SSD). Both models have been trained to detect a variety of traffic signs in the TT100K detection dataset and to detect pedestrians, cars and trucks in the Udacity dataset. Faster R-CNN was also tried, but it is not included due to difficulties in upgrading it to the newest Keras version.
The contributions made during these weeks are:
- layers/ssd_layers.py : Layers needed for the SSD model.
- models/ssd.py : SSD model.
- tools/ssd_utils : Utils needed for the SSD model.
Results of the different experiments.
See this README to know how to run the code and run the experiments.
- YOLOv2 model on the TT100K dataset:
- Analyze the dataset; a summary can be found in the Datasets Analysis section.
- Calculate the F-score.
- Summary of references:
- Summary of YOLO and Faster R-CNN.
- Implementing a new Neural Network:
- Integrate the new model (SSD) into the framework.
- Evaluate the new model on BOTH datasets.
- Train the networks on a different dataset:
- Evaluate the YOLO on BOTH datasets.
- Evaluate the SSD on BOTH datasets.
- Boost performance:
- Data Augmentation.
- Data Preprocessing.
- Comparison of optimizers.
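The detection F-score mentioned above is computed from true/false positives and false negatives, where a predicted box usually counts as a true positive if its intersection over union (IoU) with a ground-truth box exceeds a threshold (commonly 0.5). A minimal sketch of both pieces, with our own function names rather than the framework's actual evaluation code:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall (assumes tp > 0)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```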
Three different models have been tried during these weeks. Starting with FCN-8, we explored regularization methods not tried in previous weeks, such as batch normalization, and trained the model on different datasets (CamVid, Cityscapes and KITTI). SegNet and U-Net have also been adapted to Keras 2.0 and tested on the CamVid dataset.
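Segmentation models like these are typically evaluated with the mean intersection over union (Jaccard index) across classes. A minimal NumPy sketch of that metric, with an illustrative function name rather than the project's actual metrics/ code:

```python
import numpy as np

def mean_iou(y_true, y_pred, n_classes):
    """Mean per-class intersection over union of two integer label masks.

    Classes absent from both masks are skipped so they do not distort the mean.
    """
    ious = []
    for c in range(n_classes):
        inter = np.sum((y_true == c) & (y_pred == c))
        union = np.sum((y_true == c) | (y_pred == c))
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))
```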
The contributions made during these weeks are:
- models/unet.py : U-Net model.
- models/segnet.py : SegNet model.
- predictions.py : Script that generates the mask predictions.
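The core step of generating a mask prediction from a segmentation network is a per-pixel argmax over the class probabilities, optionally mapped to a color palette for visualization. A minimal NumPy sketch of that idea (the function names are ours, not the actual API of predictions.py):

```python
import numpy as np

def masks_from_probs(probs):
    """Collapse per-pixel class probabilities (H, W, n_classes) to a label mask."""
    return np.argmax(probs, axis=-1)

def colorize(mask, palette):
    """Map an integer label mask to RGB using a (n_classes, 3) color palette."""
    return palette[mask]  # NumPy fancy indexing: one color per pixel label
```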
Results of the different experiments.
See this README to know how to run the code and run the experiments.
- FCN8 model:
- Analyze the dataset; a summary can be found in the Datasets Analysis section.
- Summary of references:
- Summary of FCN and SegNet.
- Implementing a new Neural Network:
- Integrate the new model (SegNet) into the framework.
- Integrate the new model (U-Net) into the framework.
- Train the networks on a different dataset:
- Evaluate FCN8 on Cityscapes and KITTI.
- Boost performance:
- Comparison of batch sizes.
- Comparison of learning rates.
- Comparison of optimizers.
- Batch normalization.
- Data augmentation for U-Net.
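The batch normalization compared above normalizes each feature over the batch to zero mean and unit variance, then applies a learnable scale and shift. A minimal NumPy sketch of the training-time forward pass (in the real model, Keras's BatchNormalization layer also tracks running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch normalization over axis 0 of a (batch, features) array."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, ~unit variance
    return gamma * x_hat + beta              # learnable scale and shift
```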
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Krizhevsky, Alex, et al. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
Huang, Gao, et al. "Deep networks with stochastic depth." European conference on computer vision. Springer, Cham, 2016.
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, faster, stronger." arXiv preprint arXiv:1612.08242 (2016).
Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "SegNet: A deep convolutional encoder-decoder architecture for image segmentation." arXiv preprint arXiv:1511.00561 (2015).