This project implements a Fully Convolutional Network (FCN) for semantic segmentation of the road in an image. The FCN architecture is described in [1].
The goal is to label at least 80% of road pixels as road and to label no more than 20% of non-road pixels as road. The model does not have to predict every image correctly, just most of them.
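The two thresholds above can be checked per image when a ground-truth road mask is available. The following is a minimal sketch using NumPy; the function names `road_label_rates` and `meets_target` are hypothetical and are not part of this repository.

```python
import numpy as np

def road_label_rates(pred_road, true_road):
    """Return (share of road pixels labeled road, share of non-road pixels labeled road)."""
    pred_road = np.asarray(pred_road, dtype=bool)
    true_road = np.asarray(true_road, dtype=bool)
    road_total = true_road.sum()
    non_road_total = (~true_road).sum()
    road_hit = np.logical_and(pred_road, true_road).sum()
    false_road = np.logical_and(pred_road, ~true_road).sum()
    # Guard against masks with no road pixels or no non-road pixels.
    road_rate = road_hit / road_total if road_total else 1.0
    false_rate = false_road / non_road_total if non_road_total else 0.0
    return float(road_rate), float(false_rate)

def meets_target(pred_road, true_road, min_road=0.80, max_false=0.20):
    """Apply the 80% / 20% thresholds stated above (hypothetical helper)."""
    road_rate, false_rate = road_label_rates(pred_road, true_road)
    return road_rate >= min_road and false_rate <= max_false

# Toy 2x5 masks: 6 road pixels, 4 non-road pixels.
true_road = np.array([[0, 0, 1, 1, 1],
                      [0, 0, 1, 1, 1]], dtype=bool)
pred_road = np.array([[0, 1, 1, 1, 1],
                      [0, 0, 1, 1, 0]], dtype=bool)
road_rate, false_rate = road_label_rates(pred_road, true_road)
```

In this toy case 5 of 6 road pixels are found, but 1 of 4 non-road pixels is labeled as road, so the second threshold is missed.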
Sample image
The full collection of testing dataset images can be found here
python main.py
- main.py - the project entry point
- helper.py - helper functions (download model, inference, etc.)
- project_tests.py - basic tests of correctness
- Python 3 - Python is a programming language that lets you work quickly and integrate systems more effectively
- Conda - Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN
- TensorFlow - An open source machine learning framework for everyone
- NumPy - The fundamental package for scientific computing with Python
- scikit-image - Image processing in Python
- tqdm - A fast, extensible progress bar for Python and CLI
Kitti Road dataset, available from here
- Clone the repository.
- Download the dataset.
- Extract the dataset in the data folder. This will create the folder data_road with all the training and test images.
- Set up the environment: conda env create -f environment.yml -n carnd-t3p2
- Run: python main.py
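Before running, it can help to confirm that the extraction step produced the expected folder. This is a small sketch, not code from the repository: `find_dataset` is a hypothetical helper, and it checks only the data/data_road location named in the steps above.

```python
from pathlib import Path

def find_dataset(data_dir="data"):
    """Return the data_road folder under data_dir, or raise if it is missing."""
    road_dir = Path(data_dir) / "data_road"
    if not road_dir.is_dir():
        raise FileNotFoundError(
            f"Expected dataset folder at {road_dir}; "
            f"extract the dataset into '{data_dir}' first."
        )
    return road_dir
```

Calling `find_dataset()` from the repository root either returns the folder path or fails early with a message pointing at the missing step.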
The project uses the FCN-8 neural network architecture as described in [1].
- Encoder - VGG-16 [2]
- Decoder
  - restores the original image resolution
  - adds a skip connection from the encoder's 3rd layer
  - adds a skip connection from the encoder's 4th layer
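The data flow of such a decoder can be sketched with plain NumPy. This is an illustration under stated assumptions, not the code in main.py: nearest-neighbor upsampling stands in for the learned transposed convolutions, the weights are random, the encoder outputs are assumed to sit at strides 8, 16 and 32 with 256, 512 and 4096 channels (as in a VGG-16 based FCN-8), and the 160x576 input size and two classes are assumed values.

```python
import numpy as np

NUM_CLASSES = 2  # assumption: road vs. non-road

def conv1x1(features, weights):
    """1x1 convolution: a per-pixel linear map over the channel axis."""
    return features @ weights  # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)

def upsample(features, factor):
    """Nearest-neighbor upsampling; a stand-in for a learned transposed convolution."""
    return features.repeat(factor, axis=0).repeat(factor, axis=1)

def decode(layer3, layer4, final, rng):
    """FCN-8 style decoder over encoder outputs at strides 8, 16 and 32."""
    def score(x):
        # Project encoder features to per-class scores with random weights.
        w = rng.normal(0.0, 1e-2, size=(x.shape[-1], NUM_CLASSES))
        return conv1x1(x, w)

    x = upsample(score(final), 2) + score(layer4)  # stride 32 -> 16, skip from 4th layer
    x = upsample(x, 2) + score(layer3)             # stride 16 -> 8, skip from 3rd layer
    return upsample(x, 8)                          # stride 8 -> 1, input resolution

rng = np.random.default_rng(0)
h, w = 160, 576  # assumed input size, divisible by 32
layer3 = rng.normal(size=(h // 8, w // 8, 256))
layer4 = rng.normal(size=(h // 16, w // 16, 512))
final = rng.normal(size=(h // 32, w // 32, 4096))
logits = decode(layer3, layer4, final, rng)
print(logits.shape)  # (160, 576, 2)
```

Each skip connection is an element-wise addition, which is why the upsampled tensor must match the spatial size of the encoder layer it is merged with.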
- L2 regularization with weight 1e-3 is applied to each transposed convolution
- All weights are initialized from a truncated normal distribution with standard deviation 1e-2
- The dataset was augmented by flipping all images from left to right
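The flip augmentation can be sketched as follows. The key point is that each label mask has to be mirrored together with its image, otherwise the road labels no longer line up with the pixels. The helper name `augment_with_flips` is hypothetical, and the arrays are assumed to be shaped (batch, height, width, channels).

```python
import numpy as np

def augment_with_flips(images, labels):
    """Append a left-right mirrored copy of every image and its label mask."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    # Axis 2 is the width axis for (batch, height, width, channels) arrays.
    flipped_images = images[:, :, ::-1]
    flipped_labels = labels[:, :, ::-1]
    return (np.concatenate([images, flipped_images]),
            np.concatenate([labels, flipped_labels]))

# Toy batch: 2 images of size 2x3 with 1 channel.
images = np.arange(2 * 2 * 3 * 1).reshape(2, 2, 3, 1)
labels = images % 2
aug_images, aug_labels = augment_with_flips(images, labels)
```

The augmented set is twice the size of the original, with the mirrored samples in the second half.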
- Batch size - 34
- Number of epochs - 75
- Initial learning rate - 0.0005
- Optimizer - ADAM
- Encoder weights frozen
- L2 regularization - helps prevent overfitting, since the training dataset is very small
- Frozen encoder weights - prevent overfitting, since the training dataset is very small, and speed up training
- Data augmentation - helps improve accuracy and prevent overfitting
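To make the L2 term and the initialization concrete, here is a minimal NumPy sketch. It rests on two assumptions: values farther than two standard deviations from the mean are redrawn (the convention TensorFlow uses for its truncated normal), and the penalty is the plain weighted sum of squares (some libraries add a factor of 1/2). The kernel shapes and the cross-entropy value are placeholders, not numbers from this project.

```python
import numpy as np

def truncated_normal(shape, stddev, rng):
    """Sample N(0, stddev^2), redrawing values farther than 2 stddev from zero."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

def l2_penalty(kernels, weight=1e-3):
    """Weighted sum of squared kernel entries, added to the data loss."""
    return weight * sum(float(np.sum(k ** 2)) for k in kernels)

rng = np.random.default_rng(0)
# Three placeholder transposed-convolution kernels (shapes are illustrative).
kernels = [truncated_normal((4, 4, 2, 2), 1e-2, rng) for _ in range(3)]
cross_entropy = 0.1  # placeholder value, not a measured loss
total_loss = cross_entropy + l2_penalty(kernels)
```

Because the penalty grows with the squared magnitude of the kernels, minimizing the total loss pulls the decoder weights toward zero, which limits how closely a small training set can be memorized.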
- Loss - 0.114276
- IoU - 0.904379
- Weights - available here
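For reference, intersection over union (IoU) compares a predicted label map with the ground truth class by class. The source does not say which IoU variant produced the figure above, so the sketch below assumes the mean over classes; `mean_iou` is a hypothetical helper written with NumPy.

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Mean of per-class intersection over union for integer label maps."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps, skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy 2x4 label maps: 1 = road, 0 = non-road.
target = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
pred = np.array([[0, 1, 1, 1],
                 [0, 0, 1, 1]])
score = mean_iou(pred, target)
```

In this toy case the road class has IoU 4/6 and the non-road class has IoU 2/4, so the mean is 7/12.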
[1] J. Long, E. Shelhamer, T. Darrell, "Fully convolutional networks for semantic segmentation", 2014. arXiv:1605.06211
[2] K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition", 2014. arXiv:1409.1556