Semantic Segmentation

Overview

This project implements a Fully Convolutional Network (FCN) for semantic segmentation of the road in images. The architecture is based on the FCN described in [1].

Goal

The goal is to label at least 80% of the road pixels as road and no more than 20% of the non-road pixels as road. The model does not have to predict every image correctly, only most of them.
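Both targets are per-image pixel ratios: the share of road pixels predicted as road (at least 0.8) and the share of non-road pixels predicted as road (at most 0.2). The NumPy sketch below checks them for one image, assuming boolean masks in which True marks road. `goal_metrics` is a hypothetical helper, not a function from this repository.

```python
import numpy as np


def goal_metrics(pred_road, gt_road, min_recall=0.8, max_fp_rate=0.2):
    """Return (road recall, non-road false-positive rate, passed) for one image.

    pred_road, gt_road: boolean arrays of equal shape, True where a pixel is road.
    """
    pred_road = np.asarray(pred_road, dtype=bool)
    gt_road = np.asarray(gt_road, dtype=bool)
    non_road = ~gt_road

    # Share of true road pixels that the model labels as road.
    recall = np.sum(pred_road & gt_road) / max(int(np.sum(gt_road)), 1)
    # Share of non-road pixels that the model wrongly labels as road.
    fp_rate = np.sum(pred_road & non_road) / max(int(np.sum(non_road)), 1)

    passed = bool(recall >= min_recall and fp_rate <= max_fp_rate)
    return float(recall), float(fp_rate), passed


# Usage: one row with 2 road pixels and 3 non-road pixels.
gt = np.array([[True, True, False, False, False]])
print(goal_metrics(gt, gt))  # perfect prediction passes
print(goal_metrics(np.array([[True, True, True, False, False]]), gt))  # 1 of 3 non-road pixels wrong: fails
```

Whether "most of the images" is then judged by the fraction of images that pass is not specified in this README.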

Demo

Sample image

The full collection of processed test-dataset images can be found here

Usage

python main.py

Project structure

main.py - the project entry point
helper.py - helper functions (download model, inference, etc.)
project_tests.py - basic tests of correctness

Dependencies

Language

Python 3 - Python is a programming language that lets you work quickly and integrate systems more effectively

Tools

Conda - Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN

Libraries and Frameworks

TensorFlow - An open source machine learning framework for everyone
NumPy - The fundamental package for scientific computing with Python
scikit-image - Image processing in Python
tqdm - A fast, extensible progress bar for Python and CLI

Dataset

KITTI Road dataset, available here

Run

Clone the repository.
Download the dataset.
Extract the dataset into the data folder. This creates the folder data_road with all the training and test images.
Set up the environment with conda env create -f environment.yml, then activate it with conda activate carnd-t3p2.
Run python main.py.
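After extraction, each training image has to be paired with its ground-truth mask. The sketch below assumes the usual KITTI Road layout and colours, which this README does not spell out: images in image_2/ (e.g. um_000000.png), road masks in gt_image_2/ (e.g. um_road_000000.png), and non-road pixels painted pure red (255, 0, 0). All function names are hypothetical.

```python
import glob
import os
import re

import numpy as np


def image_name_for_mask(mask_name):
    """um_road_000000.png -> um_000000.png (assumed KITTI Road naming)."""
    return re.sub(r"_road_", "_", mask_name)


def pair_training_files(training_folder):
    """Return (image_path, mask_path) pairs found under training_folder.

    Assumes images live in image_2/ and road masks in gt_image_2/.
    Lane masks (um_lane_*.png) are skipped by the *_road_* pattern.
    """
    pattern = os.path.join(training_folder, "gt_image_2", "*_road_*.png")
    pairs = []
    for mask_path in sorted(glob.glob(pattern)):
        image_path = os.path.join(training_folder, "image_2",
                                  image_name_for_mask(os.path.basename(mask_path)))
        if os.path.exists(image_path):
            pairs.append((image_path, mask_path))
    return pairs


def mask_to_labels(mask_rgb):
    """Convert an RGB ground-truth mask to an (H, W, 2) one-hot boolean array.

    Assumes background pixels are pure red (255, 0, 0) and everything else
    counts as road; channel 0 = background, channel 1 = road.
    """
    background = np.all(mask_rgb == np.array([255, 0, 0]), axis=-1)
    return np.stack([background, ~background], axis=-1)


# Usage (path assumed from the extraction step above):
pairs = pair_training_files("data/data_road/training")
print(len(pairs), "image/mask pairs found")
```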

Implementation

Architecture

The project uses the FCN-8 neural network architecture described in [1].
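FCN-8 [1] fuses class scores produced at three encoder strides: 32 (the last VGG-16 layer), 16 (pool4) and 8 (pool3). The pure-Python sketch below traces spatial sizes through such a decoder to show why the upsampling factors are 2, 2 and 8. It assumes 'same'-padded transposed convolutions (output size = input size × stride) and an example input of 160×576 pixels; that size is an illustrative choice, not one stated in this README.

```python
def fcn8_shapes(height, width):
    """Trace (height, width) through an FCN-8 style decoder on a VGG-16 encoder.

    VGG-16 halves the resolution at each of its 5 pooling stages, so pool3,
    pool4 and the final feature map sit at strides 8, 16 and 32.
    """
    if height % 32 or width % 32:
        raise ValueError("height and width must be multiples of 32")

    def at_stride(stride):
        # Spatial size of an encoder feature map at the given stride.
        return (height // stride, width // stride)

    def upsample(shape, factor):
        # 'same'-padded transposed convolution with stride == factor.
        return (shape[0] * factor, shape[1] * factor)

    trace = {"score_32": at_stride(32)}  # 1x1 conv class scores, stride 32
    x = upsample(trace["score_32"], 2)
    assert x == at_stride(16), "must match pool4 to add the skip connection"
    trace["fuse_pool4"] = x  # stride 16
    x = upsample(x, 2)
    assert x == at_stride(8), "must match pool3 to add the skip connection"
    trace["fuse_pool3"] = x  # stride 8
    trace["output"] = upsample(x, 8)  # back to the input resolution
    return trace


for name, shape in fcn8_shapes(160, 576).items():
    print(name, shape)
```

The skip sums are only possible because each upsampled map matches the encoder map it is added to; this is why inputs whose sides are multiples of 32 are convenient.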

Encoder - VGG-16 [2]
Decoder
- restores the original image resolution
- adds a skip connection from the 3rd encoder layer
- adds a skip connection from the 4th encoder layer
- applies L2 regularization with weight 1e-3 to each transposed convolution
- initializes all weights from a truncated normal distribution with standard deviation 1e-2
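The two decoder settings above can be illustrated in NumPy. The sketch assumes TensorFlow 1.x semantics: `tf.truncated_normal` redraws samples more than two standard deviations from the mean, and `tf.contrib.layers.l2_regularizer(scale)` returns `scale * tf.nn.l2_loss(w)`, i.e. half the sum of squared weights. Whether the project uses exactly these TensorFlow calls is an assumption, and the kernel shape in the usage lines is hypothetical.

```python
import numpy as np


def truncated_normal(shape, stddev=1e-2, rng=None):
    """Sample N(0, stddev^2), redrawing any value beyond 2 * stddev."""
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.normal(0.0, stddev, size=shape)
    outside = np.abs(samples) > 2 * stddev
    while outside.any():
        samples[outside] = rng.normal(0.0, stddev, size=int(outside.sum()))
        outside = np.abs(samples) > 2 * stddev
    return samples


def l2_penalty(kernels, scale=1e-3):
    """Sum of scale * sum(w**2) / 2 over all transposed-convolution kernels."""
    return sum(scale * 0.5 * float(np.sum(np.square(w))) for w in kernels)


# Usage: a hypothetical 4x4 transposed-conv kernel mapping 2 classes to 2 classes.
rng = np.random.default_rng(0)
kernel = truncated_normal((4, 4, 2, 2), stddev=1e-2, rng=rng)
penalty = l2_penalty([kernel], scale=1e-3)  # assumed to be added to the training loss
print(kernel.std(), penalty)
```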

Data Preparation

The dataset was augmented by flipping every image from left to right.
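The mirror must be applied to each image and its label mask together so that every pixel keeps its class. A minimal NumPy sketch, assuming batches stored as NHWC arrays; `augment_with_flips` is a hypothetical name. For a single H×W×C image, `np.fliplr` performs the same flip.

```python
import numpy as np


def augment_with_flips(images, labels):
    """Append a left-right mirrored copy of every image and its label.

    images: (N, H, W, C) array; labels: (N, H, W, K) one-hot masks.
    Axis 2 is the width axis, so [:, :, ::-1] mirrors left to right.
    """
    flipped_images = images[:, :, ::-1]
    flipped_labels = labels[:, :, ::-1]
    return (np.concatenate([images, flipped_images], axis=0),
            np.concatenate([labels, flipped_labels], axis=0))


# Usage with a hypothetical batch of three 160x576 RGB images.
images = np.zeros((3, 160, 576, 3), dtype=np.uint8)
labels = np.zeros((3, 160, 576, 2), dtype=bool)
aug_images, aug_labels = augment_with_flips(images, labels)
print(aug_images.shape, aug_labels.shape)  # batch size doubles
```

Flipping doubles the number of training examples without changing road geometry in a way the model should not expect, since roads appear mirrored in real driving scenes as well.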

Training

Batch size - 34
Number of epochs - 75
Initial learning rate - 0.0005
Optimizer - Adam
Encoder weights frozen
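With a frozen encoder, only the decoder parameters receive Adam updates. The NumPy sketch below shows the standard Adam update with the learning rate above and the usual defaults β1 = 0.9, β2 = 0.999, ε = 1e-8 (assumed; the README lists only the learning rate), restricted to the parameters handed to the optimizer. In TensorFlow 1.x a common way to get the same effect is to pass only the decoder variables through the `var_list` argument of `Optimizer.minimize`; the README does not say how this project freezes the encoder. Parameter names in the usage lines are hypothetical.

```python
import numpy as np


class Adam:
    """Plain Adam update applied only to the parameters it was given."""

    def __init__(self, params, lr=5e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params  # dict name -> np.ndarray, updated in place
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = {k: np.zeros_like(v) for k, v in params.items()}
        self.v = {k: np.zeros_like(v) for k, v in params.items()}
        self.t = 0

    def step(self, grads):
        self.t += 1
        for name, g in grads.items():
            if name not in self.params:
                continue  # frozen (e.g. encoder) parameters are never touched
            self.m[name] = self.beta1 * self.m[name] + (1 - self.beta1) * g
            self.v[name] = self.beta2 * self.v[name] + (1 - self.beta2) * g * g
            m_hat = self.m[name] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[name] / (1 - self.beta2 ** self.t)
            self.params[name] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Usage: gradients arrive for both parts, but only the decoder is optimized.
encoder = {"conv1_1_W": np.ones(3)}  # hypothetical frozen parameter
decoder = {"score_W": np.zeros(3)}   # hypothetical trainable parameter
opt = Adam(decoder, lr=5e-4)
opt.step({"conv1_1_W": np.ones(3), "score_W": np.array([1.0, -2.0, 0.5])})
print(encoder["conv1_1_W"])  # unchanged
print(decoder["score_W"])    # each entry moved by about 5e-4 against its gradient sign
```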

Rationale

L2 regularization - helps prevent overfitting, since the training dataset is very small
Frozen encoder weights - prevent overfitting for the same reason and speed up training
Data augmentation - improves accuracy and helps prevent overfitting

Results

Loss - 0.114276
IoU - 0.904379
Weights - available here
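IoU (intersection over union) for a class is the number of pixels both predicted and labeled as that class, divided by the number of pixels predicted or labeled as that class. The README does not state whether the reported value is the road-class IoU or the mean over both classes (as TensorFlow 1.x's `tf.metrics.mean_iou` computes), so the sketch below returns per-class values and lets the caller average them. `iou_per_class` is a hypothetical helper working on integer label maps (0 = background, 1 = road).

```python
import numpy as np


def iou_per_class(pred, gt, num_classes=2):
    """Intersection over union for each class from integer label maps."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    ious = []
    for c in range(num_classes):
        intersection = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        # A class absent from both maps has no defined IoU.
        ious.append(intersection / union if union else float("nan"))
    return np.array(ious)


# Usage on a 2x2 toy example.
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
ious = iou_per_class(pred, gt)
print(ious, np.nanmean(ious))  # per-class IoU and their mean
```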

Reference

[1] J. Long, E. Shelhamer, T. Darrell, "Fully Convolutional Networks for Semantic Segmentation", 2014. arXiv:1411.4038
[2] K. Simonyan, A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", 2014. arXiv:1409.1556

sgalkin/CarND-T3P2
