sgalkin/CarND-T3P2

 
 


Semantic Segmentation

Udacity - Self-Driving Car NanoDegree

Overview

This project implements a Fully Convolutional Network (FCN) for semantic segmentation of the road in an image. The FCN architecture is described in [1].

Goal

The goal is to label at least 80% of the road pixels as road and to label no more than 20% of the non-road pixels as road. The model does not have to predict every image correctly, just most of them.
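A minimal NumPy sketch of how the two thresholds can be checked for a single image, assuming boolean masks in which True marks a road pixel. "At least 80% of the road" is read here as recall on road pixels and "no more than 20% of non-road pixels" as the false positive rate; the function names `road_metrics` and `meets_goal` are hypothetical and not part of this repository.

```python
import numpy as np


def road_metrics(pred, label):
    """Return (road recall, non-road false positive rate) for one image.

    pred, label: boolean arrays of the same shape, True where the pixel is road.
    """
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    road = label.sum()
    non_road = (~label).sum()
    # Fraction of true road pixels that the model labeled as road.
    recall = (pred & label).sum() / road if road else 1.0
    # Fraction of non-road pixels that the model wrongly labeled as road.
    fpr = (pred & ~label).sum() / non_road if non_road else 0.0
    return float(recall), float(fpr)


def meets_goal(pred, label, min_recall=0.8, max_fpr=0.2):
    """True if the image satisfies both thresholds stated in the Goal."""
    recall, fpr = road_metrics(pred, label)
    return recall >= min_recall and fpr <= max_fpr
```

Applying `meets_goal` per image and counting how many images pass matches the "most of them" wording, but how the original evaluation aggregates images is not stated here.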

Demo

Sample image

The full collection of the testing dataset can be found here

Usage

python main.py

Project structure

  • main.py - the project entry point
  • helper.py - helper functions (model download, inference, etc.)
  • project_tests.py - basic tests of correctness

Dependencies

Language

  • Python 3 - Python is a programming language that lets you work quickly and integrate systems more effectively

Tools

  • Conda - Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN

Libraries and Frameworks

  • TensorFlow - An open source machine learning framework for everyone
  • NumPy - The fundamental package for scientific computing with Python
  • scikit-image - Image processing in Python
  • tqdm - A fast, extensible progress bar for Python and CLI

Dataset

Run

  1. Clone the repository.
  2. Download dataset.
  3. Extract the dataset into the data folder. This creates the folder data_road with all the training and test images.
  4. Set up the environment: conda env create -f environment.yml -n carnd-t3p2, then activate it with conda activate carnd-t3p2
  5. Run python main.py

Implementation

Architecture

The project uses the FCN-8 architecture described in [1].

  • Encoder - VGG-16 [2]
  • Decoder
    • restores the original image resolution
    • adds a skip connection from the 3rd layer
    • adds a skip connection from the 4th layer
    • L2 regularization with weight 1e-3 is applied to each transposed convolution
    • All weights are initialized from a truncated normal distribution with standard deviation 1e-2
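How the decoder restores the input resolution can be traced with plain shape arithmetic. This is a sketch, not the project's code: it assumes encoder layers 3, 4 and 7 sit at 1/8, 1/16 and 1/32 of the input resolution, an input size divisible by 32, 'same' padding, and kernel sizes 4, 4 and 16 for the three transposed convolutions. `transpose_conv_size` and `fcn8_decoder_shapes` are hypothetical helpers.

```python
def transpose_conv_size(size, stride, kernel, padding="same"):
    """Spatial output size of a 2-D transposed convolution along one axis."""
    if padding == "same":
        return size * stride
    # 'valid' padding: (in - 1) * stride + kernel
    return (size - 1) * stride + kernel


def fcn8_decoder_shapes(height, width):
    """Trace (height, width) through an FCN-8 style decoder.

    Assumes the encoder outputs used by the decoder are at 1/8 (layer 3),
    1/16 (layer 4) and 1/32 (layer 7) of the input resolution.
    """
    assert height % 32 == 0 and width % 32 == 0, "input must be divisible by 32"
    layer3 = (height // 8, width // 8)
    layer4 = (height // 16, width // 16)
    layer7 = (height // 32, width // 32)

    # 1x1 convolutions keep the spatial size; upsample layer 7 by 2.
    up1 = tuple(transpose_conv_size(s, stride=2, kernel=4) for s in layer7)
    assert up1 == layer4  # skip connection: element-wise add with layer 4
    up2 = tuple(transpose_conv_size(s, stride=2, kernel=4) for s in up1)
    assert up2 == layer3  # skip connection: element-wise add with layer 3
    # The final upsampling by 8 restores the input resolution.
    out = tuple(transpose_conv_size(s, stride=8, kernel=16) for s in up2)
    return {"layer7": layer7, "layer4": layer4, "layer3": layer3,
            "up1": up1, "up2": up2, "output": out}
```

For an assumed 160x576 input, the layer-7 map is (5, 18), the two skip additions happen at (10, 36) and (20, 72), and the output is back at (160, 576). The skip connections only line up because each upsampling step exactly doubles the size of the coarser map.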

Data Preparation

  • The dataset was augmented by flipping all images horizontally (left to right)
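A minimal NumPy sketch of left-right flip augmentation, assuming images are stored as an (N, H, W, C) array and labels as (N, H, W, K) or (N, H, W). The key point is that the label mask is flipped together with its image so the two stay aligned. Whether the project precomputes the flipped copies like this or flips inside its batch generator is not stated; `augment_with_flips` is a hypothetical name.

```python
import numpy as np


def augment_with_flips(images, labels):
    """Double a dataset by appending a left-right mirrored copy of each sample.

    images: array of shape (N, H, W, C); labels: (N, H, W, K) or (N, H, W).
    The same flip is applied to image and label so the mask stays aligned.
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    # Axis 2 is the width axis for both NHWC images and NHW(K) labels.
    flipped_images = images[:, :, ::-1]
    flipped_labels = labels[:, :, ::-1]
    return (np.concatenate([images, flipped_images]),
            np.concatenate([labels, flipped_labels]))
```

Horizontal flips are a safe choice for road scenes because a mirrored road is still a plausible road, whereas vertical flips would produce images the model never sees at test time.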

Training

  • Batch size - 34
  • Number of epochs - 75
  • Initial learning rate - 0.0005
  • Optimizer - Adam
  • Encoder weights frozen
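To illustrate how the optimizer settings above interact with a frozen encoder, here is a NumPy sketch of one standard Adam update (Kingma and Ba) with the listed learning rate of 0.0005, where parameters marked as frozen are simply skipped. The beta and epsilon values are common defaults, not values taken from this project, and the parameter names are hypothetical. In TensorFlow 1.x one common way to get the same effect is to pass only the decoder variables through the `var_list` argument of the optimizer's `minimize`; how this repository freezes the encoder is not stated.

```python
import numpy as np


def adam_step(params, grads, state, frozen=(), lr=5e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place, skipping frozen parameters.

    params, grads: dicts mapping name -> float np.ndarray.
    state: dict holding moment estimates and the step counter; pass {} first.
    """
    t = state.get("t", 0) + 1
    state["t"] = t
    for name, value in params.items():
        if name in frozen:  # frozen (encoder) weights receive no update
            continue
        g = grads[name]
        if "m_" + name not in state:
            state["m_" + name] = np.zeros_like(value)
            state["v_" + name] = np.zeros_like(value)
        m, v = state["m_" + name], state["v_" + name]
        m[...] = beta1 * m + (1 - beta1) * g       # first moment estimate
        v[...] = beta2 * v + (1 - beta2) * g * g   # second moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        value -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params
```

On the first step the update is roughly `lr * sign(g)` for every trainable weight, which is one reason a small initial learning rate such as 0.0005 is a reasonable choice.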

Rationale

  • L2 regularization helps prevent overfitting, since the training dataset is very small
  • Frozen encoder weights - prevent overfitting on the very small training dataset and speed up training
  • Data augmentation - helps improve accuracy and prevent overfitting
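The L2 penalty (weight 1e-3) and the truncated normal initialization (standard deviation 1e-2) listed under Decoder can be written out in NumPy as below. This sketch assumes the `tf.nn.l2_loss` convention of `sum(w**2) / 2`, which is what `tf.contrib.layers.l2_regularizer` applies; the repository may sum the penalty differently. Re-drawing values that fall more than two standard deviations from the mean mirrors how TensorFlow documents its truncated normal. All function names are hypothetical.

```python
import numpy as np


def truncated_normal(shape, stddev=1e-2, rng=None):
    """Sample from N(0, stddev^2), re-drawing values beyond 2 stddev."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(w) > 2 * stddev
    while out_of_range.any():
        w[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(w) > 2 * stddev
    return w


def l2_penalty(weights, scale=1e-3):
    """scale * sum(w^2) / 2 summed over all kernels (tf.nn.l2_loss convention)."""
    return scale * sum(0.5 * np.sum(w * w) for w in weights)


def total_loss(data_loss, weights, scale=1e-3):
    """Data loss (e.g. cross-entropy) plus the L2 term that penalizes large weights."""
    return data_loss + l2_penalty(weights, scale)
```

Because the penalty grows with the squared magnitude of every transposed-convolution kernel, the optimizer is pushed toward small weights, which limits how closely the decoder can fit a small training set.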

Results

  • Loss - 0.114276
  • IoU - 0.904379
  • Weights - available here
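For reference, a NumPy sketch of a mean intersection-over-union over the two classes (0 = non-road, 1 = road), assuming integer class maps. How the reported IoU was computed is not stated: for example, `tf.metrics.mean_iou` accumulates a confusion matrix over all evaluated images before averaging, which can differ from averaging per-image scores. `mean_iou` here is a hypothetical helper.

```python
import numpy as np


def mean_iou(pred, label, num_classes=2):
    """Mean intersection-over-union over classes for integer class maps.

    pred, label: integer arrays of the same shape (0 = non-road, 1 = road).
    Classes absent from both pred and label are skipped.
    """
    pred = np.asarray(pred).ravel()
    label = np.asarray(label).ravel()
    ious = []
    for c in range(num_classes):
        p, l = pred == c, label == c
        union = np.logical_or(p, l).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, l).sum() / union)
    return float(np.mean(ious))
```

For a 2x4 example where 3 of 5 pixels in the union overlap for each class, the function returns 0.6; a perfect prediction returns 1.0.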

Reference

  1. J. Long, E. Shelhamer, T. Darrell, "Fully convolutional networks for semantic segmentation", 2014. arXiv:1411.4038
  2. K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition", 2014. arXiv:1409.1556
