In this project, you'll label the pixels of a road in images using a Fully Convolutional Network (FCN). The goal is to understand the concepts behind FCNs and to write a program that performs this pixel-level road labeling.
The results of the run are in the logsData file. The test images are in the runs folder.
Examples of the training images:
*(Examples omitted: pairs of original camera images and their manually annotated ground-truth labels.)*
A pre-trained VGG-16 network was converted to a fully convolutional network by replacing the final fully connected layer with a 1x1 convolution whose depth equals the number of desired classes (in this case, two: road and not-road). Performance is improved through skip connections: 1x1 convolutions are applied to earlier VGG layers (here, layers 3 and 4) and added element-wise to upsampled (via transposed convolution) deeper layers (i.e. the 1x1-convolved layer 7 is upsampled before being added to the 1x1-convolved layer 4). Each convolution and transposed convolution layer includes a kernel initializer and regularizer.
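The shape arithmetic behind the skip connections can be sketched as follows. This is an illustrative NumPy sketch, not the project's TensorFlow implementation: the layer shapes are hypothetical, and nearest-neighbor repetition stands in for the learned 2x transposed convolutions.

```python
import numpy as np

# Hypothetical activations after the 1x1 convolutions described above
# (shapes chosen for illustration; num_classes = 2: road / not-road).
layer7_1x1 = np.random.rand(1, 5, 18, 2)   # deepest features, coarsest grid
layer4_1x1 = np.random.rand(1, 10, 36, 2)  # VGG layer 4, 2x finer grid
layer3_1x1 = np.random.rand(1, 20, 72, 2)  # VGG layer 3, 4x finer grid

def upsample2x(x):
    """Stand-in for a learned 2x transposed convolution: simply repeats
    each cell, which preserves the shape arithmetic but not the learned
    weights of the real layer."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Skip connections: upsample the deeper layer, then add element-wise.
fused4 = upsample2x(layer7_1x1) + layer4_1x1   # shape (1, 10, 36, 2)
fused3 = upsample2x(fused4) + layer3_1x1       # shape (1, 20, 72, 2)
print(fused4.shape, fused3.shape)
```

The element-wise addition only works because each upsampling step exactly doubles the spatial resolution to match the next shallower VGG layer.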
The loss function for the network is cross-entropy, and an Adam optimizer is used.
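Concretely, the cross-entropy loss treats every pixel as an independent classification. Below is a minimal NumPy sketch of that idea; the real project uses TensorFlow's built-in softmax cross-entropy with an Adam optimizer, and the function name and values here are illustrative.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels.

    logits: (num_pixels, num_classes) raw network scores
    labels: (num_pixels, num_classes) one-hot ground truth
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

# Two example pixels, two classes (road / not-road).
logits = np.array([[2.0, 0.5], [0.1, 1.5]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = pixelwise_cross_entropy(logits, labels)
print(round(loss, 4))
```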
The hyperparameters used for training are:
- keep_prob: 0.5
- learning_rate: 1e-4
- epochs: 25
- batch_size: 16
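A skeleton of how these hyperparameters typically drive training is sketched below. The `get_batches` and `train_step` functions are stand-ins for the project's real batch generator and TensorFlow session step; this is not the project's actual code.

```python
# Hyperparameters from the list above.
EPOCHS = 25
BATCH_SIZE = 16
LEARNING_RATE = 1e-4
KEEP_PROB = 0.5  # dropout keep probability

def get_batches(batch_size):
    # Stand-in: yields (images, labels) pairs; the project reads KITTI data.
    for _ in range(3):
        yield None, None

def train_step(images, labels, keep_prob, learning_rate):
    # Stand-in for one optimizer step; returns a dummy loss.
    return 0.0

for epoch in range(EPOCHS):
    for images, labels in get_batches(BATCH_SIZE):
        loss = train_step(images, labels, KEEP_PROB, LEARNING_RATE)
    print("Epoch {}/{}: loss = {:.4f}".format(epoch + 1, EPOCHS, loss))
```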
Epoch | Execution Time | Loss |
---|---|---|
0/25 | 19.11 sec | 0.5272 |
1/25 | 35.19 sec | 0.2041 |
2/25 | 51.36 sec | 0.1692 |
3/25 | 67.36 sec | 0.1985 |
4/25 | 83.41 sec | 0.1421 |
5/25 | 99.64 sec | 0.1629 |
6/25 | 115.70 sec | 0.1123 |
7/25 | 131.74 sec | 0.5022 |
8/25 | 147.77 sec | 0.1307 |
9/25 | 163.81 sec | 0.1116 |
10/25 | 179.83 sec | 0.0880 |
11/25 | 195.88 sec | 0.1533 |
12/25 | 211.92 sec | 0.0594 |
13/25 | 227.93 sec | 0.0825 |
14/25 | 243.92 sec | 0.0720 |
15/25 | 259.95 sec | 0.0581 |
16/25 | 275.94 sec | 0.0662 |
17/25 | 291.98 sec | 0.1320 |
18/25 | 308.04 sec | 0.0940 |
19/25 | 324.01 sec | 0.0319 |
20/25 | 340.00 sec | 0.0652 |
21/25 | 355.97 sec | 0.0215 |
22/25 | 371.97 sec | 0.0393 |
23/25 | 387.94 sec | 0.0328 |
24/25 | 403.96 sec | 0.0672 |
Below are a few sample images from the output of the fully convolutional network, with the segmentation class overlaid upon the original image in green.
Make sure you have the following installed:
- Python 3.5
- TensorFlow-gpu 1.0.0
- NumPy 1.13.1
- SciPy 0.17.0
- Pillow 4.2.1
- tqdm 4.15.0

You can create a conda environment with these dependencies:

conda env create -f environment.yaml
Download the Kitti Road dataset from here. Extract the dataset into the data folder. This will create the folder data_road with all the training and test images.
Implement the code in the main.py module in the sections indicated by the "TODO" comments. The comments tagged "OPTIONAL" are not required to complete the project.
Run the project with the following command:
python main.py
Note: If running this in a Jupyter Notebook, system messages, such as those regarding test status, may appear in the terminal rather than the notebook.
- Ensure you've passed all the unit tests.
- Ensure you pass all points on the rubric.
- Submit the following in a zip file:
  - helper.py
  - main.py
  - project_tests.py
  - the newest inference images from the runs folder
A well-written README file can enhance your project and portfolio. Develop your ability to create professional README files by completing this free course.