In this project, we label the pixels of a road in images using a Fully Convolutional Network (FCN) that replicates the FCN-8s network shown in this reference. The reference is annotated to highlight the main points of the paper that lead to the implementation in main.py. Comments on the network structure in main.py link to this annotated PDF.
The results are shown in the runs directory. A few results are shown here:
Make sure you have the following installed:
The Kitti Road dataset from here was used for training and testing.
The following command can run the project:
python main.py
- The link for the frozen VGG16 model is hardcoded into helper.py. The model can be found here.
- The model is not vanilla VGG16, but a fully convolutional version, which already contains the 1x1 convolutions to replace the fully connected layers. Please see this forum post for more information. A summary of additional points follows.
- The original FCN-8s was trained in stages. The authors later uploaded a version that was trained all at once to their GitHub repo. The version in the GitHub repo has one important difference: the outputs of pooling layers 3 and 4 are scaled before they are fed into the 1x1 convolutions. Some students have found that the model learns much better with the scaling layers included. The model may not converge substantially faster, but may reach a higher IoU and accuracy.
- When adding L2 regularization, setting a regularizer in the arguments of the tf.layers functions is not enough. The regularization loss terms must be manually added to your loss function; otherwise, regularization is not applied.
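
The last point can be illustrated with a small framework-free sketch. It is not taken from main.py: the function names, the `scale` value, and the toy arrays are hypothetical, and NumPy stands in for TensorFlow so the arithmetic is easy to follow. In TensorFlow 1.x, the penalties registered through a layer's `kernel_regularizer` are collected under `tf.GraphKeys.REGULARIZATION_LOSSES` and can be summed with `tf.losses.get_regularization_loss()`; that sum plays the role of `l2_penalty` below.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over pixels.

    logits and one-hot labels both have shape (num_pixels, num_classes).
    """
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())


def l2_penalty(kernels, scale):
    """Sum of scale * (sum of squared weights) over all layer kernels."""
    return float(sum(scale * np.sum(w ** 2) for w in kernels))


def total_loss(logits, labels, kernels, scale=1e-3):
    """Data loss plus the explicitly added regularization term.

    Returning only softmax_cross_entropy(...) here would silently train
    without any regularization, which is the pitfall described above.
    """
    return softmax_cross_entropy(logits, labels) + l2_penalty(kernels, scale)


# Toy example: two pixels, two classes (road / not road), two small kernels.
logits = np.zeros((2, 2))
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
kernels = [np.ones((2, 2)), 2.0 * np.ones((1, 3))]

data_loss = softmax_cross_entropy(logits, labels)
loss = total_loss(logits, labels, kernels, scale=0.5)
print("data loss:", data_loss, "total loss:", loss)
```

The optimizer should minimize the value returned by `total_loss`, not the cross-entropy alone; the gap between the two printed numbers is the contribution of the regularizer.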