ShelfNet: implementation of a CNN model for real-time semantic segmentation


  • This is the repository for a real-time segmentation CNN that achieves both faster inference speed and higher segmentation accuracy than other real-time models such as Lightweight-RefineNet.
  • This branch runs experiments on the Cityscapes dataset; see branch pascal for experiments on the PASCAL VOC dataset.
  • This implementation is based on torch-encoding; the main difference is the structure of the model.


  • We tested ShelfNet with ResNet50 and ResNet101 backbones: they achieved 59 FPS and 42 FPS respectively on a GTX 1080Ti GPU with a 512x512 input image.
  • On the PASCAL VOC 2012 test set, ShelfNet achieved 84.2% mIoU with a ResNet101 backbone and 82.8% mIoU with a ResNet50 backbone.
  • It achieved 75.8% mIoU with a ResNet50 backbone on the Cityscapes dataset.

Differences from results reported in the paper on Cityscapes

  • The results on PASCAL VOC are the same as in the paper, but the implementation on Cityscapes differs slightly.
  • The result of ShelfNet50 differs slightly between this implementation and the paper (75.4% here vs. 75.8% in the paper).
  • The paper trains for 500 epochs, while this implementation trains for 240.
  • The paper does not use synchronized batch normalization, while this implementation synchronizes batch normalization across multiple GPUs.
  • For training on coarse-labelled data, this implementation keeps the learning rate constant at 0.01; for the paper's results, the training on coarse-labelled data used a poly decay schedule with the total epochs set to 500, but I stopped the training manually at epoch 35, so the learning rate decayed only very slightly instead of staying constant.
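The two schedules discussed above can be sketched as follows. The function names are illustrative, and the poly formula (lr = base_lr * (1 - iter/max_iter)^power with power = 0.9) is the common convention, not necessarily this repository's exact code.

```python
# Sketch of the two learning-rate schedules (illustrative names,
# standard poly-decay formula with power = 0.9).

def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Poly decay: smoothly anneals the learning rate towards 0."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

def constant_lr(base_lr):
    """Constant schedule, as used here for coarse-label training."""
    return base_lr

# Stopping poly decay at epoch 35 of a nominal 500 leaves the rate barely
# decayed: poly_lr(0.01, 35, 500) is about 0.0094, close to the 0.01 base.
```

This is why stopping the 500-epoch poly schedule at epoch 35 is nearly equivalent to a constant learning rate.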


Requirements

  • Please refer to torch-encoding for the implementation of the synchronized batch normalization layer.
  • PyTorch 0.4.1
  • Python 3.6
  • requests
  • nose
  • scipy
  • tqdm
  • Other requirements of torch-encoding.

How to run

Environment setup

  • Run the Python install script to install torch-encoding.
  • Make sure the dataset path is the same in /scripts/ and /encoding/datasets/; the default path is ~/.encoding/data, which is a hidden folder. Press Ctrl + H to show it in Files.
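As a small illustration of the default data root, this sketch resolves and creates ~/.encoding/data; the helper name is hypothetical, not part of torch-encoding.

```python
import os

# The default dataset root used by torch-encoding is a hidden folder under
# the home directory. This hypothetical helper resolves the '~' and creates
# the folder if it does not exist yet.
def ensure_data_root(root="~/.encoding/data"):
    path = os.path.expanduser(root)
    os.makedirs(path, exist_ok=True)
    return path
```

Pointing both /scripts/ and /encoding/datasets/ at the value this returns avoids the path mismatch warned about above.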

PASCAL dataset preparation

  • Run cd scripts.
  • Run the Python preparation scripts to prepare the datasets, including MS COCO, PASCAL VOC, PASCAL Aug, and PASCAL Context.
  • Download the test dataset from the official PASCAL evaluation server, then extract and merge it with the training data folder, e.g. ~/.encoding/data/VOCdevkit.

Cityscapes dataset preparation

  • The data preparation code is modified from fyu's implementation.
  • The scripts are in the folder scripts/prepare_citys.
  • Step 1: download the Cityscapes and Cityscapes Coarse datasets from the Cityscapes official website; download the required zip files and unzip them into one folder.
  • Step 2: prepare the fine-labelled dataset:
    • Convert the original segmentation ids into the 19 training ids: python3 scripts/prepare_citys/ <cityscape folder>/gtFine/
    • Run the shell script in the Cityscapes data folder, and move info.json into the data folder.
  • Step 3: prepare the coarse-labelled dataset:
    • Convert the original segmentation ids into the 19 training ids: python3 scripts/prepare_citys/ <cityscape folder>/gtCoarse/
    • Run the shell script in the Cityscapes data folder, and move info.json into the data folder.
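The id conversion in the steps above can be sketched as follows, using the standard Cityscapes label table from cityscapesScripts. Since the repository's script names are not shown here, treat this as a stand-alone illustration rather than the repo's exact code.

```python
# Standard Cityscapes raw-id -> train-id table (from cityscapesScripts):
# the 19 evaluated classes get ids 0-18, everything else is ignored.
ID_TO_TRAINID = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8,
    22: 9, 23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16,
    32: 17, 33: 18,
}
IGNORE_LABEL = 255  # pixels outside the 19 classes are ignored in training

def convert_ids(label_row):
    """Map a sequence of raw Cityscapes ids to the 19 training ids."""
    return [ID_TO_TRAINID.get(v, IGNORE_LABEL) for v in label_row]
```

The same conversion is applied to both gtFine and gtCoarse annotations.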

Configurations (refer to /experiments/)

  • --diflr: default is True. If True, the head uses a 10x larger learning rate than the backbone; otherwise the head and backbone use the same learning rate.
  • --model: which model to use; default is shelfnet, other options include pspnet, encnet, fcn.
  • --backbone: backbone of the model, resnet50 or resnet101.
  • --dataset: which dataset to train on: coco for MS COCO, pascal_aug for augmented PASCAL, pascal_voc for PASCAL VOC, pcontext for PASCAL Context.
  • --aux: if passed, the model uses an auxiliary layer, which is an FCN head based on the final block of the backbone.
  • --se_loss: a context module based on the final block of the backbone; its output shape is 1xm, where m is the number of categories. It penalizes whether a category is present or not.
  • --resume: default is None. It specifies the checkpoint to load.
  • --ft: fine-tune flag. If set, the code resumes from the checkpoint but discards the optimizer state.
  • --checkname: folder name in which to store trained weights.
  • Other parameters are trivial; please refer to /experiments/segmentation/ for more details.
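The --diflr option is typically realized with optimizer parameter groups. The sketch below (with an illustrative helper name) builds group dicts in the form torch.optim.SGD accepts, giving the head ten times the backbone's learning rate.

```python
# Sketch of how --diflr is usually implemented: two optimizer parameter
# groups, one for the backbone at base_lr and one for the head at
# 10 * base_lr. The resulting list can be passed to torch.optim.SGD.
# Helper name and argument names are illustrative.
def make_param_groups(backbone_params, head_params, base_lr, diflr=True):
    head_lr = base_lr * 10 if diflr else base_lr
    return [
        {"params": backbone_params, "lr": base_lr},
        {"params": head_params, "lr": head_lr},
    ]
```

With diflr disabled (as in the Cityscapes commands below, which pass --diflr False), both groups simply share base_lr.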

Training scripts on PASCAL VOC

  • Run cd /experiments/segmentation.
  • Pre-train ShelfNet50 on COCO:
    python --backbone resnet50 --dataset coco --aux --se-loss --checkname ShelfNet50_aux
  • Fine-tune ShelfNet50 on PASCAL_aug (double-check the path passed to --resume):
    python --backbone resnet50 --dataset pascal_aug --aux --se-loss --checkname ShelfNet50_aux --resume ./runs/coco/shelfnet/ShelfNet50_aux_se/model_best.pth.tar --ft
  • Fine-tune ShelfNet50 on PASCAL VOC (double-check the path passed to --resume):
    python --backbone resnet50 --dataset pascal_voc --aux --se-loss --checkname ShelfNet50_aux --resume ./runs/pascal_aug/shelfnet/ShelfNet50_aux_se/model_best.pth.tar --ft
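The --ft behaviour used in the commands above can be sketched as follows: weights are restored from the checkpoint, but the optimizer state (and epoch counter) is discarded. The checkpoint layout and function names here are assumptions, not the repository's exact code.

```python
# Sketch of the --ft flag's effect, assuming the usual checkpoint layout
# {'state_dict': ..., 'optimizer': ..., 'epoch': ...}: restore model
# weights, but skip the saved optimizer state and restart the epoch
# counter when fine-tuning on a new dataset.
def load_checkpoint(checkpoint, load_model_state, load_optimizer_state, ft=False):
    load_model_state(checkpoint["state_dict"])
    if not ft:
        load_optimizer_state(checkpoint["optimizer"])
        return checkpoint.get("epoch", 0)
    return 0  # fine-tuning: fresh optimizer, epoch count restarts
```

Forgetting the optimizer state matters here because momentum and learning-rate bookkeeping from COCO pre-training would otherwise carry over to the PASCAL runs.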

Training scripts on Cityscapes

  • Run cd /experiments/segmentation.
  • Pre-train ShelfNet50 on the coarse-labelled dataset:
    python --diflr False --backbone resnet50 --dataset citys_coarse --checkname ShelfNet50_citys_coarse --lr-schedule step
  • Fine-tune ShelfNet50 on the fine-labelled dataset (double-check the path passed to --resume):
    python --diflr False --backbone resnet50 --dataset citys --checkname citys_coarse --resume ./runs/citys_coarse/shelfnet/ShelfNet50_citys_coarse/model_best.pth.tar --ft

Test scripts on PASCAL VOC

  • To test on PASCAL VOC with multi-scale inputs [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]:
    python --backbone resnet50 --dataset pascal_voc --resume ./runs/pascal_voc/shelfnet/ShelfNet50_aux_se/model_best.pth.tar
  • To test on PASCAL VOC with single-scale input:
    python --backbone resnet50 --dataset pascal_voc --resume ./runs/pascal_voc/shelfnet/ShelfNet50_aux_se/model_best.pth.tar
  • Similar experiments can be performed on ShelfNet with a ResNet101 backbone, and experiments on Cityscapes can be performed by changing the dataset flag to --dataset citys.
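Multi-scale testing boils down to averaging the class scores produced at each scale. This toy sketch (pure Python, with a caller-supplied predict_at_scale stand-in) shows only the averaging step, not the actual image resizing and flipping.

```python
# Toy sketch of multi-scale testing: run the model at several scales,
# map each prediction back to the original resolution (assumed done by
# the caller here), average the per-class scores, then argmax.
SCALES = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]

def multiscale_scores(predict_at_scale, scales=SCALES):
    """Average per-class score vectors produced at each scale."""
    acc = None
    for s in scales:
        scores = predict_at_scale(s)  # list of per-class scores
        if acc is None:
            acc = list(scores)
        else:
            acc = [a + b for a, b in zip(acc, scores)]
    return [a / len(scales) for a in acc]
```

Single-scale testing is the degenerate case scales=[1.0], which is why it runs faster at a small accuracy cost.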

Evaluation scripts

  • You can use the following script to generate ground-truth/prediction pairs on the PASCAL VOC validation set:
    python --backbone resnet50 --dataset pascal_voc --resume ./runs/pascal_voc/shelfnet/ShelfNet50_aux_se/model_best.pth.tar --eval
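From such ground-truth/prediction pairs, mIoU is computed by accumulating a confusion matrix (ignoring label 255) and averaging per-class IoU. The sketch below is a plain-Python illustration of that metric, not the repository's evaluation code.

```python
# mIoU from flat lists of ground-truth and predicted labels:
# accumulate a confusion matrix over the classes, skip the ignore
# label, then average IoU = TP / (TP + FP + FN) over present classes.
def miou(gts, preds, num_classes, ignore=255):
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gts, preds):
        if g == ignore:
            continue
        conf[g][p] += 1
    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                                # missed pixels
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # false alarms
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

In practice the confusion matrix is accumulated image by image over the whole validation set before the final average.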

Measure running speed

  • Measure the running speed of ShelfNet on a 512x512 image:
    python --model shelfnet --backbone resnet101
    python --model pspnet --backbone resnet101
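A typical FPS measurement loop looks like the sketch below: warm up, then time N forward passes. With CUDA you would call torch.cuda.synchronize() before reading the clock; the helper name is illustrative.

```python
import time

# Generic FPS measurement: a few warm-up iterations (to exclude lazy
# initialization and kernel compilation), then time n_iters calls of a
# zero-argument callable and report iterations per second.
def measure_fps(run_forward, n_iters=100, warmup=10):
    for _ in range(warmup):
        run_forward()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_forward()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

The 59/42 FPS figures quoted above correspond to such a loop over 512x512 inputs on a GTX 1080Ti.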

Pre-trained weights

Structure of ShelfNet


Examples on Pascal VOC datasets

(Figure: example segmentation results on PASCAL VOC)

Video Demo on Cityscapes datasets

  • Video demo of ShelfNet50 on Cityscapes
  • Video demo of ShelfNet101 on Cityscapes

Numerical results on Pascal VOC test set

(Table: numerical results on the PASCAL VOC test set)