This repo contains a PyTorch implementation of ICNet, based on the paper by Hengshuang Zhao et al. (ECCV'18). Training and evaluation are done on the Cityscapes dataset by default.
Python 3.6 or later, with the following packages (install via `pip3 install -r requirements.txt`):
- torch==1.1.0
- torchsummary==1.5.1
- torchvision==0.3.0
- numpy==1.17.0
- Pillow==6.0.0
- PyYAML==5.1.2
- 2019.11.15: changed `crop_size=960`; the best mIoU increased to 71.0%. Training took about 2 days and produced `icnet_resnet50_197_0.710_best_model.pth`.
Method | mIoU(%) | Time(ms) | FPS | Memory(GB) | GPU |
---|---|---|---|---|---|
ICNet(paper) | 67.7% | 33ms | 30.3 | 1.6 | TitanX |
ICNet(ours) | 71.0% | 19ms | 52.6 | 1.86 | GTX 1080Ti |
- Based on the Cityscapes dataset: trained only on the training set and tested on the validation set, using a single GTX 1080Ti card; the input size in the test phase is 2048x1024x3.
- For the performance reported by the original paper, see Table 2 in the paper.
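The FPS column in the table above is just the reciprocal of the per-frame inference time, which is easy to verify:

```python
def fps_from_ms(ms_per_frame):
    """Convert per-frame inference time in milliseconds to frames per second."""
    return round(1000.0 / ms_per_frame, 1)

print(fps_from_ms(33))  # paper: 33 ms -> 30.3 FPS
print(fps_from_ms(19))  # ours:  19 ms -> 52.6 FPS
```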
image | predict |
---|---|
- All the input images come from the Cityscapes validation set; you can check the `demo/` directory for more demo results.
### 3. Training

First, modify the configuration in the `configs/icnet.yaml` file:

```yaml
train:
  specific_gpu_num: "1"   # for example: "0", "1" or "0, 1"
  train_batch_size: 7     # adjust according to gpu resources
  cityscapes_root: "/home/datalab/ex_disk1/open_dataset/Cityscapes/"
  ckpt_dir: "./ckpt/"     # ckpt and training log will be saved here
```

Then, run: `python3 train.py`
### 4. Test

First, modify the configuration in the `configs/icnet.yaml` file:

```yaml
test:
  ckpt_path: "./ckpt/icnet_resnet50_197_0.710_best_model.pth"  # set the pretrained model path correctly
```

Then, run: `python3 evaluate.py`
The structure of ICNet is mainly composed of `sub4`, `sub2`, `sub1` and `head`:

- `sub4`: basically a `pspnet`; the biggest difference is a modified `pyramid pooling module`.
- `sub2`: the first three stages of convolutional layers of `sub4`; `sub2` and `sub4` share these three stages.
- `sub1`: three consecutive strided convolutional layers, to quickly downsample the original large-size input image.
- `head`: through the `CFF` (cascade feature fusion) module, the outputs of the three cascaded branches (`sub4`, `sub2` and `sub1`) are fused. Finally, a 1x1 convolution and interpolation produce the output.
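As a rough illustration of the cascade, the sketch below (plain Python, no framework needed) computes the feature-map resolutions the three branches work at, assuming the standard ICNet downsampling factors from the paper: `sub1` reduces the full-resolution input by 8x with its three strided convolutions, while `sub2` and `sub4` start from 1/2- and 1/4-scale inputs and end up at 1/16 and 1/32 of the original size.

```python
def branch_resolutions(height, width):
    """Feature-map sizes of the three ICNet branches for a given input size.

    Assumes the standard downsampling factors: sub1 -> 1/8 (three stride-2
    convs on the full-resolution image), sub2 -> 1/16 (1/2-scale input through
    the shared early backbone stages), sub4 -> 1/32 (1/4-scale input through
    the full backbone plus pyramid pooling).
    """
    return {
        "sub1": (height // 8, width // 8),
        "sub2": (height // 16, width // 16),
        "sub4": (height // 32, width // 32),
    }

# Cityscapes test-phase input: 2048x1024
print(branch_resolutions(1024, 2048))
```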
During training, I found that the `pyramid pooling module` in `sub4` is very important: it significantly improves the performance of the network while keeping the model lightweight.
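To illustrate the idea behind that module (not this repo's actual implementation), a pyramid pooling module average-pools the feature map into several grids of different sizes (for example 1x1, 2x2, 3x3 and 6x6 in PSPNet), then upsamples and concatenates the results. The core pooling step can be sketched in plain Python:

```python
def adaptive_avg_pool2d(grid, bins):
    """Average-pool a 2D grid (list of lists) into a bins x bins grid.

    This is the core operation of a pyramid pooling module, which applies it
    at several bin sizes (e.g. 1, 2, 3, 6) and fuses the pooled maps.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for bi in range(bins):
        row = []
        for bj in range(bins):
            r0, r1 = bi * h // bins, (bi + 1) * h // bins
            c0, c1 = bj * w // bins, (bj + 1) * w // bins
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
print(adaptive_avg_pool2d(feature, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```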
The most important thing in the data preprocessing phase is to set the `crop_size` reasonably; you should set the `crop_size` as close as possible to the input size of the prediction phase. Here are my experiments:

- I set `base_size` to 520, which means the shorter side of the image is resized to between 520x0.5 and 520x2, and set `crop_size` to 480, which means a 480x480 patch is randomly cropped for training. The final best mIoU was 66.7%.
- I set `base_size` to 1024, which means the shorter side of the image is resized to between 1024x0.5 and 1024x2, and set `crop_size` to 720, which means a 720x720 patch is randomly cropped for training. The final best mIoU was 69.9%.
- Because our target dataset is Cityscapes, where the image size is 2048x1024, the larger `crop_size` (720x720) is better. I have not tried an even larger `crop_size` (such as 960x960 or 1024x1024) yet, because it would result in a very small batch size and be very time-consuming; in addition, the current mIoU is already high. But I believe a larger `crop_size` would bring higher mIoU.
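The random scale-and-crop described above can be sketched as follows (a simplified, standard-library-only illustration of the parameter choices; the repo's actual dataloader may differ in details such as padding and interpolation):

```python
import random

def random_scale_and_crop_params(img_w, img_h, base_size, crop_size, rng=random):
    """Pick a resize target and a crop window like the training augmentation.

    The shorter side is resized to a random value in [0.5 * base_size,
    2 * base_size] (aspect ratio preserved), then a crop_size x crop_size
    window is cropped at a random position.
    """
    short = rng.randint(int(base_size * 0.5), int(base_size * 2.0))
    if img_w < img_h:
        new_w, new_h = short, int(short * img_h / img_w)
    else:
        new_h, new_w = short, int(short * img_w / img_h)
    # if the resized image is smaller than the crop, it would be padded up
    new_w, new_h = max(new_w, crop_size), max(new_h, crop_size)
    x = rng.randint(0, new_w - crop_size)
    y = rng.randint(0, new_h - crop_size)
    return (new_w, new_h), (x, y, x + crop_size, y + crop_size)

# my second experiment: base_size=1024, crop_size=720 on a 2048x1024 image
size, box = random_scale_and_crop_params(2048, 1024, base_size=1024, crop_size=720)
print(size, box)
```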
In addition, I found that a small training technique can improve the performance of the model:

- Set the learning rate of `sub4` to the original initial learning rate (0.01), because it has pretrained backbone weights.
- Set the learning rate of `sub1` and `head` to 10 times the initial learning rate (0.1), because there are no pretrained weights for them.

This small training technique is really effective: it can improve the mIoU performance by 1~2 percentage points.
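In PyTorch this trick is normally implemented with optimizer parameter groups. Below is a minimal sketch with a hypothetical helper name (the repo's `train.py` may organize this differently), written framework-free with placeholder parameter lists so the learning rates are easy to check; the returned list is in the format that `torch.optim.SGD` accepts:

```python
def build_param_groups(named_parts, base_lr=0.01, scratch_mult=10.0,
                       pretrained=("sub4",)):
    """Build optimizer parameter groups for the two-learning-rate trick.

    Pretrained parts (the sub4 backbone) keep the base learning rate; parts
    trained from scratch (sub1, head, ...) get scratch_mult * base_lr.
    """
    groups = []
    for name, params in named_parts.items():
        lr = base_lr if name in pretrained else base_lr * scratch_mult
        groups.append({"params": params, "lr": lr})
    return groups

# placeholder lists stand in for real nn.Module parameters
parts = {"sub4": ["w_sub4"], "sub1": ["w_sub1"], "head": ["w_head"]}
groups = build_param_groups(parts)
print([g["lr"] for g in groups])  # [0.01, 0.1, 0.1]
```

With real modules, `parts` would map names to `module.parameters()` and the result would be passed straight to `torch.optim.SGD(groups, momentum=..., weight_decay=...)`.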
Any other questions, or mistakes I have made, can be reported in the comments section. I will reply as soon as possible.