Implementation for: Graph-Based Global Reasoning Networks (CVPR19)
- Image recognition experiments are in MXNet @92053bd
- Video and segmentation experiments are in PyTorch (0.5.0a0+783f2c6)
Train on Kinetics (single node):
./run_local.sh
Train on Kinetics (multiple nodes):
# please set up ./Host before running
./run_dist.sh
Evaluate the trained model on Kinetics:
cd test
# check $ROOT/test/*.txt for the testing log
python test-single-clip.py
Note:
- The code is adapted from MFNet (ECCV18).
- ImageNet pretrained models (R50, R101) might be required. Please put them under `$ROOT/network/pretrained/`.
- For image classification and segmentation tasks, please refer to the code below.
Model | Method | Res3 | Res4 | Code & Model | Top-1 |
---|---|---|---|---|---|
ResNet50 | Baseline | | | link | 76.2 % |
ResNet50 | w/ GloRe | | +3 | link | 78.4 % |
ResNet50 | w/ GloRe | +2 | +3 | link | 78.2 % |
SE-ResNet50 | Baseline | | | link | 77.2 % |
SE-ResNet50 | w/ GloRe | | +3 | link | 78.7 % |
Model | Method | Res3 | Res4 | Code & Model | Top-1 |
---|---|---|---|---|---|
ResNet200 | w/ GloRe | | +3 | link | 79.4 % |
ResNet200 | w/ GloRe | +2 | +3 | link | 79.7 % |
ResNeXt101 (32x4d) | w/ GloRe | +2 | +3 | link | 79.8 % |
DPN-98 | w/ GloRe | +2 | +3 | link | 80.2 % |
DPN-131 | w/ GloRe | +2 | +3 | link | 80.3 % |
* We use pre-activation[1] and strided convolution[2] for all networks for simplicity and consistency.
Model | input frames | stride | Res3 | Res4 | Model | Clip Top-1 |
---|---|---|---|---|---|---|
Res50 (3D) + Ours | 8 | 8 | +2 | +3 | link | 68.0 % |
Res101 (3D) + Ours | 8 | 8 | +2 | +3 | link | 69.2 % |
* ImageNet-1k pretrained models: R50(link), R101(link).
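The "input frames" and "stride" columns above describe how each clip is sampled from a video. As a rough illustration only, the sketch below assumes that "stride" means the gap between consecutive sampled frames; the function name `clip_frame_indices` and the `start` argument are hypothetical and are not taken from this repository's data loader.

```python
def clip_frame_indices(start, num_frames=8, stride=8):
    """Frame indices of one clip, ASSUMING `stride` is the gap between sampled frames."""
    return [start + i * stride for i in range(num_frames)]


# One clip with the settings from the table (8 frames, stride 8), starting at frame 0.
indices = clip_frame_indices(start=0)

# Number of raw video frames spanned by the clip under this assumption.
span = indices[-1] - indices[0] + 1

print(indices)
print(span)
```

Under this reading, a single 8-frame clip covers a window of several dozen raw frames, so very short videos may need looping or padding; how the actual loader handles that case is not stated here.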
Method | Backbone | Code & Model | IoU cla. | iIoU cla. | IoU cat. | iIoU cat. |
---|---|---|---|---|---|---|
FCN + 1 GloRe unit | ResNet50 | link | 79.5% | 60.3% | 91.3% | 81.5% |
FCN + 1 GloRe unit | ResNet101 | link | 80.9% | 62.2% | 91.5% | 82.1% |
* All networks are evaluated on the Cityscapes test set by the testing server, without using the extra “coarse” training set.
ImageNet-1k Training/Validation List:
- Download link: GoogleDrive
ImageNet-1k category name mapping table:
- Download link: GoogleDrive
Kinetics Dataset:
- Downloader: GitHub
Cityscapes Dataset:
- Download link: GoogleDrive
- The code is packed with the model within the same `*.tar` file.
- The `dataiter` supports reading from raw videos.
- Remove HLS augmentation (it won't make much difference), and try converting the raw videos to a lower resolution to reduce the decoding cost (we use <=288p for all experiments).
For example:
# convert to short_edge_length <= 288
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(288*iw)/min(iw\,ih)):-1" -b:v 640k -an ${DST_VID}
# or, convert to short_edge_length <= 256
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(256*iw)/min(iw\,ih)):-1" -b:v 512k -an ${DST_VID}
# or, convert to short_edge_length <= 160
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(160*iw)/min(iw\,ih)):-1" -b:v 240k -an ${DST_VID}
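The commands above can be wrapped for batch conversion. The following is a minimal sketch, not part of this repository: the helper names `scaled_width`, `ffmpeg_args`, and `convert_all`, the `.mp4` extension, and the mirrored directory layout are all assumptions made for illustration. `scaled_width` only mirrors the arithmetic of the `scale` filter expression, which shows that videos whose short edge is already at or below the target keep their width.

```python
import subprocess
from pathlib import Path


def scaled_width(iw, ih, short_edge):
    """Mirror the filter's width expression: min(iw, short_edge*iw / min(iw, ih))."""
    return min(iw, (short_edge * iw) / min(iw, ih))


def ffmpeg_args(src, dst, short_edge=288, bitrate="640k"):
    """Build the same ffmpeg invocation as the commands above, as an argument list."""
    # The backslash-escaped commas are passed through literally, as in the shell version.
    vf = f"scale=min(iw\\,({short_edge}*iw)/min(iw\\,ih)):-1"
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", "mpeg4",
            "-filter:v", vf, "-b:v", bitrate, "-an", str(dst)]


def convert_all(src_dir, dst_dir, short_edge=288, bitrate="640k"):
    """Convert every *.mp4 under src_dir (assumed extension), mirroring the tree in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for src in sorted(src_dir.rglob("*.mp4")):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(ffmpeg_args(src, dst, short_edge, bitrate), check=True)


if __name__ == "__main__":
    # A 1920x1080 source is scaled down; a 320x240 source keeps its width.
    print(scaled_width(1920, 1080, 288))
    print(scaled_width(320, 240, 288))
    print(" ".join(ffmpeg_args("in.mp4", "out.mp4")))
```

Calling `convert_all("raw_videos", "videos_288p")` (hypothetical directory names) would run one ffmpeg process per file; passing `short_edge=256, bitrate="512k"` or `short_edge=160, bitrate="240k"` reproduces the other two variants.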
[1] He, Kaiming, et al. "Identity mappings in deep residual networks." ECCV 2016.
[2] https://github.com/facebook/fb.resnet.torch
@inproceedings{chen2019graph,
title={Graph-based global reasoning networks},
author={Chen, Yunpeng and Rohrbach, Marcus and Yan, Zhicheng and Yan, Shuicheng and Feng, Jiashi and Kalantidis, Yannis},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={433--442},
year={2019}
}
The code and the models are MIT licensed, as found in the LICENSE file.