
Deep Feature Flow for Video Recognition

Abstract

Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is nontrivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup as flow computation is relatively fast. The end-to-end training of the whole architecture significantly boosts the recognition accuracy. Deep feature flow is flexible and general. It is validated on two video datasets, on object detection and semantic segmentation. It significantly advances the practice of video recognition tasks.
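
To make the idea concrete, here is a minimal, illustrative PyTorch sketch of deep feature flow inference. It is not the implementation used in this repository: feature_net, flow_net, task_head, warp_features, and key_interval below are placeholder names, and the scale map that the original paper predicts alongside the flow is omitted for brevity.

import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    # Bilinearly warp a feature map (N, C, H, W) with a flow field (N, 2, H, W),
    # where flow[:, 0] is the horizontal and flow[:, 1] the vertical displacement.
    _, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W) pixel coordinates
    coords = base.to(flow) + flow                              # absolute sampling positions
    grid_x = 2.0 * coords[:, 0] / (w - 1) - 1.0                # normalize to [-1, 1] for grid_sample
    grid_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)               # (N, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

def dff_inference(frames, feature_net, flow_net, task_head, key_interval=10):
    # Run the expensive feature network only on every key_interval-th (key) frame;
    # for the remaining frames, warp the key-frame features with an estimated flow field.
    outputs, key_frame, key_feat = [], None, None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            key_frame, key_feat = frame, feature_net(frame)    # expensive path (key frame)
            feat = key_feat
        else:
            flow = flow_net(key_frame, frame)                  # cheap path (non-key frame)
            feat = warp_features(key_feat, flow)
        outputs.append(task_head(feat))                        # detection / segmentation head
    return outputs

The speedup comes from flow_net being much cheaper than feature_net, so the per-frame cost on non-key frames is dominated by the flow computation.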

Citation

@inproceedings{zhu2017deep,
  title={Deep feature flow for video recognition},
  author={Zhu, Xizhou and Xiong, Yuwen and Dai, Jifeng and Yuan, Lu and Wei, Yichen},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={2349--2358},
  year={2017}
}

Results and models on ImageNet VID dataset

We observe fluctuations of around 1 mAP in performance between runs and provide the best model.

Method  Backbone   Style    Lr schd  Mem (GB)  Inf time (fps)  box AP@50  Config  Download
DFF     R-50-DC5   pytorch  7e       2.50      44.0            70.3       config  model | log
DFF     R-101-DC5  pytorch  7e       3.25      39.8            73.5       config  model | log
DFF     X-101-DC5  pytorch  7e       4.95      -               75.5       config  model | log

Get started

1. Training

Because hyperparameters in the default configuration file, such as the learning rate, are tuned for 8 GPUs, we recommend training with 8 GPUs to reproduce the reported accuracy. You can start training with the following command.

# The number after the config file is the number of GPUs used. Here we use 8 GPUs.
./tools/dist_train.sh \
    configs/vid/dff/dff_faster-rcnn_r50-dc5_8xb1-7e_imagenetvid.py 8
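
If you have to train with fewer GPUs, the learning rate in the default config is tuned for the setting encoded in the config name (8xb1 means 8 GPUs with 1 sample per GPU), so it usually needs to be scaled down with the effective batch size. The snippet below is only a sketch of such an override in the repository's Python config style; the file name, field layout, and base learning rate are assumptions, so check them against the actual base config.

# Hypothetical override config, e.g. dff_faster-rcnn_r50-dc5_4xb1-7e_imagenetvid.py (name is illustrative).
# Assumes the linear scaling rule: with 4 GPUs instead of 8, halve the learning rate.
_base_ = ['./dff_faster-rcnn_r50-dc5_8xb1-7e_imagenetvid.py']

# Field names follow the usual MMEngine layout (optim_wrapper.optimizer.lr);
# the value below assumes the 8-GPU base config uses lr=0.01.
optim_wrapper = dict(optimizer=dict(lr=0.005))

Pass the new config to ./tools/dist_train.sh together with the reduced GPU count.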

For more detailed usage of train.py, dist_train.sh, and slurm_train.sh, please refer to this document.

2. Testing and evaluation

# The number after the config file is the number of GPUs used. Here we use 8 GPUs.
./tools/dist_test.sh \
    configs/vid/dff/dff_faster-rcnn_r50-dc5_8xb1-7e_imagenetvid.py 8 \
    --checkpoint ./checkpoints/dff_faster_rcnn_r50_dc5_1x_imagenetvid_20201227_213250-548911a4.pth

3. Inference

Use a single GPU to run prediction on a video and save the result as a video.

python demo/demo_vid.py \
    configs/vid/dff/dff_faster-rcnn_r50-dc5_8xb1-7e_imagenetvid.py \
    --checkpoint ./checkpoints/dff_faster_rcnn_r50_dc5_1x_imagenetvid_20201227_213250-548911a4.pth \
    --input demo/demo.mp4 \
    --output vid.mp4

For more detailed usage of demo_vid.py, please refer to this document.