Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos). It is worth noting that:
- We propose a unified framework, LSFA, for video object detection that addresses both detection accuracy and speed.
- In LSFA, the short-term feature aggregation method is the first to use features extracted from the original image to enhance the propagated features for non-key frames.
- On the challenging ImageNet VID dataset, LSFA runs in real time (30 FPS) with detection accuracy on par with state-of-the-art results.
For fair comparison, all reported testing speeds were measured on a Titan X GPU.
- MXNet from the official repository. We tested our code on MXNet@(commit 75a9e187d).
- Python 2.7. We recommend Anaconda2, as it already includes many common packages. Python 3 is not supported yet; to use Python 3 you will need to modify the code.
- Python packages that might be missing: cython, opencv-python >= 3.2.0, easydict. If pip is set up on your system, these packages can be fetched and installed by running
    pip install Cython
    pip install opencv-python==3.2.0.6
    pip install easydict==1.6
- We use ffmpeg 3.1.3 to generate mpeg4 raw videos.
- We build coviar.so to load the compressed representation (I-frame, motion vectors, or residual).
Any NVIDIA GPU with at least 8 GB of memory should work.
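Before installing, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch, not part of the LSFA code base; the helper name `missing_modules` is hypothetical, and the import names (`Cython`, `cv2`, `easydict`) are the usual import names for the listed pip packages.

```python
import importlib

# Import names for the packages listed above. The import name can differ
# from the pip name: opencv-python is imported as cv2.
REQUIRED_MODULES = ["Cython", "cv2", "easydict"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only checks that the modules import; it does not verify the versions named above.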
- Clone the LSFA repository. We refer to the cloned directory as ${LSFA_ROOT}.
    git clone https://github.com/hustvl/LSFA.git
- For Linux users, run sh ./init.sh. The script builds the Cython module automatically and creates some folders.
- Install MXNet:
3.1 Clone MXNet and check out MXNet@(commit 75a9e187d) by
    git clone --recursive https://github.com/apache/incubator-mxnet
    cd incubator-mxnet
    git checkout 75a9e187d
    git submodule update
3.2 Copy the operators in ${LSFA_ROOT}/dff_rfcn/operator_cxx or ${LSFA_ROOT}/rfcn/operator_cxx to ${MXNET_ROOT}/src/operator/contrib by
    cp -r ${LSFA_ROOT}/dff_rfcn/operator_cxx/* ${MXNET_ROOT}/src/operator/contrib/
3.3 Compile MXNet
    cd ${MXNET_ROOT}
    make -j4
3.4 Install the MXNet Python binding by
    cd python
    sudo python setup.py install
Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4.
3.5 For advanced users, you may put your Python package into ./external/mxnet/${YOUR_MXNET_PACKAGE} and set MXNET_VERSION in ./experiments/dff_rfcn/cfgs/*.yaml to ${YOUR_MXNET_PACKAGE}. This lets you switch among different versions of MXNet quickly.
- Install ffmpeg:
4.1 Clone ffmpeg and check out ffmpeg@(commit 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2) by
    git clone https://github.com/FFmpeg/FFmpeg.git
    cd FFmpeg
    git checkout 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2
4.2 Compile ffmpeg
    make clean
    ./configure --prefix=${FFMPEG_INSTALL_PATH} --enable-pic --disable-yasm --enable-shared
    make
    make install
4.3 If needed, add ${FFMPEG_INSTALL_PATH}/lib/ to $LD_LIBRARY_PATH.
- Build coviar_py2.so
    cd ${LSFA_ROOT}/external/data_loader_py2
    sh install.sh
    cp ./build/lib.linux-x86_64-2.7/coviar_py2.so ${LSFA_ROOT}/lib
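Once the steps above are done, a quick sanity check can catch the two most common leftovers: the compiled loader not being copied into the lib folder, and the ffmpeg library directory missing from LD_LIBRARY_PATH. The sketch below is an illustration only; the function `installation_problems` is hypothetical and not part of the repository.

```python
import os


def installation_problems(lsfa_root, ffmpeg_lib_dir, ld_library_path):
    """Return a list of human-readable problems; empty if none were found.

    lsfa_root       -- the cloned repository, i.e. ${LSFA_ROOT}
    ffmpeg_lib_dir  -- ${FFMPEG_INSTALL_PATH}/lib
    ld_library_path -- the value of $LD_LIBRARY_PATH (may be empty)
    """
    problems = []

    # The build step copies coviar_py2.so into ${LSFA_ROOT}/lib.
    so_path = os.path.join(lsfa_root, "lib", "coviar_py2.so")
    if not os.path.isfile(so_path):
        problems.append("missing " + so_path)

    # Compare normalized paths so that a trailing slash does not matter.
    entries = [os.path.normpath(p)
               for p in ld_library_path.split(os.pathsep) if p]
    if os.path.normpath(ffmpeg_lib_dir) not in entries:
        problems.append(ffmpeg_lib_dir + " is not on LD_LIBRARY_PATH")

    return problems
```

A typical call would pass `os.environ.get("LD_LIBRARY_PATH", "")` as the last argument. The check does not confirm that MXNet itself was built correctly.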
- Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure the directory layout looks like this:
    ./data/ILSVRC2015/
    ./data/ILSVRC2015/Annotations/DET
    ./data/ILSVRC2015/Annotations/VID
    ./data/ILSVRC2015/Data/DET
    ./data/ILSVRC2015/Data/VID
    ./data/ILSVRC2015/ImageSets
- Use ffmpeg to generate mpeg4 raw videos.
    sh ./data/reencode_vid ./data/ILSVRC2015/Data/VID/snippets ./data/ILSVRC2015/Data/VID/mpeg4_snippets
- For your convenience, we provide the trained models and the pretrained model on Baidu Yun (pwd: 493a). Put the pretrained model under the folder ./model and the trained models under the folder ./output.
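The dataset layout listed above can be verified before training starts. This is a minimal sketch under the assumption that the dataset root is ./data/ILSVRC2015 relative to the repository; the helper `missing_dataset_dirs` is hypothetical and only checks that the directories exist, not their contents.

```python
import os

# Sub-directories expected under ./data/ILSVRC2015, as listed above.
EXPECTED_DIRS = [
    "Annotations/DET",
    "Annotations/VID",
    "Data/DET",
    "Data/VID",
    "ImageSets",
]


def missing_dataset_dirs(dataset_root):
    """Return the expected sub-directories that are absent under dataset_root."""
    missing = []
    for rel in EXPECTED_DIRS:
        path = os.path.join(dataset_root, *rel.split("/"))
        if not os.path.isdir(path):
            missing.append(rel)
    return missing
```

For example, `missing_dataset_dirs("./data/ILSVRC2015")` returns an empty list when the layout is complete.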
- All of our experiment settings (GPU #, dataset, etc.) are kept in yaml config files in the folder ./experiments/{rfcn/dff_rfcn}/cfgs.
- Two config files have been provided so far: Frame baseline with R-FCN, and LSFA with R-FCN, both for ImageNet VID. We use 4 GPUs to train models on ImageNet VID.
- To perform experiments, run the Python script with the corresponding config file as input. For example, to train and test LSFA with R-FCN, use the following command:
    python experiments/dff_rfcn/dff_rfcn_end2end_train_test.py --cfg experiments/dff_rfcn/cfgs/resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml
  A cache folder is created automatically under output/dff_rfcn/imagenet_vid/ to save the model and the log.
- Please find more details in the config files and in our code.
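When launching experiments from another script, the command shown above can be assembled programmatically. The sketch below only builds the argument list; the function `train_test_command` is hypothetical, and the script and config paths are the ones quoted in the command above.

```python
import sys

# Paths relative to ${LSFA_ROOT}, taken from the example command above.
TRAIN_TEST_SCRIPT = "experiments/dff_rfcn/dff_rfcn_end2end_train_test.py"
CFG_DIR = "experiments/dff_rfcn/cfgs"


def train_test_command(cfg_name, python_exe=None):
    """Build the argument list that trains and tests with the given config.

    cfg_name   -- file name of a yaml config inside CFG_DIR
    python_exe -- interpreter to use; defaults to the current one
    """
    cfg_path = CFG_DIR + "/" + cfg_name
    return [python_exe or sys.executable, TRAIN_TEST_SCRIPT, "--cfg", cfg_path]
```

The resulting list can be passed to `subprocess.check_call(cmd, cwd=lsfa_root)`, where `lsfa_root` is the cloned repository, so that the relative paths resolve as they do on the command line.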
The code of LSFA is based on
Thanks for the contribution of the above repositories.