
Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation

Introduction

Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos). It is worth noting that:

  • We propose LSFA, a unified framework for video object detection that addresses both detection accuracy and speed.
  • In LSFA, the short-term feature aggregation method is the first to use features extracted from the original image to enhance the propagated features of non-key frames.
  • On the challenging ImageNet VID dataset, LSFA runs in real time (30 FPS) with detection accuracy on par with the state of the art.

Figure: overview of the LSFA framework.
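
To make the pipeline concrete, here is a toy, runnable sketch of the long short-term idea as we read it: key frames run the full feature network, while non-key frames reuse the key-frame features warped by motion cues from the compressed stream and enhance them with a cheap feature of the current frame. All names and shapes below are placeholders, not this repository's API.

    import numpy as np

    # Toy stand-ins for the real networks (placeholders, not this repo's code).
    def heavy_feature_net(frame):           # full backbone, run on key frames only
        return frame.astype(np.float32)

    def light_feature_net(frame):           # cheap per-frame feature for non-key frames
        return 0.5 * frame.astype(np.float32)

    def warp(feat, motion):                 # propagate key-frame features via motion cues
        return np.roll(feat, shift=motion, axis=(0, 1))

    def aggregate(propagated, short_term):  # short-term feature aggregation
        return 0.5 * (propagated + short_term)

    def detect(frames, key_interval=10):
        outputs, key_feat = [], None
        for i, frame in enumerate(frames):
            if i % key_interval == 0:
                key_feat = heavy_feature_net(frame)   # expensive, but sparse in time
                feat = key_feat
            else:
                motion = (0, 0)  # would come from the compressed motion vectors
                feat = aggregate(warp(key_feat, motion), light_feature_net(frame))
            outputs.append(float(feat.max()))         # stand-in for the R-FCN head
        return outputs

    print(detect([np.full((4, 4), i) for i in range(20)]))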

Main Results

Figure: main results of LSFA on ImageNet VID.

For fair comparison, all testing speeds are measured on a Titan X GPU.

Requirements: Software

  1. MXNet from the official repository. We tested our code on MXNet@(commit 75a9e187d).

  2. Python 2.7. We recommend using Anaconda2, as it already includes many common packages. We do not support Python 3 yet; if you want to use Python 3, you will need to modify the code yourself.

  3. Python packages that might be missing: cython, opencv-python >= 3.2.0, easydict. If pip is set up on your system, these packages can be fetched and installed by running

    pip install Cython
    pip install opencv-python==3.2.0.6
    pip install easydict==1.6
    
  4. We use ffmpeg 3.1.3 to generate MPEG-4 raw videos.

  5. We build coviar_py2.so to load the compressed representation (I-frames, motion vectors, or residuals).

Requirements: Hardware

Any NVIDIA GPU with at least 8GB of memory should suffice.

Installation

  1. Clone the LSFA repository; we'll refer to the directory you cloned LSFA into as ${LSFA_ROOT}.

    git clone https://github.com/hustvl/LSFA.git
    
  2. For Linux users, run sh ./init.sh. The script will build the Cython modules automatically and create some folders.

  3. Install MXNet:

    3.1 Clone MXNet and check out MXNet@(commit 75a9e187d) by

    git clone --recursive https://github.com/apache/incubator-mxnet
    cd incubator-mxnet
    git checkout 75a9e187d
    git submodule update
    

    3.2 Copy the operators in $(LSFA_ROOT)/dff_rfcn/operator_cxx or $(LSFA_ROOT)/rfcn/operator_cxx to $(YOUR_MXNET_FOLDER)/src/operator/contrib by

    cp -r $(LSFA_ROOT)/dff_rfcn/operator_cxx/* $(MXNET_ROOT)/src/operator/contrib/
    

    3.3 Compile MXNet

    cd ${MXNET_ROOT}
    make -j4
    

    3.4 Install the MXNet Python binding by

    Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4

    cd python
    sudo python setup.py install
    

    3.5 For advanced users, you may put your Python package into ./external/mxnet/$(YOUR_MXNET_PACKAGE) and modify MXNET_VERSION in ./experiments/dff_rfcn/cfgs/*.yaml to $(YOUR_MXNET_PACKAGE). This lets you switch among different versions of MXNet quickly, as sketched below.
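
    Presumably this works by prepending the chosen package directory to Python's module search path before MXNet is imported; the following is a minimal sketch under that assumption (the package name is hypothetical):

    import os
    import sys

    # Hypothetical package name; in practice it is whatever you set
    # MXNET_VERSION to in the yaml config.
    MXNET_VERSION = 'mxnet_75a9e187d'
    sys.path.insert(0, os.path.join('./external/mxnet', MXNET_VERSION))

    import mxnet as mx  # now resolves to the selected build
    print(mx.__file__)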

  4. Install ffmpeg:

    4.1 Clone ffmpeg and check out ffmpeg@(commit 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2) by

    git clone https://github.com/FFmpeg/FFmpeg.git
    cd FFmpeg
    git checkout 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2
    

    4.2 Compile ffmpeg

    make clean
    ./configure --prefix=${FFMPEG_INSTALL_PATH} --enable-pic --disable-yasm --enable-shared
    make
    make install
    

    4.3 If needed, add ${FFMPEG_INSTALL_PATH}/lib/ to $LD_LIBRARY_PATH.

  5. Build coviar_py2.so

    cd $(LSFA_ROOT)/external/data_loader_py2
    sh install.sh
    cp ./build/lib.linux-x86_64-2.7/coviar_py2.so $(LSFA_ROOT)/lib
    
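
    To sanity-check the build, try loading a compressed representation from a re-encoded video. The sketch below assumes the loader follows the CoViAR-style interface load(video, gop_index, frame_index, representation_type, accumulate), with representation types 0/1/2 for I-frame, motion vectors, and residual; treat the exact signature as an assumption and check external/data_loader_py2 for the authoritative one.

    import coviar_py2

    # Hypothetical path to one of the re-encoded MPEG-4 snippets.
    video = './data/ILSVRC2015/Data/VID/mpeg4_snippets/example.mp4'

    # Assumed CoViAR-style call: (video, gop_index, frame_index, type, accumulate);
    # type 1 = motion vectors, which typically come back as an H x W x 2 array.
    mv = coviar_py2.load(video, 0, 5, 1, True)
    print(mv.shape)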

Preparation for Training & Testing

  1. Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure the directory layout looks like this:

    ./data/ILSVRC2015/
    ./data/ILSVRC2015/Annotations/DET
    ./data/ILSVRC2015/Annotations/VID
    ./data/ILSVRC2015/Data/DET
    ./data/ILSVRC2015/Data/VID
    ./data/ILSVRC2015/ImageSets
    
  2. Use ffmpeg to generate the MPEG-4 raw videos:

    sh ./data/reencode_vid ./data/ILSVRC2015/Data/VID/snippets ./data/ILSVRC2015/Data/VID/mpeg4_snippets 
    
  3. For your convenience, we provide the trained models and the pretrained model on Baidu Yun (pwd: 493a). Put the pretrained model under the folder ./model, and put the trained models under the folder ./output.

Usage

  1. All of our experiment settings (GPU #, dataset, etc.) are kept in yaml config files under the folder ./experiments/{rfcn/dff_rfcn}/cfgs; a sketch of how such configs are typically loaded follows this list.

  2. Two config files have been provided so far: the frame baseline with R-FCN, and LSFA with R-FCN for ImageNet VID. We use 4 GPUs to train models on ImageNet VID.

  3. To run an experiment, invoke the Python script with the corresponding config file as input. For example, to train and test LSFA with R-FCN, use the following command:

    python experiments/dff_rfcn/dff_rfcn_end2end_train_test.py --cfg experiments/dff_rfcn/cfgs/resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml
    

    A cache folder will be created automatically to save the model and the log under output/dff_rfcn/imagenet_vid/.

  4. Please find more details in the config files and in our code.
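
A minimal sketch of how such yaml configs are commonly consumed with easydict, as referenced in item 1 above. The printed keys are partly assumptions (MXNET_VERSION appears in step 3.5 of the installation; gpus is hypothetical); see the cfgs and the code for the actual schema.

    import yaml
    from easydict import EasyDict as edict

    # Load an experiment config into attribute-style access (illustration only).
    cfg_path = ('experiments/dff_rfcn/cfgs/'
                'resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml')
    with open(cfg_path) as f:
        cfg = edict(yaml.safe_load(f))

    print(cfg.MXNET_VERSION)  # key referenced in step 3.5
    print(cfg.gpus)           # hypothetical key, for illustration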

Acknowledgement

The code of LSFA is based on several existing open-source repositories. Thanks for the contributions of those repositories.
