This repo contains training and testing code for:
Zehuan Yuan, Jonathan Stroud, Tong Lu, and Jia Deng. Temporal Action Localization by Structured Maximal Sums. CVPR 2017.
Our code requires the following dependencies:
- leveldb-matlab
- denseflow
- cuDNN 5.0
This repo includes a modified version of Caffe, so Caffe itself is not a required dependency. However, all of Caffe's dependencies are required.
We recommend a local installation of all dependencies.
GPU(s) are required for optical flow extraction, model training, and testing.
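Before building, it can help to confirm the GPU stack is visible. These are generic checks, not specific to this repo; adjust the CUDA path to your installation:

```bash
# Generic sanity checks; the header path assumes a default CUDA install.
nvidia-smi                                                       # lists visible GPUs
grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda/include/cudnn.h   # expect major version 5
```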
- Clone this repository:

```bash
git clone git@github.com:shallowyuan/struct-max-sum.git
```
- Rename `Makefile.config.example` to `Makefile.config` and edit it following the Caffe installation instructions. Note: our code is only tested on opencv-2.4.13.
- Make the training code:
```bash
cmake .
make all -j8
make matcaffe
```
- Make the testing code:

```bash
cd action_matlab/tools
make clean
make all
```
- Additionally, install leveldb-matlab in `action_matlab` to make testing work.
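After building, a quick way to confirm MatCaffe and leveldb-matlab are reachable from MATLAB is a path check like the one below. This is illustrative only: `matlab/` is where Caffe's `make matcaffe` places its bindings by default, and your leveldb-matlab location may differ.

```matlab
% Illustrative sanity check -- adjust paths to your checkout.
addpath('matlab');          % MatCaffe, produced by `make matcaffe`
addpath('action_matlab');   % leveldb-matlab should be installed here
caffe.set_mode_gpu();       % errors here usually mean MatCaffe did not build
caffe.set_device(0);        % select GPU 0
```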
Note: our code trains and evaluates on THUMOS14. To train or test on other datasets, follow the instructions in `data` to prepare the database. All videos are extracted into individual frames and stored using LevelDB. Dense optical flow is extracted using denseflow.
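For reference, frame and flow extraction with a dense_flow-style tool looks roughly like the sketch below. This is an assumption based on yjxiong's dense_flow; the binary name and flags vary across denseflow builds, so check the README of the version you installed.

```bash
# Sketch only: flags follow yjxiong/dense_flow's GPU extractor and may
# differ in your build.
#   -f input video; -x/-y/-i output prefixes for flow_x/flow_y/RGB frames;
#   -b flow bound (magnitudes clipped to [-b, b] before quantization);
#   -t 1 selects TV-L1 flow; -d GPU id; -o dir writes individual images.
./extract_gpu -f=video.avi -x=out/flow_x -y=out/flow_y -i=out/img \
    -b=20 -t=1 -d=0 -o=dir
```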
- Download our pretrained models:

```bash
cd data/models
bash ../../get_reference_models.sh
```
- Extract confidence scores. Our model first extracts confidence scores for each video, which are saved into `action_matlab/tools/testmat`. The inputs `model_def` and `type` to `model_inference` differentiate between the flow and RGB streams.

```
cd action_matlab/tools
mkdir testmat
matlab
model_inference(model, model_def, type)
```
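For example, a flow-stream call might look like the following. The model and prototxt filenames here are placeholders; substitute the files fetched by `get_reference_models.sh` and the matching deploy prototxt.

```matlab
% Placeholder paths -- adjust to the actual downloaded model files.
% The string 'flow' is assumed here; check model_inference for the
% accepted type values.
model     = 'data/models/flow_reference.caffemodel';
model_def = 'models/action_detection/deploy_flow.prototxt';
model_inference(model, model_def, 'flow');   % scores are written to testmat/
```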
- Get final predictions:

```bash
matlab < sliding_infer.m
```
- Evaluate performance. Detections from `sliding_infer` are stored in `result.txt`. We evaluate the performance of these detections using the official THUMOS14 metric:

```
matlab
TH14evalDet(<detfilename>,<gtpath>,<subset>,<threshold>)
```
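For instance, to score the detections at a temporal IoU threshold of 0.5 (the ground-truth path and subset name below are placeholders; use the ones from the official THUMOS14 evaluation toolkit):

```matlab
% Placeholder arguments -- point the second argument at the THUMOS14
% ground-truth annotation directory shipped with the official toolkit.
TH14evalDet('result.txt', 'annotation', 'test', 0.5);
```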
- Follow the instructions in `data` to prepare the dataset.
- Download our initialization models:
```bash
cd data/models
bash ../scripts/get_init_models.sh
```
- [Optional] Reconfigure the models. We provide the necessary configurations for the models in the paper, as well as the pre-trained models used for initialization. To try different configurations, change the prototxt parameter `structsvm_loss_parameter` in `models/action_detection`.
- Train the RGB model:

```bash
bash scripts/train.sh
```
Models will be saved in `models`.
Note: training has only been tested on one GPU.
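If your machine has multiple GPUs, a standard way to keep training pinned to a single device is to mask the others with `CUDA_VISIBLE_DEVICES` (generic CUDA behavior, not specific to this repo):

```bash
# Expose only GPU 0 to the training script; multi-GPU training is untested.
CUDA_VISIBLE_DEVICES=0 bash scripts/train.sh
```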
Please cite our paper if you use this repository in your research. BibTeX:

```
@inproceedings{yuan2016temporal,
  title={Temporal Action Localization by Structured Maximal Sums},
  author={Yuan, Zehuan and Stroud, Jonathan and Lu, Tong and Deng, Jia},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}
```
Please direct any questions to Zehuan Yuan (zhyuan001@gmail.com).
A large portion of the code is based on Caffe, made possible by the Berkeley Vision and Learning Center and many other contributors.