Temporal Relation Networks

We release the code for Temporal Relation Networks (TRN), built on top of the TSN-pytorch codebase.

NEW (July 29, 2018): This work has been accepted to ECCV'18; check the paper for the latest results. We also release the state-of-the-art model trained on Something-Something V2; see the instructions below.

Note: always use git clone --recursive https://github.com/metalbubble/TRN-pytorch to clone this project. Otherwise you will not be able to use the Inception series of CNN architectures.

[Figure: framework]

Data preparation

Download the Something-Something, Jester, or Charades dataset and decompress it into a folder of your choice. Use process_dataset.py to generate the index files for the train, val, and test splits. Finally, set the paths to the train, validation, and category meta files in datasets_video.py.
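
As a rough illustration of what an index file might contain, the sketch below writes one line per video in the form "frame_folder num_frames label_id". That layout follows the TSN-style list-file convention and is an assumption here; the helper name build_index_lines and the sample data are hypothetical, and process_dataset.py remains the authoritative reference for the actual format.

```python
def build_index_lines(videos, categories):
    """Return one 'folder num_frames label_id' line per video.

    videos: iterable of (frame_folder, num_frames, category_name) tuples.
    categories: list of category names; the list position is used as label id.
    """
    label_of = {name: idx for idx, name in enumerate(categories)}
    lines = []
    for folder, num_frames, category in videos:
        lines.append("%s %d %d" % (folder, num_frames, label_of[category]))
    return lines


# Made-up example data, not taken from any real dataset split.
categories = ["Holding something", "Throwing something"]
videos = [
    ("1001", 42, "Throwing something"),
    ("1002", 30, "Holding something"),
]

index_lines = build_index_lines(videos, categories)
print("\n".join(index_lines))
```

Writing index_lines to a text file, one entry per line, would then give a file that a TSN-style data loader could parse by splitting each line on whitespace.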

For Something-Something-V2, we provide a utility script, extract_frames.py, for converting the downloaded .webm videos into directories of extracted frames. The corresponding optical flow images can be downloaded from here.
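
For readers who want to see the general shape of such a conversion, here is a minimal sketch that builds (but does not run) an ffmpeg command for one video. The function name ffmpeg_extract_command, the example paths, and the %06d.jpg naming pattern are assumptions for illustration; extract_frames.py may use different options and file names.

```python
import os


def ffmpeg_extract_command(video_path, out_root):
    """Build an ffmpeg command that dumps the frames of one video as JPEGs.

    Returns the per-video output directory and the argument list.
    """
    video_id = os.path.splitext(os.path.basename(video_path))[0]
    out_dir = os.path.join(out_root, video_id)
    # Assumed naming pattern: 000001.jpg, 000002.jpg, ...
    pattern = os.path.join(out_dir, "%06d.jpg")
    return out_dir, ["ffmpeg", "-i", video_path, pattern]


# Hypothetical input and output locations.
out_dir, cmd = ffmpeg_extract_command("videos/12345.webm", "frames")
print(out_dir)
print(" ".join(cmd))
```

To actually extract frames, one would create out_dir first (for example with os.makedirs(out_dir, exist_ok=True)) and then pass cmd to subprocess.run(cmd, check=True).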

Code

The core code implementing the Temporal Relation Network module is in TRNmodule.py. It is plug-and-play on top of TSN.
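
To give a feel for what the module reasons over: as described in the paper, a d-frame relation is computed over ordered d-frame subsets of the sampled frames, and the multi-scale variant accumulates relations over several values of d. The sketch below only enumerates those ordered frame-index tuples using the standard library. The function names frame_relations and multiscale_relations are hypothetical, and the actual TRNmodule.py may subsample the tuples or organize them differently.

```python
from itertools import combinations


def frame_relations(num_frames, scale):
    """All ordered index tuples (i < j < ...) of length `scale`."""
    return list(combinations(range(num_frames), scale))


def multiscale_relations(num_frames):
    """Map each scale d = 2..num_frames to its ordered frame tuples."""
    return {d: frame_relations(num_frames, d) for d in range(2, num_frames + 1)}


relations = multiscale_relations(4)
for d, tuples in relations.items():
    # Show the scale, how many tuples it has, and the first few tuples.
    print(d, len(tuples), tuples[:3])
```

In a full model, the features of the frames in each tuple would be concatenated and passed through a small network, and the per-tuple outputs summed within each scale; that part is omitted here.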

Training and Testing

  • The command to train single-scale TRN
CUDA_VISIBLE_DEVICES=0,1 python main.py something RGB \
                     --arch BNInception --num_segments 3 \
                     --consensus_type TRN --batch-size 64
  • The command to train multi-scale TRN
CUDA_VISIBLE_DEVICES=0,1 python main.py something RGB \
                     --arch BNInception --num_segments 8 \
                     --consensus_type TRNmultiscale --batch-size 64
  • The command to test the single-scale TRN
python test_models.py something RGB model/TRN_something_RGB_BNInception_TRN_segment3_best.pth.tar \
   --arch BNInception --crop_fusion_type TRN --test_segments 3
  • The command to test the multi-scale TRN
python test_models.py something RGB model/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar \
   --arch BNInception --crop_fusion_type TRNmultiscale --test_segments 8
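
The --num_segments and --test_segments options in the commands above control how many frames are sampled per video. In the TSN-style scheme this codebase builds on, a video is split into equal-length segments and one frame is taken from each. A minimal sketch of the deterministic, center-of-segment variant follows. The function name segment_center_indices is hypothetical, and the real sampling in dataset.py may differ in details such as random offsets during training or 1-based frame numbering.

```python
def segment_center_indices(num_frames, num_segments):
    """Pick the middle frame (0-based) of each of `num_segments` equal segments."""
    if num_frames < num_segments:
        raise ValueError("video has fewer frames than segments")
    tick = num_frames / float(num_segments)  # segment length, possibly fractional
    return [int(tick / 2.0 + tick * k) for k in range(num_segments)]


indices_3 = segment_center_indices(30, 3)
indices_8 = segment_center_indices(80, 8)
print(indices_3)
print(indices_8)
```

Sampling sparsely like this keeps the cost per video fixed regardless of its length, which is why the number of segments at test time is normally chosen to match the value used in training.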

Pretrained models and demo code

  • Download the pretrained models.
cd pretrain
./download_models.sh
  • Download the sample video and extracted frames. This yields an mp4 video file and a folder containing the RGB frames for that video.
cd sample_data
./download_sample_data.sh

The sample video is the following:

[Image: result]

  • Test pretrained model trained on Something-Something-V2
python test_video.py --arch BNInception --dataset somethingv2 \
    --weight pretrain/TRN_somethingv2_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar \
    --frame_folder sample_data/bolei_juggling

RESULT ON sample_data/bolei_juggling
0.500 -> Throwing something in the air and catching it
0.141 -> Throwing something in the air and letting it fall
0.072 -> Pretending to throw something
0.024 -> Throwing something
0.024 -> Hitting something with something
  • Test pretrained model trained on Moments
python test_video.py --arch InceptionV3 --dataset moments \
    --weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar \
    --frame_folder sample_data/bolei_juggling

RESULT ON sample_data/bolei_juggling

0.982 -> juggling
0.003 -> flipping
0.003 -> spinning
0.003 -> smoking
0.002 -> whistling
  • Test pretrained model on mp4 video file
python test_video.py --arch InceptionV3 --dataset moments \
    --weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar \
    --video_file sample_data/bolei_juggling.mp4 --rendered_output sample_data/predicted_video.mp4

The command above uses ffmpeg to extract frames from the video supplied via --video_file. If --rendered_output is given, it also generates a new video from the frames used to make the prediction, with the predicted category overlaid in the top-left corner.
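
The ranked lists printed under RESULT ON above pair a probability with a category name. As a sketch of how such a listing can be produced from raw class scores, the snippet below applies a softmax and prints the top entries in the same "probability -> category" format. The function names, category list, and scores are made up for illustration; test_video.py may compute and format its output differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_lines(logits, categories, k=5):
    """Format the k most probable categories as 'prob -> name' lines."""
    probs = softmax(logits)
    ranked = sorted(zip(probs, categories), key=lambda pair: pair[0], reverse=True)
    return ["%.3f -> %s" % (p, name) for p, name in ranked[:k]]


# Hypothetical categories and scores, not real model output.
categories = ["juggling", "flipping", "spinning"]
logits = [5.0, 0.0, -1.0]

lines = top_k_lines(logits, categories, k=2)
print("\n".join(lines))
```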

TODO

  • TODO: Web-cam demo script
  • TODO: Visualization script
  • TODO: class-aware data augmentation

Reference:

B. Zhou, A. Andonian, A. Oliva, and A. Torralba. Temporal Relational Reasoning in Videos. European Conference on Computer Vision (ECCV), 2018. PDF

@article{zhou2017temporalrelation,
    title = {Temporal Relational Reasoning in Videos},
    author = {Zhou, Bolei and Andonian, Alex and Oliva, Aude and Torralba, Antonio},
    journal={European Conference on Computer Vision},
    year={2018}
}

Acknowledgement

Our temporal relation network is plug-and-play on top of TSN-Pytorch, but it could easily be extended to other network architectures. We thank Yuanjun Xiong for releasing the TSN-Pytorch codebase. The Something-Something and Jester datasets are from TwentyBN; we really appreciate their effort in building such nice video datasets. Please refer to their dataset website for the proper usage of the data.