Code for the paper "Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos"

Spatio-Temporal Modeling

PyTorch implementation for the paper "Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos". In this work, different 'Spatiotemporal Modeling Blocks' are analyzed for the architecture illustrated in the accompanying figure.

Maintainers: Okan Köpüklü and Fabian Herzog

The structure was inspired by the TRN-pytorch project.

Results and Pretrained Models

The pretrained models can be found in our Google Drive.

Setup

Clone the repo with the following command:

git clone git@github.com:fubel/stmodeling.git

Setup in virtual environment

The project requirements can be found in the file requirements.txt. To run the code, create a Python >= 3.6 virtual environment and install the requirements with

pip install -r requirements.txt

NOTE: This project assumes that you have a GPU with CUDA support.
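
Before training, it can help to confirm that the environment matches the requirements above. The following is a minimal sketch, not part of the repository: the helper name check_environment is hypothetical, and it assumes only that PyTorch, if installed, exposes torch.cuda.is_available().

```python
import importlib
import sys


def check_environment(min_python=(3, 6)):
    """Report whether the interpreter and (if installed) PyTorch/CUDA look usable.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    report = {
        "python_version": tuple(sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        # Import lazily so the check still runs before the requirements are installed.
        torch = importlib.import_module("torch")
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False after installing the requirements, the GPU driver or the installed PyTorch build is the likely place to look.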

Dataset Preparation

Download the Jester dataset or the Something-Something-v2 dataset. Decompress them into the same folder and use process_dataset.py to generate the index files for the train, val, and test splits. Properly set up the train, validation, and category meta files in datasets_video.py. To convert the Something-Something-v2 dataset, you can use extract_frames.py from TRN-pytorch.

Assume the structure of data directories is the following:

~/stmodeling/
   datasets/
      jester/
         rgb/
            .../ (directories of video samples for Jester)
               .../ (jpg color frames)
      something/
         rgb/
            .../ (directories of video samples for Something-Something)
   model/
      .../ (saved models for the last checkpoint and best model)
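
A quick way to check that the frames landed in the layout above is to count the jpg files in each sample directory. This is a minimal sketch under the assumption that every sample is a directory directly below rgb/ and that frames use the .jpg extension; the function names count_frames and find_empty_samples are hypothetical and do not come from process_dataset.py.

```python
from pathlib import Path


def count_frames(rgb_root):
    """Map each video sample directory under rgb_root to its number of .jpg frames.

    Assumes the layout <rgb_root>/<sample>/<frame>.jpg (an assumption based on
    the directory tree in this README).
    """
    rgb_root = Path(rgb_root).expanduser()
    counts = {}
    for sample_dir in sorted(p for p in rgb_root.iterdir() if p.is_dir()):
        counts[sample_dir.name] = sum(
            1 for f in sample_dir.iterdir() if f.suffix.lower() == ".jpg"
        )
    return counts


def find_empty_samples(rgb_root):
    """Return the names of sample directories that contain no .jpg frames."""
    return [name for name, n in count_frames(rgb_root).items() if n == 0]


if __name__ == "__main__":
    root = Path("~/stmodeling/datasets/jester/rgb").expanduser()
    if root.is_dir():
        print(f"{len(count_frames(root))} samples found")
        print(f"{len(find_empty_samples(root))} samples without frames")
```

Samples reported as empty usually point to an incomplete download or a failed frame extraction.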

Running the Code

Currently the following ST Modeling blocks are implemented:

  • MLP
  • TRNmultiscale
  • RNN_TANH
  • RNN_RELU
  • LSTM
  • GRU
  • BLSTM
  • FCN

Furthermore, the following backbone feature extractors are implemented:

  • squeezenet1_1
  • BNInception

The following are some examples of training under different scenarios:

  • Train an 8-segment network for Jester with MLP and a SqueezeNet backbone
python main.py jester RGB --arch squeezenet1_1 --num_segments 8 \
--consensus_type MLP --batch-size 16
  • Train a 16-segment network for Something-Something with TRN-multiscale and a BNInception backbone
python main.py something RGB --arch BNInception --num_segments 16 \
--consensus_type TRNmultiscale --batch-size 16
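
To compare several ST modeling blocks with otherwise identical settings, the commands above can be generated programmatically. The sketch below uses only the positional arguments and flags that appear in the examples above; the helper build_train_command is hypothetical, and whether main.py accepts every block/backbone combination is an assumption that has not been verified here.

```python
import shlex


def build_train_command(dataset, arch, num_segments, consensus_type,
                        batch_size=16, modality="RGB"):
    """Build the argument list for one training run of main.py.

    Hypothetical helper; it only mirrors the flags shown in the README examples.
    """
    return [
        "python", "main.py", dataset, modality,
        "--arch", arch,
        "--num_segments", str(num_segments),
        "--consensus_type", consensus_type,
        "--batch-size", str(batch_size),
    ]


if __name__ == "__main__":
    # Print one command per ST modeling block instead of launching it.
    for block in ("MLP", "LSTM", "GRU"):
        cmd = build_train_command("jester", "squeezenet1_1", 8, block)
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the run.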

Reference

Acknowledgement

This project was built on top of TRN-pytorch, which itself was built on top of TSN-Pytorch. We thank the authors for sharing their code publicly.
