PyTorch implementation of the paper "Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos". In this work, different 'Spatiotemporal Modeling Blocks' are analyzed for the architecture illustrated below.
The structure was inspired by the project TRN-pytorch.
Results and Pretrained Models
The pretrained models can be found in our Google Drive.
Clone the repo with the following command:
git clone git@github.com:fubel/stmodeling.git
Setup in virtual environment
The project requirements can be found in the file
requirements.txt. To run the code, create a Python >= 3.6 virtual environment and install
the requirements with
pip install -r requirements.txt
NOTE: This project assumes that you have a GPU with CUDA support.
Download the Jester dataset or the Something-Something-V2 dataset. Decompress them into the same folder and use process_dataset.py to generate the index files for the train, val, and test splits. Properly set up the train, validation, and category meta files in datasets_video.py.
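Index files of this kind conventionally store one video per row. As a hedged sketch (the exact layout produced by process_dataset.py may differ), assume a `<frame_dir> <num_frames> <label>` row format; a minimal parser would look like:

```python
def parse_index_line(line):
    """Split one hypothetical '<frame_dir> <num_frames> <label>' index row.

    rsplit from the right keeps any spaces inside the frame-directory path intact.
    """
    path, num_frames, label = line.strip().rsplit(" ", 2)
    return path, int(num_frames), int(label)

# Example row with a made-up sample directory:
record = parse_index_line("datasets/jester/rgb/12345 37 4")
```

Here `record` is `("datasets/jester/rgb/12345", 37, 4)`.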
To convert the something-something-v2 dataset, you can use the
extract_frames.py from TRN-pytorch.
Assume the structure of data directories is the following:
~/stmodeling/
    datasets/
        jester/
            rgb/
                .../ (directories of video samples for Jester)
                    .../ (jpg color frames)
        something/
            rgb/
                .../ (directories of video samples for Something-Something)
    model/
        .../ (saved models for the last checkpoint and best model)
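The skeleton of this layout can be created with a few lines of Python; the root name below is a stand-in for `~/stmodeling`:

```python
from pathlib import Path

# Stand-in root; the real setup would live under ~/stmodeling.
root = Path("stmodeling_demo")
for sub in ("datasets/jester/rgb", "datasets/something/rgb", "model"):
    # parents=True creates intermediate directories; exist_ok makes reruns safe.
    (root / sub).mkdir(parents=True, exist_ok=True)
```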
Running the Code
Currently the following ST Modeling blocks are implemented:
Furthermore, the following backbone feature extractors are implemented:
The following are some examples of training under different scenarios:
- Train an 8-segment network for Jester with MLP and a SqueezeNet backbone
python main.py jester RGB --arch squeezenet1_1 --num_segments 8 \
    --consensus_type MLP --batch-size 16
- Train a 16-segment network for Something-Something with TRN-multiscale and a BNInception backbone
python main.py something RGB --arch BNInception --num_segments 16 \
    --consensus_type TRNmultiscale --batch-size 16
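Conceptually, an MLP-based ST modeling block concatenates the backbone features of all segments and maps them to class scores with a small multilayer perceptron. A minimal pure-Python sketch, with toy dimensions chosen for illustration (not the paper's configuration):

```python
import random

random.seed(0)

def linear(x, weights, biases):
    """y_j = sum_i weights[j][i] * x_i + biases[j], as a plain-Python matvec."""
    return [sum(xi * wij for xi, wij in zip(x, row)) + b
            for row, b in zip(weights, biases)]

def mlp_consensus(segment_features, w1, b1, w2, b2):
    """Concatenate per-segment features, then apply a two-layer MLP."""
    x = [v for seg in segment_features for v in seg]  # flatten the segments
    h = [max(0.0, a) for a in linear(x, w1, b1)]      # hidden layer + ReLU
    return linear(h, w2, b2)                          # one score per class

# Toy sizes: 8 segments, 4 features each, 16 hidden units, 3 classes.
num_segments, feat_dim, hidden, num_classes = 8, 4, 16, 3
feats = [[random.gauss(0, 1) for _ in range(feat_dim)]
         for _ in range(num_segments)]
w1 = [[random.gauss(0, 0.1) for _ in range(num_segments * feat_dim)]
      for _ in range(hidden)]
b1 = [0.0] * hidden
w2 = [[random.gauss(0, 0.1) for _ in range(hidden)] for _ in range(num_classes)]
b2 = [0.0] * num_classes

scores = mlp_consensus(feats, w1, b1, w2, b2)
```

In the actual code this fusion runs on the features produced by the chosen `--arch` backbone for the `--num_segments` sampled frames; the weights here are random stand-ins rather than trained parameters.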