smth-smth-v2-baseline-with-models

Contains code and pre-trained models to get you started with a baseline on version 2 of the "something-something" dataset.

Paper: https://arxiv.org/abs/1706.04261
Data and Leaderboard: https://20bn.com/datasets/something-something/v2

Performance of the pre-trained models on the validation set:

Model	top-1	top-5
model3D_1	49.88%	78.82%
model3D_1_224	47.67%	77.35%
model3D_1 with left-right augmentation and fps jitter	51.33%	80.46%
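
For reference, top-1 and top-5 accuracy of the kind reported above can be computed along the following lines. This is a minimal sketch and not the repository's evaluation code: the function name `top_k_accuracy` and the toy scores are illustrative assumptions.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(sample_scores)),
                       key=lambda i: sample_scores[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores for 3 samples over 4 classes (illustrative values only).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: counted for k >= 1
    [0.5, 0.1, 0.3, 0.1],  # true class 2: missed for k = 1, counted for k >= 2
    [0.4, 0.3, 0.2, 0.1],  # true class 3: missed for k <= 3
]
labels = [1, 2, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2%}, top-2: {top2:.2%}")
```

With 174 classes in the real dataset, the same function with k=5 gives the top-5 column.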

Prerequisites

Conda - manages Python environment and dependencies
Run conda update conda to ensure the package is up-to-date

Setting up

Installation

Clone this repo and https://github.com/jacobgil/pytorch-grad-cam (used for obtaining saliency maps; see the Grad-CAM section for more details)
Move into this repo's root directory
Set up the Python environment - conda env update
- This will set up an environment named smth
Activate the Python environment - source activate smth

Download the dataset

The dataset is provided in the form of videos in webm format using VP9 encoding, occupying a total size of 19.4 GB. The videos are in landscape format with height (the shorter side) of 240px at 12 frames/sec.

Follow instructions on the data page
Download the json files to fetch annotations of the data

Modify config file to include the above paths

In any configuration file (e.g. configs/config_example.json), modify the

path to data: data_folder
path to JSONs: json_data_train, json_data_val, json_data_test
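
The path changes can also be scripted. The sketch below assumes the config is a JSON object with `data_folder`, `json_data_train`, `json_data_val` and `json_data_test` as top-level keys; the helper name `set_data_paths` and the example paths are hypothetical.

```python
import json


def set_data_paths(config_path, data_folder, json_train, json_val, json_test):
    """Rewrite the data-related paths in a JSON config file and return the config."""
    with open(config_path) as f:
        config = json.load(f)

    # Key names are taken from the README; their top-level position is an assumption.
    config["data_folder"] = data_folder
    config["json_data_train"] = json_train
    config["json_data_val"] = json_val
    config["json_data_test"] = json_test

    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Example call (placeholder paths, adjust to your own download location):
# set_data_paths("configs/config_example.json",
#                "/data/smth/videos",
#                "/data/smth/train.json",
#                "/data/smth/validation.json",
#                "/data/smth/test.json")
```

All other keys in the file are left unchanged.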

How to train from scratch?

Run: CUDA_VISIBLE_DEVICES=0,1 python train.py -c configs/config_example.json -g 0,1 --use_cuda

where,

CUDA_VISIBLE_DEVICES: environment variable to specify GPU ids to use. (Note: uses all gpus if not specified)
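
If you prefer to launch training from a Python script, the command above can be assembled as follows. This is only a sketch: `build_train_command` is a hypothetical helper, and it passes the same id list to both `CUDA_VISIBLE_DEVICES` and `-g`, mirroring the command shown in this README.

```python
import os


def build_train_command(config_path, gpu_ids):
    """Return (env, argv) reproducing the README training command."""
    ids = ",".join(str(i) for i in gpu_ids)
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=ids)
    # Assumption: -g takes the same comma-separated list, as in the README example.
    argv = ["python", "train.py", "-c", config_path, "-g", ids, "--use_cuda"]
    return env, argv


env, argv = build_train_command("configs/config_example.json", [0, 1])
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))

# To actually start training from the repo root:
# import subprocess
# subprocess.run(argv, env=env, check=True)
```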

Hyperparameters

Please refer to the config file at: configs/config_example.json

batch_size: 30 - change this to fill your GPU memory (Note: should be a multiple of the number of GPUs used)
num_workers: 5 - number of parallel processes that fetch and pre-process data (increase it towards the number of available CPU cores for better GPU utilisation)
lr: 0.008 - increase it if you increase the batch size
clip_size: 72 - number of frames in a video sample given as input to the model (at the default 12 fps this covers 6 seconds)
step_size_train: 1 - factor by which the FPS is reduced (a step size of 2 gives 6 fps)
input_spatial_size: 84 - each input frame is scaled and cropped to 84x84; the common frame size of 224x224 also works, since the data is provided with a height of 240px in landscape format
column_units: 512 - desired number of units in the feature space for each sample
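
The arithmetic behind clip_size, step_size_train and batch_size can be checked with a few lines. This is a sketch with hypothetical helper names; it assumes that clip_size counts frames after subsampling, which this README does not state explicitly.

```python
NATIVE_FPS = 12  # frame rate of the provided videos, as stated in this README


def effective_fps(step_size, native_fps=NATIVE_FPS):
    """Frame rate seen by the model when every step_size-th frame is kept."""
    return native_fps / step_size


def clip_duration_seconds(clip_size, step_size, native_fps=NATIVE_FPS):
    """Seconds of video spanned by clip_size frames sampled every step_size frames."""
    return clip_size * step_size / native_fps


def per_gpu_batch(batch_size, num_gpus):
    """Samples per GPU; batch_size should be a multiple of the GPU count."""
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size should be a multiple of the number of GPUs")
    return batch_size // num_gpus


print(effective_fps(step_size=2))                         # fps after a step size of 2
print(clip_duration_seconds(clip_size=72, step_size=1))   # seconds covered by the default clip
print(per_gpu_batch(batch_size=30, num_gpus=2))           # samples per GPU with the defaults
```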

How to use a pre-trained model?

Pre-trained models are available in the directory trained_models/pretrained/, with their respective config files in configs/pretrained/
We provide a vanilla implementation of a model consisting of 11 layers of 3D convolutions. Please refer to model3D_1.py
Use the notebook to get predictions from these models

Test model and get submission file on test data

Set the path to the model file in the checkpoint variable of the config file, then run:

CUDA_VISIBLE_DEVICES=0,1 python train.py -c configs/pretrained/config_model1.json -g 0,1 -r -e --use_cuda

The options used here are:

-r: to resume an already trained model
-e: to evaluate the model on test data
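
To illustrate how these flags combine, here is a hedged argparse sketch. It is not the parser used by train.py: only the flag spellings (-c, -g, -r, -e, --use_cuda) come from this README, while the `dest` names, help texts and the type of -g are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser accepting the flags shown in the README commands."""
    parser = argparse.ArgumentParser(description="Illustrative flag parser")
    parser.add_argument("-c", dest="config", required=True,
                        help="path to the JSON config file")
    parser.add_argument("-g", dest="gpus", default=None,
                        help="GPU ids as given in the README, e.g. 0,1")
    parser.add_argument("-r", dest="resume", action="store_true",
                        help="resume an already trained model")
    parser.add_argument("-e", dest="eval_only", action="store_true",
                        help="evaluate the model on test data")
    parser.add_argument("--use_cuda", action="store_true",
                        help="run on GPU")
    return parser


# Parse the exact argument list from the evaluation command above.
args = build_parser().parse_args(
    ["-c", "configs/pretrained/config_model1.json", "-g", "0,1", "-r", "-e", "--use_cuda"]
)
print(args.config, args.gpus, args.resume, args.eval_only, args.use_cuda)
```

Omitting -r and -e leaves both switches False, which corresponds to the training-from-scratch command.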

Grad-CAM

Use the notebook to visualize saliency maps of any example from the validation set

e.g.

Id of the video sample = 56620
True label --> 12 (Dropping something onto something)

Top-5 Predictions:
Top 1 :== Dropping something next to something. Prob := 41.51%
Top 2 :== Throwing something. Prob := 13.26%
Top 3 :== Throwing something onto a surface. Prob := 8.92%
Top 4 :== Something falling like a rock. Prob := 8.68%
Top 5 :== Dropping something onto something. Prob := 4.55%

Predicted index chosen = 11 (Dropping something next to something)
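
A ranked list like the one above can be produced from raw model outputs roughly as follows. This sketch assumes the model returns one unnormalised score (logit) per class and that probabilities come from a softmax; the function name and the toy logits are illustrative and not taken from the notebook.

```python
import math


def top_k_predictions(logits, class_names, k=5):
    """Return the k most probable classes as (index, name, probability) tuples."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, class_names[i], probs[i]) for i in ranked]


# Class names from the example above; the logits are made-up values.
class_names = [
    "Dropping something next to something",
    "Throwing something",
    "Throwing something onto a surface",
    "Something falling like a rock",
    "Dropping something onto something",
]
logits = [2.0, 0.9, 0.5, 0.4, -0.2]

top = top_k_predictions(logits, class_names, k=5)
for rank, (index, name, prob) in enumerate(top, start=1):
    print(f"Top {rank} :== {name}. Prob := {prob:.2%}")
```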

Commonsense score

Use the notebook to compute the commonsense score using the contrastive groups list in the assets/ directory

For more details, please refer to: https://openreview.net/pdf?id=rkX9Z_kwf

LICENSE

See the file LICENSE for details. Some code snippets have been taken from Keras (see LICENSE_keras) and PyTorch (see LICENSE_pytorch). See comments in the source code for details.

Reference

If you use our code, dataset or pre-trained models, please cite our paper:

@inproceedings{goyal2017something,
  title={The "something something" video database for learning and evaluating visual common sense},
  author={Goyal, Raghav and Kahou, Samira Ebrahimi and Michalski, Vincent and Materzynska, Joanna and Westphal, Susanne and Kim, Heuna and Haenel, Valentin and Fruend, Ingo and Yianilos, Peter and Mueller-Freitag, Moritz and others}
}
