Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition
Kiyoon Kim, Shreyank N Gowda, Oisin Mac Aodha, Laura Sevilla-Lara
In BMVC 2022. arXiv
Presentation video
conda create -n videoai python=3.9
conda activate videoai
conda install pytorch==1.12.1 torchvision cudatoolkit=10.2 -c pytorch
### For RTX 30xx GPUs:
#conda install pytorch==1.12.1 torchvision cudatoolkit=11.3 -c pytorch
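To sanity-check the environment, you can confirm that the CUDA build of PyTorch is active (a quick check added here, not part of the original instructions):

import torch
# Expect "1.12.1" and True; False means the CPU-only build was installed.
print(torch.__version__, torch.cuda.is_available())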
git clone --recurse-submodules https://github.com/kiyoon/channel_sampling
cd channel_sampling
git submodule update --recursive
cd submodules/video_datasets_api
pip install -e .
cd ../experiment_utils
pip install -e .
cd ../..
pip install -e .
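After installation, a quick import smoke test (assuming the packages install under the names `pyvideoai`, `video_datasets_api`, and `experiment_utils`, matching the directory names above):

import pyvideoai            # this repository (PyVideoAI fork)
import video_datasets_api   # submodule installed above; import name assumed from its directory
import experiment_utils     # submodule installed above; import name assumed from its directory
print("install OK")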
Optional: install Pillow-SIMD and libjpeg-turbo to improve data loading performance.
Run this at the end of the installation:
conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
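To verify that Pillow-SIMD replaced the stock Pillow, check the version string; Pillow-SIMD releases carry a `.postN` suffix:

import PIL
# e.g. "9.0.0.post1" indicates Pillow-SIMD; a bare "9.0.0" means stock Pillow.
print(PIL.__version__)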
- Download the dataset and annotations. Rename the directories to `frames` and `annotations`, and put them in `data/something-something-v1`.
- Generate splits:
conda activate videoai
python tools/datasets/generate_somethingv1_splits.py data/something-something-v1/splits_frames data/something-something-v1/annotations --root data/something-something-v1/frames --mode frames
- Download the dataset and annotations. Rename the directories to `videos` and `annotations`, and put them in `data/something-something-v2`.
- Extract the videos into image frames under `data/something-something-v2/frames_q5`:
submodules/video_datasets_api/tools/something-something-v2/extract_frames.sh data/something-something-v2/videos data/something-something-v2/frames_q5
- Generate splits:
conda activate videoai
python tools/datasets/generate_somethingv2_splits.py data/something-something-v2/splits_frames data/something-something-v2/annotations data/something-something-v2/frames_q5 --mode frames
- The core implementation of the reordering methods is in `pyvideoai/utils/tc_reordering.py`.
- See `exp_configs/ch_tcgrey` for the experiment settings.
For example, to run the TSM model with the GreyST sampling method on the Something-Something-V1 dataset, the experiment settings are in `exp_configs/ch_tcgrey/something_v1/tsm_resnet50_nopartialbn-GreyST_8frame.py`, and the corresponding commands are:
# Run training
tools/run_singlenode.sh train 4 -R ~/experiment_root -D something_v1 -M tsm_resnet50_nopartialbn -E GreyST_8frame -c:e tcgrey
# Run evaluation
tools/run_singlenode.sh eval 4 -R ~/experiment_root -D something_v1 -M tsm_resnet50_nopartialbn -E GreyST_8frame -c:e tcgrey
A complete example script, with the available datasets, models, and sampling methods listed as comments:
#!/bin/bash
exp_root="$HOME/experiments" # Experiment results will be saved here.
export CUDA_VISIBLE_DEVICES=0
num_gpus=1
subfolder="test_run" # Name subfolder as you like.
## Choose the dataset
dataset=something_v1
#dataset=something_v2
#dataset=cater_task2
#dataset=cater_task2_cameramotion
## Choose the model
model=tsn_resnet50
#model=trn_resnet50
#model=mtrn_resnet50
#model=tsm_resnet50_nopartialbn # NOTE: use tsm_resnet50 for CATER experiments
#model=mvf_resnet50_nopartialbn # NOTE: use mvf_resnet50 for CATER experiments
## Choose the sampling method.
## NOTE: Use 32 frames for CATER experiments.
exp_name="RGB_8frame"
#exp_name="TC_8frame"
#exp_name="TCPlus2_8frame"
#exp_name="GreyST_8frame"
# Training script
# -S creates a subdirectory in the name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e tcgrey -S "$subfolder" #--wandb_project kiyoon_kim_tcgrey
# Evaluating script
# -l -2 loads the best model
# -p saves the predictions. (optional)
tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e tcgrey -S "$subfolder" -l -2 -p #--wandb
If you find our work or code useful, please cite:
@inproceedings{kim2022capturing,
author = {Kiyoon Kim and Shreyank N Gowda and Oisin Mac Aodha and Laura Sevilla-Lara},
title = {Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year = {2022},
url = {https://bmvc2022.mpi-inf.mpg.de/0355.pdf}
}
This repository is a fork of the PyVideoAI framework.
Learn how to use it with the PyVideoAI-examples notebooks.