R2Plus1D-C3D

A PyTorch implementation of R2Plus1D and C3D based on the CVPR 2018 paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition" and the ICCV 2015 paper "Learning Spatiotemporal Features with 3D Convolutional Networks".
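
As background on the two architectures: C3D stacks full 3D convolutions, while R(2+1)D factorizes each 3D convolution into a 2D spatial convolution followed by a 1D temporal convolution, with a nonlinearity in between. Below is a minimal PyTorch sketch of such a (2+1)D block; the mid-channel sizing follows the paper's parameter-matching heuristic, and the whole block is an illustration of the idea rather than the exact module used in this repository.

import torch
import torch.nn as nn

class SpatioTemporalConv(nn.Module):
    """A (2+1)D block: 2D spatial conv, ReLU, then 1D temporal conv."""
    def __init__(self, in_channels, out_channels, kernel_size=(3, 3, 3)):
        super().__init__()
        t, h, w = kernel_size
        # Intermediate channel count chosen so the factorized block has
        # roughly as many parameters as the full 3D convolution (paper's heuristic).
        mid_channels = (t * h * w * in_channels * out_channels) // (
            h * w * in_channels + t * out_channels)
        # 2D spatial convolution applied frame-wise (kernel 1 x h x w).
        self.spatial = nn.Conv3d(in_channels, mid_channels,
                                 kernel_size=(1, h, w), padding=(0, h // 2, w // 2))
        self.relu = nn.ReLU(inplace=True)
        # 1D temporal convolution across frames (kernel t x 1 x 1).
        self.temporal = nn.Conv3d(mid_channels, out_channels,
                                  kernel_size=(t, 1, 1), padding=(t // 2, 0, 0))

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        return self.temporal(self.relu(self.spatial(x)))

# Example: a batch of 8 clips, 3 channels, 16 frames of 112x112.
block = SpatioTemporalConv(3, 64)
print(block(torch.randn(8, 3, 16, 112, 112)).shape)  # (8, 64, 16, 112, 112)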

Requirements

  • pytorch
conda install pytorch torchvision -c pytorch
  • opencv
conda install opencv
  • rarfile
pip install rarfile
  • rar
sudo apt install rar
  • unrar
sudo apt install unrar
  • ffmpeg
sudo apt install build-essential openssl libssl-dev autoconf automake cmake git-core libass-dev libfreetype6-dev libsdl2-dev libtool libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texinfo wget zlib1g-dev nasm yasm libx264-dev libx265-dev libnuma-dev libvpx-dev libfdk-aac-dev libmp3lame-dev libopus-dev
wget https://ffmpeg.org/releases/ffmpeg-4.1.3.tar.bz2
tar -jxvf ffmpeg-4.1.3.tar.bz2
cd ffmpeg-4.1.3/
./configure --prefix="../build" --enable-static --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree --enable-openssl
make -j4
make install
sudo cp ../build/bin/ffmpeg /usr/local/bin/ 
rm -rf ../ffmpeg-4.1.3/ ../ffmpeg-4.1.3.tar.bz2 ../build/
  • youtube-dl
pip install youtube-dl
  • joblib
pip install joblib
  • PyTorchNet
pip install git+https://github.com/pytorch/tnt.git@master

Datasets

The datasets come from UCF101, HMDB51 and Kinetics600. Download the UCF101 and HMDB51 datasets together with their train/val/test split files into the data directory. We use split1 of the split files. Run misc.py to preprocess these datasets.

For the Kinetics600 dataset, first download the train/val/test split files into the data directory, then run download.py to download and preprocess the dataset.
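
For reference, the Benchmarks section below notes that videos are preprocessed as 32 frames of 128x128. The sketch below shows one way such frame sampling can be done with OpenCV (uniformly sample and resize frames); it is an illustration of the preprocessing step, not the code in misc.py or download.py, and it assumes the video opens and contains at least one readable frame.

import cv2
import numpy as np

def sample_clip(video_path, num_frames=32, size=128):
    """Uniformly sample num_frames frames from a video and resize them."""
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    # Frame indices to keep, spread evenly over the whole video.
    wanted = set(np.linspace(0, max(total - 1, 0), num_frames).astype(int).tolist())
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index in wanted:
            frames.append(cv2.resize(frame, (size, size)))
        index += 1
    capture.release()
    # Pad by repeating the last frame if the video is shorter than num_frames.
    while len(frames) < num_frames:
        frames.append(frames[-1])
    return np.stack(frames)  # (num_frames, size, size, 3), BGR uint8

clip = sample_clip('data/ucf101/ApplyLipstick/v_ApplyLipstick_g04_c02.avi')
print(clip.shape)  # (32, 128, 128, 3)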

Usage

Train Model

visdom -logging_level WARNING & python train.py --num_epochs 20 --pre_train kinetics600_r2plus1d.pth
optional arguments:
--data_type                   dataset type [default value is 'ucf101'](choices=['ucf101', 'hmdb51', 'kinetics600'])
--gpu_ids                     selected gpu [default value is '0,1']
--model_type                  model type [default value is 'r2plus1d'](choices=['r2plus1d', 'c3d'])
--batch_size                  training batch size [default value is 8]
--num_epochs                  training epochs number [default value is 100]
--pre_train                   used pre-trained model epoch name [default value is None]

Visdom can now be accessed by going to 127.0.0.1:8097 in your browser.

Inference Video

python inference.py --video_name data/ucf101/ApplyLipstick/v_ApplyLipstick_g04_c02.avi
optional arguments:
--data_type                   dataset type [default value is 'ucf101'](choices=['ucf101', 'hmdb51', 'kinetics600'])
--model_type                  model type [default value is 'r2plus1d'](choices=['r2plus1d', 'c3d'])
--video_name                  test video name
--model_name                  model epoch name [default value is 'ucf101_r2plus1d.pth']

The inference result will be shown in a pop-up window.
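
As an illustration of how such a pop-up can work (inference.py may differ in detail), the sketch below plays the video with OpenCV and overlays a predicted label on each frame; the function and label here are hypothetical.

import cv2

def show_prediction(video_path, label):
    """Play a video in a window with the predicted class name overlaid."""
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (0, 255, 0), 2)
        cv2.imshow('prediction', frame)
        if cv2.waitKey(30) & 0xFF == ord('q'):  # press q to close early
            break
    capture.release()
    cv2.destroyAllWindows()

show_prediction('data/ucf101/ApplyLipstick/v_ApplyLipstick_g04_c02.avi', 'ApplyLipstick')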

Benchmarks

The Adam optimizer (lr=0.0001) is used with learning rate scheduling.
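
A minimal PyTorch sketch of this setup, assuming a MultiStepLR schedule (the specific scheduler, milestones and decay factor are not stated here, so they are assumptions):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(512, 101)  # stand-in for the actual R2Plus1D/C3D network
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # lr=0.0001 as stated above
# Assumed schedule: decay the learning rate by 10x at epochs 50 and 75.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 75], gamma=0.1)

for epoch in range(100):
    # ... forward/backward passes and optimizer.step() per batch ...
    scheduler.step()  # advance the learning rate schedule once per epoch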

For the ucf101 and hmdb51 datasets, the models are trained for 100 epochs with a batch size of 8 on one NVIDIA Tesla V100 (32G) GPU.

For the kinetics600 dataset, the models are trained for 100 epochs with a batch size of 32 on two NVIDIA Tesla V100 (32G) GPUs. Because the training time is too long, this experiment has not been finished.

The videos are preprocessed into 32 frames of 128x128, then cropped to 112x112.
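
For example, a 112x112 crop can be taken out of a 128x128 clip as below; whether training uses a random or a center crop is not stated, so the random crop here is an assumption:

import numpy as np

def random_crop(clip, size=112):
    # clip: (frames, height, width, channels), e.g. (32, 128, 128, 3)
    _, h, w, _ = clip.shape
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return clip[:, top:top + size, left:left + size, :]

print(random_crop(np.zeros((32, 128, 128, 3))).shape)  # (32, 112, 112, 3)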

Dataset                        UCF101        HMDB51        Kinetics600
Num. of Train Videos           9,537         3,570         375,008
Num. of Val Videos             756           1,666         28,638
Num. of Test Videos            3,783         1,530         56,982
Num. of Classes                101           51            600
Accuracy (R2Plus1D)            63.60%        24.97%        N/A
Accuracy (C3D)                 51.63%        25.10%        N/A
Num. of Parameters (R2Plus1D)  33,220,990    33,195,340    33,476,977
Num. of Parameters (C3D)       78,409,573    78,204,723    80,453,976
Training Time (R2Plus1D)       19.3h         7.3h          350h
Training Time (C3D)            10.9h         4.1h          190h

Results

The train/val/test loss, accuracy and confusion matrix are shown on Visdom. The pretrained models can be downloaded from BaiduYun (access code: ducr).

UCF101

[R2Plus1D result image] [C3D result image]

HMDB51

[R2Plus1D result image] [C3D result image]
