# Video Classification Using 3D ResNet

This is PyTorch code for video (action) classification using a 3D ResNet trained by this code.
The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes.
In score mode, the code takes videos as input and outputs class names and predicted class scores for every 16 frames.
In feature mode, it outputs 512-dimensional features (after global average pooling) for every 16 frames.
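
In both modes, each video is covered by consecutive 16-frame clips, and one prediction (or feature vector) is produced per clip. As a rough illustration of that windowing (a sketch only; the actual clip sampling is defined in dataset.py):

```python
# Illustration only: covering an n-frame video with 16-frame clips.
# Non-overlapping windows are an assumption here; the real sampling
# logic lives in dataset.py.
n_frames = 100
clips = [(start, min(start + 16, n_frames)) for start in range(0, n_frames, 16)]
print(clips)  # [(0, 16), (16, 32), ..., (96, 100)]
```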

A Torch (Lua) version of this code is available here.

## Requirements

- PyTorch

  conda install pytorch torchvision cuda80 -c soumith

- FFmpeg, FFprobe

  wget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz
  tar xvf ffmpeg-release-64bit-static.tar.xz
  cd ./ffmpeg-3.3.3-64bit-static/; sudo cp ffmpeg ffprobe /usr/local/bin;

- Python 3
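
To sanity-check the setup above, a quick script like the following can help (a minimal sketch; it only verifies that PyTorch imports and that ffmpeg/ffprobe are on PATH):

```python
import shutil

import torch

# Verify the requirements listed above: PyTorch, plus ffmpeg/ffprobe
# reachable on PATH after the install steps.
for tool in ("ffmpeg", "ffprobe"):
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} not found on PATH")

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```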

## Preparation

- Download this code.
- Download the pretrained model.
  - ResNeXt-101 achieved the best performance in our experiments. (See the paper for details.)

## Usage

Assume the input video files are located in `./videos`.

To calculate class scores for every 16 frames, use `--mode score`:

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode score
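
The resulting output.json can then be inspected programmatically. The sketch below assumes each entry holds a video name plus per-clip predictions under hypothetical field names (`video`, `clips`, `segment`, `label`); check the actual file for the exact layout:

```python
import json

# Minimal sketch for reading score-mode output. The field names
# ("video", "clips", "segment", "label") are assumptions about the
# JSON layout, not documented guarantees.
with open("./output.json") as f:
    results = json.load(f)

for entry in results:
    print(entry["video"])
    for clip in entry["clips"]:
        start, end = clip["segment"]
        print(f"  frames {start}-{end}: {clip['label']}")
```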

To visualize the classification results, use `generate_result_video/generate_result_video.py`.

To calculate video features for every 16 frames, use `--mode feature`:

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode feature
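
In this mode each 16-frame clip yields a 512-dimensional vector, so a simple way to get one descriptor per video is to average over clips (a sketch under the same assumed JSON layout, with a hypothetical `features` field):

```python
import json

import numpy as np

# Minimal sketch: average the per-clip 512-dim features into a single
# video-level descriptor. The "clips"/"features" field names are
# assumptions about the JSON layout.
with open("./output.json") as f:
    results = json.load(f)

for entry in results:
    feats = np.array([clip["features"] for clip in entry["clips"]])  # (n_clips, 512)
    print(entry["video"], feats.mean(axis=0).shape)  # -> (512,)
```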

## Citation

If you use this code, please cite the following:

@article{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  journal={arXiv preprint},
  volume={arXiv:1711.09577},
  year={2017},
}