Video classification tools using 3D ResNet

Video Classification Using 3D ResNet

This is PyTorch code for video (action) classification using a 3D ResNet trained by this code.
The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes.
This code takes videos as input. In score mode, it outputs the class names and predicted class scores for every 16 frames.
In feature mode, it outputs 512-dimensional features (after global average pooling) for every 16 frames.
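As a rough illustration of what "every 16 frames" means, the sketch below splits a video's frame indices into consecutive 16-frame clips. It assumes non-overlapping windows and 1-based frame numbering, and the helper `split_into_clips` is hypothetical. The repository's own sampling may differ, for example in how a short final clip is handled.

```python
from typing import List, Tuple

CLIP_LENGTH = 16  # frames per clip, as described above


def split_into_clips(num_frames: int, clip_length: int = CLIP_LENGTH) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, 1-based and inclusive, for consecutive
    non-overlapping clips. The last clip may be shorter than clip_length.
    Hypothetical helper for illustration only."""
    if num_frames <= 0:
        return []
    segments = []
    for start in range(1, num_frames + 1, clip_length):
        end = min(start + clip_length - 1, num_frames)
        segments.append((start, end))
    return segments


if __name__ == "__main__":
    # A 40-frame video yields two full clips and one partial clip.
    print(split_into_clips(40))  # [(1, 16), (17, 32), (33, 40)]
```

With this assumption, a video produces one score vector (score mode) or one feature vector (feature mode) per returned range.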

A Torch (Lua) version of this code is available here.


Requirements

  • PyTorch
conda install pytorch torchvision cuda80 -c soumith
  • FFmpeg, FFprobe (extract a downloaded static build and copy the binaries to /usr/local/bin)
tar xvf ffmpeg-release-64bit-static.tar.xz
cd ./ffmpeg-3.3.3-64bit-static/
sudo cp ffmpeg ffprobe /usr/local/bin
  • Python 3
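Before running anything, it can help to confirm that the requirements above are actually visible to Python. The sketch below is a hypothetical convenience check, not part of this repository: it looks up `ffmpeg` and `ffprobe` on `PATH` with `shutil.which` and tests whether `torch` can be imported.

```python
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report whether the tools listed above can be found.
    ffmpeg/ffprobe are looked up on PATH; PyTorch is checked by importing it."""
    status = {
        "python3": sys.version_info.major == 3,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
    }
    try:
        import torch  # noqa: F401  (only checking that the import succeeds)
        status["torch"] = True
    except ImportError:
        status["torch"] = False
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:8s} {'found' if ok else 'MISSING'}")
```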


Preparation

  • Download this code.
  • Download the pretrained models.
    • ResNeXt-101 achieved the best performance in our experiments. (See the paper for details.)
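If you load a downloaded checkpoint in your own script rather than through this code, a common stumbling block is that weights saved from `torch.nn.DataParallel` have every parameter name prefixed with `module.`. The sketch below assumes the `.pth` file is a dict holding the weights under a `state_dict` key with such prefixes, so check your file before relying on it. The helper `strip_module_prefix` is hypothetical and works on plain dictionaries, so it runs without PyTorch.

```python
from collections import OrderedDict
from typing import Mapping


def strip_module_prefix(state_dict: Mapping[str, object],
                        prefix: str = "module.") -> "OrderedDict[str, object]":
    """Return a copy of state_dict with a leading DataParallel prefix removed
    from each key. Keys without the prefix are kept unchanged."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Hypothetical usage with PyTorch (assumes a 'state_dict' entry in the checkpoint):
#   checkpoint = torch.load("./resnet-34-kinetics.pth", map_location="cpu")
#   model.load_state_dict(strip_module_prefix(checkpoint["state_dict"]))

if __name__ == "__main__":
    example = {"module.conv1.weight": 1, "module.fc.bias": 2, "epoch": 3}
    print(list(strip_module_prefix(example)))  # ['conv1.weight', 'fc.bias', 'epoch']
```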


Usage

Assume the input video files are located in ./videos.
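The commands below pass both `--input ./input` and `--video_root ./videos`. Assuming `./input` is a plain-text list with one video file name per line, resolved relative to `--video_root` (verify this against the repository's option handling), such a list could be generated as follows. The function names and the set of video extensions are illustrative assumptions.

```python
import os
from typing import Iterable, List

VIDEO_EXTENSIONS = (".mp4", ".avi", ".mkv", ".webm")  # assumed set of formats


def select_video_names(names: Iterable[str],
                       extensions: Iterable[str] = VIDEO_EXTENSIONS) -> List[str]:
    """Keep names with a video extension (case-insensitive), sorted."""
    exts = tuple(e.lower() for e in extensions)
    return sorted(n for n in names if n.lower().endswith(exts))


def write_input_list(video_root: str, list_path: str) -> List[str]:
    """Write the names of video files in video_root to list_path,
    one per line (each line newline-terminated), and return them."""
    names = select_video_names(
        n for n in os.listdir(video_root)
        if os.path.isfile(os.path.join(video_root, n))
    )
    with open(list_path, "w") as f:
        for name in names:
            f.write(name + "\n")
    return names
```

For the layout assumed here, calling `write_input_list("./videos", "./input")` would produce the list file used by the commands below.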

To calculate class scores for every 16 frames, use --mode score.

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode score
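To turn per-clip scores into a single label per video, the JSON output can be post-processed. This sketch assumes `output.json` holds a list of per-video records, each with a `video` name and a `clips` list whose entries carry a `scores` vector; these field names are assumptions to check against your own output. Averaging clip scores is one simple aggregation choice, not necessarily the authors' method.

```python
import json
from typing import Dict, List, Sequence


def video_level_labels(results: List[dict], class_names: Sequence[str]) -> Dict[str, str]:
    """Average the per-clip score vectors of each video and return the class
    name with the highest mean score. Videos with no clips are skipped.
    Field names ('video', 'clips', 'scores') are assumed, not confirmed."""
    labels = {}
    for record in results:
        clips = record.get("clips", [])
        if not clips:
            continue
        num_classes = len(clips[0]["scores"])
        mean_scores = [
            sum(clip["scores"][k] for clip in clips) / len(clips)
            for k in range(num_classes)
        ]
        best = max(range(num_classes), key=lambda k: mean_scores[k])
        labels[record["video"]] = class_names[best]
    return labels


def load_results(path: str) -> List[dict]:
    """Read the JSON file written with --output."""
    with open(path) as f:
        return json.load(f)
```

Assuming `class_names_list` contains one Kinetics class name per line, a possible call is `video_level_labels(load_results("./output.json"), open("class_names_list").read().splitlines())`.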

To visualize the classification results, use the tools in generate_result_video/.
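Overlaying labels on a video requires mapping each clip's frame range to a time span. Assuming each clip's `segment` is a [start, end] pair of 1-based, inclusive frame indices and the frame rate is known (for example from ffprobe), the conversion is short. The helper below is a hypothetical illustration and is not taken from generate_result_video/.

```python
from typing import List, Sequence, Tuple


def segments_to_seconds(segments: Sequence[Sequence[int]],
                        fps: float) -> List[Tuple[float, float]]:
    """Convert 1-based inclusive frame ranges to (start_sec, end_sec) spans.
    Frame n is assumed to cover the interval [(n - 1) / fps, n / fps)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [((start - 1) / fps, end / fps) for start, end in segments]
```

At 16 frames per second, for instance, the clips [1, 16] and [17, 32] map to the spans (0.0, 1.0) and (1.0, 2.0) seconds.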

To calculate video features for every 16 frames, use --mode feature.

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode feature
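Feature mode produces one vector per 16-frame clip. If a single descriptor per video is needed (for retrieval or a downstream classifier), averaging the clip vectors is one common choice. The sketch assumes each clip's vector is available as a list of floats (for example under a `features` field, which is an assumption to verify). The helper name is hypothetical.

```python
from typing import List, Sequence


def mean_pool_features(clip_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-clip feature vectors (e.g. 512-dim for ResNet-34) into one
    video-level vector. All clips must have the same dimensionality."""
    if not clip_features:
        raise ValueError("need at least one clip")
    dim = len(clip_features[0])
    if any(len(f) != dim for f in clip_features):
        raise ValueError("inconsistent feature dimensions")
    n = len(clip_features)
    return [sum(f[k] for f in clip_features) / n for k in range(dim)]
```

Under that assumption, a video record would be pooled with `mean_pool_features([c["features"] for c in record["clips"]])`.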


Citation

If you use this code, please cite the following:

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh. "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?" arXiv preprint.