requirements

cuda
pytorch 0.3.1
python3 (untested) or python2 (tested; best to stick with py2 throughout)
ffmpeg (can be installed with Anaconda)

usage

2D feature extraction, e.g. resnet101, nasnet, etc.

sh ./2d_feat_extract.sh
# model: which model to use
# n_frame_steps: how many frames to extract per video; 80 is a reasonable default
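
To illustrate what n_frame_steps controls, here is a minimal sketch of picking a fixed number of evenly spaced frames from a video. It assumes uniform sampling over the whole clip; the function name sample_frame_indices is hypothetical and is not taken from the extraction script, which may sample differently.

```python
import numpy as np


def sample_frame_indices(n_total_frames, n_frame_steps=80):
    """Return n_frame_steps frame indices spread evenly over the video.

    Hypothetical helper: assumes uniform sampling, which may differ from
    what the extraction script actually does.
    """
    if n_total_frames <= 0:
        raise ValueError("video has no frames")
    # linspace spaces the picks evenly; rounding maps them to real frame indices.
    # Videos shorter than n_frame_steps end up with repeated indices.
    positions = np.linspace(0, n_total_frames - 1, n_frame_steps)
    return np.round(positions).astype(int)


indices = sample_frame_indices(300, n_frame_steps=80)
print(indices.shape)
```

Sampling a fixed number of frames gives every video a feature array of the same length, which keeps batching simple.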

3D feature extraction

cd c3d_feat_extract
sh ./c3d_feat_extract.sh
# --mode feature: feature extraction mode; no need to change
# change the options below according to the model you chose
# --model_name resnext \
# --model_depth 101 \
# --resnext_cardinality 32 \
# --resnet_shortcut B \
# --model pretrained_models/resnext-101-64f-kinetics.pth

Training

./train_s2vt.sh
# set the options to match your configuration; see opts.py for what each option means

Testing and scoring

./eval_s2vt.sh
# set the options to match your configuration; see eval.py for what each option means

file tree

Download link for the related files: https://pan.baidu.com/s/1RDNygrWtz_PtVH8nh4vG3w password: nxyk

data
│   all_caption.json
│   all_info.json    
│   all_videodatainfo_2017.json
└───feats
│   └───nasnet
│   │   │   videoxxx.npy
│   │   │   ...
│   └───resnet
│   │   │   videoxxx.npy
│   │   │   ... 
│   └───xxnet
│       │   videoxxx.npy
│       │   ... 
└───videos
│   │   videoxxx.mp4
│   │   ...

Create these directories:
log
checkpoint
result
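
The three directories listed above can be created in one go. This is a minimal sketch, assuming they live under a root directory of your choice (the README does not say whether that is the repository root or data/); ensure_dirs is a hypothetical helper, written to run on both Python 2 and Python 3.

```python
import os


def ensure_dirs(root, names=("log", "checkpoint", "result")):
    """Create each named directory under root if it does not exist yet."""
    paths = []
    for name in names:
        path = os.path.join(root, name)
        # os.makedirs(..., exist_ok=True) is Python 3 only, so check first.
        if not os.path.isdir(path):
            os.makedirs(path)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Assumption: run from the directory that should contain the folders.
    print(ensure_dirs("."))
```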

pytorch implementation of video captioning

Installing PyTorch and the Python packages with Anaconda is recommended.

python packages

tqdm
pillow
pretrainedmodels
nltk

Data

MSR-VTT. The test videos don't have captions, so I split the train videos into train/val/test. Extract the files and put them in the ./data/ directory.

train-video: download link
test-video: download link
json info of train-video: download link
json info of test-video: download link
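
The train/val/test split of the captioned train videos mentioned above could look like the sketch below. The split fractions, the seed, and the function name split_video_ids are all assumptions made for illustration; the README does not state how the split was actually done.

```python
import random


def split_video_ids(video_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle video ids reproducibly and cut them into train/val/test.

    The fractions and seed are illustrative assumptions, not the
    repository's actual settings.
    """
    ids = sorted(video_ids)             # fixed starting order
    random.Random(seed).shuffle(ids)    # reproducible shuffle
    n_val = int(len(ids) * val_fraction)
    n_test = int(len(ids) * test_fraction)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }


splits = split_video_ids(["video%d" % i for i in range(100)])
print({name: len(ids) for name, ids in splits.items()})
```

Splitting by video id rather than by caption keeps all captions of one video in the same subset, so no video appears in both training and evaluation.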

Options

All default options are defined in opts.py or in the corresponding code file; change them as you like.

Usage

(Optional) c3d features

You can use video-classification-3d-cnn-pytorch to extract features from the videos, then mean-pool them to get one 2048-dimensional feature per video.
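
The mean-pooling step can be sketched as follows. This assumes the 3D CNN output for one video has already been loaded as an array of shape (n_clips, 2048); the random array is only a stand-in for real features, and mean_pool_clips is a hypothetical helper name.

```python
import numpy as np


def mean_pool_clips(clip_features):
    """Average per-clip features into a single vector for the video."""
    clip_features = np.asarray(clip_features, dtype=np.float32)
    if clip_features.ndim != 2:
        raise ValueError("expected an array of shape (n_clips, feature_dim)")
    # Averaging over the clip axis leaves one feature_dim-sized vector.
    return clip_features.mean(axis=0)


# Stand-in for the real per-clip features of one video (assumed shape).
clip_features = np.random.rand(12, 2048).astype(np.float32)
video_feature = mean_pool_clips(clip_features)
print(video_feature.shape)  # (2048,)
```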

Steps

preprocess videos and labels

This step takes about 3 hours for the MSR-VTT dataset on one Titan Xp GPU.

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152 --n_frame_steps 40  --gpu 4,5

python prepro_vocab.py
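
For readers unfamiliar with vocabulary preprocessing, the sketch below shows the general idea of turning captions into a word-to-index mapping. It is not the logic of prepro_vocab.py: the special tokens, the min_count threshold, and the name build_vocab are assumptions chosen for illustration.

```python
from collections import Counter


def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer index.

    Hypothetical sketch: the special tokens and threshold are assumptions.
    """
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    # Rare words are dropped; at lookup time they would map to <unk>.
    words = sorted(w for w, c in counts.items() if c >= min_count)
    specials = ["<pad>", "<sos>", "<eos>", "<unk>"]
    return {word: idx for idx, word in enumerate(specials + words)}


vocab = build_vocab(["a man is singing", "a man is cooking"], min_count=2)
print(sorted(vocab, key=vocab.get))
```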

Training a model

python train.py --gpu 5,6,7 --epochs 9001 --batch_size 450 --checkpoint_path data/save --feats_dir data/feats/resnet152 --dim_vid 2048 --model S2VTAttModel

test

opt_info.json is saved in the same directory as the saved model.

python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_1000.pth --batch_size 100 --gpu 1,0
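
Because opt_info.json sits next to the saved model, its path can be derived from the checkpoint path. The helper below is a minimal sketch of that lookup; load_saved_options is a hypothetical name, and it assumes only that the file is valid JSON.

```python
import json
import os


def load_saved_options(saved_model_path):
    """Load opt_info.json from the directory that holds the saved model."""
    opt_path = os.path.join(os.path.dirname(saved_model_path), "opt_info.json")
    with open(opt_path) as f:
        return json.load(f)


# Example, using the paths from the eval command above:
# opts = load_saved_options("data/save/model_1000.pth")
```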

Metrics

I forked coco-caption from XgDuan. Thanks to XgDuan for porting it to Python 3.

TODO

lstm
beam search
reinforcement learning

Note

This repository is no longer maintained; please see my other repository, video-caption-openNMT.py, which achieves higher performance and test scores.

piperino11/video-caption.pytorch
