Introduction

This repository implements the method which is presented in the following paper:

[Memory-augmented Attention Modelling for Videos] (https://arxiv.org/abs/1611.02261/)

If you find this code useful in your research, please cite:

@article{Fakoor16,
  author    = {Rasool Fakoor and
               Abdel{-}rahman Mohamed and
               Margaret Mitchell and
               Sing Bing Kang and
               Pushmeet Kohli},
  title     = {Memory-augmented Attention Modelling for Videos},
  journal   = {CoRR},
  volume    = {abs/1611.02261},
  year      = {2016},
  url       = {http://arxiv.org/abs/1611.02261},
}

Code setup

Step 0) Install required packages

sudo add-apt-repository ppa:mc3man/trusty-media sudo apt-get update sudo apt-get dist-upgrade sudo apt-get install ffmpeg python-opencv sudo pip install scipy numpy

install opencv

It is better to install opencv from source not from repro sudo apt-get install python-opencv

install torch

luarocks install torch && luarocks install image && luarocks install sys && luarocks install nn && luarocks install optim && luarocks install lua-cjson && luarocks install cutorch && luarocks install cunn && luarocks install loadcaffe

Add coco-caption eval codes

Go to https://github.com/tylin/coco-caption/tree/master/pycocoevalcap Download the following folders and add them to eval_caption/

bleu/
cider/
meteor/
rouge/
tokenizer/

Step 1)

Download Data from http://upplysingaoflun.ecn.purdue.edu/~yu239/datasets/youtubeclips.zip

Download VGG16 pretrained model and copy in ~/Data/vgg_pre: http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel

Step 2) unzip data:

unzip youtubeclips.zip let's assume data are in ~/Data/youtubeclips-dataset

Step 3) Prepare data [it takes a couple of hours]

-create the following folders

mdkir ~/Data/YouTubeClip_mp4
mkdir ~/Data/Youtube_frames_8
mkdir ~/Data/Y_8_data

python -u scripts/convert_aviTompg.py --video_dir ~/Data/youtubeclips-dataset --output ~/Data/YouTubeClip_mp4 python -u scripts/build_frames.py --clip_dir ~/Data/YouTubeClip_mp4 --output ~/Data/Youtube_frames_8 --num_frames 8 --frame_type continuous

Step 4) Preprocess Data

python -u scripts/data_prep.py --frame_dir ~/Data/Youtube_frames_8 --input_json Youtube/YT_40_raw_all.json --max_length 30 --output_json ~/Data/Y_8_data/YT_8_len30.json --output_h5 ~/Data/Y_8_data/YT_8_len30.h5 --dataset_name YT_all --only_test 0 --word_count_threshold 0

Step 5) Train the model and Report results

CUDA_VISIBLE_DEVICES=0 th train_SeqToSeq_MeMLocSoft_R2.lua -cnn_proto ~/Data/vgg_pre/VGG_ILSVRC_19_layers_deploy.prototxt -input_h5 ~/Data/Y_8_data/YT_8_len30.h5 -json_file ~/Data/Y_8_data/YT_8_len30.json -f_gt Youtube/YT_40_captions_val.json -checkpoint_name ~/Data/cv/yt_n -log_id mylog_mlsnnet_y_w11111 -cnn_model ~/Data/vgg_pre/VGG_ILSVRC_19_layers.caffemodel

CUDA_VISIBLE_DEVICES=0 th eval_SeqToSeq_MemLocSoft_R2.lua -gpu_id 0 -split test -input_h5 ~/Data/Y_8_data/YT_8_len30.h5 -json_file ~/Data/Y_8_data/YT_8_len30.json -f_gt Youtube/YT_40_captions_test.json -gpu_backend cuda -checkpoint_name ~/Data/cv/yt_test -init_from /Data/cv/yt_n/mylog_mlsnnet_y_w11111.t7

Acknowledgements

The structure of this codebase is inspired by https://github.com/karpathy/neuraltalk2. In addation, some functions from https://github.com/karpathy/neuraltalk2 have been re-written/changed in this codebase which are [mostly] excpliclty mentioned in my code.

Please contact me (@rasoolfa) if you find a bug or problem with this code.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
Youtube		Youtube
cnn_arch		cnn_arch
component		component
eval_caption		eval_caption
misc		misc
models		models
scripts		scripts
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
eval_SeqToSeq_MemLocSoft_R2.lua		eval_SeqToSeq_MemLocSoft_R2.lua
train_SeqToSeq_MeMLocSoft_R2.lua		train_SeqToSeq_MeMLocSoft_R2.lua

License

rasoolfa/videocap

Folders and files

Latest commit

History

Repository files navigation

Introduction

Code setup

Step 0) Install required packages

install opencv

install torch

Add coco-caption eval codes

Step 1)

Step 2) unzip data:

Step 3) Prepare data [it takes a couple of hours]

Step 4) Preprocess Data

Step 5) Train the model and Report results

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Languages