
NAFAE

Introduction

This project is the PyTorch implementation of Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Losses, CVPR 2019.
Video grounding definition: given a video segment and its language description, the aim is to localize the objects queried in the description within the video.

Note: this repository only provides the implementation of the Finite Class Training mode for the YouCookII dataset.

Prerequisites

  • Python >= 3.6
  • PyTorch >= 0.4.0 (< 1.0.0)

Installation

Clone the NAFAE repository

git clone https://github.com/jshi31/NAFAE.git 

Dependencies

  • torchtext: used to obtain GloVe word features. Install it following the torchtext instructions (a short usage sketch follows this list).
  • opencv
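
The following is a minimal sketch of how torchtext exposes GloVe vectors; the vector set name and dimension here are assumptions for illustration, not necessarily the repo's actual configuration.

# Illustrative only: fetching a GloVe embedding through torchtext.
from torchtext.vocab import GloVe

glove = GloVe(name='840B', dim=300)  # downloads and caches the vectors on first use
vec = glove['cook']                  # 300-d embedding; zeros for out-of-vocabulary tokens
print(vec.shape)                     # torch.Size([300])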

Data Preparation

  1. Download the dataset from YouCookII. We only need the raw_videos folder; its path is denoted $RAW_VIDEO_DIR.
    Note: please ensure that you have downloaded all 2000 videos. If some videos are missing, please contact the authors to get them.
  2. Parse videos into frames (an illustrative OpenCV sketch of this step is given at the end of this section):
cd $ROOT/data/YouCookII 
python genframes.py --video_dir $RAW_VIDEO_DIR

The generated frames are stored in sampled_frames_splnum-1, under the same parent folder as $RAW_VIDEO_DIR. Then build a soft link to the project directory:

ln -s $PATH_TO_sampled_frames_splnum-1 $ROOT/data/YouCookII/
  3. Test the dataloader:
python $ROOT/lib/datasets/youcook2.py

The dataloader is set up correctly if no error is reported.
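
For reference, the sketch below shows roughly what per-frame extraction with OpenCV looks like. It is illustrative only: the actual sampling logic lives in genframes.py, and the function name and sampling rate here are assumptions.

# Hypothetical frame-extraction sketch; not the repo's genframes.py.
import os
import cv2

def extract_frames(video_path, out_dir, every_n=30):  # every_n is an assumed sampling rate
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, '%06d.jpg' % saved), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved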

Model Preparation

Create the directory $ROOT/models/vgg16/pretrain/

We use Faster R-CNN with a VGG16 backbone, pretrained on Visual Genome, for region proposals. Download the VGG16 model and place it at $ROOT/models/vgg16/pretrain/faster_rcnn_gnome.pth (a quick checkpoint sanity check is sketched below).
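
To verify the download, the snippet below simply loads the checkpoint and peeks at its contents; the key names depend on how the weights were saved, so treat this as a generic inspection sketch rather than repo-specific code.

# Sanity-check the downloaded Faster R-CNN weights.
import torch

ckpt = torch.load('models/vgg16/pretrain/faster_rcnn_gnome.pth', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # peek at the first few entries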

Compiling Faster R-CNN Layers

As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch option in make.sh to compile the CUDA code (the snippet after the table shows how to query your GPU's compute capability):

GPU model                   Architecture
TitanX (Maxwell/Pascal)     sm_52
GTX 960M                    sm_50
GTX 1080 (Ti)               sm_61
Grid K520 (AWS g2.2xlarge)  sm_30
Tesla K80 (AWS p2.xlarge)   sm_37
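
If your GPU is not listed, PyTorch can report its compute capability directly; a minimal sketch:

# Query the GPU's compute capability to pick the right -arch value,
# e.g. (6, 1) -> sm_61.
import torch

major, minor = torch.cuda.get_device_capability(0)
print('use -arch=sm_%d%d' % (major, minor))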

Install all the python dependencies using pip:

pip install -r requirements.txt

Compile the CUDA dependencies using the following commands:

cd lib
sh make.sh

This compiles all the modules you need, including NMS, ROI_Pooling, ROI_Align, and ROI_Crop.

Training

./train.sh

Evaluation

Evaluate on test set

./test_model.sh

Evaluate on validation set

./eval_model.sh

Please set checksession, checkepoch, and checkbatch to match the training settings.

Visualization

  1. Visualize groundings: specify train_vis_freq and val_vis_freq as $n so that the detected results are visualized in $ROOT/output every $n batches.
  2. Visualize the training curve:
tensorboard --logdir runs
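
The runs directory is produced by the training script's scalar logging; the sketch below shows the general pattern, assuming tensorboardX (a common choice for PyTorch < 1.0). The tag and values are illustrative, not the repo's actual logging code.

# Illustrative scalar logging that populates ./runs/ for TensorBoard.
from tensorboardX import SummaryWriter

writer = SummaryWriter()  # writes to ./runs/ by default
for step in range(100):
    writer.add_scalar('train/loss', 1.0 / (step + 1), step)
writer.close()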

Pretrained Final Model

To reproduce the results in our paper, download the final model ([gcloud] or [baidu cloud, password: nhq8]) and put it into $ROOT/output/models/vgg16/YouCookII/. Then run

./test_model.sh
./eval_model.sh
        macro box accuracy %    macro query accuracy %
val     39.48                   41.23
test    40.62                   42.36

Citation

If you find this paper or repository helpful, please cite:

@inproceedings{shi2019not,
  title={Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses},
  author={Shi, Jing and Xu, Jia and Gong, Boqing and Xu, Chenliang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={10444--10452},
  year={2019}
}
