RecNet

This project implements RecNet, proposed in "Reconstruction Network for Video Captioning" [1], CVPR 2018.

Environment

  • Ubuntu 16.04
  • CUDA 9.0
  • cuDNN 7.3.1
  • Nvidia Geforce GTX Titan Xp 12GB

Requirements

  • Java 8
  • Python 2.7.12
    • PyTorch 1.0
    • Other Python libraries specified in requirements.txt

How to use

Step 1. Set up a Python virtual environment

$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt

Step 2. Prepare Data

  1. Extract Inception-v4 [2] features from each dataset and place them at <PROJECT ROOT>/<DATASET>/features/<DATASET>_InceptionV4.hdf5. I extracted the Inception-v4 features from here. A sketch for inspecting the resulting file follows this list.

    Dataset    Inception-v4
    MSVD       link
    MSR-VTT    link
  2. Split each dataset according to the official splits by running the following:

    (.env) $ python -m splits.MSVD
    (.env) $ python -m splits.MSR-VTT
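
To sanity-check a downloaded feature file, the sketch below reads a few entries with h5py. The one-dataset-per-video-ID layout and the 1536-dimensional Inception-v4 feature size are assumptions of mine, not something this README specifies, so adjust the keys to whatever your file actually contains.

from __future__ import print_function
import h5py

# Path follows the format given in item 1 above.
fpath = "MSVD/features/MSVD_InceptionV4.hdf5"

with h5py.File(fpath, "r") as f:
    for vid in list(f.keys())[:3]:  # peek at the first few videos
        feats = f[vid][()]          # load one video's feature matrix
        print(vid, feats.shape)     # expected: (n_frames, 1536)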
    

Step 3. Prepare Evaluation Code

Clone the evaluation code from the official coco-caption repo.

(.env) $ git clone https://github.com/tylin/coco-caption.git
(.env) $ mv coco-caption/pycocoevalcap .
(.env) $ rm -rf coco-caption
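
Java 8 appears in the requirements because pycocoevalcap's METEOR scorer shells out to a bundled Java jar. As a quick smoke test of the cloned code, the sketch below scores a made-up caption with the BLEU and CIDEr scorers; the {id: [caption, ...]} dict format is pycocoevalcap's expected input, while the video ID and captions are invented for illustration.

from __future__ import print_function
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# References (gts) and candidates (res): dicts mapping an ID to a list
# of whitespace-tokenized caption strings.
gts = {"vid1": ["a man is playing a guitar", "a person plays the guitar"]}
res = {"vid1": ["a man plays a guitar"]}

bleu, _ = Bleu(4).compute_score(gts, res)   # bleu = [BLEU-1, ..., BLEU-4]
cider, _ = Cider().compute_score(gts, res)
print("BLEU-4:", bleu[3], "CIDEr:", cider)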

Step 4. Train

  • Stage 1 (Encoder-Decoder)

    (.env) $ python train.py -c configs.train_stage1
    
  • Stage 2 (Encoder-Decoder-Reconstructor)

    Set the pretrained_decoder_fpath of TrainConfig in configs/train_stage2.py to the checkpoint path saved at stage 1 (see the sketch after this step), then run

    (.env) $ python train.py -c configs.train_stage2
    

You can change some hyperparameters by modifying configs/train_stage1.py and configs/train_stage2.py.
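
The stage-2 edit mentioned above amounts to pointing one attribute of TrainConfig at your stage-1 checkpoint. The snippet below is only a sketch: TrainConfig and pretrained_decoder_fpath come from this README, while the surrounding layout and the checkpoint path are hypothetical.

# configs/train_stage2.py (sketch; all other attributes omitted)
class TrainConfig:
    # Point this at the checkpoint saved by the stage-1 run. The path
    # below is a hypothetical example, not a file the repo guarantees.
    pretrained_decoder_fpath = "checkpoints/stage1/best.ckpt"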

Step 5. Inference

  1. Set the checkpoint path by changing ckpt_fpath of RunConfig in configs/run.py, as sketched below.
  2. Run
    (.env) $ python run.py
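
As in step 4, the inference setup is a one-attribute edit. Only RunConfig and ckpt_fpath are named by this README; the path below is a hypothetical example of a checkpoint saved during training.

# configs/run.py (sketch; all other attributes omitted)
class RunConfig:
    # Checkpoint to load for inference (stage 1 or stage 2).
    ckpt_fpath = "checkpoints/stage2/best.ckpt"  # hypothetical path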
    

Performances

* NOTE: As the tables below show, the reproduced RecNet does not outperform SA-LSTM; better hyperparameters remain to be found.

  • MSVD

    Model                   BLEU4  CIDEr  METEOR  ROUGE_L  pretrained
    SA-LSTM                 45.3   76.2   31.9    64.2     -
    RecNet (global)         51.1   79.7   34.0    69.4     -
    RecNet (local)          52.3   80.3   34.1    69.8     -
    (Ours) SA-LSTM          50.9   79.6   33.4    69.6     link
    (Ours) RecNet (global)  49.9   78.7   33.2    69.7     link
    (Ours) RecNet (local)   49.8   79.4   33.2    69.6     link
  • MSR-VTT

    Model                   BLEU4  CIDEr  METEOR  ROUGE_L  pretrained
    SA-LSTM                 36.3   39.9   25.5    58.3     -
    RecNet (global)         38.3   41.7   26.2    59.1     -
    RecNet (local)          39.1   42.7   26.6    59.3     -
    (Ours) SA-LSTM          38.0   40.2   25.6    58.1     link
    (Ours) RecNet (global)  37.4   40.0   25.5    58.0     link
    (Ours) RecNet (local)   37.9   40.9   25.7    58.3     link

References

[1] Wang, Bairui, et al. "Reconstruction Network for Video Captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[2] Szegedy, Christian, et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning." Proceedings of the AAAI Conference on Artificial Intelligence. 2017.
