# Review Network for Caption Generation
## Image Captioning on MSCOCO
You can use the code in this repo to generate an MSCOCO evaluation server submission with CIDEr >= 0.96 in just a few hours.
No fine-tuning required. No fancy tricks. Just train three end-to-end review networks and ensemble them.
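The core idea of a review network is a reviewer module that runs several attention steps over the encoder's hidden states and emits a sequence of "thought vectors"; the decoder then attends over these thought vectors instead of the raw encoder states. Below is a minimal NumPy sketch of the review steps only, with a plain recurrent update standing in for the paper's LSTM reviewer; the function name `review_steps` and all parameter shapes are illustrative, not the repo's API:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def review_steps(enc_states, num_steps, W, U, v):
    """Run attentive review steps over encoder states.

    Each step attends over all encoder hidden states and emits one
    thought vector. Simplified sketch: the paper's reviewer is an
    LSTM; here a plain tanh recurrence is used for brevity.
    """
    d = enc_states.shape[1]
    f = np.zeros(d)                            # reviewer state
    thoughts = []
    for _ in range(num_steps):
        # additive attention over encoder states, conditioned on f
        scores = np.tanh(enc_states @ W + f @ U) @ v
        alpha = softmax(scores)                # attention weights
        context = alpha @ enc_states           # weighted context
        f = np.tanh(context + f)               # simplified update
        thoughts.append(f)
    return np.stack(thoughts)                  # (num_steps, d)

# toy usage: 5 encoder states of dimension 8, 3 review steps
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))
W = rng.standard_normal((8, 8))
U = rng.standard_normal((8, 8))
v = rng.standard_normal(8)
T = review_steps(H, num_steps=3, W=W, U=U, v=v)
print(T.shape)  # (3, 8)
```

The number of review steps is a hyperparameter independent of the input length, which is what lets the decoder work from a fixed-size set of thought vectors.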
- Feature extraction: 2 hours in parallel
- Single model training: 6 hours
- Ensemble model training: 30 mins
- Beam search for caption generation: 3 hours in parallel
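The last step of the pipeline above is beam search over the (ensembled) decoder. A generic sketch follows; the `step_logprobs` callback is a hypothetical interface, not the repo's actual decoder API — with an ensemble, it is where per-model next-token probabilities would be averaged before taking the log:

```python
import math

def beam_search(step_logprobs, beam_size, max_len, bos, eos):
    """Generic beam search.

    step_logprobs(prefix) -> {token: log-prob of token given prefix}.
    Keeps the beam_size highest-scoring partial sequences per step and
    returns the best finished sequence.
    """
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:                 # already complete
                finished.append((seq, score))
                continue
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos)
    if not finished:                           # nothing terminated
        finished = beams
    return max(finished, key=lambda c: c[1])[0]

# toy usage with a hypothetical two-token vocabulary
def toy_step(seq):
    if seq == ["<s>"]:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    return {"</s>": 0.0}

print(beam_search(toy_step, beam_size=2, max_len=3,
                  bos="<s>", eos="</s>"))
# -> ['<s>', 'a', '</s>']
```

Because each image is decoded independently, beam search parallelizes trivially across the test set, which is why this stage takes "3 hours in parallel" above.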
Below is a comparison with other state-of-the-art systems (with corresponding published papers) on the MSCOCO evaluation server:
|Model|BLEU-4|METEOR|ROUGE-L|CIDEr|Fine-tuned|Task-specific features|
|---|---|---|---|---|---|---|
You can use the code in the directory `image_caption_online` to reproduce our evaluation server results.
You can use the code in the directory `image_caption_offline` to rerun the experiments in our paper with offline evaluation.
Predicting comments for a piece of source code is another interesting task. In this repo we also release a dataset with train/dev/test splits, along with the code of a review network for this task.
Check out the directory
Below is a comparison with baselines on the code captioning dataset:
|LSTM Language Model|-5.34|0.2340|0.2763|0.3000|0.3153|0.3290|
|Attentive Encoder-Decoder (Bidir)|-5.14|0.2716|0.3152|0.3364|0.3523|0.3651|
This repo contains the code and data used in the following paper:

Review Networks for Caption Generation. Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen.