FutureCap

"a future in the past" -- Assassin's Creed

This repository contains the reference code for the paper Efficient Modeling of Future Context for Image Captioning. In this paper, we aim to utilize a mask-based non-autoregressive image captioning (NAIC) model to improve the performance of a conventional autoregressive image captioning model through dynamic distribution calibration. Since the NAIC model is applied only to calibrate sentences that have already been generated, its length predictor can be dropped.
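The sketch below is only an illustration of this idea, not the paper's exact formulation: the function name, the interpolation form, and the weight `alpha` are all assumptions (see `train_combine.py` for the actual implementation). It blends the autoregressive model's per-step token distribution with the NAIC teacher's distribution, which can attend to future context.

```python
import torch.nn.functional as F

def calibrate_distribution(ar_logits, naic_logits, alpha=0.5):
    """Hypothetical sketch: blend the autoregressive (AR) captioner's
    token distribution with the NAIC teacher's distribution.

    ar_logits:   (batch, vocab) logits from the conventional AR model at step t
    naic_logits: (batch, vocab) logits from the NAIC teacher at the same
                 position, which sees bidirectional (future) context by
                 masking that position
    alpha:       interpolation weight (an assumed hyperparameter)
    """
    ar_probs = F.softmax(ar_logits, dim=-1)
    naic_probs = F.softmax(naic_logits, dim=-1)
    # Convex combination of the two distributions.
    return alpha * ar_probs + (1.0 - alpha) * naic_probs
```

In calibration training, the autoregressive model would then be pushed toward such a blended target, e.g. with a cross-entropy or KL loss.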

1. Requirements

torch==1.10.1

transformers==4.11.3

clip
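Assuming `clip` refers to OpenAI's CLIP package, the requirements can be installed roughly as follows:

```
pip install torch==1.10.1 transformers==4.11.3
pip install git+https://github.com/openai/CLIP.git
```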

2. Dataset

To run the code, annotations and detection features for the COCO dataset are needed. Please download the annotations file annotations.zip and extract it. Image representations are first computed with the pre-trained model provided by CLIP.
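As an illustration of the feature extraction step, here is a minimal sketch using OpenAI's `clip` package; the ViT-B/32 backbone and the file name are assumptions, so check the repository for the exact setup:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# ViT-B/32 is an assumed backbone; the repo may use a different one.
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)

print(image_features.shape)  # e.g. torch.Size([1, 512]) for ViT-B/32
```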

3. Training

First, run `python train_NAIC.py` to obtain the non-autoregressive image captioning model, which serves as the teacher model. Then, run `python train_combine.py` to perform distribution calibration of the conventional transformer image captioning model. An example invocation is given after the argument table below.

Training arguments are as follows:

| Argument | Description |
| --- | --- |
| `--batch_size` | Batch size (default: 10) |
| `--workers` | Number of workers (default: 0) |
| `--warmup` | Warmup value for learning rate scheduling (default: 10000) |
| `--resume_last` | If used, training resumes from the last checkpoint. |
| `--data_path` | Path to the COCO dataset file |
| `--annotation_folder` | Path to the folder with COCO annotations |
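For illustration, the two training stages could be launched as below; `/path/to/coco` and `/path/to/annotations` are placeholders, and both scripts are assumed to accept the same argument set:

```
python train_NAIC.py --batch_size 10 --workers 0 --warmup 10000 \
    --data_path /path/to/coco --annotation_folder /path/to/annotations
python train_combine.py --batch_size 10 --workers 0 --warmup 10000 \
    --data_path /path/to/coco --annotation_folder /path/to/annotations
```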

4. Evaluation

To reproduce the results reported in our paper, download the pretrained model file from Google Drive and place it in the `ckpt` folder.

Run `python inference.py` with the following arguments (an example invocation follows the table):

| Argument | Description |
| --- | --- |
| `--batch_size` | Batch size (default: 10) |
| `--workers` | Number of workers (default: 0) |
| `--data_path` | Path to the COCO dataset file |
| `--annotation_folder` | Path to the folder with COCO annotations |
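For example (paths are placeholders):

```
python inference.py --batch_size 10 --workers 0 \
    --data_path /path/to/coco --annotation_folder /path/to/annotations
```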

5. Acknowledgements

This repository is based on M2T and Huggingface, and you may refer to them for more details about the code.
