NIC model

A PyTorch implementation of "Show and Tell: A Neural Image Caption Generator".
It also adds SCST training from "Self-critical Sequence Training for Image Captioning".
The code is meant to be clear and easy to learn from.

Environment

Python 3.7
PyTorch 1.3.1

Method

1. Architecture

2. Main Process

Usage

1. Preprocessing

Extract image features with ResNet-101 (referred to as grid-based features) and process the COCO captions data (from the Karpathy splits) by running preprocess.py. Adjust the parameters in that file first; the resnet101_file parameter points to the file available here. Image features can also be obtained from here, or extracted with the ezeli/bottom_up_features_extract repository (from the Bottom-Up Attention paper, using a fixed 36 features per image, referred to as region-based features).

This project is not limited to the MSCOCO dataset, but you need to process your data according to the data format in the preprocess.py file.
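As a rough illustration of the kind of caption processing involved, the sketch below builds a word-to-index vocabulary from tokenized captions and encodes one caption. The input layout (image file name mapped to a list of token lists), the special tokens, the `min_count` threshold, and the function names are all assumptions made for illustration; the format that preprocess.py actually expects should be read from that file.

```python
from collections import Counter

# Hypothetical special tokens; the real project may use different ones.
SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(captions, min_count=2):
    """Map every word seen at least `min_count` times to an integer index."""
    counts = Counter(
        word
        for sentences in captions.values()
        for tokens in sentences
        for word in tokens
    )
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {word: idx for idx, word in enumerate(SPECIALS + words)}


def encode(tokens, vocab):
    """Convert a token list to indices, wrapped in start/end markers."""
    unk = vocab["<unk>"]
    body = [vocab.get(word, unk) for word in tokens]
    return [vocab["<start>"]] + body + [vocab["<end>"]]


# Assumed input layout: image file name -> list of tokenized captions.
captions = {
    "img1.jpg": [["a", "man", "on", "a", "motorcycle"]],
    "img2.jpg": [["a", "man", "riding", "a", "horse"]],
}
vocab = build_vocab(captions, min_count=2)
encoded = encode(["a", "man", "on", "a", "motorcycle"], vocab)
```

Words below the threshold fall back to the `<unk>` index, which keeps the vocabulary small when a custom dataset contains many rare words.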

2. Training

First adjust the parameters in opts.py:
- train_mode: 'xe' for pre-training, 'rl' for fine-tuning (+SCST).
- learning_rate: 4e-4 for xe, 4e-5 for rl.
- resume: the checkpoint to resume training from. Required for rl.
- Other parameters can be modified as needed.

Then run:
- python train.py
- Checkpoints are saved in the checkpoint directory and test results in the result directory.
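The relationship between these options can be summarised in a small sketch. The `TrainOptions` class, the `validate` helper, and the checkpoint path are hypothetical and are not part of the project's options file; only the option names and values come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Learning rates listed above for each training mode.
DEFAULT_LR = {"xe": 4e-4, "rl": 4e-5}


@dataclass
class TrainOptions:
    train_mode: str = "xe"
    learning_rate: Optional[float] = None  # filled in from DEFAULT_LR if unset
    resume: Optional[str] = None           # path to a checkpoint


def validate(opt):
    """Check the combination of options and fill in the default learning rate."""
    if opt.train_mode not in DEFAULT_LR:
        raise ValueError(f"unknown train_mode: {opt.train_mode!r}")
    if opt.train_mode == "rl" and not opt.resume:
        # Fine-tuning with SCST starts from an XE pre-trained checkpoint.
        raise ValueError("train_mode 'rl' requires a resume checkpoint")
    if opt.learning_rate is None:
        opt.learning_rate = DEFAULT_LR[opt.train_mode]
    return opt


xe_opt = validate(TrainOptions(train_mode="xe"))
# "checkpoint/model.pth" is a made-up path used only for this example.
rl_opt = validate(TrainOptions(train_mode="rl", resume="checkpoint/model.pth"))
```

In practice this means two runs: one with 'xe' to obtain a checkpoint, then one with 'rl' that resumes from it at the lower learning rate.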

3. Test

python test.py -t model.pth -i image.jpg

This only works for models trained on grid-based features. For region-based features, first extract the image features with the ezeli/bottom_up_features_extract repository, then modify test.py to use them.

Result

Evaluation metrics

Evaluation tool: ezeli/caption_eval

XE denotes training with cross-entropy loss, and +SCST denotes fine-tuning the model with reinforcement learning (using a CIDEr reward).
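To make the two objectives concrete, here is a minimal pure-Python sketch of both losses for a single caption. It is not the code in this repository: `toy_reward` is a made-up stand-in for CIDEr (a simple token-overlap ratio), the probabilities are invented, and the real implementation operates on batched tensors. The self-critical idea it shows is that the reward of the greedily decoded caption serves as the baseline for the sampled caption.

```python
import math


def toy_reward(candidate, reference):
    """Stand-in for CIDEr: fraction of candidate tokens found in the reference."""
    if not candidate:
        return 0.0
    ref = set(reference)
    return sum(tok in ref for tok in candidate) / len(candidate)


def xe_loss(token_probs):
    """Cross-entropy: negative log-likelihood of the ground-truth tokens."""
    return -sum(math.log(p) for p in token_probs)


def scst_loss(sampled, sampled_probs, greedy, reference, reward_fn=toy_reward):
    """Self-critical loss: the greedy caption's reward is the baseline."""
    advantage = reward_fn(sampled, reference) - reward_fn(greedy, reference)
    log_prob = sum(math.log(p) for p in sampled_probs)
    return -advantage * log_prob


reference = ["a", "man", "is", "sitting", "on", "a", "motorcycle"]
greedy = ["a", "man", "on", "a", "bike"]
sampled = ["a", "man", "sitting", "on", "a", "motorcycle"]
# Invented per-token probabilities the model assigned to the sampled caption.
sampled_probs = [0.9, 0.8, 0.3, 0.7, 0.9, 0.4]

loss = scst_loss(sampled, sampled_probs, greedy, reference)
```

Here the sampled caption scores higher than the greedy one, so the advantage is positive and minimising the loss raises the probability of the sampled tokens. A sample that scores below the greedy baseline would have its probability pushed down instead.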

features	training	Bleu-1	Bleu-2	Bleu-3	Bleu-4	METEOR	ROUGE_L	CIDEr	SPICE
grid-based	XE	71.9	55.0	41.2	30.9	24.7	53.0	94.7	17.8
grid-based	+SCST	75.0	57.4	41.9	30.2	24.9	53.8	103.1	17.9
region-based	XE	72.5	55.8	42.0	31.7	25.2	53.7	97.0	17.9
region-based	+SCST	75.5	58.1	42.7	30.9	25.2	54.3	105.4	18.4
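The effect of SCST fine-tuning can be read off the table with a little arithmetic. The snippet below copies the Bleu-4 and CIDEr columns from the table; the `scst_gain` helper is only an illustration.

```python
# Bleu-4 and CIDEr scores copied from the table above.
scores = {
    "grid-based": {
        "XE": {"Bleu-4": 30.9, "CIDEr": 94.7},
        "+SCST": {"Bleu-4": 30.2, "CIDEr": 103.1},
    },
    "region-based": {
        "XE": {"Bleu-4": 31.7, "CIDEr": 97.0},
        "+SCST": {"Bleu-4": 30.9, "CIDEr": 105.4},
    },
}


def scst_gain(features, metric):
    """Change in a metric when going from XE to +SCST, rounded to one decimal."""
    row = scores[features]
    return round(row["+SCST"][metric] - row["XE"][metric], 1)


gains = {
    (features, metric): scst_gain(features, metric)
    for features in scores
    for metric in ("Bleu-4", "CIDEr")
}
```

With these numbers, SCST raises CIDEr by 8.4 points for both feature types, while Bleu-4 drops slightly (by 0.7 for grid-based and 0.8 for region-based features), which is consistent with CIDEr being the reward that is optimised.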

Examples


a man is sitting on a motorcycle.

