Image caption generator using Chainer

Python 3 and ResNet feature version by @milhidaka

Includes a caption generation demo that runs in the web browser using WebDNN.

Requirements

Python 3.6
Chainer 2.0.0
Cupy 1.0.0
Pillow

Usage (only caption generation)

Caption generation using the pre-trained model (ResNet-50 + MSCOCO)

Download caption_gen_resnet.model (45MB) and dataset_coco.pkl (28MB).

$ python src/generate_caption.py -s dataset_coco.pkl -m caption_gen_resnet.model -l image/list.txt -g 0

Options:

-s, --sentence: (required) sentence dataset file path.
-m, --model: (required) trained model file path.
-l, --list: (required) image path list file.
-g, --gpu: (optional) GPU index. -1 means CPU.
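
The options above map naturally onto a standard argparse parser. The following is a minimal sketch of such a parser, not the actual contents of src/generate_caption.py. The function name build_parser, the help strings, and the default of -1 for --gpu are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented options; the real
    # src/generate_caption.py may define them differently.
    parser = argparse.ArgumentParser(description="Generate image captions")
    parser.add_argument("-s", "--sentence", required=True,
                        help="sentence dataset file path")
    parser.add_argument("-m", "--model", required=True,
                        help="trained model file path")
    parser.add_argument("-l", "--list", required=True,
                        help="image path list file")
    # Assumed default: -1 (CPU) when no GPU index is given.
    parser.add_argument("-g", "--gpu", type=int, default=-1,
                        help="GPU index; -1 means CPU")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["-s", "dataset_coco.pkl", "-m", "caption_gen_resnet.model",
     "-l", "image/list.txt", "-g", "0"])
print(args.sentence, args.model, args.list, args.gpu)
```

Omitting any of the three required options makes argparse exit with a usage message, which matches the "(required)" labels in the list.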

Convert model to WebDNN (browser demo)

$ python src/convert_webdnn.py --sentence dataset_coco.pkl --model caption_gen_resnet.model --example_image image/asakusa.jpg

Then start an HTTP server (python -m http.server) and open http://localhost:8000/webdnn.

Usage (training model using MSCOCO dataset)

Download dataset

Download the 2014 images from http://mscoco.org/dataset/#download and extract them to a directory of your choice.
Download caption_datasets.zip from http://cs.stanford.edu/people/karpathy/deepimagesent/
Extract the downloaded zip file to obtain dataset_coco.json.

Convert dataset

$ python src/convert_dataset.py dataset_coco.json dataset_coco.pkl

Parameters:

Path to the sentence JSON file of the dataset (input).
Path to the output pkl file.
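
To give a rough idea of what a JSON-to-pkl conversion can look like, here is a minimal sketch. It is not the code in src/convert_dataset.py, and the layout of the real dataset_coco.pkl may differ. It assumes the JSON has an "images" list whose entries carry "filename", "split", and "sentences" with "tokens". The function name convert_dataset, the special symbols, and the output dictionary keys are hypothetical.

```python
import json
import pickle
from collections import Counter


def convert_dataset(json_path, pkl_path, min_count=1):
    """Build a word-to-id vocabulary and id-encoded captions, then pickle them."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)

    # Count every token across all captions.
    counts = Counter(
        token
        for image in data["images"]
        for sentence in image["sentences"]
        for token in sentence["tokens"]
    )

    # Reserve ids for special symbols (illustrative names, not the repo's).
    word_ids = {"<S>": 0, "</S>": 1, "<UNK>": 2}
    for word, count in sorted(counts.items()):
        if count >= min_count:
            word_ids.setdefault(word, len(word_ids))

    # Encode each caption as a list of word ids.
    records = []
    for image in data["images"]:
        for sentence in image["sentences"]:
            ids = [word_ids.get(t, word_ids["<UNK>"]) for t in sentence["tokens"]]
            records.append({
                "filename": image["filename"],
                "split": image["split"],
                "word_ids": ids,
            })

    with open(pkl_path, "wb") as f:
        pickle.dump({"word_ids": word_ids, "sentences": records}, f)
    return word_ids, records


# Example call (paths as in the command above):
# convert_dataset("dataset_coco.json", "dataset_coco.pkl")
```

Storing integer ids instead of raw words keeps the pickled file compact and lets a training script feed captions to the model without re-tokenizing.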

Extract ResNet feature

$ python src/extract_resnet_feat.py dataset_coco.json /path/to/coco/images resnet_feat.mat -g 0 -b 16

Parameters and options:

Sentence JSON file of the dataset.
Top-level directory containing the images. Files are searched recursively.
Output feature matrix file (about 1 GB).
-g, --gpu: (optional) GPU index. -1 means CPU.
-b, --batchsize: (optional) batch size for extracting feature.

Feature extraction takes several hours.
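
Two parts of this step are easy to illustrate in isolation: the recursive search of the image directory and the grouping of images into batches of the size given by -b. The sketch below shows one way to do both with the standard library only. It is not taken from src/extract_resnet_feat.py; the names index_images, iter_batches, and IMAGE_EXTENSIONS are hypothetical, and the ResNet forward pass itself is left out.

```python
import os

# Assumed set of image file extensions to pick up.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def index_images(root_dir):
    """Map each image file name to its full path, searching recursively."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                index[name] = os.path.join(dirpath, name)
    return index


def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example call (directory and batch size as in the command above):
# paths = sorted(index_images("/path/to/coco/images").values())
# for batch in iter_batches(paths, 16):
#     ...  # load the images in `batch` and run them through the network
```

Indexing by file name lets the file names listed in the sentence JSON be resolved to paths regardless of which subdirectory the images were extracted into. A larger batch size generally speeds up extraction on a GPU at the cost of more memory.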

Train model

$ python src/train.py -g 0 -s dataset_coco.pkl -i resnet_feat.mat -o model/caption_gen

Options:

-g, --gpu: (optional) GPU device index (default: -1).
-s, --sentence: (required) sentence dataset file path.
-i, --image: (required) image feature file path.
-m, --model: (optional) input model file path without extension.
-o, --output: (required) output model file path without extension.
--iter: (optional) the number of iterations (default: 100).

Image path list file sample

image/asakusa.jpg
image/tree.jpg
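
The list file is plain text with one image path per line, as in the sample above. A minimal sketch of reading such a file follows. The function name read_image_list is hypothetical, and skipping blank lines is an assumption rather than documented behavior of src/generate_caption.py.

```python
def read_image_list(list_path):
    """Return the image paths in a list file, one per line, skipping blanks."""
    with open(list_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Example call (path as in the caption generation command):
# paths = read_image_list("image/list.txt")
```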

License

MIT License
