CS231N_Project

Image Captioning on the Microsoft COCO Dataset

Getting Started

  1. Clone this repository.
  2. Download the Microsoft COCO dataset from http://cocodataset.org/#download. We need the 2014 Train images, 2014 Val images, and 2014 Train/Val annotations. Move the zip files "train2014.zip", "val2014.zip", and "annotations_trainval2014.zip" to the coco/ folder and unzip them.
  3. Download the glove.6B.300d word vectors from http://nlp.stanford.edu/data/glove.6B.zip. Place the "glove.6B.300d.txt" file in the top-level folder (a sketch of how these vectors can be loaded appears after this list).
  4. Run the "get_started.sh" script. This will reorganize the data files and build the COCO Python API.
  5. If interested, you can check out "explore_data.ipynb" to see what the official JSON files look like.
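
For reference, here is a minimal sketch (not part of this repo) of how the glove.6B.300d.txt file from step 3 can be read, assuming the standard GloVe text format of one "word v1 v2 ... v300" entry per line; the function and variable names are illustrative only:

import numpy as np

def load_glove(path="glove.6B.300d.txt", dim=300):
    # Each line is a word followed by `dim` space-separated floats
    embeddings = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vec = np.asarray(parts[1:], dtype=np.float32)
            if vec.shape[0] == dim:  # guard against malformed lines
                embeddings[parts[0]] = vec
    return embeddings

# Usage: vectors = load_glove(); vectors["dog"] is a (300,) float32 array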

Perform the train/val/test split and preprocess all the captions

  1. Run "python prepro.py"; it should take less than two minutes. The script was tested on Python 3.6 and depends on numpy and nltk. It generates several data files in the data/ folder and saves a trimmed version of glove.6B.300d to the top-level folder. For details about the generated files, see the comments in "prepro.py" (a rough sketch of the general idea follows this list).
  2. Start Jupyter Notebook, open "prepro_unit_test.ipynb", and re-run all the cells to make sure the preprocessing was successful.
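
The exact preprocessing logic lives in prepro.py, but as a rough illustration of the kind of nltk-based caption processing such a script performs, a tokenization and vocabulary-building step might look like the following; the special-token names and the min_count threshold here are assumptions, not necessarily what prepro.py uses:

from collections import Counter
import nltk  # requires the nltk "punkt" tokenizer data to be downloaded once

def build_vocab(captions, min_count=5):
    # Count lower-cased tokens across all captions
    counter = Counter()
    for cap in captions:
        counter.update(nltk.word_tokenize(cap.lower()))
    # Reserve ids for special tokens (names are hypothetical)
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counter.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab

# Usage: build_vocab(["A dog runs.", "A cat sleeps."], min_count=1)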

Train

python main.py --mode=train --experiment_name=baseline --data_source=ram/ssd

Options: --primary_metric, --num_epochs, --learning_rate, --dropout, --batch_size, --hidden_size, --beam_width, --special_token, --print_every, --save_every, --eval_every
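
For example, a run that overrides a few of these defaults might look like the following; the flag values and experiment name are illustrative only, and --data_source takes either ram or ssd (presumably selecting where the preprocessed data is served from):

python main.py --mode=train --experiment_name=lstm_dropout --data_source=ssd --num_epochs=20 --batch_size=128 --learning_rate=0.001 --dropout=0.5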

Evaluate

python main.py --mode=eval --experiment_name=baseline --ckpt_load_dir=./experiments/baseline/best_checkpoint --data_source=ram/ssd

Note: the optimal beam width can be tuned by running in eval mode multiple times with different --beam_width values (although using the best beam width for early stopping during training might give slightly better performance).
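
For instance, a beam-width sweep over a fixed checkpoint might look like this (the widths are illustrative):

python main.py --mode=eval --experiment_name=baseline --ckpt_load_dir=./experiments/baseline/best_checkpoint --data_source=ssd --beam_width=3
python main.py --mode=eval --experiment_name=baseline --ckpt_load_dir=./experiments/baseline/best_checkpoint --data_source=ssd --beam_width=5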