This repository contains an implementation of image captioning based on neural network (i.e. CNN + RNN). The model first extracts the image feature by CNN and then generates captions by RNN. CNN is VGG16 and RNN is a standard LSTM .
Normal Sampling and Beam Search were used to predict the caption of images.
Dataset used was Flickr8k dataset.
- Keras 2.0.7
- Theano 0.9.0
- Numpy
- Pandas 0.20.3
- Matplotlib
- Pickle
[1] Deep Visual-Semantic Alignments for Generating Image Descriptions ( Karpathy et-al, CVPR 2015)
[2] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan Show and Tell: A Neural Image Caption Generator
[3] CS231n: Convolutional Neural Networks for Visual Recognition. ( Instructors : Li Fei Fei, Andrej Karpathy, Justin Johnson)