Image Captioning

Dataset Preparation

  • Clone this repository using
    git clone https://github.com/lucasace/Image_Captioning.git 
  • Download the Flickr8k image and text datasets from here and here respectively
  • Unzip both the image and text archives and place them inside the repository folder

I want to train the model

To train the model, simply run:

python3 main.py --type train --checkpoint_dir <checkpointdir> --cnnmodel <cnnmodel> --image_folder <imagefolder location> --caption_file <location to token.txt> --feature_extraction <True or False>
  • checkpoint_dir is the directory where your model checkpoints will be saved
  • cnnmodel is either inception or vgg16; default is inception
  • image_folder is the location of the folder containing all the images
  • caption_file is the location of 'Flickr8k.token.txt'
  • feature_extraction - True or False; default is True
    • True if you haven't extracted the image features yet (a sketch of what this step does is shown after this list)
    • False if you have already extracted the image features; this saves time and memory when training again
  • batch_size is the batch size for training and validation; default is 128
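When feature_extraction is True, each image is run once through the chosen CNN and the resulting feature vectors are cached so later runs can skip this step. Below is a minimal sketch of that idea for the inception option, assuming TensorFlow/Keras; the function names and caching approach here are illustrative, not the repository's actual code.

import numpy as np
import tensorflow as tf

def build_feature_extractor():
    # InceptionV3 without its classification head; average pooling gives one 2048-d vector per image
    return tf.keras.applications.InceptionV3(include_top=False, weights="imagenet", pooling="avg")

def extract_features(image_paths, extractor):
    # map each image path to its feature vector; caching these (e.g. with np.save)
    # is what lets --feature_extraction False skip the step on later runs
    features = {}
    for path in image_paths:
        img = tf.keras.preprocessing.image.load_img(path, target_size=(299, 299))
        arr = tf.keras.applications.inception_v3.preprocess_input(
            tf.keras.preprocessing.image.img_to_array(img))
        feat = extractor(np.expand_dims(arr, axis=0), training=False)
        features[path] = np.squeeze(feat.numpy())
    return features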

Testing the model

python3 main.py --type test --checkpoint_dir <checkpointdir> --cnnmodel <cnnmodel> --image_folder <imagefolder location> --caption_file <location to token.txt> --feature_extraction <True or False>
  • Download the checkpoints from here if your cnnmodel is inception, or from here if it is vgg16, or use your own trained checkpoints
  • All other arguments are the same as for training

I just want to caption

python3 main.py --type caption --checkpoint_dir <checkpointdir> --cnnmodel <cnnmodel> --caption_file <location to token.txt> --to_caption <image file path to caption>
  • Download the checkpoints from here
    • Note: these are inception checkpoints; for vgg16, download from here
  • caption_file is required to build the vocabulary (a sketch of how this might look is shown below)
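Since the caption file is only needed here to rebuild the vocabulary, a rough sketch of that step is shown below, assuming each line of Flickr8k.token.txt has the form <image_name>#<caption_index><TAB><caption>; the actual logic lives in the repository's dataset.py and may differ.

from collections import Counter

def build_vocab(token_file, min_count=1):
    # count words across all captions in Flickr8k.token.txt
    counts = Counter()
    with open(token_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            _, caption = line.split("\t", 1)
            counts.update(caption.lower().split())
    # reserve ids for padding, start, end and unknown tokens, then add the rest
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counts.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab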

Custom dataset

If you want to train on a custom dataset, make the necessary changes in the dataset.py file so it suits your dataset (an illustrative sketch follows).
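As an illustration only, the change usually amounts to swapping the caption-loading step; the sketch below assumes a custom CSV with image and caption columns, and the helper name is hypothetical rather than part of dataset.py.

import csv
from collections import defaultdict

def load_custom_captions(csv_path):
    # produce the same shape the Flickr8k loader yields:
    # image filename -> list of caption strings
    captions = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            captions[row["image"]].append(row["caption"].strip())
    return captions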

Results

Model Type       CNN_Model     Bleu_1  Bleu_2  Bleu_3  Bleu_4  Meteor
Encoder-Decoder  Inception_V3  60.12   51.1    48.13   39.5    25.8
Encoder-Decoder  VGG16         58.46   49.87   47.50   39.37   26.32
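For reference, scores like these can be computed with NLTK (the library the references below point to for METEOR); the toy example here is only a sketch, and the repository's own evaluation may use different tokenisation and smoothing.

import nltk
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")  # needed once for METEOR

# one hypothesis caption with two reference captions, all pre-tokenised
references = [[["a", "dog", "runs", "on", "the", "grass"],
               ["a", "brown", "dog", "is", "running", "outside"]]]
hypotheses = [["a", "dog", "is", "running", "on", "grass"]]

smooth = SmoothingFunction().method1
bleu_1 = corpus_bleu(references, hypotheses, weights=(1, 0, 0, 0), smoothing_function=smooth)
bleu_4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth)
meteor = meteor_score(references[0], hypotheses[0])  # recent NLTK versions expect token lists
print(f"BLEU-1: {bleu_1:.2f}, BLEU-4: {bleu_4:.2f}, METEOR: {meteor:.2f}")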

Here are some of the results (three example captioned images are included in the repository).

Things to Do

  • Beam search
  • Image Captioning using Soft and Hard Attention
  • Image Captioning using Adversarial Training

Contributions

Any contributions are welcome.

If there is any issue with the model or errors in the program, feel free to raise an issue or open a PR.

References

  • O. Vinyals, A. Toshev, S. Bengio and D. Erhan, "Show and tell: A neural image caption generator," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 3156-3164, doi: 10.1109/CVPR.2015.7298935.
  • TensorFlow documentation on image captioning
  • Machine Learning Mastery for the dataset
  • NLTK documentation for the METEOR score
  • RNN lecture by Stanford University
