moaaztaha/Image-Captioning

Image Captioning using PyTorch

Current Implementation -> Show, Attend and Tell

Papers' results

  • The Google NIC was the first implementation, from the paper "Show and Tell".
  • Soft-Attention is the result I'm comparing to. I'm not comparing to Hard-Attention because it is trained by maximizing an approximate variational lower bound (REINFORCE), while Soft-Attention is trained by standard back-propagation.
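
To make the Soft-Attention point concrete, here is a minimal sketch of one attention step in plain Python. It is an illustration, not this repository's code: the function name `soft_attention` and the toy inputs are assumptions, and the attention scores are taken as given rather than computed by a learned layer. The context vector is a softmax-weighted average of the image features, so every operation is differentiable and ordinary back-propagation applies. Hard-Attention would instead sample a single location, which is why it needs a different training procedure.

```python
import math

def soft_attention(features, scores):
    """Return (weights, context) for one decoding step.

    features: list of L annotation vectors (each a list of floats)
    scores:   list of L unnormalized attention scores (assumed given here)
    """
    # Softmax over image locations, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: weighted average of the annotation vectors.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context

# Toy example: three locations with 2-dimensional features and equal scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(features, [0.0, 0.0, 0.0])
```

With equal scores the weights are uniform, so the context vector is simply the mean of the three feature vectors.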

My Best Results (top BLEU-4)

  • Using a beam size of 3:

    • BLEU-1 is lower than the paper's result. This may be due to one or more of the following reasons:
      • We use a smaller vocabulary, so the model knows fewer words.
      • During training, validation, and testing we aim for the highest BLEU-4, so we passed over models with a higher BLEU-1 when their BLEU-4 was lower.
        • Beam search also returns the sequences with the highest score, which depends on the loss function.
        • We chose the best model based on BLEU-4 only.
    • The other BLEU scores are higher because they depend more on pairs (and longer runs) of words, which is our main optimization goal.
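
The difference between BLEU-1 and the higher-order scores can be illustrated with the modified n-gram precision that BLEU is built on. The sketch below is a simplified illustration rather than the evaluation code used for the results above: the helper names and the example sentences are assumptions, and a full BLEU-n score additionally takes the geometric mean of the precisions up to order n and applies a brevity penalty.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip candidate counts so repeated n-grams are not over-rewarded.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

# Hypothetical caption and reference, for illustration only.
candidate = "a dog runs on the grass".split()
references = ["a dog is running on the grass".split()]
p1 = modified_precision(candidate, references, 1)  # unigram precision (5 of 6 match)
p2 = modified_precision(candidate, references, 2)  # bigram precision (3 of 5 match)
```

Unigram precision rewards getting individual words right, while bigram and longer precisions reward getting word sequences right, so a model selected for BLEU-4 can trade some BLEU-1 for better phrasing.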

Implementation Differences Table

About

Generating image captions using CNNs, RNNs, and attention layers in PyTorch.
