
The main goal of this project's model is to assign a category label to each pixel of an image. The network aims to give a complete understanding of the scene: it predicts the label, location, and shape of each element in the image. The computational difficulty depends on the complexity of the scene and the variety of labels.


shahilazmayish/Bangla-Image-Captioning


Image Captioning



Image captioning is the process of generating a textual description of an image.

The task has two parts: understanding the image, which is handled by a CNN, and generating the sentence, which is handled by an RNN. Captioning therefore combines Computer Vision and Natural Language Processing.

If we were asked to describe this image, we might say:

[Image: intro]

“মাঠের মধ্যে একটি ছেলে বল ধরে আছে।” ("A boy is holding a ball in the field.") or “খালি গায়ে শিশুটি খুশিতে বল নিয়ে দাঁড়িয়ে আছে।” ("The shirtless child is standing happily with a ball.")

While forming the description, we look at the image and, at the same time, try to produce a meaningful sequence of words.

Model

  • Preparing Photo Data
  • Preparing Text Data
    • Each photo is paired with two captions
  • Developing the Deep Learning Model (in Google Colab)
  • Evaluating the Model
  • Generating New Captions
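The README does not show how the text data was prepared. One possible approach is sketched below, assuming whitespace tokenization, removal of the Bangla danda (।) and simple punctuation, and the hypothetical boundary tokens `startseq`/`endseq`; the helper names `clean_caption`, `prepare_captions`, and `build_vocab` are illustrative, not the project's own.

```python
# Sketch of caption preparation for the text-data step.
# Assumptions (not taken from the project): whitespace tokenization, simple
# punctuation stripping, and the boundary tokens "startseq"/"endseq".
from collections import Counter

PUNCTUATION = ("।", ".", ",", "!", "?", "\"", "“", "”")

def clean_caption(caption):
    """Replace the Bangla danda and other punctuation with spaces, then split into words."""
    for mark in PUNCTUATION:
        caption = caption.replace(mark, " ")
    return caption.split()

def prepare_captions(captions_by_photo):
    """Wrap every cleaned caption of every photo in start/end tokens."""
    return {
        photo_id: ["startseq " + " ".join(clean_caption(c)) + " endseq" for c in caps]
        for photo_id, caps in captions_by_photo.items()
    }

def build_vocab(prepared, min_count=1):
    """Give each word seen at least min_count times an integer id; 0 is left free for padding."""
    counts = Counter(word for caps in prepared.values() for c in caps for word in c.split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}

if __name__ == "__main__":
    # Two captions per photo, matching the example captions earlier in this README.
    captions = {
        "photo_1": [
            "মাঠের মধ্যে একটি ছেলে বল ধরে আছে।",
            "খালি গায়ে শিশুটি খুশিতে বল নিয়ে দাঁড়িয়ে আছে।",
        ]
    }
    prepared = prepare_captions(captions)
    vocab = build_vocab(prepared)
    print(prepared["photo_1"][0])
    print(len(vocab), "words in the vocabulary")
```

Reserving id 0 leaves room for padding when captions of different lengths are batched together.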

Design Approach

  • RNN + CNN
  • Encoder-decoder model
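In an encoder-decoder captioner, the CNN encodes the image once and the RNN then emits one word at a time, feeding each prediction back in as the next input. A minimal greedy-decoding sketch of that loop follows; `decoder_step` and the toy transition table are hypothetical stand-ins for a trained LSTM decoder, included only so the loop runs on its own.

```python
# Greedy decoding for a CNN-encoder / RNN-decoder captioner (a sketch).
# Assumptions: `decoder_step` stands in for one step of a trained LSTM
# decoder and returns (scores per word, new state); "startseq"/"endseq"
# are hypothetical boundary tokens.

def greedy_decode(image_features, decoder_step, init_state, max_len=20,
                  start="startseq", end="endseq"):
    """Feed the highest-scoring word back into the decoder until `end` or max_len."""
    words, prev, state = [], start, init_state
    for _ in range(max_len):
        scores, state = decoder_step(image_features, prev, state)
        prev = max(scores, key=scores.get)  # greedy choice: best-scoring next word
        if prev == end:
            break
        words.append(prev)
    return " ".join(words)

# Hypothetical toy decoder so the loop can run without a trained model.
TOY_TRANSITIONS = {
    "startseq": {"একটি": 0.9, "endseq": 0.1},
    "একটি": {"ছেলে": 0.8, "বল": 0.2},
    "ছেলে": {"বল": 0.7, "endseq": 0.3},
    "বল": {"ধরে": 0.6, "endseq": 0.4},
    "ধরে": {"আছে": 0.9, "endseq": 0.1},
    "আছে": {"endseq": 1.0},
}

def toy_decoder_step(image_features, prev_word, state):
    """Ignores the image and the state; a real LSTM decoder conditions on both."""
    return TOY_TRANSITIONS[prev_word], state

if __name__ == "__main__":
    print(greedy_decode(None, toy_decoder_step, None))  # একটি ছেলে বল ধরে আছে
```

Greedy decoding is the simplest strategy; beam search keeps several partial captions at each step and often produces better sentences at extra cost.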

EncoderCNN

  • Extracts a feature vector from the input image
  • Based on a pretrained ResNet50
  • Requires only very small modifications
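The README does not spell out the "small modification". A common one, assumed here, is to keep ResNet50's convolutional layers and global average pooling, drop its final 1000-class classification layer, and add a trainable linear layer that maps the pooled 2048-dimensional feature to the decoder's embedding size. The sketch shows only that pooling-and-projection step in plain Python; the feature map is a random stand-in, not real ResNet50 output, and `embed_size` is hypothetical.

```python
# Sketch of reusing a pretrained CNN as the caption encoder.
# Assumption: the modification is dropping ResNet50's final classification
# layer and adding a trainable projection to the embedding size.
import random

def global_average_pool(feature_map):
    """Average an H x W x C feature map (nested lists) over H and W, giving C values."""
    h, w, channels = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(h) for j in range(w)) / (h * w)
        for c in range(channels)
    ]

def linear(x, weights, bias):
    """Compute W x + b, where weights is a list of out_dim rows of length in_dim."""
    return [sum(w_k * x_k for w_k, x_k in zip(row, x)) + b for row, b in zip(weights, bias)]

def encode(feature_map, weights, bias):
    """Pooled CNN features projected to the embedding size the decoder expects."""
    return linear(global_average_pool(feature_map), weights, bias)

if __name__ == "__main__":
    rng = random.Random(0)
    # ResNet50's last convolutional block yields 7 x 7 x 2048 for a 224 x 224 image.
    fmap = [[[rng.random() for _ in range(2048)] for _ in range(7)] for _ in range(7)]
    embed_size = 8  # hypothetical embedding size
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(2048)] for _ in range(embed_size)]
    bias = [0.0] * embed_size
    print(len(encode(fmap, weights, bias)))  # 8
```

Because only the new projection has to be trained, the pretrained weights can be kept frozen, which is what makes the change "small".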

DecoderRNN

  • LSTM: Long Short-Term Memory network
  • Unrolled over time as multiple copies of the same network that share weights
  • Contains three gates (forget, input, output) that control the cell state
  • Capable of learning long-term dependencies
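The gate mechanism listed above can be made concrete with a single LSTM step in plain Python. This is a textbook LSTM cell, not the project's code; the parameter layout (one weight matrix and bias per gate over the concatenation of the previous hidden state and the input) and the helper names are assumptions for illustration.

```python
# Minimal LSTM cell in plain Python (a sketch, not the project's code).
# Each gate has its own weights over the concatenation [h_prev, x].
import math
import random

GATES = ("forget", "input", "candidate", "output")

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(weights, bias, v):
    """Compute W v + b for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_params(input_size, hidden_size, seed=0):
    """Small random weights per gate: hidden_size rows of length hidden_size + input_size."""
    rng = random.Random(seed)
    return {
        gate: (
            [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
             for _ in range(hidden_size)],
            [0.0] * hidden_size,
        )
        for gate in GATES
    }

def lstm_step(x, h_prev, c_prev, params):
    """One time step: the forget, input and output gates control the cell state."""
    v = h_prev + x  # list concatenation [h_prev, x]
    f = [sigmoid(z) for z in affine(*params["forget"], v)]       # what to keep from c_prev
    i = [sigmoid(z) for z in affine(*params["input"], v)]        # how much new content to write
    g = [math.tanh(z) for z in affine(*params["candidate"], v)]  # candidate content
    o = [sigmoid(z) for z in affine(*params["output"], v)]       # what to expose as h
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

if __name__ == "__main__":
    params = init_params(input_size=4, hidden_size=3)
    h, c = [0.0] * 3, [0.0] * 3
    # "Multiple copies of the same network": the same params are reused at every step.
    for x in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]):
        h, c = lstm_step(x, h, c, params)
    print(h)
```

The additive update of `c` is what lets information survive many steps, which is the reason LSTMs can learn long-term dependencies.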

[Image: encoder-decoder model (encdr_dcdr_model)]

Tools

[Image: tools]

Notepad++, Avro (for typing Bangla), and the layers Flatten, Convolution2D, Dropout, LSTM, TimeDistributed, Embedding, Bidirectional, Activation, RepeatVector, and Concatenate.

Result Analysis

[Images: result analysis; result analysis with BLEU score]
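The results are reported with BLEU. To show what the metric measures, here is a sketch of sentence-level BLEU-4 (uniform weights, clipped n-gram precision, brevity penalty, no smoothing) that scores a candidate against several reference captions, such as the two captions per photo. It is an illustration only: the README does not say which BLEU implementation or smoothing produced its numbers, and libraries such as NLTK provide a tested version in `nltk.translate.bleu_score.sentence_bleu`.

```python
# Sentence-level BLEU sketch: uniform weights, clipped n-gram precision,
# brevity penalty, no smoothing (any zero precision gives a score of 0).
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(references, candidate, max_n=4):
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    c = len(candidate)
    # Effective reference length: the one closest to c (the shorter wins ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)

if __name__ == "__main__":
    refs = [
        "মাঠের মধ্যে একটি ছেলে বল ধরে আছে".split(),
        "খালি গায়ে শিশুটি খুশিতে বল নিয়ে দাঁড়িয়ে আছে".split(),
    ]
    cand = "একটি ছেলে বল ধরে আছে".split()
    print(round(sentence_bleu(refs, cand), 4))  # 0.6703
```

In this example every n-gram of the candidate matches the first reference, so the score is reduced only by the brevity penalty for being two words shorter than it.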

Poster

[Image: 499B poster, Bangla Image Captioning]
