Skip to content

CNN Encoder connected with RNN Encoder for image Captioning

Notifications You must be signed in to change notification settings

paluchnuggets/ImageCaptioningProject

Repository files navigation

Image Captioning Project - Tell me what do you see

In this repository you can find Image Captioning model which I've trained as a part of the Udacity Computer Vision Nanodegree. Image captioning is the process when "seeing" an image the model is able to generate the sequence of words describing situation of that spesific image. See example below.

CNN Encoder providing input to RNN Decoder.

Credits: Udacity Computer Vision Nanodegree

Above, the high-level description of that process can be seen. First the image is processed through CNN network that produces feature vector for that particular image. Next, that vector goes through the embedding layer which adjust its size to that required by the RNN. When trained, the RNN takes that embedded iamge vector and based on that produces the post probable sequence of words that describes it (based on weights matrix obtain during the training process).

Whole training process is shown in this notebook 🏃 and the model capabilities are shown here 💪. When you are interested in theory behind all of this, you are more then welcome to check my another repository that contains explanation of RNN's, LSTM's and also Attention Mechanisms.

About

CNN Encoder connected with RNN Encoder for image Captioning

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published