Image Captioning Project - Tell me what do you see

This repository contains an image captioning model that I trained as part of the Udacity Computer Vision Nanodegree. Image captioning is the task in which a model "sees" an image and generates a sequence of words describing the content of that specific image. See the example below.

CNN Encoder providing input to RNN Decoder.

Credits: Udacity Computer Vision Nanodegree

The figure above gives a high-level view of the process. First, the image is passed through a CNN, which produces a feature vector for that particular image. Next, that vector goes through an embedding layer, which adjusts its size to the one required by the RNN. Once trained, the RNN takes the embedded image vector and produces the most probable sequence of words describing it (based on the weight matrices obtained during training).
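The decoding step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from model.py: the vocabulary, the weights in `project`, the `embed_word` helper and the scripted `toy_step` function are all hypothetical stand-ins for the trained embedding layer and RNN. It also uses greedy decoding (pick the most probable word at every step), which is one simple way to approximate the most probable sequence.

```python
import math

# Hypothetical toy vocabulary; the real one is built from the training captions.
VOCAB = ["<start>", "<end>", "<unk>", "a", "dog", "runs"]
END_ID = VOCAB.index("<end>")


def project(features, weights):
    """Bias-free linear layer: resize a CNN feature vector to the decoder input size."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_word(word_id):
    """Hypothetical word embedding with the same size as the image embedding."""
    return [float(word_id), 0.0]


def toy_step(inputs, state):
    """Stand-in for one step of a trained RNN: returns (word scores, new state).

    A real decoder would compute the scores from `inputs` and its hidden state;
    this one simply follows a fixed script so the example stays deterministic.
    """
    step = 0 if state is None else state
    script = [0, 3, 4, 5, 1]  # <start> a dog runs <end>
    target = script[min(step, len(script) - 1)]
    scores = [5.0 if i == target else 0.0 for i in range(len(VOCAB))]
    return scores, step + 1


def greedy_decode(step_fn, image_embedding, max_len=20):
    """Feed the image embedding first, then feed back each predicted word."""
    inputs, state, caption = image_embedding, None, []
    for _ in range(max_len):
        scores, state = step_fn(inputs, state)
        probs = softmax(scores)
        word_id = max(range(len(probs)), key=probs.__getitem__)
        caption.append(word_id)
        if word_id == END_ID:
            break
        inputs = embed_word(word_id)
    return caption


# Pretend CNN output (3 features) resized to the decoder input size (2).
cnn_features = [0.2, 0.7, 0.1]
embedding_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
image_embedding = project(cnn_features, embedding_weights)

caption_ids = greedy_decode(toy_step, image_embedding)
words = [VOCAB[i] for i in caption_ids]
print(" ".join(words))  # <start> a dog runs <end>
```

In a trained model, `toy_step` would be replaced by the RNN's forward step, and the loop would stay the same: the image embedding is the first input, and every later input is the embedding of the word predicted at the previous step.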

The whole training process is shown in the Training.ipynb notebook 🏃, and the model's capabilities are shown in Inference.ipynb 💪. If you are interested in the theory behind all of this, you are more than welcome to check my other repository, which contains explanations of RNNs, LSTMs, and attention mechanisms.


paluchnuggets/ImageCaptioningProject

Folders and files

Latest commit

History

Repository files navigation

Image Captioning Project - Tell me what do you see

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages