ImageCaptioning

This repo contains an implementation of an image captioning model. It was implemented as part of the 4 ECTS course Deep Learning in the Data Science bachelor's programme at FHNW.

Architecture

The architecture is as follows:

A pretrained CNN model (e.g. ResNet50) is used to generate features from the images.
With the help of an embedding, the dimension is adapted to the vocabulary size; the embedding dimension is chosen based on the available computing resources. In principle, a higher dimension should perform better, but it takes longer to train and requires more resources.
This vector is then passed as the first hidden state of an LSTM.
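
The data flow described above can be illustrated with a small, dependency-free sketch. It is not the code from models.py: all names, sizes, and weights are hypothetical, the CNN output is replaced by a random vector, and it assumes that the image feature vector is what gets projected and used as the first hidden state. A real implementation would normally rely on a deep learning framework and learned weights.

```python
import math
import random

random.seed(0)

# Toy sizes, chosen only so the example runs instantly (all hypothetical).
FEATURE_DIM = 8   # stands in for the CNN feature size (e.g. 2048 for ResNet50)
EMBED_DIM = 4     # embedding dimension for caption tokens
HIDDEN_DIM = 6    # LSTM hidden size
VOCAB_SIZE = 10   # number of tokens in the vocabulary


def rand_matrix(rows, cols):
    """Random weight matrix as a list of rows."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Parameters (randomly initialised here; a real model would learn them).
W_init = rand_matrix(HIDDEN_DIM, FEATURE_DIM)                  # image features -> first hidden state
embedding = rand_matrix(VOCAB_SIZE, EMBED_DIM)                 # one row per vocabulary token
W_gates = rand_matrix(4 * HIDDEN_DIM, EMBED_DIM + HIDDEN_DIM)  # input, forget, cell, output gates
W_out = rand_matrix(VOCAB_SIZE, HIDDEN_DIM)                    # hidden state -> vocabulary scores


def lstm_step(x, h, c):
    """One LSTM step (biases omitted for brevity)."""
    z = matvec(W_gates, x + h)  # x + h concatenates the two lists
    n = HIDDEN_DIM
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o = [sigmoid(v) for v in z[3 * n:4 * n]]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def caption_scores(image_features, token_ids):
    """Return one list of vocabulary scores per input token."""
    # Project the image features to the hidden size and use them as the first hidden state.
    h = [math.tanh(v) for v in matvec(W_init, image_features)]
    c = [0.0] * HIDDEN_DIM  # cell state starts at zero
    scores = []
    for token in token_ids:
        h, c = lstm_step(embedding[token], h, c)
        scores.append(matvec(W_out, h))
    return scores


features = [random.random() for _ in range(FEATURE_DIM)]  # stand-in for the CNN output
scores = caption_scores(features, [1, 4, 7])
print(len(scores), len(scores[0]))  # one score vector of length VOCAB_SIZE per token
```

The sketch also shows where the embedding dimension enters the cost: the gate matrix has 4 * HIDDEN_DIM * (EMBED_DIM + HIDDEN_DIM) entries and the embedding table has VOCAB_SIZE * EMBED_DIM entries, so a larger embedding dimension means more parameters to store and train.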

Further details

Please have a look at main.ipynb

