
Image_Captioning

This project contains files from the Udacity Computer Vision Nanodegree.

In this project we combine a CNN (as the encoder) and an RNN (as the decoder) to produce captions for images from the COCO dataset - Common Objects in Context.
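A minimal sketch of this encoder-decoder pairing in PyTorch, roughly the shape used in this kind of project (the ResNet-50 backbone, LSTM decoder, and the `embed_size`/`hidden_size`/`vocab_size` hyperparameters are assumptions for illustration, not the exact project code):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Pre-trained CNN that maps an image to a fixed-length feature vector."""
    def __init__(self, embed_size):
        super().__init__()
        # Older torchvision uses models.resnet50(pretrained=True) instead.
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the classification head; keep the convolutional feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():  # keep the pre-trained backbone frozen
            features = self.backbone(images)
        return self.embed(features.flatten(1))

class DecoderRNN(nn.Module):
    """LSTM that unrolls from the image feature to a caption (teacher forcing)."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first step of the input sequence.
        inputs = torch.cat([features.unsqueeze(1),
                            self.embed(captions[:, :-1])], dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)  # (batch, seq_len, vocab_size) scores
```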


Dataset


To set up the COCOAPI and use the dataset, follow the instructions in this readme file.
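Once the COCOAPI is installed, a quick sanity check confirms the annotations load correctly (the annotation path below is an assumption; adjust it to wherever you cloned the API):

```python
from pycocotools.coco import COCO

# Hypothetical path, per the usual COCOAPI layout.
coco = COCO('cocoapi/annotations/captions_train2014.json')

# Pick an arbitrary image and print its reference captions.
img_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=img_id)
for ann in coco.loadAnns(ann_ids):
    print(ann['caption'])
```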

Project Structure

The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order:

Notebook 0 : Explore the Microsoft Common Objects in Context (MS COCO) dataset;

Notebook 1 : Load and pre-process data from the COCO dataset (see the tokenization sketch after this list);

Notebook 2 : Train the CNN-RNN model;

Notebook 3 : Load the trained model and generate predictions.
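The pre-processing step in Notebook 1 boils down to turning each caption string into a sequence of vocabulary indices. A minimal sketch, assuming NLTK tokenization and `<start>`/`<end>`/`<unk>` special tokens (the toy vocabulary below is hypothetical):

```python
import nltk
nltk.download('punkt', quiet=True)

def caption_to_ids(caption, word2idx, start='<start>', end='<end>', unk='<unk>'):
    """Lowercase, tokenize, and map a caption to vocabulary indices."""
    tokens = nltk.tokenize.word_tokenize(caption.lower())
    words = [start] + tokens + [end]
    return [word2idx.get(w, word2idx[unk]) for w in words]

# Hypothetical toy vocabulary for illustration.
vocab = {'<start>': 0, '<end>': 1, '<unk>': 2, 'a': 3, 'dog': 4, 'on': 5, 'grass': 6}
print(caption_to_ids('A dog on the grass.', vocab))
# -> [0, 3, 4, 5, 2, 6, 2, 1]   ('the' and '.' fall back to <unk>)
```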

Installation

$ git clone https://github.com/kenkai21/Image_Captioning.git
$ pip3 install -r requirements.txt

References

Vinyals et al., Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555v2 [cs.CV], 20 Apr 2015

Xu et al., Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, arXiv:1502.03044v3 [cs.LG], 19 Apr 2016

License

This project is licensed under the terms of the MIT License.