# Computer Vision Nanodegree

## Project: Image Captioning

---

The Microsoft **C**ommon **O**bjects in **CO**ntext (MS COCO) dataset is a large-scale dataset for scene understanding.  The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.  

![Sample Dog Output](images/coco-examples.jpg)

You can read more about the dataset on the [website](http://cocodataset.org/#home) or in the [research paper](https://arxiv.org/pdf/1405.0312.pdf).

In this notebook, you will explore this dataset, in preparation for the project.

## Step 1: Initialize the COCO API

We begin by initializing the [COCO API](https://github.com/cocodataset/cocoapi) that you will use to obtain the data.

In [12]:
import os
import sys
sys.path.append('cocoapi/PythonAPI')
from pycocotools.coco import COCO

# initialize COCO API for instance annotations
dataDir = '/home/henry/data'
dataType = 'train2014'
instances_annFile = os.path.join(dataDir, 'annotations/captions_{}.json'.format(dataType))
coco = COCO(instances_annFile)

# initialize COCO API for caption annotations
captions_annFile = os.path.join(dataDir, 'annotations/captions_{}.json'.format(dataType))
coco_caps = COCO(captions_annFile)

# get image ids 
ids = list(coco.anns.keys())

loading annotations into memory...
Done (t=0.97s)
creating index...
index created!
loading annotations into memory...
Done (t=1.35s)
creating index...
index created!


## Step 2: Plot a Sample Image

Next, we plot a random image from the dataset, along with its five corresponding captions.  Each time you run the code cell below, a different image is selected.  

In the project, you will use this dataset to train your own model to generate captions from images!

In [15]:
!conda install scikit-image


Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/torch

  added / updated specs:
    - scikit-image


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    networkx-2.3               |             py_0         1.1 MB
    pywavelets-1.0.3           |   py37hdd07704_1         4.4 MB
    scikit-image-0.15.0        |   py37he6710b0_0        28.4 MB
    ------------------------------------------------------------
                                           Total:        33.9 MB

The following NEW packages will be INSTALLED:

  decorator          pkgs/main/linux-64::decorator-4.4.0-py37_1
  imageio            pkgs/main/linux-64::imageio-2.5.0-py37_0
  networkx           pkgs/main/noarch::networkx-2.3-py_0
  pywavelets         pkgs/main/linux-64::pywavelets-1.0.3-py37hdd07704_1
  scikit-image       pkgs/main/linux-64::scikit-i

In [16]:
!y

/bin/sh: 1: y: not found


In [18]:
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
%matplotlib inline

# pick a random image and obtain the corresponding URL
ann_id = np.random.choice(ids)
img_id = coco.anns[ann_id]['image_id']
img = coco.loadImgs(img_id)[0]
url = img['coco_url']

# print URL and visualize corresponding image
print(url)

# load and display captions
annIds = coco_caps.getAnnIds(imgIds=img['id']);
anns = coco_caps.loadAnns(annIds)
coco_caps.showAnns(anns)

http://mscoco.org/images/475277
a white and black horse in its pen with green grass
A white horse looking up for a photo at a fence side.
A close-up of a horse looking at the camera. 
A white horse is standing by a fence
A white horse standing in a grassy field.


## Step 3: What's to Come!

In this project, you will use the dataset of image-caption pairs to train a CNN-RNN model to automatically generate images from captions.  You'll learn more about how to design the architecture in the next notebook in the sequence (**1_Preliminaries.ipynb**).

![Image Captioning CNN-RNN model](images/encoder-decoder.png)