# Image Caption Generator

This notebook is a how-to guide to generating captions for images and to comparing this approach with the latest caption-generation models.

# Problem Description

Image caption generation models are models that analyze images and automatically generate relevant captions. 

They combine techniques from computer vision and natural language processing to “understand” an image's visual content and express it in natural language. This task is complex because it requires not only recognizing objects in an image but also understanding their context and relationships, and then translating this understanding into a coherent sentence.

## Intuition

- Images can be compressed into feature vectors. These can be generated using a CNN (Convolutional Neural Network).

- Our goal is to generate a suitable `caption` for the given image, which is a sequence of words. We can generate a sequence using an RNN (Recurrent Neural Network) such as an LSTM (Long Short-Term Memory) or a GRU (Gated Recurrent Unit).

- We feed the image feature vector to the RNN as its initial state and generate one word at each time step of the RNN.

- During training, the images and their captions are already available. We compute the feature vector of each image, pass it to an untrained or pre-trained RNN, and compare the generated caption with the actual caption. The model is then trained with backpropagation to improve its accuracy.
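
The comparison between generated and actual captions in the last bullet is commonly set up with teacher forcing: at each time step the RNN receives the true words so far and is asked to predict the next one. The sketch below shows how one caption could be unrolled into such training pairs. It assumes whitespace tokenization, and the `<start>` and `<end>` marker tokens and the function name `make_training_pairs` are hypothetical, not part of this notebook.

```python
def make_training_pairs(caption, start="<start>", end="<end>"):
    """Split one caption into (prefix, next_word) pairs for teacher forcing."""
    # Hypothetical marker tokens delimit the caption.
    tokens = [start] + caption.lower().split() + [end]
    # At step t the RNN sees tokens[:t] and should predict tokens[t].
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]


pairs = make_training_pairs("A dog rides a skateboard")
for prefix, target in pairs:
    print(" ".join(prefix), "->", target)
```

Each pair is one training example: the prefix (together with the image feature vector) is the input, and the target word is what the loss is computed against.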

# Strategy

- Use the pretrained `Inception_V3` model to generate the feature vector of the image.

- Pass the feature vector through an RNN to generate an output embedding, compare it with the embedding of the actual caption using an error function, and backpropagate the error to train this hybrid model to generate accurate captions.

- We are going to implement both `LSTM` and `GRU` architectures as our caption generation models.

- We are using the `MSCOCO` Dataset for our task of image caption generation, with an 80-20 train-test split.
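
For the 80-20 split, one reasonable choice is to split by image ID rather than by caption, because a COCO image usually has several captions and all of them should end up on the same side of the split. The sketch below uses only the standard library; the function name `split_image_ids` and the fixed seed are assumptions made for illustration, and `train_test_split` from scikit-learn could be used on the ID list instead.

```python
import random


def split_image_ids(image_ids, train_fraction=0.8, seed=42):
    """Shuffle unique image IDs and split them into train and test lists."""
    ids = sorted(set(image_ids))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Stand-in IDs 1..100; real IDs would come from the COCO annotations.
train_ids, test_ids = split_image_ids(range(1, 101))
print(len(train_ids), len(test_ids))
```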

# Code

## Installs and Environment Setup

In [None]:
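%pip install numpy tensorflow
%pip install keras # For recent versions of TensorFlow, it is advised to install Keras separately
%pip install keras_nlp
# The packages below are imported later in this notebook, so they are installed here as well
%pip install pandas pycocotools scikit-learn nltk scipy pillow tqdm scikit-image matplotlib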

## Imports

In [None]:
import importlib.metadata

In [None]:
def print_module_version(module_name):
    """Print the installed version of a package, looked up by its distribution (pip) name."""
    try:
        version = importlib.metadata.version(module_name)
        print(f"{module_name} version: {version}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{module_name} is not installed or version information is not available")

In [None]:
import numpy as np
print_module_version("numpy")
import pandas as pd
print_module_version("pandas")
import tensorflow as tf
print_module_version("tensorflow")
from keras.applications import InceptionV3
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, GRU, Dense, Dropout, Add
from keras_nlp.tokenizers import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
print_module_version("keras")
# Importing pycocotools (the Python version of the COCO API) for dataset handling, e.g. coco["train2017"]
import pycocotools
print_module_version("pycocotools")
from sklearn.model_selection import train_test_split
print_module_version("scikit-learn")  # the distribution name differs from the import name
from nltk.translate.bleu_score import sentence_bleu
print_module_version("nltk")
from scipy.spatial.distance import cosine
print_module_version("scipy")

# pickle, os, and glob belong to the standard library, so they have no separate package version
import pickle
import os
import glob
from PIL import Image
print_module_version("pillow")  # PIL is distributed under the name Pillow
from tqdm import tqdm
print_module_version("tqdm")

In [None]:
%matplotlib inline
from pycocotools.coco import COCO
import skimage.io as io
print_module_version("scikit-image")  # skimage is distributed as scikit-image
import matplotlib.pyplot as plt
print_module_version("matplotlib")
import pylab  # pylab ships with matplotlib, so it has no separate version
pylab.rcParams['figure.figsize'] = (8.0, 10.0)

## Implementation

### Installing MSCOCO Dataset and Understanding the COCO API

- We installed the API from the [CocoAPI](https://github.com/cocodataset/cocoapi) GitHub repository.
- We used the `make` tool with the Makefile in the repository's `cocoapi/PythonAPI` folder, using the command below.

`make -f Makefile`

However, this only provided us with the validation dataset. What we actually want are all the datasets: train, val, and test. For these, we used the `pycocotools` module/API for installing the COCO dataset.

> In the Common Objects in Context (COCO) dataset, an annotation is a list of objects in an image, along with detailed information about each object. This information includes the object's class label, bounding box coordinates, and segmentation mask. 

> Annotations are stored in a JSON file, along with other information about the images and dataset.
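
To make the JSON layout concrete, the sketch below parses a tiny hand-written stand-in for a captions annotation file and groups the captions by image. The field names (`images`, `annotations`, `image_id`, `caption`) follow the COCO captions format; the IDs, file name, and caption text are made up for illustration and are not real COCO data.

```python
import json
from collections import defaultdict

# Hand-written stand-in for a tiny captions_*.json file (not real COCO data).
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "000000000001.jpg"}],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A dog on a skateboard."},
    {"id": 11, "image_id": 1, "caption": "A person walks a dog."}
  ]
}
""")

# Each annotation points back to its image through image_id.
captions_by_image = defaultdict(list)
for ann in sample["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

print(dict(captions_by_image))
```

Instance annotation files use the same top-level layout, but each annotation carries fields such as `category_id`, `bbox`, and `segmentation` instead of `caption`.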

#### Instance Viewing

In [None]:
dataDir='./dataset'
dataTypes=['train2017','val2017']

In [None]:
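def generate_coco_ds_files(datadir, datatypes):
    """Load the COCO instance annotations for each dataset split."""
    coco = dict()
    # Use the function arguments rather than the global variables
    for data_type in datatypes:
        ann_file = '{}/annotations/instances_{}.json'.format(datadir, data_type)
        coco[data_type] = COCO(ann_file)
    return coco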

In [None]:
coco = generate_coco_ds_files(datadir=dataDir,datatypes=dataTypes)

In [None]:
# COCO.info() prints the dataset information itself and returns None
coco['train2017'].info()
print("")
print("")
coco['val2017'].info()


In [None]:
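# Show only the count and the first few IDs; the full list is very long
all_ann_ids = coco['train2017'].getAnnIds()
print(len(all_ann_ids), all_ann_ids[:5])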

In [None]:
# display COCO categories and supercategories
cats = coco["train2017"].loadCats(coco["train2017"].getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))

nms = set([cat['supercategory'] for cat in cats])
print('COCO supercategories: \n{}'.format(' '.join(nms)))

In [None]:
# get all images containing given categories, select one at random
catIds = coco["train2017"].getCatIds(catNms=['person','dog','skateboard'])
print(len(catIds))
if len(catIds) <= 5:
    print(catIds)
imgIds = coco["train2017"].getImgIds(catIds=catIds)
print(len(imgIds))
if len(imgIds) <= 5:
    print(imgIds)
# Get a random image from the above categories
img = coco["train2017"].loadImgs(imgIds[np.random.randint(0,len(imgIds))])[0]
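
When several category IDs are passed, `getImgIds` appears to return only the images that contain every listed category (an intersection), not the images that contain any of them. The toy sketch below mimics that behavior with plain Python sets; the index `cat_to_imgs`, its image IDs, and the helper `images_with_all` are made up for illustration and are not part of `pycocotools`.

```python
# Toy stand-in for a category -> image-ID index (made-up IDs).
cat_to_imgs = {
    "person": {1, 2, 3, 4},
    "dog": {2, 3, 5},
    "skateboard": {3, 4, 5},
}


def images_with_all(categories, index):
    """Return the image IDs that appear under every requested category."""
    sets = [index[name] for name in categories]
    return sorted(set.intersection(*sets)) if sets else []


# Only image 3 contains a person, a dog, and a skateboard at the same time.
print(images_with_all(["person", "dog", "skateboard"], cat_to_imgs))
```

This is why the number of matching images shrinks quickly as more categories are requested.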

In [None]:
I = io.imread(img['coco_url'])
plt.axis('off')
plt.imshow(I)
plt.show()

In [None]:
# load and display instance annotations
plt.imshow(I); plt.axis('off')
annIds = coco["train2017"].getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco["train2017"].loadAnns(annIds)
coco["train2017"].showAnns(anns)

#### Caption Viewing

The code below demonstrates loading the dataset's captions by the image ID of the COCO annotations.

In [None]:
dataDir='./dataset'
dataTypes=['train2017','val2017']

In [None]:
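def generate_coco_ds_caption_files(datadir, datatypes):
    """Load the COCO caption annotations for each dataset split."""
    coco = dict()
    # Use the function arguments rather than the global variables
    for data_type in datatypes:
        ann_file = '{}/annotations/captions_{}.json'.format(datadir, data_type)
        coco[data_type] = COCO(ann_file)
    return coco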

In [None]:
coco_caps = generate_coco_ds_caption_files(datadir=dataDir,datatypes=dataTypes)

In [None]:
# load and display caption annotations
annIds = coco_caps["train2017"].getAnnIds(imgIds=img['id'])
print(annIds)
anns = coco_caps["train2017"].loadAnns(annIds)
coco_caps["train2017"].showAnns(anns)
plt.imshow(I); plt.axis('off'); plt.show()
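
Before captions like these reach a tokenizer, they are usually normalized. The sketch below shows one possible cleaning step on dictionaries shaped like the output of `loadAnns` for a captions file. The function `clean_caption`, the `<start>` and `<end>` markers, and the sample caption are assumptions made for illustration, not part of this notebook's pipeline.

```python
import string


def clean_caption(text, start="<start>", end="<end>"):
    """Lowercase, strip punctuation, and wrap a caption in marker tokens."""
    table = str.maketrans("", "", string.punctuation)
    # Punctuation is removed before the markers are added, so they stay intact.
    words = text.lower().translate(table).split()
    return " ".join([start] + words + [end])


# Dicts shaped like loadAnns output for a captions file (made-up text).
anns_example = [{"image_id": 1, "caption": "A dog rides a skateboard."}]
cleaned = [clean_caption(a["caption"]) for a in anns_example]
print(cleaned)
```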

#### Review

Now we have a solid understanding of what to do in order to load the COCO dataset. 

### Model Installation

In [None]:
inception_model_pretrained = InceptionV3(weights='imagenet',classifier_activation=None)
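
With the default `include_top=True`, the model above outputs 1000 ImageNet class logits per image. If the goal is the image feature vector described in the Strategy section, one common alternative is to drop the classification head and apply global average pooling, which gives a 2048-dimensional vector. The sketch below is a minimal illustration under assumptions: it uses `weights=None` so that nothing is downloaded (real features would need `weights="imagenet"`), a random array stands in for a resized 299x299 RGB image, and the variable names are hypothetical.

```python
import numpy as np
from keras.applications import InceptionV3
from keras.applications.inception_v3 import preprocess_input

# weights=None keeps this sketch offline; use weights="imagenet" for real features.
encoder = InceptionV3(weights=None, include_top=False, pooling="avg")

# A random array stands in for one RGB image resized to 299x299.
fake_image = np.random.randint(0, 256, size=(1, 299, 299, 3)).astype("float32")

# preprocess_input rescales pixel values to the range InceptionV3 expects.
features = encoder.predict(preprocess_input(fake_image), verbose=0)
print(features.shape)
```

The resulting vector could then be projected with a `Dense` layer to the RNN's state size before it is used as the initial state.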

# Observations

# Conclusion

# Scope