# Image Caption Generator

This notebook is a how-to guide for generating captions for images and for comparing the results with the latest caption-generation models.

# Problem Description

Image caption generation models analyze images and automatically generate relevant captions.

They combine techniques from computer vision and natural language processing to “understand” an image's visual content and express it in natural language. This task is complex because it requires not only recognizing objects in an image but also understanding their context, relationships, and the ability to translate this understanding into a coherent sentence.

## Intuition

- An image can be compressed into a feature vector, where each dimension summarizes some visual feature of the image. Such vectors can be generated with a CNN (Convolutional Neural Network).

- Our goal is to generate a suitable `caption` for a given image, which is a sequence of words. We can generate such a sequence with an RNN (Recurrent Neural Network) such as an LSTM (Long Short-Term Memory) or a GRU (Gated Recurrent Unit).

- We feed the image vector (feature vector) to the RNN as its initial state and generate the caption word by word, one word per time step, conditioned on that feature vector.

- During training, we already have both the images and their captions. We compute the feature vector of each image, pass it through an untrained or pre-trained RNN, and compare the generated output with the actual caption. The model is then trained with backpropagation to improve its accuracy.
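
The idea of using the image's feature vector as the RNN's initial state can be sketched in NumPy. This is a minimal, untrained illustration under stated assumptions: a plain (Elman) RNN cell stands in for the LSTM/GRU, all weights are random, the 2048-dimensional feature size mirrors InceptionV3's pooled output, and the toy vocabulary, the `<start>`/`<end>` markers and the name `greedy_caption` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary with start/end markers.
vocab = ["<start>", "<end>", "a", "dog", "on", "the", "grass", "runs"]
START, END = vocab.index("<start>"), vocab.index("<end>")
V, D_IMG, D_EMB, D_HID = len(vocab), 2048, 16, 32

# Random, untrained parameters: they only illustrate shapes and data flow.
W_init = rng.normal(0, 0.01, (D_IMG, D_HID))  # image features -> initial hidden state
E = rng.normal(0, 0.1, (V, D_EMB))            # word embeddings
W_xh = rng.normal(0, 0.1, (D_EMB, D_HID))     # current word -> hidden
W_hh = rng.normal(0, 0.1, (D_HID, D_HID))     # previous hidden -> hidden
W_hy = rng.normal(0, 0.1, (D_HID, V))         # hidden -> vocabulary logits


def greedy_caption(image_features, max_len=10):
    """Generate one word per time step, starting from the image's feature vector."""
    h = np.tanh(image_features @ W_init)  # the feature vector sets the initial state
    token, words = START, []
    for _ in range(max_len):
        # One plain-RNN step: mix the previous word with the running state.
        h = np.tanh(E[token] @ W_xh + h @ W_hh)
        logits = h @ W_hy
        logits[START] = -np.inf           # never emit the start marker
        token = int(np.argmax(logits))    # greedy choice of the next word
        if token == END:
            break
        words.append(vocab[token])
    return words


features = rng.normal(size=D_IMG)  # stand-in for a CNN feature vector
print(greedy_caption(features))
```

With random weights the generated words are meaningless; the training described in the last bullet is what ties them to the image. Keras RNN layers such as `LSTM` and `GRU` accept an `initial_state` argument when called, which is one way to feed in the projected image features.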

# Strategy

- Use the pretrained `Inception_V3` model to generate the feature vector of the image.

- Pass the feature vector through an RNN to produce an output caption, compare it with the actual caption in embedding form using an error (loss) function, and backpropagate that error to train this hybrid CNN-RNN model to generate accurate captions.

- We are going to implement both `LSTM` and `GRU` architectures as our caption generation models.

- We are using the `MSCOCO` Dataset for our task of image caption generation, with an 80-20 train-test split.
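
Two data-preparation details implied by this strategy can be sketched in plain Python. MSCOCO provides several captions per image, so one reasonable choice is to make the 80-20 split over image IDs, which keeps every caption of an image on the same side of the split. For training, each caption can be turned into teacher-forcing pairs in which the model sees the words so far and must predict the next one. The function names, the `<start>`/`<end>` markers and the toy data are hypothetical; `sklearn.model_selection.train_test_split`, already imported in this notebook, could shuffle the image IDs instead.

```python
import random


def split_by_image(captions_by_image, test_fraction=0.2, seed=42):
    """Split {image_id: [captions]} into train/test dicts over image IDs."""
    image_ids = sorted(captions_by_image)
    random.Random(seed).shuffle(image_ids)
    n_test = int(round(len(image_ids) * test_fraction))
    test_ids = set(image_ids[:n_test])
    train = {i: c for i, c in captions_by_image.items() if i not in test_ids}
    test = {i: c for i, c in captions_by_image.items() if i in test_ids}
    return train, test


def teacher_forcing_pairs(caption):
    """Turn one caption into (words so far, next word) training pairs."""
    tokens = ["<start>"] + caption.lower().split() + ["<end>"]
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]


# Toy data: 10 hypothetical image IDs with two captions each.
data = {i: [f"caption {i} a", f"caption {i} b"] for i in range(10)}
train, test = split_by_image(data)
print(len(train), len(test))  # 8 2

for prefix, target in teacher_forcing_pairs("A dog runs"):
    print(prefix, "->", target)
```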

# Code

## Installs and Environment Setup

In [3]:
%pip install numpy tensorflow
%pip install keras # TensorFlow 2.16+ uses Keras 3, which is distributed as the standalone keras package
%pip install keras_nlp

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Collecting keras_nlp
  Downloading keras_nlp-0.17.0-py3-none-any.whl.metadata (1.2 kB)
Collecting keras-hub==0.17.0 (from keras_nlp)
  Downloading keras_hub-0.17.0-py3-none-any.whl.metadata (7.4 kB)
Downloading keras_nlp-0.17.0-py3-none-any.whl (2.0 kB)
Downloading keras_hub-0.17.0-py3-none-any.whl (644 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 644.1/644.1 kB 11.6 MB/s eta 0:00:00
Installing collected packages: keras-hub, keras_nlp
Successfully installed keras-hub-0.17.0 keras_nlp-0.17.0
Note: you may need to restart the kernel to use updated packages.


## Imports

In [4]:
# Import the submodule explicitly; `import importlib` alone does not guarantee importlib.metadata is loaded
import importlib.metadata

In [5]:
def print_module_version(module_name):
    """Print the installed version of a distribution, looked up via importlib.metadata."""
    try:
        version = importlib.metadata.version(module_name)
        print(f"{module_name} Version: ", version)
    except importlib.metadata.PackageNotFoundError:
        print(f"{module_name} is not installed or version information is not available")

In [10]:
import numpy as np
print_module_version("numpy")
import pandas as pd
print_module_version("pandas")
import tensorflow as tf
print_module_version("tensorflow")
from keras.applications import InceptionV3
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, GRU, Dense, Dropout, Add
from keras_nlp.tokenizers import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
print_module_version("keras")
# pycocotools: the Python version of the COCO API, for handling MSCOCO annotations
import pycocotools
print_module_version("pycocotools")
from sklearn.model_selection import train_test_split
print_module_version("sklearn")  # the distribution is named "scikit-learn", so this lookup reports "not installed"
from nltk.translate.bleu_score import sentence_bleu
print_module_version("nltk")
from scipy.spatial.distance import cosine
print_module_version("scipy")

# Standard-library modules carry no package metadata, so no version is reported for them
import pickle
print_module_version("pickle")
import os
print_module_version("os")
import glob
print_module_version("glob")
from PIL import Image
print_module_version("PIL")  # the distribution is named "Pillow", so this lookup reports "not installed"
from tqdm import tqdm
print_module_version("tqdm")

numpy Version:  1.26.4
pandas Version:  2.0.3
tensorflow Version:  2.17.0
keras Version:  3.6.0
pycocotools Version:  2.0
sklearn is not installed or version information is not available
nltk Version:  3.8.1
scipy Version:  1.11.1
pickle is not installed or version information is not available
os is not installed or version information is not available
glob is not installed or version information is not available
PIL is not installed or version information is not available
tqdm Version:  4.65.0


### Installing the COCO API for the MSCOCO Dataset

- We installed the COCO API by cloning its GitHub repository, [CocoAPI](https://github.com/cocodataset/cocoapi).
- We then built it with the `make` tool from the `Makefile` in the repository's `cocoapi/PythonAPI` folder, using the command below.

`cd cocoapi/PythonAPI && make`
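
Once the annotations are available, captions need to be grouped per image. MSCOCO caption annotation files are JSON with an `images` list (entries carry `id` and `file_name`) and an `annotations` list (entries carry `image_id` and `caption`). A minimal standard-library sketch, run here on a tiny hand-made JSON string in that format rather than a real file; with pycocotools the same lookup is available through `COCO.getAnnIds(imgIds=...)` and `COCO.loadAnns(...)`. The helper name `captions_by_image` and the file path in the comment are placeholders.

```python
import json
from collections import defaultdict


def captions_by_image(coco_json):
    """Map each image file name to the list of its captions."""
    file_names = {img["id"]: img["file_name"] for img in coco_json["images"]}
    grouped = defaultdict(list)
    for ann in coco_json["annotations"]:
        grouped[file_names[ann["image_id"]]].append(ann["caption"].strip())
    return dict(grouped)


# A real file would be read with json.load(open("<path>/captions_train2017.json")).
sample = """{
  "images": [{"id": 1, "file_name": "000000000001.jpg"}],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A dog runs on the grass. "},
    {"id": 11, "image_id": 1, "caption": "A brown dog playing outside."}
  ]
}"""
coco_json = json.loads(sample)
print(captions_by_image(coco_json))
```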

## Implementation

In [9]:
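# classifier_activation=None: the top Dense layer returns raw logits over the 1000 ImageNet classes
inception_model_pretrained = InceptionV3(weights='imagenet', classifier_activation=None)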
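
Images fed to this model should match what InceptionV3 was trained on: 299×299 RGB inputs with pixel values scaled from [0, 255] to [-1, 1], which is what `keras.applications.inception_v3.preprocess_input` does. The NumPy-only sketch below reproduces that scaling and adds the batch axis the model expects; resizing (for example with PIL's `Image.resize`) is left out, and `to_inception_input` is a hypothetical helper name. As written, `inception_model_pretrained` keeps the 1000-way classification head; if the 2048-dimensional pooled image features are wanted instead, Keras provides `InceptionV3(weights="imagenet", include_top=False, pooling="avg")`.

```python
import numpy as np


def to_inception_input(image_uint8):
    """Scale an HxWx3 uint8 image to [-1, 1] and add a batch axis."""
    x = image_uint8.astype("float32")
    x = x / 127.5 - 1.0               # same scaling as inception_v3.preprocess_input
    return np.expand_dims(x, axis=0)  # shape (1, H, W, 3)


img = np.zeros((299, 299, 3), dtype=np.uint8)
img[..., 0] = 255  # a pure-red test image
batch = to_inception_input(img)
print(batch.shape, batch.min(), batch.max())  # (1, 299, 299, 3) -1.0 1.0
```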

# Observations

# Conclusion

# Scope