# Computer Vision Nanodegree

## Project: Image Captioning

---

In this notebook, I will use the trained model to generate captions for images in the test dataset.

INDEX:
- [Step 1](#step1): Get Data Loader for Test Dataset 
- [Step 2](#step2): Load Trained Models
- [Step 3](#step3): Clean up the Captions
- [Step 4](#step4): Generate Predictions!

<a id='step1'></a>
## Step 1: Get Data Loader for Test Dataset

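The code cell below defines `transform_test`, the transform used to pre-process the test images. Each image is resized so that its shorter side is 256 pixels, center-cropped to 224x224, converted to a tensor, and normalized per channel.

The test images are pre-processed in the same way as the training images.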

In [2]:
import sys
from data_loader import get_loader
from torchvision import transforms

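# TODO #1: Define a transform to pre-process the testing images.
transform_test = transforms.Compose([
    transforms.Resize(256),                          # resize the shorter side to 256 pixels
    transforms.CenterCrop(224),                      # crop the central 224x224 patch
    transforms.ToTensor(),                           # convert the PIL image to a float tensor in [0, 1]
    transforms.Normalize((0.485, 0.456, 0.406),      # per-channel mean
                         (0.229, 0.224, 0.225))])    # per-channel standard deviation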
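
To make the last step of the transform concrete: `transforms.Normalize` computes `(x - mean) / std` independently for each channel. Below is a minimal pure-Python sketch of that arithmetic for a single RGB pixel. The helper `normalize_pixel` is hypothetical and only illustrates the formula; it is not part of torchvision.

```python
# Per-channel mean and standard deviation used in transform_test.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Hypothetical helper: apply (x - mean) / std to one RGB pixel in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))

# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel(MEAN))
# A pure white pixel (1.0 after ToTensor) maps to positive values.
print(normalize_pixel((1.0, 1.0, 1.0)))
```

After normalization the values are no longer confined to [0, 1], which is expected: the network was trained on inputs centered this way.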

<a id='step2'></a>
## Step 2: Load Trained Models

The next code cell defines a `device` that is used to move PyTorch tensors to the GPU (if CUDA is available). Run this code cell before continuing.

In [3]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [5]:
# Watch for changes in model.py and reload it automatically.
%load_ext autoreload
%autoreload 2
import pickle
import os
import torch
from model import EncoderCNN, DecoderRNN

# TODO #2: Specify the saved models to load.
encoder_file = 'encoder-1.pkl'
decoder_file = 'decoder-1.pkl'

# TODO #3: Select appropriate values for the Python variables below.
embed_size = 512
hidden_size = 512

# Load the vocabulary and record its size.
vocab_file = 'vocab.pkl'

with open(os.path.join(os.getcwd(), vocab_file), 'rb') as f:
    vocab = pickle.load(f)
    word2idx = vocab.word2idx
    idx2word = vocab.idx2word
vocab_size = len(vocab)
# Initialize the encoder and decoder, and set each to inference mode.
encoder = EncoderCNN(embed_size)
encoder.eval()
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)
decoder.eval()
print(os.getcwd())
address = os.path.join(os.getcwd(), 'models', encoder_file)
print(address)
# Load the trained weights; map_location lets GPU-trained weights load on a CPU-only machine.
encoder.load_state_dict(torch.load(address, map_location=device))
decoder.load_state_dict(torch.load(os.path.join(os.getcwd(), 'models', decoder_file), map_location=device))
print(vocab_size)
# Move models to GPU if CUDA is available.
encoder.to(device)
decoder.to(device)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
E:\Shiva Image Captioning
E:\Shiva Image Captioning\models\encoder-1.pkl
10321


DecoderRNN(
  (word_embeddings): Embedding(10321, 512)
  (embed): Embedding(10321, 512)
  (lstm): LSTM(512, 512, batch_first=True)
  (linear): Linear(in_features=512, out_features=10321, bias=True)
)
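
The loading cell relies on three things from the unpickled `vocab` object: a `word2idx` dictionary, an `idx2word` dictionary, and support for `len()`. The real class is defined elsewhere in the project and is not shown in this notebook. The `ToyVocabulary` class below is a hypothetical stand-in, sketched only to illustrate that interface; the token names `<start>` and `<end>` are likewise assumptions.

```python
class ToyVocabulary:
    """Hypothetical stand-in exposing word2idx, idx2word and len()."""

    def __init__(self, words):
        self.word2idx = {}
        self.idx2word = {}
        for word in words:
            self.add_word(word)

    def add_word(self, word):
        # Assign the next free index to each unseen word.
        if word not in self.word2idx:
            idx = len(self.word2idx)
            self.word2idx[word] = idx
            self.idx2word[idx] = word

    def __len__(self):
        return len(self.word2idx)

toy_vocab = ToyVocabulary(["<start>", "<end>", "a", "man", "."])
toy_vocab_size = len(toy_vocab)  # plays the role of vocab_size
print(toy_vocab_size, toy_vocab.idx2word[3])
```

The vocabulary size matters because it fixes the shape of the decoder's embedding and output layers, which is why the printed model shows `10321` in both places.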

<a id='step3'></a>
## Step 3: Clean up the Captions

The `clean_sentence` function in the code cell below takes a list of integers (the predicted word indices) and the `idx2word` mapping, and returns the corresponding predicted sentence as a single Python string.

In [26]:
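# TODO #4: Complete the function.
def clean_sentence(output, idx2word):
    """Convert a list of predicted word indices into a sentence string."""
    words = []
    for idx in output:
        if idx == 0:   # skip index 0 (treated here as the start token)
            continue
        if idx == 1:   # stop at index 1 (treated here as the end token)
            break
        words.append(idx2word[idx])
    # Join with single spaces so the sentence has no trailing space.
    return ' '.join(words)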
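
As a quick check of this logic, here is a small self-contained sketch with a made-up five-word mapping. The name `clean_sentence_sketch`, the toy `idx2word` dictionary, and the reading of indices 0 and 1 as start and end markers are assumptions for illustration; the real mapping comes from `vocab.pkl`.

```python
def clean_sentence_sketch(output, idx2word):
    """Compact variant of the same logic: skip index 0, stop at index 1."""
    words = []
    for idx in output:
        if idx == 0:
            continue
        if idx == 1:
            break
        words.append(idx2word[idx])
    return " ".join(words)

toy_idx2word = {0: "<start>", 1: "<end>", 2: "a", 3: "dog", 4: "runs"}
# Everything after the first 1 is ignored.
toy_caption = clean_sentence_sketch([0, 2, 3, 4, 1, 1, 3], toy_idx2word)
print(toy_caption)  # a dog runs
```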

<a id='step4'></a>
## Step 4: Generate Predictions!

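The code cell below defines `load_image`, which picks a random image from the `Images` folder, displays it, and returns the pre-processed image as a tensor with a batch dimension. The cell after it runs the encoder and decoder on that tensor and turns the predicted indices into a sentence.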

In [29]:
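import os
import random
from PIL import Image

def load_image(transform_t):
    # Pick a random file from the Images folder in the working directory.
    path = os.path.join(os.getcwd(), "Images")
    files = os.listdir(path)
    d = random.choice(files)
    file_path = os.path.join(path, d)
    raw_image = Image.open(file_path)
    raw_image.show()
    # Convert to RGB so grayscale or RGBA images still have three channels.
    raw_image = raw_image.convert('RGB')
    # Apply the transform and add a batch dimension.
    return transform_t(raw_image).unsqueeze(0)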
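
`load_image` chooses from every entry that `os.listdir` returns, so a stray non-image file in the folder would make `Image.open` fail. One possible safeguard is sketched below, under the assumption that images can be recognized by file extension. The helper name `pick_random_image` and the extension list are hypothetical.

```python
import os
import random
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set; adjust as needed

def pick_random_image(folder, extensions=IMAGE_EXTENSIONS):
    """Return the path of a randomly chosen image file in `folder`."""
    candidates = [name for name in sorted(os.listdir(folder))
                  if name.lower().endswith(extensions)]
    if not candidates:
        raise FileNotFoundError(f"no image files found in {folder}")
    return os.path.join(folder, random.choice(candidates))

# Demonstration on a temporary folder holding one image name and one text file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("photo.JPG", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    chosen_name = os.path.basename(pick_random_image(demo_dir))
    print(chosen_name)
```

Building paths with `os.path.join` also keeps the code independent of the platform's path separator.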

In [32]:
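image = load_image(transform_test)
image_tensor = image.to(device)
# Gradients are not needed at inference time.
with torch.no_grad():
    # Encode the image, then add a sequence dimension for the decoder.
    features = encoder(image_tensor).unsqueeze(1)
    # Predicted word indices for the caption.
    output = decoder.sample(features)
caption = clean_sentence(output, idx2word)
print(caption)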

a man brushing his teeth with a white toothbrush . 
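
The `sample` method is defined in `model.py` and is not shown in this notebook. A common way to implement such a method is greedy decoding: at each step, pick the highest-scoring next word and feed it back in, until the end token appears or a length limit is reached. The sketch below illustrates only that loop, with a hypothetical score table (`TOY_SCORES`) standing in for the LSTM and linear layer. It is not the author's implementation.

```python
END_IDX = 1  # assumed end-token index, matching clean_sentence

# Hypothetical scores: TOY_SCORES[prev][nxt] is the score of `nxt` after `prev`.
TOY_SCORES = {
    0: {2: 0.9, 3: 0.1},
    2: {3: 0.8, 1: 0.2},
    3: {1: 0.7, 2: 0.3},
}

def greedy_sample(start_idx=0, max_len=20):
    """Greedy decoding: always follow the highest-scoring next index."""
    output = [start_idx]
    current = start_idx
    for _ in range(max_len):
        scores = TOY_SCORES[current]
        current = max(scores, key=scores.get)
        output.append(current)
        if current == END_IDX:
            break
    return output

toy_output = greedy_sample()
print(toy_output)  # [0, 2, 3, 1]
```

The resulting list of indices has the same form as the `output` passed to `clean_sentence`: a start index, word indices, and an end index.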
