# Deep Learning Photo Caption Generator

---

This notebook develops a deep learning model that automatically describes the contents of photos. It combines methods from computer vision and natural language processing: the model must first understand the image, then express that understanding as words in the right order.

## 1. Photo Data

In [1]:
from os import listdir

directory = 'Flickr8k/Flicker8k_Dataset'
files = [directory + '/' + name for name in listdir(directory)]

print("Number of images: %d" % len(files))

Number of images: 8091


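We use the pre-trained VGG16 model to extract a set of features for each photo. Because the model is loaded with `include_top=False`, the fully connected layers are dropped and the output is the final convolutional feature map (7 × 7 × 512 for a 224 × 224 input), not the 1-dimensional 4,096-element vector produced by the `fc2` layer of the full model.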

In [2]:
import numpy
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import VGG16

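# Load VGG16 without its fully connected classifier layers
model = VGG16(include_top=False)
model.summary()

def path_to_tensor(img_path):
    # Load the image, resized to the 224x224 input size used by VGG16
    img = image.load_img(img_path, target_size=(224, 224))
    # Convert to a float array of shape (224, 224, 3)
    x = image.img_to_array(img)
    # Add a leading batch axis: (1, 224, 224, 3)
    return numpy.expand_dims(x, axis=0)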

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
__________
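
As a sanity check on the feature size, the output shape of the convolutional base can be worked out by hand. The sketch below assumes the standard VGG16 layout (five 2×2 max-pooling stages with stride 2, and 512 channels in the last block) and a 224 × 224 input. The helper name `vgg16_bottleneck_shape` is made up for illustration and is not part of Keras.

```python
def vgg16_bottleneck_shape(height=224, width=224, n_pools=5, channels=512):
    """Spatial size after n_pools 2x2/stride-2 poolings, plus channel count."""
    for _ in range(n_pools):
        # Each pooling stage halves both spatial dimensions (floor division)
        height //= 2
        width //= 2
    return (height, width, channels)


shape = vgg16_bottleneck_shape()
n_values = shape[0] * shape[1] * shape[2]
print(shape, n_values)  # (7, 7, 512) 25088
```

With `include_top=False`, `model.predict` should therefore return an array of shape `(1, 7, 7, 512)` per image. A 4,096-element vector would instead come from the `fc2` layer of the full model (`include_top=True`).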

The VGG16 bottleneck features are now extracted for every photo and saved to disk as `features.pkl`.

In [3]:
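from tqdm import tqdm_notebook
from pickle import dump

# Maps an image id to its VGG16 feature array
features = dict()

for file in tqdm_notebook(files):
    tensor = preprocess_input(path_to_tensor(file))
    feature = model.predict(tensor, verbose=0)
    # Key is everything before the first dot in the path (directory prefix is kept)
    img_id = file.split('.')[0]
    features[img_id] = feature

print("Number of features extracted: %d" % len(features))
# Use a context manager so the file is flushed and closed after writing
with open('features.pkl', 'wb') as f:
    dump(features, f)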


Number of features extracted: 8091
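
To reuse the saved features later, the pickle file can be loaded back into a dictionary. The sketch below also strips the directory prefix from each key so that features can be looked up by bare image id. This assumes the captions are keyed by file name without extension, which the notebook has not shown at this point. `load_features` and `image_id` are hypothetical helper names, and the demo writes a tiny stand-in file rather than touching `features.pkl`.

```python
import os
import pickle
import tempfile


def image_id(path):
    """Return the file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def load_features(pkl_path):
    """Load a {key: feature} dict and re-key it by bare image id."""
    with open(pkl_path, 'rb') as f:
        raw = pickle.load(f)
    return {image_id(key): value for key, value in raw.items()}


# Demo with placeholder data standing in for the real feature arrays
demo = {'Flickr8k/Flicker8k_Dataset/example_photo': [0.0, 1.0]}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo_features.pkl')
    with open(demo_path, 'wb') as f:
        pickle.dump(demo, f)
    loaded = load_features(demo_path)

print(loaded)  # {'example_photo': [0.0, 1.0]}
```

On the real data, `load_features('features.pkl')` would be the corresponding call. Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.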


## 2. Text Data