# Data Preparation for the Flickr8k Dataset

To use Flickr8k instead of MS COCO, I need to process the dataset and save a dataframe of image filenames and captions. I also need to run all the images through VGG16 and save the results.

In [1]:
import pandas as pd
import numpy as np
from keras.preprocessing import image

Using TensorFlow backend.


All training image filenames are listed in a single file, so we load it to get the images in the training set, then load the captions file to get their captions.

In [8]:
# Text mode ('r') yields str on both Python 2 and 3, so split('\n') works on either
with open('Flickr8k/Flickr8k_text/Flickr_8k.trainImages.txt', 'r') as f_images:
    imgs = f_images.read().strip().split('\n')

with open('Flickr8k/Flickr8k_text/Flickr8k.token.txt', 'r') as f_captions:
    captions = f_captions.read().strip().split('\n')

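The caption file is tab-separated: each line holds an image identifier and a caption. We split each line on the tab and strip the two-character `#n` caption index from the end of the identifier to recover the filename.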

In [9]:
data = {}
for row in captions:
    row = row.split("\t")
    # Drop the trailing "#n" caption index to get the image filename
    row[0] = row[0][:-2]
    # Group all captions of the same image under one key
    data.setdefault(row[0], []).append(row[1])
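
The grouping step above can also be written with `collections.defaultdict`. The following is a minimal sketch, assuming each line has the form `<filename>#<index>\t<caption>`; the helper name `group_captions`, the filenames and the sample captions are made up for illustration.

```python
from collections import defaultdict


def group_captions(lines):
    """Group captions by image filename (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        # Split only on the first tab so the caption text stays intact.
        image_id, caption = line.split("\t", 1)
        # Drop the "#<index>" suffix, e.g. "img_a.jpg#0" -> "img_a.jpg".
        filename = image_id.rsplit("#", 1)[0]
        grouped[filename].append(caption)
    return dict(grouped)


# Made-up sample lines standing in for the contents of the token file.
sample_lines = [
    "img_a.jpg#0\tA dog runs .",
    "img_a.jpg#1\tA dog plays .",
    "img_b.jpg#0\tA child smiles .",
]
captions_by_image = group_captions(sample_lines)
```

Splitting on `#` rather than slicing off two characters would also cope with a caption index of more than one digit, should a file ever contain one.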

Wrap each caption in `<start>` and `<end>` tags and write one image/caption pair per line to a tab-separated file.

In [10]:
# Text mode ('w') because we write str, not bytes
with open('data/flickr8k_train.csv', 'w') as f_dataset:
    f_dataset.write('image\tcaption\n')
    for img in imgs:
        for caption in data[img]:
            f_dataset.write(img + '\t<start> ' + caption + ' <end>\n')
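
As a hedged alternative to concatenating strings by hand, the standard `csv` module can write the tab-separated file and will quote any caption that itself contains a tab or a quote character. This sketch writes to an in-memory buffer rather than `data/flickr8k_train.csv`; `write_caption_rows` and the sample data are hypothetical.

```python
import csv
import io


def write_caption_rows(handle, image_names, captions_by_image):
    """Write one image/caption row per caption, wrapped in <start>/<end> tags."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["image", "caption"])
    for name in image_names:
        for caption in captions_by_image[name]:
            writer.writerow([name, "<start> " + caption + " <end>"])


# Made-up data standing in for `imgs` and `data` from the cells above.
buffer = io.StringIO()
write_caption_rows(
    buffer,
    ["img_a.jpg"],
    {"img_a.jpg": ["A dog runs .", "A dog plays ."]},
)
tsv_text = buffer.getvalue()
```

With a real file, the handle would come from `open(path, 'w', newline='')` on Python 3 so that the writer controls the line endings.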

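Next, I run every training image through a VGG16 model pretrained on ImageNet. Because the model is built with `include_top=True`, `predict` returns the 1000-way softmax output for each image, and that vector is what gets stored.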

In [23]:
len(imgs)

6000

In [11]:
from keras.applications.vgg16 import VGG16, preprocess_input

In [13]:
model = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
img_dir = "Flickr8k/Flicker8k_Dataset/"
img_features = {}
for c, img in enumerate(imgs, 1):
    # Load the image at VGG16's input size and add a batch dimension
    img_s = image.load_img(img_dir + img, target_size=(224, 224))
    img_s = image.img_to_array(img_s)
    img_s = np.expand_dims(img_s, axis=0)
    img_s = preprocess_input(img_s)
    # With include_top=True, predict returns the 1000-way softmax output
    img_feature = model.predict(img_s)
    if c % 100 == 0:
        print("Processed {0} images".format(c))
    img_features[img] = img_feature[0]


Processed 100 images
Processed 200 images
Processed 300 images
Processed 400 images
Processed 500 images
Processed 600 images
Processed 700 images
Processed 800 images
Processed 900 images
Processed 1000 images
Processed 1100 images
Processed 1200 images
Processed 1300 images
Processed 1400 images
Processed 1500 images
Processed 1600 images
Processed 1700 images
Processed 1800 images
Processed 1900 images
Processed 2000 images
Processed 2100 images
Processed 2200 images
Processed 2300 images
Processed 2400 images
Processed 2500 images
Processed 2600 images
Processed 2700 images
Processed 2800 images
Processed 2900 images
Processed 3000 images
Processed 3100 images
Processed 3200 images
Processed 3300 images
Processed 3400 images
Processed 3500 images
Processed 3600 images
Processed 3700 images
Processed 3800 images
Processed 3900 images
Processed 4000 images
Processed 4100 images
Processed 4200 images
Processed 4300 images
Processed 4400 images
Processed 4500 images
Processed 4600 images
...
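
To make the preprocessing step in the loop above less opaque: for VGG16, Keras' `preprocess_input` follows the Caffe-style convention. It flips the channels from RGB to BGR and subtracts the per-channel ImageNet mean, without rescaling. The NumPy sketch below mirrors that behaviour for a channels-last batch. The name `vgg16_preprocess_sketch` is hypothetical, and the function is an illustration, not a replacement for the library call.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (Caffe-style preprocessing).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg16_preprocess_sketch(batch_rgb):
    """Approximate VGG16 preprocessing for an (N, H, W, 3) RGB batch in [0, 255]."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]          # reverse the channel axis: RGB -> BGR
    return batch_bgr - IMAGENET_MEAN_BGR  # zero-center each channel


# A tiny pure-red dummy batch stands in for a real 224x224 photo.
dummy = np.zeros((1, 2, 2, 3), dtype=np.float32)
dummy[..., 0] = 255.0
processed = vgg16_preprocess_sketch(dummy)
```

After the flip the red value ends up in the last channel, so the result is no longer a displayable RGB image. It is only meant as network input.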

Save the image features with pickle, then load some sentences from the CSV file

In [3]:
import pickle

In [17]:
with open('image_features.p', 'wb') as f:
    pickle.dump(img_features, f)

In [37]:
dataset = pd.read_csv('data/flickr8k_train.csv', delimiter='\t')

In [47]:
# Print the first 10 rows
for i, v in dataset.head(10).iterrows():
    print(v)

image                              2513260012_03d33305cf.jpg
caption    <start> A black dog is running after a white d...
Name: 0, dtype: object
image                              2513260012_03d33305cf.jpg
caption    <start> Black dog chasing brown dog through sn...
Name: 1, dtype: object
image                              2513260012_03d33305cf.jpg
caption    <start> Two dogs chase each other across the s...
Name: 2, dtype: object
image                              2513260012_03d33305cf.jpg
caption    <start> Two dogs play together in the snow . <...
Name: 3, dtype: object
image                              2513260012_03d33305cf.jpg
caption    <start> Two dogs running through a low lying b...
Name: 4, dtype: object
image                        2903617548_d3e38d7f88.jpg
caption    <start> A little baby plays croquet . <end>
Name: 5, dtype: object
image                              2903617548_d3e38d7f88.jpg
caption    <start> A little girl plays croquet next to a ...
Name: 6, dtype: object

In [52]:
dataset['caption']

0        <start> A black dog is running after a white d...
1        <start> Black dog chasing brown dog through sn...
2        <start> Two dogs chase each other across the s...
3        <start> Two dogs play together in the snow . <...
4        <start> Two dogs running through a low lying b...
5              <start> A little baby plays croquet . <end>
6        <start> A little girl plays croquet next to a ...
7        <start> The child is playing croquette by the ...
8        <start> The kid is in front of a car with a pu...
9        <start> The little boy is playing with a croqu...
10       <start> A brown dog in the snow has something ...
11       <start> A brown dog in the snow holding a pink...
12       <start> A brown dog is holding a pink shirt in...
13       <start> A dog is carrying something pink in it...
14       <start> A dog with something pink in its mouth...
15       <start> A brown dog is running along a beach ....
16       <start> A brown dog wearing a black collar run...

Load the saved VGG16 outputs back from disk and compare one entry with the in-memory copy

In [18]:
with open('image_features.p', 'rb') as f:
    features = pickle.load(f)

In [14]:
img_features['2513260012_03d33305cf.jpg']

array([  1.80983573e-09,   1.36077913e-08,   2.04017816e-08,
         3.42138016e-08,   1.32259345e-07,   1.26060744e-08,
         8.15404189e-09,   2.44983016e-06,   2.63610605e-06,
         4.86745557e-05,   8.41506065e-09,   7.59701102e-09,
         3.09292609e-08,   2.28279582e-08,   1.39403786e-08,
         2.30172404e-07,   1.16488621e-07,   1.27154909e-08,
         5.65652158e-07,   2.01504058e-08,   1.86344806e-08,
         2.02403598e-07,   5.37092637e-06,   2.91369383e-06,
         3.58928958e-07,   1.58956830e-08,   4.98221091e-08,
         1.57077736e-07,   5.83819970e-09,   3.14535278e-08,
         3.17594884e-09,   1.10164580e-08,   2.14969145e-08,
         1.53569758e-07,   1.85778859e-07,   8.33272562e-09,
         2.30957475e-08,   1.02774811e-08,   1.21012556e-06,
         2.97251120e-07,   1.44163463e-08,   6.39516244e-08,
         4.53091147e-08,   6.52685159e-08,   5.64788891e-08,
         2.30312281e-07,   1.61486842e-08,   1.11779110e-08,
         7.64410117e-08,

In [19]:
features['2513260012_03d33305cf.jpg']

array([  1.80983573e-09,   1.36077913e-08,   2.04017816e-08,
         3.42138016e-08,   1.32259345e-07,   1.26060744e-08,
         8.15404189e-09,   2.44983016e-06,   2.63610605e-06,
         4.86745557e-05,   8.41506065e-09,   7.59701102e-09,
         3.09292609e-08,   2.28279582e-08,   1.39403786e-08,
         2.30172404e-07,   1.16488621e-07,   1.27154909e-08,
         5.65652158e-07,   2.01504058e-08,   1.86344806e-08,
         2.02403598e-07,   5.37092637e-06,   2.91369383e-06,
         3.58928958e-07,   1.58956830e-08,   4.98221091e-08,
         1.57077736e-07,   5.83819970e-09,   3.14535278e-08,
         3.17594884e-09,   1.10164580e-08,   2.14969145e-08,
         1.53569758e-07,   1.85778859e-07,   8.33272562e-09,
         2.30957475e-08,   1.02774811e-08,   1.21012556e-06,
         2.97251120e-07,   1.44163463e-08,   6.39516244e-08,
         4.53091147e-08,   6.52685159e-08,   5.64788891e-08,
         2.30312281e-07,   1.61486842e-08,   1.11779110e-08,
         7.64410117e-08,
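
The two printouts above show the same values, which suggests the pickle round trip preserved the data. A programmatic check is less error-prone than comparing arrays by eye. The sketch below uses a small stand-in dictionary and an in-memory buffer; with the real data, `img_features` and `features` would take the place of `original` and `restored`. The helper name `same_features` is hypothetical.

```python
import io
import pickle

import numpy as np


def same_features(a, b):
    """Return True if both dicts have the same keys and element-wise equal arrays."""
    return a.keys() == b.keys() and all(np.array_equal(a[k], b[k]) for k in a)


# Made-up stand-in for the real feature dictionary.
original = {"img_a.jpg": np.array([0.1, 0.7, 0.2], dtype=np.float32)}

# Round-trip through pickle using an in-memory buffer instead of a file.
buffer = io.BytesIO()
pickle.dump(original, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

round_trip_ok = same_features(original, restored)
```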