<html>
<head>

<h1 style="color:red; font-size:30px; font-style:italic; text-align:center;">Title: Image Captioning using Deep Learning</h1>
<br><br>
<h1 style="color:black; font-style:italic; font-size:24px">Table of Contents</h1>
<br>
<ol>
    <li style='color:green; font-style:italic; font-size:20px'>Objective/Goal and Business Questions</li><br><br>
    <li style='color:green; font-style:italic; font-size:20px'>Introduction</li><br><br>
    <li style='color:green; font-style:italic; font-size:20px'>About Data</li><br><br>
    <li style='color:green; font-style:italic; font-size:20px'>Evaluation Metrics</li><br><br>
    <li style='color:green; font-style:italic; font-size:20px'>Load Captions and Pre-process</li><br><br>
    <li style='color:green; font-style:italic; font-size:20px'>Feature Extraction From Images</li><br><br>
    <li style='color:green; font-style:italic; font-size:20px'>Model Training</li><br><br>
    <li style='color:green; font-style:italic; font-size:20px'>Conclusion</li><br><br>
</ol>


<br>
<h1 style='color:black; font-style:italic; font-size:25px'>6.&nbsp;&nbsp;Feature Extraction From Images</h1><br>
<p style='color:blue; font-style:italic; font-size:17px'>For feature extraction, we will use the pre-trained EfficientNet-B7 model.</p>
<br>
<p style='color:blue; font-style:italic; font-size:17px'>Each image is resized to 600 by 600 pixels with 3 color channels, i.e. an input shape of (600, 600, 3).</p>
<br>
<ol>
<li style='color:green; font-style:italic; font-size:17px'>Loading the Feature Extractor Model</li><br>
<li style='color:green; font-style:italic; font-size:17px'>Dataloader</li><br>
<li style='color:green; font-style:italic; font-size:17px'>Functions for extracting features from images and saving to disk</li><br>
<li style='color:green; font-style:italic; font-size:17px'>Extracting features for the train, test and val images</li>
</ol>


<h1 style='color:black; font-style:italic; font-size:20px'>6.1&nbsp;Loading the Feature Extractor Model</h1>

* Importing dependencies

In [1]:
import tensorflow as tf
import numpy as np
import os
import glob
import pickle as pkl
from tqdm import tqdm
import gc

* Defining paths and constants

In [2]:
size = [600,600]
shape = (600,600,3)
root_dir = "D:\\projects\\image-captioning"
train_dataset_dir = "D:\\MS-COCO-2017-dataset\\coco2017\\train2017\\*.jpg"
test_dataset_dir = "D:\\MS-COCO-2017-dataset\\coco2017\\test2017\\*.jpg"
val_dataset_dir = "D:\\MS-COCO-2017-dataset\\coco2017\\val2017\\*.jpg"

* Reading train, test and val image file names

In [3]:
train_images = glob.glob(train_dataset_dir)
val_images = glob.glob(val_dataset_dir)
test_images = glob.glob(test_dataset_dir)
print(f"There are {len(train_images)} images in the train set")
print(f"There are {len(val_images)} images in the val set")
print(f"There are {len(test_images)} images in the test set")

There are 118287 images in the train set
There are 5000 images in the val set
There are 40670 images in the test set


In [4]:
train_images[:5]

['D:\\MS-COCO-2017-dataset\\coco2017\\train2017\\000000000009.jpg',
 'D:\\MS-COCO-2017-dataset\\coco2017\\train2017\\000000000025.jpg',
 'D:\\MS-COCO-2017-dataset\\coco2017\\train2017\\000000000030.jpg',
 'D:\\MS-COCO-2017-dataset\\coco2017\\train2017\\000000000034.jpg',
 'D:\\MS-COCO-2017-dataset\\coco2017\\train2017\\000000000036.jpg']

In [5]:
from tensorflow.keras.applications.efficientnet import EfficientNetB7, preprocess_input

# Defaults are include_top=True and weights="imagenet", so this loads the full ImageNet classifier
eff_b7 = EfficientNetB7(input_shape=shape)

In [6]:
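def load_feature_extractor(pre_trained_model):
    """Return a model that outputs the pre-trained network's penultimate layer."""
    tf.keras.backend.clear_session()
    input_tensors = pre_trained_model.inputs
    # layers[-2] is the layer just before the final classification layer
    x = pre_trained_model.layers[-2].output
    model = tf.keras.models.Model(input_tensors, x)
    return model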


<h1 style='color:black; font-style:italic; font-size:20px'>6.2&nbsp;Dataloader</h1>

In [7]:
def dataloader(images_list, batch_size):
    """Yield [image_names, image_batch] pairs holding at most batch_size images."""
    img_names, batches = [], []
    for img_path in tqdm(images_list):
        img = tf.io.read_file(img_path)
        # channels=3 converts grayscale JPEGs to RGB so they are not silently skipped
        img = tf.io.decode_jpeg(img, channels=3)
        img = tf.cast(img, tf.float32)
        img = tf.image.resize(img, size=size)
        img = preprocess_input(img)
        batches.append(img)
        img_names.append(os.path.basename(img_path))
        if len(batches) == batch_size:
            yield [np.array(img_names), np.array(batches)]
            img_names, batches = [], []
    # Yield the final partial batch, which would otherwise be lost
    if batches:
        yield [np.array(img_names), np.array(batches)]
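
The batching pattern behind a loader like this can be exercised without TensorFlow. One point worth checking is that the last, shorter batch is still yielded when the number of items is not a multiple of the batch size. A minimal sketch, assuming a hypothetical helper `batched` and made-up file names:

```python
def batched(items, batch_size):
    """Yield lists of at most batch_size items, including the last partial one."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Flush whatever is left so the final short batch is not lost
    if batch:
        yield batch


# Hypothetical file names standing in for image paths
names = [f"{i:012d}.jpg" for i in range(10)]
batches = list(batched(names, batch_size=4))
print([len(b) for b in batches])
```

Counting the items in the current batch, instead of comparing a running counter with the total length, keeps the flush correct even if some inputs are skipped along the way.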
        

<h1 style='color:black; font-style:italic; font-size:20px'>6.3&nbsp; Functions for extracting features from images and saving to disk</h1>

In [8]:
def extract_features(feature_extractor, images_list, batch_size):
    """Return a dict mapping each image file name to its extracted feature vector."""
    features = dict()
    for images_names, images_batch in dataloader(images_list, batch_size):
        features_batch = feature_extractor.predict(images_batch)
        for img_name, img_feature in zip(images_names, features_batch):
            features[img_name] = img_feature
    gc.collect()
    return features
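
Because `extract_features` keeps every feature vector in a single dictionary until the loop finishes, it can help to estimate how much memory that dictionary needs. The sketch below is a rough estimate only. It assumes each feature is a 2560-dimensional float32 vector (the width of EfficientNet-B7's pooled output, which the truncated summary in this notebook does not confirm) and ignores dictionary and key overhead. The helper `features_size_gib` is hypothetical; the image counts are the ones printed earlier.

```python
# Assumption: 2560-dimensional float32 features, not confirmed by the notebook output
FEATURE_DIM = 2560
BYTES_PER_FLOAT32 = 4


def features_size_gib(num_images, feature_dim=FEATURE_DIM):
    """Approximate raw storage for num_images feature vectors, in GiB."""
    return num_images * feature_dim * BYTES_PER_FLOAT32 / 2**30


# Image counts reported when the file names were read
for split, count in [("train", 118287), ("val", 5000), ("test", 40670)]:
    print(f"{split}: {features_size_gib(count):.2f} GiB")
```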

In [9]:
def extract_and_save_features(images_name, feature_extractor, dataset_name, feature_extractor_name, batch_size):
    """Extract features for a list of image paths and pickle them under root_dir."""
    print(f"Extracting {dataset_name}...")
    features = extract_features(feature_extractor, images_name, batch_size)
    print("Done!\nSaving extracted features to disk...", end='')
    # Output file: <root_dir>/<dataset_name>_<feature_extractor_name>.pkl
    output_path = os.path.join(root_dir, dataset_name + '_' + feature_extractor_name + '.pkl')
    with open(output_path, 'wb') as f:
        pkl.dump(features, f)
    del features
    print("done!")
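
The saved file can be read back with `pkl.load` when the features are needed for training. A minimal round-trip sketch, assuming a temporary directory in place of `root_dir` and a tiny made-up dictionary in place of the real features:

```python
import os
import pickle as pkl
import tempfile

# Hypothetical stand-in for the real feature dictionary: image name -> vector
features = {
    "000000000009.jpg": [0.1, 0.2, 0.3],
    "000000000025.jpg": [0.4, 0.5, 0.6],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Same naming scheme as above: <dataset_name>_<feature_extractor_name>.pkl
    path = os.path.join(tmp_dir, "train_features" + "_" + "eff_b7" + ".pkl")
    with open(path, "wb") as f:
        pkl.dump(features, f)
    # Read the dictionary back, as a later training step would
    with open(path, "rb") as f:
        loaded = pkl.load(f)

print(len(loaded), loaded["000000000009.jpg"])
```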

In [10]:
eff_b7_feature_extractor = load_feature_extractor(eff_b7)

In [11]:
eff_b7_feature_extractor.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 600, 600, 3) 0                                            
__________________________________________________________________________________________________
rescaling (Rescaling)           (None, 600, 600, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
normalization (Normalization)   (None, 600, 600, 3)  7           rescaling[0][0]                  
__________________________________________________________________________________________________
stem_conv_pad (ZeroPadding2D)   (None, 601, 601, 3)  0           normalization[0][0]              
______________________________________________________________________________________________

<h1 style='color:black; font-style:italic; font-size:20px'>6.4&nbsp;Extracting features for the train, test and val images</h1>

In [12]:
batch_size = 8
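
The batch size determines both the size of each input tensor and the number of forward passes per split. A back-of-the-envelope sketch, assuming float32 inputs of shape (600, 600, 3) and the image counts printed earlier; the variable names are illustrative only:

```python
import math

batch_size = 8
height, width, channels = 600, 600, 3
bytes_per_float32 = 4

# Input tensor size for one full batch, in MiB
batch_mib = batch_size * height * width * channels * bytes_per_float32 / 2**20

# Number of batches per split, counting a final partial batch
num_images = {"train": 118287, "val": 5000, "test": 40670}
num_batches = {split: math.ceil(n / batch_size) for split, n in num_images.items()}

print(f"{batch_mib:.1f} MiB per batch")
print(num_batches)
```

This counts only the input tensors; the activations inside the network take considerably more memory than the inputs.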

* Extracting train features

In [13]:
extract_and_save_features(train_images, eff_b7_feature_extractor, 'train_features', 'eff_b7', batch_size=batch_size)

Extracting train_features...


100%|████████████████████████████████████████████████████████████████████████| 118287/118287 [4:40:55<00:00,  7.02it/s]


Done!
Saving extracted features to disk...done!


* Extracting val features

In [16]:
extract_and_save_features(val_images, eff_b7_feature_extractor, 'val_features', 'eff_b7', batch_size=batch_size)

Extracting val_features...


100%|██████████████████████████████████████████████████████████████████████████████| 5000/5000 [11:48<00:00,  7.05it/s]


Done!
Saving extracted features to disk...done!


* Extracting test features

In [17]:
extract_and_save_features(test_images, eff_b7_feature_extractor, 'test_features', 'eff_b7', batch_size=batch_size)

Extracting test_features...


100%|██████████████████████████████████████████████████████████████████████████| 40670/40670 [1:36:16<00:00,  7.04it/s]


Done!
Saving extracted features to disk...done!
