# EEE 443 - Final Project - Image Captioning:

## Group 10:

Ayhan Okuyan, Baris Akcin, Emre Donmez, Hasan Emre Erdemoglu, Ruzgar Eserol, Suleyman Taylan Topaloglu

### Inception Encoder: (Part 2/3)

1. Import the transfer learning models and encode the images with them (the intricacies are explained in the report).
2. Export the outputs needed by the subsequent notebooks.

**Note:** The images must be in an `images` subdirectory of the working directory. This notebook builds the rest of the directory structure.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Directory Making Section:

1. Build the necessary directories; some required files are already in the root folder.
2. Unpack the given dataset and download the images.

In [None]:
import os
#os.environ["CUDA_VISIBLE_DEVICES"]="-1" # Disable GPU
# Locate the working directories and create them if needed:
root_dir = os.getcwd()
imgs_dir = os.path.join(root_dir, 'images')      # os.path.join keeps the paths portable
exports_dir = os.path.join(root_dir, 'exports')

print(root_dir)

if not os.path.exists(exports_dir):
    os.mkdir(exports_dir)
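
The layout described in the note above can also be checked up front. The following is a minimal sketch, assuming a hypothetical helper `prepare_dirs` (not part of the original notebook) that builds the two paths, fails early when the `images` folder is missing, and creates `exports` if needed:

```python
import os

def prepare_dirs(root_dir):
    """Return (imgs_dir, exports_dir) under root_dir. Hypothetical helper."""
    imgs_dir = os.path.join(root_dir, 'images')
    exports_dir = os.path.join(root_dir, 'exports')
    # The images folder must already exist; this notebook does not create it.
    if not os.path.isdir(imgs_dir):
        raise FileNotFoundError('Expected an images folder at ' + imgs_dir)
    # exist_ok=True makes re-running the cell harmless.
    os.makedirs(exports_dir, exist_ok=True)
    return imgs_dir, exports_dir
```

Calling `prepare_dirs(os.getcwd())` would then stand in for the manual existence checks in the cell above.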

### Transfer Learning Section:

Try different encoding schemes for the CNN encoder. Paired with a given RNN decoder, these networks may perform differently.
If time permits, four different CNNs will be used for transfer learning. For the RNN implementations, refer to the RNN notebook.

1. Inception v3 model definition
2. Inception ResNet v2 (to be implemented)

In [None]:
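import numpy as np  # used by inception_load_image below
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras import backend as K  # public API rather than the private tensorflow.python path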

def inception_load_image(path):
    img = image.load_img(path, target_size=(299, 299))  # target_size is (height, width)
    imgar = image.img_to_array(img)
    imgar = np.expand_dims(imgar, axis=0)
    imgar = preprocess_input(imgar)
    return imgar

def inception_transfer_model():
    tf.keras.backend.clear_session()  # clear the previous session in case this code is run multiple times
    # This is necessary because re-running these cells may stack up models

    inception_model = InceptionV3(include_top=True, weights='imagenet', input_shape=(299,299,3))
    inception_model.trainable = False

    # Check layers via inception_model.summary()
    #inception_model.summary()

    inception_tx_layer = inception_model.get_layer('avg_pool')  # 2048-d pooled features; mixed10 is the final layer when include_top=False

    new_input = inception_model.input
    x = inception_tx_layer.output

    inception_tx_model = tf.keras.Model(outputs=x, inputs=new_input) # directly make a model from it.
    #inception_tx_model.summary()

    inception_img_size = K.int_shape(inception_tx_model.input)[1:3]
    print('Image size: ', inception_img_size)

    inception_tx_values_size = K.int_shape(inception_tx_layer.output)
    print('Vector size of transfer values: ', inception_tx_values_size)
    return inception_tx_model

def inception_encode_image(image_dir, img_id, model):
    # inception_load_image already returns a preprocessed batch of shape (1, 299, 299, 3),
    # so no extra reshape and no second preprocess_input call is needed here.
    img_batch = inception_load_image(os.path.join(image_dir, img_id))
    encoding = model.predict(img_batch)
    #encoding = np.reshape(encoding, encoding.shape[1])
    return encoding
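
For Inception v3, `preprocess_input` rescales pixel values from [0, 255] to [-1, 1]. The sketch below reproduces that scaling with a hypothetical NumPy stand-in, `inception_scale` (an illustration, not the Keras function itself), to show why the scaling should be applied only once per image:

```python
import numpy as np

def inception_scale(x):
    """NumPy stand-in for the Inception v3 scaling: [0, 255] -> [-1, 1]."""
    return x / 127.5 - 1.0

pixels = np.array([0.0, 127.5, 255.0])
once = inception_scale(pixels)   # spans the full [-1, 1] range
twice = inception_scale(once)    # collapses to a narrow band around -1
print(once)
print(twice)
```

After a second pass nearly all contrast between pixels is lost, so the network would see an almost uniform input.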

### Encoding & Saving:

For each model, build the transfer model, encode the images, and save each encoding to disk as a NumPy `.npy` file.

In [None]:
inception_tx_model = inception_transfer_model()

In [None]:
from tqdm import tqdm
import numpy as np
inception_v3_exp_dir = os.path.join(exports_dir, 'inception_v3_encodings')
os.makedirs(inception_v3_exp_dir, exist_ok=True)

for img in tqdm(os.listdir(imgs_dir)):
    out_path = os.path.join(inception_v3_exp_dir, img + '.npy')
    if os.path.exists(out_path):
        continue  # already encoded in an earlier run
    tmp = inception_encode_image(imgs_dir + os.sep, img, inception_tx_model)
    np.save(out_path, tmp)

### Next Steps: 
In the next step, we will import these encodings into another notebook and construct an RNN model to perform the actual image captioning.
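
As a rough sketch of how the next notebook might read the encodings back, the hypothetical helper `load_encodings` below (not part of the original project code) maps each image file name to its saved array. It assumes the files follow the `<image name>.npy` naming used above; with the `avg_pool` output of Inception v3, each array should have shape `(1, 2048)`.

```python
import os
import numpy as np

def load_encodings(enc_dir):
    """Map each image file name to its saved feature array. Hypothetical helper."""
    encodings = {}
    for fname in sorted(os.listdir(enc_dir)):
        if not fname.endswith('.npy'):
            continue  # skip anything that is not a saved encoding
        img_name = fname[:-len('.npy')]  # 'xyz.jpg.npy' -> 'xyz.jpg'
        encodings[img_name] = np.load(os.path.join(enc_dir, fname))
    return encodings
```

Keying the dictionary by image file name makes it straightforward to pair each encoding with the captions for the same image.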