# Multi-input Neural Network

Our previous experiments suggest that a multimodal input approach would produce better results. We have two sources of data, photo metadata and images, and they are of a different nature. A densely connected network could handle the metadata, for example, whilst a 2D convolutional neural network would deal with the image data.

Here we'll try to learn jointly from both data sources by using a model that sees all available input modalities simultaneously. Our model will have two input branches.

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedShuffleSplit
import csv

config = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=2, 
                                  inter_op_parallelism_threads=2, 
                                  allow_soft_placement=True) # TensorFlow config
pd.options.mode.chained_assignment = None # Pandas config

## Data handlers

## Image attributes definition

Input data are manipulated as tensors, the basic data structures of TensorFlow. An image generally has 3 dimensions: height, width, and number of colour channels. An image dataset is therefore usually represented as a rank-4 tensor (or 4D tensor) of shape (samples, height, width, channels). For example, a batch of 32 colour images of size 150 x 150 pixels can be stored in a rank-4 tensor of shape (32, 150, 150, 3).
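As a minimal sketch of this shape convention, the following builds a dummy batch with NumPy. The array is filled with zeros purely for illustration; no real images are loaded.

```python
import numpy as np

# Hypothetical batch: 32 colour images of 150 x 150 pixels with 3 channels
batch = np.zeros((32, 150, 150, 3), dtype=np.float32)

print(batch.ndim)      # rank of the tensor
print(batch.shape)     # (samples, height, width, channels)
print(batch[0].shape)  # a single image is a rank-3 tensor
```

Indexing along the first axis selects one sample, which is why the network's input layer is declared without the samples dimension.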

Our network must receive images of a fixed size. Since the images in the dataset have different sizes, we'll have to resize them, here to the smallest width and the smallest height found in the dataset. Let's explore some image attributes.

In [None]:
import os
import PIL.Image # import the submodule explicitly so that PIL.Image.open is always available

path = '../input/petfinder-pawpularity-score/train/'
training_img = os.listdir(path) # list all training images names
print('There are {} images in the training directory'.format(len(training_img)))

img_sz = {'width': list(),
          'height': list()} # store image attributes for further analysis
width, height = 9999, 9999 # start with fixed very high image size, and keep reducing it when iterating over images in the dataset

for im in training_img:
    # Open each image in a context manager so the file handle is released
    with PIL.Image.open(path+im) as img:
        w, h = img.size
    width = min(width, w)
    height = min(height, h)

IMG_WIDTH = width
IMG_HEIGHT = height
IMG_CHANNELS = 3

print('Min training image width: {} px'.format(IMG_WIDTH))
print('Min training image height: {} px'.format(IMG_HEIGHT))

In [None]:
def read_and_decode(filename, reshape_dims=(IMG_HEIGHT, IMG_WIDTH)):
    # Read an image file to a tensor as a sequence of bytes
    image = tf.io.read_file(filename)
    # Convert the tensor to a 3D uint8 tensor
    image = tf.image.decode_jpeg(image, channels=IMG_CHANNELS)
    # Convert the uint8 tensor to a float32 tensor with values in [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Resize the image to the desired size
    return tf.image.resize(image, reshape_dims).numpy()

def show_image(filename):
    image = read_and_decode(filename, [IMG_HEIGHT, IMG_WIDTH])
    plt.imshow(image);
    plt.axis('off');

def training_plot(metrics, history):
    f, ax = plt.subplots(1, len(metrics), figsize=(5*len(metrics), 5))
    for idx, metric in enumerate(metrics):
        ax[idx].plot(history.history[metric], ls='dashed')
        ax[idx].set_xlabel('Epochs')
        ax[idx].set_ylabel(metric)
        ax[idx].plot(history.history['val_'+metric]);
        ax[idx].legend(['train_'+metric, 'val_'+metric])

In [None]:
# Display a random image from the dataset

rand_idx = np.random.randint(len(training_img)) # upper bound is exclusive, so every index can be drawn
rand_img = training_img[rand_idx]

show_image(path+rand_img)

## Get the data

Let's create training, validation and test sets of images and their corresponding metadata. `Pawpularity` is the target to predict, so we'll build the sets in a way that keeps its distribution the same in each of them. This is done with stratified sampling instead of random sampling.
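To illustrate what stratification preserves, here is a minimal, library-free sketch on toy labels. The helper `stratified_split` and the toy labels are hypothetical and only serve the illustration; the notebook itself relies on scikit-learn's `StratifiedShuffleSplit`.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=0):
    """Hypothetical helper: split indices so each label keeps its share."""
    rng = random.Random(seed)
    # Group sample indices by label
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train_idx, test_idx = [], []
    # Hold out the same fraction of every label group
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy, imbalanced target: 80 samples of class 0 and 20 samples of class 1
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_size=0.2)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(train_counts)
print(test_counts)
```

Both subsets keep the 80/20 class ratio of the full toy dataset, which a purely random split would only match approximately. Note also that holding out 20% twice in a row leaves 64% of the samples for training, 16% for validation and 20% for testing.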

In [None]:
path = "../input/petfinder-pawpularity-score/"

data = pd.read_csv(path+"train.csv") # Image Ids, metadata and target
data['Id'] = data['Id'].apply(lambda x: path+'train/'+x+'.jpg')
x, y = data.drop(["Id", "Pawpularity"], axis=1), data["Pawpularity"] # Subsets of the dataset for tabular data

# Create training, validation and test sets for tabular and image data
# First: test set is created by keeping apart 20% of the dataset
# Second: validation set is created by keeping apart 20% of the remaining dataset
# Third: Training set consists of the remaining samples after test and validation set creation

sssplit = StratifiedShuffleSplit(n_splits=1, test_size=0.2) # Use stratified sampling
for train_index, test_index in sssplit.split(x, y):
    # Tabular tmp training and test sets
    x_train_tmp, y_train_tmp = x.iloc[train_index], y.iloc[train_index]
    x_test, y_test = x.iloc[test_index], y.iloc[test_index]
    # Image tmp training and test sets
    train_img_tmp = data.iloc[train_index]
    test_img = data.iloc[test_index][['Id', 'Pawpularity']]
    
sssplit = StratifiedShuffleSplit(n_splits=1, test_size=0.2)
for train_index, val_index in sssplit.split(x_train_tmp, y_train_tmp):
    # Tabular training and validation set
    x_train, y_train = x_train_tmp.iloc[train_index], y_train_tmp.iloc[train_index]
    x_val, y_val = x_train_tmp.iloc[val_index], y_train_tmp.iloc[val_index]
    # Image training and validation set
    train_img = train_img_tmp.iloc[train_index][['Id', 'Pawpularity']]
    val_img = train_img_tmp.iloc[val_index][['Id', 'Pawpularity']]
    
# Export image sets for further loading and processing
train_img.to_csv('/kaggle/working/training_img.csv', header=False, index=False)
val_img.to_csv('/kaggle/working/val_img.csv', header=False, index=False)
test_img.to_csv('/kaggle/working/test_img.csv', header=False, index=False)

In [None]:
# Pre-process tabular data for further input to the neural network

x_train, y_train = np.asarray(x_train), np.asarray(y_train).astype('float32')
x_val, y_val = np.asarray(x_val), np.asarray(y_val).astype('float32')
x_test, y_test = np.asarray(x_test), np.asarray(y_test).astype('float32')

In [None]:
# Store images into arrays for further processing by the network

train_dataset = list()
print('Loading training images...')
with open('/kaggle/working/training_img.csv', 'r') as file:
    reader = csv.reader(file) 
    for i, row in enumerate(reader):
        train_dataset.append(read_and_decode(row[0]))
print('...done!\n')
train_dataset = np.array(train_dataset)
print('Training dataset shape:', train_dataset.shape)
print()
        
val_dataset = list()
print('Loading validation images...')
with open('/kaggle/working/val_img.csv', 'r') as file:
    reader = csv.reader(file)
    for i, row in enumerate(reader):
        val_dataset.append(read_and_decode(row[0]))
print('...done!')
val_dataset = np.array(val_dataset)
print('Validation dataset shape:', val_dataset.shape)
print()
        
test_dataset = list()
with open('/kaggle/working/test_img.csv', 'r') as file:
    reader = csv.reader(file)
    for i, row in enumerate(reader):
        test_dataset.append(read_and_decode(row[0]))
test_dataset = np.array(test_dataset)
print('Test dataset shape:', test_dataset.shape)

## Building the model

The general architecture we'd like to use should accept two inputs, tabular and image data, and from there produce one output: a prediction for `Pawpularity`.

Tabular data is passed to a dense NN and image data is passed to a CNN. The outputs of both networks are concatenated, and the resulting vector passes through a final dense layer that outputs the prediction.
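The shape bookkeeping behind this fusion can be sketched with NumPy alone. The branch widths (64 image features and 32 tabular features), the random values and the weights below are assumptions chosen for illustration, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4

# Stand-ins for the outputs of the two branches (assumed widths)
img_features = rng.normal(size=(batch_size, 64))  # CNN branch output
tab_features = rng.normal(size=(batch_size, 32))  # dense branch output

# Concatenate along the last (feature) axis: 64 + 32 = 96 features per sample
fused = np.concatenate([img_features, tab_features], axis=-1)

# A final dense layer with one unit and no activation is an affine map
weights = rng.normal(size=(fused.shape[1], 1))
bias = np.zeros(1)
scores = fused @ weights + bias  # one score per sample

print(fused.shape)
print(scores.shape)
```

Concatenation only requires the two branches to agree on the batch dimension, so each branch is free to reduce its own input to a feature vector of any width.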

In [None]:
# Build models for each data type using Keras Functional API

BATCH_SIZE = 128
IMG_WIDTH = width
IMG_HEIGHT = height
IMG_CHANNELS = 3

# Image data
input_img = tf.keras.layers.Input(shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')(input_img)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)

# Tabular data
input_tab = tf.keras.layers.Input(shape=(12,))
y = tf.keras.layers.Dense(16, activation='relu')(input_tab)
y = tf.keras.layers.Dense(32, activation='relu')(y)

# Concatenate models outputs
concatenated = tf.keras.layers.concatenate([x, y], axis=-1)

# Pass concatenated vector through a Dense layer with no activation to output a `Pawpularity` score prediction
output_score = tf.keras.layers.Dense(1, activation=None)(concatenated)

# Build general model with Keras Functional API
model = tf.keras.models.Model([input_img, input_tab], output_score)
model.compile(optimizer='adam',
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

In [None]:
model.summary() # summary() prints the table itself and returns None
tf.keras.utils.plot_model(model, show_shapes=True, show_layer_names=False)

## Training the model

In [None]:
history = model.fit([train_dataset, tf.convert_to_tensor(x_train)], tf.convert_to_tensor(y_train), epochs=20, batch_size=BATCH_SIZE,
                    validation_data=([val_dataset, tf.convert_to_tensor(x_val)], tf.convert_to_tensor(y_val)))

In [None]:
training_plot(['loss', 'root_mean_squared_error'], history)