## **Setting Things Up**

**1. If you haven't already, select:**

`File > Save a Copy in Drive`

**to copy this notebook to your Google Drive, and work on the copy. If you don't do this, your changes won't be saved!**


**2. To use a GPU with your notebook, open the menu:**

`Runtime > Change runtime type`

**and set the hardware accelerator dropdown to GPU. This can significantly speed up training.**

**3. To have enough memory for your notebook, open the menu:**

`Runtime > Change runtime type`

**and select High-RAM in the Runtime shape dropdown.**

To help you get started, we have included ready-to-use code on Google Colab for this problem. If you would rather set up your own programming environment than use Google Colab, the provided files and code will still be useful.

**PS: You need to manually install the `tensorflow_text` and `tf-models-official` libraries.**


**PS: You also need to manually place the pretrained BERT model weights `bert_en_uncased_preprocess_3` and `small_bert_bert_en_uncased_L-4_H-512_A-8_2` in the `model weight` folder. The BERT model weights can be found in our datasets.**
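
Before running the cells that load these files, it can help to confirm that they are in place. The sketch below is a hypothetical helper (`missing_weight_files` is not part of the notebook). It assumes the per-modality file names used later in this notebook (`<var>_Ex_Img.h5`, `<var>_Par.h5`, `<var>_Text.h5`) and that the two BERT entries sit directly inside the `model weight` folder.

```python
import os

# Hypothetical helper, not part of the original notebook.
BERT_ENTRIES = [
    "bert_en_uncased_preprocess_3",
    "small_bert_bert_en_uncased_L-4_H-512_A-8_2",
]

def missing_weight_files(folder, var, image_suffix="_Ex_Img", exists=os.path.exists):
    """Return the expected entries that are absent from `folder`."""
    expected = BERT_ENTRIES + [
        var + image_suffix + ".h5",   # CNN weights (exterior images by default)
        var + "_Par.h5",              # MLP weights for the parametric data
        var + "_Text.h5",             # BERT-based text model weights
    ]
    return [name for name in expected
            if not exists(os.path.join(folder, name))]
```

A call such as `missing_weight_files('model weight', 'total score')` returns an empty list when everything is present. For the interior score, `image_suffix="_In_Img"` would be the matching choice.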

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 27.3 gigabytes of available RAM

You are using a high-RAM runtime!


In [None]:
!pip install tensorflow_text

In [None]:
!pip install tf-models-official

In [None]:
# -*-coding:utf8 -*-
import tensorflow as tf
# print("TensorFlow version:", tf.__version__)

import os

import pandas as pd
import numpy as np

from matplotlib.pyplot import figure
import matplotlib.pyplot as plt
from PIL import Image

from tensorflow import keras  # public API; tensorflow.python is a private module
from keras.layers import Dense, Flatten, Conv2D
from keras.optimizers import RMSprop, Adam, SGD
from keras.callbacks import LearningRateScheduler
from keras.applications.vgg16 import VGG16
from keras import Model, Input, layers


import tensorflow_hub as hub
import tensorflow_text as text
from official.nlp import optimization  # to create AdamW optimizer

from keras import Model, Input, layers, regularizers
from keras.models import load_model
from keras import activations


TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 



In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
%cd /content/drive/MyDrive/Colab Notebooks/Vehicle Rating Prediction/new_images_with_folder
!unzip "new_images_with_folder.zip"

In [None]:
%cd /content/drive/MyDrive/Colab Notebooks/Vehicle Rating Prediction/new_interior_images_with_folder
!unzip "new_interior_images_with_folder.zip"

In [None]:

# change this to your own project path

%cd /content/drive/MyDrive/Colab Notebooks/Vehicle Rating Prediction

## **1 Data Processing**

In [None]:
var = "total score"
# var = "safety score"
# var = "performance score"
# var = "interior score"
# var = "critics score"

In [None]:
# 1. First, read the score information from the CSV file
# read info_data
file_name = "parametric data 2571 normalize " + var + ".csv"
info_data = pd.read_csv(file_name)
# convert to a NumPy array that holds only the data rows (no header row)
info_data = np.array(info_data)
print(info_data.shape)  # (2571, 310)
print(len(info_data))
print(info_data.shape[1])

(2571, 310)
2571
310


In [None]:
# Input Parametric Data (2571x310)
# column 0: origin index
# column 1: model name
# column 2-303: parametric features (302 columns)
# column 304: total score
# column 305: critics score
# column 306: performance score
# column 307: interior score
# column 308: safety score
# column 309: data split index => 1: train data; 2: validation data; 3:test data
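
The column layout above can be captured in a few named constants so that the target column follows the chosen `var` instead of being typed by hand. This is only a sketch: `FEATURE_SLICE`, `TARGET_COLUMN`, `SPLIT_COLUMN`, and `split_row` are illustrative names that do not appear in the notebook, and the toy row stands in for one row of `info_data`.

```python
# Column layout taken from the comment cell above; the helper names are
# illustrative and not part of the original notebook.
FEATURE_SLICE = slice(2, 304)          # columns 2..303: 302 parametric features
TARGET_COLUMN = {
    "total score": 304,
    "critics score": 305,
    "performance score": 306,
    "interior score": 307,
    "safety score": 308,
}
SPLIT_COLUMN = 309                     # 1: train, 2: validation, 3: test

def split_row(row, var):
    """Return (features, target, split_id) for one row of the table."""
    return row[FEATURE_SLICE], row[TARGET_COLUMN[var]], row[SPLIT_COLUMN]

# Toy row with 310 entries whose value equals its column index.
row = list(range(310))
features, target, split_id = split_row(row, "interior score")
print(len(features), target, split_id)  # 302 307 309
```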

In [None]:
# Shuffle the indices within each split. TensorFlow 2 runs eagerly, so
# .numpy() can be called directly and no tf.compat.v1.Session is needed.

# train data shuffle index
num1 = 2055
index1 = tf.random.shuffle(tf.range(num1)).numpy()

# validation data shuffle index
num2 = 258
index2 = tf.random.shuffle(tf.range(num2)).numpy()

# test data shuffle index
num3 = 258
index3 = tf.random.shuffle(tf.range(num3)).numpy()
# print(index1.shape, index2.shape, index3.shape)
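
Each shuffle is local to its split, and the later cells add the offsets `num1` and `num1 + num2` to reach the validation and test rows. That only works if the rows in the CSV are ordered train, then validation, then test. The following dependency-free sketch illustrates the index arithmetic; it uses Python's `random` module in place of `tf.random.shuffle`, and the fixed `seed` is an assumption (the notebook does not set one).

```python
import random

def shuffled_global_rows(num_train, num_val, num_test, seed=0):
    """Shuffle inside each split and return global row numbers per split."""
    rng = random.Random(seed)
    offsets = (0, num_train, num_train + num_val)
    sizes = (num_train, num_val, num_test)
    rows = []
    for offset, size in zip(offsets, sizes):
        local = list(range(size))
        rng.shuffle(local)              # permutation within the split only
        rows.append([offset + i for i in local])
    return rows

train_rows, val_rows, test_rows = shuffled_global_rows(2055, 258, 258)
```

Because the three lists never overlap, the same index arrays can be reused to reorder the parametric, text, and image data consistently.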

**Assign the parametric data**

In [None]:
# assign the parametric data
# To predict the total score, you will need to assign the y variable using the values in column 304 of the parametric data.
# column 304: total score
# column 305: critics score
# column 306: performance score
# column 307: interior score
# column 308: safety score

x_train_tab = np.zeros((num1, 302))
y_train_tab = np.zeros((num1, 1))
for i in range(num1):
    x_train_tab[i, :] = np.array(info_data[index1[i], 2:304], dtype=float)
    y_train_tab[i] = np.array(info_data[index1[i], 304], dtype=float)
x_train_tab = tf.convert_to_tensor(x_train_tab)
y_train_tab = tf.convert_to_tensor(y_train_tab)
y_train = y_train_tab
print(y_train.shape)

x_validation_tab = np.zeros((num2, 302))
y_validation_tab = np.zeros((num2, 1))
for i in range(num2):
    x_validation_tab[i, :] = np.array(info_data[index2[i] + num1, 2:304], dtype=float)
    y_validation_tab[i] = np.array(info_data[index2[i] + num1, 304], dtype=float)
x_validation_tab = tf.convert_to_tensor(x_validation_tab)
y_validation_tab = tf.convert_to_tensor(y_validation_tab)
y_validation = y_validation_tab
print(y_validation.shape)

x_test_tab = np.zeros((num3, 302))
y_test_tab = np.zeros((num3, 1))
for i in range(num3):
    x_test_tab[i, :] = np.array(info_data[index3[i] + num1 + num2, 2:304], dtype=float)
    y_test_tab[i] = np.array(info_data[index3[i] + num1 + num2, 304], dtype=float)
x_test_tab = tf.convert_to_tensor(x_test_tab)
y_test_tab = tf.convert_to_tensor(y_test_tab)
y_test = y_test_tab
print(y_test.shape)

(2055, 1)
(258, 1)
(258, 1)


**Assign text data**

In [None]:
# load the text data
var_name = 'data split ' + var

sketch1 = pd.read_csv('text_data.csv', encoding='latin1')
train_df = sketch1[sketch1[var_name] == 1]
val_df = sketch1[sketch1[var_name] == 2]
test_df = sketch1[sketch1[var_name] == 3]
# print(train_df.shape)
# print(train_df)

sketch2 = train_df.astype({"text": str})
text1 = list(sketch2['text'])

sketch3 = val_df.astype({"text": str})
text2 = list(sketch3['text'])

sketch4 = test_df.astype({"text": str})
text3 = list(sketch4['text'])


train_text = [text1[i] for i in index1]
x_train_text = tf.constant(train_text)

validation_text = [text2[i] for i in index2]
x_validation_text = tf.constant(validation_text)

test_text = [text3[i] for i in index3]
x_test_text = tf.constant(test_text)
# print(len(train_text))
# print(train_text[0])


**Assign image data**

**Please take note of the image dimensions. For interior images, each image has dimensions of 300x448x3, whereas for exterior images, each image has dimensions of 290x448x3.**

In [None]:
# assign the image data
# exterior image => 290    interior image => 300
image = np.zeros((290, 448, 3))
x_train = np.zeros((num1, 290, 448, 3))
x_validation = np.zeros((num2, 290, 448, 3))
x_test = np.zeros((num3, 290, 448, 3))
# image = image_data[0]

# train data
for i in range(num1):
    folder_path = r'new_images_with_folder/' + info_data[index1[i]][1]
    dirs = os.listdir(folder_path)
    print(i)
    if len(dirs) > 0:
        # only one total picture
        dirpath = folder_path + '/' + dirs[0]  # get the angular front view of the car
        img = Image.open(dirpath)
        img_plt = np.array(img)
        x_train[i, :, :, :] = img_plt / 255.0


# validation data
for i in range(num2):
    folder_path = r'new_images_with_folder/' + info_data[index2[i]+num1][1]
    dirs = os.listdir(folder_path)
    print(i)
    if len(dirs) > 0:
        # only one total picture
        dirpath = folder_path + '/' + dirs[0]  # get the angular front view of the car
        img = Image.open(dirpath)
        img_plt = np.array(img)
        x_validation[i, :, :, :] = img_plt / 255.0


# test data
for i in range(num3):
    folder_path = r'new_images_with_folder/' + info_data[index3[i]+num1+num2][1]
    dirs = os.listdir(folder_path)
    print(i)
    if len(dirs) > 0:
        # only one total picture
        dirpath = folder_path + '/' + dirs[0]  # get the angular front view of the car
        img = Image.open(dirpath)
        img_plt = np.array(img)
        x_test[i, :, :, :] = img_plt / 255.0


x_train_img = tf.convert_to_tensor(x_train)
x_validation_img = tf.convert_to_tensor(x_validation)
x_test_img = tf.convert_to_tensor(x_test)
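
One detail of the loops above: `os.listdir` returns entries in arbitrary order, so `dirs[0]` is not guaranteed to be the same file on every system. A minimal sketch of a deterministic alternative follows. It assumes the desired view sorts first by file name, which the notebook does not state, and `first_image_path` is a hypothetical helper.

```python
import os

def first_image_path(folder_path, listdir=os.listdir):
    """Return the path of the alphabetically first entry, or None if empty."""
    names = sorted(listdir(folder_path))
    if not names:
        return None
    return os.path.join(folder_path, names[0])
```

Inside the loops, `first_image_path(folder_path)` could replace the `dirs[0]` lookup, with a `None` result meaning the folder has no image.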

## **2-1 MML Model (Par+Text+Img)**

In [None]:
# construct MML model
adam_optimizer = Adam(learning_rate=0.00005)
rms_prop_optimizer = RMSprop(learning_rate=0.001)
sgd_optimizer = SGD(learning_rate=0.01, momentum=0.9, nesterov=False)

In [None]:
# load the pretrain model
# When predicting total score, critics score, performance score and safety score, we only use exterior image to train the CNN model.
# When predicting interior score, we only use interior image to train the CNN model.


# CNN
CNNmodel = tf.keras.models.load_model('model weight/' + var + '_Ex_Img.h5')
# total score, safety score, performance score, critics score => _Ex_Img
# interior score => _In_Img
for layer in CNNmodel.layers:
  layer._name = layer._name + "_a"
CNNmodel.summary()

# MLP
MLPmodel = tf.keras.models.load_model('model weight/' + var + '_Par.h5')
for layer in MLPmodel.layers:
  layer._name = layer._name + "_b"
MLPmodel.summary()

# Bert
model_name = 'model weight/' + var + '_Text.h5'
Bertmodel = tf.keras.models.load_model(model_name, custom_objects={'KerasLayer': hub.KerasLayer})
for layer in Bertmodel.layers:
  layer._name = layer._name + "_c"
Bertmodel.summary()


In [None]:
for layer in CNNmodel.layers:
    layer.trainable = True
    print(layer.name, layer)

CNN_weight = CNNmodel.layers[-1].get_weights()[0]
CNN_bias = CNNmodel.layers[-1].get_weights()[1]
print(CNN_weight)
print(CNN_bias)

for layer in MLPmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
MLP_weight = MLPmodel.layers[-1].get_weights()[0]
MLP_bias = MLPmodel.layers[-1].get_weights()[1]
print(MLP_weight)
print(MLP_bias)

for layer in Bertmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
Bert_weight = Bertmodel.layers[-1].get_weights()[0]
Bert_bias = Bertmodel.layers[-1].get_weights()[1]
print(Bert_weight)
print(Bert_bias)

# These coefficients are calculated from the file --- "Get Linear Regression Weights"
initializer1 = []
for i in range(100):
    initializer1.append((MLP_weight[i] * 0.92594368))
for i in range(100):
    initializer1.append((Bert_weight[i] * 0.00564145))
for i in range(100):
    initializer1.append((CNN_weight[i] * 0.17254082))
initializer1 = tf.keras.initializers.Constant(initializer1)

print(initializer1)
print('finished')

out1_img = CNNmodel.layers[-2].output
out1_par = MLPmodel.layers[-2].output
out1_text = Bertmodel.layers[-2].output
out2 = tf.keras.layers.Concatenate(axis=1, name='concatenation_tab_text_img')([out1_par, out1_text, out1_img])
out5 = layers.Dense(1, activation='relu', name='concatenation_dense', kernel_initializer=initializer1)(out2)
model = Model([MLPmodel.input, Bertmodel.input, CNNmodel.input], out5, name='MML_Model_Par_Text_Img')

model.compile(
    optimizer=adam_optimizer,
    loss='mse',
    metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse'), 'mse', 'mae']
)

model.summary()
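
The initializer above appears to start the fused layer as a weighted sum of the three single-modality predictions: each branch's last-layer weights are scaled by its regression coefficient and stacked in the same order as the concatenated features. The toy example below checks that identity with three features per branch instead of 100. The weights and activations are made up, only the three coefficients come from the notebook, and biases and the ReLU are left out.

```python
# Toy sizes (3 features per branch instead of 100); weights and activations
# are invented for illustration.
def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))

coef = {"par": 0.92594368, "text": 0.00564145, "img": 0.17254082}
w = {"par": [0.5, -0.2, 0.1], "text": [0.3, 0.3, 0.0], "img": [-0.1, 0.4, 0.2]}
h = {"par": [1.0, 2.0, 3.0], "text": [0.5, 0.5, 0.5], "img": [2.0, 1.0, 0.0]}

order = ["par", "text", "img"]          # same order as the Concatenate layer

# Fused kernel: each branch's last-layer weights scaled by its coefficient.
fused_kernel = [coef[k] * wi for k in order for wi in w[k]]
fused_input = [hi for k in order for hi in h[k]]

fused_output = dot(fused_kernel, fused_input)
weighted_sum = sum(coef[k] * dot(w[k], h[k]) for k in order)
```

The two values agree up to floating-point rounding, which is why the order of the `initializer1.append` loops has to match the order of the inputs to `Concatenate`.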

In [None]:
def scheduler(epoch, lr):
    min_lr = 0.0000001
    if epoch < 2:
        return lr
    else:
        if lr < min_lr:
            lr = min_lr
            return lr
        else:
            return lr * tf.math.exp(-0.01)
            # return lr
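
To see what this schedule does over a full run, the same branching can be traced in plain Python. The sketch below mirrors `scheduler` with `math.exp` in place of `tf.math.exp`; `scheduler_py` and `lr_trace` are illustrative names, and the starting rate of 5e-5 is the Adam learning rate set earlier.

```python
import math

def scheduler_py(epoch, lr, min_lr=1e-7, decay=0.01):
    """Same branching as `scheduler`, using math.exp instead of tf.math.exp."""
    if epoch < 2:
        return lr
    if lr < min_lr:
        return min_lr
    return lr * math.exp(-decay)

def lr_trace(initial_lr, epochs):
    """Learning rate used at each epoch when the callback runs every epoch."""
    lrs, lr = [], initial_lr
    for epoch in range(epochs):
        lr = scheduler_py(epoch, lr)
        lrs.append(lr)
    return lrs

trace = lr_trace(5e-5, 200)
```

The rate is held for the first two epochs and then shrinks by a factor of `exp(-0.01)` per epoch. Starting from 5e-5, it would take more than 600 decayed epochs to reach the 1e-7 floor, so within 200 epochs the floor is never hit.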


In [None]:
reduce_lr = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, monitor="val_loss", restore_best_weights=True, verbose=1)
EPOCHS = 200
history = model.fit([x_train_tab, x_train_text, x_train_img], y_train, epochs=EPOCHS, batch_size=32, validation_data=([x_validation_tab, x_validation_text, x_validation_img], y_validation), verbose=2, callbacks=[early_stop, reduce_lr])
print(history)
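
The plotting cells later in this notebook call `plot_history(history)`, but no definition of that helper appears in the cells shown here. A minimal sketch of what it might look like is given below, assuming it should draw training and validation curves from the Keras `History` object returned by `model.fit`. The choice of metrics and the figure styling are assumptions.

```python
def history_series(history_dict, metric="loss"):
    """Return (epochs, train_values, val_values) for one metric."""
    train = list(history_dict[metric])
    val = list(history_dict.get("val_" + metric, []))
    epochs = list(range(1, len(train) + 1))
    return epochs, train, val

def plot_history(history, metrics=("loss", "rmse", "mae")):
    """Plot training and validation curves from a Keras History object."""
    import matplotlib.pyplot as plt    # imported lazily; needed only here

    for metric in metrics:
        if metric not in history.history:
            continue
        epochs, train, val = history_series(history.history, metric)
        plt.figure(figsize=(4, 3), dpi=80)
        plt.plot(epochs, train, label="train")
        if val:
            plt.plot(epochs, val, label="validation")
        plt.xlabel("epoch")
        plt.ylabel(metric)
        plt.legend()
        plt.tight_layout()
        plt.show()
```

Running this cell before the plotting cells makes the later `plot_history(history)` calls resolve.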

In [None]:
test_loss, test_rmse, test_mse, test_mae = model.evaluate([x_test_tab, x_test_text, x_test_img], y_test, verbose=2)
validation_loss, validation_rmse, validation_mse, validation_mae = model.evaluate([x_validation_tab, x_validation_text, x_validation_img], y_validation, verbose=2)
train_loss, train_rmse, train_mse, train_mae = model.evaluate([x_train_tab, x_train_text, x_train_img], y_train, verbose=2)

9/9 - 0s - loss: 0.0042 - rmse: 0.0646 - mse: 0.0042 - mae: 0.0496 - 496ms/epoch - 55ms/step
9/9 - 0s - loss: 0.0046 - rmse: 0.0677 - mse: 0.0046 - mae: 0.0501 - 494ms/epoch - 55ms/step
65/65 - 4s - loss: 1.5234e-04 - rmse: 0.0123 - mse: 1.5234e-04 - mae: 0.0096 - 4s/epoch - 55ms/step


In [None]:
# store the model
# summarize the loaded model
model.summary()
# save the best performing model to file
model_name = 'model weight/' + var + '_MML_Par_Text_Img.h5'
model.save_weights(model_name)
# model.save('model weight/' + var + '_MML_Par_Text_Img.h5', 'saved_model') # infeasible

In [None]:
result = model.predict([x_test_tab, x_test_text, x_test_img])
print(result)

In [None]:
figure(figsize=(4, 3), dpi=80)
plt.scatter(y_test, result, s=3)
x = [0, 1]
y = [0, 1]
plt.plot(x, y, color="black")
plt.xlabel(var + ' ground truth')
plt.ylabel(var + ' prediction')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.tight_layout()
plt.show()

plot_history(history)

In [None]:
from math import nan
test = np.array(y_test).T
# print(test)
predict = np.array(result).T

correlation_matrix = np.corrcoef(test, predict)
print(correlation_matrix)
correlation_xy = correlation_matrix[0,1]
r_squared = correlation_xy**2
##range: 0.8199158648859902

print (r_squared)
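
Note that the value computed above is the squared Pearson correlation, which is not always the same as the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The squared correlation ignores a constant offset or scale error in the predictions, while the coefficient of determination penalizes it. The plain-Python sketch below, with made-up numbers, shows the difference.

```python
def pearson_r_squared(y_true, y_pred):
    """Square of the Pearson correlation (what np.corrcoef(...)[0, 1]**2 gives)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)

def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [0.2, 0.4, 0.6, 0.8]
y_biased = [t + 0.1 for t in y_true]   # perfectly correlated, constant offset

r2_corr = pearson_r_squared(y_true, y_biased)
r2_det = coefficient_of_determination(y_true, y_biased)
```

Here the squared correlation is 1 (up to rounding) while the coefficient of determination is 0.8, so it is worth stating which of the two a reported `r^2` refers to.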

In [None]:
output = pd.DataFrame()
idx = 0
output.loc[idx, 'train_rmse'] = train_rmse
output.loc[idx, 'train_mse'] = train_mse
output.loc[idx, 'train_mae'] = train_mae

output.loc[idx, 'validation_rmse'] = validation_rmse
output.loc[idx, 'validation_mse'] = validation_mse
output.loc[idx, 'validation_mae'] = validation_mae

output.loc[idx, 'test_rmse'] = test_rmse
output.loc[idx, 'test_mse'] = test_mse
output.loc[idx, 'test_mae'] = test_mae

output.loc[idx, 'r^2'] = r_squared
pd.set_option('display.max_columns', None)
print(output)

output.to_csv('MML_value.csv')
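
Every model section in this notebook writes its metrics to the same `MML_value.csv`, so running several sections in one session overwrites the earlier results. One way around this is to derive the file name from the target and the model variant. `metrics_filename` below is a hypothetical helper, and the naming pattern is only a suggestion.

```python
def metrics_filename(var, variant, prefix="MML_value"):
    """Build a CSV name such as 'MML_value_total_score_Par_Text_Img.csv'."""
    return "{}_{}_{}.csv".format(prefix, var.replace(" ", "_"), variant)
```

For this section the call would be `output.to_csv(metrics_filename(var, 'Par_Text_Img'))`.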

## **2-2 MML Model (Text+Img)**

In [None]:
# construct MML model
adam_optimizer = Adam(learning_rate=0.00005)
rms_prop_optimizer = RMSprop(learning_rate=0.001)
sgd_optimizer = SGD(learning_rate=0.01, momentum=0.9, nesterov=False)

In [None]:
# load the pretrain model
# When predicting total score, critics score, performance score and safety score, we only use exterior image to train the CNN model.
# When predicting interior score, we only use interior image to train the CNN model.


# CNN
CNNmodel = tf.keras.models.load_model('model weight/' + var + '_Ex_Img.h5')
# interior score => _In_Img
for layer in CNNmodel.layers:
  layer._name = layer._name + "_a"
CNNmodel.summary()

# Bert
model_name = 'model weight/' + var + '_Text.h5'
Bertmodel = tf.keras.models.load_model(model_name, custom_objects={'KerasLayer': hub.KerasLayer})
for layer in Bertmodel.layers:
  layer._name = layer._name + "_c"
Bertmodel.summary()


In [None]:
for layer in CNNmodel.layers:
    layer.trainable = True
    print(layer.name, layer)

CNN_weight = CNNmodel.layers[-1].get_weights()[0]
CNN_bias = CNNmodel.layers[-1].get_weights()[1]
print(CNN_weight)
print(CNN_bias)

for layer in Bertmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
Bert_weight = Bertmodel.layers[-1].get_weights()[0]
Bert_bias = Bertmodel.layers[-1].get_weights()[1]
print(Bert_weight)
print(Bert_bias)

# These coefficients are calculated from the file --- "Get Linear Regression Weights"
initializer1 = []
for i in range(100):
    initializer1.append((Bert_weight[i] * 0.00564145))
for i in range(100):
    initializer1.append((CNN_weight[i] * 0.17254082))
initializer1 = tf.keras.initializers.Constant(initializer1)

print(initializer1)
print('finished')

out1_img = CNNmodel.layers[-2].output
out1_text = Bertmodel.layers[-2].output
out2 = tf.keras.layers.Concatenate(axis=1, name='concatenation_text_img')([out1_text, out1_img])
out5 = layers.Dense(1, activation='relu', name='concatenation_dense', kernel_initializer=initializer1)(out2)
model = Model([Bertmodel.input, CNNmodel.input], out5, name='MML_Model_Text_Img')

model.compile(
    optimizer=adam_optimizer,
    loss='mse',
    metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse'), 'mse', 'mae']
)

model.summary()

In [None]:
def scheduler(epoch, lr):
    min_lr = 0.0000001
    if epoch < 2:
        return lr
    else:
        if lr < min_lr:
            lr = min_lr
            return lr
        else:
            return lr * tf.math.exp(-0.01)
            # return lr


In [None]:
reduce_lr = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, monitor="val_loss", restore_best_weights=True, verbose=1)
EPOCHS = 200
history = model.fit([x_train_text, x_train_img], y_train, epochs=EPOCHS, batch_size=32, validation_data=([x_validation_text, x_validation_img], y_validation), verbose=2, callbacks=[early_stop, reduce_lr])
print(history)

In [None]:
test_loss, test_rmse, test_mse, test_mae = model.evaluate([x_test_text, x_test_img], y_test, verbose=2)
validation_loss, validation_rmse, validation_mse, validation_mae = model.evaluate([x_validation_text, x_validation_img], y_validation, verbose=2)
train_loss, train_rmse, train_mse, train_mae = model.evaluate([x_train_text, x_train_img], y_train, verbose=2)



In [None]:
# store the model
# summarize the loaded model
model.summary()
# save the best performing model to file
model_name = 'model weight/' + var + '_MML_Text_Img.h5'
model.save_weights(model_name)
# model.save('model weight/' + var + '_MML_Text_Img.h5', 'saved_model') # infeasible

In [None]:
result = model.predict([x_test_text, x_test_img])
print(result)

In [None]:
figure(figsize=(4, 3), dpi=80)
plt.scatter(y_test, result, s=3)
x = [0, 1]
y = [0, 1]
plt.plot(x, y, color="black")
plt.xlabel(var + ' ground truth')
plt.ylabel(var + ' prediction')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.tight_layout()
plt.show()

plot_history(history)

In [None]:
from math import nan
test = np.array(y_test).T
# print(test)
predict = np.array(result).T

correlation_matrix = np.corrcoef(test, predict)
print(correlation_matrix)
correlation_xy = correlation_matrix[0,1]
r_squared = correlation_xy**2
##range: 0.8199158648859902

print (r_squared)

In [None]:
output = pd.DataFrame()
idx = 0
output.loc[idx, 'train_rmse'] = train_rmse
output.loc[idx, 'train_mse'] = train_mse
output.loc[idx, 'train_mae'] = train_mae

output.loc[idx, 'validation_rmse'] = validation_rmse
output.loc[idx, 'validation_mse'] = validation_mse
output.loc[idx, 'validation_mae'] = validation_mae

output.loc[idx, 'test_rmse'] = test_rmse
output.loc[idx, 'test_mse'] = test_mse
output.loc[idx, 'test_mae'] = test_mae

output.loc[idx, 'r^2'] = r_squared
pd.set_option('display.max_columns', None)
print(output)

output.to_csv('MML_value.csv')

## **2-3 MML Model (Par+Img)**

In [None]:
# construct MML model
adam_optimizer = Adam(learning_rate=0.00005)
rms_prop_optimizer = RMSprop(learning_rate=0.001)
sgd_optimizer = SGD(learning_rate=0.01, momentum=0.9, nesterov=False)

In [None]:
# load the pretrain model
# When predicting total score, critics score, performance score and safety score, we only use exterior image to train the CNN model.
# When predicting interior score, we only use interior image to train the CNN model.


# CNN
CNNmodel = tf.keras.models.load_model('model weight/' + var + '_Ex_Img.h5')
# interior score => _In_Img
for layer in CNNmodel.layers:
  layer._name = layer._name + "_a"
CNNmodel.summary()

# MLP
MLPmodel = tf.keras.models.load_model('model weight/' + var + '_Par.h5')
for layer in MLPmodel.layers:
  layer._name = layer._name + "_b"
MLPmodel.summary()

In [None]:
for layer in CNNmodel.layers:
    layer.trainable = True
    print(layer.name, layer)

CNN_weight = CNNmodel.layers[-1].get_weights()[0]
CNN_bias = CNNmodel.layers[-1].get_weights()[1]
print(CNN_weight)
print(CNN_bias)

for layer in MLPmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
MLP_weight = MLPmodel.layers[-1].get_weights()[0]
MLP_bias = MLPmodel.layers[-1].get_weights()[1]
print(MLP_weight)
print(MLP_bias)

# These coefficients are calculated from the file --- "Get Linear Regression Weights"
initializer1 = []
for i in range(100):
    initializer1.append((MLP_weight[i] * 0.92594368))
for i in range(100):
    initializer1.append((CNN_weight[i] * 0.17254082))
initializer1 = tf.keras.initializers.Constant(initializer1)

print(initializer1)
print('finished')

out1_img = CNNmodel.layers[-2].output
out1_par = MLPmodel.layers[-2].output
out2 = tf.keras.layers.Concatenate(axis=1)([out1_par, out1_img])
out5 = layers.Dense(1, activation='relu', name='concatenation_dense', kernel_initializer=initializer1)(out2)
model = Model([MLPmodel.input, CNNmodel.input], out5)

model.compile(
    optimizer=adam_optimizer,
    loss='mse',
    metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse'), 'mse', 'mae']
)

model.summary()

In [None]:
def scheduler(epoch, lr):
    min_lr = 0.0000001
    if epoch < 2:
        return lr
    else:
        if lr < min_lr:
            lr = min_lr
            return lr
        else:
            return lr * tf.math.exp(-0.01)
            # return lr


In [None]:
reduce_lr = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, monitor="val_loss", restore_best_weights=True, verbose=1)
EPOCHS = 200
history = model.fit([x_train_tab, x_train_img], y_train, epochs=EPOCHS, batch_size=32, validation_data=([x_validation_tab, x_validation_img], y_validation), verbose=2, callbacks=[early_stop, reduce_lr])
print(history)

In [None]:
test_loss, test_rmse, test_mse, test_mae = model.evaluate([x_test_tab, x_test_img], y_test, verbose=2)
validation_loss, validation_rmse, validation_mse, validation_mae = model.evaluate([x_validation_tab, x_validation_img], y_validation, verbose=2)
train_loss, train_rmse, train_mse, train_mae = model.evaluate([x_train_tab, x_train_img], y_train, verbose=2)



In [None]:
# store the model
# summarize the loaded model
model.summary()
# save the best performing model to file
model_name = 'model weight/' + var + '_MML_Par_Img.h5'
model.save_weights(model_name)
# model.save('model weight/' + var + '_MML_Par_Img.h5', 'saved_model') # infeasible

In [None]:
result = model.predict([x_test_tab, x_test_img])
print(result)

In [None]:
figure(figsize=(4, 3), dpi=80)
plt.scatter(y_test, result, s=3)
x = [0, 1]
y = [0, 1]
plt.plot(x, y, color="black")
plt.xlabel(var + ' ground truth')
plt.ylabel(var + ' prediction')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.tight_layout()
plt.show()

plot_history(history)

In [None]:
from math import nan
test = np.array(y_test).T
# print(test)
predict = np.array(result).T

correlation_matrix = np.corrcoef(test, predict)
print(correlation_matrix)
correlation_xy = correlation_matrix[0,1]
r_squared = correlation_xy**2
##range: 0.8199158648859902

print (r_squared)

In [None]:
output = pd.DataFrame()
idx = 0
output.loc[idx, 'train_rmse'] = train_rmse
output.loc[idx, 'train_mse'] = train_mse
output.loc[idx, 'train_mae'] = train_mae

output.loc[idx, 'validation_rmse'] = validation_rmse
output.loc[idx, 'validation_mse'] = validation_mse
output.loc[idx, 'validation_mae'] = validation_mae

output.loc[idx, 'test_rmse'] = test_rmse
output.loc[idx, 'test_mse'] = test_mse
output.loc[idx, 'test_mae'] = test_mae

output.loc[idx, 'r^2'] = r_squared
pd.set_option('display.max_columns', None)
print(output)

output.to_csv('MML_value.csv')

## **2-4 MML Model (Par+Text)**

In [None]:
# construct MML model
adam_optimizer = Adam(learning_rate=0.00005)
rms_prop_optimizer = RMSprop(learning_rate=0.001)
sgd_optimizer = SGD(learning_rate=0.01, momentum=0.9, nesterov=False)

In [None]:
# load the pretrain model
# When predicting total score, critics score, performance score and safety score, we only use exterior image to train the CNN model.
# When predicting interior score, we only use interior image to train the CNN model.


# MLP
MLPmodel = tf.keras.models.load_model('model weight/' + var + '_Par.h5')
for layer in MLPmodel.layers:
  layer._name = layer._name + "_b"
MLPmodel.summary()

# Bert
model_name = 'model weight/' + var + '_Text.h5'
Bertmodel = tf.keras.models.load_model(model_name, custom_objects={'KerasLayer': hub.KerasLayer})
for layer in Bertmodel.layers:
  layer._name = layer._name + "_c"
Bertmodel.summary()

In [None]:

for layer in MLPmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
MLP_weight = MLPmodel.layers[-1].get_weights()[0]
MLP_bias = MLPmodel.layers[-1].get_weights()[1]
print(MLP_weight)
print(MLP_bias)

for layer in Bertmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
Bert_weight = Bertmodel.layers[-1].get_weights()[0]
Bert_bias = Bertmodel.layers[-1].get_weights()[1]
print(Bert_weight)
print(Bert_bias)

# These coefficients are calculated from the file --- "Get Linear Regression Weights"
initializer1 = []
for i in range(100):
    initializer1.append((MLP_weight[i] * 0.92594368))
for i in range(100):
    initializer1.append((Bert_weight[i] * 0.00564145))
initializer1 = tf.keras.initializers.Constant(initializer1)

print(initializer1)
print('finished')

out1_par = MLPmodel.layers[-2].output
out1_text = Bertmodel.layers[-2].output
out2 = tf.keras.layers.Concatenate(axis=1)([out1_par, out1_text])
out5 = layers.Dense(1, activation='relu', name='concatenation_dense', kernel_initializer=initializer1)(out2)
model = Model([MLPmodel.input, Bertmodel.input], out5)

model.compile(
    optimizer=adam_optimizer,
    loss='mse',
    metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse'), 'mse', 'mae']
)

model.summary()


In [None]:
def scheduler(epoch, lr):
    min_lr = 0.0000001
    if epoch < 2:
        return lr
    else:
        if lr < min_lr:
            lr = min_lr
            return lr
        else:
            return lr * tf.math.exp(-0.01)
            # return lr

In [None]:
reduce_lr = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, monitor="val_loss", restore_best_weights=True, verbose=1)
EPOCHS = 200
history = model.fit([x_train_tab, x_train_text], y_train, epochs=EPOCHS, batch_size=32, validation_data=([x_validation_tab, x_validation_text], y_validation), verbose=2, callbacks=[early_stop, reduce_lr])
print(history)

In [None]:
test_loss, test_rmse, test_mse, test_mae = model.evaluate([x_test_tab, x_test_text], y_test, verbose=2)
validation_loss, validation_rmse, validation_mse, validation_mae = model.evaluate([x_validation_tab, x_validation_text], y_validation, verbose=2)
train_loss, train_rmse, train_mse, train_mae = model.evaluate([x_train_tab, x_train_text], y_train, verbose=2)

In [None]:
# store the model
# summarize the loaded model
model.summary()
# save the best performing model to file
model_name = 'model weight/' + var + '_MML_Par_Text.h5'
model.save_weights(model_name)
# model.save('model weight/' + var + '_MML_Par_Text.h5', 'saved_model') # infeasible

In [None]:
# the Par+Text model takes only the parametric and text inputs
result = model.predict([x_test_tab, x_test_text])
print(result)
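
Each variant expects its inputs in the order used when the model was built, and passing a different number of inputs makes `predict` fail. A small lookup can keep the `fit`, `evaluate`, and `predict` calls consistent across sections. The sketch below is illustrative: `VARIANT_INPUTS` and `select_inputs` are hypothetical names, and the orders are copied from the `Model([...])` calls in sections 2-1 to 2-4.

```python
# Input order per variant, matching the Model([...]) calls in sections 2-1 to 2-4.
VARIANT_INPUTS = {
    "Par_Text_Img": ("tab", "text", "img"),
    "Text_Img": ("text", "img"),
    "Par_Img": ("tab", "img"),
    "Par_Text": ("tab", "text"),
}

def select_inputs(variant, tab=None, text=None, img=None):
    """Return the list of inputs the given variant expects, in order."""
    available = {"tab": tab, "text": text, "img": img}
    inputs = []
    for key in VARIANT_INPUTS[variant]:
        if available[key] is None:
            raise ValueError("variant {!r} needs the {!r} input".format(variant, key))
        inputs.append(available[key])
    return inputs
```

For example, `model.predict(select_inputs('Par_Text', tab=x_test_tab, text=x_test_text, img=x_test_img))` would pass only the two inputs this section's model accepts.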

In [None]:
figure(figsize=(4, 3), dpi=80)
plt.scatter(y_test, result, s=3)
x = [0, 1]
y = [0, 1]
plt.plot(x, y, color="black")
plt.xlabel(var + ' ground truth')
plt.ylabel(var + ' prediction')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.tight_layout()
plt.show()

plot_history(history)

In [None]:
from math import nan
test = np.array(y_test).T
# print(test)
predict = np.array(result).T

correlation_matrix = np.corrcoef(test, predict)
print(correlation_matrix)
correlation_xy = correlation_matrix[0,1]
r_squared = correlation_xy**2
##range: 0.8199158648859902

print (r_squared)

In [None]:
output = pd.DataFrame()
idx = 0
output.loc[idx, 'train_rmse'] = train_rmse
output.loc[idx, 'train_mse'] = train_mse
output.loc[idx, 'train_mae'] = train_mae

output.loc[idx, 'validation_rmse'] = validation_rmse
output.loc[idx, 'validation_mse'] = validation_mse
output.loc[idx, 'validation_mae'] = validation_mae

output.loc[idx, 'test_rmse'] = test_rmse
output.loc[idx, 'test_mse'] = test_mse
output.loc[idx, 'test_mae'] = test_mae

output.loc[idx, 'r^2'] = r_squared
pd.set_option('display.max_columns', None)
print(output)

output.to_csv('MML_value.csv')

## **3 Verification: Load the model again (rebuild the architecture and load the saved weights to verify that the model was stored successfully)**

In [None]:
# construct MML model
adam_optimizer = Adam(learning_rate=0.00005)
rms_prop_optimizer = RMSprop(learning_rate=0.001)
sgd_optimizer = SGD(learning_rate=0.01, momentum=0.9, nesterov=False)

# load the pretrain model
# CNN
CNNmodel = tf.keras.models.load_model('model weight/' + var + '_Ex_Img.h5')
# interior score => _In_Img
for layer in CNNmodel.layers:
  layer._name = layer._name + "_a"
CNNmodel.summary()

# MLP
MLPmodel = tf.keras.models.load_model('model weight/' + var + '_Par.h5')
for layer in MLPmodel.layers:
  layer._name = layer._name + "_b"
MLPmodel.summary()

# Bert
model_name = 'model weight/' + var + '_Text.h5'
Bertmodel = tf.keras.models.load_model(model_name, custom_objects={'KerasLayer': hub.KerasLayer})
for layer in Bertmodel.layers:
  layer._name = layer._name + "_c"
Bertmodel.summary()


for layer in CNNmodel.layers:
    layer.trainable = True
    print(layer.name, layer)

CNN_weight = CNNmodel.layers[-1].get_weights()[0]
CNN_bias = CNNmodel.layers[-1].get_weights()[1]
# print(CNN_weight)
# print(CNN_bias)

for layer in MLPmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
MLP_weight = MLPmodel.layers[-1].get_weights()[0]
MLP_bias = MLPmodel.layers[-1].get_weights()[1]
# print(MLP_weight)
# print(MLP_bias)

for layer in Bertmodel.layers:
    layer.trainable = True
    print(layer.name, layer)
Bert_weight = Bertmodel.layers[-1].get_weights()[0]
Bert_bias = Bertmodel.layers[-1].get_weights()[1]
# print(Bert_weight)
# print(Bert_bias)

# These coefficients are calculated from the file --- "Get Linear Regression Weights"
initializer1 = []
for i in range(100):
    initializer1.append((MLP_weight[i] * 0.92594368))
for i in range(100):
    initializer1.append((Bert_weight[i] * 0.00564145))
for i in range(100):
    initializer1.append((CNN_weight[i] * 0.17254082))
initializer1 = tf.keras.initializers.Constant(initializer1)

print(initializer1)
print('finished')

out1_img = CNNmodel.layers[-2].output
out1_par = MLPmodel.layers[-2].output
out1_text = Bertmodel.layers[-2].output
out2 = tf.keras.layers.Concatenate(axis=1)([out1_par, out1_text, out1_img])
out5 = layers.Dense(1, activation='relu', name='concatenation_dense', kernel_initializer=initializer1)(out2)
model1 = Model([MLPmodel.input, Bertmodel.input, CNNmodel.input], out5)

model1.compile(
    optimizer=adam_optimizer,
    loss='mse',
    metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse'), 'mse', 'mae']
)

model1.summary()


In [None]:
model_name = 'model weight/' + var + '_MML_Par_Text_Img.h5'
model1.load_weights(model_name)
model1.summary()
print('yes')

In [None]:
result1 = model1.predict([x_test_tab, x_test_text, x_test_img])
# print(result1)
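
A direct way to confirm that the weights were restored is to compare the reloaded model's predictions with those of the model that was saved. The helper below is a sketch with a hypothetical name. It works on nested lists or 2-D arrays, and it assumes the earlier predictions from the Par+Text+Img model are still available, for example in `result` if section 2-1 was the last one run.

```python
def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two 2-D sequences."""
    flat_a = [float(x) for row in a for x in row]
    flat_b = [float(x) for row in b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("prediction arrays differ in size")
    return max(abs(x - y) for x, y in zip(flat_a, flat_b))
```

A value close to zero from `max_abs_difference(result, result1)` suggests the saved weights were loaded correctly. A large value points to a mismatch between the rebuilt architecture and the weight file.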

In [None]:
figure(figsize=(4, 3), dpi=400)
plt.scatter(y_test, result1, s=3)
x = [0, 1]
y = [0, 1]
plt.plot(x, y, color="black")
plt.xlabel(var + ' ground truth')
plt.ylabel(var + ' prediction')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.tight_layout()
# plt.savefig("MML_total_score.png")
plt.show()

In [None]:
test = np.array(y_test).T
# print(test)
predict = np.array(result1).T

correlation_matrix = np.corrcoef(test, predict)
print(correlation_matrix)
correlation_xy = correlation_matrix[0,1]
r_squared = correlation_xy**2
##range: 0.8199158648859902

print (r_squared)