#  CA684 Machine Learning - Media Memorability Assignment

This task focuses on the problem of predicting how memorable a video is to viewers. It requires participants to automatically predict memorability scores for videos that reflect
the probability a video will be remembered. Models will be evaluated through standard evaluation
metrics used in ranking tasks (Spearman’s rank correlation).


## *TABLE OF CONTENT*



1. Function Definitions - Spearmans coefficient, read features (C3D , HMP)
2. Loading dev-set in colab
2. Data Preprocessing on Dev-set Data
3. Model Evaluation with video precomputed features
      *   Random Forest with C3D and HMP
      *   Linear Regression with C3D and HMP
      *   Neural Network with C3D and HMP
      *   Random Forest with C3D+HMP merged
4. Evaluating and comparing the features
5. Predicting the Memorability scores on Test-set
      *  Training entire 6000 Dev-set
      *  Predicting the Scores for the 2000 Test-set
6. Exporting the results to CSV file









**Import necessary libraries**


In [1]:
!pip install pyprind

import pandas as pd
from tensorflow.python.keras import Sequential
from tensorflow.python.keras import layers
from tensorflow.python.keras import regularizers
from tensorflow.python.keras.preprocessing.text import Tokenizer
import numpy as np
from string import punctuation
import pyprind
from collections import Counter
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

Collecting pyprind
  Downloading https://files.pythonhosted.org/packages/ab/b3/1f12ebc5009c65b607509393ad98240728b4401bc3593868fb161fdd3760/PyPrind-2.11.3-py2.py3-none-any.whl
Installing collected packages: pyprind
Successfully installed pyprind-2.11.3


In [31]:
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import VotingRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor

1. ***Function definations***: Functions for calculating Spearmans's correlation coefficient, read HMP features and C3D features

In [3]:
#Function to calculate Spearmans correlation coefficient
def Get_score(Y_pred,Y_true):
    '''Calculate the Spearmann"s correlation coefficient'''
    Y_pred = np.squeeze(Y_pred)
    Y_true = np.squeeze(Y_true)
    if Y_pred.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_pred.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_pred':Y_pred})
            score_mat = Res[['Y_true','Y_pred']].corr(method='spearman',min_periods=1)
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1][0])
        else:
            for ii in range(Y_pred.shape[1]):
                Get_score(Y_pred[:,ii],Y_true[:,ii])


In [None]:
#fucntion to read HMP features
def read_HMP(fname):
    """Scan HMP(Histogram of Motion Patterns) features from file"""
    with open(fname) as f:
        for line in f:
            pairs=line.split()
            HMP_temp = { int(p.split(':')[0]) : float(p.split(':')[1]) for p in pairs}
    # there are 6075 bins, fill zeros
    HMP = np.zeros(6075)
    for idx in HMP_temp.keys():
        HMP[idx-1] = HMP_temp[idx]            
    return HMP

In [None]:
#function to read InceptionV3 features
def read_inception(fname):
    """Scan Inception V3 features from file"""
    with open(fname) as f:
        for line in f:
            pairs=line.split()
            incept_temp = { int(p.split(':')[0]) : float(p.split(':')[1]) for p in pairs}
    
    incept = np.zeros(6075)
    for idx in incept_temp.keys():
        incept[idx-1] = incept_temp[idx]            
    return incept

In [None]:
#Function to read C3D features
def read_C3D(fname):
    """Scan vectors from file"""
    with open(fname) as f:
        for line in f:
            C3D =[float(item) for item in line.split()] # convert to float type, using default separator
    return C3D

2. ***Loading dev-set data in colab***: We need to connect information in the google drive to this colab session. This can be done by running the following lines of code and then copy the authorization code of your account. Paste the authorization code into the output shell. We must then load the features and the memorability scores.

In [4]:
from google.colab import drive
import os
drive.mount('/content/drive/')
os.chdir('/content/drive/My Drive/CA684_Assignment/')

Mounted at /content/drive/


In [None]:
# for reproducability
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(1)

In [5]:
label_path = '/content/drive/My Drive/CA684_Assignment/Dev-set/Ground-truth/'
labels=pd.read_csv(label_path+'ground-truth.csv')

Feat_path = '/content/drive/My Drive/CA684_Assignment/Dev-set/'


In [None]:
HMP_Features = pd.DataFrame(columns = ['video', 'arrayInfo'])

#path
dirHMP_Dev = './Dev-set/HMP/'

for eachFile in os.listdir(dirHMP_Dev):
    if eachFile.endswith(".txt"):
        path = os.path.join(dirHMP_Dev, eachFile)
        arrayFile = readHMP(path)
        eachFile= eachFile.replace(".txt",".webm")
        dfHMP_Dev = dfHMP_Dev.append({'video': eachFile, 'arrayInfo': arrayFile}, ignore_index=True)
    else:
        break

In [None]:
c3d_feature = []
names = []
for filename in glob.glob("/content/drive/MyDrive/CA684_Assignment/Dev-set/C3D/*"):
  c3d = read_C3D(filename)
  c3d_feature.append(c3d)
  names.append((filename.split('/')[-1]).split('.')[0])

In [None]:
file_pick='/content/drive/My Drive/'
HMP_Pick=HMP_Features.to_pickle(file_pick+"HMP_PICK")

In [None]:
C3D_PICK=Features.to_pickle("/content/drive/My Drive/C3D_PICK")


In [None]:
#Load Inception Features
df = pd.DataFrame(columns = ['video', 'arrayInfo'])
dir_hmp = '/content/drive/My Drive/CA684_Assignment/Dev-set/InceptionV3'

for filename in os.listdir(dir_hmp):
    if filename.endswith(".txt"):
        path = os.path.join(dir_hmp, filename)
        array = read_inception(path)
        if "-56" in filename:
          filename.replace('-56','')
          fileName= filename.replace(".txt",".webm")
          df = df.append({'video': fileName, 'arrayInfo': array}, ignore_index=True)
    else:
        break

In [None]:
count=0
for item in df['video']:
  df['video'][count]=item.replace('-56.webm','.webm')
  count = count + 1
  

In [None]:
INCEPTION_PICK=df.to_pickle("/content/drive/My Drive/INCEPTION_V3")

Since laoding features everytime the colab is re-connected, its easier to save the features as a pickle file and this can be read whenever required

In [6]:
HMP_Features = pd.read_pickle(r'/content/drive/My Drive/HMP_PICK')

In [7]:
C3D_Features = pd.read_pickle(r'/content/drive/My Drive/C3D_PICK')

3. ***Data -preprocessing and merging***

In [None]:
C3D_Features.shape


(6000, 2)

In [None]:
C3D_Features.head()

Unnamed: 0,video,C3D
0,video3.webm,"[0.02024942, 0.0015778, 0.00082625, 0.00094509..."
1,video4.webm,"[0.000118, 0.00089075, 0.00018769, 4.543e-05, ..."
2,video6.webm,"[0.01176522, 0.00074577, 0.00078353, 1.328e-05..."
3,video8.webm,"[0.00022343, 0.00016499, 7.35e-06, 1.615e-05, ..."
4,video10.webm,"[9.006e-05, 0.00061494, 0.00343634, 0.00128092..."


In [8]:
final_feature = labels.merge(C3D_Features,on=["video"],how="inner")
final_feature.head(3)

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations,C3D
0,video3.webm,0.924,34,0.846,13,"[0.02024942, 0.0015778, 0.00082625, 0.00094509..."
1,video4.webm,0.923,33,0.667,12,"[0.000118, 0.00089075, 0.00018769, 4.543e-05, ..."
2,video6.webm,0.863,33,0.7,10,"[0.01176522, 0.00074577, 0.00078353, 1.328e-05..."


In [None]:
len(final_feature)

6000

In [None]:
len(final_feature['C3D'])

6000

In [None]:
final_feature['C3D'].isna().sum()

0

In [9]:
final_feature.drop(['nb_short-term_annotations','nb_long-term_annotations'], axis=1,inplace=True)

In [10]:
final_feature.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability,C3D
0,video3.webm,0.924,0.846,"[0.02024942, 0.0015778, 0.00082625, 0.00094509..."
1,video4.webm,0.923,0.667,"[0.000118, 0.00089075, 0.00018769, 4.543e-05, ..."
2,video6.webm,0.863,0.7,"[0.01176522, 0.00074577, 0.00078353, 1.328e-05..."
3,video8.webm,0.922,0.818,"[0.00022343, 0.00016499, 7.35e-06, 1.615e-05, ..."
4,video10.webm,0.95,0.9,"[9.006e-05, 0.00061494, 0.00343634, 0.00128092..."


In [None]:
HMP_Features.shape

(6000, 2)

In [None]:
HMP_Features.head

<bound method NDFrame.head of                video                                                HMP
0        video3.webm  [0.125563, 0.024036, 0.000314, 0.0, 0.015864, ...
1        video4.webm  [0.007526, 0.001421, 6.8e-05, 0.0, 0.001184, 0...
2        video6.webm  [0.109584, 0.018978, 0.000289, 0.0, 0.008774, ...
3        video8.webm  [0.120431, 0.013561, 0.000277, 0.0, 0.018974, ...
4       video10.webm  [0.005026, 0.001356, 5.5e-05, 0.0, 0.000665, 2...
...              ...                                                ...
5995  video7488.webm  [0.003779, 0.001352, 7.7e-05, 0.0, 0.000475, 7...
5996  video7489.webm  [0.001396, 0.000417, 7e-06, 0.0, 0.000145, 4e-...
5997  video7491.webm  [0.023139, 0.007435, 0.000322, 0.0, 0.004319, ...
5998  video7492.webm  [0.0149, 0.004607, 9.9e-05, 0.0, 0.001559, 1.4...
5999  video7493.webm  [0.041592, 0.013047, 0.000448, 0.0, 0.010044, ...

[6000 rows x 2 columns]>

In [11]:
final_feature_hmp=labels.merge(HMP_Features,on=["video"],how="inner")
final_feature_hmp.head(3)

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations,HMP
0,video3.webm,0.924,34,0.846,13,"[0.125563, 0.024036, 0.000314, 0.0, 0.015864, ..."
1,video4.webm,0.923,33,0.667,12,"[0.007526, 0.001421, 6.8e-05, 0.0, 0.001184, 0..."
2,video6.webm,0.863,33,0.7,10,"[0.109584, 0.018978, 0.000289, 0.0, 0.008774, ..."


In [12]:
final_feature_hmp.drop(['nb_short-term_annotations','nb_long-term_annotations'],axis=1,inplace=True)


In [13]:
final_feature_hmp.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability,HMP
0,video3.webm,0.924,0.846,"[0.125563, 0.024036, 0.000314, 0.0, 0.015864, ..."
1,video4.webm,0.923,0.667,"[0.007526, 0.001421, 6.8e-05, 0.0, 0.001184, 0..."
2,video6.webm,0.863,0.7,"[0.109584, 0.018978, 0.000289, 0.0, 0.008774, ..."
3,video8.webm,0.922,0.818,"[0.120431, 0.013561, 0.000277, 0.0, 0.018974, ..."
4,video10.webm,0.95,0.9,"[0.005026, 0.001356, 5.5e-05, 0.0, 0.000665, 2..."


In [None]:
df_inception = df.merge(labels,on=["video"],how="inner")
df_inception.columns
df_inception.head()


In [None]:
result_array = np.empty((0, 6075))
result_array

In [None]:
for line in df_inception['arrayInfo']:
    result_array = np.append(result_array, np.array([line]), axis = 0)

In [None]:
arrayInfo = df_inception['arrayInfo'].values

In [None]:
X_arrHMP = result_array
print(type(X_arrHMP))


4. PART1:  ***Evaluation of data with model*** -C3D

a) ***Training C3D with RandomForest***

In [15]:

y = final_feature[['short-term_memorability','long-term_memorability']].values #these are our target columns
X_C3D=np.array(final_feature['C3D'].tolist())


In [None]:
X_C3D.shape

In [16]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X_C3D,y,test_size=0.2,random_state=42)

In [17]:
print('X_train ', X_train.shape)
print('X_test  ', X_test.shape)
print('Y_train ', y_train.shape)
print('Y_test  ', y_test.shape)


X_train  (4800, 101)
X_test   (1200, 101)
Y_train  (4800, 2)
Y_test   (1200, 2)


In [18]:
final_feature.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6000 entries, 0 to 5999
Data columns (total 4 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   video                    6000 non-null   object 
 1   short-term_memorability  6000 non-null   float64
 2   long-term_memorability   6000 non-null   float64
 3   C3D                      6000 non-null   object 
dtypes: float64(2), object(2)
memory usage: 234.4+ KB


In [None]:

from sklearn.ensemble import RandomForestRegressor
captions_rf = RandomForestRegressor(n_estimators=100,random_state=45)

In [None]:
captions_rf.fit(X_train,y_train)

In [None]:
Y_pred2 = captions_rf.predict(X_test)
Get_score(Y_pred2, y_test)

In [19]:
from sklearn.tree import DecisionTreeRegressor
regressor2_c3d = DecisionTreeRegressor()
regressor2_c3d.fit(X_train, y_train)

DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=None, splitter='best')

In [20]:
y_pred_c3d_DT = regressor2_c3d.predict(X_test)
Get_score(y_pred_c3d_DT, y_test)

The Spearman's correlation coefficient is: 0.070
The Spearman's correlation coefficient is: -0.017


b) ***Training C3D with Linear Regression***

In [None]:
linearRegressor = LinearRegression()
linearRegressor.fit(X_train, y_train)
y_pred = linearRegressor.predict(X_test)
Get_score(y_pred, y_test)

c) ***Training C3D with NeuralNetwork***

In [None]:
model = Sequential()
model.add(layers.Dense(500,activation='relu',kernel_regularizer=None,input_shape=(X_C3D.shape[1],)))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(500,activation='relu',kernel_regularizer=None))
model.add(layers.Dropout(0.1))
model.add(layers.Dense(500,activation='relu',kernel_regularizer=None))
model.add(layers.Dropout(0.1))
#model.add(layers.Dense(100,activation='relu',kernel_regularizer=regularizers.l2(0.001)))
#model.add(layers.Dropout(0.2))
model.add(layers.Dense(2,activation='sigmoid'))
model.summary()


model.compile(optimizer='rmsprop',loss=['mae'],metrics=['acc'])
history=model.fit(x=X_train,y=y_train,batch_size=50,epochs=40,validation_split=0.2,shuffle=True,verbose=False)

In [None]:
y_pred = model.predict(X_test)
Get_score(y_pred, y_test)

d) ***Training C3D with SVR***

In [None]:
X_svr = np.array(final_feature['C3D'].tolist())
svr_y_short = final_feature[['short-term_memorability']].values
svr_y_long = final_feature[['long-term_memorability']].values

In [None]:
short_X_train,short_X_test,short_y_train,short_y_test = train_test_split(X_svr,svr_y_short,test_size=0.2,random_state=40)
long_X_train,long_X_test,long_y_train,long_y_test = train_test_split(X_svr,svr_y_long,test_size=0.2,random_state=40)

In [None]:

from sklearn.preprocessing import StandardScaler
short_X = StandardScaler()
short_y = StandardScaler()
short_X_train = short_X.fit_transform(short_X_train)
short_y_train = short_y.fit_transform(short_y_train)
long_X = StandardScaler()
long_y = StandardScaler()
long_X_train = long_X.fit_transform(long_X_train)
long_y_train = long_y.fit_transform(long_y_train)

In [None]:
from sklearn.svm import SVR
short_regressor = SVR(kernel = 'rbf')
long_regressor = SVR(kernel = 'rbf')
short_regressor.fit(short_X_train, short_y_train)
long_regressor.fit(long_X_train,long_y_train)

  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)


SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [None]:
short_pred = short_regressor.predict(short_X_test)
short_pred = short_y.inverse_transform(short_pred)
long_pred = long_regressor.predict(long_X_test)
long_pred = long_y.inverse_transform(long_pred)

In [None]:
Get_score(short_pred, short_y_test)
Get_score(long_pred, long_y_test)

The Spearman's correlation coefficient is: 0.242
The Spearman's correlation coefficient is: 0.107


  PART2:  ***Evaluation of data with model*** -HMP

a) ***HMP with Random Forest***

In [None]:

y_hmp = final_feature_hmp[['short-term_memorability','long-term_memorability']].values #these are our target columns
X_hmp=np.array(final_feature_hmp['HMP'].tolist())

In [None]:
from sklearn.model_selection import train_test_split
X_train_hmp,X_test_hmp,y_train_hmp,y_test_hmp = train_test_split(X_hmp,y_hmp,test_size=0.2,random_state=42)

In [None]:
X_hmp.shape

In [None]:
hmp_rf = RandomForestRegressor(n_estimators=100,random_state=45)
hmp_rf.fit(X_train_hmp,y_train_hmp)



In [None]:
Y_pred_hmp = hmp_rf.predict(X_test_hmp)
Get_score(Y_pred_hmp, y_test)

b) ***HMP with Linear Regression***

In [None]:
linearRegressor = LinearRegression()
linearRegressor.fit(X_train_hmp, y_train_hmp)

In [None]:

y_pred_hmp = linearRegressor.predict(X_test_hmp)
Get_score(y_pred, y_test_hmp)

c) ***HMP with Neural Network***

In [None]:
model = Sequential()
model.add(layers.Dense(500,activation='relu',kernel_regularizer=None,input_shape=(X_hmp.shape[1],)))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(500,activation='relu',kernel_regularizer=None))
model.add(layers.Dropout(0.1))
model.add(layers.Dense(500,activation='relu',kernel_regularizer=None))
model.add(layers.Dropout(0.1))
#model.add(layers.Dense(100,activation='relu',kernel_regularizer=regularizers.l2(0.001)))
#model.add(layers.Dropout(0.2))
model.add(layers.Dense(2,activation='sigmoid'))
model.summary()


model.compile(optimizer='rmsprop',loss=['mae'],metrics=['acc'])
history=model.fit(x=X_train_hmp,y=y_train_hmp,batch_size=50,epochs=40,validation_split=0.2,shuffle=True,verbose=False)

In [None]:
y_pred = model.predict(X_test_hmp)
Get_score(y_pred, y_test_hmp)

In [None]:
from sklearn.tree import DecisionTreeRegressor
regressor2_hmp = DecisionTreeRegressor()
regressor2_hmp.fit(X_train_hmp, y_train_hmp)

In [None]:
y_pred_hmp_DT = regressor2_hmp.predict(X_test_hmp)
Get_score(y_pred_hmp_DT, y_test_hmp)

PART2: ***Evaluation of data with model Merge C3D and HMP***

In [21]:
final_feature_hmp_C3D=final_feature.merge(HMP_Features,on=["video"],how="inner")
final_feature_hmp_C3D.head(3)

Unnamed: 0,video,short-term_memorability,long-term_memorability,C3D,HMP
0,video3.webm,0.924,0.846,"[0.02024942, 0.0015778, 0.00082625, 0.00094509...","[0.125563, 0.024036, 0.000314, 0.0, 0.015864, ..."
1,video4.webm,0.923,0.667,"[0.000118, 0.00089075, 0.00018769, 4.543e-05, ...","[0.007526, 0.001421, 6.8e-05, 0.0, 0.001184, 0..."
2,video6.webm,0.863,0.7,"[0.01176522, 0.00074577, 0.00078353, 1.328e-05...","[0.109584, 0.018978, 0.000289, 0.0, 0.008774, ..."


In [22]:
X_hmp_C3D1=np.array(final_feature_hmp_C3D['HMP'].tolist())

In [23]:
X_hmp_C3D2=np.array(final_feature_hmp_C3D['C3D'].tolist())

In [24]:
X_hmp_C3D_final=np.concatenate((X_hmp_C3D2,X_hmp_C3D1),axis=1)

In [25]:
X_hmp_C3D_final

array([[2.0249420e-02, 1.5778000e-03, 8.2625000e-04, ..., 8.6000000e-05,
        5.8000000e-04, 0.0000000e+00],
       [1.1800000e-04, 8.9075000e-04, 1.8769000e-04, ..., 2.2000000e-04,
        7.6200000e-04, 1.2240000e-03],
       [1.1765220e-02, 7.4577000e-04, 7.8353000e-04, ..., 5.2000000e-05,
        2.5800000e-04, 2.1500000e-04],
       ...,
       [2.5890000e-05, 1.2192000e-04, 2.7810000e-05, ..., 7.5600000e-04,
        7.3800000e-04, 2.1400000e-04],
       [2.6509121e-01, 9.6539180e-02, 5.9710000e-05, ..., 6.4000000e-05,
        6.4000000e-05, 1.1000000e-05],
       [2.0589490e-02, 1.2214100e-03, 2.0660700e-03, ..., 2.8900000e-04,
        9.8800000e-04, 1.6100000e-04]])

In [26]:
y_hmp_C3D = final_feature_hmp_C3D[['short-term_memorability','long-term_memorability']].values #these are our target columns


In [27]:
X_train_hmp_C3D,X_test_hmp_C3D,y_train_hmp_C3D,y_test_hmp_C3D = train_test_split(X_hmp_C3D_final,y_hmp_C3D,test_size=0.2,random_state=42)

In [28]:
X_train_hmp_C3D.shape ,X_test_hmp_C3D.shape, y_train_hmp_C3D.shape, y_test_hmp_C3D.shape

((4800, 6176), (1200, 6176), (4800, 2), (1200, 2))

a) ***Training C3D+HMP with DecisionTree***

In [29]:

from sklearn.tree import DecisionTreeRegressor
regressor2_hmp_c3d = DecisionTreeRegressor()
regressor2_hmp_c3d.fit(X_train_hmp_C3D, y_train_hmp_C3D)


DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=None, splitter='best')

In [None]:
y_pred_hmp_c3d_DT = regressor2_hmp_c3d.predict(X_test_hmp_C3D)
Get_score(y_pred_hmp_c3d_DT, y_test_hmp_C3D)

b) ***Training C3D+HMP with RandomForest***

In [None]:
from sklearn.ensemble import RandomForestRegressor
hmp_c3d_rf = RandomForestRegressor(n_estimators=100,random_state=45)

In [None]:

hmp_c3d_rf.fit(X_train_hmp_C3D,y_train_hmp_C3D)

In [None]:
y_pred_hmp_c3d_RF = hmp_c3d_rf.predict(X_test_hmp_C3D)
Get_score(y_pred_hmp_c3d_RF, y_test_hmp_C3D)

c) ***Model training of C3D+HMP with SVR***

In [None]:
svr_X = X_hmp_C3D_final
svr_y_short = final_feature[['short-term_memorability']].values
svr_y_long = final_feature[['long-term_memorability']].values

In [None]:
#Splitting the dataset into the Training set and Test set
short_X_train,short_X_test,short_y_train,short_y_test = train_test_split(svr_X,svr_y_short,test_size=0.2,random_state=40)
long_X_train,long_X_test,long_y_train,long_y_test = train_test_split(svr_X,svr_y_long,test_size=0.2,random_state=40)

In [None]:
short_X = StandardScaler()
short_y = StandardScaler()
short_X_train = short_X.fit_transform(short_X_train)
short_y_train = short_y.fit_transform(short_y_train)
long_X = StandardScaler()
long_y = StandardScaler()
long_X_train = long_X.fit_transform(long_X_train)
long_y_train = long_y.fit_transform(long_y_train)

In [None]:
from sklearn.svm import SVR
short_regressor = SVR(kernel = 'rbf')
long_regressor = SVR(kernel = 'rbf')
short_regressor.fit(short_X_train, short_y_train)
long_regressor.fit(long_X_train,long_y_train)

  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)


SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [None]:
short_pred = short_regressor.predict(short_X_test)
short_pred = short_y.inverse_transform(short_pred)
long_pred = long_regressor.predict(long_X_test)
long_pred = long_y.inverse_transform(long_pred)

In [None]:
Get_score(short_pred, short_y_test)
Get_score(long_pred, long_y_test)

The Spearman's correlation coefficient is: 0.206
The Spearman's correlation coefficient is: 0.117


d) ***Model training with Linear Regression***

In [33]:
linearRegressor = LinearRegression()
linearRegressor.fit(X_train_hmp_C3D, y_train_hmp_C3D)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [34]:

y_pred_hmp_c3d = linearRegressor.predict(X_test_hmp_C3D)
Get_score(y_pred_hmp_c3d, y_test_hmp_C3D)

The Spearman's correlation coefficient is: -0.011
The Spearman's correlation coefficient is: 0.012


***PART3: Evaluation of data with model -InceptionV3*** **bold text**

In [None]:
Y=df_inception[['short-term_memorability','long-term_memorability']].values  #targets
X=X_arrHMP #input

In [None]:
X_train_inception, X_test_inception, Y_train_inception, Y_test_inception = train_test_split(X,Y, test_size=0.2, random_state=42) # random state for reproducability

In [None]:
from sklearn.ensemble import RandomForestRegressor
inception_rf = RandomForestRegressor(n_estimators=500,random_state=45)

In [None]:
inception_rf.fit(X_train_inception,Y_train_inception)

In [None]:
inception_pred=inception_rf.predict(X_test_inception)
Get_score(inception_pred,Y_test_inception)

#Predicting Results

We have to train our selected model on 6000 Dev-set and use this to make predictions on Test-Set

We will import our submission csv file and predict the short term and long term memorability score

In [None]:
X_C3D_HMP=X_hmp_C3D_final
Y_C3D_HMP=final_feature_hmp_C3D[['short-term_memorability','long-term_memorability']].values 

In [None]:
from sklearn.ensemble import RandomForestRegressor
hmp_c3d_rf_train = RandomForestRegressor(n_estimators=100,random_state=45)

In [None]:
hmp_c3d_rf.fit(X_C3D_HMP,Y_C3D_HMP)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=45, verbose=0, warm_start=False)

In [None]:
#creating a new dataframe
hmp_test = pd.DataFrame(columns = ['video', 'arrayInfo'])

#path
dirHMP_Dev = './Test-set/HMP_test/'

for eachFile in os.listdir(dirHMP_Dev):
    if eachFile.endswith(".txt"):
        path = os.path.join(dirHMP_Dev, eachFile)
        arrayFile = read_HMP(path)
        eachFile= eachFile.replace(" ",".webm")
        hmp_test = hmp_test.append({'video':eachFile, 'arrayInfo': arrayFile}, ignore_index=True)
    else:
        break

In [None]:
hmp_test

Unnamed: 0,video,arrayInfo
0,8763,"[0.00206, 0.000583, 1.1e-05, 0.0, 0.000395, 3...."
1,8758,"[0.015623, 0.005571, 0.000267, 0.0, 0.003743, ..."
2,8768,"[0.00533, 0.001166, 2e-06, 0.0, 0.000927, 1.1e..."
3,8764,"[0.040748, 0.016237, 0.000303, 0.0, 0.007668, ..."
4,8760,"[0.056045, 0.012873, 0.000278, 0.0, 0.006645, ..."
...,...,...
1995,8750,"[0.029441, 0.00459, 6.1e-05, 0.0, 0.004635, 4...."
1996,8754,"[0.008008, 0.002233, 6.6e-05, 0.0, 0.000694, 8..."
1997,8756,"[0.045778, 0.008935, 0.000138, 0.0, 0.006778, ..."
1998,8755,"[0.009722, 0.001544, 9e-05, 0.0, 0.000899, 7e-..."


In [None]:
hmp_test['video']=hmp_test['video'].str.replace('.txt','')

In [None]:
hmp_test['video']=hmp_test['video'].str.replace('video','')

In [None]:

c3d_test=pd.read_pickle(r'/content/drive/My Drive/c3d')

In [None]:

c3d_test['video']=c3d_test['video'].str.replace('video','')

In [None]:
c3d_test

Unnamed: 0,video,c3d
0,8768,"[0.00352624, 0.00137636, 0.04618705, 7.76e-06,..."
1,8759,"[0.00777159, 0.00230215, 0.00367146, 4.773e-05..."
2,8762,"[0.00157295, 0.00139808, 0.07172299, 2.488e-05..."
3,8765,"[0.00035768, 0.00397286, 0.0088033, 0.00774053..."
4,8758,"[1.11e-06, 1.151e-05, 2.465e-05, 1.226e-05, 5...."
...,...,...
1995,8752,"[4.11e-06, 1.25e-05, 0.00177378, 2.24e-06, 3.9..."
1996,8749,"[0.00024979, 0.00022802, 0.00010895, 0.0042621..."
1997,8755,"[0.0002111, 0.00581559, 0.0002334, 0.00377348,..."
1998,8756,"[0.00214498, 0.03588087, 0.00023955, 0.0, 1.72..."


In [None]:
c3d_test=c3d_test.rename(columns = {'Name': 'video'}, inplace = False)

In [None]:
final_feature_hmp_C3D_test=hmp_test.merge(c3d_test,on=["video"],how="inner")

final_feature_hmp_C3D_test = final_feature_hmp_C3D_test.rename(columns = {'arrayInfo': 'HMP'}, inplace = False)

In [None]:
final_feature_hmp_C3D_test

Unnamed: 0,video,HMP,c3d
0,8763,"[0.00206, 0.000583, 1.1e-05, 0.0, 0.000395, 3....","[0.00017474, 0.00179736, 0.00043181, 0.0007523..."
1,8758,"[0.015623, 0.005571, 0.000267, 0.0, 0.003743, ...","[1.11e-06, 1.151e-05, 2.465e-05, 1.226e-05, 5...."
2,8768,"[0.00533, 0.001166, 2e-06, 0.0, 0.000927, 1.1e...","[0.00352624, 0.00137636, 0.04618705, 7.76e-06,..."
3,8764,"[0.040748, 0.016237, 0.000303, 0.0, 0.007668, ...","[0.00133935, 0.0002712, 0.00039588, 8.307e-05,..."
4,8760,"[0.056045, 0.012873, 0.000278, 0.0, 0.006645, ...","[0.0, 0.0, 0.0, 0.0, 0.99999964, 0.0, 0.0, 0.0..."
...,...,...,...
1995,8750,"[0.029441, 0.00459, 6.1e-05, 0.0, 0.004635, 4....","[0.00177784, 0.00017495, 4.677e-05, 0.26706579..."
1996,8754,"[0.008008, 0.002233, 6.6e-05, 0.0, 0.000694, 8...","[0.91409236, 0.00409621, 3.43e-06, 7e-07, 3.9e..."
1997,8756,"[0.045778, 0.008935, 0.000138, 0.0, 0.006778, ...","[0.00214498, 0.03588087, 0.00023955, 0.0, 1.72..."
1998,8755,"[0.009722, 0.001544, 9e-05, 0.0, 0.000899, 7e-...","[0.0002111, 0.00581559, 0.0002334, 0.00377348,..."


In [None]:
test_set_path = '/content/drive/My Drive/CA684_Assignment/Test-set/Ground-truth_test/'
final_sub=pd.read_csv(test_set_path+'ground_truth_template.csv')


In [None]:
final_sub

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,7494,,33,,12
1,7495,,34,,10
2,7496,,32,,13
3,7497,,33,,10
4,7498,,33,,10
...,...,...,...,...,...
1995,10004,,34,,17
1996,10005,,34,,9
1997,10006,,34,,12
1998,10007,,34,,12


In [None]:
final_feature_hmp_C3D_test

Unnamed: 0,video,HMP,c3d
0,8763,"[0.00206, 0.000583, 1.1e-05, 0.0, 0.000395, 3....","[0.00017474, 0.00179736, 0.00043181, 0.0007523..."
1,8758,"[0.015623, 0.005571, 0.000267, 0.0, 0.003743, ...","[1.11e-06, 1.151e-05, 2.465e-05, 1.226e-05, 5...."
2,8768,"[0.00533, 0.001166, 2e-06, 0.0, 0.000927, 1.1e...","[0.00352624, 0.00137636, 0.04618705, 7.76e-06,..."
3,8764,"[0.040748, 0.016237, 0.000303, 0.0, 0.007668, ...","[0.00133935, 0.0002712, 0.00039588, 8.307e-05,..."
4,8760,"[0.056045, 0.012873, 0.000278, 0.0, 0.006645, ...","[0.0, 0.0, 0.0, 0.0, 0.99999964, 0.0, 0.0, 0.0..."
...,...,...,...
1995,8750,"[0.029441, 0.00459, 6.1e-05, 0.0, 0.004635, 4....","[0.00177784, 0.00017495, 4.677e-05, 0.26706579..."
1996,8754,"[0.008008, 0.002233, 6.6e-05, 0.0, 0.000694, 8...","[0.91409236, 0.00409621, 3.43e-06, 7e-07, 3.9e..."
1997,8756,"[0.045778, 0.008935, 0.000138, 0.0, 0.006778, ...","[0.00214498, 0.03588087, 0.00023955, 0.0, 1.72..."
1998,8755,"[0.009722, 0.001544, 9e-05, 0.0, 0.000899, 7e-...","[0.0002111, 0.00581559, 0.0002334, 0.00377348,..."


In [None]:

X_hmp_C3D1_test=np.array(final_feature_hmp_C3D_test['HMP'].tolist())

In [None]:
X_hmp_C3D2_test=np.array(final_feature_hmp_C3D_test['c3d'].tolist())

In [None]:
X_hmp_C3D_fina_test=np.concatenate((X_hmp_C3D2_test,X_hmp_C3D1_test),axis=1)

In [None]:
final_sub['short-term_memorability']=hmp_c3d_rf.predict(X_hmp_C3D_fina_test)


In [None]:
final_sub['long-term_memorability']=hmp_c3d_rf.predict(X_hmp_C3D_fina_test)

In [None]:
final_sub

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,7494,0.826842,33,0.826842,12
1,7495,0.823570,34,0.823570,10
2,7496,0.834750,32,0.834750,13
3,7497,0.864410,33,0.864410,10
4,7498,0.889707,33,0.889707,10
...,...,...,...,...,...
1995,10004,0.877600,34,0.877600,17
1996,10005,0.893042,34,0.893042,9
1997,10006,0.885890,34,0.885890,12
1998,10007,0.874490,34,0.874490,12


In [None]:
final_sub.to_csv(r'/content/drive/My Drive/SubmissionML.csv',index=False,header=True)