The code in this notebook combines all the extracted feature sets to allow for experimentation.

# Load Libraries

Import libraries used in this notebook.

In [1]:
import pickle

import pandas as pd

# Load Data Sets And Preprocess

The following block of code loads the two data sets, SWC and SQS, and keeps only the columns needed for our experiments.

In [2]:
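# Load the pickled data sets, closing each file after reading.
with open("../Data/DataSets/SWC/SWC.p", "rb") as f:
    SWC = pickle.load(f)
with open("../Data/DataSets/SQS/SQS.p", "rb") as f:
    SQS = pickle.load(f)

# Keep only the columns used downstream.
SWC = SWC[['sID', 'query', 'type', 'class']]
SQS = SQS[['sID', 'query', 'class']]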
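
Selecting columns by name fails with a fairly terse error if a pickle's layout has changed, so a small check beforehand can help. The sketch below is only an illustration: `select_columns` is a hypothetical helper that is not part of this notebook, and the toy frame stands in for the real SQS data.

```python
import pandas as pd


def select_columns(df, columns):
    """Return df restricted to `columns`; raise KeyError listing any that are missing.

    Hypothetical helper, not part of the original notebook.
    """
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[columns]


# Toy stand-in for the SQS frame (illustrative values only).
toy = pd.DataFrame({
    "sID": [1, 2],
    "query": ["VOIP phones", "Travel to the poconos"],
    "class": [0, 0],
    "extra": ["a", "b"],
})

# Keeps the three requested columns and drops "extra".
toy_selected = select_columns(toy, ["sID", "query", "class"])
```

The same call could replace the plain bracket selection above if an explicit error message about missing columns is preferred.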

In [3]:
SQS

Unnamed: 0,sID,query,class
0,39899,collagen vascular disease lifestyle,0
1,39900,france world cup 1998 reactions,0
2,39901,dooney bourke look alike purses,0
3,39902,VOIP phones,0
4,39903,Travel to the poconos,0
...,...,...,...
296,41399,Who plays the bad guy in Star Wars the Horde a...,1
297,41400,What is a fox's favorite kind of food?,1
298,41401,"Show me the movie called ""The Martian""",1
299,41402,What is the biggest rock found on Mars?,1


# Load Extracted Features 

The following block of code loads all feature sets, merges the text-based features into one dataframe, and then joins that dataframe with each data set and its search features.

In [4]:
# Search features, one pickle per data set.
with open("Pickles/SearchFeatSWC.p", "rb") as f:
    searchFeatSWC = pickle.load(f)
with open("Pickles/SearchFeatSQS.p", "rb") as f:
    searchFeatSQS = pickle.load(f)

# Text-based feature sets.
with open("Pickles/VocabFeat.p", "rb") as f:
    vocabFeat = pickle.load(f)
with open("Pickles/LexFeat.p", "rb") as f:
    lexFeat = pickle.load(f)
with open("Pickles/SynFeat.p", "rb") as f:
    synFeat = pickle.load(f)
with open("Pickles/SPFeat.p", "rb") as f:
    sPFeat = pickle.load(f)

textBasedFeat = sPFeat.merge(vocabFeat)
textBasedFeat = textBasedFeat.merge(lexFeat)
textBasedFeat = textBasedFeat.merge(synFeat)

SWCAll = SWC.merge(textBasedFeat, how='inner', on='query')
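# Keep rows of type 'Q' and average the numeric columns per sID;
# numeric_only=True avoids a TypeError on text columns such as 'query' in pandas 2.x.
SWCAll = SWCAll[SWCAll['type'] == 'Q'].groupby('sID').mean(numeric_only=True)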
SWCAll = SWCAll.join(searchFeatSWC)

SQSAll = SQS.merge(textBasedFeat, how='inner', on='query')
SQSAll = SQSAll.set_index('sID')
searchFeatSQS = searchFeatSQS.drop(columns = ['query','class'])
SQSAll = pd.merge(SQSAll, searchFeatSQS, left_index=True, right_index=True)
SQSAll = SQSAll.drop(columns = ['query'])
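
To make the SWC steps easier to follow, here is a minimal sketch of the same merge, filter, average, and join sequence on toy data. All values and the feature names `numWords` and `numClicks` are invented for illustration and are not taken from the real feature sets. The `validate` argument is an optional extra that is not used in the cell above; it asks pandas to raise an error if a query appears more than once in the feature table.

```python
import pandas as pd

# Toy stand-ins for SWC, textBasedFeat, and searchFeatSWC (invented values).
swc = pd.DataFrame({
    "sID": [1, 1, 1, 2],
    "query": ["a", "b", "c", "a"],
    "type": ["Q", "Q", "C", "Q"],
    "class": [0, 0, 0, 1],
})
text_feat = pd.DataFrame({
    "query": ["a", "b", "c"],
    "numWords": [2.0, 4.0, 10.0],  # hypothetical text-based feature
})
search_feat = pd.DataFrame(
    {"numClicks": [3, 5]},  # hypothetical search feature
    index=pd.Index([1, 2], name="sID"),
)

# Attach text-based features to each row by query text.
merged = swc.merge(text_feat, how="inner", on="query", validate="many_to_one")

# Keep rows of type 'Q' and average the numeric columns within each sID.
per_session = merged[merged["type"] == "Q"].groupby("sID").mean(numeric_only=True)

# Add the search features, matching on the sID index.
combined = per_session.join(search_feat)
```

In this toy example, sID 1 has two rows of type 'Q' with `numWords` 2.0 and 4.0, so its averaged value is 3.0; the row of type 'C' is excluded before averaging.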

# Save Aggregated Extracted Features

The following block of code saves the combined features for each data set to disk as pickle files.

In [5]:
with open("DataSets/SWCFeatures/SWCFeat.p", "wb") as f:
    pickle.dump(SWCAll, f)
with open("DataSets/SQSFeatures/SQSFeat.p", "wb") as f:
    pickle.dump(SQSAll, f)
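
After saving, it can be worth confirming that a pickle reads back unchanged. The following sketch does a round trip on a toy dataframe in a temporary directory; the file name `ToyFeat.p` and the column `numWords` are hypothetical, and the real feature files are not touched.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy feature frame indexed by sID (invented values).
features = pd.DataFrame(
    {"numWords": [3.0, 2.0]},
    index=pd.Index([1, 2], name="sID"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ToyFeat.p")  # hypothetical file name
    # Write the frame, then read it back from the same path.
    with open(path, "wb") as f:
        pickle.dump(features, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# True when values, dtypes, and index all match.
same = features.equals(restored)
```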