This notebook combines all the extracted feature sets into one table per data set so they can be used in the experiments.

# Load Libraries

In [1]:
import pickle
import pandas as pd

# Load Data Sets And Preprocess

The following block of code loads the two data sets, SWC and SQS, and keeps only the columns needed for our experiments.

In [2]:
# pd.read_pickle opens and closes the file for us, so no handle is left open.
SWC = pd.read_pickle("../Data/DataSets/SWC/SWC.p")
SQS = pd.read_pickle("../Data/DataSets/SQS/SQS.p")

SWC = SWC[['sID', 'query', 'type', 'class']]
SQS = SQS[['sID', 'query', 'class']]
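
The column selection above raises a bare `KeyError` if a pickle is missing one of the expected columns. As a minimal sketch, assuming we want a clearer message in that case, a small helper could check the columns first. The helper name `select_columns` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def select_columns(df, columns):
    """Return df restricted to `columns`, failing early if any are missing."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return df[columns]


# Toy stand-in for SWC; the values are illustrative, not real data.
toy_swc = pd.DataFrame({
    "sID": [1, 1, 2],
    "query": ["cats", "dogs", "fish"],
    "type": ["Q", "C", "Q"],
    "class": [0, 0, 1],
    "extra": ["a", "b", "c"],  # hypothetical column that is dropped
})

toy_selected = select_columns(toy_swc, ["sID", "query", "type", "class"])
```

The same call with `['sID', 'query', 'class']` would cover the SQS case.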

# Load Extracted Features 

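The following block of code loads every feature set, merges the text-based features into a single dataframe, and then combines that dataframe with each data set and its search features. For SWC, only rows of type 'Q' are kept and their features are averaged per sID before the search features are joined.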

In [3]:
# pd.read_pickle opens and closes each file for us, so no handles are left open.
searchFeatSWC = pd.read_pickle("Pickles/SearchFeatSWC.p")
searchFeatSQS = pd.read_pickle("Pickles/SearchFeatSQS.p")

vocabFeat = pd.read_pickle("Pickles/VocabFeat.p")
lexFeat = pd.read_pickle("Pickles/LexFeat.p")
synFeat = pd.read_pickle("Pickles/SynFeat.p")
sPFeat = pd.read_pickle("Pickles/SPFeat.p")

# With no `on` argument, merge joins on the columns the two frames share.
textBasedFeat = sPFeat.merge(vocabFeat)
textBasedFeat = textBasedFeat.merge(lexFeat)
textBasedFeat = textBasedFeat.merge(synFeat)

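SWCAll = SWC.merge(textBasedFeat, how='inner', on='query')
# Keep query rows only and average the numeric features per sID.
# numeric_only=True skips string columns such as 'query' and 'type',
# which recent pandas versions refuse to average.
SWCAll = SWCAll[SWCAll['type'] == 'Q'].groupby('sID').mean(numeric_only=True)
SWCAll = SWCAll.join(searchFeatSWC)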

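SQSAll = SQS.merge(textBasedFeat, how='inner', on='query')
SQSAll = SQSAll.set_index('sID')
# With no `on` argument, merge joins on the columns shared by both frames.
# A column-on-column merge ignores the index, so the result has a fresh integer index.
SQSAll = SQSAll.merge(searchFeatSQS)
SQSAll = SQSAll.drop(columns=['query'])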
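
To make the SWC steps above concrete, here is a minimal sketch on toy data: merge on `query`, keep the rows of type 'Q', average per `sID`, then join the search features on the index. The feature columns `numWords` and `numClicks` and all values are hypothetical stand-ins, and the sketch assumes the search features are indexed by `sID`, as the `join` call suggests.

```python
import pandas as pd

# Toy stand-ins; only sID, query, type and class come from the notebook.
toy_swc = pd.DataFrame({
    "sID": [1, 1, 1, 2],
    "query": ["cats", "dogs", "click", "fish"],
    "type": ["Q", "Q", "C", "Q"],
    "class": [0, 0, 0, 1],
})
toy_text_feat = pd.DataFrame({
    "query": ["cats", "dogs", "click", "fish"],
    "numWords": [1.0, 3.0, 9.0, 2.0],  # hypothetical text-based feature
})
toy_search_feat = pd.DataFrame(
    {"numClicks": [4, 7]},  # hypothetical search feature
    index=pd.Index([1, 2], name="sID"),
)

# 1. Attach the text-based features to every row of the data set.
merged = toy_swc.merge(toy_text_feat, how="inner", on="query")
# 2. Keep query rows only.
queries_only = merged[merged["type"] == "Q"]
# 3. Average the numeric columns per sID; string columns are skipped.
per_session = queries_only.groupby("sID").mean(numeric_only=True)
# 4. Add the search features by joining on the sID index.
toy_all = per_session.join(toy_search_feat)
```

In this toy case the row of type 'C' is excluded before averaging, so its `numWords` value does not affect the mean for `sID` 1.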

# Return Aggregated Extracted Features

The following block of code saves the combined features for each data set to disk as pickle files.

In [4]:
# The with-blocks close each file, so the data is flushed to disk.
with open("DataSets/SWCFeatures/SWCFeat.p", "wb") as f:
    pickle.dump(SWCAll, f)
with open("DataSets/SQSFeatures/SQSFeat.p", "wb") as f:
    pickle.dump(SQSAll, f)
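
Before running experiments it can be worth confirming that a saved file loads back unchanged. The following is a minimal sketch of such a round-trip check, assuming a toy dataframe and a temporary directory; the file name `ToyFeat.p` and the `numWords` column are hypothetical.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy stand-in for an aggregated feature table indexed by sID.
toy_features = pd.DataFrame(
    {"numWords": [2.0, 2.0]},
    index=pd.Index([1, 2], name="sID"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ToyFeat.p")  # hypothetical file name
    # Write the dataframe, closing the file before reading it back.
    with open(path, "wb") as handle:
        pickle.dump(toy_features, handle)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

# True when the restored frame has the same shape, labels and values.
round_trip_ok = restored.equals(toy_features)
```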