## Description
This code is only for the Amazon application. The Wikipedia application requires a 1GB file to run and was thus not included.

The __"amazonFunctions.py"__ file contains all the important function definitions (the algorithms and their helper routines). The __"amazonMain.py"__ script is the one to look at: it runs the code of this notebook, and in it you can edit the hyperparameter settings and generate the graphs.

In [1]:
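from amazonFunctions import *  # import all the functions
import time                       # used to time each training/testing step
import matplotlib.lines as mlines # used for the legend handles in the plots
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline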

Using TensorFlow backend.


# Parameter Setting
We first set the parameters, including which of the two existing datasets to load. The parameters are:
  <li>__"path"__ is the path to the file we read from. Here we load the "videoGames.csv" file. </li>
  <li>__"m"__ is the minimum number of reviews for a product to be considered.</li>
  <li>__"percTrain"__ is the fraction of data used for training. The default, percTrain = 0.01, gives the graphs in 2e and 2f and takes about 1 minute per trial. Setting percTrain = 0.8 gives the graphs in 2b and 2c, but takes about 20-30 minutes per trial.</li>
  <li>__"given"__ is the number of products given to each algorithm at the start.</li>
  <li>__"k"__ is the number of products to be predicted.</li>
  <li>__"thresh"__ is the edge threshold for Adaptive Sequence Greedy: only edges with a value of at least thresh are considered. This only makes the code faster, and it can be set to 0.</li>
  <li>__"inputSteps"__ is the number of input steps to the LSTM. It is equal to the number of given items.</li>
  <li>__"numTrials"__ is the number of trials to run for the experiments (each trial uses a new split into training and testing data).</li>
  <li>__"numNodes"__ is the number of nodes in the hidden layer of the feed-forward network.</li>
  <li>__"numLSTM"__ is the number of LSTM nodes in the LSTM hidden layer.</li>
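To illustrate how __percTrain__ and __numTrials__ interact, here is a minimal sketch of a random train/test split over users. It uses a hypothetical helper, `split_users`, which is not the `trainTestSplit` function from amazonFunctions.py: it assumes the split is made over users and does not model whatever `trainTestSplit` does with `productMap`. Using a different seed in each trial mimics the "new split per trial" behavior described above.

```python
import random


def split_users(users, perc_train, seed=None):
    """Hypothetical sketch: randomly assign each user to train or test.

    Assumes `users` is a dict keyed by user ID; about `perc_train` of the
    users end up in the training dictionary, the rest in the test one.
    """
    rng = random.Random(seed)
    user_ids = list(users)
    rng.shuffle(user_ids)  # a fresh random order for every seed
    n_train = int(round(perc_train * len(user_ids)))
    train = {u: users[u] for u in user_ids[:n_train]}
    test = {u: users[u] for u in user_ids[n_train:]}
    return train, test


# Toy example: 100 users, 1% used for training (cf. percTrain = 0.01).
toy_users = {f"user{i}": ["p1", "p2"] for i in range(100)}
train, test = split_users(toy_users, 0.01, seed=0)
print(len(train), len(test))  # 1 99
```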

In [2]:
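path = 'videoGames.csv'  # dataset file to load
m = 50                   # minimum number of reviews for a product to be kept
percTrain = 0.01         # fraction of the data used for training
given = 4                # number of products given to each algorithm at the start
k = 6                    # number of products to predict
thresh = 0.05            # edge threshold for Adaptive Sequence Greedy (0 disables it)
inputSteps = given       # LSTM input steps, equal to the number of given items
numTrials = 5            # number of trials, each with a new train/test split
numNodes = 256           # nodes in the feed-forward network hidden layer
numLSTM = 8              # LSTM nodes in the LSTM hidden layer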

Next we import the data and parse the sequence for each user.

In [3]:
arr = importData(path) #Import the data
users = userDic(arr) #Parse out the sequence for each user.

Importing data...
Building userDic...
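The per-user parsing step can be pictured with the following sketch, which groups reviews by user and orders each user's products by time. The helper name `build_user_sequences` and the `(user_id, product_id, timestamp)` row layout are assumptions made for illustration; the actual format returned by `importData` and the logic inside `userDic` may differ.

```python
from collections import defaultdict


def build_user_sequences(rows):
    """Hypothetical sketch of per-user sequence parsing.

    Assumes each row is a (user_id, product_id, timestamp) tuple; the real
    column layout of videoGames.csv and of importData's output may differ.
    """
    by_user = defaultdict(list)
    for user_id, product_id, timestamp in rows:
        by_user[user_id].append((timestamp, product_id))
    # Sort each user's reviews by time and keep only the product IDs.
    return {u: [p for _, p in sorted(events)] for u, events in by_user.items()}


rows = [
    ("alice", "B002", 20),
    ("bob", "B001", 5),
    ("alice", "B001", 10),
    ("alice", "B003", 30),
]
print(build_user_sequences(rows))
# {'alice': ['B001', 'B002', 'B003'], 'bob': ['B001']}
```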


__productMap__ is a dictionary that maps each remaining product (after removing all products with fewer than m reviews) to a unique integer ID, and __idMap__ is an array that maps each integer ID back to its product ID.
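A minimal sketch of how such a pair of mappings can be built, assuming each occurrence of a product in a user's sequence counts as one review. The helper `build_product_maps` is hypothetical and is not `removeProducts`; in particular, it does not show whether `removeProducts` also drops the filtered products from the user sequences. The toy variable names are chosen so they do not overwrite the notebook's `users`, `productMap`, or `idMap`.

```python
from collections import Counter


def build_product_maps(user_sequences, m):
    """Hypothetical sketch: keep products with at least m reviews and
    give each one a unique integer ID.

    Returns (product_map, id_map) where product_map[product] = integer ID
    and id_map[integer ID] = product, so the two are inverses.
    """
    counts = Counter(p for seq in user_sequences.values() for p in seq)
    # sorted() makes the ID assignment deterministic in this sketch.
    id_map = sorted(p for p, c in counts.items() if c >= m)
    product_map = {p: i for i, p in enumerate(id_map)}
    return product_map, id_map


toy_sequences = {"u1": ["A", "B"], "u2": ["A", "C"], "u3": ["A", "B"]}
toy_product_map, toy_id_map = build_product_maps(toy_sequences, m=2)
print(toy_product_map)  # {'A': 0, 'B': 1}
print(toy_id_map)       # ['A', 'B']
```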

In [4]:
productMap,idMap = removeProducts(users,m) 

A few statistics about the dataset:

In [5]:
print ("Number of products:",len(productMap))
print ("Number of users:",len(users))

Number of products: 958
Number of users: 24303


# Run Algorithms

We create arrays to hold the results of the algorithms. Each array has shape (2, k): row 0 stores the accuracy score and row 1 the sequence score, for 1 to k recommendations. The arrays are:
    <li>__ASG__: Adaptive Sequence-Greedy</li> 
    <li>__SG__: Sequence-Greedy</li> 
    <li>__nn__: Non-Adaptive Feed Forward NN</li> 
    <li>__nnAdaptive__: Adaptive Feed Forward NN </li> 
    <li>__lstm__: Non-Adaptive LSTM</li> 
    <li>__lstmAdaptive__: Adaptive LSTM </li> 
    <li>__freq__: Frequency</li> 

In [6]:
ASG = np.zeros((2,k))
SG = np.zeros((2,k))
nn = np.zeros((2,k)) 
nnAdaptive = np.zeros((2,k))
lstm = np.zeros((2,k))
lstmAdaptive = np.zeros((2,k))
freq = np.zeros((2,k))

We run all the algorithms __numTrials__ times, each time on a new train/test split, and store their results averaged over the trials.
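Because each trial adds its result divided by __numTrials__ (as in `ASG += adaptive(...)/numTrials`), each array ends up holding the mean over the trials. The sketch below checks that accumulation pattern against `np.mean` on random placeholder scores with the same (2, k) layout; the numbers are not actual results, and the `toy_` names avoid overwriting the notebook's `k` and `numTrials`.

```python
import numpy as np

toy_k, toy_num_trials = 6, 5
rng = np.random.default_rng(0)
# Random placeholder per-trial scores: row 0 = accuracy, row 1 = sequence score.
per_trial = rng.random((toy_num_trials, 2, toy_k))

acc = np.zeros((2, toy_k))
for trial_scores in per_trial:
    acc += trial_scores / toy_num_trials  # same pattern as the trial loop

print(np.allclose(acc, per_trial.mean(axis=0)))  # True
```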

In [None]:
for i in range(0,numTrials):
    print ('------\n','Trial Number',i+1)
    trainDic,testDic = trainTestSplit(users,percTrain,productMap)
    # Format training data for the feed-forward NN.
    trainX,trainY = formatNeuralNetworkData(users,productMap,trainDic)
    # Format training data for the LSTM.
    trainXlstm,trainYlstm = formatLSTMData(users,productMap,trainDic,inputSteps) 

    start = time.time()
    #Train the graph for Adaptive Sequence Greedy (and sequence-greedy).
    p1,p2 = productDic(users,trainDic,thresh) 
    print('Train graph:',time.time()-start,'seconds')

    start = time.time()
    model = trainNetwork(trainX,trainY,numNodes)
    print ('Train feed-forward network:',time.time()-start,'seconds')

    start = time.time()
    nn += testNetwork(users,productMap,idMap,testDic,model,given,k)/numTrials
    print ('Test feed-forward network:',time.time()-start,'seconds')

    start = time.time()
    nnAdaptive += testNetworkAdaptive(users,productMap,idMap,testDic,model,given,k)/numTrials
    print ('Test adaptive feed-forward network:',time.time()-start,'seconds')

    start = time.time()
    model = trainLSTM(trainXlstm,trainYlstm,numLSTM)
    print ('Train LSTM:',time.time()-start,'seconds')

    start = time.time()
    lstm += testLSTM(users,productMap,idMap,testDic,model,given,k,inputSteps)/numTrials
    print ('Test LSTM:',time.time()-start,'seconds')

    start = time.time()
    lstmAdaptive += testLSTMAdaptive(users,productMap,idMap,testDic,model,given,k,inputSteps)/numTrials
    print ('Test adaptive LSTM:',time.time()-start,'seconds')

    start = time.time()
    ASG += adaptive(p1,p2,users,testDic,given,k)/numTrials
    print ('Test Adaptive Sequence Greedy:',time.time()-start,'seconds')

    start = time.time()
    SG += nonadaptive(p1,p2,users,testDic,given,k)/numTrials
    print ('Test Sequence Greedy:',time.time()-start,'seconds')

    start = time.time()
    freq += nonsequence(p1,p2,users,testDic,given,k)/numTrials
    print ('Test Frequency:',time.time()-start,'seconds')


------
 Trial Number 1
Train graph: 0.010685920715332031 seconds
Train feed-forward network: 1.1515398025512695 seconds
Test feed-forward network: 2.1560418605804443 seconds
Test adaptive feed-forward network: 4.588396072387695 seconds
Train LSTM: 2.26625394821167 seconds
Test LSTM: 3.0385122299194336 seconds
Test adaptive LSTM: 4.587187051773071 seconds
Test Adaptive Sequence Greedy: 2.551939010620117 seconds
Test Sequence Greedy: 2.2159528732299805 seconds
Test Frequency: 1.4075238704681396 seconds
------
 Trial Number 2
Train graph: 0.006273746490478516 seconds
Train feed-forward network: 1.2156388759613037 seconds
Test feed-forward network: 2.072031021118164 seconds


# Plot

## Accuracy vs. Number of Recommendations

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111)
xList = range(1,k+1)
plt.xlabel('Number of Recommendations',size = '25')
plt.ylabel('Accuracy Score',size='25')
plt.xticks(xList,size='22')
plt.yticks(size='22')
green_line = mlines.Line2D([], [], color='g', marker='s', linestyle='-',markersize=5, 
                           label='Adaptive Sequence-Greedy')
blue_line = mlines.Line2D([], [], color='b', marker='s', linestyle=':',markersize=5, 
                          label='Sequence-Greedy')
red_line = mlines.Line2D([], [], color='r', marker='>', linestyle=':',markersize=5, 
                         label='Frequency')
teal_line = mlines.Line2D([], [], color='c', marker='o', linestyle='-',markersize=5, 
                          label='Adaptive Feed Forward NN')
teal_dotted_line = mlines.Line2D([], [], color='c', marker='o', linestyle=':',markersize=5, 
                                 label='Non-Adaptive Feed Forward NN')
purple_line = mlines.Line2D([], [], color='m', marker='P', linestyle='-',markersize=5, 
                            label='Adaptive LSTM')
purple_dotted_line = mlines.Line2D([], [], color='m', marker='P', linestyle=':',markersize=5, 
                                   label='Non-Adaptive LSTM')
plt.legend(fontsize = 'large',handles=[green_line,blue_line,red_line,teal_line,teal_dotted_line,purple_line,
                                       purple_dotted_line], 
           bbox_to_anchor=(1.05, 1), ncol=1)
plt.plot(xList, nnAdaptive[0], '-co',linewidth=2.0,markersize=5.0)
plt.plot(xList, nn[0], ':co',linewidth=2.0,markersize=5.0)
plt.plot(xList, lstmAdaptive[0], '-mP',linewidth=2.0,markersize=5.0)
plt.plot(xList, lstm[0], ':mP',linewidth=2.0,markersize=5.0)
plt.plot(xList, ASG[0], '-gs',linewidth=2.0,markersize=5.0)
plt.plot(xList, SG[0], ':bs',linewidth=2.0,markersize=5.0)
plt.plot(xList, freq[0], ':r>',linewidth=2.0,markersize=5.0)
plt.show()

## Sequence Score vs. Number of Recommendations

In [None]:
plt.xlabel('Number of Recommendations',size = '25')
plt.ylabel('Sequence Score',size='25')
plt.xticks(xList,size='22')
plt.yticks(size='22')
green_line = mlines.Line2D([], [], color='g', marker='s', linestyle='-',markersize=5, 
                           label='Adaptive Sequence-Greedy')
blue_line = mlines.Line2D([], [], color='b', marker='s', linestyle=':',markersize=5, 
                          label='Sequence-Greedy')
red_line = mlines.Line2D([], [], color='r', marker='>', linestyle=':',markersize=5, 
                         label='Frequency')
teal_line = mlines.Line2D([], [], color='c', marker='o', linestyle='-',markersize=5, 
                          label='Adaptive Feed Forward NN')
teal_dotted_line = mlines.Line2D([], [], color='c', marker='o', linestyle=':',markersize=5, 
                                 label='Non-Adaptive Feed Forward NN')
purple_line = mlines.Line2D([], [], color='m', marker='P', linestyle='-',markersize=5, 
                            label='Adaptive LSTM')
purple_dotted_line = mlines.Line2D([], [], color='m', marker='P', linestyle=':',markersize=5, 
                                   label='Non-Adaptive LSTM')
plt.legend(fontsize = 'large',handles=[green_line,blue_line,red_line,teal_line,teal_dotted_line,purple_line,
                                       purple_dotted_line], bbox_to_anchor=(1.05, 1), ncol=1)
plt.plot(xList, nnAdaptive[1], '-co',linewidth=2.0,markersize=5.0)
plt.plot(xList, nn[1], ':co',linewidth=2.0,markersize=5.0)
plt.plot(xList, lstmAdaptive[1], '-mP',linewidth=2.0,markersize=5.0)
plt.plot(xList, lstm[1], ':mP',linewidth=2.0,markersize=5.0)
plt.plot(xList, ASG[1], '-gs',linewidth=2.0,markersize=5.0)
plt.plot(xList, SG[1], ':bs',linewidth=2.0,markersize=5.0)
plt.plot(xList, freq[1], ':r>',linewidth=2.0,markersize=5.0)
plt.show()
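The two plotting cells above differ only in which row of the result arrays they plot and in the y-axis label. A possible refactoring is sketched below with a hypothetical `STYLES` list and `plot_results` helper: passing `label=` to each `plot` call lets `ax.legend()` build the legend from the plotted lines, so the separate `mlines.Line2D` proxy handles are no longer needed. Colors, markers, line styles, and legend order follow the original cells.

```python
import matplotlib.pyplot as plt
import numpy as np

# (legend label, matplotlib format string) for each algorithm, in legend order.
STYLES = [
    ("Adaptive Sequence-Greedy", "-gs"),
    ("Sequence-Greedy", ":bs"),
    ("Frequency", ":r>"),
    ("Adaptive Feed Forward NN", "-co"),
    ("Non-Adaptive Feed Forward NN", ":co"),
    ("Adaptive LSTM", "-mP"),
    ("Non-Adaptive LSTM", ":mP"),
]


def plot_results(results, row, ylabel):
    """Plot one row of every (2, k) result array against 1..k recommendations.

    results: dict mapping each label in STYLES to its result array.
    row: 0 for the accuracy score, 1 for the sequence score.
    """
    num_recs = np.asarray(results[STYLES[0][0]]).shape[1]
    x = np.arange(1, num_recs + 1)
    fig, ax = plt.subplots()
    for label, fmt in STYLES:
        y = np.asarray(results[label])[row]
        # label= lets ax.legend() build the legend from the plotted lines.
        ax.plot(x, y, fmt, linewidth=2.0, markersize=5.0, label=label)
    ax.set_xticks(x)
    ax.tick_params(labelsize=22)
    ax.set_xlabel("Number of Recommendations", size=25)
    ax.set_ylabel(ylabel, size=25)
    ax.legend(fontsize="large", bbox_to_anchor=(1.05, 1), ncol=1)
    return fig, ax


# Usage in this notebook, with the arrays filled by the trial loop:
# results = {"Adaptive Sequence-Greedy": ASG, "Sequence-Greedy": SG,
#            "Frequency": freq, "Adaptive Feed Forward NN": nnAdaptive,
#            "Non-Adaptive Feed Forward NN": nn,
#            "Adaptive LSTM": lstmAdaptive, "Non-Adaptive LSTM": lstm}
# plot_results(results, 0, "Accuracy Score"); plt.show()
# plot_results(results, 1, "Sequence Score"); plt.show()
```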