# Fine Foods - Recommendation Engine
We will use Amazon food reviews to build a recommendation engine with a Factorization Machine in SAS Viya.

Factorization Machines (FM) are a relatively recent addition to the machine learning toolbox and are available in SAS Viya through the `factMac` action set. FM is a general-purpose prediction algorithm, comparable in scope to Support Vector Machines, that can model very sparse data, an area where many traditional machine learning techniques struggle.
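
To make the model concrete, here is a minimal NumPy sketch of the standard second-order FM prediction equation, $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$. The function names, the toy feature vector, and the random parameters are assumptions made for illustration; this is not the SAS implementation.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM prediction for one feature vector.

    x: (n,) features, w0: bias, w: (n,) linear weights, V: (n, k) factor matrix.
    """
    linear = w0 + w @ x
    # Pairwise term computed in O(n*k) with the identity
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    xv = x @ V
    x2v2 = (x ** 2) @ (V ** 2)
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)

def fm_predict_naive(x, w0, w, V):
    """Same prediction with the explicit O(n^2 * k) double loop, for comparison."""
    n = len(x)
    pairwise = sum((V[i] @ V[j]) * x[i] * x[j]
                   for i in range(n) for j in range(i + 1, n))
    return w0 + w @ x + pairwise

# Toy example: one active "user" column (0) and one active "product" column (4).
rng = np.random.default_rng(0)
n, k = 6, 2
x = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
w0 = 0.5
w = rng.normal(size=n)
V = rng.normal(size=(n, k))

fast = fm_predict(x, w0, w, V)
naive = fm_predict_naive(x, w0, w, V)
```

Because every pairwise weight is the dot product of two low-dimensional factor vectors, the model can score a user and product pair that never co-occurs in the training data, which is what makes it a good fit for sparse inputs.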

This notebook has four parts:
1. Load in Data
2. SVD to represent text numerically
3. Train Recommendation Engine
4. Make Recommendation

## Load in Data


In [1]:
from swat import *
#swat.options.cas.print_messages = False

# Connect to the session
cashost='racesx12069.demo.sas.com'
casport=5570
casauth=r'U:\.authinfo_w12_race'  # raw string: '\.' is not a valid escape sequence in a normal string

s = CAS(cashost, casport, authinfo=casauth, caslib="casuser")

#Load Data
f='foods_prepped'
s.loadTable(caslib='DemoData', path=f+'.csv', casout=f);

#Load actionsets
actionsets=['fedSQL', 'autoTune', 'factMac', 'textMining']
for actionset in actionsets:
    s.builtins.loadactionset(actionset)


#Create shortcuts
food = s.CASTable(f)
target = 'score'
class_inputs = ['helpfulness','productid','time','userid']

NOTE: Cloud Analytic Services made the file foods_prepped.csv available as table FOODS_PREPPED in caslib CASUSER(sasdemo).
NOTE: Added action set 'fedSQL'.
NOTE: Added action set 'autoTune'.
NOTE: Added action set 'factMac'.
NOTE: Added action set 'textMining'.


In [2]:
# Add a row identifier (key) that the text mining step uses as the document ID
s.dataStep.runCode('''data '''+f+'''; 
                      set '''+f+''';
                      key=_n_; run;''')

print(len(food), "Reviews")
food.head()

568454 Reviews


Unnamed: 0,helpfulness,productId,score,summary,text,time,userId,key
0,2/2,B000HEA964,5.0,Dog's Favorite Snack,These chicken chips are devored daily by my 2 ...,1212883000.0,A2E61OQYIVB55P,67425.0
1,2/2,B000HEA964,5.0,"Better Than ""Cookies""",These crunchy treats are irresistable to my Co...,1208304000.0,A2UCGE4EQZ0P4A,67426.0
2,2/2,B000HEA964,4.0,Good for small dogs.,"I have two American Eskimo dogs, and so these ...",1204157000.0,A304WL23L6EDML,67427.0
3,2/2,B000HEA964,5.0,great,My little dog loved these. Were first sent to ...,1176163000.0,A287Z78FJTTT27,67428.0
4,1/1,B000HEA964,5.0,"Cost more than steak, but my dogs love them!",My two Havanese really love these! They are v...,1285114000.0,A18UVHCREY2RE2,67429.0
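
The preview above hints at why this data is sparse: each review pairs one user with one product. The following is a conceptual sketch that one-hot encodes the five rows shown, using only `userId` and `productId` for brevity. How the `factMac` action levelizes nominal inputs internally is not shown in this notebook, so the column layout and the helper `encode` are assumptions for illustration.

```python
# The five (userId, productId, score) rows from the preview above.
rows = [
    ("A2E61OQYIVB55P", "B000HEA964", 5.0),
    ("A2UCGE4EQZ0P4A", "B000HEA964", 5.0),
    ("A304WL23L6EDML", "B000HEA964", 4.0),
    ("A287Z78FJTTT27", "B000HEA964", 5.0),
    ("A18UVHCREY2RE2", "B000HEA964", 5.0),
]

# One column per distinct level of each nominal variable.
users = sorted({u for u, _, _ in rows})
products = sorted({p for _, p, _ in rows})
col = {("userid", u): i for i, u in enumerate(users)}
col.update({("productid", p): len(users) + i for i, p in enumerate(products)})

def encode(user, product):
    """Return a one-hot row with exactly two non-zero entries."""
    x = [0.0] * len(col)
    x[col[("userid", user)]] = 1.0
    x[col[("productid", product)]] = 1.0
    return x

X = [encode(u, p) for u, p, _ in rows]

# Fraction of non-zero cells in the design matrix.
density = sum(map(sum, X)) / (len(X) * len(col))
print(len(col), "columns, density =", round(density, 3))
```

In this encoding the number of columns grows with the number of distinct users and products, while each row keeps only two non-zero entries, so the density shrinks as more reviews are added.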


## Text Mining

In [3]:
# Load the English stop list used by tmMine
s.loadTable(caslib='DemoData', path='engstop.sas7bdat', casout='engstop');

NOTE: Cloud Analytic Services made the file engstop.sas7bdat available as table ENGSTOP in caslib CASUSER(sasdemo).


In [4]:
def c_dict(name):
    """Return an output-table spec that replaces any existing table called `name`."""
    return dict(name=name, replace=True)

s.textMining.tmMine(
  documents=f,
  stopList="engstop",
  docId="key",
  copyVars=class_inputs + [target],
  text='text',
  reduce=10,
  entities="STD",
  k=3,
  norm="DOC",
  u=c_dict("svdu"),
  terms=c_dict("terms"),
  parent=c_dict("parent"),
  child=c_dict("child"),
  parseConfig=c_dict("config"),
  docPro=c_dict("docpro"),
  topics=c_dict("topics"),
)

Unnamed: 0,casLib,Name,Label,Rows,Columns,casTable
0,CASUSER(sasdemo),config,,1,11,"CASTable('config', caslib='CASUSER(sasdemo)')"
1,CASUSER(sasdemo),terms,,141740,11,"CASTable('terms', caslib='CASUSER(sasdemo)')"
2,CASUSER(sasdemo),parent,,15670231,3,"CASTable('parent', caslib='CASUSER(sasdemo)')"
3,CASUSER(sasdemo),child,,16503075,3,"CASTable('child', caslib='CASUSER(sasdemo)')"
4,CASUSER(sasdemo),svdu,,72708,4,"CASTable('svdu', caslib='CASUSER(sasdemo)')"
5,CASUSER(sasdemo),docpro,,568454,9,"CASTable('docpro', caslib='CASUSER(sasdemo)')"
6,CASUSER(sasdemo),topics,,3,3,"CASTable('topics', caslib='CASUSER(sasdemo)')"


In [12]:
#s.table.droptable(name=f)

NOTE: Cloud Analytic Services dropped table foods_prepped from caslib CASUSER(sasdemo).


In [None]:


# Minimal variant of the earlier tmMine call: it requests only the docpro output table.
#s.loadactionset('textmining')
s.textMining.tmMine(
  documents=f,
  stopList='engstop',
  docId="key",
  copyVars=class_inputs + [target],
  text='text',
  reduce=10,
  entities="STD",
  k=3,
  norm="DOC",
  docPro="docpro"
)

## Structured representation of the first five documents

In [5]:
s.CASTable("docpro").fetch(to=5)

Unnamed: 0,key,_Col1_,_Col2_,_Col3_,helpfulness,productId,time,userId,score
0,8408.0,0.793594,0.302304,0.528035,0/0,B00146K7MU,1288829000.0,AYYACIDP5I4V6,5.0
1,8409.0,0.774742,0.256919,0.577726,4/4,B001ESKSPY,1294618000.0,A3SQJCRXHOQ8GF,5.0
2,8410.0,0.835524,0.256906,0.485694,2/2,B001ESKSPY,1308269000.0,A1XUX4HFY8F7YW,5.0
3,8411.0,0.836214,0.289241,0.465924,6/6,B004749DY4,1327018000.0,A216NSW58Q3SCJ,4.0
4,8412.0,0.795012,0.366726,0.483184,6/7,B004749DY4,1324426000.0,ACJT8MUC0LRF0,4.0
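
The `_Col1_` to `_Col3_` columns are the numeric representation of each review's text (`k=3`). As a rough illustration of the idea, the sketch below runs a truncated SVD on a small, made-up term-by-document matrix, projects each document onto the top `k` left singular vectors, and scales each projection to unit length. The toy matrix and variable names are assumptions, and the exact weighting and normalization that `tmMine` applies are not shown in this notebook. The unit-length step is only inferred from the output above, where the squared components of each row sum to about 1.

```python
import numpy as np

# Hypothetical term-by-document count matrix (rows = terms, columns = documents).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

k = 3  # number of SVD dimensions to keep, as in the tmMine call

# Thin SVD: A = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]  # top-k left singular vectors, shape (terms, k)

# Project every document (a column of A) into the k-dimensional space.
proj = A.T @ U_k  # shape (documents, k)

# Scale each document's projection to unit Euclidean length.
docpro_toy = proj / np.linalg.norm(proj, axis=1, keepdims=True)

# Consistency check against the first row of the docpro output above.
source_row = np.array([0.793594, 0.302304, 0.528035])
source_norm = float(np.linalg.norm(source_row))
```

Each review ends up as three numbers regardless of its length, which is what allows the text to be used alongside the other inputs.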


In [None]:
s.table.save(caslib='DemoData', name='Foods_prep_text.sashdat', table="docpro")

## Train Recommendation Engine

In [14]:
s.loadactionset('autotune')

training_options = dict(
                    table     = dict(name = 'docpro'),
                    inputs    = class_inputs, # + ['_Col1_','_Col2_','_Col3_'],
                    nominals  = class_inputs,
                    target    = target,
                    seed      = 123,
                    savestate = dict(name = 'fm_model_short', replace = True))

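# Note: per the log below, initValue=2 is not in the NFACTORS value list,
# so it is not used to seed the tuning process.
s.invoke('autotune.tuneFactMac',
         trainOptions=training_options,
         tunerOptions=dict(maxTime=300, validationPartitionFraction=0.1),
         tuningParameters=[dict(namePath='nfactors', initValue=2)])

# Print each response as it is returned by the invoked action
for response in s:
    for k, v in response:
        print(k, v)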

NOTE: Added action set 'autotune'.
NOTE: Autotune is started for 'Factorization Machine' model.
NOTE: Autotune option SEARCHMETHOD='GA'.
NOTE: Autotune option MAXEVALS=50.
NOTE: Autotune option MAXTIME=300 (sec.).
NOTE: Autotune objective is 'Root Average Squared Error'.
NOTE: Autotune number of parallel evaluations is set to 4, each using 0 worker nodes.
NOTE: The INITVALUE '2' of tuning parameter NFACTORS is not included in the VALUELIST for this parameter.
NOTE: The initial point will not be used to seed the tuning process.
NOTE: The seed for random partition/fold generation is 123.
         Iteration       Evals     Best Objective        Time
                 0           1             1.1292       20.63
                 1           9             1.1292      300.00
NOTE: Autotune process reached maximum tuning time.
NOTE: Using SEED=123.
NOTE: Beginning data reading and levelization...
NOTE: Data reading and levelization complete.
NOTE: Beginning optimization of the factorization ma
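
The log reports the tuning objective as "Root Average Squared Error", and the call sets aside 10% of the rows for validation (`validationPartitionFraction=0.1`). The sketch below shows what a seeded hold-out split and that error measure look like in plain Python. The helper names and the toy scores are assumptions, and the way the autotune action actually partitions the table is not shown in this notebook.

```python
import math
import random

def train_validation_split(n_rows, validation_fraction, seed):
    """Shuffle row indices with a fixed seed and hold out a fraction for validation."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_valid = round(n_rows * validation_fraction)
    return indices[n_valid:], indices[:n_valid]

def rmse(y_true, y_pred):
    """Root of the average squared difference between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

train_idx, valid_idx = train_validation_split(1000, 0.1, seed=123)

# Toy check: score a constant "predict the mean" baseline on five ratings.
y_true = [5.0, 5.0, 4.0, 5.0, 5.0]
baseline = sum(y_true) / len(y_true)
score = rmse(y_true, [baseline] * len(y_true))
```

Since scores range from 1 to 5, an objective near 1.13 means the tuned model is off by roughly one star on the held-out reviews.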

In [39]:
s.loadactionset('autotune')

training_options = dict(
                    table     = dict(name = f),
                    inputs    = class_inputs,
                    nominals  = class_inputs,
                    target    = target,
                    seed      = 123,
                    savestate = dict(name = 'fm_model_short', replace = True))

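# Note: per the log below, initValue=2 is not in the NFACTORS value list,
# so it is not used to seed the tuning process.
s.invoke('autotune.tuneFactMac',
         trainOptions=training_options,
         tunerOptions=dict(maxTime=300, validationPartitionFraction=0.1),
         tuningParameters=[dict(namePath='nfactors', initValue=2)])

# Print each response as it is returned by the invoked action
for response in s:
    for k, v in response:
        print(k, v)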

NOTE: Added action set 'autotune'.
NOTE: Autotune is started for 'Factorization Machine' model.
NOTE: Autotune option SEARCHMETHOD='GA'.
NOTE: Autotune option MAXEVALS=50.
NOTE: Autotune option MAXTIME=300 (sec.).
NOTE: Autotune objective is 'Root Average Squared Error'.
NOTE: Autotune number of parallel evaluations is set to 4, each using 0 worker nodes.
NOTE: The INITVALUE '2' of tuning parameter NFACTORS is not included in the VALUELIST for this parameter.
NOTE: The initial point will not be used to seed the tuning process.
NOTE: The seed for random partition/fold generation is 123.
         Iteration       Evals     Best Objective        Time
                 0           1             1.1184       25.50
                 1           9             1.1184      300.00
NOTE: Autotune process reached maximum tuning time.
NOTE: Using SEED=123.
NOTE: Beginning data reading and levelization...
NOTE: Data reading and levelization complete.
NOTE: Beginning optimization of the factorization ma

In [41]:
s.loadactionset('factmac')

# Build the factorization machine
r = s.factmac.factmac(
  table     = dict(name = f),
  inputs    = class_inputs,
  nominals  = class_inputs,
  target    = target,
  maxIter   = 50,
  nFactors  = 20,
  learnStep = 0.01,
  seed      = 12345,
  savestate = dict(name = 'fm_model', replace = True)
)

r['FinalLoss']

NOTE: Added action set 'factmac'.
NOTE: Using SEED=12345.
NOTE: Beginning data reading and levelization...
NOTE: Data reading and levelization complete.
NOTE: Beginning optimization of the factorization machine model...
NOTE: >>> Progress: completed iteration 1
NOTE: >>> Progress: completed iteration 2
NOTE: >>> Progress: completed iteration 3
NOTE: >>> Progress: completed iteration 4
NOTE: >>> Progress: completed iteration 5
NOTE: >>> Progress: completed iteration 6
NOTE: >>> Progress: completed iteration 7
NOTE: >>> Progress: completed iteration 8
NOTE: >>> Progress: completed iteration 9
NOTE: >>> Progress: completed iteration 10
NOTE: >>> Progress: completed iteration 11
NOTE: >>> Progress: completed iteration 12
NOTE: >>> Progress: completed iteration 13
NOTE: >>> Progress: completed iteration 14
NOTE: >>> Progress: completed iteration 15
NOTE: >>> Progress: completed iteration 16
NOTE: >>> Progress: completed iteration 17
NOTE: >>> Progress: completed iteration 18
NOTE: >>> Progr

Unnamed: 0,Criterion,Value
0,MSE,0.038322
1,RMSE,0.195761
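
To illustrate what `maxIter`, `nFactors`, and `learnStep` control, here is a minimal sketch that fits a second-order FM with plain stochastic gradient descent on squared error. The optimizer that the `factmac` action uses is not described in this notebook, so SGD is an assumption, as are the function names and the toy ratings (three users in columns 0 to 2, two products in columns 3 and 4).

```python
import numpy as np

def fm_scores(rows, w0, w, V):
    """Predict a score for each row; a row lists its active one-hot columns."""
    scores = []
    for idx in rows:
        idx = list(idx)
        v_sum = V[idx].sum(axis=0)
        pairwise = 0.5 * (v_sum @ v_sum - np.sum(V[idx] ** 2))
        scores.append(w0 + w[idx].sum() + pairwise)
    return np.array(scores)

def train_fm(rows, y, n_features, n_factors=4, max_iter=50,
             learn_step=0.01, seed=12345):
    """Fit bias, linear weights, and factor matrix by SGD on squared error."""
    rng = np.random.default_rng(seed)
    w0 = float(np.mean(y))
    w = np.zeros(n_features)
    V = rng.normal(scale=0.1, size=(n_features, n_factors))
    for _ in range(max_iter):
        for idx, target in zip(rows, y):
            idx = list(idx)
            v_sum = V[idx].sum(axis=0)
            pred = w0 + w[idx].sum() + 0.5 * (v_sum @ v_sum - np.sum(V[idx] ** 2))
            err = pred - target
            # For one-hot inputs, d(pred)/d(v_i) = (sum of active factors) - v_i
            grad_V = err * (v_sum - V[idx])
            w0 -= learn_step * err
            w[idx] -= learn_step * err
            V[idx] -= learn_step * grad_V
    return w0, w, V

# Hypothetical ratings: every user likes product 3 and dislikes product 4.
rows = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]
y = np.array([5.0, 1.0, 4.0, 2.0, 5.0, 1.0])

w0, w, V = train_fm(rows, y, n_features=5, n_factors=4,
                    max_iter=200, learn_step=0.05)
trained_rmse = float(np.sqrt(np.mean((fm_scores(rows, w0, w, V) - y) ** 2)))
baseline_rmse = float(np.sqrt(np.mean((y.mean() - y) ** 2)))
```

In the output above, the RMSE is the square root of the MSE (the square root of 0.038322 is about 0.19576). That loss appears to be measured on the training table, so it is not directly comparable to the validation objective of about 1.12 reported by the autotune runs.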
