## ALS Recommendation Model

We used Spark MLlib to build an ALS matrix factorization model for restaurant recommendation.
We created a separate ALS model for each state in the data set and saved each one to the file system after training.
Each model can recommend restaurants for an existing user as well as recommend users for a given restaurant.

### Collaborative Filtering

Collaborative filtering aims to fill in the missing entries of a user-item association matrix. spark.mllib supports model-based collaborative filtering, in which users and products are described by a small set of latent factors that can be used to predict missing entries. spark.mllib uses the alternating least squares (ALS) algorithm to learn these latent factors.
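
To make the idea of alternating least squares concrete, here is a minimal plain-Python sketch restricted to a single latent factor per user and per item. It is a toy illustration and not how spark.mllib implements ALS; the function name `als_rank1`, the regularization value, and the sample data are assumptions made for this example. With one factor, each least-squares step has a closed-form scalar solution, so the alternation is easy to follow.

```python
def als_rank1(ratings, n_users, n_items, reg=0.001, iterations=50):
    """Rank-1 ALS on observed (user, item, rating) triples.

    Returns (user_factors, item_factors) as lists of floats.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Hold the item factors fixed and solve for each user factor.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j, r) in ratings if a == i)
            den = reg + sum(v[j] ** 2 for (a, j, r) in ratings if a == i)
            u[i] = num / den
        # Hold the user factors fixed and solve for each item factor.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b, r) in ratings if b == j)
            den = reg + sum(u[i] ** 2 for (i, b, r) in ratings if b == j)
            v[j] = num / den
    return u, v


# Toy data: the full matrix is the outer product of [1, 2, 3] and [1, 2].
# The entry for user 2 and item 1 (true value 6) is left out as "missing".
observed = [
    (0, 0, 1.0), (0, 1, 2.0),
    (1, 0, 2.0), (1, 1, 4.0),
    (2, 0, 3.0),
]
user_factors, item_factors = als_rank1(observed, n_users=3, n_items=2)

# A missing entry is predicted as the product of the two learned factors.
predicted = user_factors[2] * item_factors[1]
print("estimate for the missing entry:", predicted)
```

The model trained in this notebook follows the same alternation, but with 100 latent factors per user and per restaurant, so each prediction is a dot product of two factor vectors instead of a product of two numbers.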


In [1]:
from pyspark.sql import SparkSession

import json
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
import math
import os
import shutil

#### Steps for creating ALS Model:

1. For each state, we load the JSON file containing a combined view of user reviews and restaurants.
2. We select the user ID, business ID, and rating (the star rating multiplied by 2) and do an 80:20 random split on the data.
3. The training data is used to train the ALS model.
4. The testing data is used to compute the RMSE.
5. The models are saved so that they can be consumed by the Flask application.
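
The RMSE in step 4 can be illustrated without Spark. The following is a minimal plain-Python sketch; the helper name `rmse` and the sample pairs are made up for illustration. Because the star ratings are doubled before training, the RMSE is measured on the doubled scale, and halving it expresses the same error in stars.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (actual, predicted) rating pairs."""
    squared_errors = [(actual - predicted) ** 2 for actual, predicted in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical (actual, predicted) pairs on the doubled rating scale.
sample = [(8.0, 6.0), (10.0, 10.0), (4.0, 6.0), (6.0, 2.0)]

error_rescaled = rmse(sample)        # error on the doubled scale
error_in_stars = error_rescaled / 2  # the same error expressed in stars
print(error_rescaled, error_in_stars)
```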


In [2]:
def execute(spark, path, state, directory):
    # Load the combined user-review/restaurant view for one state.
    df = spark.read.json(path)
    # Double the star rating and keep it as the rating used for training.
    df = df.select('*', (df.stars * 2).alias('rescaled_rating'))
    df.createOrReplaceTempView("user_business_review")
    ratingsDf = spark.sql("Select user_unique_user_id, unique_business_id, rescaled_rating from user_business_review")
    # 80:20 random split into training and testing data.
    (training, testing) = ratingsDf.randomSplit([0.8, 0.2])
    ratings = training.rdd.map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
    rank = 100  # number of latent factors
    numIterations = 15
    model = ALS.train(ratings, rank, numIterations)
    # Predict ratings for the held-out (user, business) pairs.
    testdata = testing.rdd.map(lambda p: (p[0], p[1]))
    predictions = model.predictAll(testdata)
    p = predictions.toDF()
    p.createOrReplaceTempView("outcome")
    # Join the predictions back to the actual ratings on (user, business).
    out = spark.sql(
        "Select u.rescaled_rating, p.* from user_business_review u,"
        " outcome p where p.user=u.user_unique_user_id and u.unique_business_id=p.product")

    # RMSE between the actual rating (x[0]) and the predicted rating (x[3]).
    print("**********", math.sqrt(out.rdd.map(lambda x: (x[0] - x[3]) ** 2).mean()), " ********** ")
    # Save the trained model so that it can be loaded later.
    model_name = directory + "/" + state + ".parquet"
    model.save(spark.sparkContext, model_name)
    print("********* SAVED MODEL ****** ", model_name)


The filtered business review files listed below were generated in RecommendationsALS.
They are the input for creating the ALS models, one file per state.

In [3]:
data = [
    {"path" : "OH_business.json"},
    {"path" : "AZ_business.json"},
    {"path" : "NC_business.json"},
    {"path" : "ON_business.json"},
    {"path" : "NV_business.json"}
  ]

base_path = "/"

Create a Spark session and call `execute` for each state file. The trained models are written to the `als-models` directory, which is recreated on every run.

In [4]:
if __name__ == "__main__":

    spark = SparkSession.builder.appName("YelBusinessRecommendation").getOrCreate()

    if os.path.exists('als-models'):
        shutil.rmtree('als-models')
    os.mkdir('als-models')
    for file_location in data:
        print('PROCESSING :', file_location['path'])
        concat_path = file_location['path']
        # Model name: the file name without its extension, e.g. "OH_business".
        state = file_location['path'].split('.')[0]
        directory = 'als-models/' + state
        print(state)
        os.mkdir(directory)
        execute(spark, concat_path, state, directory)


PROCESSING : OH_business.json
OH_business
********** 5.40094655239638  ********** 
********* SAVED MODEL ******  als-models/OH_business/OH_business.parquet
PROCESSING : AZ_business.json
AZ_business
********** 4.4647040182141815  ********** 
********* SAVED MODEL ******  als-models/AZ_business/AZ_business.parquet
PROCESSING : NC_business.json
NC_business
********** 5.149190157542628  ********** 
********* SAVED MODEL ******  als-models/NC_business/NC_business.parquet
PROCESSING : ON_business.json
ON_business
********** 5.179986319199889  ********** 
********* SAVED MODEL ******  als-models/ON_business/ON_business.parquet
PROCESSING : NV_business.json
NV_business
********** 3.8718235666317473  ********** 
********* SAVED MODEL ******  als-models/NV_business/NV_business.parquet
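
The saved models can be loaded back to produce both kinds of recommendation described at the top of this section. The sketch below is one possible way to do that, not the code used by the Flask application; the helper names `saved_model_path` and `recommend` and their parameters are assumptions made for this example. It relies on `MatrixFactorizationModel.load`, `recommendProducts`, and `recommendUsers` from `pyspark.mllib.recommendation`, and the IDs passed in must be the same integer IDs used for training.

```python
def saved_model_path(data_file, root="als-models"):
    """Rebuild the path under which the training run saved a state's model."""
    name = data_file.split(".")[0]  # "OH_business.json" -> "OH_business"
    return root + "/" + name + "/" + name + ".parquet"


def recommend(spark, data_file, user_id=None, business_id=None, count=10):
    """Load one saved model and return a list of Rating(user, product, rating)."""
    if user_id is None and business_id is None:
        raise ValueError("pass either user_id or business_id")
    # Imported here so the path helper above can be used without Spark installed.
    from pyspark.mllib.recommendation import MatrixFactorizationModel

    model = MatrixFactorizationModel.load(
        spark.sparkContext, saved_model_path(data_file))
    if user_id is not None:
        # Top restaurants for an existing user.
        return model.recommendProducts(user_id, count)
    # Users most likely to be interested in the given restaurant.
    return model.recommendUsers(business_id, count)
```

The `rating` field of each returned `Rating` is on the doubled scale used for training, so it should be halved before being shown as stars.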


The text in the document by Shrikant Mudholkar, Varsha Bhanushali and Monas Bhar is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/us/

The code in the document by Shrikant Mudholkar, Varsha Bhanushali and Monas Bhar is licensed under the MIT License https://opensource.org/licenses/MIT