## 11.2 Movie Recommendation Engine

Joi Chu-Ketterer <br />
DSC650 <br />
February 23, 2020 <br />

This notebook demonstrates how to build a movie recommendation system with Spark's alternating least squares (ALS) recommender.

__1. Import Data__

First, the movies and their ratings are loaded from separate CSV files and joined on `movieId` into a single table.

In [2]:
ratings = "/FileStore/tables/ratings.csv"
movies = "/FileStore/tables/movies.csv"

# load both CSV files into Spark DataFrames, using the first row as the header
df_ratings = spark.read.format('csv').options(header='true').load(ratings)
df_movies = spark.read.format('csv').options(header='true').load(movies)

In [3]:
joined_table = df_ratings.join(df_movies, ["movieId"])
joined_table.printSchema()
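
To make the join step concrete, here is a minimal plain-Python sketch of an inner join on a shared key, which is the default behavior of `DataFrame.join` when only the key column is given. The sample rows, the names `rating_rows` and `movie_rows`, and the helper `inner_join` are hypothetical and stand in for the real CSV contents.

```python
# Hypothetical miniature versions of ratings.csv and movies.csv.
# Values are strings, as they would be straight out of a CSV file.
rating_rows = [
    {"userId": "1", "movieId": "10", "rating": "4.0"},
    {"userId": "1", "movieId": "20", "rating": "3.5"},
    {"userId": "2", "movieId": "10", "rating": "5.0"},
    {"userId": "2", "movieId": "99", "rating": "2.0"},  # no matching movie
]
movie_rows = [
    {"movieId": "10", "title": "Movie A"},
    {"movieId": "20", "title": "Movie B"},
]


def inner_join(left, right, key):
    """Return one merged row for every left/right pair that shares `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


joined = inner_join(rating_rows, movie_rows, "movieId")
for row in joined:
    print(row)
```

The rating for `movieId` 99 is dropped because no movie row carries that id, so every joined row ends up with a title.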

In [4]:
from pyspark.sql.types import DoubleType, IntegerType

# the CSV reader loads every column as a string, so cast the numeric columns
joined_table = joined_table.withColumn("userId", joined_table["userId"].cast(IntegerType()))
joined_table = joined_table.withColumn("movieId", joined_table["movieId"].cast(IntegerType()))
joined_table = joined_table.withColumn("rating", joined_table["rating"].cast(DoubleType()))
joined_table = joined_table.withColumn("timestamp", joined_table["timestamp"].cast(IntegerType()))

joined_table.printSchema()

__2. Train and Evaluate Recommender__ 

With the data joined in a single table, it can be used to train the recommender algorithm.
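
The recommender used in this section is alternating least squares (ALS). As a rough illustration of the idea, the sketch below factorizes a tiny rating table with one latent factor per user and per movie, alternately holding one side fixed and solving for the other in closed form. The ratings, the helper names, and the single-factor simplification are assumptions made for illustration; Spark's implementation uses higher-rank factors and a differently scaled regularization term.

```python
# Hypothetical observed ratings: (user index, item index) -> rating.
observed = {
    (0, 0): 5.0, (0, 1): 4.0,
    (1, 0): 4.0, (1, 2): 2.0,
    (2, 1): 5.0, (2, 2): 3.0,
}
n_users, n_items = 3, 3
reg = 0.01  # regularization strength, playing the same role as regParam


def squared_error(user_f, item_f):
    """Sum of squared differences between observed and predicted ratings."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in observed.items())


def solve(fixed, axis):
    """Regularized least-squares update for one side; the other is held fixed.

    axis=0 updates user factors, axis=1 updates item factors.
    """
    size = n_users if axis == 0 else n_items
    updated = []
    for k in range(size):
        num = den = 0.0
        for key, r in observed.items():
            if key[axis] == k:
                f = fixed[key[1 - axis]]
                num += r * f
                den += f * f
        updated.append(num / (den + reg))
    return updated


user_f = [1.0] * n_users
item_f = [1.0] * n_items
initial_error = squared_error(user_f, item_f)

for _ in range(5):  # five sweeps, mirroring maxIter=5
    user_f = solve(item_f, axis=0)  # fix items, solve for users
    item_f = solve(user_f, axis=1)  # fix users, solve for items

final_error = squared_error(user_f, item_f)
print(initial_error, final_error)
```

A predicted rating is the product of the user factor and the movie factor, which is why the model can score user and movie pairs that never appeared together in the training data.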

In [6]:
train, test = joined_table.randomSplit([0.8, 0.2], seed = 123)

print("Training Dataset Count: " + str(train.count()))
print("Test Dataset Count: " + str(test.count()))
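
The counts printed above are unlikely to be an exact 80/20 split, because a weighted random split assigns each row independently. The following plain-Python sketch mimics that behavior under stated assumptions: `random_split` is a hypothetical helper, not Spark's implementation, and it only shows why the sizes are approximate and why a fixed seed makes the split repeatable.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to one of two buckets with probability given by weights."""
    cutoff = weights[0] / sum(weights)  # normalize so weights need not sum to 1
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < cutoff else second).append(row)
    return first, second


rows = list(range(1000))
train_rows, test_rows = random_split(rows, [0.8, 0.2], seed=123)
train_again, _ = random_split(rows, [0.8, 0.2], seed=123)

print(len(train_rows), len(test_rows))  # close to 800 and 200, not exactly
print(train_rows == train_again)        # same seed, same split
```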

In [7]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS  # ALS = alternating least squares

# builds the ALS model
# cold start strategy is set to 'drop' to ensure there are no NaN evaluation metrics
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating",
          coldStartStrategy="drop")

# fits the ALS model to the training data
model = als.fit(train)

# uses RMSE to evaluate the model on the test data 
predictions = model.transform(test)

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")

evaluation_rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(evaluation_rmse))
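
For reference, the metric reported above can be computed by hand. This small sketch, using made-up (actual, predicted) pairs and a hypothetical `rmse` helper, shows the calculation and also why the cold start strategy matters: a single NaN prediction, as produced for a user or movie that never appeared in the training data, turns the whole metric into NaN.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


# Hypothetical (actual rating, predicted rating) pairs.
# The last prediction is NaN, standing in for an unseen user or movie.
pairs = [(4.0, 3.5), (3.0, 3.0), (5.0, 4.0), (2.0, float("nan"))]

with_nan = rmse(pairs)
kept = [(a, p) for a, p in pairs if not math.isnan(p)]
without_nan = rmse(kept)

print(with_nan)     # NaN: one missing prediction poisons the metric
print(without_nan)  # finite once the NaN row is dropped
```

Dropping those rows is what `coldStartStrategy="drop"` does before the evaluator sees the predictions, at the cost of evaluating on slightly fewer test rows.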

__3. Generate Top 10 Movie Recommendations__

In [9]:
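def get_movieids(userid):
    # keep only this user's predicted ratings
    user_predictions = predictions[['prediction', 'movieId', 'userId', 'title']].where(predictions['userId'] == userid)

    # sort() takes `ascending`, not `reverse`; highest predictions first, top 10 only
    return user_predictions.sort('prediction', ascending=False).limit(10)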

In [10]:
get_movieids(127).show(truncate = False)

In [11]:
get_movieids(151).show(truncate = False)

In [12]:
get_movieids(300).show(truncate = False)
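
The ranking logic behind `get_movieids` is a filter by user, a descending sort on the predicted rating, and a cut at the first n rows. A minimal plain-Python sketch of that logic follows; the `scored_rows` data and the `top_n` helper are hypothetical.

```python
import heapq

# Hypothetical scored rows: (userId, movieId, title, prediction).
scored_rows = [
    (127, 1, "Movie A", 3.9),
    (127, 2, "Movie B", 4.7),
    (151, 1, "Movie A", 2.5),
    (127, 3, "Movie C", 4.2),
    (151, 3, "Movie C", 4.8),
]


def top_n(rows, user_id, n=10):
    """Return the user's n highest-scoring rows, best first."""
    user_rows = (row for row in rows if row[0] == user_id)
    return heapq.nlargest(n, user_rows, key=lambda row: row[3])


top = top_n(scored_rows, 127, n=2)
for row in top:
    print(row)
```

Note that `predictions` in this notebook comes from the test split, so the ranked movies are ones each user has already rated. To score movies a user has not rated, the fitted `ALSModel` also offers `recommendForAllUsers(numItems)`, which may be a better fit for producing actual recommendations.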