# Consulting Project 
## Recommender Systems - Solutions

Word of your ability to analyze big data and build useful systems from it keeps spreading! You've just signed a contract with a new online food delivery company. The company is trying to differentiate itself by recommending new meals to customers based on what other customers like.

Can you build them a recommendation system?

Your final result should be in the form of a function that can take in a Spark DataFrame of a single customer's ratings for various meals and output their top 3 suggested meals. For example:

Best of luck!

***Note from Jose: I completely made up this food data, so it's likely that the actual recommendations won't make any sense. But you should get output similar to mine given the example customer DataFrame.***

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('movielens_ratings.csv')

In [3]:
df.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
movieId,1501.0,49.40573,28.937034,0.0,24.0,50.0,74.0,99.0
rating,1501.0,1.774151,1.187276,1.0,1.0,1.0,2.0,5.0
userId,1501.0,14.383744,8.59104,0.0,7.0,14.0,22.0,29.0


In [4]:
df.corr()

Unnamed: 0,movieId,rating,userId
movieId,1.0,0.036569,0.003267
rating,0.036569,1.0,0.056411
userId,0.003267,0.056411,1.0


In [5]:
import numpy as np
# Keep IDs 0-31 as meal IDs; every other movieId becomes NaN.
df['mealskew'] = df['movieId'].apply(lambda movie_id: np.nan if movie_id > 31 else movie_id)
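
The `apply` call above keeps meal IDs 0 through 31 and blanks out the rest, which is why `mealskew` ends up with 486 non-null values out of 1501 rows. As a hedged alternative, the same masking could be done without a Python-level lambda by using `Series.where`. The toy IDs below are made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Made-up IDs standing in for the movieId column.
toy = pd.DataFrame({"movieId": [0, 12, 31, 32, 99]})

# Series.where keeps values where the condition holds and writes NaN elsewhere.
toy["mealskew"] = toy["movieId"].where(toy["movieId"] <= 31)

# The lambda-based version used in this notebook, for comparison.
via_apply = toy["movieId"].apply(
    lambda movie_id: np.nan if movie_id > 31 else movie_id
)
```

Both columns mark the same rows as missing, so either form feeds the later `map` step identically.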

In [6]:
df.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
movieId,1501.0,49.40573,28.937034,0.0,24.0,50.0,74.0,99.0
rating,1501.0,1.774151,1.187276,1.0,1.0,1.0,2.0,5.0
userId,1501.0,14.383744,8.59104,0.0,7.0,14.0,22.0,29.0
mealskew,486.0,15.502058,9.250634,0.0,7.0,15.0,23.0,31.0


In [7]:
mealmap = { 2. : "Chicken Curry",   
           3. : "Spicy Chicken Nuggets",   
           5. : "Hamburger",   
           9. : "Taco Surprise",  
           11. : "Meatloaf",  
           12. : "Ceaser Salad",  
           15. : "BBQ Ribs",  
           17. : "Sushi Plate",  
           19. : "Cheesesteak Sandwich",  
           21. : "Lasagna",  
           23. : "Orange Chicken",
           26. : "Spicy Beef Plate",  
           27. : "Salmon with Mashed Potatoes",  
           28. : "Penne Tomatoe Pasta",  
           29. : "Pork Sliders",  
           30. : "Vietnamese Sandwich",  
           31. : "Chicken Wrap",  
           np.nan: "Cowboy Burger",   
           4. : "Pretzels and Cheese Plate",   
           6. : "Spicy Pork Sliders",  
           13. : "Mandarin Chicken Plate",  
           14. : "Kung Pao Chicken",
           16. : "Fried Rice Plate",  
           8. : "Chicken Chow Mein",  
           10. : "Roasted Eggplant",  
           18. : "Pepperoni Pizza",  
           22. : "Pulled Pork Plate",   
           0. : "Cheese Pizza",   
           1. : "Burrito",   
           7. : "Nachos",  
           24. : "Chili",  
           20. : "Southwest Salad",  
           25.: "Roast Beef Sandwich"}

In [8]:
df['meal_name'] = df['mealskew'].map(mealmap)

In [9]:
df.to_csv('Meal_Info.csv',index=False)

In [10]:
from pyspark.sql import SparkSession

In [11]:
spark = SparkSession.builder.appName('recconsulting').getOrCreate()

In [12]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

In [22]:
data = spark.read.csv('Meal_Info.csv',inferSchema=True,header=True)

In [23]:
data.show()

+-------+------+------+--------+--------------------+
|movieId|rating|userId|mealskew|           meal_name|
+-------+------+------+--------+--------------------+
|      2|   3.0|     0|     2.0|       Chicken Curry|
|      3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|      5|   2.0|     0|     5.0|           Hamburger|
|      9|   4.0|     0|     9.0|       Taco Surprise|
|     11|   1.0|     0|    11.0|            Meatloaf|
|     12|   2.0|     0|    12.0|        Ceaser Salad|
|     15|   1.0|     0|    15.0|            BBQ Ribs|
|     17|   1.0|     0|    17.0|         Sushi Plate|
|     19|   1.0|     0|    19.0|Cheesesteak Sandw...|
|     21|   1.0|     0|    21.0|             Lasagna|
|     23|   1.0|     0|    23.0|      Orange Chicken|
|     26|   3.0|     0|    26.0|    Spicy Beef Plate|
|     27|   1.0|     0|    27.0|Salmon with Mashe...|
|     28|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|     29|   1.0|     0|    29.0|        Pork Sliders|
|     30|   1.0|     0|    3

In [27]:
data1 = data.drop('meal_name')
data1.show()

+-------+------+------+--------+
|movieId|rating|userId|mealskew|
+-------+------+------+--------+
|      2|   3.0|     0|     2.0|
|      3|   1.0|     0|     3.0|
|      5|   2.0|     0|     5.0|
|      9|   4.0|     0|     9.0|
|     11|   1.0|     0|    11.0|
|     12|   2.0|     0|    12.0|
|     15|   1.0|     0|    15.0|
|     17|   1.0|     0|    17.0|
|     19|   1.0|     0|    19.0|
|     21|   1.0|     0|    21.0|
|     23|   1.0|     0|    23.0|
|     26|   3.0|     0|    26.0|
|     27|   1.0|     0|    27.0|
|     28|   1.0|     0|    28.0|
|     29|   1.0|     0|    29.0|
|     30|   1.0|     0|    30.0|
|     31|   1.0|     0|    31.0|
|     34|   1.0|     0|    null|
|     37|   1.0|     0|    null|
|     41|   2.0|     0|    null|
+-------+------+------+--------+
only showing top 20 rows



In [31]:
data1 = data1.na.drop()
data1.show()

+-------+------+------+--------+
|movieId|rating|userId|mealskew|
+-------+------+------+--------+
|      2|   3.0|     0|     2.0|
|      3|   1.0|     0|     3.0|
|      5|   2.0|     0|     5.0|
|      9|   4.0|     0|     9.0|
|     11|   1.0|     0|    11.0|
|     12|   2.0|     0|    12.0|
|     15|   1.0|     0|    15.0|
|     17|   1.0|     0|    17.0|
|     19|   1.0|     0|    19.0|
|     21|   1.0|     0|    21.0|
|     23|   1.0|     0|    23.0|
|     26|   3.0|     0|    26.0|
|     27|   1.0|     0|    27.0|
|     28|   1.0|     0|    28.0|
|     29|   1.0|     0|    29.0|
|     30|   1.0|     0|    30.0|
|     31|   1.0|     0|    31.0|
|      2|   2.0|     1|     2.0|
|      3|   1.0|     1|     3.0|
|      4|   2.0|     1|     4.0|
+-------+------+------+--------+
only showing top 20 rows



In [32]:
(training, test) = data1.randomSplit([0.8, 0.2])

In [33]:
# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="mealskew", ratingCol="rating")
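
ALS learns a short vector of latent factors for every user and every meal, and a predicted rating is the dot product of the two. The sketch below uses made-up two-dimensional factors (Spark's default `rank` is 10, and real factors come out of `als.fit`); `predict_rating` is a hypothetical helper, not part of Spark. It illustrates why predictions are not confined to the 1 to 5 rating scale:

```python
def predict_rating(user_factors, item_factors):
    """Predicted rating = dot product of the user and item latent vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))


# Hypothetical factors, chosen only for illustration.
user = [1.0, 2.0]
meal_a = [1.5, 0.5]
meal_b = [0.5, -1.0]

print(predict_rating(user, meal_a))  # 2.5
print(predict_rating(user, meal_b))  # -1.5
```

Nothing in the dot product clips the result, so negative or above-scale predictions are expected behavior rather than a bug.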


In [34]:
model = als.fit(training)

In [35]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)

predictions.show()

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))
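
The RMSE reported by the evaluator is the square root of the mean squared gap between `rating` and `prediction`. As a rough sketch of that arithmetic, the block below recomputes it by hand for just the first four rows of this cell's `predictions.show()` output. The evaluator itself covers every row of the test split, so its figure will differ.

```python
import math

# (rating, prediction) pairs copied from the first four rows of the output.
pairs = [
    (1.0, 2.0746667),
    (1.0, 1.8572083),
    (3.0, 1.04709),
    (1.0, 2.1031294),
]

# Square each error, average the squares, then take the square root.
squared_errors = [(rating - prediction) ** 2 for rating, prediction in pairs]
rmse_sample = math.sqrt(sum(squared_errors) / len(squared_errors))
print(f"RMSE over these four rows = {rmse_sample:.4f}")
```

Because errors are squared before averaging, a single large miss, such as predicting about 1.05 for a rating of 3.0, weighs more heavily than several small ones.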

+-------+------+------+--------+-----------+
|movieId|rating|userId|mealskew| prediction|
+-------+------+------+--------+-----------+
|     31|   1.0|    19|    31.0|  2.0746667|
|     31|   1.0|     4|    31.0|  1.8572083|
|     31|   3.0|     8|    31.0|    1.04709|
|     31|   1.0|    18|    31.0|  2.1031294|
|     28|   1.0|     5|    28.0|-0.16151106|
|     28|   1.0|    23|    28.0|-0.61343676|
|     26|   1.0|     9|    26.0|  1.1497918|
|     27|   1.0|    26|    27.0|  2.8168237|
|     27|   1.0|     5|    27.0|  2.4363708|
|     27|   3.0|    24|    27.0|  1.0392537|
|     27|   1.0|     0|    27.0| -1.4114815|
|     12|   1.0|    13|    12.0|  0.7946665|
|     12|   1.0|    16|    12.0|  1.5104833|
|     12|   3.0|    25|    12.0| -1.9645524|
|     12|   2.0|     0|    12.0|  2.1467166|
|     22|   5.0|    22|    22.0|   4.423783|
|     22|   2.0|    15|    22.0|  1.5791476|
|     22|   1.0|     9|    22.0|  1.0862883|
|     22|   4.0|    17|    22.0|  2.5013154|
|     22| 