# Consulting Project 
## Recommender Systems - Solutions

Word is spreading about your ability to analyze big data and build useful systems! You've just taken on a contract with a new online food delivery company. The company wants to differentiate itself by recommending new meals to customers based on what other customers like.

Can you build them a recommendation system?

Your final result should be a function that takes a Spark DataFrame of a single customer's ratings for various meals and returns their top 3 suggested meals.

Best of luck!

***Note from Jose: I completely made up this food data, so it's likely that the actual recommendations themselves won't make any sense. But you should get output similar to mine given the example customer DataFrame.***
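One possible shape for that final function is sketched below. It is not the notebook's own solution. It assumes a fitted ALS `model` like the one trained later in this notebook, and a `customer_df` that carries the `userId`, `mealskew`, and `meal_name` columns for the meals to score. The function name `recommend_meals` and the parameter `n` are hypothetical.

```python
def recommend_meals(model, customer_df, n=3):
    """Return the names of the n meals with the highest predicted rating.

    Assumes `model` is a fitted ALS model and `customer_df` is a Spark
    DataFrame for one customer with userId, mealskew and meal_name columns.
    """
    # Score every row; ALS adds a "prediction" column.
    scored = model.transform(customer_df)
    # Drop rows without a usable prediction (ALS can emit NaN for users or
    # meals it never saw in training), then rank best-first.
    ranked = scored.dropna(subset=["prediction"]).orderBy("prediction", ascending=False)
    return [row["meal_name"] for row in ranked.limit(n).collect()]
```

Note that this ranks whatever meals appear in `customer_df`. To suggest meals the customer has not tried yet, the input would need to list those candidate meals instead.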

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("movielens_ratings.csv")

In [3]:
df.describe().transpose()

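| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| movieId | 1501.0 | 49.40573 | 28.937034 | 0.0 | 24.0 | 50.0 | 74.0 | 99.0 |
| rating | 1501.0 | 1.774151 | 1.187276 | 1.0 | 1.0 | 1.0 | 2.0 | 5.0 |
| userId | 1501.0 | 14.383744 | 8.59104 | 0.0 | 7.0 | 14.0 | 22.0 | 29.0 |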


In [4]:
df.corr()

| | movieId | rating | userId |
|---|---|---|---|
| movieId | 1.0 | 0.036569 | 0.003267 |
| rating | 0.036569 | 1.0 | 0.056411 |
| userId | 0.003267 | 0.056411 | 1.0 |


In [5]:
import numpy as np

# Keep meal IDs 0-31 as they are; any ID above 31 becomes NaN
df["mealskew"] = df["movieId"].apply(lambda movie_id: np.nan if movie_id > 31 else movie_id)

In [6]:
df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| movieId | 1501.0 | 49.40573 | 28.937034 | 0.0 | 24.0 | 50.0 | 74.0 | 99.0 |
| rating | 1501.0 | 1.774151 | 1.187276 | 1.0 | 1.0 | 1.0 | 2.0 | 5.0 |
| userId | 1501.0 | 14.383744 | 8.59104 | 0.0 | 7.0 | 14.0 | 22.0 | 29.0 |
| mealskew | 486.0 | 15.502058 | 9.250634 | 0.0 | 7.0 | 15.0 | 23.0 | 31.0 |


In [7]:
mealmap = {
    2.0: "Chicken Curry",
    3.0: "Spicy Chicken Nuggets",
    5.0: "Hamburger",
    9.0: "Taco Surprise",
    11.0: "Meatloaf",
    12.0: "Ceaser Salad",
    15.0: "BBQ Ribs",
    17.0: "Sushi Plate",
    19.0: "Cheesesteak Sandwhich",
    21.0: "Lasagna",
    23.0: "Orange Chicken",
    26.0: "Spicy Beef Plate",
    27.0: "Salmon with Mashed Potatoes",
    28.0: "Penne Tomatoe Pasta",
    29.0: "Pork Sliders",
    30.0: "Vietnamese Sandwich",
    31.0: "Chicken Wrap",
    np.nan: "Cowboy Burger",
    4.0: "Pretzels and Cheese Plate",
    6.0: "Spicy Pork Sliders",
    13.0: "Mandarin Chicken Plate",
    14.0: "Kung Pao Chicken",
    16.0: "Fried Rice Plate",
    8.0: "Chicken Chow Mein",
    10.0: "Roasted Eggplant ",
    18.0: "Pepperoni Pizza",
    22.0: "Pulled Pork Plate",
    0.0: "Cheese Pizza",
    1.0: "Burrito",
    7.0: "Nachos",
    24.0: "Chili",
    20.0: "Southwest Salad",
    25.0: "Roast Beef Sandwich",
}

In [17]:
df["meal_name"] = df["mealskew"].map(mealmap)

In [18]:
df.to_csv("Meal_Info.csv", index=False)

In [19]:
from pyspark.sql import SparkSession

In [20]:
spark = SparkSession.builder.appName("recconsulting").getOrCreate()

In [21]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

In [22]:
data = spark.read.csv("Meal_Info.csv", inferSchema=True, header=True)

In [26]:
data.show(10)

+-------+------+------+--------+--------------------+
|movieId|rating|userId|mealskew|           meal_name|
+-------+------+------+--------+--------------------+
|      2|   3.0|     0|     2.0|       Chicken Curry|
|      3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|      5|   2.0|     0|     5.0|           Hamburger|
|      9|   4.0|     0|     9.0|       Taco Surprise|
|     11|   1.0|     0|    11.0|            Meatloaf|
|     12|   2.0|     0|    12.0|        Ceaser Salad|
|     15|   1.0|     0|    15.0|            BBQ Ribs|
|     17|   1.0|     0|    17.0|         Sushi Plate|
|     19|   1.0|     0|    19.0|Cheesesteak Sandw...|
|     21|   1.0|     0|    21.0|             Lasagna|
+-------+------+------+--------+--------------------+
only showing top 10 rows



In [29]:
data = data.na.drop()

In [30]:
(training, test) = data.randomSplit([0.8, 0.2])

In [31]:
# Build the recommendation model using ALS on the training data
als = ALS(
    maxIter=5, regParam=0.01, userCol="userId", itemCol="mealskew", ratingCol="rating"
)
model = als.fit(training)
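
For intuition about what the ALS fit above is doing, here is a toy NumPy sketch of alternating least squares on a small dense matrix. It is an illustration only, not Spark's implementation, which is distributed and differs in its details. The function names, the 4x3 rating matrix, and the `rank`, `reg`, and `iters` defaults are all made up for the example.

```python
import numpy as np

def als_factorize(ratings, observed, rank=2, reg=0.1, iters=10, seed=0):
    """Toy ALS: factor `ratings` (users x items) using only observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    user_f = rng.normal(scale=0.1, size=(n_users, rank))
    item_f = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Hold item factors fixed; solve a small ridge regression per user.
        for u in range(n_users):
            seen = observed[u]
            if seen.any():
                v = item_f[seen]
                user_f[u] = np.linalg.solve(v.T @ v + ridge, v.T @ ratings[u, seen])
        # Hold user factors fixed; solve the same kind of problem per item.
        for i in range(n_items):
            seen = observed[:, i]
            if seen.any():
                w = user_f[seen]
                item_f[i] = np.linalg.solve(w.T @ w + ridge, w.T @ ratings[seen, i])
    return user_f, item_f

def observed_rmse(ratings, observed, user_f, item_f):
    """RMSE between predicted and actual ratings, over observed entries only."""
    err = (user_f @ item_f.T - ratings)[observed]
    return float(np.sqrt(np.mean(err ** 2)))

# Hypothetical 4-user x 3-meal rating matrix; 0 marks "not rated".
ratings = np.array([[3.0, 1.0, 0.0],
                    [0.0, 2.0, 4.0],
                    [1.0, 0.0, 5.0],
                    [2.0, 1.0, 0.0]])
observed = ratings > 0
user_f, item_f = als_factorize(ratings, observed)
print("RMSE on observed ratings:", observed_rmse(ratings, observed, user_f, item_f))
```

A predicted rating is the dot product of a user's factor vector and a meal's factor vector, so nothing constrains predictions to the 1-5 rating scale.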

In [32]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)

predictions.show()

evaluator = RegressionEvaluator(
    metricName="rmse", labelCol="rating", predictionCol="prediction"
)
rmse = evaluator.evaluate(predictions)
print(f"Root-mean-square error = {rmse}")

+-------+------+------+--------+--------------------+----------+
|movieId|rating|userId|mealskew|           meal_name|prediction|
+-------+------+------+--------+--------------------+----------+
|      7|   1.0|    28|     7.0|              Nachos| -2.255313|
|      1|   1.0|    26|     1.0|             Burrito|-0.6476444|
|      3|   1.0|    26|     3.0|Spicy Chicken Nug...|0.43380588|
|      5|   2.0|    26|     5.0|           Hamburger|0.14290446|
|      6|   3.0|    26|     6.0|  Spicy Pork Sliders| 0.8151277|
|      7|   5.0|    26|     7.0|              Nachos|-1.7353098|
|      6|   2.0|    22|     6.0|  Spicy Pork Sliders| 1.8123244|
|      6|   1.0|     1|     6.0|  Spicy Pork Sliders| 1.1201725|
|      2|   3.0|     6|     2.0|       Chicken Curry|0.14578642|
|      5|   3.0|    16|     5.0|           Hamburger| 0.5860579|
|      2|   1.0|     3|     2.0|       Chicken Curry|0.11936569|
|      0|   1.0|    19|     0.0|        Cheese Pizza| 0.5312605|
|      1|   1.0|    19|  