# Consulting Project 
## Recommender Systems

The whole world seems to be hearing about your amazing ability to analyze big data and build useful systems with it! You've just taken on a contract with a new online food delivery company. The company is trying to differentiate itself by recommending new meals to customers based on other customers' preferences.

Can you build them a recommendation system?

Your final result should be in the form of a function that can take in a Spark DataFrame of a single customer's ratings for various meals and output their top 3 suggested meals.

# Spark Session

In [14]:
from pyspark.sql import SparkSession

In [15]:
spark = SparkSession.builder.appName('meal_recommendation').getOrCreate()

In [17]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

# Reading Data

In [18]:
data = spark.read.csv('Meal_Info.csv',inferSchema=True,header=True)

In [19]:
data.show()

+------+------+------+--------+--------------------+
|mealId|rating|userId|mealskey|           meal_name|
+------+------+------+--------+--------------------+
|     2|   3.0|     0|     2.0|       Chicken Curry|
|     3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|     5|   2.0|     0|     5.0|           Hamburger|
|     9|   4.0|     0|     9.0|       Taco Surprise|
|    11|   1.0|     0|    11.0|            Meatloaf|
|    12|   2.0|     0|    12.0|        Ceaser Salad|
|    15|   1.0|     0|    15.0|            BBQ Ribs|
|    17|   1.0|     0|    17.0|         Sushi Plate|
|    19|   1.0|     0|    19.0|Cheesesteak Sandw...|
|    21|   1.0|     0|    21.0|             Lasagna|
|    23|   1.0|     0|    23.0|      Orange Chicken|
|    26|   3.0|     0|    26.0|    Spicy Beef Plate|
|    27|   1.0|     0|    27.0|Salmon with Mashe...|
|    28|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|    29|   1.0|     0|    29.0|        Pork Sliders|
|    30|   1.0|     0|    30.0| Vietnamese San

# Preprocessing

## Note: we don't need mealId; we will work only with mealskey

In [20]:
df = data.drop("mealId")

In [21]:
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   3.0|     0|     2.0|       Chicken Curry|
|   1.0|     0|     3.0|Spicy Chicken Nug...|
|   2.0|     0|     5.0|           Hamburger|
|   4.0|     0|     9.0|       Taco Surprise|
|   1.0|     0|    11.0|            Meatloaf|
|   2.0|     0|    12.0|        Ceaser Salad|
|   1.0|     0|    15.0|            BBQ Ribs|
|   1.0|     0|    17.0|         Sushi Plate|
|   1.0|     0|    19.0|Cheesesteak Sandw...|
|   1.0|     0|    21.0|             Lasagna|
|   1.0|     0|    23.0|      Orange Chicken|
|   3.0|     0|    26.0|    Spicy Beef Plate|
|   1.0|     0|    27.0|Salmon with Mashe...|
|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|   1.0|     0|    29.0|        Pork Sliders|
|   1.0|     0|    30.0| Vietnamese Sandwich|
|   1.0|     0|    31.0|        Chicken Wrap|
|   1.0|     0|    null|       Cowboy Burger|
|   1.0|     0|    null|       Cow

In [62]:
df.count()

622

## Dropping duplicates

In [60]:
df = df.dropDuplicates(["rating", "userId", "mealskey", "meal_name"])

In [61]:
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   1.0|    22|      16|    Fried Rice Plate|
|   1.0|    15|      13|Mandarin Chicken ...|
|   2.0|    16|      30| Vietnamese Sandwich|
|   1.0|    22|      26|    Spicy Beef Plate|
|   4.0|    22|      32|       Cowboy Burger|
|   1.0|    23|      12|        Ceaser Salad|
|   5.0|    11|      27|Salmon with Mashe...|
|   3.0|    12|      32|       Cowboy Burger|
|   1.0|     5|      27|Salmon with Mashe...|
|   1.0|     6|      28| Penne Tomatoe Pasta|
|   1.0|    23|      28| Penne Tomatoe Pasta|
|   1.0|    27|      29|        Pork Sliders|
|   1.0|     3|      19|Cheesesteak Sandw...|
|   1.0|     5|       4|Pretzels and Chee...|
|   2.0|     7|      32|       Cowboy Burger|
|   1.0|    18|      20|     Southwest Salad|
|   1.0|    20|      32|       Cowboy Burger|
|   2.0|     8|       4|Pretzels and Chee...|
|   1.0|    18|       1|          

In [63]:
df.count()

622

# Filling the missing values in **mealskey**

In [22]:
# Count the number of unique meal names
unique_meal_count = df.select("meal_name").distinct().count()

print("Number of unique meal names:", unique_meal_count)

Number of unique meal names: 33


In [23]:
# Note: this import shadows Python's built-in max() for the rest of the notebook
from pyspark.sql.functions import max
max_val = df.select(max(df['mealskey'])).collect()
print(max_val[0][0])

31.0


In [25]:
# Note: this import shadows Python's built-in min() for the rest of the notebook
from pyspark.sql.functions import min
min_val = df.select(min(df['mealskey'])).collect()
print(min_val[0][0])

0.0


In [24]:
# Count the number of unique meal keys (distinct() counts null as a value)
unique_meal_count = df.select("mealskey").distinct().count()

print("Number of unique meal keys:", unique_meal_count)

Number of unique meal keys: 33


In [26]:
# Give rows with a missing mealskey the next unused key (max + 1, i.e. 32.0 here)
df = df.na.fill(df.select(max(df['mealskey'])).collect()[0][0] + 1, ['mealskey'])

In [27]:
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   3.0|     0|     2.0|       Chicken Curry|
|   1.0|     0|     3.0|Spicy Chicken Nug...|
|   2.0|     0|     5.0|           Hamburger|
|   4.0|     0|     9.0|       Taco Surprise|
|   1.0|     0|    11.0|            Meatloaf|
|   2.0|     0|    12.0|        Ceaser Salad|
|   1.0|     0|    15.0|            BBQ Ribs|
|   1.0|     0|    17.0|         Sushi Plate|
|   1.0|     0|    19.0|Cheesesteak Sandw...|
|   1.0|     0|    21.0|             Lasagna|
|   1.0|     0|    23.0|      Orange Chicken|
|   3.0|     0|    26.0|    Spicy Beef Plate|
|   1.0|     0|    27.0|Salmon with Mashe...|
|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|   1.0|     0|    29.0|        Pork Sliders|
|   1.0|     0|    30.0| Vietnamese Sandwich|
|   1.0|     0|    31.0|        Chicken Wrap|
|   1.0|     0|    32.0|       Cowboy Burger|
|   1.0|     0|    32.0|       Cow

In [29]:
from pyspark.sql.functions import col, when

In [30]:
df = df.withColumn("mealskey", col("mealskey").cast("int"))
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   3.0|     0|       2|       Chicken Curry|
|   1.0|     0|       3|Spicy Chicken Nug...|
|   2.0|     0|       5|           Hamburger|
|   4.0|     0|       9|       Taco Surprise|
|   1.0|     0|      11|            Meatloaf|
|   2.0|     0|      12|        Ceaser Salad|
|   1.0|     0|      15|            BBQ Ribs|
|   1.0|     0|      17|         Sushi Plate|
|   1.0|     0|      19|Cheesesteak Sandw...|
|   1.0|     0|      21|             Lasagna|
|   1.0|     0|      23|      Orange Chicken|
|   3.0|     0|      26|    Spicy Beef Plate|
|   1.0|     0|      27|Salmon with Mashe...|
|   1.0|     0|      28| Penne Tomatoe Pasta|
|   1.0|     0|      29|        Pork Sliders|
|   1.0|     0|      30| Vietnamese Sandwich|
|   1.0|     0|      31|        Chicken Wrap|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cow

# Train/Test Split

In [31]:
(training, test) = df.randomSplit([0.8, 0.2])

In [32]:
training.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   1.0|     0|       3|Spicy Chicken Nug...|
|   1.0|     0|      11|            Meatloaf|
|   1.0|     0|      17|         Sushi Plate|
|   1.0|     0|      19|Cheesesteak Sandw...|
|   1.0|     0|      21|             Lasagna|
|   1.0|     0|      23|      Orange Chicken|
|   1.0|     0|      28| Penne Tomatoe Pasta|
|   1.0|     0|      29|        Pork Sliders|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     0|      32|       Cow

In [66]:
training.count()

518

In [64]:
training = training.dropDuplicates(["rating", "userId", "mealskey", "meal_name"])

In [65]:
training.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   1.0|    22|      16|    Fried Rice Plate|
|   1.0|    15|      13|Mandarin Chicken ...|
|   1.0|    23|      12|        Ceaser Salad|
|   2.0|    16|      30| Vietnamese Sandwich|
|   4.0|    22|      32|       Cowboy Burger|
|   3.0|    12|      32|       Cowboy Burger|
|   5.0|    11|      27|Salmon with Mashe...|
|   1.0|     5|      27|Salmon with Mashe...|
|   1.0|     6|      28| Penne Tomatoe Pasta|
|   1.0|    23|      28| Penne Tomatoe Pasta|
|   1.0|     3|      19|Cheesesteak Sandw...|
|   1.0|     5|       4|Pretzels and Chee...|
|   1.0|    18|      20|     Southwest Salad|
|   1.0|    20|      32|       Cowboy Burger|
|   2.0|     7|      32|       Cowboy Burger|
|   1.0|    18|       1|             Burrito|
|   2.0|     8|       4|Pretzels and Chee...|
|   1.0|    14|       9|       Taco Surprise|
|   1.0|    26|       2|       Chi

In [67]:
training.count()

518

In [37]:
# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="mealskey", ratingCol="rating")
model = als.fit(training)

In [38]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)

predictions.show()

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

+------+------+--------+--------------------+------------+
|rating|userId|mealskey|           meal_name|  prediction|
+------+------+--------+--------------------+------------+
|   1.0|     1|       3|Spicy Chicken Nug...|   2.0916033|
|   1.0|     1|      32|       Cowboy Burger|   1.5863817|
|   1.0|     1|      32|       Cowboy Burger|   1.5863817|
|   1.0|     1|      32|       Cowboy Burger|   1.5863817|
|   1.0|     1|      32|       Cowboy Burger|   1.5863817|
|   1.0|     1|      32|       Cowboy Burger|   1.5863817|
|   1.0|     1|      32|       Cowboy Burger|   1.5863817|
|   1.0|     3|       1|             Burrito|   0.6205207|
|   1.0|     3|      32|       Cowboy Burger|   1.8170124|
|   1.0|     3|      32|       Cowboy Burger|   1.8170124|
|   1.0|     3|      32|       Cowboy Burger|   1.8170124|
|   1.0|     4|       6|  Spicy Pork Sliders|  0.58239216|
|   1.0|     2|      10|   Roasted Eggplant |   1.6947582|
|   1.0|     2|      32|       Cowboy Burger|   1.992288
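
For reference, the metric requested with `metricName="rmse"` is the root-mean-square error. Written out, with $r_i$ the observed rating in test row $i$, $\hat{r}_i$ the model's prediction for that row, and $N$ the number of test rows (these symbols are introduced here for illustration and do not appear in the notebook):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( r_i - \hat{r}_i \right)^2}
$$

Lower is better, and the value is expressed in the same units as the ratings.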

# <span style="color:red">Improving the predictions</span>

In [39]:
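# Truncate each prediction to a whole number (cast to int drops the fraction) and replace negative results with 0
truncated = col("prediction").cast("int").cast("float")
predictions_2 = predictions.withColumn("prediction", when(truncated >= 0, truncated).otherwise(0))
predictions_2.show()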

+------+------+--------+--------------------+----------+
|rating|userId|mealskey|           meal_name|prediction|
+------+------+--------+--------------------+----------+
|   1.0|    28|       7|              Nachos|       2.0|
|   1.0|    28|      32|       Cowboy Burger|       1.0|
|   1.0|    28|      32|       Cowboy Burger|       1.0|
|   1.0|    28|      32|       Cowboy Burger|       1.0|
|   1.0|    28|      32|       Cowboy Burger|       1.0|
|   2.0|    28|      32|       Cowboy Burger|       1.0|
|   3.0|    28|       0|        Cheese Pizza|       1.0|
|   3.0|    28|      19|Cheesesteak Sandw...|       2.0|
|   3.0|    28|      24|               Chili|       1.0|
|   3.0|    28|      32|       Cowboy Burger|       1.0|
|   1.0|    26|       1|             Burrito|       0.0|
|   1.0|    26|      32|       Cowboy Burger|       1.0|
|   1.0|    27|       9|       Taco Surprise|       1.0|
|   1.0|    27|      25| Roast Beef Sandwich|       0.0|
|   1.0|    27|      29|       

In [40]:
rmse = evaluator.evaluate(predictions_2)
print("Root-mean-square error = " + str(rmse))

Root-mean-square error = 1.484591990388747
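
Cell In [39] truncates each prediction toward zero with `cast("int")` and replaces negative results with 0 before the RMSE is recomputed. As a rough illustration of what that post-processing does to the metric, the pure-Python sketch below applies the same rule to six distinct rows copied from the first predictions table, all of which have a true rating of 1.0. The helper names `rmse` and `truncate_and_floor` are hypothetical and not part of the notebook, and six rows say nothing definitive about the full test set.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def truncate_and_floor(prediction):
    """Mimic In [39]: drop the fractional part, then map negatives to 0."""
    truncated = float(int(prediction))  # int() truncates toward zero, like cast("int")
    return truncated if truncated >= 0 else 0.0

# Six distinct rows copied from the first predictions table; every rating is 1.0
ratings = [1.0] * 6
raw = [2.0916033, 1.5863817, 0.6205207, 1.8170124, 0.58239216, 1.6947582]
post = [truncate_and_floor(p) for p in raw]

print("raw RMSE:      ", rmse(ratings, raw))
print("truncated RMSE:", rmse(ratings, post))
```

On these six rows the two values come out close to each other, which suggests that truncation mainly changes the form of the predictions rather than their accuracy. Rounding to the nearest whole rating, or clipping to the observed rating range, would be alternative post-processing rules worth comparing on the full test set.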


# Creating the Output Function

In [57]:
def output(single_user):
    # Score each (userId, mealskey) row; transform() adds a "prediction" column
    recommendations = model.transform(single_user)
    recommendations = recommendations.orderBy('prediction', ascending=False)
    recommendations.show()
    # Select the top 3 recommended meals (single_user must include a meal_name column)
    top_3_recommendations = recommendations.select("mealskey", "meal_name").limit(3)
    return top_3_recommendations

In [58]:
single_user = test.filter(test['userId']==11).select(['mealskey','userId'])

In [59]:
output(single_user).show()

+--------+------+----------+
|mealskey|userId|prediction|
+--------+------+----------+
|      25|    11| 2.3875515|
|      32|    11|   2.18957|
|      32|    11|   2.18957|
|      32|    11|   2.18957|
|      32|    11|   2.18957|
|      32|    11|   2.18957|
|      32|    11|   2.18957|
|      32|    11|   2.18957|
|      32|    11|   2.18957|
|      22|    11| 0.9296804|
|      11|    11|0.27436477|
+--------+------+----------+



AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `meal_name` cannot be resolved. Did you mean one of the following? [`mealskey`, `prediction`, `userId`].;
'Project [mealskey#252, 'meal_name]
+- Sort [prediction#833 DESC NULLS LAST], true
   +- Project [mealskey#252, userId#122, UDF(features#480, features#492) AS prediction#833]
      +- Join LeftOuter, (CASE WHEN isnull(mealskey#252) THEN cast(raise_error(mealskey Ids MUST NOT be Null, NullType) as int) ELSE mealskey#252 END = id#491)
         :- Join LeftOuter, (CASE WHEN isnull(userId#122) THEN cast(raise_error(userId Ids MUST NOT be Null, NullType) as int) ELSE userId#122 END = id#479)
         :  :- Project [mealskey#252, userId#122]
         :  :  +- Filter (userId#122 = 11)
         :  :     +- Sample 0.8, 1.0, false, 995409244936955056
         :  :        +- Sort [rating#121 ASC NULLS FIRST, userId#122 ASC NULLS FIRST, mealskey#252 ASC NULLS FIRST, meal_name#124 ASC NULLS FIRST], false
         :  :           +- Project [rating#121, userId#122, cast(mealskey#226 as int) AS mealskey#252, meal_name#124]
         :  :              +- Project [rating#121, userId#122, coalesce(nanvl(mealskey#123, cast(null as double)), cast(32.0 as double)) AS mealskey#226, meal_name#124]
         :  :                 +- Project [rating#121, userId#122, mealskey#123, meal_name#124]
         :  :                    +- Relation [mealId#120,rating#121,userId#122,mealskey#123,meal_name#124] csv
         :  +- Project [_1#474 AS id#479, _2#475 AS features#480]
         :     +- SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#474, staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(FloatType,false), fromPrimitiveArray, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false, true) AS _2#475]
         :        +- ExternalRDD [obj#473]
         +- Project [_1#486 AS id#491, _2#487 AS features#492]
            +- SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#486, staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(FloatType,false), fromPrimitiveArray, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false, true) AS _2#487]
               +- ExternalRDD [obj#485]
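
The `AnalysisException` above arises because `single_user` was built with only the `mealskey` and `userId` columns, so `meal_name` does not exist when the function tries to select it. The table printed just before the error also shows meal 32 repeated many times, so a plain `limit(3)` could return the same meal more than once. Below is a minimal sketch of one possible fix, assuming that meal names can be looked up from a DataFrame holding `mealskey` and `meal_name` (for example the full `df`). The function name `top_meals` and its parameters `meal_names` and `n` are hypothetical and not part of the original notebook.

```python
def top_meals(model, single_user, meal_names, n=3):
    """Return the n highest-scoring meals for one customer.

    model       -- a fitted ALS model (anything exposing .transform)
    single_user -- DataFrame with at least userId and mealskey columns
    meal_names  -- DataFrame with mealskey and meal_name columns
    """
    # Keep one name per key so the join below cannot multiply rows
    names = meal_names.select("mealskey", "meal_name").dropDuplicates(["mealskey"])

    scored = (
        model.transform(single_user)             # adds a "prediction" column
        .drop("meal_name")                       # no-op when the column is absent
        .dropDuplicates(["userId", "mealskey"])  # repeated rows share one score
    )

    return (
        scored.join(names, on="mealskey", how="left")
        .orderBy("prediction", ascending=False)
        .select("mealskey", "meal_name", "prediction")
        .limit(n)
    )
```

With the objects defined earlier in the notebook, this could be called as `top_meals(model, single_user, df).show()`. Like the original `output` function, it only ranks the meals present in the DataFrame passed in; scoring meals the customer has not rated yet would require building a DataFrame of those (userId, mealskey) pairs first.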
