# Consulting Project 
## Recommender Systems

The whole world seems to be hearing about your amazing ability to analyze big data and build useful systems from it! You've just taken on a contract with a new online food delivery company. This company is trying to differentiate itself by recommending new meals to customers based on other customers' likings.

Can you build them a recommendation system?

Your final result should be in the form of a function that can take in a Spark DataFrame of a single customer's ratings for various meals and output their top 3 suggested meals.

# Spark Session

In [1]:
from pyspark.sql import SparkSession

In [2]:
spark = SparkSession.builder.appName('meal_recommendation').getOrCreate()


In [3]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

# Reading Data

In [4]:
data = spark.read.csv('Meal_Info.csv',inferSchema=True,header=True)

In [5]:
data.show()

+------+------+------+--------+--------------------+
|mealId|rating|userId|mealskey|           meal_name|
+------+------+------+--------+--------------------+
|     2|   3.0|     0|     2.0|       Chicken Curry|
|     3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|     5|   2.0|     0|     5.0|           Hamburger|
|     9|   4.0|     0|     9.0|       Taco Surprise|
|    11|   1.0|     0|    11.0|            Meatloaf|
|    12|   2.0|     0|    12.0|        Ceaser Salad|
|    15|   1.0|     0|    15.0|            BBQ Ribs|
|    17|   1.0|     0|    17.0|         Sushi Plate|
|    19|   1.0|     0|    19.0|Cheesesteak Sandw...|
|    21|   1.0|     0|    21.0|             Lasagna|
|    23|   1.0|     0|    23.0|      Orange Chicken|
|    26|   3.0|     0|    26.0|    Spicy Beef Plate|
|    27|   1.0|     0|    27.0|Salmon with Mashe...|
|    28|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|    29|   1.0|     0|    29.0|        Pork Sliders|
|    30|   1.0|     0|    30.0| Vietnamese San

# Preprocessing

## For reference: we don't need mealId, so we will work with mealskey only

In [6]:
df = data.drop("mealId")

In [7]:
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   3.0|     0|     2.0|       Chicken Curry|
|   1.0|     0|     3.0|Spicy Chicken Nug...|
|   2.0|     0|     5.0|           Hamburger|
|   4.0|     0|     9.0|       Taco Surprise|
|   1.0|     0|    11.0|            Meatloaf|
|   2.0|     0|    12.0|        Ceaser Salad|
|   1.0|     0|    15.0|            BBQ Ribs|
|   1.0|     0|    17.0|         Sushi Plate|
|   1.0|     0|    19.0|Cheesesteak Sandw...|
|   1.0|     0|    21.0|             Lasagna|
|   1.0|     0|    23.0|      Orange Chicken|
|   3.0|     0|    26.0|    Spicy Beef Plate|
|   1.0|     0|    27.0|Salmon with Mashe...|
|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|   1.0|     0|    29.0|        Pork Sliders|
|   1.0|     0|    30.0| Vietnamese Sandwich|
|   1.0|     0|    31.0|        Chicken Wrap|
|   1.0|     0|    null|       Cowboy Burger|
|   1.0|     0|    null|       Cow

In [8]:
df.count()

1501

## Dropping duplicates

In [9]:
df = df.dropDuplicates(["rating", "userId", "mealskey", "meal_name"])

In [10]:
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   3.0|     3|    null|       Cowboy Burger|
|   1.0|     8|     9.0|       Taco Surprise|
|   1.0|    24|    null|       Cowboy Burger|
|   3.0|    29|    17.0|         Sushi Plate|
|   1.0|     4|     8.0|   Chicken Chow Mein|
|   1.0|     4|    11.0|            Meatloaf|
|   1.0|     5|    null|       Cowboy Burger|
|   2.0|     8|     3.0|Spicy Chicken Nug...|
|   3.0|    13|    29.0|        Pork Sliders|
|   1.0|    26|    null|       Cowboy Burger|
|   1.0|    28|     6.0|  Spicy Pork Sliders|
|   3.0|     6|    null|       Cowboy Burger|
|   3.0|    14|    31.0|        Chicken Wrap|
|   2.0|    20|    30.0| Vietnamese Sandwich|
|   5.0|     7|    25.0| Roast Beef Sandwich|
|   3.0|    26|    18.0|     Pepperoni Pizza|
|   1.0|     4|    17.0|         Sushi Plate|
|   3.0|     5|    null|       Cowboy Burger|
|   2.0|     6|    null|       Cow

In [11]:
df.count()

622

# Filling the missing values in the **mealskey**

In [12]:
# Count the number of unique meal names
unique_meal_count = df.select("meal_name").distinct().count()

print("Number of unique meal names:", unique_meal_count)

Number of unique meal names: 33


In [13]:
from pyspark.sql.functions import max
max_val = df.select(max(df['mealskey'])).collect()
print(max_val[0][0])

31.0


In [14]:
from pyspark.sql.functions import min
min_val = df.select(min(df['mealskey'])).collect()
print(min_val[0][0])

0.0


In [15]:
# Count the number of unique meal keys (a null key counts as one distinct value)
unique_meal_count = df.select("mealskey").distinct().count()

print("Number of unique meal keys:", unique_meal_count)

Number of unique meal keys: 33


In [16]:
# Fill every missing mealskey with max + 1; this assumes all null-key rows belong to the same meal
df = df.na.fill(df.select(max(df['mealskey'])).collect()[0][0]+1,['mealskey'])

In [17]:
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   3.0|     3|    32.0|       Cowboy Burger|
|   1.0|     8|     9.0|       Taco Surprise|
|   1.0|    24|    32.0|       Cowboy Burger|
|   3.0|    29|    17.0|         Sushi Plate|
|   1.0|     4|     8.0|   Chicken Chow Mein|
|   1.0|     4|    11.0|            Meatloaf|
|   1.0|     5|    32.0|       Cowboy Burger|
|   2.0|     8|     3.0|Spicy Chicken Nug...|
|   3.0|    13|    29.0|        Pork Sliders|
|   1.0|    26|    32.0|       Cowboy Burger|
|   1.0|    28|     6.0|  Spicy Pork Sliders|
|   3.0|     6|    32.0|       Cowboy Burger|
|   3.0|    14|    31.0|        Chicken Wrap|
|   2.0|    20|    30.0| Vietnamese Sandwich|
|   5.0|     7|    25.0| Roast Beef Sandwich|
|   3.0|    26|    18.0|     Pepperoni Pizza|
|   1.0|     4|    17.0|         Sushi Plate|
|   3.0|     5|    32.0|       Cowboy Burger|
|   2.0|     6|    32.0|       Cow
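
Filling every null with `max + 1` gives all previously keyless rows the same key, which is only safe if they all belong to one meal (in the rows shown, every null-key row is Cowboy Burger). A minimal plain-Python sketch of that check and fill follows; it uses a few hand-copied rows and a hypothetical helper name, not the Spark API.

```python
# Plain-Python sketch (not Spark): rows are (meal_name, mealskey) pairs,
# with None standing in for a null key. The helper name is hypothetical.

def fill_missing_keys(rows):
    """Give every keyless row the next unused key, provided they share one meal name."""
    known_keys = [key for _, key in rows if key is not None]
    keyless_names = {name for name, key in rows if key is None}
    if len(keyless_names) > 1:
        # A single fill value would merge different meals under one key.
        raise ValueError(f"several meals lack a key: {sorted(keyless_names)}")
    next_key = max(known_keys) + 1
    return [(name, next_key if key is None else key) for name, key in rows]


sample = [("Chicken Wrap", 31.0), ("Cowboy Burger", None), ("Taco Surprise", 9.0)]
filled = fill_missing_keys(sample)
print(filled)
```

If more than one meal name were missing a key, the sketch raises instead of silently merging them, which is the situation the StringIndexer approach later in this notebook avoids.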

In [18]:
from pyspark.sql.functions import col, when

In [19]:
df = df.withColumn("mealskey", col("mealskey").cast("int"))
df.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   3.0|     3|      32|       Cowboy Burger|
|   1.0|     8|       9|       Taco Surprise|
|   1.0|    24|      32|       Cowboy Burger|
|   3.0|    29|      17|         Sushi Plate|
|   1.0|     4|       8|   Chicken Chow Mein|
|   1.0|     4|      11|            Meatloaf|
|   1.0|     5|      32|       Cowboy Burger|
|   2.0|     8|       3|Spicy Chicken Nug...|
|   3.0|    13|      29|        Pork Sliders|
|   1.0|    26|      32|       Cowboy Burger|
|   1.0|    28|       6|  Spicy Pork Sliders|
|   3.0|     6|      32|       Cowboy Burger|
|   3.0|    14|      31|        Chicken Wrap|
|   2.0|    20|      30| Vietnamese Sandwich|
|   5.0|     7|      25| Roast Beef Sandwich|
|   3.0|    26|      18|     Pepperoni Pizza|
|   1.0|     4|      17|         Sushi Plate|
|   3.0|     5|      32|       Cowboy Burger|
|   2.0|     6|      32|       Cow

# Split Train Test

In [20]:
(training, test) = df.randomSplit([0.8, 0.2])
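
`randomSplit` draws a new random partition on each run unless a seed is supplied (it accepts an optional `seed` argument), so the counts and RMSE in the cells that follow can vary between runs. A plain-Python sketch of why a fixed seed makes a split repeatable; `random_split` is a hypothetical helper, not the Spark implementation.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to one split at random, in proportion to weights."""
    rng = random.Random(seed)  # fixed seed: same assignment on every run
    total = sum(weights)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random() * total
        cumulative = 0.0
        for index, weight in enumerate(weights):
            cumulative += weight
            if draw < cumulative:
                splits[index].append(row)
                break
    return splits


rows = list(range(100))
train_a, test_a = random_split(rows, [0.8, 0.2], seed=42)
train_b, test_b = random_split(rows, [0.8, 0.2], seed=42)
print(train_a == train_b and test_a == test_b)  # same seed, same split
print(len(train_a) + len(test_a) == len(rows))  # every row lands in one split
```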

In [21]:
training.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   1.0|     0|       3|Spicy Chicken Nug...|
|   1.0|     0|      11|            Meatloaf|
|   1.0|     0|      15|            BBQ Ribs|
|   1.0|     0|      17|         Sushi Plate|
|   1.0|     0|      21|             Lasagna|
|   1.0|     0|      23|      Orange Chicken|
|   1.0|     0|      27|Salmon with Mashe...|
|   1.0|     0|      28| Penne Tomatoe Pasta|
|   1.0|     0|      29|        Pork Sliders|
|   1.0|     0|      30| Vietnamese Sandwich|
|   1.0|     0|      31|        Chicken Wrap|
|   1.0|     0|      32|       Cowboy Burger|
|   1.0|     1|       3|Spicy Chicken Nug...|
|   1.0|     1|       6|  Spicy Pork Sliders|
|   1.0|     1|      12|        Ceaser Salad|
|   1.0|     1|      13|Mandarin Chicken ...|
|   1.0|     1|      14|    Kung Pao Chicken|
|   1.0|     1|      16|    Fried Rice Plate|
|   1.0|     1|      19|Cheesestea

In [22]:
training.count()

516

In [23]:
training = training.dropDuplicates(["rating", "userId", "mealskey", "meal_name"])

In [24]:
training.show()

+------+------+--------+--------------------+
|rating|userId|mealskey|           meal_name|
+------+------+--------+--------------------+
|   1.0|    22|      16|    Fried Rice Plate|
|   1.0|    15|      13|Mandarin Chicken ...|
|   1.0|    22|      26|    Spicy Beef Plate|
|   1.0|    23|      12|        Ceaser Salad|
|   2.0|    16|      30| Vietnamese Sandwich|
|   4.0|    22|      32|       Cowboy Burger|
|   3.0|    12|      32|       Cowboy Burger|
|   5.0|    11|      27|Salmon with Mashe...|
|   1.0|     5|      27|Salmon with Mashe...|
|   1.0|     6|      28| Penne Tomatoe Pasta|
|   1.0|    23|      28| Penne Tomatoe Pasta|
|   1.0|    27|      29|        Pork Sliders|
|   1.0|     3|      19|Cheesesteak Sandw...|
|   1.0|     5|       4|Pretzels and Chee...|
|   1.0|    18|      20|     Southwest Salad|
|   1.0|    20|      32|       Cowboy Burger|
|   2.0|     7|      32|       Cowboy Burger|
|   1.0|    18|       1|             Burrito|
|   1.0|    24|       4|Pretzels a

In [25]:
training.count()

516

In [26]:
# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="mealskey", ratingCol="rating")
model = als.fit(training)

In [27]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)

predictions.show()

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

+------+------+--------+--------------------+----------+
|rating|userId|mealskey|           meal_name|prediction|
+------+------+--------+--------------------+----------+
|   1.0|     6|       0|        Cheese Pizza| 1.0385565|
|   1.0|     6|       6|  Spicy Pork Sliders| 2.5688777|
|   1.0|     6|      18|     Pepperoni Pizza| 1.0703783|
|   1.0|     6|      26|    Spicy Beef Plate|  1.613658|
|   1.0|     5|       1|             Burrito|0.44783443|
|   1.0|     5|       8|   Chicken Chow Mein| 2.4487205|
|   1.0|     5|      31|        Chicken Wrap|0.33494687|
|   1.0|     9|       3|Spicy Chicken Nug...|  2.419892|
|   1.0|     9|      21|             Lasagna|    2.3312|
|   1.0|     4|       6|  Spicy Pork Sliders|0.73405176|
|   1.0|     4|       9|       Taco Surprise| 1.1558318|
|   1.0|     4|      11|            Meatloaf| 0.3491946|
|   1.0|     4|      24|               Chili| 1.3887255|
|   1.0|     8|       5|           Hamburger| 1.3479929|
|   1.0|     8|       7|       
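
RMSE is the square root of the mean squared difference between `rating` and `prediction`. As a plain-Python sketch of what the evaluator computes, here it is applied to the first four rows of the table above. Skipping NaN predictions is an assumption added in this sketch: ALS can emit NaN for users or meals absent from the training split, and a single NaN makes the whole metric NaN.

```python
import math


def rmse(ratings, predictions):
    """Root-mean-square error over pairs whose prediction is a real number."""
    pairs = [(r, p) for r, p in zip(ratings, predictions) if not math.isnan(p)]
    squared_errors = [(r - p) ** 2 for r, p in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First four rows of the predictions table (userId 6).
ratings = [1.0, 1.0, 1.0, 1.0]
predictions = [1.0385565, 2.5688777, 1.0703783, 1.613658]
print(rmse(ratings, predictions))
```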

## <span style="color:red">IMPROVING !!</span>

In [28]:
# Truncate each prediction to a whole number and replace negative values with 0
truncated = col("prediction").cast("int").cast("float")
predictions_2 = predictions.withColumn("prediction", when(truncated >= 0, truncated).otherwise(0))
predictions_2.show()

+------+------+--------+--------------------+----------+
|rating|userId|mealskey|           meal_name|prediction|
+------+------+--------+--------------------+----------+
|   1.0|     0|      19|Cheesesteak Sandw...|       1.0|
|   1.0|     2|       6|  Spicy Pork Sliders|       0.0|
|   1.0|     2|      32|       Cowboy Burger|       3.0|
|   1.0|     4|       6|  Spicy Pork Sliders|       0.0|
|   1.0|     4|       9|       Taco Surprise|       1.0|
|   1.0|     4|      11|            Meatloaf|       0.0|
|   1.0|     4|      24|               Chili|       1.0|
|   1.0|     5|       1|             Burrito|       0.0|
|   1.0|     5|       8|   Chicken Chow Mein|       2.0|
|   1.0|     5|      31|        Chicken Wrap|       0.0|
|   1.0|     6|       0|        Cheese Pizza|       1.0|
|   1.0|     6|       6|  Spicy Pork Sliders|       2.0|
|   1.0|     6|      18|     Pepperoni Pizza|       1.0|
|   1.0|     6|      26|    Spicy Beef Plate|       1.0|
|   1.0|     7|      18|     Pe

In [29]:
rmse = evaluator.evaluate(predictions_2)
print("Root-mean-square error = " + str(rmse))

Root-mean-square error = 1.5989383270110533
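
The cast to `int` truncates each prediction toward zero, so a prediction such as 0.448 becomes 0 even though the lowest rating in the rows shown is 1.0. One alternative, sketched here in plain Python, is to clip predictions to the rating scale without discarding the fractional part. The 1 to 5 bounds are an assumption read off the displayed ratings, not something the dataset documents.

```python
def truncate_and_floor(prediction):
    """Mirror the notebook's step: drop the fractional part, then floor at 0."""
    return float(max(int(prediction), 0))


def clip_to_scale(prediction, low=1.0, high=5.0):
    """Alternative: keep the fraction but stay within the assumed rating scale."""
    return min(max(prediction, low), high)


# Raw predictions copied from the earlier predictions table.
for raw in [0.44783443, 2.4487205, 1.0385565]:
    print(raw, truncate_and_floor(raw), clip_to_scale(raw))
```

Whether either transformation actually lowers RMSE depends on the split, so it is worth comparing both against the untransformed predictions on the same test set.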


## <span style="color:green">IMPROVED !!</span>

# CREATING THE OUTPUT FUNCTION

In [30]:
def output(single_user):
    """Return the 3 meal names with the highest predicted rating among the rows of single_user."""
    # Score each (userId, mealskey) row with the fitted ALS model
    recommendations = model.transform(single_user)
    recommendations = recommendations.orderBy('prediction', ascending=False)
    # Select the top 3 recommended meals
    top_3_recommendations = recommendations.select("meal_name").limit(3).collect()
    top_3_meal_names = [row.meal_name for row in top_3_recommendations]

    return top_3_meal_names

In [31]:
single_user = test.filter(test['userId']==11)

In [32]:
output(single_user)

['Cowboy Burger', 'Taco Surprise', 'Pepperoni Pizza']
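
Note that `output` ranks only the meals present in the DataFrame it receives, which here are meals the customer has already rated in the test split. To suggest new meals, the candidates would need to be meals the customer has not rated. A plain-Python sketch of that selection step, with made-up scores and hypothetical names standing in for the model's predictions:

```python
def top_unseen_meals(predicted_scores, already_rated, k=3):
    """Return the k best-scoring meal names the customer has not rated yet."""
    candidates = {
        meal: score
        for meal, score in predicted_scores.items()
        if meal not in already_rated
    }
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:k]


# Illustrative numbers only; they are not model output.
predicted_scores = {
    "Cowboy Burger": 4.1,
    "Taco Surprise": 3.6,
    "Pepperoni Pizza": 3.2,
    "Lasagna": 2.9,
    "Chili": 1.4,
}
already_rated = {"Cowboy Burger", "Pepperoni Pizza"}
print(top_unseen_meals(predicted_scores, already_rated))
```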

#  GOOD WORK !

-------------------------------------------------------

# Further work 

### Because some rows have a null mealskey, we want to be sure that every meal name has a corresponding key. To do that we use **StringIndexer** to encode the meal names; the cells below show how it works

In [33]:
from pyspark.ml.feature import StringIndexer

In [34]:
data.show()

+------+------+------+--------+--------------------+
|mealId|rating|userId|mealskey|           meal_name|
+------+------+------+--------+--------------------+
|     2|   3.0|     0|     2.0|       Chicken Curry|
|     3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|     5|   2.0|     0|     5.0|           Hamburger|
|     9|   4.0|     0|     9.0|       Taco Surprise|
|    11|   1.0|     0|    11.0|            Meatloaf|
|    12|   2.0|     0|    12.0|        Ceaser Salad|
|    15|   1.0|     0|    15.0|            BBQ Ribs|
|    17|   1.0|     0|    17.0|         Sushi Plate|
|    19|   1.0|     0|    19.0|Cheesesteak Sandw...|
|    21|   1.0|     0|    21.0|             Lasagna|
|    23|   1.0|     0|    23.0|      Orange Chicken|
|    26|   3.0|     0|    26.0|    Spicy Beef Plate|
|    27|   1.0|     0|    27.0|Salmon with Mashe...|
|    28|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|    29|   1.0|     0|    29.0|        Pork Sliders|
|    30|   1.0|     0|    30.0| Vietnamese San

In [35]:
meal_indexer = StringIndexer(inputCol='meal_name',outputCol='mealIndex') 

In [36]:
indexed = meal_indexer.fit(data).transform(data)

indexed.show()

+------+------+------+--------+--------------------+---------+
|mealId|rating|userId|mealskey|           meal_name|mealIndex|
+------+------+------+--------+--------------------+---------+
|     2|   3.0|     0|     2.0|       Chicken Curry|      5.0|
|     3|   1.0|     0|     3.0|Spicy Chicken Nug...|     27.0|
|     5|   2.0|     0|     5.0|           Hamburger|     25.0|
|     9|   4.0|     0|     9.0|       Taco Surprise|     14.0|
|    11|   1.0|     0|    11.0|            Meatloaf|     29.0|
|    12|   2.0|     0|    12.0|        Ceaser Salad|      7.0|
|    15|   1.0|     0|    15.0|            BBQ Ribs|      4.0|
|    17|   1.0|     0|    17.0|         Sushi Plate|     28.0|
|    19|   1.0|     0|    19.0|Cheesesteak Sandw...|      8.0|
|    21|   1.0|     0|    21.0|             Lasagna|      9.0|
|    23|   1.0|     0|    23.0|      Orange Chicken|     16.0|
|    26|   3.0|     0|    26.0|    Spicy Beef Plate|     22.0|
|    27|   1.0|     0|    27.0|Salmon with Mashe...|   

<span style="color:red">NOW WE ARE SURE THAT EVERY MEAL HAS A UNIQUE KEY</span>
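
By default `StringIndexer` orders labels by descending frequency, which is consistent with Cowboy Burger, the name that appears most often, receiving index 0. A plain-Python sketch of that assignment; breaking ties alphabetically is an assumption of this sketch.

```python
from collections import Counter


def index_labels(labels):
    """Map each distinct label to an index, most frequent label first."""
    counts = Counter(labels)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(index) for index, label in enumerate(ordered)}


names = ["Cowboy Burger", "Chili", "Cowboy Burger", "Lasagna", "Cowboy Burger", "Chili"]
mapping = index_labels(names)
print(mapping)
print([mapping[name] for name in names])
```

Because the index is derived from the name alone, two rows with the same `meal_name` always get the same key, and no name is left without one.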

In [37]:
# Drop the original mealskey and mealId, then rename mealIndex to mealskey
indexed = indexed.drop("mealskey","mealId")
indexed = indexed.withColumnRenamed(existing="mealIndex", new="mealskey")
indexed.show()

+------+------+--------------------+--------+
|rating|userId|           meal_name|mealskey|
+------+------+--------------------+--------+
|   3.0|     0|       Chicken Curry|     5.0|
|   1.0|     0|Spicy Chicken Nug...|    27.0|
|   2.0|     0|           Hamburger|    25.0|
|   4.0|     0|       Taco Surprise|    14.0|
|   1.0|     0|            Meatloaf|    29.0|
|   2.0|     0|        Ceaser Salad|     7.0|
|   1.0|     0|            BBQ Ribs|     4.0|
|   1.0|     0|         Sushi Plate|    28.0|
|   1.0|     0|Cheesesteak Sandw...|     8.0|
|   1.0|     0|             Lasagna|     9.0|
|   1.0|     0|      Orange Chicken|    16.0|
|   3.0|     0|    Spicy Beef Plate|    22.0|
|   1.0|     0|Salmon with Mashe...|    19.0|
|   1.0|     0| Penne Tomatoe Pasta|    30.0|
|   1.0|     0|        Pork Sliders|     1.0|
|   1.0|     0| Vietnamese Sandwich|    23.0|
|   1.0|     0|        Chicken Wrap|    15.0|
|   1.0|     0|       Cowboy Burger|     0.0|
|   1.0|     0|       Cowboy Burge

In [38]:
indexed.printSchema()

root
 |-- rating: double (nullable = true)
 |-- userId: integer (nullable = true)
 |-- meal_name: string (nullable = true)
 |-- mealskey: double (nullable = false)



In [40]:
df = indexed.withColumn("mealskey", col("mealskey").cast("int"))

In [41]:
df.show()

+------+------+--------------------+--------+
|rating|userId|           meal_name|mealskey|
+------+------+--------------------+--------+
|   3.0|     0|       Chicken Curry|       5|
|   1.0|     0|Spicy Chicken Nug...|      27|
|   2.0|     0|           Hamburger|      25|
|   4.0|     0|       Taco Surprise|      14|
|   1.0|     0|            Meatloaf|      29|
|   2.0|     0|        Ceaser Salad|       7|
|   1.0|     0|            BBQ Ribs|       4|
|   1.0|     0|         Sushi Plate|      28|
|   1.0|     0|Cheesesteak Sandw...|       8|
|   1.0|     0|             Lasagna|       9|
|   1.0|     0|      Orange Chicken|      16|
|   3.0|     0|    Spicy Beef Plate|      22|
|   1.0|     0|Salmon with Mashe...|      19|
|   1.0|     0| Penne Tomatoe Pasta|      30|
|   1.0|     0|        Pork Sliders|       1|
|   1.0|     0| Vietnamese Sandwich|      23|
|   1.0|     0|        Chicken Wrap|      15|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burge

In [42]:
(training, test) = df.randomSplit([0.8, 0.2])

In [43]:
training.show()

+------+------+--------------------+--------+
|rating|userId|           meal_name|mealskey|
+------+------+--------------------+--------+
|   1.0|     0|            BBQ Ribs|       4|
|   1.0|     0|Cheesesteak Sandw...|       8|
|   1.0|     0|        Chicken Wrap|      15|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|       Cowboy Burger|       0|
|   1.0|     0|            Meatloa

In [44]:
training = training.dropDuplicates(["rating", "userId", "mealskey", "meal_name"])
test = test.dropDuplicates(["rating", "userId", "mealskey", "meal_name"])
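
Because the split in cell 42 happens before duplicates are dropped, identical rows can land on both sides; for example, `(2.0, 11, Cowboy Burger)` appears in both the training and test tables shown below. Rows like these mean the model is partly scored on data it was trained on. A plain-Python sketch of the overlap check, with rows copied from those tables (the helper name is hypothetical):

```python
def overlapping_rows(training_rows, test_rows):
    """Return rows that appear in both splits, i.e. potential leakage."""
    return sorted(set(training_rows) & set(test_rows))


# (rating, userId, meal_name) tuples copied from the two tables.
training_rows = [
    (1.0, 15, "Roast Beef Sandwich"),
    (2.0, 11, "Cowboy Burger"),
    (4.0, 28, "Cowboy Burger"),
    (2.0, 6, "Cowboy Burger"),
]
test_rows = [
    (2.0, 11, "Cowboy Burger"),
    (4.0, 28, "Cowboy Burger"),
    (5.0, 15, "Cowboy Burger"),
    (2.0, 6, "Cowboy Burger"),
]
leaked = overlapping_rows(training_rows, test_rows)
print(leaked)
```

One way to avoid this in Spark would be to call `dropDuplicates` on `df` before `randomSplit`, so each distinct row can fall into only one split.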

In [45]:
training.show()

+------+------+--------------------+--------+
|rating|userId|           meal_name|mealskey|
+------+------+--------------------+--------+
|   1.0|    15| Roast Beef Sandwich|      21|
|   3.0|    26|     Pepperoni Pizza|      17|
|   1.0|    11|             Lasagna|       9|
|   1.0|    22|    Spicy Beef Plate|      22|
|   2.0|    11|       Cowboy Burger|       0|
|   2.0|    22|           Hamburger|      25|
|   4.0|    28|       Cowboy Burger|       0|
|   1.0|    15|       Chicken Curry|       5|
|   4.0|     6|       Cowboy Burger|       0|
|   1.0|    20|    Spicy Beef Plate|      22|
|   1.0|    23|       Chicken Curry|       5|
|   2.0|     6|       Cowboy Burger|       0|
|   1.0|    11| Roast Beef Sandwich|      21|
|   1.0|    28|            BBQ Ribs|       4|
|   1.0|    19|        Chicken Wrap|      15|
|   1.0|    12|    Kung Pao Chicken|       6|
|   2.0|    21| Vietnamese Sandwich|      23|
|   4.0|    26|Pretzels and Chee...|      10|
|   5.0|    11|       Cowboy Burge

In [46]:
test.show()

+------+------+--------------------+--------+
|rating|userId|           meal_name|mealskey|
+------+------+--------------------+--------+
|   2.0|    11|       Cowboy Burger|       0|
|   4.0|    28|       Cowboy Burger|       0|
|   5.0|    15|       Cowboy Burger|       0|
|   4.0|    23| Vietnamese Sandwich|      23|
|   1.0|    11|        Cheese Pizza|      11|
|   2.0|     6|       Cowboy Burger|       0|
|   1.0|     0|      Orange Chicken|      16|
|   1.0|    27| Roast Beef Sandwich|      21|
|   2.0|    20|       Chicken Curry|       5|
|   1.0|     6|           Hamburger|      25|
|   1.0|     8|            BBQ Ribs|       4|
|   1.0|    10|               Chili|      20|
|   1.0|    10|       Cowboy Burger|       0|
|   1.0|    21|        Cheese Pizza|      11|
|   1.0|    26|Salmon with Mashe...|      19|
|   1.0|     3|    Fried Rice Plate|      31|
|   3.0|    13|       Cowboy Burger|       0|
|   4.0|    25|       Cowboy Burger|       0|
|   1.0|     3|            BBQ Rib

In [47]:
# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="mealskey", ratingCol="rating")
model = als.fit(training)

In [48]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)

predictions.show()

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

+------+------+--------------------+--------+----------+
|rating|userId|           meal_name|mealskey|prediction|
+------+------+--------------------+--------+----------+
|   2.0|    11|       Cowboy Burger|       0| 3.0069284|
|   4.0|    28|       Cowboy Burger|       0| 3.0166168|
|   5.0|    15|       Cowboy Burger|       0| 2.0681782|
|   4.0|    23| Vietnamese Sandwich|      23| 2.4728475|
|   1.0|    11|        Cheese Pizza|      11|0.37630486|
|   2.0|     6|       Cowboy Burger|       0| 2.4545834|
|   1.0|     0|      Orange Chicken|      16| 4.4007587|
|   1.0|    27| Roast Beef Sandwich|      21|  1.920905|
|   2.0|    20|       Chicken Curry|       5| 1.9496555|
|   1.0|     6|           Hamburger|      25| 1.7315533|
|   1.0|     8|            BBQ Ribs|       4| 1.3841314|
|   1.0|    10|               Chili|      20| 1.8290526|
|   1.0|    10|       Cowboy Burger|       0|  2.532238|
|   1.0|    21|        Cheese Pizza|      11|0.18825442|
|   1.0|    26|Salmon with Mash

In [49]:
# Truncate each prediction to a whole number and replace negative values with 0
truncated = col("prediction").cast("int").cast("float")
predictions_2 = predictions.withColumn("prediction", when(truncated >= 0, truncated).otherwise(0))
predictions_2.show()
rmse = evaluator.evaluate(predictions_2)
print("Root-mean-square error = " + str(rmse))

+------+------+--------------------+--------+----------+
|rating|userId|           meal_name|mealskey|prediction|
+------+------+--------------------+--------+----------+
|   2.0|    11|       Cowboy Burger|       0|       3.0|
|   4.0|    28|       Cowboy Burger|       0|       3.0|
|   5.0|    15|       Cowboy Burger|       0|       2.0|
|   4.0|    23| Vietnamese Sandwich|      23|       2.0|
|   1.0|    11|        Cheese Pizza|      11|       0.0|
|   2.0|     6|       Cowboy Burger|       0|       2.0|
|   1.0|     0|      Orange Chicken|      16|       4.0|
|   1.0|    27| Roast Beef Sandwich|      21|       1.0|
|   2.0|    20|       Chicken Curry|       5|       1.0|
|   1.0|     6|           Hamburger|      25|       1.0|
|   1.0|     8|            BBQ Ribs|       4|       1.0|
|   1.0|    10|               Chili|      20|       1.0|
|   1.0|    10|       Cowboy Burger|       0|       2.0|
|   1.0|    21|        Cheese Pizza|      11|       0.0|
|   1.0|    26|Salmon with Mash

In [50]:
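# Note: single_user was built in cell 31 from the earlier test split, so its mealskey values use the original keys, not the StringIndexer keys this retrained model expects
output(single_user)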

['Cowboy Burger', 'Taco Surprise', 'Pepperoni Pizza']

# <span style="color:green">PROJECT DONE</span>