## Consulting Project
### Recommender Systems - Solutions

Word of your ability to analyze big data and build useful systems seems to be spreading! You've just taken on a contract with a new online food delivery company. The company wants to differentiate itself by recommending new meals to customers based on what other customers like.

Can you build them a recommendation system?

Your final result should be a function that takes a Spark DataFrame of a single customer's ratings for various meals and returns that customer's top 3 suggested meals.

Best of luck!


In [1]:
import findspark
findspark.init('/home/gerardo-rodriguez/spark-4.0.0-bin-hadoop3')

In [2]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('rec').getOrCreate()

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/08/21 10:34:47 WARN Utils: Your hostname, Lanz-Lenovo, resolves to a loopback address: 127.0.1.1; using 192.168.1.145 instead (on interface wlp2s0)
25/08/21 10:34:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/08/21 10:34:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/08/21 10:34:49 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


In [3]:
data = spark.read.csv('Meal_Info.csv', inferSchema=True, header=True)

In [4]:
data.show()

+-------+------+------+--------+--------------------+
|movieId|rating|userId|mealskew|           meal_name|
+-------+------+------+--------+--------------------+
|      2|   3.0|     0|     2.0|       Chicken Curry|
|      3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|      5|   2.0|     0|     5.0|           Hamburger|
|      9|   4.0|     0|     9.0|       Taco Surprise|
|     11|   1.0|     0|    11.0|            Meatloaf|
|     12|   2.0|     0|    12.0|        Ceaser Salad|
|     15|   1.0|     0|    15.0|            BBQ Ribs|
|     17|   1.0|     0|    17.0|         Sushi Plate|
|     19|   1.0|     0|    19.0|Cheesesteak Sandw...|
|     21|   1.0|     0|    21.0|             Lasagna|
|     23|   1.0|     0|    23.0|      Orange Chicken|
|     26|   3.0|     0|    26.0|    Spicy Beef Plate|
|     27|   1.0|     0|    27.0|Salmon with Mashe...|
|     28|   1.0|     0|    28.0| Penne Tomatoe Pasta|
|     29|   1.0|     0|    29.0|        Pork Sliders|
|     30|   1.0|     0|    3

In [5]:
data_not_null = data.na.drop()

In [6]:
data_not_null.describe().show()

25/08/21 10:34:56 WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.

+-------+------------------+------------------+-----------------+------------------+-------------------+
|summary|           movieId|            rating|           userId|          mealskew|          meal_name|
+-------+------------------+------------------+-----------------+------------------+-------------------+
|  count|               486|               486|              486|               486|                486|
|   mean|15.502057613168724|1.7366255144032923|14.46707818930041|15.502057613168724|               NULL|
| stddev| 9.250633630277568|1.1808507031723887| 8.56063554474528| 9.250633630277568|               NULL|
|    min|                 0|               1.0|                0|               0.0|           BBQ Ribs|
|    max|                31|               5.0|               29|              31.0|Vietnamese Sandwich|
+-------+------------------+------------------+-----------------+------------------+-------------------+

In [7]:
# movieId has the same summary statistics as mealskew above, so it is dropped as redundant
data_meal = data_not_null.drop('movieId')

In [8]:
data_meal.printSchema()

root
 |-- rating: double (nullable = true)
 |-- userId: integer (nullable = true)
 |-- mealskew: double (nullable = true)
 |-- meal_name: string (nullable = true)



In [9]:
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

In [10]:
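# 80/20 train/test split; no seed is passed, so the split (and the numbers below) vary between runs
train_data, test_data = data_meal.randomSplit([0.8,0.2])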

In [11]:
als = ALS(maxIter=20, seed=1, userCol='userId', itemCol='mealskew', ratingCol='rating', regParam=0.2)

In [12]:
model = als.fit(train_data)
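
For intuition about what the fitted model holds: ALS factorizes the rating matrix into one latent-factor vector per user and one per meal, and a predicted rating is the dot product of the two. The sketch below is plain Python with made-up factor values (they are not taken from this model), and `predict_rating` is a hypothetical helper name.

```python
# Hypothetical latent factors for one user and one meal. A fitted ALSModel
# exposes the real vectors through model.userFactors and model.itemFactors.
user_factors = [0.5, 1.0, 0.25]
meal_factors = [2.0, 0.5, 4.0]


def predict_rating(user_vec, item_vec):
    """Predicted rating is the dot product of the user and item vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


print(predict_rating(user_factors, meal_factors))
```

The real vectors have length `rank` (10 by default in Spark's ALS), and `regParam` penalizes large factor values during fitting.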

In [13]:
predictions = model.transform(test_data)
predictions.show()

+------+------+--------+--------------------+----------+
|rating|userId|mealskew|           meal_name|prediction|
+------+------+--------+--------------------+----------+
|   1.0|     1|       6|  Spicy Pork Sliders| 0.9793216|
|   1.0|     1|      19|Cheesesteak Sandw...| 0.7889347|
|   1.0|     6|      17|         Sushi Plate| 1.1809492|
|   1.0|     6|      18|     Pepperoni Pizza| 1.3014514|
|   1.0|     6|      22|   Pulled Pork Plate| 1.0156717|
|   1.0|     6|      26|    Spicy Beef Plate| 1.2119777|
|   1.0|     6|      30| Vietnamese Sandwich| 1.2920794|
|   1.0|     3|       1|             Burrito| 0.7265112|
|   1.0|     3|       9|       Taco Surprise|0.84264743|
|   1.0|     3|      26|    Spicy Beef Plate| 0.7425171|
|   1.0|     5|       4|Pretzels and Chee...| 1.6392684|
|   1.0|     4|      11|            Meatloaf|0.90501857|
|   1.0|     4|      23|      Orange Chicken| 1.0020273|
|   1.0|     4|      24|               Chili| 0.8317848|
|   1.0|     4|      31|       

In [14]:
eva = RegressionEvaluator(metricName='rmse', predictionCol='prediction', labelCol='rating')

print('RMSE')
eva.evaluate(predictions)

RMSE


1.0339823673121413
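
To make the metric concrete, RMSE is the square root of the mean squared difference between actual and predicted ratings. The sketch below recomputes it in plain Python for the first five rows of the predictions table shown earlier; `rmse` is a hypothetical helper, and because only five rows are used the result will differ from the full test-set value reported above.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# (rating, prediction) pairs copied from the first five rows of predictions.show()
sample_ratings = [1.0, 1.0, 1.0, 1.0, 1.0]
sample_predictions = [0.9793216, 0.7889347, 1.1809492, 1.3014514, 1.0156717]

print(rmse(sample_ratings, sample_predictions))
```

An RMSE near 1.0 on a 1 to 5 rating scale means predictions are off by about one rating point on average, which is fairly coarse for a dataset of 486 ratings.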

## Test for one user

In [16]:
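# Score only the meals user 2 rated in the test split; this is a sanity check
# on held-out ratings, not a recommendation of new meals.
user_data = test_data.filter(test_data['userId'] == 2).select(['userId', 'mealskew'])
prediction_user = model.transform(user_data)
prediction_user.orderBy('prediction', ascending=False).show()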

+------+--------+----------+
|userId|mealskew|prediction|
+------+--------+----------+
|     2|      28| 3.0870025|
|     2|      22| 1.5720611|
|     2|      18| 1.5212991|
|     2|      19| 0.8480537|
|     2|      12|0.65938646|
|     2|      10| 0.2963645|
+------+--------+----------+
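
The cell above only scores meals that user 2 already rated in the test split. The task asks for a function that returns a customer's top 3 suggested meals, and the notebook does not show one, so the following is a minimal sketch under stated assumptions. `recommend_top3` is a hypothetical name. It expects the fitted `model` and a DataFrame containing every meal (for example `data_meal`), and it assumes the customer's `userId` appeared in the training data. ALS cannot score unseen users: with the default `coldStartStrategy` their predictions come back as NaN, which the sketch drops.

```python
def recommend_top3(model, user_ratings, all_meals, n=3):
    """Return the n highest-predicted meals the customer has not rated yet.

    model:        fitted ALSModel (userCol='userId', itemCol='mealskew')
    user_ratings: Spark DataFrame with one customer's rows (userId, mealskew, ...)
    all_meals:    Spark DataFrame listing every meal (mealskew, meal_name)
    """
    # The single customer's id, as a one-row DataFrame
    user = user_ratings.select('userId').distinct()

    # Every known meal, minus the ones this customer has already rated
    catalog = all_meals.select('mealskew', 'meal_name').distinct()
    rated = user_ratings.select('mealskew').distinct()
    unrated = catalog.join(rated, on='mealskew', how='left_anti')

    # Pair the customer with each unrated meal and score the pairs
    candidates = user.crossJoin(unrated)
    scored = model.transform(candidates)

    # Drop NaN predictions (cold start), then keep the n best
    return (scored.na.drop(subset=['prediction'])
                  .orderBy('prediction', ascending=False)
                  .limit(n))
```

With the objects defined in this notebook, a call could look like `recommend_top3(model, data_meal.filter(data_meal['userId'] == 2), data_meal).show()`. The anti-join is a design choice: it keeps already-rated meals out of the suggestions, which matches the goal of recommending new meals.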

