
# BIG DATA ASSIGNMENT WEEK 09
# Collaborative Filtering

- Name: Kurnia Cahya Febryanto
- Student ID: 5025201073
- Class: Big Data A
- Lecturer: Abdul Munif, S.Kom., M.Sc.

## Install & Init

In [42]:
!pip install pyspark



In [29]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row, SparkSession

In [30]:
# SparkSession Initialization
spark = SparkSession.builder \
    .master("local") \
    .appName("MovieLens") \
    .getOrCreate()

## Read and Convert Data

In [31]:
# Read data from a text file and separate elements of each line
lines = spark.read.text("./sample_data/sample_movielens_ratings.txt").rdd
parts = lines.map(lambda row: row.value.split("::"))
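Each line of the ratings file packs four fields separated by `::`. The split above, together with the type conversion in the next cell, can be illustrated without Spark. This is a minimal sketch: the helper name `parse_rating` and the sample line are assumptions made for illustration, not part of the notebook.

```python
# Pure-Python sketch of the "::" parsing step (no Spark required).
def parse_rating(line):
    """Split a 'userId::movieId::rating::timestamp' line into typed fields."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return {
        "userId": int(user_id),
        "movieId": int(movie_id),
        "rating": float(rating),
        "timestamp": int(timestamp),
    }

sample_line = "0::2::3::1424380312"  # hypothetical record in the expected layout
parsed = parse_rating(sample_line)
print(parsed)
```

The rating is converted to `float` because the regression evaluator used later compares it against real-valued predictions.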

In [32]:
# Convert data into a DataFrame with userId, movieId, rating, and timestamp columns
ratingsRDD = parts.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]), rating=float(p[2]), timestamp=int(p[3])))
ratings = spark.createDataFrame(ratingsRDD)

# Split the data into training (80%) and testing (20%) sets
(training, test) = ratings.randomSplit([0.8, 0.2])
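Because the split is random, the training and test sets (and therefore the RMSE) change on every run unless a seed is fixed; `randomSplit` accepts an optional `seed` argument for that purpose. The sketch below is a simplified pure-Python illustration of per-row random assignment, not Spark's actual implementation. The function `random_split` and the stand-in rows are hypothetical.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row to one bucket with probability proportional to its weight."""
    total = sum(weights)
    # Cumulative boundaries in [0, 1], e.g. [0.8, 1.0] for weights [0.8, 0.2]
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    buckets = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last bucket catches any draw not claimed earlier
            if draw < bound or i == len(bounds) - 1:
                buckets[i].append(row)
                break
    return buckets

rows = list(range(1000))  # stand-in for rating rows
train_rows, test_rows = random_split(rows, [0.8, 0.2], seed=42)
print(len(train_rows), len(test_rows))
```

The bucket sizes are only approximately 80% and 20%, since every row is assigned independently.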

## Build Recommendation Model Using ALS

In [33]:
# Build the recommendation model using ALS on the training data
# Note we set cold start strategy to 'drop' to ensure we don't get NaN evaluation metrics
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating", coldStartStrategy="drop")
model = als.fit(training)
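The cold start problem arises when the test set contains a user or movie that never appeared in the training set: the model has no factors for it, so the prediction is NaN. With `coldStartStrategy="drop"`, such rows are removed from the prediction output. A minimal sketch of that filtering idea, using made-up IDs and a placeholder `predict` function (both assumptions for illustration):

```python
import math

# Hypothetical IDs seen during training
known_users = {0, 1, 2}
known_items = {10, 11}

# Hypothetical test rows: (userId, movieId, rating)
test_rows = [(0, 10, 4.0), (1, 12, 3.0), (3, 11, 5.0), (2, 11, 1.0)]

def predict(user, item):
    """Return NaN for unseen users or items, as a model without factors for them would."""
    if user not in known_users or item not in known_items:
        return float("nan")
    return 3.0  # placeholder score; a real model would use learned factors

scored = [(u, i, r, predict(u, i)) for u, i, r in test_rows]
# Equivalent of the "drop" strategy: keep only rows with a usable prediction
kept = [row for row in scored if not math.isnan(row[3])]
print(len(scored), len(kept))
```

Movie 12 and user 3 are unknown to the model, so only two of the four rows survive.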

In [34]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

Root-mean-square error = 1.67731508609672
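RMSE is the square root of the mean squared difference between actual and predicted ratings, so it is expressed in the same units as the ratings themselves. The sketch below computes it by hand on made-up numbers; the helper `rmse_of` and the sample values are illustrative assumptions and do not reproduce the figure above.

```python
import math

def rmse_of(labels, predictions):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [3.0, 5.0, 1.0, 4.0]      # hypothetical true ratings
predicted = [3.0, 1.0, 1.0, 4.0]   # hypothetical model output
manual_rmse = rmse_of(actual, predicted)
print(manual_rmse)  # 2.0: one error of 4 among four rows -> sqrt(16 / 4)
```

A single large miss dominates the result because errors are squared before averaging.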


## Generate Movie Recommendations

In [35]:
# Generate top 10 movie recommendations for each user
userRecs = model.recommendForAllUsers(10)
# Generate top 10 user recommendations for each movie
movieRecs = model.recommendForAllItems(10)

In [36]:
# Generate top 10 movie recommendations for a specified set of users
users = ratings.select(als.getUserCol()).distinct().limit(3)
userSubsetRecs = model.recommendForUserSubset(users, 10)
# Generate top 10 user recommendations for a specified set of movies
movies = ratings.select(als.getItemCol()).distinct().limit(3)
movieSubSetRecs = model.recommendForItemSubset(movies, 10)
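An ALS model scores a (user, movie) pair as the dot product of the user's and the movie's latent factor vectors, and a top-N recommendation is the N highest-scoring movies for that user. The following is a minimal sketch of that ranking step with tiny made-up factor vectors; the names `user_factors`, `item_factors`, and `recommend` are hypothetical and the values are not taken from the fitted model.

```python
import heapq

# Hypothetical 2-dimensional latent factors
user_factors = {0: [1.0, 0.5]}
item_factors = {
    10: [2.0, 0.0],
    11: [0.0, 2.0],
    12: [1.0, 1.0],
    13: [3.0, 2.0],
}

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def recommend(user_id, k):
    """Return the k (movieId, score) pairs with the highest predicted score."""
    u = user_factors[user_id]
    scored = [(item, dot(u, f)) for item, f in item_factors.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

top2 = recommend(0, 2)
print(top2)  # [(13, 4.0), (10, 2.0)]
```

Recommending users for a movie works the same way with the roles of the two factor tables swapped.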

In [37]:
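# Alternative configuration for implicit feedback (ratings inferred from
# signals such as clicks or views). It is only defined here and never fitted;
# all recommendations printed below come from the explicit-feedback model above.
# A separate name avoids overwriting the estimator used to train that model.
als_implicit = ALS(maxIter=5, regParam=0.01, implicitPrefs=True,
                   userCol="userId", itemCol="movieId", ratingCol="rating")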

### Print Results and Show Output

In [38]:
# Print the top 10 movie recommendations for each user
print("Top 10 movie recommendations for each user:")
userRecs.show(truncate=False)

Top 10 movie recommendations for each user:
+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|userId|recommendations                                                                                                                                                          |
+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|20    |[{22, 4.5160255}, {94, 4.0931993}, {75, 3.964409}, {77, 3.870597}, {28, 3.4966948}, {47, 3.3421474}, {51, 3.1723955}, {90, 3.1660483}, {53, 3.0790522}, {52, 2.8231473}] |
|10    |[{87, 4.6298594}, {2, 3.9859674}, {27, 3.9471905}, {41, 3.7677686}, {48, 3.7607305}, {40, 3.734066}, {70, 3.6255636}, {32, 3.3321154}, {49, 3.2620907}, {42, 3.1131434}] |
|0     |[{92, 2.7471068}, {53, 2.6669827}, {52, 2.4254014}, {
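Each row of this output holds one user and a nested list of (movieId, predicted rating) pairs. For downstream use it is often easier to flatten that into one row per (user, movie) pair; in Spark this is typically done with `pyspark.sql.functions.explode`. The pure-Python sketch below shows the same reshaping on the first three recommendations for user 20 from the output above. The variable names are assumptions for illustration.

```python
# First three recommendations for user 20, copied from the printed output
user_recs = {20: [(22, 4.5160255), (94, 4.0931993), (75, 3.964409)]}

# One (userId, movieId, score) tuple per recommendation
flat = [
    (user, movie, score)
    for user, recs in user_recs.items()
    for movie, score in recs
]
movie_ids = [movie for _, movie, _ in flat]
print(movie_ids)  # [22, 94, 75]
```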

In [39]:
# Print the top 10 user recommendations for each movie
print("Top 10 user recommendations for each movie:")
movieRecs.show(truncate=False)

Top 10 user recommendations for each movie:
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|movieId|recommendations                                                                                                                                                        |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|20     |[{17, 4.5995255}, {12, 4.319873}, {23, 4.1532445}, {16, 3.429555}, {19, 3.3065374}, {5, 3.1627107}, {9, 2.8975656}, {29, 2.5077653}, {22, 2.497104}, {15, 2.210217}]   |
|40     |[{6, 3.8084114}, {10, 3.734066}, {2, 3.6265147}, {4, 3.057611}, {11, 2.6715837}, {21, 2.2000585}, {8, 2.1389294}, {7, 1.9825605}, {3, 1.9740173}, {13, 1.8978541}]     |
|10     |[{23, 4.0155606}, {17, 3.6326773}, {9, 3.3217201}, {12, 3

In [40]:
# Print the top 10 movie recommendations for a specified set of users
print("Top 10 movie recommendations for a specified set of users:")
userSubsetRecs.show(truncate=False)

Top 10 movie recommendations for a specified set of users:
+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|userId|recommendations                                                                                                                                                         |
+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|26    |[{53, 7.2609215}, {18, 6.837916}, {51, 6.7590694}, {74, 5.980879}, {79, 5.9293995}, {90, 5.489329}, {30, 5.451606}, {80, 5.241762}, {38, 5.0884852}, {22, 5.0597115}]   |
|19    |[{90, 3.9217904}, {32, 3.906753}, {75, 3.8865442}, {98, 3.7077284}, {94, 3.5674012}, {54, 3.424247}, {20, 3.3065374}, {17, 3.1672063}, {30, 3.0925796}, {74, 2.8209424}]|
|29    |[{46, 4.950326}, {23, 4.0825114}, {32, 3.90

In [41]:
# Print the top 10 user recommendations for a specified set of movies
print("Top 10 user recommendations for a specified set of movies:")
movieSubSetRecs.show(truncate=False)

Top 10 user recommendations for a specified set of movies:
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|movieId|recommendations                                                                                                                                                       |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|65     |[{23, 4.7112474}, {10, 2.379873}, {15, 2.1989417}, {29, 2.1959696}, {11, 2.155614}, {9, 2.113654}, {5, 2.0912316}, {3, 1.976464}, {17, 1.8776916}, {25, 1.8074851}]   |
|26     |[{17, 2.9240875}, {15, 2.9125147}, {12, 2.409271}, {23, 2.2540622}, {25, 2.2260208}, {1, 2.1222558}, {16, 2.0363817}, {24, 2.0110219}, {9, 1.5273601}, {5, 1.4693817}]|
|29     |[{8, 5.0909505}, {14, 4.7538977}, {6, 4.3845115