# Movie Recommendation Engine - using PySpark

This project builds a movie recommendation engine from the MovieLens dataset using PySpark. The data is available for download from MovieLens.

## Import required modules

In [93]:
from pyspark.sql import SparkSession
from pyspark import SparkConf
from pyspark import SparkContext


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

import os

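# Build (or reuse) a local Spark session; the master has to be set before the session is created
spark = SparkSession.builder.appName('week11') \
        .master("local[*]") \
        .getOrCreate()

# Reuse the session's context rather than configuring a second one
sc = spark.sparkContext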

### 1. Movie Recommendation Engine

a. Prepare Data

Load the data from the ratings.csv and movies.csv files and join them on movieId. The resulting dataset should contain all of the user ratings along with the movie titles.

### Importing Data

In [5]:
# Create file paths including filenames
file_path = r'/home/ram/Documents/650/week11/movielens'


In [6]:
ratings = spark.read.csv(os.path.join(file_path,'ratings.csv'), header = True)


In [7]:
ratings.printSchema()

root
 |-- userId: string (nullable = true)
 |-- movieId: string (nullable = true)
 |-- rating: string (nullable = true)
 |-- timestamp: string (nullable = true)



In [8]:
ratings.show(5)

+------+-------+------+---------+
|userId|movieId|rating|timestamp|
+------+-------+------+---------+
|     1|      1|   4.0|964982703|
|     1|      3|   4.0|964981247|
|     1|      6|   4.0|964982224|
|     1|     47|   5.0|964983815|
|     1|     50|   5.0|964982931|
+------+-------+------+---------+
only showing top 5 rows



In [9]:
movies = spark.read.csv(os.path.join(file_path,'movies.csv'), header = True)


In [10]:
movies.show(5)

+-------+--------------------+--------------------+
|movieId|               title|              genres|
+-------+--------------------+--------------------+
|      1|    Toy Story (1995)|Adventure|Animati...|
|      2|      Jumanji (1995)|Adventure|Childre...|
|      3|Grumpier Old Men ...|      Comedy|Romance|
|      4|Waiting to Exhale...|Comedy|Drama|Romance|
|      5|Father of the Bri...|              Comedy|
+-------+--------------------+--------------------+
only showing top 5 rows



In [11]:
movies.printSchema()

root
 |-- movieId: string (nullable = true)
 |-- title: string (nullable = true)
 |-- genres: string (nullable = true)



### Joining Dataframes

In [14]:
joinexpression =  [ratings.movieId == movies.movieId]
joinType = "inner"

In [48]:
# Join ratings to movies and drop the duplicate movieId column from the movies side
ratings_movies = ratings.join(movies, joinexpression, joinType)\
                  .drop(movies.movieId)

ratings_movies.show(5)

+------+-------+------+---------+--------------------+--------------------+
|userId|movieId|rating|timestamp|               title|              genres|
+------+-------+------+---------+--------------------+--------------------+
|     1|      1|   4.0|964982703|    Toy Story (1995)|Adventure|Animati...|
|     1|      3|   4.0|964981247|Grumpier Old Men ...|      Comedy|Romance|
|     1|      6|   4.0|964982224|         Heat (1995)|Action|Crime|Thri...|
|     1|     47|   5.0|964983815|Seven (a.k.a. Se7...|    Mystery|Thriller|
|     1|     50|   5.0|964982931|Usual Suspects, T...|Crime|Mystery|Thr...|
+------+-------+------+---------+--------------------+--------------------+
only showing top 5 rows
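To make the join semantics concrete, here is a minimal plain-Python sketch of an inner join on movieId. It does not use Spark; the helper name `inner_join` and the unmatched id `999999` are hypothetical and exist only for illustration. A rating whose movieId has no match in the movies table is dropped, and a movie that nobody rated never appears in the result.

```python
# Toy rows standing in for ratings.csv and movies.csv.
ratings_rows = [
    {"userId": "1", "movieId": "1", "rating": "4.0"},
    {"userId": "1", "movieId": "999999", "rating": "2.0"},  # hypothetical id with no movie
]
movies_rows = [
    {"movieId": "1", "title": "Toy Story (1995)"},
    {"movieId": "2", "title": "Jumanji (1995)"},  # no rating for it in the toy data
]


def inner_join(left, right, key):
    """Hash join: index the right side by key, then emit one merged row per match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            merged = dict(left_row)
            # Copy every right-hand column except the key, so the key appears once.
            merged.update({k: v for k, v in right_row.items() if k != key})
            joined.append(merged)
    return joined


joined_rows = inner_join(ratings_rows, movies_rows, "movieId")
for row in joined_rows:
    print(row)
```

Only the rating for movieId 1 survives, which mirrors why the notebook drops `movies.movieId` after the join: both sides carry the same key, and one copy is enough.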



In [49]:
# Total number of records
ratings_movies.count()

100836

In [50]:
# Summary of combined dataset
ratings_movies.describe().show()

+-------+------------------+----------------+------------------+--------------------+--------------------+------------------+
|summary|            userId|         movieId|            rating|           timestamp|               title|            genres|
+-------+------------------+----------------+------------------+--------------------+--------------------+------------------+
|  count|            100836|          100836|            100836|              100836|              100836|            100836|
|   mean|326.12756356856676|19435.2957177992| 3.501556983616962|1.2059460873684695E9|                null|              null|
| stddev| 182.6184914635004|35530.9871987003|1.0425292390606342|2.1626103599513078E8|                null|              null|
|    min|                 1|               1|               0.5|          1000129365|"11'09""01 - Sept...|(no genres listed)|
|    max|                99|           99992|               5.0|           999873731|À nous la liberté...|           W

In [51]:
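# Note: na.fill returns a new DataFrame and the result is not assigned here, so ratings_movies is unchanged.
# A numeric fill value also only applies to numeric columns, and every column is still a string at this point.
ratings_movies.na.fill(0)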

DataFrame[userId: string, movieId: string, rating: string, timestamp: string, title: string, genres: string]

In [52]:
ratings_movies.printSchema()

root
 |-- userId: string (nullable = true)
 |-- movieId: string (nullable = true)
 |-- rating: string (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- title: string (nullable = true)
 |-- genres: string (nullable = true)
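Because every column is still a string at this point, the min and max rows in the summary above are ordered lexicographically rather than numerically. That is why the largest userId is reported as 99 and why the "min" timestamp looks larger than the "max" one. A small plain-Python sketch of the same effect, using values that appear in the outputs of this notebook:

```python
# Values taken from the notebook's outputs; all of them are strings, as in the DataFrame.
user_ids = ["1", "99", "610"]
timestamps = ["1000129365", "999873731"]

# String comparison goes character by character, so "99" beats "610".
string_max_user = max(user_ids)                   # "99"
numeric_max_user = max(int(u) for u in user_ids)  # 610

# "1000129365" starts with "1", so it sorts before "999873731".
string_min_ts = min(timestamps)                   # "1000129365"
numeric_min_ts = min(int(t) for t in timestamps)  # 999873731

print(string_max_user, numeric_max_user)
print(string_min_ts, numeric_min_ts)
```

Casting the columns to numeric types, as the next step does, makes these summaries meaningful.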



#### Changing data types of the columns

In [53]:
ratings_movies = ratings_movies\
                 .withColumn("userId", ratings_movies.userId.cast('int'))\
                 .withColumn("movieId", ratings_movies.movieId.cast('int'))\
                 .withColumn("rating", ratings_movies.rating.cast('float'))\
                 .withColumn("timestamp", ratings_movies.timestamp.cast('long'))

                


In [54]:
ratings_movies.printSchema()

root
 |-- userId: integer (nullable = true)
 |-- movieId: integer (nullable = true)
 |-- rating: float (nullable = true)
 |-- timestamp: long (nullable = true)
 |-- title: string (nullable = true)
 |-- genres: string (nullable = true)



### b. Train Recommender

Using the data you prepared in the last step, create a movie recommendation model using collaborative filtering. Spark’s collaborative filtering documentation provides a template for building and testing this model.

Before you train the recommendation model, split the data into a training dataset and a testing dataset using the randomSplit dataframe method. Use 80% of your data for training and 20% for testing.

After fitting your model using the training dataset, calculate the predictions on the test dataset and use the RegressionEvaluator to calculate the root-mean-square error of the model.

As a reminder, Spark’s collaborative filtering documentation will be helpful in completing this task.

In [18]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

#### Splitting dataset into train and test

In [55]:
train_dataset, test_dataset = ratings_movies.randomSplit([0.8, 0.2])
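The split above is unseeded, so each run produces different training and test sets and therefore a different RMSE. `randomSplit` accepts an optional `seed` argument that can make the split repeatable. The plain-Python sketch below illustrates the general idea of a weighted random split: each row is assigned independently, so the 80/20 proportions are approximate rather than exact. The function name `random_split` is hypothetical, and this is not Spark's actual sampling code.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row independently to one split, with probability proportional to its weight."""
    rng = random.Random(seed)
    total = float(sum(weights))

    # Cumulative upper bounds, e.g. [0.8, 1.0] for weights [0.8, 0.2].
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)

    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for k, bound in enumerate(bounds):
            if draw < bound:
                splits[k].append(row)
                break
        else:
            # Guard against floating-point rounding in the last bound.
            splits[-1].append(row)
    return splits


train_rows, test_rows = random_split(list(range(1000)), [0.8, 0.2], seed=42)
print(len(train_rows), len(test_rows))
```

With a fixed seed the same rows land in the same split on every call, which makes evaluation results comparable between runs.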

### Build the recommendation model using ALS on the training data

In [57]:
als = ALS(maxIter=5, regParam=0.01, userCol="userId", 
          itemCol="movieId", ratingCol="rating",
          coldStartStrategy="drop")



#### Fitting model

In [58]:
model = als.fit(train_dataset)
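For intuition about what fitting does, the following is a minimal plain-Python sketch of alternating least squares with a single latent factor per user and per item. It is an illustration under simplifying assumptions, not Spark's implementation: Spark learns several factors per user and item and applies its own regularisation scaling, while this sketch uses rank 1 and a plain L2 penalty. The function name `als_rank1` and the toy ratings are hypothetical.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iters=5):
    """Rank-1 ALS on (user, item, rating) triples; returns user factors u and item factors v.

    Minimises sum((r - u[i] * v[j]) ** 2) + reg * (sum(u ** 2) + sum(v ** 2))
    by alternately solving for u with v fixed, then for v with u fixed.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # With item factors fixed, each user's factor has a closed-form solution.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j, r) in ratings if a == i)
            den = reg + sum(v[j] ** 2 for (a, j, r) in ratings if a == i)
            u[i] = num / den
        # With user factors fixed, solve for each item's factor the same way.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b, r) in ratings if b == j)
            den = reg + sum(u[i] ** 2 for (i, b, r) in ratings if b == j)
            v[j] = num / den
    return u, v


# Toy ratings generated from user factors [1, 2] and item factors [2, 1, 1.5].
# The pair (user 1, item 2), whose true value is 3.0, is held out.
observed = [
    (0, 0, 2.0), (0, 1, 1.0), (0, 2, 1.5),
    (1, 0, 4.0), (1, 1, 2.0),
]
u, v = als_rank1(observed, n_users=2, n_items=3)
held_out_prediction = u[1] * v[2]
print(held_out_prediction)
```

The held-out prediction is the product of the learned user and item factors, which is the same idea `model.transform` applies to every (userId, movieId) pair in the test set.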

#### Generate predictions on the test data

In [59]:
predictions = model.transform(test_dataset)

#### Print predicted ratings from the Alternating Least Squares model

In [61]:
predictions.show(10)

+------+-------+------+----------+--------------------+------+----------+
|userId|movieId|rating| timestamp|               title|genres|prediction|
+------+-------+------+----------+--------------------+------+----------+
|   133|    471|   4.0| 843491793|Hudsucker Proxy, ...|Comedy| 3.4064474|
|   436|    471|   3.0| 833530187|Hudsucker Proxy, ...|Comedy| 2.5123086|
|   599|    471|   2.5|1498518822|Hudsucker Proxy, ...|Comedy| 2.8591874|
|   603|    471|   4.0| 954482443|Hudsucker Proxy, ...|Comedy| 2.3766131|
|   462|    471|   2.5|1123890831|Hudsucker Proxy, ...|Comedy| 2.7896168|
|   610|    471|   4.0|1479544381|Hudsucker Proxy, ...|Comedy|  4.359951|
|   555|    471|   3.0| 978746933|Hudsucker Proxy, ...|Comedy|  5.016017|
|   312|    471|   4.0|1043175564|Hudsucker Proxy, ...|Comedy|  3.824053|
|   469|    471|   5.0| 965425364|Hudsucker Proxy, ...|Comedy| 4.1452274|
|   608|    471|   1.5|1117161794|Hudsucker Proxy, ...|Comedy|  4.528984|
+------+-------+------+----------+--------------------+------+----------+

#### Evaluate the model by computing the RMSE on the test data

In [62]:
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")

rmse = evaluator.evaluate(predictions)

print("Root-mean-square error = " + str(rmse))

Root-mean-square error = 1.0836971923314576
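The RMSE is the square root of the mean squared difference between the actual ratings and the predicted ratings. As a rough sketch of the calculation that `RegressionEvaluator` reports, here is a plain-Python version. The helper name `rmse` is hypothetical, and the sample values are the first three rows of the `predictions.show(10)` output above, so the result is not comparable to the full test-set figure.

```python
import math


def rmse(labels, predictions):
    """Root-mean-square error between two equally long, non-empty sequences."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and the same length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First three (rating, prediction) pairs from the predictions table.
sample_ratings = [4.0, 3.0, 2.5]
sample_predictions = [3.4064474, 2.5123086, 2.8591874]
sample_rmse = rmse(sample_ratings, sample_predictions)
print(sample_rmse)
```

An RMSE slightly above 1.0 means a typical prediction is off by about one star on the 0.5 to 5.0 rating scale.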


### c. Generate top 10 movie recommendations

Using the recommendation model, generate the top ten recommendations for each user. Using the show method, print the recommendations for user IDs 127, 151, and 300. Do not truncate the results; call the show method like this: recommendations_127.show(truncate=False).

In [64]:
user_127 = ratings_movies.where("userId == 127")
user_127.show()

+------+-------+------+----------+--------------------+--------------------+
|userId|movieId|rating| timestamp|               title|              genres|
+------+-------+------+----------+--------------------+--------------------+
|   127|    272|   3.0|1263527148|Madness of King G...|        Comedy|Drama|
|   127|    903|   3.0|1263527011|      Vertigo (1958)|Drama|Mystery|Rom...|
|   127|    910|   4.0|1263527087|Some Like It Hot ...|        Comedy|Crime|
|   127|    913|   5.0|1263527066|Maltese Falcon, T...|   Film-Noir|Mystery|
|   127|   1276|   4.0|1263527076|Cool Hand Luke (1...|               Drama|
|   127|   1302|   2.0|1263527044|Field of Dreams (...|Children|Drama|Fa...|
|   127|   1747|   4.0|1263527080|  Wag the Dog (1997)|              Comedy|
|   127|   1947|   4.0|1263527174|West Side Story (...|Drama|Musical|Rom...|
|   127|   1962|   3.0|1263527180|Driving Miss Dais...|               Drama|
|   127|   2100|   2.0|1263527048|       Splash (1984)|Comedy|Fantasy|Ro...|

#### Generate top 10 movie recommendations for each user

In [66]:
userRecs = model.recommendForAllUsers(10)
userRecs.show(5)

+------+--------------------+
|userId|     recommendations|
+------+--------------------+
|   471|[[6818, 6.621679]...|
|   463|[[3925, 7.2095037...|
|   496|[[4437, 5.94772],...|
|   148|[[945, 6.072871],...|
|   540|[[6818, 6.951173]...|
+------+--------------------+
only showing top 5 rows



In [68]:
userRecs_127 = userRecs.where("userId == 127")
userRecs_127.selectExpr("userId", "explode(recommendations)").show()

+------+-----------------+
|userId|              col|
+------+-----------------+
|   127| [3089, 8.902822]|
|   127|[26171, 8.252793]|
|   127|[5048, 7.4772496]|
|   127|[7700, 7.4134827]|
|   127|[1147, 7.3185825]|
|   127| [1212, 6.720233]|
|   127|  [1464, 6.70711]|
|   127|  [82, 6.6343875]|
|   127|[7982, 6.5762467]|
|   127|[3812, 6.5515623]|
+------+-----------------+
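The `explode` call turns the single row holding a list of ten recommendations into ten rows, one per recommendation. A minimal plain-Python sketch of that reshaping, assuming rows are dictionaries (the helper name `explode` here is a stand-in written for this example, not the Spark function), using the first three recommendations for user 127 from the output above:

```python
# One row per user, with the recommendations stored as a list of (movieId, score) pairs.
user_recs = [
    {"userId": 127,
     "recommendations": [(3089, 8.902822), (26171, 8.252793), (5048, 7.4772496)]},
]


def explode(rows, column):
    """Emit one output row per element of rows[column], repeating the other fields."""
    exploded = []
    for row in rows:
        for element in row[column]:
            new_row = {k: v for k, v in row.items() if k != column}
            new_row["col"] = element  # Spark names the exploded column "col" by default
            exploded.append(new_row)
    return exploded


exploded_rows = explode(user_recs, "recommendations")
for row in exploded_rows:
    print(row)
```

The long format is easier to read and easier to join against the movies table than the nested list.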



In [71]:
userRecs_127.show(truncate=False)

+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|userId|recommendations                                                                                                                                                                         |
+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|127   |[[3089, 8.902822], [26171, 8.252793], [5048, 7.4772496], [7700, 7.4134827], [1147, 7.3185825], [1212, 6.720233], [1464, 6.70711], [82, 6.6343875], [7982, 6.5762467], [3812, 6.5515623]]|
+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+



In [69]:
userRecs_151 = userRecs.where("userId == 151")
userRecs_151.selectExpr("userId", "explode(recommendations)").show()

+------+------------------+
|userId|               col|
+------+------------------+
|   151| [7318, 7.3670363]|
|   151| [4857, 6.9862633]|
|   151| [3200, 6.6272507]|
|   151|[80693, 6.5644174]|
|   151|  [3435, 6.425026]|
|   151|  [3672, 6.299298]|
|   151|   [8810, 6.28189]|
|   151| [3213, 6.2036424]|
|   151|  [3088, 6.194953]|
|   151|[80906, 6.1843643]|
+------+------------------+



In [70]:
userRecs_300 = userRecs.where("userId == 300")
userRecs_300.selectExpr("userId", "explode(recommendations)").show()

+------+----------------+
|userId|             col|
+------+----------------+
|   300|[1260, 9.612408]|
|   300|[1147, 9.508877]|
|   300|[1212, 9.044866]|
|   300|[2730, 8.667578]|
|   300|[7982, 8.422904]|
|   300|[3983, 8.128663]|
|   300|[6643, 8.122883]|
|   300|[3089, 8.023607]|
|   300|[4334, 7.944483]|
|   300|[8638, 7.728187]|
+------+----------------+



#### Another approach: get recommendations for all 3 users in one go

In [116]:
# Create list of users that need recommendations

from pyspark.sql.types import IntegerType
users = spark.createDataFrame([127,151,300], IntegerType())\
        .withColumnRenamed("value", "userId")

In [117]:
users.show()

+------+
|userId|
+------+
|   127|
|   151|
|   300|
+------+



In [118]:
# From the fitted model, get 10 recommendations for the 3 users
Recommendations_3users = model.recommendForUserSubset(users, 10)
Recommendations_3users.show(5)

+------+--------------------+
|userId|     recommendations|
+------+--------------------+
|   300|[[1260, 9.612408]...|
|   127|[[3089, 8.902822]...|
|   151|[[7318, 7.3670363...|
+------+--------------------+



In [119]:
# printing recommendations for the 3 users
Recommendations_3users.selectExpr("userId", "explode(recommendations)").show(30)

+------+------------------+
|userId|               col|
+------+------------------+
|   300|  [1260, 9.612408]|
|   300|  [1147, 9.508877]|
|   300|  [1212, 9.044866]|
|   300|  [2730, 8.667578]|
|   300|  [7982, 8.422904]|
|   300|  [3983, 8.128663]|
|   300|  [6643, 8.122883]|
|   300|  [3089, 8.023607]|
|   300|  [4334, 7.944483]|
|   300|  [8638, 7.728187]|
|   127|  [3089, 8.902822]|
|   127| [26171, 8.252793]|
|   127| [5048, 7.4772496]|
|   127| [7700, 7.4134827]|
|   127| [1147, 7.3185825]|
|   127|  [1212, 6.720233]|
|   127|   [1464, 6.70711]|
|   127|   [82, 6.6343875]|
|   127| [7982, 6.5762467]|
|   127| [3812, 6.5515623]|
|   151| [7318, 7.3670363]|
|   151| [4857, 6.9862633]|
|   151| [3200, 6.6272507]|
|   151|[80693, 6.5644174]|
|   151|  [3435, 6.425026]|
|   151|  [3672, 6.299298]|
|   151|   [8810, 6.28189]|
|   151| [3213, 6.2036424]|
|   151|  [3088, 6.194953]|
|   151|[80906, 6.1843643]|
+------+------------------+



In [120]:
Recommendations_3users.take(1)

[Row(userId=300, recommendations=[Row(movieId=1260, rating=9.612407684326172), Row(movieId=1147, rating=9.50887680053711), Row(movieId=1212, rating=9.044865608215332), Row(movieId=2730, rating=8.667577743530273), Row(movieId=7982, rating=8.422904014587402), Row(movieId=3983, rating=8.128663063049316), Row(movieId=6643, rating=8.122882843017578), Row(movieId=3089, rating=8.02360725402832), Row(movieId=4334, rating=7.944482803344727), Row(movieId=8638, rating=7.728187084197998)])]

### Another approach to print the recommendations with movie names

In [123]:
# Movies already rated by user 151
user_rated = ratings.select('movieId', 'userId').where("userId == 151").distinct()

user_rated.show()

+-------+------+
|movieId|userId|
+-------+------+
|   1047|   151|
|    634|   151|
|    828|   151|
|    562|   151|
|    747|   151|
|    785|   151|
|    631|   151|
|    102|   151|
|    852|   151|
|   1367|   151|
|    761|   151|
|      3|   151|
|    788|   151|
|    784|   151|
|     95|   151|
|     12|   151|
|    880|   151|
|    842|   151|
|    216|   151|
|    135|   151|
+-------+------+
only showing top 20 rows



In [125]:
from pyspark.sql.functions import lit

# create a combination of all distinct movies (includes rated and non-rated) with the user
users_movies = ratings.select('movieId').distinct()\
                        .withColumn('userId', lit(151))

users_movies.show()

+-------+------+
|movieId|userId|
+-------+------+
|    296|   151|
|   1090|   151|
| 115713|   151|
|   3210|   151|
|  88140|   151|
|    829|   151|
|   2088|   151|
|   2294|   151|
|   4821|   151|
|  48738|   151|
|   3959|   151|
|  89864|   151|
|   2136|   151|
|    691|   151|
|   3606|   151|
| 121007|   151|
|   6731|   151|
|  27317|   151|
|  26082|   151|
| 100553|   151|
+-------+------+
only showing top 20 rows



In [127]:
# prepare users and not rated movies list
users_notrated = users_movies.subtract(user_rated).dropna()
users_notrated.show()

+-------+------+
|movieId|userId|
+-------+------+
|    296|   151|
|   1090|   151|
| 115713|   151|
|   3210|   151|
|  88140|   151|
|    829|   151|
|   2088|   151|
|   2294|   151|
|   4821|   151|
|  48738|   151|
|   3959|   151|
|  89864|   151|
|   2136|   151|
|    691|   151|
|   3606|   151|
| 121007|   151|
|   6731|   151|
|  27317|   151|
|  26082|   151|
| 100553|   151|
+-------+------+
only showing top 20 rows
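The `subtract` step is a set difference: start from every distinct movie paired with the user, then remove the pairs the user has already rated. It compares whole rows, which is why the constant userId column was added before subtracting. A minimal plain-Python sketch of the same idea, using a handful of movie ids that appear in the outputs above (the variable names are hypothetical):

```python
# A few distinct movie ids from the ratings data, kept as strings like the DataFrame columns.
all_movie_ids = {"296", "1090", "3", "12", "95"}

# Movies that user 151 has already rated, according to the user_rated output.
rated_by_user = {"3", "12", "95"}

# Candidates for recommendation: everything the user has not rated yet.
candidate_ids = sorted(all_movie_ids - rated_by_user, key=int)

# Pair each candidate with the user id, ready to be scored by the model.
user_id = 151
candidate_pairs = [(movie_id, user_id) for movie_id in candidate_ids]
print(candidate_pairs)
```

Scoring only unrated movies keeps the final list from recommending something the user has already seen.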



In [129]:
users_notrated.printSchema()

root
 |-- movieId: string (nullable = true)
 |-- userId: string (nullable = false)



In [130]:
users_notrated = users_notrated\
                 .withColumn("userId", users_notrated.userId.cast('int'))\
                 .withColumn("movieId", users_notrated.movieId.cast('int'))         

In [131]:
users_notrated.printSchema()

root
 |-- movieId: integer (nullable = true)
 |-- userId: integer (nullable = true)



In [133]:
# predict users ratings on these movies that were not already rated by the user, using the fitted ALS model

predictions_notrated = model.transform(users_notrated).orderBy('prediction', ascending = False)
predictions_notrated.show(10)

+-------+------+----------+
|movieId|userId|prediction|
+-------+------+----------+
|   7318|   151| 7.3670363|
|   4857|   151| 6.9862633|
|   3200|   151| 6.6272507|
|  80693|   151| 6.5644174|
|   3435|   151|  6.425026|
|   3672|   151|  6.299298|
|   8810|   151|   6.28189|
|   3213|   151| 6.2036424|
|   3088|   151|  6.194953|
|  80906|   151| 6.1843643|
+-------+------+----------+
only showing top 10 rows
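Two things are worth noting about this output. Ordering by prediction and taking the first rows is a top-N selection, and the predicted values are not bounded by the rating scale: several exceed 5.0. The plain-Python sketch below shows one way to pick the top N and, as an optional display step that the notebook does not perform, clip scores to the 0.5 to 5.0 range. The helper names `top_n` and `clip_rating` are hypothetical.

```python
import heapq


def top_n(predictions, n):
    """Return the n (movieId, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, predictions, key=lambda pair: pair[1])


def clip_rating(value, low=0.5, high=5.0):
    """Clamp a predicted score to the rating scale used by the dataset."""
    return max(low, min(high, value))


# A few (movieId, prediction) pairs from the table above, deliberately out of order.
scored = [(3200, 6.6272507), (7318, 7.3670363), (4857, 6.9862633)]

best_two = top_n(scored, 2)
clipped = [(movie_id, clip_rating(score)) for movie_id, score in best_two]
print(best_two)
print(clipped)
```

Ranking should happen before clipping, because clipping maps every score above 5.0 to the same value and would otherwise erase the ordering among the strongest candidates.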



In [134]:
# Joining with movies data set to get titles for movies

joinexpression =  [predictions_notrated.movieId == movies.movieId]
joinType = "inner"

recommendations = predictions_notrated.join(movies, joinexpression, joinType)\
                  .drop(movies.movieId)

recommendations.select(predictions_notrated.userId, predictions_notrated.movieId,
                      movies.title, predictions_notrated.prediction, movies.genres)\
                .orderBy('prediction', ascending = False).show(10)


+------+-------+--------------------+----------+--------------------+
|userId|movieId|               title|prediction|              genres|
+------+-------+--------------------+----------+--------------------+
|   151|   7318|Passion of the Ch...| 7.3670363|               Drama|
|   151|   4857|Fiddler on the Ro...| 6.9862633|       Drama|Musical|
|   151|   3200|Last Detail, The ...| 6.6272507|        Comedy|Drama|
|   151|  80693|It's Kind of a Fu...| 6.5644174|        Comedy|Drama|
|   151|   3435|Double Indemnity ...|  6.425026|Crime|Drama|Film-...|
|   151|   3672|        Benji (1974)|  6.299298|  Adventure|Children|
|   151|   8810|AVP: Alien vs. Pr...|   6.28189|Action|Horror|Sci...|
|   151|   3213|Batman: Mask of t...| 6.2036424|  Animation|Children|
|   151|   3088|       Harvey (1950)|  6.194953|      Comedy|Fantasy|
|   151|  80906|   Inside Job (2010)| 6.1843643|         Documentary|
+------+-------+--------------------+----------+--------------------+
only showing top 10 rows

In [135]:
# Create a function to automate the steps

def recommend(user, no_of_recom):
    """
    Print the top movie recommendations for one user.

    user: user id
    no_of_recom: number of recommendations to show for the given user
    Relies on the notebook-level `ratings`, `movies`, and fitted `model`.
    Returns None; the recommendations are printed with show().
    """

    user_rated= ratings.select('movieId','userId').where(ratings.userId == user).distinct()


    # create a combination of all distinct movies (includes rated and non-rated) with the user
    users_movies = ratings.select('movieId').distinct()\
                            .withColumn('userId', lit(user))


    # prepare users and not rated movies list
    users_notrated = users_movies.subtract(user_rated).dropna()
    

    # converting columns into int
    users_notrated = users_notrated\
                     .withColumn("userId", users_notrated.userId.cast('int'))\
                     .withColumn("movieId", users_notrated.movieId.cast('int'))         


    # predict users ratings on these movies that were not already rated by the user, using the fitted ALS model

    predictions_notrated = model.transform(users_notrated).orderBy('prediction', ascending = False)
    #predictions_notrated.show(10)



    # Joining with movies data set to get titles for movies
    joinexpression =  [predictions_notrated.movieId == movies.movieId]
    joinType = "inner"

    recommendations = predictions_notrated.join(movies, joinexpression, joinType)\
                      .drop(movies.movieId)

    recommendations.select(predictions_notrated.userId, predictions_notrated.movieId,
                          movies.title, predictions_notrated.prediction, movies.genres)\
                    .orderBy('prediction', ascending = False).show(no_of_recom)

    

In [137]:
# print recommendations for user 127 by calling "recommend" function

recommend(127,10)

+------+-------+--------------------+----------+--------------------+
|userId|movieId|               title|prediction|              genres|
+------+-------+--------------------+----------+--------------------+
|   127|   3089|Bicycle Thieves (...|  8.902822|               Drama|
|   127|  26171|Play Time (a.k.a....|  8.252793|              Comedy|
|   127|   5048|    Snow Dogs (2002)| 7.4772496|Adventure|Childre...|
|   127|   7700|Wages of Fear, Th...| 7.4134827|Action|Adventure|...|
|   127|   1147|When We Were King...| 7.3185825|         Documentary|
|   127|   1212|Third Man, The (1...|  6.720233|Film-Noir|Mystery...|
|   127|   1464| Lost Highway (1997)|   6.70711|Crime|Drama|Fanta...|
|   127|     82|Antonia's Line (A...| 6.6343875|        Comedy|Drama|
|   127|   7982|Tale of Two Siste...| 6.5762467|Drama|Horror|Myst...|
|   127|   3812|Everything You Al...| 6.5515623|              Comedy|
+------+-------+--------------------+----------+--------------------+
only showing top 10 rows

In [138]:
# print recommendations for user 151 by calling "recommend" function

recommend(151,10)

+------+-------+--------------------+----------+--------------------+
|userId|movieId|               title|prediction|              genres|
+------+-------+--------------------+----------+--------------------+
|   151|   7318|Passion of the Ch...| 7.3670363|               Drama|
|   151|   4857|Fiddler on the Ro...| 6.9862633|       Drama|Musical|
|   151|   3200|Last Detail, The ...| 6.6272507|        Comedy|Drama|
|   151|  80693|It's Kind of a Fu...| 6.5644174|        Comedy|Drama|
|   151|   3435|Double Indemnity ...|  6.425026|Crime|Drama|Film-...|
|   151|   3672|        Benji (1974)|  6.299298|  Adventure|Children|
|   151|   8810|AVP: Alien vs. Pr...|   6.28189|Action|Horror|Sci...|
|   151|   3213|Batman: Mask of t...| 6.2036424|  Animation|Children|
|   151|   3088|       Harvey (1950)|  6.194953|      Comedy|Fantasy|
|   151|  80906|   Inside Job (2010)| 6.1843643|         Documentary|
+------+-------+--------------------+----------+--------------------+
only showing top 10 rows

In [139]:
# print recommendations for user 300 by calling "recommend" function

recommend(300,10)

+------+-------+--------------------+----------+--------------------+
|userId|movieId|               title|prediction|              genres|
+------+-------+--------------------+----------+--------------------+
|   300|   1260|            M (1931)|  9.612408|Crime|Film-Noir|T...|
|   300|   1147|When We Were King...|  9.508877|         Documentary|
|   300|   1212|Third Man, The (1...|  9.044866|Film-Noir|Mystery...|
|   300|   2730| Barry Lyndon (1975)|  8.667578|   Drama|Romance|War|
|   300|   7982|Tale of Two Siste...|  8.422904|Drama|Horror|Myst...|
|   300|   3983|You Can Count on ...|  8.128663|       Drama|Romance|
|   300|   6643|Tokyo Story (Tôky...|  8.122883|               Drama|
|   300|   3089|Bicycle Thieves (...|  8.023607|               Drama|
|   300|   4334|        Yi Yi (2000)|  7.944483|               Drama|
|   300|   8638|Before Sunset (2004)|  7.728187|       Drama|Romance|
+------+-------+--------------------+----------+--------------------+
only showing top 10 rows