# ALS Recommender

Referenced ALS recommender created by Kevin Liao:
https://towardsdatascience.com/prototyping-a-recommender-system-step-by-step-part-2-alternating-least-square-als-matrix-4a76c58714a1

In [4]:
import re
import csv
import time
import gc
import math
import numpy as np

import findspark
# Find Spark Locally
location = findspark.find()
findspark.init(location, edit_rc=True)

import pyspark as ps    # for the pyspark suite
from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import IntegerType, StringType, FloatType, DateType, TimestampType
import pyspark.sql.functions as F

spark = ps.sql.SparkSession.builder \
            .master("local[4]") \
            .appName("anime recommender") \
            .getOrCreate()

sc = spark.sparkContext

from pyspark.ml.recommendation import ALS
from pyspark.mllib.evaluation import RegressionMetrics
from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col, lower
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

### Load data and initializing ALS model
---

In [5]:
path_anime = '../data/anime.csv'
path_ratings = '../data/rating.csv'

In [7]:
# anime_raw = sc.textFile(anime_filename)
# ratings_raw = sc.textFile(ratings_filename)
anime_raw = spark.read.load(path_anime,format='csv',header=True,inferSchema=True)
ratings_raw = spark.read.load(path_ratings,format='csv',header=True,inferSchema=True)

In [8]:
anime_df = anime_raw.select(['anime_id','name'])

In [9]:
ratings_df = ratings_raw.select(['user_id','anime_id','rating'])

In [10]:
# Define ALS model
model = ALS(
    userCol = 'user_id',
    itemCol = 'anime_id',
    ratingCol = 'rating',
    coldStartStrategy = 'drop'
)

In [11]:
# Split data for cross validation
train,val,test = ratings_df.randomSplit((0.6,0.2,0.2))

### Tuning Model

In [9]:
maxIter = 10
ranks = np.arange(1, 11, 1).tolist()
regParams = np.arange(.1,1.1,0.1).tolist()

In [17]:
best_model = tune_ALS(model,train,val,10,regParams,ranks)

10 latent factors and regularization = 0.1: validation RMSE is 2.0703847093073287
10 latent factors and regularization = 0.2: validation RMSE is 2.0515188723747664
10 latent factors and regularization = 0.30000000000000004: validation RMSE is 2.0772299646734296
10 latent factors and regularization = 0.4: validation RMSE is 2.1279439547687695
10 latent factors and regularization = 0.5: validation RMSE is 2.1835478787449296
10 latent factors and regularization = 0.6: validation RMSE is 2.236690466046918
10 latent factors and regularization = 0.7000000000000001: validation RMSE is 2.285666407779922
10 latent factors and regularization = 0.8: validation RMSE is 2.328942418038667
10 latent factors and regularization = 0.9: validation RMSE is 2.3688206006118553
10 latent factors and regularization = 1.0: validation RMSE is 2.409382261935226

The best model has 10 latent factors and regularization = 0.2


In [18]:
predictions = best_model.transform(test)

In [19]:
predictions = predictions.na.drop()
evaluator = RegressionEvaluator(metricName="rmse",
                                    labelCol="rating",
                                    predictionCol="prediction")

rmse = evaluator.evaluate(predictions)
print('The out-of-sample RMSE of the best tuned model is:', rmse)

The out-of-sample RMSE of the best tuned model is: 2.055884801114346


Set model parameters:

In [12]:
max_iter = 10
reg = 0.05
rank = 10

In [13]:
model = ALS(userCol='user_id', itemCol='anime_id', rank=rank, maxIter=max_iter, regParam=reg)

### Prepare Inference Data

In [17]:
ratings_df.agg({"user_id":"max"}).collect()[0][0]+1

73517

In [18]:
fav_anime = "Naruto"

In [20]:
anime_df

DataFrame[anime_id: int, name: string]

In [38]:
# Match anime title w/ spark df query
matches = anime_df \
            .filter(
            lower(
                col('name')
            ).like('%{}%'.format(fav_anime.lower()))
        ) \
            .select('anime_id', 'name')

In [39]:
matches.collect()

[Row(anime_id=28755, name='Boruto: Naruto the Movie'),
 Row(anime_id=1735, name='Naruto: Shippuuden'),
 Row(anime_id=16870, name='The Last: Naruto the Movie'),
 Row(anime_id=13667, name='Naruto: Shippuuden Movie 6 - Road to Ninja'),
 Row(anime_id=20, name='Naruto'),
 Row(anime_id=32365, name='Boruto: Naruto the Movie - Naruto ga Hokage ni Natta Hi'),
 Row(anime_id=10589, name='Naruto: Shippuuden Movie 5 - Blood Prison'),
 Row(anime_id=10075, name='Naruto x UT'),
 Row(anime_id=8246, name='Naruto: Shippuuden Movie 4 - The Lost Tower'),
 Row(anime_id=6325, name='Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono'),
 Row(anime_id=2472, name='Naruto: Shippuuden Movie 1'),
 Row(anime_id=4437, name='Naruto: Shippuuden Movie 2 - Kizuna'),
 Row(anime_id=4134, name='Naruto Shippuuden: Shippuu! &quot;Konoha Gakuen&quot; Den'),
 Row(anime_id=10686, name='Naruto: Honoo no Chuunin Shiken! Naruto vs. Konohamaru!!'),
 Row(anime_id=12979, name='Naruto SD: Rock Lee no Seishun Full-Power Ninden'),
 Ro

In [40]:
matches[0]

Column<b'anime_id'>

In [42]:
# Grab all anime ids in wildcard matches query
ids = matches.rdd.map(lambda r: r[0]).collect()

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 25, 192.168.1.17, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 477, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.7 than that in driver 3.9, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:503)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:638)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2139)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2164)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
	at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:168)
	at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 477, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.7 than that in driver 3.9, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:503)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:638)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2139)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	... 1 more


In [43]:
matched_titles = matches.rdd.map(lambda r: r[1]).collect()

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 times, most recent failure: Lost task 0.0 in stage 14.0 (TID 26, 192.168.1.17, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 477, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.7 than that in driver 3.9, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:503)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:638)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2139)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2164)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
	at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:168)
	at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 477, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.7 than that in driver 3.9, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:503)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:638)
	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2139)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	... 1 more


---

### Make Inference

---

In [None]:
als=ALS(userCol='user_id',itemCol='anime_id',rank=1,maxIter=10,regParam=.1)

In [None]:
model = als.fit(train)

In [None]:
model

In [None]:
predictions = model.transform(val)

In [None]:
predictions

In [None]:
predictions.take(2)

In [None]:
# Check if there are NANs in predictions (debugging why my rmse is returning NaN)

In [None]:
predictions.select([F.count(F.when(F.isnan(c), c)).alias(c) for c in predictions.columns]).show()

In [None]:
predictions = predictions.na.drop()

In [None]:
predictions.select([F.count(F.when(F.isnan(c), c)).alias(c) for c in predictions.columns]).show()

In [None]:
evaluator = RegressionEvaluator(metricName="rmse",labelCol="rating",predictionCol="prediction")

In [None]:
evaluator.evaluate(predictions)

In [None]:
def test_run_tuning(model,training_data,validation_data,maxIter,regParam,rank):
    min_error = math.inf
    best_rank = -1
    best_regularization = 0
    best_model = None
    
    als=ALS(userCol='user_id',itemCol='anime_id',rank=rank,maxIter=maxIter,regParam=regParam)
    model = als.fit(training_data)
    predictions = model.transform(validation_data)
    evaluator = RegressionEvaluator(metricName="rmse",labelCol="rating",predictionCol="prediction")
    rmse = evaluator.evaluate(predictions)
    print(rmse)
    if rmse < min_error:
        min_error = rmse
        best_rank = rank
        best_regularization = reg
        best_model = model
        
    return best_model

#### Methods

In [10]:
def tune_ALS(model,training_data, validation_data, maxIter, regParams, ranks):
    
    min_error = math.inf
    best_rank = -1
    best_regularization = 0
    best_model = None
    
    for rank in ranks:
        for reg in regParams:
            # get ALS model
            als = ALS(userCol='user_id',itemCol='anime_id',rank=rank,maxIter=maxIter,regParam=reg)
            # train ALS model
            model = als.fit(training_data)
            # evaluate the model by computing the RMSE on the validation data
            predictions = model.transform(validation_data)
            # drop na in predictions
            predictions = predictions.na.drop()
            
            evaluator = RegressionEvaluator(metricName="rmse",
                                            labelCol="rating",
                                            predictionCol="prediction")
            rmse = evaluator.evaluate(predictions)
            print('{} latent factors and regularization = {}: '
                  'validation RMSE is {}'.format(rank, reg, rmse))
            if rmse < min_error:
                min_error = rmse
                best_rank = rank
                best_regularization = reg
                best_model = model
    print('\nThe best model has {} latent factors and '
          'regularization = {}'.format(best_rank, best_regularization))
    return best_model    

In [11]:
def tune_model(maxIter,regParams,ranks,split_ratio=(0.6,0.2,0.2)):
    train, val, test = ratings_df.randomSplit(split_ratio)
    # tune model to get best model for predictions
    tuned_model = tune_ALS(model, train, val, maxIter, regParams, ranks)
    
    # test model
    predictions = tuned_model.transform(test)
    evaluator = RegressionEvaluator(metricName="rmse",
                                    labelCol="rating",
                                    predictionCol="prediction")
    rmse = evaluator.evaluate(predictions)
    print('The out-of-sample RMSE of the best tuned model is:', rmse)
    # clean up
    del train, val, test, predictions, evaluator
    gc.collect()

In [12]:
# Regex matching closest name to animes
def regex_matching(fav_anime):
    print('You have input anime:', fav_anime)
    matches_df = anime_df \
        .filter(
            lower(
                col('name')
            ).like('%{}%'.format(fav_anime.lower()))
        ) \
        .select('anime_id', 'name')
    if not len(matches_df.take(1)):
        print('Oops! No match is found')
    else:
        anime_ids = matches_df.rdd.map(lambda r: r[0]).collect()
        names = matches_df.rdd.map(lambda r: r[1]).collect()
        print('Found possible matches in our database: '
              '{0}\n'.format([x for x in names]))
        return anime_ids

In [28]:
# Append a user's anime ratings to ratings_df
def append_ratings(user_id,anime_ids):
    ratings_df = ratings_raw.select(['user_id','anime_id','rating'])
    ratings_df = ratings_df.withColumn("rating", ratings_df["rating"].cast(FloatType()).alias("rating"))
    # create new user rdd
    user_rdd = sc.parallelize(
        [(user_id, anime_id, 5.0) for anime_id in anime_ids])
    # transform to user rows
    user_rows = user_rdd.map(
        lambda x: Row(
            user_id=int(x[0]),
            anime_id=int(x[1]),
            rating=float(x[2])
        )
    )
    # transform rows to spark DF
    user_df = spark.createDataFrame(user_rows) \
        .select(ratings_df.columns)
    # append to ratingsDF
    ratings_df = ratings_df.union(user_df)

In [14]:
def create_inference_data(user_id, anime_ids):
    """
    input:
        user_id: int
        anime_ids: list
        
    return:
        inference_df: dataframe
    """
    
    other_anime_ids = anime_df \
        .filter(~col('anime_id').isin(anime_ids)) \
        .select(['anime_id']) \
        .rdd.map(lambda r: r[0]) \
        .collect()
    
    # create inference rdd
    inference_rdd = sc.parallelize(
        [(user_id, anime_id) for anime_id in other_anime_ids]
    ).map(
        lambda x: Row(
            user_id=int(x[0]),
            anime_id=int(x[1]),
        )
    )
    # transform to inference DF
    inference_df = spark.createDataFrame(inference_rdd) \
        .select(['user_id', 'anime_id'])
    
    return inference_df

In [15]:
def make_inference(model,fav_anime,n_recommendations):
    # create a userId
    user_id = ratings_df.agg({"user_id": "max"}).collect()[0][0] + 1
    # get movieIds of favorite movies
    anime_ids = regex_matching(fav_anime)
    # append new user with his/her ratings into data
    append_ratings(user_id, anime_ids)
    # matrix factorization
    model = model.fit(ratings_df)
    # get data for inferencing
    inference_df = create_inference_data(user_id, anime_ids)
    # make inference
    return model.transform(inference_df) \
        .select(['anime_id', 'prediction']) \
        .orderBy('prediction', ascending=False) \
        .rdd.map(lambda r: (r[0], r[1])) \
        .take(n_recommendations)

In [31]:
best_model

ALSModel: uid=ALS_2c53714f7467, rank=10

In [29]:
def make_recommendations(fav_anime,n_recommendations):
    print('Recommendation system start to make inference ...')
    t0 = time.time()
    raw_recommends = \
        make_inference(best_model, fav_anime, n_recommendations)
    anime_ids = [r[0] for r in raw_recommends]
    scores = [r[1] for r in raw_recommends]
    print('It took my system {:.2f}s to make inference \n\
          '.format(time.time() - t0))
    # get movie titles
    anime_titles = anime_df \
        .filter(col('anime_id').isin(anime_ids)) \
        .select('name') \
        .rdd.map(lambda r: r[0]) \
        .collect()
    # print recommendations
    print('Recommendations for {}:'.format(fav_anime))
    for i in range(len(anime_titles)):
        print('{0}: {1}, with rating '
              'of {2}'.format(i+1, anime_titles[i], scores[i]))

In [30]:
make_recommendations('naruto',10)

Recommendation system start to make inference ...
You have input anime: naruto
Found possible matches in our database: ['Boruto: Naruto the Movie', 'Naruto: Shippuuden', 'The Last: Naruto the Movie', 'Naruto: Shippuuden Movie 6 - Road to Ninja', 'Naruto', 'Boruto: Naruto the Movie - Naruto ga Hokage ni Natta Hi', 'Naruto: Shippuuden Movie 5 - Blood Prison', 'Naruto x UT', 'Naruto: Shippuuden Movie 4 - The Lost Tower', 'Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono', 'Naruto: Shippuuden Movie 1', 'Naruto: Shippuuden Movie 2 - Kizuna', 'Naruto Shippuuden: Shippuu! &quot;Konoha Gakuen&quot; Den', 'Naruto: Honoo no Chuunin Shiken! Naruto vs. Konohamaru!!', 'Naruto SD: Rock Lee no Seishun Full-Power Ninden', 'Naruto Shippuuden: Sunny Side Battle', 'Naruto Movie 1: Dai Katsugeki!! Yuki Hime Shinobu Houjou Dattebayo!', 'Naruto Soyokazeden Movie: Naruto to Mashin to Mitsu no Onegai Dattebayo!!', 'Naruto Movie 2: Dai Gekitotsu! Maboroshi no Chiteiiseki Dattebayo!', 'Naruto: Dai Katsugek

AttributeError: 'ALSModel' object has no attribute 'fit'

---

In [None]:
new_ratings = ratings_raw.filter(lambda line: line != header) \
            .map(lambda line: line.split(",")) \
            .map(lambda tokens: (int(tokens[0]), int(tokens[1]), float(tokens[2])))

In [None]:
anime_id, name, genre, type, episodes, rating, members = [ '{}'.format(x) for x in list(csv.reader([input_string], delimiter=',', quotechar='"'))[0] ]

In [None]:
header = ratings_RDD.take(1)[0]
        return ratings_RDD \
            .filter(lambda line: line != header) \
            .map(lambda line: line.split(",")) \
            .map(lambda tokens: (int(tokens[0]), int(tokens[1]), float(tokens[2])))

In [None]:
def clean_anime_data(input_string):
    anime_id, name, genre, type, episodes, rating, members = [ '{}'.format(x) for x in list(csv.reader([input_string], delimiter=',', quotechar='"'))[0] ]
    anime_id = int(anime_id)
    episodes = int(episodes)
    rating = float(rating)
    members = int(members)
    return [(anime_id, name, type,rating,members, token) for token in genre.split(',')]

In [None]:
anime_clean = anime_raw.flatMap(clean_anime_data)

In [None]:
print(anime_clean.take(10))

In [None]:
anime_schema = StructType( [
    StructField('anime_id',IntegerType(),True),
    StructField('name',StringType(),True),
    StructField('type',StringType(),True),
    StructField('rating',FloatType(),True),
    StructField('members',IntegerType(),True),
    StructField('genre',StringType(),True) ] )

anime = spark.createDataFrame(anime_clean, anime_schema)

In [None]:
anime

In [None]:
# pivot movie genres
anime = anime.groupBy("anime_id", "name", "type","rating","members")\
               .pivot("genre")\
               .agg(F.count(F.col('genre')))\
               .na.fill(0)

anime.show(5)
anime.printSchema()

In [20]:
class AlsRecommender:
    """
    This a collaborative filtering recommender with Alternating Least Square
    Matrix Factorization, which is implemented by Spark
    """
    def __init__(self, spark_session, path_anime, path_ratings):
        self.spark = spark_session
        self.sc = spark_session.sparkContext
        self.animeDF = self._load_file(path_anime) \
            .select(['anime_id', 'name'])
        self.ratingsDF = self._load_file(path_ratings) \
            .select(['user_id', 'anime_id', 'rating'])
        self.model = ALS(
            userCol='user_id',
            itemCol='anime_id',
            ratingCol='rating',
            coldStartStrategy="drop")

    def _load_file(self, filepath):
        """
        load csv file into memory as spark DF
        """
        
        return self.spark.read.load(filepath, format='csv',
                                    header=True, inferSchema=True)

    def tune_model(self, maxIter, regParams, ranks, split_ratio=(0.6,0.2,0.2)):
        """
        Hyperparameter tuning for ALS model
        Parameters
        ----------
        maxIter: int, max number of learning iterations
        regParams: list of float, regularization parameter
        ranks: list of float, number of latent factors
        split_ratio: tuple, (train, validation, test)
        """
        # split data
        train, val, test = self.ratingsDF.randomSplit(split_ratio)
        # holdout tuning
        self.model = tune_ALS(self.model, train, val,
                              maxIter, regParams, ranks)
        # test model
        predictions = self.model.transform(test)
        # drop na predictions
        predictions = predictions.na.drop()
        evaluator = RegressionEvaluator(metricName="rmse",
                                        labelCol="rating",
                                        predictionCol="prediction")
        rmse = evaluator.evaluate(predictions)
        print('The out-of-sample RMSE of the best tuned model is:', rmse)
        # clean up
        del train, val, test, predictions, evaluator
        gc.collect()

    def set_model_params(self, maxIter, regParam, rank):
        """
        set model params for pyspark.ml.recommendation.ALS
        Parameters
        ----------
        maxIter: int, max number of learning iterations
        regParams: float, regularization parameter
        ranks: float, number of latent factors
        """
#         self.model = self.model \
#             .setMaxIter(maxIter) \
#             .setRank(rank) \
#             .setRegParam(regParam)

        self.model = ALS(userCol='user_id',itemCol='anime_id',rank=rank, maxIter=maxIter, regParam=regParam)

    def _regex_matching(self, fav_anime):
        """
        return the closest matches via SQL regex.
        If no match found, return None
        Parameters
        ----------
        fav_movie: str, name of user input movie
        Return
        ------
        list of indices of the matching movies
        """
        print('You have input movie:', fav_anime)
        matchesDF = self.animeDF \
            .filter(
                lower(
                    col('name')
                ).like('%{}%'.format(fav_anime.lower()))
            ) \
            .select('anime_id', 'name')
        if not len(matchesDF.take(1)):
            print('Oops! No match is found')
        else:
            anime_ids = matchesDF.rdd.map(lambda r: r[0]).collect()
            names = matchesDF.rdd.map(lambda r: r[1]).collect()
            print('Found possible matches in our database: '
                  '{0}\n'.format([x for x in names]))
            return anime_ids

    def _append_ratings(self, user_id, anime_ids):
        """
        append a user's movie ratings to ratingsDF
        Parameter
        ---------
        userId: int, userId of a user
        movieIds: int, movieIds of user's favorite movies
        """
        # create new user rdd
        user_rdd = self.sc.parallelize(
            [(user_id, anime_id, 5.0) for anime_id in anime_ids])
        # transform to user rows
        user_rows = user_rdd.map(
            lambda x: Row(
                user_id=int(x[0]),
                anime_id=int(x[1]),
                rating=float(x[2])
            )
        )
        # transform rows to spark DF
        userDF = self.spark.createDataFrame(user_rows) \
            .select(self.ratingsDF.columns)
        # append to ratingsDF
        self.ratingsDF = self.ratingsDF.union(userDF)

    def _create_inference_data(self, user_id, anime_ids):
        """
        create a user with all movies except ones were rated for inferencing
        """
        # filter movies
        other_anime_ids = self.animeDF \
            .filter(~col('anime_id').isin(anime_ids)) \
            .select(['anime_id']) \
            .rdd.map(lambda r: r[0]) \
            .collect()
        # create inference rdd
        inferenceRDD = self.sc.parallelize(
            [(user_id, anime_id) for anime_id in other_anime_ids]
        ).map(
            lambda x: Row(
                user_id=int(x[0]),
                anime_id=int(x[1]),
            )
        )
        # transform to inference DF
        inferenceDF = self.spark.createDataFrame(inferenceRDD) \
            .select(['user_id', 'anime_id'])
        return inferenceDF

    def _inference(self, model, fav_anime, n_recommendations):
        """
        return top n movie recommendations based on user's input movie
        Parameters
        ----------
        model: spark ALS model
        fav_movie: str, name of user input movie
        n_recommendations: int, top n recommendations
        Return
        ------
        list of top n similar movie recommendations
        """
        # create a userId
        user_id = self.ratingsDF.agg({"user_id": "max"}).collect()[0][0] + 1
        # get movieIds of favorite movies
        anime_ids = self._regex_matching(fav_anime)
        # append new user with his/her ratings into data
        self._append_ratings(user_id, anime_ids)
        # matrix factorization
        model = model.fit(self.ratingsDF)
        # get data for inferencing
        inferenceDF = self._create_inference_data(user_id, anime_ids)
        # make inference
        
        results = model.transform(inferenceDF) \
                        .select(['anime_id', 'prediction']).na.drop()

        return results \
            .orderBy('prediction', ascending=False) \
            .rdd.map(lambda r: (r[0], r[1])) \
            .take(n_recommendations)

    def make_recommendations(self, fav_anime, n_recommendations):
        """
        make top n movie recommendations
        Parameters
        ----------
        fav_movie: str, name of user input movie
        n_recommendations: int, top n recommendations
        """
        # make inference and get raw recommendations
        print('Recommendation system start to make inference ...')
        t0 = time.time()
        raw_recommends = \
            self._inference(self.model, fav_anime, n_recommendations)
        anime_ids = [r[0] for r in raw_recommends]
        scores = [r[1] for r in raw_recommends]

        print('It took my system {:.2f}s to make inference \n\
              '.format(time.time() - t0))
        # get movie titles
        anime_titles = self.animeDF \
            .filter(col('anime_id').isin(anime_ids)) \
            .select('name') \
            .rdd.map(lambda r: r[0]) \
            .collect()
        # print recommendations
        print('Recommendations for {}:'.format(fav_anime))
        for i in range(len(anime_titles)):
            print('{0}: {1}, with rating '
                  'of {2}'.format(i+1, anime_titles[i], scores[i]))


def tune_ALS(model, train_data, validation_data, maxIter, regParams, ranks):
    """
    grid search function to select the best model based on RMSE of
    validation data
    Parameters
    ----------
    model: spark ML model, ALS
    train_data: spark DF with columns ['userId', 'movieId', 'rating']
    validation_data: spark DF with columns ['userId', 'movieId', 'rating']
    maxIter: int, max number of learning iterations
    regParams: list of float, one dimension of hyper-param tuning grid
    ranks: list of float, one dimension of hyper-param tuning grid
    Return
    ------
    The best fitted ALS model with lowest RMSE score on validation data
    """
    # initial
    min_error = math.inf
    best_rank = -1
    best_regularization = 0
    best_model = None
    for rank in ranks:
        for reg in regParams:
            # get ALS model
            # als = model.setMaxIter(maxIter).setRank(rank).setRegParam(reg)
            als = ALS(userCol='user_id',itemCol='anime_id',rank=rank,maxIter=maxIter,regParam=reg)
            # train ALS model
            model = als.fit(train_data)
            # evaluate the model by computing the RMSE on the validation data
            predictions = model.transform(validation_data)
            # drop na predictions
            predictions = predictions.na.drop()
            evaluator = RegressionEvaluator(metricName="rmse",
                                            labelCol="rating",
                                            predictionCol="prediction")
            rmse = evaluator.evaluate(predictions)
            print('{} latent factors and regularization = {}: '
                  'validation RMSE is {}'.format(rank, reg, rmse))
            if rmse < min_error:
                min_error = rmse
                best_rank = rank
                best_regularization = reg
                best_model = model
    print('\nThe best model has {} latent factors and '
          'regularization = {}'.format(best_rank, best_regularization))
    return best_model

### Step by step run method

1. Initialize class (creates spark context, loads files into dataframes)
2. Tune model to get best ALS model performance
3. Input the optimized parameters we discovered into model fit again
3. Make recommendations w/ newly tuned model
4. See the results

In [21]:
maxIter = 10
ranks = np.arange(8, 11, 1).tolist()
# ranks = [10]
regParams = np.arange(.1,.3,0.1).tolist()

In [22]:
test_run_1 = AlsRecommender(spark,path_anime,path_ratings)

In [16]:
test_run_1.tune_model(maxIter,regParams,ranks)

8 latent factors and regularization = 0.1: validation RMSE is 2.067548507760002
8 latent factors and regularization = 0.2: validation RMSE is 2.0549814646971387
9 latent factors and regularization = 0.1: validation RMSE is 2.072738029893258
9 latent factors and regularization = 0.2: validation RMSE is 2.0556866235017797
10 latent factors and regularization = 0.1: validation RMSE is 2.068691916946381
10 latent factors and regularization = 0.2: validation RMSE is 2.0529514555570194

The best model has 10 latent factors and regularization = 0.2
The out-of-sample RMSE of the best tuned model is: 2.05305404019835


In [23]:
test_run_1.set_model_params(maxIter=10,regParam=0.2,rank=10)

In [25]:
test_run_1.make_recommendations('Naruto',10)

Recommendation system start to make inference ...
You have input movie: Naruto
Found possible matches in our database: ['Boruto: Naruto the Movie', 'Naruto: Shippuuden', 'The Last: Naruto the Movie', 'Naruto: Shippuuden Movie 6 - Road to Ninja', 'Naruto', 'Boruto: Naruto the Movie - Naruto ga Hokage ni Natta Hi', 'Naruto: Shippuuden Movie 5 - Blood Prison', 'Naruto x UT', 'Naruto: Shippuuden Movie 4 - The Lost Tower', 'Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono', 'Naruto: Shippuuden Movie 1', 'Naruto: Shippuuden Movie 2 - Kizuna', 'Naruto Shippuuden: Shippuu! &quot;Konoha Gakuen&quot; Den', 'Naruto: Honoo no Chuunin Shiken! Naruto vs. Konohamaru!!', 'Naruto SD: Rock Lee no Seishun Full-Power Ninden', 'Naruto Shippuuden: Sunny Side Battle', 'Naruto Movie 1: Dai Katsugeki!! Yuki Hime Shinobu Houjou Dattebayo!', 'Naruto Soyokazeden Movie: Naruto to Mashin to Mitsu no Onegai Dattebayo!!', 'Naruto Movie 2: Dai Gekitotsu! Maboroshi no Chiteiiseki Dattebayo!', 'Naruto: Dai Katsugek

In [45]:
test_run_1.make_recommendations('Hunter X',10)

Recommendation system start to make inference ...
You have input movie: Hunter X
Found possible matches in our database: ['Hunter x Hunter (2011)', 'Hunter x Hunter', 'Hunter x Hunter OVA', 'Hunter x Hunter: Greed Island Final', 'Hunter x Hunter: Greed Island', 'Irregular Hunter X: The Day of Sigma', 'Hunter x Hunter Movie: Phantom Rouge', 'Hunter x Hunter Pilot', 'Hunter x Hunter Movie: The Last Mission']

It took my system 34.49s to make inference 
              
Recommendations for Hunter X:
1: Shouwa Genroku Rakugo Shinjuu: Yotarou Hourou-hen, with rating of 6.871402740478516
2: Doraemon Movie 01: Nobita no Kyouryuu, with rating of 6.745322227478027
3: Mikan Enikki, with rating of 6.717744827270508
4: Seimei no Kagaku: Micro Patrol, with rating of 6.647804260253906
5: Princess Princess Specials, with rating of 6.625237941741943
6: Chogattai Majutsu Robot Ginguiser, with rating of 6.615678310394287
7: Nozomi Witches, with rating of 6.59459924697876
8: Thermae Romae x LOFT Collaborat