# Setup

In [1]:
# base
import matplotlib.pyplot as plt
plt.style.use('dark_background')
import seaborn as sns
import pandas as pd
import numpy as np
import datetime

In [2]:
import sys
sys.path.insert(1, '/home/mauricio/code/mcr')
from mcr.util import (
    glimpse, plot_value_counts, plot_value_counts_timeseries,
    missing_report, plot_missing, plot_unique, plot_duplicates, size,
)
from mcr.spark import roem

In [3]:
from pyspark import SparkContext
# SparkContext.getOrCreate(conf: Optional[pyspark.conf.SparkConf] = None) -> 'SparkContext'
# System properties must be set before the SparkContext is instantiated.
# SparkContext.setSystemProperty('spark.driver.cores', '1')  # only in cluster mode
SparkContext.setSystemProperty('spark.driver.memory', '16g')  # memory for the driver process
SparkContext.setSystemProperty('spark.executor.memory', '3g')  # memory per executor process
SparkContext.setSystemProperty('spark.executor.cores', '16')  # cores per executor
sc = SparkContext.getOrCreate()
# sc.setLogLevel('ERROR')  # ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
# A checkpoint directory helps avoid a StackOverflowError when training ALS:
# https://intellipaat.com/community/18452/spark-gives-a-stackoverflowerror-when-training-using-als
sc.setCheckpointDir('checkpoint/')
sc.getConf().getAll()

23/05/10 19:41:41 WARN Utils: Your hostname, rig resolves to a loopback address: 127.0.1.1; using 192.168.0.102 instead (on interface enp6s0)
23/05/10 19:41:41 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/05/10 19:41:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


[('spark.app.id', 'local-1683758502026'),
 ('spark.executor.memory', '3g'),
 ('spark.app.submitTime', '1683758501528'),
 ('spark.executor.id', 'driver'),
 ('spark.driver.memory', '16g'),
 ('spark.driver.host', '192.168.0.102'),
 ('spark.app.name', 'pyspark-shell'),
 ('spark.driver.extraJavaOptions',
  '-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UN

In [4]:
from pyspark.sql import functions as F
from pyspark.sql.types import *
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local[*]').appName('spark_application').getOrCreate()
print(spark.version)

3.4.0


# Overview of binary, implicit ratings

## Binary ratings
Notice that every rating is either a 1 or a 0. Binary ratings like these must be treated as implicit ratings. If we treated them as explicit ratings and left out the 0s, the training data would contain only 1s, so the best-performing model would simply predict 1 for everything and report a deceptively perfect RMSE of 0.
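The RMSE argument can be checked with a few lines of plain Python. This is a minimal sketch on made-up values, not on the MovieLens data; `rmse` is a helper defined here for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Explicit-style data: only the watched (1) entries are kept, the 0s are dropped.
observed_only = [1, 1, 1, 1, 1]
constant_model = [1.0] * len(observed_only)  # predicts 1 for every pair
rmse_without_zeros = rmse(observed_only, constant_model)

# The same constant model once the unwatched (0) entries are included.
with_zeros = [1, 0, 0, 0]
rmse_with_zeros = rmse(with_zeros, [1.0] * len(with_zeros))

print(rmse_without_zeros)  # 0.0
print(rmse_with_zeros)
```

The constant model scores a perfect 0 only because the 0s were never shown to it, which is why the score says nothing about recommendation quality.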

As with our previous Million Songs model, this means we can't use RMSE as the model evaluation metric.

When random observations are held out in the test set, we want the model to generate high predictions for the movies that users have actually watched. ROEM (the rank ordering error metric) measures exactly that, so we'll use it again and apply the concepts covered previously to this binary dataset.
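As a reminder of what ROEM measures, here is a minimal plain-Python sketch of the usual definition: for each user, rank the items by descending prediction, convert the rank to a percentile (0.0 for the top item, 1.0 for the bottom), and average the percentiles of the items that were actually viewed. The function `roem_sketch` and the toy rows are assumptions made for illustration; this is not the `mcr.spark.roem` implementation used in this notebook, whose source is not shown here, and ties are simply left in input order.

```python
def roem_sketch(rows):
    """rows: iterable of (user, actual, prediction) tuples. Hypothetical helper."""
    by_user = {}
    for user, actual, prediction in rows:
        by_user.setdefault(user, []).append((prediction, actual))

    numerator = 0.0
    denominator = 0.0
    for items in by_user.values():
        # Highest prediction first; ties keep their input order.
        items.sort(key=lambda pair: pair[0], reverse=True)
        n = len(items)
        for position, (_, actual) in enumerate(items):
            # Percentile rank: 0.0 for the top item, 1.0 for the bottom one.
            pct_rank = position / (n - 1) if n > 1 else 0.0
            numerator += actual * pct_rank
            denominator += actual
    return numerator / denominator if denominator else float("nan")

rows = [
    ("u1", 1, 0.9), ("u1", 0, 0.5), ("u1", 0, 0.1),  # viewed movie ranked first
    ("u2", 0, 0.8), ("u2", 1, 0.2),                  # viewed movie ranked last
]
print(roem_sketch(rows))  # 0.5
```

Lower is better: a model that always ranks the viewed movies first scores 0, and the toy example scores 0.5 because one user is ranked perfectly and the other as badly as possible.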

A convenient property of the MovieLens dataset is that we can compare the binary model's output against the original, explicit preference ratings.

In [5]:
# Read data from CSV file
movie_ratings = spark.read.csv('ratings.csv',
                               sep=',',
                               header=True,
                               inferSchema=True).drop('timestamp')

In [6]:
movie_ratings.rdd.getNumPartitions(), movie_ratings.count()

(1, 100004)

In [7]:
users = movie_ratings.select("userId").distinct().orderBy('userId')

In [8]:
users.rdd.getNumPartitions(), users.count()

(1, 671)

In [9]:
movies = movie_ratings.select("movieId").distinct().orderBy('movieId')

In [10]:
movies.rdd.getNumPartitions(), movies.count()

(1, 9066)

In [11]:
cross_join = users.crossJoin(movies)

In [12]:
cross_join.rdd.getNumPartitions(), cross_join.count()

(1, 6083286)

In [13]:
binary_movie_ratings = cross_join\
    .join(movie_ratings, ["userId", "movieId"], "left").distinct().orderBy(['userId', 'movieId'])\
    .fillna(0, subset='rating')

In [14]:
binary_movie_ratings.rdd.getNumPartitions(), binary_movie_ratings.count()

(17, 6083286)

In [15]:
binary_movie_ratings = binary_movie_ratings\
    .withColumn('viewed', (F.col('rating')>0).cast('integer'))\
    .drop('rating')

In [16]:
binary_movie_ratings.rdd.getNumPartitions(), binary_movie_ratings.count()

(17, 6083286)

In [17]:
binary_movie_ratings = binary_movie_ratings.coalesce(1).persist()

In [18]:
binary_movie_ratings.rdd.getNumPartitions(), binary_movie_ratings.count()

(1, 6083286)

## Class imbalance

A word about binary models: it is perfectly feasible to feed binary data like this into ALS and get meaningful recommendations, but the data is heavily imbalanced. The vast majority of ratings are 0s and only a small percentage are 1s.

**Since implicit ratings models use customized error metrics like ROEM and not RMSE, the class imbalance is not the problem it can be in classification tasks.**

ALS can still generate meaningful recommendations from this type of data, but strategies such as the item and user weighting described below can be applied to the data to try to improve them.

In [19]:
print("Sparsity: ", 1 - (movie_ratings.count() / (users.count() * movies.count())))

Sparsity:  0.9835608583913366


## Item weighting
* Item Weighting: Movies with more user views = higher weight

## Item weighting and user weighting
* Item Weighting: Movies with more user views = higher weight
* User Weighting: Users that have seen more movies will have lower weights applied to unseen movies
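The notes above describe the two weighting ideas without giving formulas. The sketch below shows one hypothetical way to turn them into numbers on a toy dataset: both normalizations (dividing by the most-viewed movie, and one minus the share of the catalogue a user has seen) are assumptions chosen for illustration, not the method used in this notebook.

```python
from collections import Counter

# Toy (user, movie) view pairs; the data is made up for illustration.
views = [("u1", "m1"), ("u1", "m2"), ("u1", "m3"),
         ("u2", "m1"),
         ("u3", "m1"), ("u3", "m2")]
movies = sorted({m for _, m in views})

views_per_movie = Counter(m for _, m in views)
views_per_user = Counter(u for u, _ in views)

# Item weighting (assumed formula): view count scaled by the most-viewed movie.
max_views = max(views_per_movie.values())
item_weight = {m: views_per_movie[m] / max_views for m in movies}

# User weighting (assumed formula): the more of the catalogue a user has seen,
# the lower the weight placed on that user's unseen movies.
user_unseen_weight = {u: 1 - n / len(movies) for u, n in views_per_user.items()}

print(item_weight)
print(user_unseen_weight)
```

In this toy data `m1` gets the highest item weight because every user has seen it, and `u1`, who has seen the whole catalogue, gets the lowest weight on unseen movies.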

## Exercises

### Binary model performance

In [20]:
from pyspark.ml.recommendation import ALS

In [21]:
# Empty list to be filled with models
model_list = []
# Hyperparameter values to search over
ranks = [10, 20, 30, 40]
maxIters = [10, 20, 30, 40]
regParams = [.05, .1, .15]
alphas = [20, 40, 60, 80]
# Build one (unfitted) ALS estimator per hyperparameter combination
for r in ranks:
    for mi in maxIters:
        for rp in regParams:
            for a in alphas:
                model_list.append(ALS(userCol="userId", itemCol="movieId",
                                      ratingCol="viewed", rank=r, maxIter=mi, regParam=rp,
                                      alpha=a, coldStartStrategy="drop", nonnegative=True,
                                      implicitPrefs=True))
print(len(model_list))

192
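Because the training log further down identifies each model only by its position in `model_list`, it helps to be able to map a model number back to its hyperparameters. A minimal sketch, assuming the same four value lists and the same loop order as the cell above; `grid` and `describe` are hypothetical names introduced here.

```python
from itertools import product

ranks = [10, 20, 30, 40]
max_iters = [10, 20, 30, 40]
reg_params = [0.05, 0.1, 0.15]
alphas = [20, 40, 60, 80]

# product() varies the rightmost list fastest, matching the nested loops above.
grid = list(product(ranks, max_iters, reg_params, alphas))

def describe(model_num):
    """Return the hyperparameters behind a position in the grid (hypothetical helper)."""
    rank, max_iter, reg_param, alpha = grid[model_num]
    return {"rank": rank, "maxIter": max_iter, "regParam": reg_param, "alpha": alpha}

print(len(grid))     # 192
print(describe(48))  # {'rank': 20, 'maxIter': 10, 'regParam': 0.05, 'alpha': 20}
```

The grid size is 4 x 4 x 3 x 4 = 192, and every block of 48 consecutive model numbers shares one rank.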


In [22]:
# Split the data into training and test sets
(training, test) = binary_movie_ratings.randomSplit([0.8, 0.2], seed=1)
training = training.persist()
test = test.persist()
print(f'Partitions: Data={binary_movie_ratings.rdd.getNumPartitions()} Train={training.rdd.getNumPartitions()} Test={test.rdd.getNumPartitions()}')
data_count = binary_movie_ratings.count()
training_count = training.count()
test_count = test.count()
print(f'Counts: Data={data_count} Train={training_count} Test={test_count} Difference={data_count-training_count-test_count}')

Partitions: Data=1 Train=1 Test=1
Counts: Data=6083286 Train=4867485 Test=1215801 Difference=0

In [23]:
# Build 5 folds within the training set
train1, train2, train3, train4, train5 = training.randomSplit([0.2, 0.2, 0.2, 0.2, 0.2], seed=1)
print(f'Partitions: T1={train1.rdd.getNumPartitions()} T2={train2.rdd.getNumPartitions()} T3={train3.rdd.getNumPartitions()} T4={train4.rdd.getNumPartitions()} T5={train5.rdd.getNumPartitions()}')
train1_count = train1.count()
train2_count = train2.count()
train3_count = train3.count()
train4_count = train4.count()
train5_count = train5.count()
print(f'Counts: T1={train1_count} T2={train2_count} T3={train3_count} T4={train4_count} T5={train5_count} Difference={training_count-train1_count-train2_count-train3_count-train4_count-train5_count}')

Partitions: T1=1 T2=1 T3=1 T4=1 T5=1
Counts: T1=974355 T2=972252 T3=975341 T4=973115 T5=972422 Difference=0


In [24]:
fold1 = train2.union(train3).union(train4).union(train5).persist()
fold2 = train3.union(train4).union(train5).union(train1).persist()
fold3 = train4.union(train5).union(train1).union(train2).persist()
fold4 = train5.union(train1).union(train2).union(train3).persist()
fold5 = train1.union(train2).union(train3).union(train4).persist()
foldlist = [(fold1, train1), (fold2, train2), (fold3, train3), (fold4, train4), (fold5, train5)]
print(f'Partitions: F1={fold1.rdd.getNumPartitions()} F2={fold2.rdd.getNumPartitions()} F3={fold3.rdd.getNumPartitions()} F4={fold4.rdd.getNumPartitions()} F5={fold5.rdd.getNumPartitions()}')
fold1_count = fold1.count()
fold2_count = fold2.count()
fold3_count = fold3.count()
fold4_count = fold4.count()
fold5_count = fold5.count()
print(f'Counts: F1={fold1_count} F2={fold2_count} F3={fold3_count} F4={fold4_count} F5={fold5_count}\n')
print('Differences:', end=' ')
print(f'F1={fold1_count-train2_count-train3_count-train4_count-train5_count}', end=' ')
print(f'F2={fold2_count-train3_count-train4_count-train5_count-train1_count}', end=' ')
print(f'F3={fold3_count-train4_count-train5_count-train1_count-train2_count}', end=' ')
print(f'F4={fold4_count-train5_count-train1_count-train2_count-train3_count}', end=' ')
print(f'F5={fold5_count-train1_count-train2_count-train3_count-train4_count}')

Partitions: F1=4 F2=4 F3=4 F4=4 F5=4
Counts: F1=3893130 F2=3895233 F3=3892144 F4=3894370 F5=3895063

Differences: F1=0 F2=0 F3=0 F4=0 F5=0
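The fold construction above is written out by hand for five parts. The same pairing can be sketched generically in plain Python, with lists standing in for DataFrames; `make_folds` is a hypothetical helper, and in Spark the concatenation step would be a chain of `DataFrame.union` calls as in the cell above.

```python
def make_folds(parts):
    """Pair each part (held out) with the concatenation of all other parts."""
    folds = []
    for i, held_out in enumerate(parts):
        train = [row for j, part in enumerate(parts) if j != i for row in part]
        folds.append((train, held_out))
    return folds

# Five toy parts standing in for train1..train5.
parts = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
folds = make_folds(parts)

for train, held_out in folds:
    print(held_out, train)
```

Each row appears in exactly one held-out part and in the training side of the other four folds, which is the property the `Differences` check above verifies with counts.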


                                                                                

In [25]:
from time import time
ROEMS = []
for model_num, model in enumerate(model_list):
    print(f'Training model {model_num}')
    roem_sum = 0
    t = time()
    for fold_train, fold_test in foldlist:
        # Fit the model on the four training parts of this fold
        fitted_model = model.fit(fold_train)
        # Predict on the held-out part of this fold
        predictions = fitted_model.transform(fold_test)
        # Accumulate the ROEM on the held-out part
        roem_sum += roem(spark, predictions, 'userId', 'viewed')
    print(f'\tAverage ROEM {roem_sum/len(foldlist)}')
    # Refit on all of the training data and score the held-out test set
    v_fitted_model = model.fit(training)
    v_predictions = v_fitted_model.transform(test)
    v_ROEM = roem(spark, v_predictions, 'userId', 'viewed')
    # Store the test-set ROEM (printed as "Validation ROEM")
    ROEMS.append(v_ROEM)
    print(f'\tValidation ROEM: {v_ROEM}')
    print(f"\tElapsed {time()-t:.0f}s")

Training model 0
	Average ROEM 0.13504418790222456
	Validation ROEM: 0.132365951689422
	Elapsed 34s
Training model 1
	Average ROEM 0.13961932221069306
	Validation ROEM: 0.13781299079125872
	Elapsed 30s
Training model 2
	Average ROEM 0.1427257801109573
	Validation ROEM: 0.14193794703142493
	Elapsed 32s
Training model 3
	Average ROEM 0.14527778463950947
	Validation ROEM: 0.1462752625535619
	Elapsed 32s
Training model 4
	Average ROEM 0.13487474275309044
	Validation ROEM: 0.1316352504010507
	Elapsed 31s
Training model 5
	Average ROEM 0.13957516725874347
	Validation ROEM: 0.13747189014360245
	Elapsed 31s
Training model 6
	Average ROEM 0.14264749928934797
	Validation ROEM: 0.1418289510841638
	Elapsed 30s
Training model 7
	Average ROEM 0.1451261460454578
	Validation ROEM: 0.14599876621927885
	Elapsed 31s
Training model 8
	Average ROEM 0.13471006992960177
	Validation ROEM: 0.13126350976324697
	Elapsed 31s
Training model 9
	Average ROEM 0.13944488773308988
	Validation ROEM: 0.13723007780239074
	Elapsed 31s
Training model 10
	Average ROEM 0.14263967898876428
	Validation ROEM: 0.14182939024529034
	Elapsed 32s
Training model 11
	Average ROEM 0.145232709780321
	Validation ROEM: 0.14578411494017285
	Elapsed 32s
Training model 12
	Average ROEM 0.1340155870066262
	Validation ROEM: 0.1298693512110217
	Elapsed 48s
Training model 13
	Average ROEM 0.13822653202894367
	Validation ROEM: 0.13553050676368408
	Elapsed 47s
Training model 14
	Average ROEM 0.14060848172796198
	Validation ROEM: 0.13976128374388586
	Elapsed 46s
Training model 15
	Average ROEM 0.14336945495793538
	Validation ROEM: 0.1429918119483238
	Elapsed 48s
Training model 16
	Average ROEM 0.13380683503988636
	Validation ROEM: 0.12964626977217494
	Elapsed 47s
Training model 17
	Average ROEM 0.13801698674035062
	Validation ROEM: 0.13511635206234934
	Elapsed 47s
Training model 18
	Average ROEM 0.14079869951690413
	Validation ROEM: 0.13953707466494358
	Elapsed 48s
Training model 19
	Average ROEM 0.14326018509312802
	Validation ROEM: 0.1428304598665216
	Elapsed 48s
Training model 20
	Average ROEM 0.13388531973981457
	Validation ROEM: 0.12952627225175295
	Elapsed 48s
Training model 21
	Average ROEM 0.13805544735656244
	Validation ROEM: 0.13471902005121209
	Elapsed 47s
Training model 22
	Average ROEM 0.14089334239796175
	Validation ROEM: 0.13982916379175964
	Elapsed 47s
Training model 23
	Average ROEM 0.14342291595849951
	Validation ROEM: 0.1430051362772585
	Elapsed 48s
Training model 24
	Average ROEM 0.1336065884496756
	Validation ROEM: 0.1291472337273522
	Elapsed 65s
Training model 25
	Average ROEM 0.13790419883157307
	Validation ROEM: 0.13535642830695885
	Elapsed 65s
Training model 26
	Average ROEM 0.13982818565483854
	Validation ROEM: 0.1394512861096455
	Elapsed 65s
Training model 27
	Average ROEM 0.1422933004162892
	Validation ROEM: 0.14279050218533654
	Elapsed 65s
Training model 28
	Average ROEM 0.13335430358535333
	Validation ROEM: 0.12933422881842857
	Elapsed 66s
Training model 29
	Average ROEM 0.13806149911882365
	Validation ROEM: 0.13494493316016734
	Elapsed 65s
Training model 30
	Average ROEM 0.14043838688505597
	Validation ROEM: 0.13912253318376094
	Elapsed 65s
Training model 31
23/05/10 20:07:10 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14250792138460158
	Validation ROEM: 0.14231800474540512
	Elapsed 67s
Training model 32
	Average ROEM 0.13321379843365105
	Validation ROEM: 0.12864592010228654
	Elapsed 65s
Training model 33
	Average ROEM 0.1378304598266539
	Validation ROEM: 0.13484265844747717
	Elapsed 66s
Training model 34
	Average ROEM 0.14077177353205156
	Validation ROEM: 0.13948520062582917
	Elapsed 64s
Training model 35
23/05/10 20:10:48 WARN TaskMemoryManager: Failed to allocate a page (33554432 bytes), try again.
23/05/10 20:10:48 WARN TaskMemoryManager: Failed to allocate a page (33554432 bytes), try again.
	Average ROEM 0.14288703034157763
	Validation ROEM: 0.1424563225111516
	Elapsed 65s
Training model 36
	Average ROEM 0.1334642941065499
	Validation ROEM: 0.12878239021125443
	Elapsed 82s
Training model 37
	Average ROEM 0.13776618676188943
	Validation ROEM: 0.13545331365826604
	Elapsed 80s
Training model 38
	Average ROEM 0.1395295799108238
	Validation ROEM: 0.13912599097978623
	Elapsed 81s
Training model 39
	Average ROEM 0.1416381550887822
	Validation ROEM: 0.14172242889384937
	Elapsed 79s
Training model 40
	Average ROEM 0.13335071640449075
	Validation ROEM: 0.12944547018983327
	Elapsed 83s
Training model 41
	Average ROEM 0.1378295204388691
	Validation ROEM: 0.13455689970645976
	Elapsed 79s
Training model 42
23/05/10 20:20:14 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.1405397468478953
	Validation ROEM: 0.13954841178840854
	Elapsed 82s
Training model 43
	Average ROEM 0.14228932816064743
	Validation ROEM: 0.1423045527313477
	Elapsed 83s
Training model 44
	Average ROEM 0.13322878404843172
	Validation ROEM: 0.12848390693085782
	Elapsed 81s
Training model 45
23/05/10 20:24:06 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.13748331801125663
	Validation ROEM: 0.1341590639445566
	Elapsed 81s
Training model 46
	Average ROEM 0.14068853349587312
	Validation ROEM: 0.13962459925715753
	Elapsed 83s
Training model 47
	Average ROEM 0.14239213804317358
	Validation ROEM: 0.14246966657773447
	Elapsed 82s
Training model 48
	Average ROEM 0.13863042651546126
	Validation ROEM: 0.13555247533780995
	Elapsed 35s
Training model 49
	Average ROEM 0.1436069428832361
	Validation ROEM: 0.13989538430518494
	Elapsed 35s
Training model 50
	Average ROEM 0.14654290922886895
	Validation ROEM: 0.14341628958538777
	Elapsed 35s
Training model 51
	Average ROEM 0.1487606818046242
	Validation ROEM: 0.14538609710695313
	Elapsed 35s
Training model 52
	Average ROEM 0.13810282739410149
	Validation ROEM: 0.13540570195224408
	Elapsed 34s
Training model 53
23/05/10 20:31:18 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14316708379197313
	Validation ROEM: 0.13936474378802371
	Elapsed 37s
Training model 54
	Average ROEM 0.1462378061235631
	Validation ROEM: 0.14313497503931838
	Elapsed 36s
Training model 55
	Average ROEM 0.14841506107068483
	Validation ROEM: 0.14492869561591154
	Elapsed 36s
Training model 56
23/05/10 20:33:12 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.13746069870799113
	Validation ROEM: 0.13472579276236038
	Elapsed 36s
Training model 57
	Average ROEM 0.14281227578244943
	Validation ROEM: 0.13908649979426227
	Elapsed 36s
Training model 58
23/05/10 20:34:06 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14592227563708438
	Validation ROEM: 0.14280521256348508
	Elapsed 36s
Training model 59
23/05/10 20:34:48 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
23/05/10 20:34:55 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14804686882748405
	Validation ROEM: 0.1446871801416708
	Elapsed 37s
Training model 60
23/05/10 20:35:34 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
23/05/10 20:35:52 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.13694762840355662
	Validation ROEM: 0.13374784029378356
	Elapsed 55s
Training model 61
	Average ROEM 0.14302015780451457
	Validation ROEM: 0.13862140714071478
	Elapsed 54s
Training model 62
	Average ROEM 0.14697768328656363
	Validation ROEM: 0.14141602653791432
	Elapsed 55s
Training model 63
	Average ROEM 0.14944587399152004
	Validation ROEM: 0.1445196186525905
	Elapsed 55s
Training model 64
	Average ROEM 0.13641230706490398
	Validation ROEM: 0.13355472835801402
	Elapsed 55s
Training model 65
	Average ROEM 0.14246684559715003
	Validation ROEM: 0.1387184162361263
	Elapsed 54s
Training model 66
23/05/10 20:40:53 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
23/05/10 20:41:03 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.1465460834535203
	Validation ROEM: 0.14143786892499763
	Elapsed 56s
Training model 67
	Average ROEM 0.14913464920170444
	Validation ROEM: 0.14439722430936175
	Elapsed 55s
Training model 68
23/05/10 20:42:52 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.13579389618685456
	Validation ROEM: 0.1329786895143588
	Elapsed 54s
Training model 69
23/05/10 20:43:38 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14186245204077497
	Validation ROEM: 0.1383563174568765
	Elapsed 56s
Training model 70
	Average ROEM 0.14597027612863103
	Validation ROEM: 0.14155636899390028
	Elapsed 56s
Training model 71
	Average ROEM 0.14872276770630752
	Validation ROEM: 0.14407732037891588
	Elapsed 56s
Training model 72
	Average ROEM 0.13653267273033684
	Validation ROEM: 0.1336239902226059
	Elapsed 74s
Training model 73
	Average ROEM 0.14228047398907145
	Validation ROEM: 0.13844051022251286
	Elapsed 75s
Training model 74
	Average ROEM 0.14644516670859042
	Validation ROEM: 0.14161261520316756
	Elapsed 77s
Training model 75
	Average ROEM 0.14947379153477405
	Validation ROEM: 0.14492334097200743
	Elapsed 76s
Training model 76
	Average ROEM 0.13624936520952707
	Validation ROEM: 0.13337095803067922
	Elapsed 75s
Training model 77
	Average ROEM 0.14207285733793112
	Validation ROEM: 0.13830164187312502
	Elapsed 76s
Training model 78
	Average ROEM 0.14628612932882806
	Validation ROEM: 0.14110406038544224
	Elapsed 76s
Training model 79
23/05/10 20:56:00 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14930945508149468
	Validation ROEM: 0.14476438472589093
	Elapsed 77s
Training model 80
	Average ROEM 0.13543618748689215
	Validation ROEM: 0.13236172698879645
	Elapsed 75s
Training model 81
	Average ROEM 0.1415990703487202
	Validation ROEM: 0.13789753020346093
	Elapsed 76s
Training model 82
	Average ROEM 0.14586128801673567
	Validation ROEM: 0.1412655000912802
	Elapsed 76s
Training model 83
	Average ROEM 0.14901421083591346
	Validation ROEM: 0.14440921280385807
	Elapsed 78s
Training model 84
	Average ROEM 0.13630959203316623
	Validation ROEM: 0.13390318318356093
	Elapsed 94s
Training model 85
	Average ROEM 0.1419556785578616
	Validation ROEM: 0.13897890521694944
	Elapsed 95s
Training model 86
	Average ROEM 0.14623343074644995
	Validation ROEM: 0.14124639853465423
	Elapsed 95s
Training model 87
	Average ROEM 0.14926918200667255
	Validation ROEM: 0.1450032041354042
	Elapsed 94s
Training model 88
	Average ROEM 0.13635771590003548
	Validation ROEM: 0.1334075714655216
	Elapsed 95s
Training model 89
	Average ROEM 0.141472910751435
	Validation ROEM: 0.13804501477390113
	Elapsed 95s
Training model 90
	Average ROEM 0.14587998629687007
	Validation ROEM: 0.1411850294072741
	Elapsed 95s
Training model 91
	Average ROEM 0.14918714439319472
	Validation ROEM: 0.14484098858824698
	Elapsed 94s
Training model 92
	Average ROEM 0.13545251122941734
	Validation ROEM: 0.13234879246392922
	Elapsed 94s
Training model 93
	Average ROEM 0.14118351603956814
	Validation ROEM: 0.1375734665004763
	Elapsed 95s
Training model 94
	Average ROEM 0.14514864980989128
	Validation ROEM: 0.1410238101250071
	Elapsed 96s
Training model 95
	Average ROEM 0.1488484448020585
	Validation ROEM: 0.14437537710506584
	Elapsed 96s
Training model 96
	Average ROEM 0.14222318138822204
	Validation ROEM: 0.14034521704124034
	Elapsed 43s
Training model 97
	Average ROEM 0.14667514790945949
	Validation ROEM: 0.14496173038015073
	Elapsed 44s
Training model 98
	Average ROEM 0.14957092727793758
	Validation ROEM: 0.1471165695065454
	Elapsed 44s
Training model 99
	Average ROEM 0.15097051895295727
	Validation ROEM: 0.1481925767964645
	Elapsed 45s
Training model 100
	Average ROEM 0.14132542050252006
	Validation ROEM: 0.13940476482316724
	Elapsed 43s
Training model 101
	Average ROEM 0.1460569945745009
	Validation ROEM: 0.14402721766109514
	Elapsed 44s
Training model 102
	Average ROEM 0.1488929044943704
	Validation ROEM: 0.1460011964897352
	Elapsed 44s
Training model 103
	Average ROEM 0.15039355677375238
	Validation ROEM: 0.14742711826459934
	Elapsed 44s
Training model 104
	Average ROEM 0.14053886173189734
	Validation ROEM: 0.13790207236912908
	Elapsed 43s
Training model 105
	Average ROEM 0.14548467998166156
	Validation ROEM: 0.14322432299579163
	Elapsed 44s
Training model 106
	Average ROEM 0.14825187812140694
	Validation ROEM: 0.1451798128936253
	Elapsed 44s
Training model 107
	Average ROEM 0.1497844585234276
	Validation ROEM: 0.14670850855085305
	Elapsed 44s
Training model 108
	Average ROEM 0.13983758522207285
	Validation ROEM: 0.13851447113166246
	Elapsed 65s
Training model 109
	Average ROEM 0.14578257710533032
	Validation ROEM: 0.14460772161951801
	Elapsed 65s
Training model 110
	Average ROEM 0.14859199704468123
	Validation ROEM: 0.14664831961141792
	Elapsed 66s
Training model 111
	Average ROEM 0.1502407404768053
	Validation ROEM: 0.14788378493737986
	Elapsed 67s
Training model 112
	Average ROEM 0.13964520898781047
	Validation ROEM: 0.1381442255016398
	Elapsed 65s
Training model 113
	Average ROEM 0.14494360411526802
	Validation ROEM: 0.1427020510238676
	Elapsed 66s
Training model 114
	Average ROEM 0.14789934696958776
	Validation ROEM: 0.14546092791178614
	Elapsed 66s
Training model 115
	Average ROEM 0.14977016766073922
	Validation ROEM: 0.14679350927179502
	Elapsed 66s
Training model 116
	Average ROEM 0.13896274263346975
	Validation ROEM: 0.13620332356295392
	Elapsed 66s
Training model 117
	Average ROEM 0.14423990751443141
	Validation ROEM: 0.14215877082646208
	Elapsed 66s
Training model 118
	Average ROEM 0.14740353577356985
	Validation ROEM: 0.14481523152132164
	Elapsed 66s
Training model 119
	Average ROEM 0.1491413336790713
	Validation ROEM: 0.1459088892840836
	Elapsed 66s
Training model 120
	Average ROEM 0.1391800621049712
	Validation ROEM: 0.1383131047850371
	Elapsed 86s
Training model 121
	Average ROEM 0.14521029363654542
	Validation ROEM: 0.145369306784346
	Elapsed 86s
Training model 122
	Average ROEM 0.14817391472347066
	Validation ROEM: 0.14679216660816072
	Elapsed 89s
Training model 123
	Average ROEM 0.15031466336962726
	Validation ROEM: 0.14819415031795538
	Elapsed 88s
Training model 124
	Average ROEM 0.13925164475043061
	Validation ROEM: 0.13796296502607064
	Elapsed 86s
Training model 125
	Average ROEM 0.1444788140220492
	Validation ROEM: 0.14260525760037127
	Elapsed 87s
Training model 126
	Average ROEM 0.14781297527680948
	Validation ROEM: 0.14561369383753003
	Elapsed 88s
Training model 127
	Average ROEM 0.14986064009865468
	Validation ROEM: 0.1467178766397655
	Elapsed 88s
Training model 128
	Average ROEM 0.13842693599132772
	Validation ROEM: 0.13603593299090494
	Elapsed 86s
Training model 129
	Average ROEM 0.14404492980877354
	Validation ROEM: 0.14131894496731068
	Elapsed 87s
Training model 130
	Average ROEM 0.1473456017513409
	Validation ROEM: 0.14487853203927228
	Elapsed 88s
Training model 131
	Average ROEM 0.14923947118488856
	Validation ROEM: 0.1461707486642829
	Elapsed 88s
Training model 132
	Average ROEM 0.1391483120748031
	Validation ROEM: 0.1377334604320532
	Elapsed 106s
Training model 133
	Average ROEM 0.14506629818936817
	Validation ROEM: 0.14522602920535113
	Elapsed 108s
Training model 134
	Average ROEM 0.14816466328623046
	Validation ROEM: 0.14695763912159246
	Elapsed 109s
Training model 135
	Average ROEM 0.15042279386165705
	Validation ROEM: 0.1487096975061543
	Elapsed 109s
Training model 136
	Average ROEM 0.13874073368094206
	Validation ROEM: 0.13782755120980686
	Elapsed 108s
Training model 137
	Average ROEM 0.14456706736407615
	Validation ROEM: 0.14281674661370505
	Elapsed 108s
Training model 138
23/05/10 22:11:43 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14776984379401614
	Validation ROEM: 0.14618044727271365
	Elapsed 110s
Training model 139
	Average ROEM 0.14985607839251092
	Validation ROEM: 0.14736399044765508
	Elapsed 110s
Training model 140
	Average ROEM 0.1382819340068474
	Validation ROEM: 0.1358852856774993
	Elapsed 109s
Training model 141
	Average ROEM 0.1438340226707915
	Validation ROEM: 0.14102615046846417
	Elapsed 109s
Training model 142
	Average ROEM 0.14723144843908315
	Validation ROEM: 0.14501325967160286
	Elapsed 109s
Training model 143
	Average ROEM 0.14946293985126752
	Validation ROEM: 0.14645468967190156
	Elapsed 111s
Training model 144
	Average ROEM 0.14637115864418337
	Validation ROEM: 0.14190739172849462
	Elapsed 50s
Training model 145
	Average ROEM 0.15038965725774275
	Validation ROEM: 0.14631747694824665
	Elapsed 52s
Training model 146
	Average ROEM 0.15216945942670695
	Validation ROEM: 0.14865343572179313
	Elapsed 51s
Training model 147
	Average ROEM 0.15328887835589644
	Validation ROEM: 0.14935498034910963
	Elapsed 52s
Training model 148
	Average ROEM 0.1453648657314218
	Validation ROEM: 0.14161833751063796
	Elapsed 50s
Training model 149
	Average ROEM 0.14984364000956882
	Validation ROEM: 0.1455448575462131
	Elapsed 51s
Training model 150
	Average ROEM 0.15152397975492118
	Validation ROEM: 0.14758920791589494
	Elapsed 51s
Training model 151
	Average ROEM 0.15249051770870928
	Validation ROEM: 0.1487525246909251
	Elapsed 53s
Training model 152
	Average ROEM 0.144301718381317
	Validation ROEM: 0.14050644970964848
	Elapsed 50s
Training model 153
	Average ROEM 0.1490280782171008
	Validation ROEM: 0.14455280442659052
	Elapsed 52s
Training model 154
	Average ROEM 0.15075200226507596
	Validation ROEM: 0.14703256111706187
	Elapsed 52s
Training model 155
	Average ROEM 0.15174214773927605
	Validation ROEM: 0.14771375583057086
	Elapsed 52s
Training model 156
	Average ROEM 0.14403880225772445
	Validation ROEM: 0.14167038368983884
	Elapsed 75s
Training model 157
	Average ROEM 0.14903188014785845
	Validation ROEM: 0.14595922045726034
	Elapsed 76s
Training model 158
	Average ROEM 0.15196674074258404
	Validation ROEM: 0.14811393814095775
	Elapsed 78s
Training model 159
	Average ROEM 0.1529276921723461
	Validation ROEM: 0.14979311345050375
	Elapsed 79s
Training model 160
	Average ROEM 0.1435101738081947
	Validation ROEM: 0.14015425669583992
	Elapsed 75s
Training model 161
23/05/10 22:39:01 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14851134321134604
	Validation ROEM: 0.14506187090546988
	Elapsed 77s
Training model 162
	Average ROEM 0.15113450227167563
	Validation ROEM: 0.14740060242123773
	Elapsed 77s
Training model 163
23/05/10 22:41:36 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.15254667814109965
	Validation ROEM: 0.14953105761168217
	Elapsed 78s
Training model 164
	Average ROEM 0.14234902325321042
	Validation ROEM: 0.13929464275603295
	Elapsed 75s
Training model 165
	Average ROEM 0.147791140082631
	Validation ROEM: 0.14402219755693915
	Elapsed 77s
Training model 166
	Average ROEM 0.1506205717635226
	Validation ROEM: 0.1467341470016498
	Elapsed 77s
Training model 167
	Average ROEM 0.15210238461276174
	Validation ROEM: 0.14854690737941106
	Elapsed 78s
Training model 168
	Average ROEM 0.14338280394499747
	Validation ROEM: 0.14158003264575356
	Elapsed 99s
Training model 169
	Average ROEM 0.1480755252292621
	Validation ROEM: 0.14639187462273812
	Elapsed 101s
Training model 170
	Average ROEM 0.15185008374791162
	Validation ROEM: 0.14899988047567211
	Elapsed 102s
Training model 171
	Average ROEM 0.1535917032427009
	Validation ROEM: 0.15064019459988343
	Elapsed 104s
Training model 172
	Average ROEM 0.14280862533384642
	Validation ROEM: 0.14003579302095215
	Elapsed 100s
Training model 173
	Average ROEM 0.1478502206410063
	Validation ROEM: 0.14529138860570553
	Elapsed 101s
Training model 174
	Average ROEM 0.15090048524832916
	Validation ROEM: 0.14824765074499444
	Elapsed 103s
Training model 175
	Average ROEM 0.15292103306271465
	Validation ROEM: 0.14995190393827174
	Elapsed 105s
Training model 176
	Average ROEM 0.14175172976498093
	Validation ROEM: 0.13871659592844426
	Elapsed 100s
Training model 177
	Average ROEM 0.14696587558448343
	Validation ROEM: 0.14465845166517635
	Elapsed 103s
Training model 178
	Average ROEM 0.15049690865530443
	Validation ROEM: 0.1470162181874785
	Elapsed 101s
Training model 179
	Average ROEM 0.1522688149868009
	Validation ROEM: 0.14950713541690064
	Elapsed 103s
Training model 180
	Average ROEM 0.14308098719008375
	Validation ROEM: 0.1419993211196451
	Elapsed 123s
Training model 181
	Average ROEM 0.14786410541983536
	Validation ROEM: 0.14632083244266594
	Elapsed 126s
Training model 182
	Average ROEM 0.15166616827473067
	Validation ROEM: 0.14904751890713652
	Elapsed 128s
Training model 183
	Average ROEM 0.15353412089656165
	Validation ROEM: 0.15164052235629197
	Elapsed 129s
Training model 184
	Average ROEM 0.14224251011189018
	Validation ROEM: 0.14008354181628113
	Elapsed 126s
Training model 185
	Average ROEM 0.1472235362893324
	Validation ROEM: 0.1453560143180103
	Elapsed 127s
Training model 186
	Average ROEM 0.15038869331300642
	Validation ROEM: 0.1488121974078903
	Elapsed 129s
Training model 187
	Average ROEM 0.152814560187487
	Validation ROEM: 0.15124338400325887
	Elapsed 127s
Training model 188
	Average ROEM 0.14142760689563486
	Validation ROEM: 0.13831972740037804
	Elapsed 119s
Training model 189
23/05/10 23:27:41 WARN TaskMemoryManager: Failed to allocate a page (134217728 bytes), try again.
	Average ROEM 0.14671678145579986
	Validation ROEM: 0.1447523930609863
	Elapsed 117s
Training model 190
	Average ROEM 0.15007043728413372
	Validation ROEM: 0.1479646746276781
	Elapsed 121s
Training model 191
	Average ROEM 0.15248293101484625
	Validation ROEM: 0.15018115568720378
	Elapsed 121s

In [26]:
# Find the index of the smallest ROEM
i = np.argmin(ROEMS)
print(f'Smallest ROEM #{i}: {ROEMS[i]}')

Smallest ROEM #44: 0.12848390693085782
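
`np.argmin` returns only the single best index. To see how close the runners-up are, the indices can also be sorted by score. The sketch below is a dependency-free illustration; `top_k_indices` and `example_scores` are hypothetical names with made-up values, standing in for the notebook's `ROEMS` list.

```python
def top_k_indices(scores, k=5):
    """Return the indices of the k smallest scores, best (lowest) first."""
    return sorted(range(len(scores)), key=lambda idx: scores[idx])[:k]


# Made-up scores for illustration only (not values from the grid search)
example_scores = [0.14, 0.13, 0.15, 0.12, 0.16]

for idx in top_k_indices(example_scores, k=3):
    print(f"#{idx}: {example_scores[idx]}")
```

In the notebook, the same call could be applied as `top_k_indices(ROEMS, k=5)` to list the five lowest-ROEM configurations.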


In [27]:
# Extract the best_model
best_model = model_list[i]
# Extract the Rank
print("Rank: ", best_model.getRank())
# Extract the MaxIter value
print("MaxIter: ", best_model.getMaxIter())
# Extract the RegParam value
print("RegParam: ", best_model.getRegParam())
# Extract the Alpha value
print("Alpha: ", best_model.getAlpha())

Rank:  10
MaxIter:  40
RegParam:  0.15
Alpha:  20.0


In [30]:
fitted_model = best_model.fit(training)
binary_test_predictions = fitted_model.transform(test)
# Look at the test predictions
binary_test_predictions.show()
# Evaluate ROEM on test predictions
print(f"ROEM: {roem(spark, binary_test_predictions, 'userId', 'viewed')}")
# Look at user 42's test predictions
binary_test_predictions.filter(F.col("userId") == 42).show(3)

                                                                                

+------+-------+------+-----------+
|userId|movieId|viewed| prediction|
+------+-------+------+-----------+
|   148|      5|     0| 0.42935294|
|   148|      7|     0| 0.45020267|
|   148|     11|     0|  0.4760792|
|   148|     13|     0|0.102432765|
|   148|     15|     0|0.092653215|
|   148|     30|     0| 0.22208145|
|   148|     34|     0| 0.78784674|
|   148|     38|     0| 0.07041129|
|   148|     42|     0| 0.13359056|
|   148|     53|     0|        0.0|
|   148|     54|     0|  0.0121748|
|   148|     58|     1| 0.46741182|
|   148|     60|     0| 0.19647813|
|   148|     63|     0| 0.08127988|
|   148|     74|     0| 0.18537197|
|   148|     81|     0| 0.25807658|
|   148|     93|     0|0.050872304|
|   148|     97|     0| 0.12778923|
|   148|    105|     0|  0.4534478|
|   148|    121|     0|  0.1340127|
+------+-------+------+-----------+
only showing top 20 rows

ROEM: 0.12848390693085782
+------+-------+------+------------+
|userId|movieId|viewed|  prediction|
+------+--

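> The model reaches a ROEM of about 0.128 on the test set, which is low (lower is better). Notice that it also assigns high scores to some movies the user has not viewed: for user 148, movieId 34 has `viewed` = 0 but a prediction of about 0.79. These high-scoring unseen movies are the ones the model would surface as recommendations.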
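
The ROEM values in this notebook come from its own `roem` helper, whose source is not shown here. As a rough illustration of the usual definition (the `viewed`-weighted average of each item's percentile rank within its user's predictions, where 0.0 is the top of the list), a dependency-free sketch might look like the following. The name `roem_sketch` and the `(user, viewed, prediction)` tuple layout are assumptions made for this example, not the helper's actual implementation.

```python
from collections import defaultdict


def roem_sketch(rows):
    """Rank ordering error metric over (user, viewed, prediction) tuples.

    Assumed definition: sum(viewed * percentile_rank) / sum(viewed), where the
    percentile rank is computed per user and is 0.0 for the highest prediction.
    """
    by_user = defaultdict(list)
    for user, viewed, prediction in rows:
        by_user[user].append((viewed, prediction))

    weighted_rank_sum = 0.0
    viewed_sum = 0.0
    for items in by_user.values():
        n = len(items)
        for viewed, prediction in items:
            # Count this user's items that were scored strictly higher
            higher = sum(1 for _, other in items if other > prediction)
            # Percentile rank: 0.0 at the top of the list, 1.0 at the bottom
            perc_rank = higher / (n - 1) if n > 1 else 0.0
            weighted_rank_sum += viewed * perc_rank
            viewed_sum += viewed

    # With no viewed items the metric is undefined
    return weighted_rank_sum / viewed_sum if viewed_sum else float("nan")


# The viewed movie is ranked first, so the error is 0.0
good = [(1, 1, 0.9), (1, 0, 0.5), (1, 0, 0.1)]
# The viewed movie is ranked last, so the error is 1.0
bad = [(1, 0, 0.9), (1, 0, 0.5), (1, 1, 0.1)]
print(roem_sketch(good), roem_sketch(bad))
```

Under this definition, a model that ranks items at random would be expected to score around 0.5, so values well below that indicate that viewed movies tend to sit near the top of each user's list.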

### Recommendations from binary data

In [45]:
# Read data from CSV file
movies = spark.read.csv('movies.csv', sep=',', header=True, inferSchema=True)
original_ratings = movie_ratings.join(movies, on='movieId')

In [46]:
# View user 26's original ratings
print("User 26 Original Ratings:")
original_ratings.filter(F.col("userId") == 26).show()

User 26 Original Ratings:
+-------+------+------+--------------------+--------------------+
|movieId|userId|rating|               title|              genres|
+-------+------+------+--------------------+--------------------+
|      1|    26|   5.0|    Toy Story (1995)|Adventure|Animati...|
|     32|    26|   4.5|Twelve Monkeys (a...|Mystery|Sci-Fi|Th...|
|     47|    26|   4.5|Seven (a.k.a. Se7...|    Mystery|Thriller|
|     50|    26|   4.5|Usual Suspects, T...|Crime|Mystery|Thr...|
|     63|    26|   0.5|Don't Be a Menace...|        Comedy|Crime|
|     69|    26|   4.5|       Friday (1995)|              Comedy|
|    153|    26|   3.5|Batman Forever (1...|Action|Adventure|...|
|    165|    26|   2.5|Die Hard: With a ...|Action|Crime|Thri...|
|    260|    26|   3.0|Star Wars: Episod...|Action|Adventure|...|
|    296|    26|   3.5| Pulp Fiction (1994)|Comedy|Crime|Dram...|
|    316|    26|   0.5|     Stargate (1994)|Action|Adventure|...|
|    318|    26|   4.0|Shawshank Redempt...|      

In [54]:
binary_recs = fitted_model.transform(binary_movie_ratings).join(movies, on='movieId')
# View user 26's recommendations
print("User 26 Recommendations:")
binary_recs.filter(F.col("userId") == 26).orderBy('prediction', ascending=False).show()

User 26 Recommendations:
+-------+------+------+----------+--------------------+--------------------+
|movieId|userId|viewed|prediction|               title|              genres|
+-------+------+------+----------+--------------------+--------------------+
|    318|    26|     1| 1.1315075|Shawshank Redempt...|         Crime|Drama|
|    356|    26|     1| 1.1144809| Forrest Gump (1994)|Comedy|Drama|Roma...|
|  58559|    26|     1| 1.1115212|Dark Knight, The ...|Action|Crime|Dram...|
|    296|    26|     1| 1.0668908| Pulp Fiction (1994)|Comedy|Crime|Dram...|
|    593|    26|     1| 1.0519606|Silence of the La...|Crime|Horror|Thri...|
|  46578|    26|     0|  1.033709|Little Miss Sunsh...|Adventure|Comedy|...|
|  60069|    26|     1| 1.0318687|       WALL·E (2008)|Adventure|Animati...|
|   4993|    26|     0| 1.0294702|Lord of the Rings...|   Adventure|Fantasy|
|  72998|    26|     1| 1.0273633|       Avatar (2009)|Action|Adventure|...|
|  63082|    26|     1| 1.0272381|Slumdog Millionai

In [55]:
# View user 99's original ratings
print("User 99 Original Ratings:")
original_ratings.filter(F.col("userId") == 99).show()

User 99 Original Ratings:
+-------+------+------+--------------------+--------------------+
|movieId|userId|rating|               title|              genres|
+-------+------+------+--------------------+--------------------+
|      1|    99|   4.0|    Toy Story (1995)|Adventure|Animati...|
|      2|    99|   2.0|      Jumanji (1995)|Adventure|Childre...|
|     17|    99|   3.0|Sense and Sensibi...|       Drama|Romance|
|     28|    99|   3.0|   Persuasion (1995)|       Drama|Romance|
|     39|    99|   4.0|     Clueless (1995)|      Comedy|Romance|
|     47|    99|   1.0|Seven (a.k.a. Se7...|    Mystery|Thriller|
|     50|    99|   4.0|Usual Suspects, T...|Crime|Mystery|Thr...|
|    105|    99|   3.0|Bridges of Madiso...|       Drama|Romance|
|    110|    99|   3.0|   Braveheart (1995)|    Action|Drama|War|
|    154|    99|   3.0|Beauty of the Day...|               Drama|
|    160|    99|   2.0|        Congo (1995)|Action|Adventure|...|
|    170|    99|   2.0|      Hackers (1995)|Action

In [56]:
# View user 99's recommendations
print("User 99 Recommendations:")
binary_recs.filter(F.col("userId") == 99).orderBy('prediction', ascending=False).show()

User 99 Recommendations:
+-------+------+------+----------+--------------------+--------------------+
|movieId|userId|viewed|prediction|               title|              genres|
+-------+------+------+----------+--------------------+--------------------+
|    608|    99|     1| 1.1151934|        Fargo (1996)|Comedy|Crime|Dram...|
|    260|    99|     0| 1.1001455|Star Wars: Episod...|Action|Adventure|...|
|    858|    99|     0| 1.0485737|Godfather, The (1...|         Crime|Drama|
|     36|    99|     0| 1.0440861|Dead Man Walking ...|         Crime|Drama|
|   1073|    99|     0| 1.0357492|Willy Wonka & the...|Children|Comedy|F...|
|   1617|    99|     1|  1.029409|L.A. Confidential...|Crime|Film-Noir|M...|
|    527|    99|     1| 1.0276374|Schindler's List ...|           Drama|War|
|    780|    99|     0| 1.0226269|Independence Day ...|Action|Adventure|...|
|   1225|    99|     0| 1.0222312|      Amadeus (1984)|               Drama|
|   1097|    99|     0| 1.0154196|E.T. the Extra-Te
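
The tables above mix movies the user has already watched (`viewed` = 1) with ones they have not. To turn the scores into actual recommendations, one option is to keep only the unseen movies and take the top few per user. The sketch below shows that idea in plain Python, using the first rows of user 99's output. The helper name `top_unseen` and the tuple layout are assumptions for illustration. In the notebook itself the same effect could likely be had by filtering `binary_recs` on `viewed == 0` before ordering.

```python
from collections import defaultdict


def top_unseen(scored_rows, n=3):
    """Return {user_id: [movie_id, ...]} with the n highest-scoring unseen movies.

    scored_rows: iterable of (user_id, movie_id, viewed, prediction) tuples.
    Hypothetical helper, not part of the notebook or of PySpark.
    """
    unseen_by_user = defaultdict(list)
    for user_id, movie_id, viewed, prediction in scored_rows:
        if viewed == 0:  # skip movies the user has already watched
            unseen_by_user[user_id].append((prediction, movie_id))

    return {
        user_id: [movie_id for _pred, movie_id in sorted(pairs, reverse=True)[:n]]
        for user_id, pairs in unseen_by_user.items()
    }


# First rows of user 99's output shown above.
user_99_rows = [
    (99, 608, 1, 1.1151934),
    (99, 260, 0, 1.1001455),
    (99, 858, 0, 1.0485737),
    (99, 36, 0, 1.0440861),
    (99, 1073, 0, 1.0357492),
    (99, 1617, 1, 1.029409),
]
print(top_unseen(user_99_rows, n=3))  # {99: [260, 858, 36]}
```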

# Course recap

## Resources

* [McKinsey&Company: "How Retailers Can Keep Up With Consumers"](https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers)

    A paper published by McKinsey & Company on the power of recommendation engines, such as ALS-based models.

* [ALS Data Preparation: Wide to Long Function](https://github.com/jamenlong/ALS_expected_percent_rank_cv/blob/master/wide_to_long_function.py)

    Code for the `wide_to_long` function discussed in the section on preparing data for ALS.

* [Hu, Koren, Volinsky: "Collaborative Filtering for Implicit Feedback Datasets"](http://yifanhu.net/PUB/cf.pdf)

    White paper that provides the academic background for building ALS models on implicit ratings. I highly recommend reading it, as it gives a lot of context and insight into how these models work and into alternative ways to evaluate them.

* [GitHub Repo: Cross Validation With Implicit Ratings in PySpark](https://github.com/jamenlong/ALS_expected_percent_rank_cv/blob/master/ROEM_cv.py)

    Code that handles cross-validation and model evaluation for implicit-ratings models built with ALS in PySpark.

* [Pan, Zhou, Cao, Liu, Lukose, Scholz, Yang: "One Class Collaborative Filtering"](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.306.4684&rep=rep1&type=pdf)

    Paper that discusses the math and intuition behind the user-based weighting and item-based weighting methodologies for addressing the class imbalance present in binary ratings models.