Authors:
    <br>Alejandro Alvarez (axa)
    <br>Brenda Palma (bpalmagu)

# <center>ML-Jokes: Collaborative Filtering </center>

## Setup

In [1]:
# Path to ml-jokes folder
import os
if os.getcwd().split('/')[-2] == 'ml-jokes': os.chdir('..')
print(f'Current directory: {os.getcwd()}')
assert set(['data', 'mljokes', 'environment.yml', 'nbs']) <= set(os.listdir()), \
    'Wrong path; go to ./heinz-95729-project/api/ml-jokes'

Current directory: /home/brendapalmag/eCommerce/heinz-95729-project/api/ml-jokes


In [50]:
import pandas as pd
import numpy as np
from surprise import SVD
from surprise import Dataset, Reader
from surprise import accuracy
from surprise import dump
from surprise.model_selection import cross_validate, train_test_split, GridSearchCV
from mljokes.data import read_jokes, read_ratings
from mljokes import cf_utils as cfu

## Data Prep

In [24]:
# Read jokes
jokes = read_jokes()

In [16]:
# Read and consolidate jester ratings
ratings = read_ratings()
ratings.head(3)

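|   | user_id | joke_id | rating | test_user |
|---|---------|---------|--------|-----------|
| 0 | 0 | 1 | 99.0 | 0 |
| 1 | 0 | 2 | 99.0 | 0 |
| 2 | 0 | 3 | 99.0 | 0 |
| 3 | 0 | 4 | 99.0 | 0 |
| 4 | 0 | 5 | -1.65 | 0 |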


In [17]:
# Filter out missing ratings (coded as 99.0); keep them in `unseen` for later lookups
unseen = ratings.loc[ratings['rating']==99]
ratings = ratings.loc[ratings['rating']!=99]
ratings.reset_index(inplace=True, drop=True)
ratings.sort_values(by=['user_id', 'joke_id'], ascending=True, inplace=True, ignore_index=True)
ratings.head(3)

|   | user_id | joke_id | rating | test_user |
|---|---------|---------|--------|-----------|
| 0 | 0 | 5 | -1.65 | 0 |
| 1 | 0 | 7 | -0.78 | 0 |
| 2 | 0 | 8 | 6.89 | 0 |


## SVD Model
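
For orientation, the block below summarizes the model that Surprise's `SVD` class fits, as described in the library documentation. The symbols ($\mu$, $b_u$, $b_i$, $p_u$, $q_i$, $\gamma$, $\lambda$) are the usual matrix-factorization notation and do not appear elsewhere in this notebook. In the grid search, `lr_all` sets the learning rate $\gamma$ and `reg_all` sets the regularization strength $\lambda$ for all parameters at once, while `n_epochs` is the number of SGD passes over the training ratings.

```latex
% Predicted rating of user u for joke i
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

% Regularized squared error minimized by SGD over the training ratings
\sum_{r_{ui} \in R_{\text{train}}} \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda \left( b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)

% One SGD step for the error e_{ui} = r_{ui} - \hat{r}_{ui}
b_u \leftarrow b_u + \gamma \left( e_{ui} - \lambda b_u \right), \qquad
b_i \leftarrow b_i + \gamma \left( e_{ui} - \lambda b_i \right)
p_u \leftarrow p_u + \gamma \left( e_{ui}\, q_i - \lambda p_u \right), \qquad
q_i \leftarrow q_i + \gamma \left( e_{ui}\, p_u - \lambda q_i \right)
```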

In [5]:
# Build custom dataset 
reader = Reader(rating_scale=(ratings.rating.min(), ratings.rating.max()))
data = Dataset.load_from_df(ratings[['user_id', 'joke_id', 'rating']], reader)

In [55]:
param_grid = {'n_epochs': [10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.2, 0.5]}

gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)

gs.fit(data)

# best RMSE score
print(gs.best_score['rmse'])

# combination of parameters that gave the best RMSE score
print(gs.best_params['rmse'])

4.318686959352793
{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.2}
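
The RMSE printed above is expressed in the same units as the ratings themselves. As a reminder of what the two `measures` passed to `GridSearchCV` compute, here is a minimal, self-contained sketch. The helper names `rmse` and `mae` and the sample numbers are illustrative assumptions; they are not Surprise's implementation and not values from the Jester data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up ratings and predictions, for illustration only
actual = [2.0, -4.0, 6.0]
predicted = [1.0, -1.0, 6.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print(f"RMSE={rmse_value:.4f}  MAE={mae_value:.4f}")
```

RMSE penalizes large misses more heavily than MAE, so a gap between the two suggests that a few predictions are far off.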


In [57]:
trainset = data.build_full_trainset()

# Train model
model = gs.best_estimator['rmse']
model.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f3391776430>

In [58]:
# Save model
file_name = './results/cf_model'
dump.dump(file_name, algo=model, verbose=0)

## Get Recommendations
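
The cells in this section call `cfu.get_recommendations`, whose source is not shown in this notebook. The sketch below illustrates one plausible way such a helper could work: score every unseen joke with the model's `predict` method, then keep the `n` highest estimates. The names `top_n_unseen` and `StubModel` are hypothetical, and the stub only mimics the `.est` field of the prediction object returned by Surprise's `predict(uid, iid)`; the actual `mljokes` implementation may differ.

```python
from collections import namedtuple

# Stand-in for the prediction object returned by Surprise's `algo.predict(uid, iid)`,
# which exposes the estimated rating as `.est`.
Prediction = namedtuple("Prediction", ["uid", "iid", "est"])


class StubModel:
    """Hypothetical model with fixed scores, used only to make the sketch runnable."""

    def __init__(self, scores):
        self.scores = scores  # maps (user_id, joke_id) -> estimated rating

    def predict(self, uid, iid):
        return Prediction(uid, iid, self.scores.get((uid, iid), 0.0))


def top_n_unseen(model, user_id, unseen_joke_ids, n):
    """Return the n unseen joke ids with the highest estimates, plus those estimates."""
    preds = [model.predict(user_id, joke_id) for joke_id in unseen_joke_ids]
    preds.sort(key=lambda p: p.est, reverse=True)  # best estimate first
    top = preds[:n]
    return [p.iid for p in top], [p.est for p in top]


# Made-up scores for a made-up user 7
stub = StubModel({(7, 10): 1.5, (7, 11): -2.0, (7, 12): 3.25, (7, 13): 0.5})
joke_ids, estimates = top_n_unseen(stub, 7, [10, 11, 12, 13], 2)
print(joke_ids, estimates)
```

Restricting the candidates to unseen jokes keeps the model from recommending something the user has already rated.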

In [59]:
# Load model
file_name = './results/cf_model'
_, model = dump.load(file_name)

In [60]:
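# First three test users that have at least one real rating
users = ratings.loc[ratings['test_user']==1, 'user_id'].unique()[:3]

for u in users:
    # Joke ids this user has not rated (their rating was the 99 sentinel)
    unseen_by_user = unseen.loc[unseen['user_id']==u, 'joke_id'].values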
    u_jokes, u_ratings = cfu.get_recommendations(model, u, unseen_by_user, 3)
    print("Recommendations for user", u)
    print("-"*30)
    print(u_jokes, u_ratings)
    display_jokes(jokes, u_jokes)

    

Recommendations for user 1229
------------------------------
[83 72 31] [1.35235697 1.33885159 1.14044954]
What a woman says: "This place is a mess! C'mon, You and I need to clean up, Your stuff is lying on the floor and you'll have no clothes to wear, if we don't do laundry right now!" What a man hears: blah, blah, blah, blah, C'mon blah, blah, blah, blah, you and I blah, blah, blah, blah, on the floor blah, blah, blah, blah, no clothes blah, blah, blah, blah, RIGHT NOW! 

On the first day of college, the Dean addressed the students, pointing out some of the rules: "The female dormitory will be out-of-bounds for all male students and the male dormitory to the female students. Anybody caught breaking this rule will be finded $20 the first time." He continued, "Anybody caught breaking this rule the second time will be fined $60. Being caught a third time will cost you a fine of $180. Are there any questions ?" At this point, a male student in the crowd inquired: "How much for a season p