Authors:
    <br>Alejandro Alvarez (axa)
    <br>Brenda Palma (bpalmagu)

# <center>ML-Jokes: Collaborative Filtering </center>

## Setup

In [1]:
# Path to ml-jokes folder
import os
if os.getcwd().split('/')[-2] == 'ml-jokes': os.chdir('..')
print(f'Current directory: {os.getcwd()}')
assert set(['data', 'mljokes', 'environment.yml', 'nbs']) <= set(os.listdir()), \
    'Wrong path; go to ./heinz-95729-project/api/ml-jokes'

Current directory: /home/brendapalmag/eCommerce/heinz-95729-project/api/ml-jokes


In [50]:
import pandas as pd
import numpy as np
from surprise import SVD
from surprise import Dataset, Reader
from surprise import accuracy
from surprise import dump
from surprise.model_selection import cross_validate, train_test_split, GridSearchCV
from mljokes.data import read_jokes, read_ratings
from mljokes import cf_utils as cfu

## Data Prep

In [24]:
# Read jokes
jokes = read_jokes()

In [16]:
# Read and consolidate jester ratings
ratings = read_ratings()
ratings.head()

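|   | user_id | joke_id | rating | test_user |
|---|---------|---------|--------|-----------|
| 0 | 0 | 1 | 99.0 | 0 |
| 1 | 0 | 2 | 99.0 | 0 |
| 2 | 0 | 3 | 99.0 | 0 |
| 3 | 0 | 4 | 99.0 | 0 |
| 4 | 0 | 5 | -1.65 | 0 |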


In [17]:
# Filter out missing ratings (encoded as 99.0); keep them in `unseen` for recommendations
unseen = ratings.loc[ratings['rating']==99]
ratings = ratings.loc[ratings['rating']!=99]
ratings.reset_index(inplace=True, drop=True)
ratings.sort_values(by=['user_id', 'joke_id'], ascending=True, inplace=True, ignore_index=True)
ratings.head(3)

|   | user_id | joke_id | rating | test_user |
|---|---------|---------|--------|-----------|
| 0 | 0 | 5 | -1.65 | 0 |
| 1 | 0 | 7 | -0.78 | 0 |
| 2 | 0 | 8 | 6.89 | 0 |


## SVD Model
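
As background, Surprise's `SVD` estimates a rating as the global mean plus a user bias, an item bias, and the dot product of the user's and item's latent factor vectors, and `predict` clips the result to the `Reader` rating scale by default. The sketch below reproduces that arithmetic with made-up numbers; the function name `svd_estimate` and all values are hypothetical and are not taken from the trained model.

```python
def svd_estimate(global_mean, user_bias, item_bias,
                 user_factors, item_factors, rating_scale):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    # Dot product of the user's and the item's latent factor vectors
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    # Keep the estimate inside the allowed rating range
    return min(high, max(low, raw))


# Hypothetical values on a (-10, 10) scale
estimate = svd_estimate(
    global_mean=0.5,
    user_bias=1.0,
    item_bias=-0.25,
    user_factors=[0.5, -1.0],
    item_factors=[2.0, 1.0],
    rating_scale=(-10.0, 10.0),
)
print(estimate)
```

In the notebook the scale passed to `Reader` is the observed minimum and maximum rating, so estimates from the fitted model are bounded by those values.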

In [5]:
# Build custom dataset 
reader = Reader(rating_scale=(ratings.rating.min(), ratings.rating.max()))
data = Dataset.load_from_df(ratings[['user_id', 'joke_id', 'rating']], reader)

In [6]:
trainset, testset = train_test_split(data, test_size=.25)

# Train model
model = SVD()
model.fit(trainset)

# Evaluate the model on the held-out test set using RMSE
predictions = model.test(testset)
accuracy.rmse(predictions)


RMSE: 4.4251


4.425063786621994
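
To make the metric concrete, here is a minimal sketch of the root-mean-square error: the square root of the mean squared difference between true and estimated ratings. The helper name `rmse_from_pairs` and the toy ratings are hypothetical; the notebook itself relies on `accuracy.rmse`.

```python
import math


def rmse_from_pairs(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (true, estimate) pair")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical (true, estimated) ratings, not taken from the Jester data
toy_pairs = [(5.0, 3.0), (-2.0, 0.0), (1.0, 1.0)]
toy_rmse = rmse_from_pairs(toy_pairs)
print(round(toy_rmse, 4))
```

Surprise `Prediction` objects expose the true rating as `r_ui` and the estimate as `est`, so `rmse_from_pairs((p.r_ui, p.est) for p in predictions)` should agree with the value reported above, up to floating-point rounding.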

In [51]:
# Save model
file_name = './results/cf_model'
dump.dump(file_name, algo=model, verbose=0)

In [None]:
# Retrain on all available ratings; build_full_trainset is a Dataset method, not a Trainset method
trainset = data.build_full_trainset()
model = SVD()
model.fit(trainset)

## Get Recommendations

In [52]:
# Load model
file_name = './results/cf_model'
_, model = dump.load(file_name)
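
The recommendation step that follows uses `cfu.get_recommendations`, whose implementation is not shown in this notebook. As a rough idea of what a top-N helper of this kind might do, the sketch below scores each unseen item with `model.predict(uid, iid).est` and keeps the highest estimates. The name `top_n_for_user`, the stub model, and the scores are all hypothetical; this is not the project's actual code.

```python
from collections import namedtuple


def top_n_for_user(model, user_id, candidate_items, n=3):
    """Return (item_ids, estimates) for the n highest predicted ratings.

    `model` only needs a predict(uid, iid) method whose result has an
    `.est` attribute, which is how Surprise algorithms behave.
    """
    scored = [(model.predict(user_id, iid).est, iid) for iid in candidate_items]
    # Highest estimated rating first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:n]
    return [iid for _, iid in top], [est for est, _ in top]


# Stand-in for a trained model so the sketch runs without Surprise installed
StubPrediction = namedtuple("StubPrediction", ["est"])


class StubModel:
    def __init__(self, scores):
        self.scores = scores

    def predict(self, uid, iid):
        return StubPrediction(est=self.scores[iid])


# Made-up scores keyed by joke id
stub = StubModel({76: 2.6, 66: 2.2, 52: 1.9, 10: -1.0})
top_items, top_scores = top_n_for_user(stub, 1229, [10, 52, 66, 76], n=3)
print(top_items, top_scores)
```

Passing only the jokes a user has not rated, as the next cell does with `unseen`, keeps already-seen jokes out of the recommendations.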

In [53]:
users = ratings.loc[ratings['test_user']==1, 'user_id'].unique()[:3]

for u in users:
    # Joke ids this user has not rated yet
    unseen_by_user = unseen.loc[unseen['user_id']==u, 'joke_id'].values
    u_jokes, u_ratings = cfu.get_recommendations(model, u, unseen_by_user, 3)
    print("Recommendations for user", u)
    print("-"*30)
    print(u_jokes, u_ratings)
    display_jokes(jokes, u_jokes)

    

Recommendations for user 1229
------------------------------
[76 66 52] [2.60074274 2.20493765 1.93294894]
There once was a man and a woman that both got in a terrible car wreck. Both of their vehicles were completely destroyed, buy fortunately, no one was hurt. In thankfulness, the woman said to the man, 'We are both okay, so we should celebrate. I have a bottle of wine in my car, let's open it.' So the woman got the bottleout of the car, and handed it to the man. The man took a really big drink, and handed the woman the bottle. The woman closed the bottle and put it down. The man asked, 'Aren't you going to take a drink?' The woman cleverly replied, 'No, I think I'll just wait for the cops to get here.' 

A lawyer opened the door of his BMW, when suddenly a car came along and hit the door, ripping it off completely. When the police arrived at the scene, the lawyer was complaining bitterly about the damage to his precious BMW. "Officer, look what they've done to my Beeeeemer!!!", he w