# LKPy Example
INFO 4817 / 5871
Spring 2019
Professor Robin Burke

### Import libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

### Import LKPy classes

In [2]:
from lenskit import batch, topn, util
from lenskit import crossfold as xf
from lenskit.algorithms import Recommender
from lenskit.algorithms.basic import Bias, Popular
from lenskit.metrics.predict import rmse

## Load MovieLens data
### Ratings
LKPy expects the columns to be named `user`, `item`, `rating`, and `timestamp`, so relabel them after loading.

In [3]:
ratings = pd.read_csv('ratings.csv', encoding='Latin 1')

ratings.columns = ['user', 'item', 'rating', 'timestamp']

ratings.head()

,user,item,rating,timestamp
0,12882,1,4.0,1147195252
1,12882,32,3.5,1147195307
2,12882,47,5.0,1147195343
3,12882,50,5.0,1147185499
4,12882,110,4.5,1147195239
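Assigning to `ratings.columns` relabels by position, so it silently depends on the column order in the file. A minimal sketch of relabeling by name instead, assuming the file uses the usual MovieLens header names (`userId`, `movieId`, `rating`, `timestamp`); the inline CSV text is a toy stand-in for `ratings.csv`:

```python
import io
import pandas as pd

# Toy stand-in for ratings.csv. The header names are an assumption based on
# the usual MovieLens layout, not something confirmed by this notebook.
csv_text = """userId,movieId,rating,timestamp
12882,1,4.0,1147195252
12882,32,3.5,1147195307
"""

toy_ratings = pd.read_csv(io.StringIO(csv_text))

# Rename by name so the mapping does not depend on column order.
toy_ratings = toy_ratings.rename(columns={'userId': 'user', 'movieId': 'item'})

# rename() ignores names it cannot find, so check the result explicitly.
missing = {'user', 'item', 'rating'} - set(toy_ratings.columns)

print(list(toy_ratings.columns))
print('missing columns:', missing)
```

If `missing` is not empty, the file did not have the expected header and the relabeling should be checked by hand.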


### Movies
Set index for easy joins later. Drop the genre column.

In [12]:
movies = pd.read_csv('movies.csv', encoding='Latin 1')
movies = movies.set_index('movieId')
movies = movies.drop(columns='genres')
movies.head()

movieId,title
1,Toy Story (1995)
2,Jumanji (1995)
3,Grumpier Old Men (1995)
4,Waiting to Exhale (1995)
5,Father of the Bride Part II (1995)


## Popular recommender
### Instantiate

In [13]:
pop = Popular()

### Fit

In [14]:
pop.fit(ratings)

<lenskit.algorithms.basic.Popular at 0x11611f400>

### Compute recommendations for a particular user
Recommend 20 movies for one user (the cell below passes ID 1282). The list should be the same for every user!

Set index and join with titles.

In [27]:
pop_recs = pop.recommend(1282, 20, candidates=ratings['item'].unique())
pop_recs = pop_recs.set_index('item')
pop_recs.join(movies)

item,score,title
2571,668,"Matrix, The (1999)"
4993,628,"Lord of the Rings: The Fellowship of the Ring,..."
356,621,Forrest Gump (1994)
296,613,Pulp Fiction (1994)
5952,597,"Lord of the Rings: The Two Towers, The (2002)"
2959,588,Fight Club (1999)
7153,577,"Lord of the Rings: The Return of the King, The..."
318,564,"Shawshank Redemption, The (1994)"
260,535,Star Wars: Episode IV - A New Hope (1977)
593,533,"Silence of the Lambs, The (1991)"
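The integer scores above look like rating counts. A minimal pandas sketch of that idea on toy data, assuming the score is simply the number of ratings each item received (how `Popular` computes it internally is not shown in this notebook):

```python
import pandas as pd

# Toy ratings: item 10 has three ratings, item 20 has two, item 30 has one.
toy = pd.DataFrame({
    'user': [1, 1, 2, 2, 3, 3],
    'item': [10, 20, 10, 30, 10, 20],
    'rating': [4.0, 3.0, 5.0, 2.0, 4.5, 3.5],
})

# Count ratings per item; the rating values themselves are ignored.
counts = toy.groupby('item')['user'].count()

# Keep the two most-rated items, highest count first.
top2 = counts.nlargest(2).rename('score').reset_index()
print(top2)
```

Because the counts do not depend on who is asking, every user gets the same list.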


## Do these recommendations look the same as, or different from, what you got for the popularity-based recommender in Homework 1?

A. Pretty much the same

B. Different

## Bias recommender

### Instantiate
Use damping = 5

In [15]:
bias = Bias(damping=5)
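To see what damping does, here is a sketch of a common damped-mean bias model on toy data: global mean, plus an item offset, plus a user offset, with each offset computed as a sum divided by (count + damping). Whether `Bias` uses exactly this formula is an assumption; the names `item_bias`, `user_bias`, and `predict_sketch` are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    'user': [1, 1, 2, 2],
    'item': [10, 20, 10, 20],
    'rating': [5.0, 3.0, 4.0, 2.0],
})
damping = 5

# Global mean rating.
mu = toy['rating'].mean()

# Item offsets: damped mean of (rating - global mean) per item.
resid = toy['rating'] - mu
by_item = resid.groupby(toy['item'])
item_bias = by_item.sum() / (by_item.count() + damping)

# User offsets: damped mean of what is left after removing the item offset.
resid_u = resid - toy['item'].map(item_bias)
by_user = resid_u.groupby(toy['user'])
user_bias = by_user.sum() / (by_user.count() + damping)

def predict_sketch(user, item):
    """Global mean plus offsets; unknown users or items contribute 0."""
    return mu + item_bias.get(item, 0.0) + user_bias.get(user, 0.0)

print(predict_sketch(1, 10))   # known user, known item
print(predict_sketch(99, 10))  # unknown user: item offset only
```

With only two ratings per item, the damping term pulls each offset strongly toward zero; with hundreds of ratings it would barely matter.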

### Fit
Also create a `Recommender` object: `Bias` is a `Predictor`, so it predicts ratings and needs to be adapted before it can produce recommendation lists. Note that `Popular` is not a `Predictor`: you can't use it to predict ratings.

In [16]:
bias.fit(ratings)

<lenskit.algorithms.basic.Bias at 0x11611ea58>

In [17]:
bias_t = Recommender.adapt(bias)
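Conceptually, turning a rating predictor into a top-N recommender means scoring every candidate item and keeping the highest scores. The sketch below illustrates that idea only; `top_n_sketch` and the toy scores are hypothetical, and this is not a description of how `Recommender.adapt` is implemented.

```python
import pandas as pd

def top_n_sketch(score_fn, user, candidates, n):
    """Score each candidate for `user` and return the n best as a data frame."""
    scores = pd.Series({item: score_fn(user, item) for item in candidates})
    best = scores.nlargest(n)
    return pd.DataFrame({'item': best.index, 'score': best.values})

# Hypothetical predicted ratings standing in for a fitted predictor.
toy_scores = {10: 3.9, 20: 4.4, 30: 2.7}

toy_recs = top_n_sketch(lambda user, item: toy_scores[item],
                        user=1, candidates=[10, 20, 30], n=2)
print(toy_recs)
```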

### Compute recommendations for a particular user
Use all movies as candidates

In [19]:
recs = bias_t.recommend(1282, 20, candidates=ratings['item'].unique())
recs = recs.set_index('item')
recs.join(movies)

item,score,title
318,4.356802,"Shawshank Redemption, The (1994)"
858,4.306888,"Godfather, The (1972)"
2959,4.252142,Fight Club (1999)
1203,4.226909,12 Angry Men (1957)
296,4.212007,Pulp Fiction (1994)
7502,4.210983,Band of Brothers (2001)
1221,4.207637,"Godfather: Part II, The (1974)"
1248,4.19526,Touch of Evil (1958)
2571,4.190223,"Matrix, The (1999)"
4226,4.187542,Memento (2000)


## Compute RMSE of this recommender
Use 5-fold cross-validation partitioned by user. The key function is `xf.partition_users`, which takes a data frame, the number of partitions, and a sampling method.

Set the random seed!

For each partition:

1. Create a new recommender object
1. Fit to training partition
1. Call `predict`, which takes a data frame of (user, item) pairs.
1. Collect the predictions (the return value has an unusual structure).
1. Calculate RMSE
1. Print it and build list

Compute the mean

In [26]:
np.random.seed(20190207)

rmse_lst = []
for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], 5, xf.SampleFrac(0.2)):
    # Fresh model for each fold so nothing carries over between partitions
    bias = Bias(damping=5)
    bias.fit(train)
    # Copy so that adding a column does not write into a slice of `test`
    to_predict = test[['user', 'item']].copy()
    pred = bias.predict(to_predict)
    to_predict['pred'] = pred
    rmse_val = rmse(to_predict['pred'], test['rating'])
    print(rmse_val)
    rmse_lst.append(rmse_val)

print(rmse_lst)
np.mean(rmse_lst)

0.8180835778251909
0.7886417312895626
0.828339174029659
0.805508641585362
0.8068673097796218
[0.8180835778251909, 0.7886417312895626, 0.828339174029659, 0.805508641585362, 0.8068673097796218]


0.8094880869018792
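A rough picture of what partitioning by user means: users are split into k groups, and in each fold a fraction of that group's ratings is held out for testing while everything else is used for training. The sketch below uses only NumPy and pandas. `partition_users_sketch` is a hypothetical helper, and holding out at least one rating per user is an assumption made for the toy data, not a claim about `xf.partition_users`.

```python
import numpy as np
import pandas as pd

def partition_users_sketch(ratings, k, frac, seed):
    """Yield (train, test) pairs; hypothetical helper, not part of LKPy."""
    rng = np.random.RandomState(seed)
    # Shuffle the users, then split them into k roughly equal groups.
    users = rng.permutation(ratings['user'].unique())
    for fold_users in np.array_split(users, k):
        fold_rows = ratings[ratings['user'].isin(fold_users)]
        test_idx = []
        for _, rows in fold_rows.groupby('user'):
            # Hold out a fraction of this user's ratings (at least one).
            n_test = max(1, int(round(frac * len(rows))))
            picked = rng.choice(rows.index.to_numpy(), size=n_test, replace=False)
            test_idx.extend(picked.tolist())
        # Train on every rating that was not held out in this fold.
        yield ratings.drop(index=test_idx), ratings.loc[test_idx]

# Toy data: 10 users with 5 ratings each.
toy = pd.DataFrame({
    'user': np.repeat(np.arange(10), 5),
    'item': np.tile(np.arange(5), 10),
    'rating': 3.0,
})

folds = list(partition_users_sketch(toy, k=5, frac=0.2, seed=0))
for train, test in folds:
    print(len(train), len(test))
```

Each user appears in the test set of exactly one fold, so every user is evaluated once across the five folds.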

## Why set the random seed for this step?

**A**. Because otherwise the cell will produce different results each time.

B. Because otherwise random numbers won't get generated.

C. Because otherwise the cell will produce the same results each time.
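A quick way to check this with NumPy alone: reseeding with the same value replays the same random draws.

```python
import numpy as np

np.random.seed(20190207)
first = np.random.permutation(10)

# Same seed again: the generator restarts from the same state.
np.random.seed(20190207)
second = np.random.permutation(10)

# No reseed here: the generator continues from where it left off.
third = np.random.permutation(10)

print(first)
print(second)
print(third)
```

`first` and `second` match because both start from the same seeded state; `third` comes from a later point in the same random stream.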

## What is the difference between computing RMSE (root mean squared error) and MAE (mean absolute error) for this experiment?

**A**. RMSE magnifies large errors on individual ratings compared to MAE

B. RMSE minimizes large errors on individual ratings compared to MAE

C. They are equivalent because the square root and the square are inverses

D. RMSE is a ranking measure and MAE is an accuracy measure.
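A small worked example makes the difference concrete. The two toy prediction sets below have the same total absolute error, spread evenly in one case and concentrated in a single rating in the other. The helper names `rmse_sketch` and `mae_sketch` are illustrative, not LKPy functions.

```python
import numpy as np

def rmse_sketch(pred, truth):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def mae_sketch(pred, truth):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - truth)))

truth = np.array([4.0, 4.0, 4.0, 4.0])
even = np.array([3.0, 5.0, 3.0, 5.0])   # four errors of size 1
spiky = np.array([4.0, 4.0, 4.0, 0.0])  # one error of size 4

print(mae_sketch(even, truth), rmse_sketch(even, truth))
print(mae_sketch(spiky, truth), rmse_sketch(spiky, truth))
```

Both sets have an MAE of 1.0, but the RMSE is 1.0 for the even errors and 2.0 for the single large one.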