## Installing lightfm & importing libraries

In [2]:
!pip install lightfm

Collecting lightfm
  Downloading https://files.pythonhosted.org/packages/e9/8e/5485ac5a8616abe1c673d1e033e2f232b4319ab95424b42499fabff2257f/lightfm-1.15.tar.gz (302kB)
Building wheels for collected packages: lightfm
  Running setup.py bdist_wheel for lightfm ... done
  Stored in directory: /Users/drpoo/Library/Caches/pip/wheels/eb/bb/ac/188385a5da6627956be5d9663928483b36da576149ab5b8f79
Successfully built lightfm
Installing collected packages: lightfm
Successfully installed lightfm-1.15


In [3]:
import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score



## Fetch data from the MovieLens dataset

In [4]:
# Fetch the MovieLens 100k data, keeping only ratings of 4.0 or higher
data = fetch_movielens(min_rating=4.0)

In [5]:
for key in data:
    print(key)

train
test
item_features
item_feature_labels
item_labels


In [6]:
#print training and testing data
print(repr(data['train']))
print(repr(data['test']))

<943x1682 sparse matrix of type '<class 'numpy.int32'>'
	with 49906 stored elements in COOrdinate format>
<943x1682 sparse matrix of type '<class 'numpy.int32'>'
	with 5469 stored elements in COOrdinate format>
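
To put the two matrix summaries above in perspective, the short calculation below works out how sparse the interaction data is. It only uses the shape and stored-element counts printed above; the variable names are illustrative.

```python
n_users, n_items = 943, 1682          # matrix shape from the output above
train_nnz, test_nnz = 49906, 5469     # stored elements from the output above

# Total number of possible (user, movie) pairs
n_cells = n_users * n_items

train_density = train_nnz / n_cells
test_density = test_nnz / n_cells

print(f"train density: {train_density:.2%}")  # about 3.15%
print(f"test density:  {test_density:.2%}")   # about 0.34%
```

Only about 3% of all user/movie pairs carry a positive interaction in the training set, which is why the data is stored as a sparse matrix rather than a dense array.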


## Modelling
Here we train one model with each of three loss functions and compare them:

- logistic: useful when both positive (1) and negative (-1) interactions are present.
- bpr: Bayesian Personalised Ranking pairwise loss.
- warp: Weighted Approximate-Rank Pairwise loss.
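
To make the difference between a pointwise and a pairwise loss concrete, here is a minimal NumPy sketch of the logistic and BPR losses for a single example. It is an illustration of the textbook formulas, not LightFM's internal implementation, and the function names `logistic_loss` and `bpr_loss` are hypothetical. WARP is left out because it additionally depends on a negative-sampling procedure that does not fit in a few lines.

```python
import numpy as np

def logistic_loss(score, label):
    """Pointwise logistic loss for one interaction with label +1 or -1."""
    # log(1 + exp(-label * score)), computed in a numerically stable way
    return np.logaddexp(0.0, -label * score)

def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss for one (positive item, negative item) pair."""
    # -log(sigmoid(pos - neg)) is the same as log(1 + exp(-(pos - neg)))
    return np.logaddexp(0.0, -(pos_score - neg_score))

# The pairwise loss only cares about the gap between the two scores
print(bpr_loss(5.0, 0.0) < bpr_loss(0.0, 5.0))
```

The logistic loss judges each score on its own, while BPR (like WARP) is low whenever the positive item is scored above the negative one, regardless of the absolute values.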

In [7]:
#create model
model = LightFM(loss='warp')
model1 = LightFM(loss='logistic')
model2 = LightFM(loss='bpr')
#train model
model.fit(data['train'], epochs=30, num_threads=4)
model1.fit(data['train'], epochs=30, num_threads=4)
model2.fit(data['train'], epochs=30, num_threads=4)

<lightfm.lightfm.LightFM at 0x10da331d0>

In [8]:
#number of users and movies in training data
n_users, n_items = data['train'].shape

# Indices of the movies that the user in row 3 (zero-based) rated 4.0 or higher
#data['train'].todense()[3].nonzero()
data['train'].tocsr()[3].indices

array([ 10, 257, 270, 299, 300, 323, 326, 328, 358, 359, 361], dtype=int32)

In [9]:
data['item_labels'][data['train'].tocsr()[3].indices]


array(['Seven (Se7en) (1995)', 'Contact (1997)',
       'Starship Troopers (1997)', 'Air Force One (1997)',
       'In & Out (1997)', 'Lost Highway (1997)', 'Cop Land (1997)',
       'Desperate Measures (1998)', 'Assignment, The (1997)',
       'Wonderland (1997)', 'Blues Brothers 2000 (1998)'], dtype=object)

In [10]:
data['item_labels'].shape

(1682,)

## Check Metrics Score
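We use the AUC score as the evaluation metric. A score of 1.0 means the model ranks every positive item above every negative one, while 0.5 corresponds to a random ranking.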

In [11]:
auc = auc_score(model, data['test'],train_interactions=data['train'])
auc1 = auc_score(model1, data['test'],train_interactions=data['train'])
auc2 = auc_score(model2, data['test'],train_interactions=data['train'])
print ('WARP AUC Score: {} '.format(auc.sum()/auc.shape))
print ('Logistic AUC Score : {} '.format(auc1.sum()/auc1.shape))
print ('BPR AUC Score : {} '.format(auc2.sum()/auc2.shape))

WARP AUC Score: [0.93231502] 
Logistic AUC Score : [0.87935219] 
BPR AUC Score : [0.86290903] 
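
As a rough illustration of what a per-user AUC measures, the sketch below computes the fraction of (positive, negative) item pairs that a set of scores orders correctly. It is a simplified stand-in written with plain NumPy, not the code behind `auc_score`; the function name `user_auc` and the toy scores are made up for the example.

```python
import numpy as np

def user_auc(scores, positive_items):
    """Fraction of (positive, negative) item pairs where the positive scores higher."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.zeros(scores.shape[0], dtype=bool)
    is_positive[positive_items] = True
    pos = scores[is_positive]
    neg = scores[~is_positive]
    # Compare every positive score with every negative score; ties count as half
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

toy_scores = [0.9, 0.2, 0.7, 0.1]     # hypothetical model scores for four items
print(user_auc(toy_scores, [0, 2]))   # both positives outrank both negatives
print(user_auc(toy_scores, [0, 3]))   # only one positive outranks the negatives
```

`auc_score` returns one such value per user, which is why the cell above averages the array before printing.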


Since the WARP model outperforms the other two, we will use it to recommend movies.

In [12]:
def sample_recommendation(model, data, user_ids):

    #number of users and movies in training data
    n_users, n_items = data['train'].shape

    #generate recommendations for each user we input
    for user_id in user_ids:

        #movies they already like
        known_positives = data['item_labels'][data['train'].tocsr()[user_id].indices]

        #movies our model predicts they will like
        scores = model.predict(user_id, np.arange(n_items))
        #rank them in order of most liked to least
        top_items = data['item_labels'][np.argsort(-scores)]

        #print out the results
        print("User %s" % user_id)
        print("     Known positives:")

        for x in known_positives[:3]:
            print("        %s" % x)

        print("     Recommended:")

        for x in top_items[:3]:
            print("        %s" % x)

In [13]:
sample_recommendation(model, data, [3, 25, 450])

User 3
     Known positives:
        Seven (Se7en) (1995)
        Contact (1997)
        Starship Troopers (1997)
     Recommended:
        Scream (1996)
        L.A. Confidential (1997)
        Titanic (1997)
User 25
     Known positives:
        Dead Man Walking (1995)
        Star Wars (1977)
        Fargo (1996)
     Recommended:
        Contact (1997)
        Fargo (1996)
        L.A. Confidential (1997)
User 450
     Known positives:
        Contact (1997)
        George of the Jungle (1997)
        Event Horizon (1997)
     Recommended:
        Scream (1996)
        I Know What You Did Last Summer (1997)
        Kiss the Girls (1997)
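
Note that the recommendations for user 25 include Fargo (1996), which is already among that user's known positives, because `sample_recommendation` ranks all items. One possible refinement, sketched below with a hypothetical helper `top_k_unseen` and toy scores, is to mask already-seen items before taking the top results.

```python
import numpy as np

def top_k_unseen(scores, known_item_ids, k=3):
    """Return the indices of the k highest-scoring items not in known_item_ids."""
    # Work on a copy so the caller's score array is left untouched
    scores = np.asarray(scores, dtype=float).copy()
    # Push already-seen items to the bottom of the ranking
    scores[known_item_ids] = -np.inf
    return np.argsort(-scores)[:k]

toy_scores = np.array([0.3, 0.9, 0.8, 0.1, 0.7])  # hypothetical scores for five items
top = top_k_unseen(toy_scores, known_item_ids=[1], k=3)
print(top)  # item 1 has the highest score but is skipped as already seen
```

In the notebook, `known_item_ids` would be `data['train'].tocsr()[user_id].indices`, and the returned indices could be used to index `data['item_labels']`.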
