# Recommender System

Some hints for hacking the challenge:

* Why would they have selected this problem for the challenge? 
* What are some gotchas in this domain I should know about?
* What is the highest level of accuracy that others have achieved with this dataset or similar problems / datasets?
* What types of visualizations will help me grasp the nature of the problem / data?
* What feature engineering might help improve the signal?
* Which modeling techniques are good at capturing the types of relationships I see in this data?
* Now that I have a model, how can I be sure that I didn't introduce a bug in the code? If results are too good to be true, they probably are!
* What are some of the weaknesses of the model, and how could it be improved with additional work?


### Summary

I used the Spotlight library (https://github.com/maciejkula/spotlight) to load the MovieLens 1M dataset, split it into train and test sets, train an implicit factorization model on the training set, and evaluate it with mean reciprocal rank (MRR). Spotlight can also load the MovieLens 100K, 10M, and 20M variants.

### Import Libraries

```python
from spotlight.cross_validation import random_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import mrr_score
from spotlight.factorization.implicit import ImplicitFactorizationModel

print("Done Import")
```

```
Done Import
```


### Load the dataset and randomly split it into train and test sets

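```python
# Load the MovieLens 1M dataset as a spotlight Interactions object
dataset = get_movielens_dataset(variant='1M')

# Randomly split the interactions into train and test sets
# (by default, spotlight holds out 20% of the interactions for testing)
train, test = random_train_test_split(dataset)

# Size of the user and item ID spaces. Both splits share these values, and because
# they are computed as max ID + 1 while MovieLens IDs start at 1, each count is
# one higher than the largest real ID.
n_users = train.num_users
n_items = train.num_items

print("number of users")
print(n_users)

print("number of movies")
print(n_items)
```

```
number of users
6041
number of movies
3953
```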
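The split and the counts above can be illustrated with a minimal numpy sketch on toy data. It shuffles interaction indices and holds out 20% of them, which is similar in spirit to a random interaction-level split; `random_split` is a hypothetical helper, not spotlight's implementation. It also shows why sizing the ID space as max ID + 1 (which, as far as I can tell, is how spotlight's `Interactions` sizes itself when no count is given) reports one more user than there are distinct users when IDs start at 1.

```python
import numpy as np

def random_split(user_ids, item_ids, test_fraction=0.2, seed=42):
    """Shuffle interaction indices and hold out `test_fraction` of them.

    Each (user, item) interaction ends up in exactly one split, so the same user
    can appear in both train and test with different items.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(user_ids))
    n_test = int(round(test_fraction * len(user_ids)))
    cutoff = len(user_ids) - n_test
    train_idx, test_idx = order[:cutoff], order[cutoff:]
    return ((user_ids[train_idx], item_ids[train_idx]),
            (user_ids[test_idx], item_ids[test_idx]))

# Toy interactions with 1-based IDs, as in MovieLens.
user_ids = np.array([1, 1, 2, 2, 3, 3, 3, 4, 4, 5])
item_ids = np.array([1, 2, 2, 3, 1, 3, 4, 2, 4, 1])

(train_u, train_i), (test_u, test_i) = random_split(user_ids, item_ids)
print(len(train_u), len(test_u))  # 8 2

# Sizing the ID space as max ID + 1 leaves index 0 unused when IDs start at 1.
num_users = int(user_ids.max()) + 1
distinct_users = np.unique(user_ids).size
print(num_users, distinct_users)  # 6 5
```

For MovieLens 1M this suggests that the 6041 and 3953 printed above include an unused index 0, and not every movie ID in that range necessarily has ratings. Passing a seeded `random_state` such as `np.random.RandomState(42)` to `random_train_test_split` should make the spotlight split reproducible.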


### Get the Model

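```python
# Implicit-feedback matrix factorization trained with the BPR pairwise ranking
# loss for 3 passes (epochs) over the training data
model = ImplicitFactorizationModel(n_iter=3, loss='bpr')
print("Using Implicit Factorization Model from Spotlight library")
```

```
Using Implicit Factorization Model from Spotlight library
```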
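With `loss='bpr'`, the model learns from implicit feedback by trying to score each observed (user, item) pair above a sampled item for the same user. The sketch below computes the standard BPR criterion from Rendle et al. (2009), the mean of `-log(sigmoid(s_pos - s_neg))`, on a bilinear score (user factors dotted with item factors, plus an item bias). The factor size, random parameters, and uniform negative sampling are illustrative assumptions, and spotlight's own loss function may use a variant of this formula. A per-user bias is left out because it cancels in the difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 5, 8, 4

# Hypothetical parameters: user factors P, item factors Q, item biases b.
P = rng.normal(scale=0.1, size=(n_users, dim))
Q = rng.normal(scale=0.1, size=(n_items, dim))
b = np.zeros(n_items)

def score(users, items):
    """Bilinear score: user-item factor dot product plus an item bias."""
    return np.sum(P[users] * Q[items], axis=1) + b[items]

def bpr_loss(users, pos_items, neg_items):
    """Mean of -log(sigmoid(s_pos - s_neg)); small when positives outrank negatives."""
    diff = score(users, pos_items) - score(users, neg_items)
    # log(1 + exp(-diff)) equals -log(sigmoid(diff)), computed without overflow
    return np.mean(np.logaddexp(0.0, -diff))

# Observed (user, item) pairs, each paired with a uniformly sampled negative item.
users = np.array([0, 1, 2, 3])
pos_items = np.array([1, 3, 5, 7])
neg_items = rng.integers(0, n_items, size=len(users))
print(bpr_loss(users, pos_items, neg_items))
```

With near-zero parameters every score difference is close to 0, so the loss starts near `log 2`; training lowers it by pushing positive scores above negative ones.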


### Fit the Model

```python
# Fit the model on the training interactions only
model.fit(train)
print("Fit the model with train data")
```

```
Fit the model with train data
```
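`fit` runs `n_iter` passes over the training interactions, sampling negative items and updating the embeddings to reduce the ranking loss. For intuition, here is a from-scratch numpy sketch of such a loop using plain SGD on the same BPR objective. It is not how spotlight trains (spotlight uses PyTorch and its own optimizer), and the learning rate, regularization strength, and embedding size are assumptions; `train_bpr` is a hypothetical helper.

```python
import numpy as np

def train_bpr(user_ids, item_ids, n_users, n_items, dim=8, n_iter=3,
              lr=0.05, reg=1e-4, seed=0):
    """Plain-SGD BPR on (user, positive item) pairs with uniformly sampled negatives."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, dim))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, dim))  # item factors
    for _ in range(n_iter):  # one pass over the shuffled interactions per epoch
        for idx in rng.permutation(len(user_ids)):
            u, i = user_ids[idx], item_ids[idx]
            j = rng.integers(n_items)  # sampled negative item
            pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
            diff = pu @ (qi - qj)  # s_ui - s_uj
            # d/d(diff) of log(sigmoid(diff)) is sigmoid(-diff), computed stably
            g = np.exp(-np.logaddexp(0.0, diff))
            # Gradient ascent on log(sigmoid(diff)) with L2 shrinkage
            P[u] += lr * (g * (qi - qj) - reg * pu)
            Q[i] += lr * (g * pu - reg * qi)
            Q[j] += lr * (-g * pu - reg * qj)
    return P, Q

# Toy data: users 0-2 interact with items 0-2, users 3-5 with items 3-5.
users = np.repeat(np.arange(6), 3)
items = np.array([0, 1, 2] * 3 + [3, 4, 5] * 3)
P, Q = train_bpr(users, items, n_users=6, n_items=6, n_iter=300)
print(np.round(P @ Q.T, 2))  # rows: users, columns: items
```

After enough epochs on this toy data, each user's row of `P @ Q.T` should score that user's own group of items above the other group's.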


### Find MRR Score

```python
# mrr_score ranks every item for each user in the test set and computes the
# reciprocal rank of each of that user's test items. It returns one score per
# user: the mean reciprocal rank of all their test items.
# Note: without the optional train argument, items the user already interacted
# with in training are ranked too and can push the test items down.
mrr = mrr_score(model, test)
print("MRR Score")
print(mrr)
```

```
MRR Score
[ 0.10952729  0.02140233  0.12615023 ...,  0.00300406  0.00907233
  0.02024415]
```
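`mrr_score` returns one value per test user, which is why the output above is an array; `mrr.mean()` gives a single summary number. The per-user computation can be sketched in numpy: rank all items by descending score, take the reciprocal rank of each held-out item, and average. `user_mrr` is a hypothetical helper, and ties are broken by item order here, which may differ from how spotlight handles them. The optional `train_items` argument mirrors the effect of passing `train=train` to `mrr_score`: known training items are moved to the bottom of the ranking so they cannot crowd out the test items.

```python
import numpy as np

def user_mrr(scores, test_items, train_items=()):
    """Mean reciprocal rank of one user's held-out items.

    scores: predicted score for every item ID (higher = stronger recommendation).
    test_items: item IDs the user interacted with in the test split.
    train_items: item IDs seen in training; they are moved to the bottom of the
        ranking so they cannot sit above the held-out items.
    """
    scores = np.asarray(scores, dtype=float).copy()
    scores[list(train_items)] = -np.inf
    order = np.argsort(-scores, kind="stable")    # item IDs, best first
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based rank of each item
    return float(np.mean(1.0 / ranks[list(test_items)]))

scores = [0.1, 2.0, 0.5, 1.5, -1.0]
print(user_mrr(scores, test_items=[3]))                   # 0.5
print(user_mrr(scores, test_items=[3], train_items=[1]))  # 1.0
```

In the notebook, the corresponding summary would be `mrr_score(model, test, train=train).mean()`.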


### Define Recommendation Function

```python
# model.predict(user_id) returns one score per item for that user.
# Higher scores mean stronger recommendations: an item with a negative score is
# ranked below any item with a positive score. The scores are unnormalized, so
# only their relative order matters.

def sample_recommendation(model, data, user_ids):
    # Print the predicted score of every item for each requested user
    # (`data` is currently unused)
    for user_id in user_ids:
        scores = model.predict(user_id)
        print("User %s" % user_id)
        print(scores)
```
```python
# Print the item scores for three sample users
sample_recommendation(model, dataset, [3, 25, 450])
```

```
User 3
[-20.88231277  20.29821587  13.05710411 ..., -13.8057003   -9.36943722
   0.94555044]
User 25
[-18.12098885  27.03703308  12.57414818 ...,  -5.48110199  -3.6222446
   5.24459696]
User 450
[-14.90546608  22.6123867    4.96127892 ...,  -2.64198732  -2.8596313
   9.96584606]
```
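`sample_recommendation` prints a raw score for every item index rather than a list of recommended movies. A minimal sketch of turning one score vector into top-k recommendations: sort item IDs by descending score and skip items the user has already seen. `top_k_items` is a hypothetical helper and the scores are toy values.

```python
import numpy as np

def top_k_items(scores, seen_items=(), k=10):
    """Return up to k item IDs with the highest scores, best first, skipping seen items."""
    scores = np.asarray(scores, dtype=float)
    seen = {int(i) for i in seen_items}
    ranked = np.argsort(-scores, kind="stable")  # item IDs sorted by descending score
    return [int(i) for i in ranked if int(i) not in seen][:k]

# Toy scores for 5 items; item 1 was already seen in training.
scores = [0.2, 3.1, -1.4, 2.5, 0.9]
print(top_k_items(scores, seen_items=[1], k=3))  # [3, 4, 0]
```

With the notebook's objects this could be called as `top_k_items(model.predict(user_id), train.tocsr()[user_id].indices, k=10)`. Since MovieLens IDs start at 1, you may also want to add index 0 to `seen_items`, because it does not correspond to a movie.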


### Discussion

#### Why would they have selected this problem for the challenge?

#### What are some gotchas in this domain I should know about?

#### What is the highest level of accuracy that others have achieved with this dataset or similar problems / datasets?

#### What types of visualizations will help me grasp the nature of the problem / data?

#### What feature engineering might help improve the signal?

#### Which modeling techniques are good at capturing the types of relationships I see in this data?

#### Now that I have a model, how can I be sure that I didn't introduce a bug in the code? If results are too good to be true, they probably are!
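One cheap check for bugs is to compare the model's MRR against a random ranking. If a held-out item's rank is equally likely to be any of 1..n, its expected reciprocal rank is `(1 + 1/2 + ... + 1/n) / n`, the n-th harmonic number divided by n. The sketch below computes this value and confirms it by simulation; using n = 3953 (the item ID space printed earlier) is an assumption about what the model ranks over. A mean MRR near this baseline suggests the model is not learning, while one that looks too good may point to leakage, such as test interactions also appearing in the training data.

```python
import numpy as np

def random_mrr_expected(n_items):
    """Expected reciprocal rank of one relevant item under a uniformly random ranking."""
    return float(np.sum(1.0 / np.arange(1, n_items + 1)) / n_items)

def random_mrr_simulated(n_items, n_trials=200_000, seed=0):
    """Monte Carlo check: draw ranks uniformly from 1..n_items and average 1/rank."""
    rng = np.random.default_rng(seed)
    ranks = rng.integers(1, n_items + 1, size=n_trials)
    return float(np.mean(1.0 / ranks))

n_items = 3953  # size of the item ID space printed earlier
print(random_mrr_expected(n_items), random_mrr_simulated(n_items))
```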

#### What are some of the weaknesses of the model, and how could it be improved with additional work?