# Skafos Recommender System Report


This report describes how a recommender system can be implemented as a proof of concept and explores the business use cases that can benefit from this algorithm.

<img src="Capture.png" alt="Skafos Website Image">

### Importing the Python module with the helper functions

Keeping the helper functions in a separate module keeps this report focused on the relevant parts and free of clutter. For the technical details of the helper functions, please see the module included in the same zip file.

In [114]:
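# Importing libraries and helper functions

import pandas as pd ## Used below to load the ratings and movie data

from IPython.display import HTML ## Setting display options for IPython Notebook

# The helper functions are called without a module prefix below
from recommender_system_module import *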

### Loading the data 

In this proof of concept, we use the small version of the MovieLens dataset for experimentation. From this dataset, we import the ratings and the movie data.

<b>What sample data did you find/use/create?</b>
<li> Plenty of resources are available for this dataset because it is commonly used for experimenting with recommender systems </li>
<li> The dataset comes in a smaller version that allows quick experimentation and demos </li>
<li> The dataset requires minimal cleaning and preprocessing, which makes it well suited to building quick use cases </li>

In [43]:
# Importing rating data and having a look
ratings = pd.read_csv('./ml-latest-small/ratings.csv')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


As we can see, the rating data contains a user ID, a movie ID, a rating between 0.5 and 5, and a timestamp recording when the rating was given.

In [44]:
movies = pd.read_csv('./ml-latest-small/movies.csv')
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


The movie data consists of each movie's ID, its title, and the genres it belongs to.

### Creating an Interaction Matrix

An interaction matrix arranges the data so that each row represents a user and each column represents a movie ID, with the ratings as values. Movies that a user has not rated are filled with 0. This matrix is the input to the later steps of building the recommender system.




In [45]:
# Creating interaction matrix using rating data
interactions = create_interaction_matrix(df = ratings,
                                         user_col = 'userId',
                                         item_col = 'movieId',
                                         rating_col = 'rating')
interactions.head()

userId / movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
1,4.0,0.0,4.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
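
For readers without the helper module at hand, here is a minimal sketch of how a matrix of this shape can be built with pandas. The name `build_interaction_matrix` is hypothetical, and this is not necessarily how `create_interaction_matrix` is implemented. The sketch assumes at most one rating per user and movie and fills unrated cells with 0, as in the output above.

```python
import pandas as pd

def build_interaction_matrix(df, user_col, item_col, rating_col):
    """Pivot long-format ratings into a user x item matrix (hypothetical helper)."""
    matrix = df.pivot_table(index=user_col,
                            columns=item_col,
                            values=rating_col,
                            aggfunc="sum",      # assumes one rating per (user, item) pair
                            fill_value=0)       # unrated cells become 0
    return matrix.astype(float)

# Toy data in the same long format as ratings.csv
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [1, 3, 3, 6],
    "rating":  [4.0, 4.0, 5.0, 3.5],
})
toy_interactions = build_interaction_matrix(toy_ratings, "userId", "movieId", "rating")
print(toy_interactions)
```

Filling with 0 treats "not rated" as "no interaction", which suits an implicit-feedback loss such as WARP but would be misleading if the zeros were read as actual ratings.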


### Creating user and item dictionaries for use when providing recommendations


In [46]:
# Create User Dict
user_dict = create_user_dict(interactions=interactions)
# Create Item dict
movies_dict = create_item_dict(df = movies,
                               id_col = 'movieId',
                               name_col = 'title')

## Building the Matrix Factorization Model




<b>What machine learning algorithms did you use and why?</b>
<li> The model of choice is matrix factorization from the LightFM package </li>
<li>  LightFM combines both approaches, <b>Content Based Filtering</b> and <b>Collaborative Filtering</b>, and overcomes many of the challenges of each individual approach. </li>

Hybrid models can deal with new items or new users:

When you deploy a collaborative model to production, you’ll often run into the problem that you need to predict for unseen users or items – like when a new user registers or visits your website, or your content team publishes a new article.

Usually you have to wait at least until the next training cycle, or until the user interacts with some item, to be able to make recommendations for these users.

But the hybrid model can make predictions even in this case: It will simply use the partially available features to compute the recommendations.

Hybrid models can also deal with missing features:

Sometimes features are missing for some users and items (simply because you haven’t been able to collect them yet), which is a problem if you’re relying on a content-based model.

Hybrid recommenders perform well for returning users (those who are known from training) as well as for new users and items, as long as you have features about them. This is especially useful for items, but also for new users (you can ask users what they’re interested in when they visit your site for the first time).
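
The cold-start behaviour described above follows from how a hybrid model such as LightFM represents users and items: each one is the sum of the embeddings of its features. The sketch below illustrates the idea with NumPy only. The feature names are hypothetical, the embeddings are random stand-ins for learned ones, and bias terms are left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components = 4

# Hypothetical feature vocabularies; in a trained model these embeddings are learned.
user_feature_names = ["likes_action", "likes_comedy", "age_18_25"]
item_feature_names = ["genre_action", "genre_comedy", "released_2010s"]
user_feature_emb = rng.normal(size=(len(user_feature_names), n_components))
item_feature_emb = rng.normal(size=(len(item_feature_names), n_components))

def represent(active_features, feature_names, feature_emb):
    """Sum the embeddings of whichever features are available."""
    idx = [feature_names.index(f) for f in active_features]
    return feature_emb[idx].sum(axis=0)

# A brand-new user who only stated one preference at sign-up.
new_user = represent(["likes_action"], user_feature_names, user_feature_emb)

# Two items described only by their metadata.
items = {
    "action_movie": represent(["genre_action", "released_2010s"],
                              item_feature_names, item_feature_emb),
    "comedy_movie": represent(["genre_comedy"],
                              item_feature_names, item_feature_emb),
}

# The score for a (user, item) pair is the dot product of the two representations.
scores = {name: float(new_user @ vec) for name, vec in items.items()}
print(scores)
```

Because the representation is a sum over whatever features are present, a user or item with only some of its features known still gets a usable vector.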

In [47]:
mf_model = runMF(interactions = interactions,
                 n_components = 30,
                 loss = 'warp',
                 epoch = 30,
                 n_jobs = 4)

# Business Use Cases

## Use case 1: Item recommendation to a user

<p>  In this use case, we want to show a user items that he or she might be interested in buying or viewing, based on past interactions. Typical industry examples are “Deals recommended for you” on Amazon, “Top picks for you” on Netflix, and personalized email campaigns. </p>

<p> We can use the sample_recommendation_user function for this case. This function takes the matrix factorization model, the interaction matrix, the user dictionary, the item dictionary, a user_id and the number of items as input and returns the list of item IDs that the user may be interested in. </p>

<p> <b>User ID 2</b> likes blockbuster hits, Oscar-winning films, action movies and superhero movies such as The Wolf of Wall Street, The Dark Knight and Mad Max: Fury Road. The recommended movies, such as Inception, Interstellar, The Shawshank Redemption, Shutter Island and The Dark Knight Rises, fall into similar genres, which suggests that the recommender system captures what the user likes and recommends items in line with those tastes. </p>
<p> Similar models can also be used to build sections such as “Based on your recent browsing history” by restricting the interaction matrix to recent interactions drawn from the user's browsing history. </p>

In [71]:
## Requesting 10 movie recommendations for user id 2
rec_list = sample_recommendation_user(model = mf_model, 
                                      interactions = interactions, 
                                      user_id = 2, 
                                      user_dict = user_dict,
                                      item_dict = movies_dict, 
                                      threshold = 4,
                                      nrec_items = 10,
                                      show = True)

Known Likes:
1- The Jinx: The Life and Deaths of Robert Durst (2015)
2- Mad Max: Fury Road (2015)
3- Wolf of Wall Street, The (2013)
4- Warrior (2011)
5- Inside Job (2010)
6- Town, The (2010)
7- Inglourious Basterds (2009)
8- Step Brothers (2008)
9- Dark Knight, The (2008)
10- Good Will Hunting (1997)

 Recommended Items:
1- Inception (2010)
2- Django Unchained (2012)
3- Interstellar (2014)
4- Up (2009)
5- Shawshank Redemption, The (1994)
6- Shutter Island (2010)
7- Superbad (2007)
8- Iron Man (2008)
9- Dark Knight Rises, The (2012)
10- Departed, The (2006)
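
To make the ranking step concrete, here is a minimal NumPy sketch of what a function like `sample_recommendation_user` has to do conceptually: score every item for the user, drop the items the user already knows, and keep the top N. The function name `recommend_items` and the toy embeddings are hypothetical, bias terms are ignored, and the actual helper may differ.

```python
import numpy as np

def recommend_items(user_vec, item_vecs, known_item_idx, n_items):
    """Rank items by dot-product score, skipping items the user already rated."""
    scores = (item_vecs @ user_vec).astype(float)
    scores[list(known_item_idx)] = -np.inf   # never recommend known items
    ranked = np.argsort(-scores)             # highest score first
    return ranked[:n_items].tolist()

# Toy embeddings: 5 items in a 2-dimensional latent space.
item_vecs = np.array([[1.0, 0.0],
                      [0.9, 0.1],
                      [0.0, 1.0],
                      [0.5, 0.5],
                      [-1.0, 0.0]])
user_vec = np.array([1.0, 0.2])

top_items = recommend_items(user_vec, item_vecs, known_item_idx={0}, n_items=3)
print(top_items)  # [1, 3, 2]
```

The returned values are row positions in the embedding matrix; a dictionary such as `movies_dict` would then map them back to titles.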


## Use case 2: User recommendation to an item

<p>In this use case, we discuss how to recommend a list of users for a particular item. An example is a promotion on an item for which you want to run an e-mail campaign aimed only at the 10,000 users most likely to be interested in it.  </p>
<p>We can use the sample_recommendation_item function for this case. This function takes the matrix factorization model, the interaction matrix, the user dictionary, the item dictionary, an item_id and the number of users as input and returns the list of user IDs who are most likely to be interested in the item.  </p>
<p> As you can see, the function returns a list of user IDs who might be interested in item id 7. Another situation where such a model is useful is when old inventory is sitting in your warehouse and must be cleared before it has to be written off, and you want to clear it by offering a discount to the users most likely to buy it. </p>



In [72]:
## Requesting 15 user recommendations for item id 7
sample_recommendation_item(model = mf_model,
                           interactions = interactions,
                           item_id = 7,
                           user_dict = user_dict,
                           item_dict = movies_dict,
                           number_of_user = 15)

[455, 270, 38, 458, 14, 121, 214, 404, 530, 446, 35, 150, 142, 349, 512]

## Use case 3: Item recommendation to items

<p> In this use case, we discuss how to recommend a list of items for a particular item. Models of this kind help you find similar or related items, or items that can be bundled together. Typical industry use cases are cross-selling and up-selling opportunities on a product page, such as <i>“Products related to this item”</i>, <i>“Frequently bought together”</i>, <i>“Customers who bought this also bought this”</i> and <i>“Customers who viewed this item also viewed”.</i>
The last two, <i>“Customers who bought this also bought this”</i> and <i>“Customers who viewed this item also viewed”</i>, can also be solved through market basket analysis. </p>

For this use case, we compute pairwise cosine similarities between the item embeddings generated by the matrix factorization model. This lets us measure how similar two items are and then recommend the top N most similar items for an item of interest. The first step is to create the item-item matrix using the create_item_emdedding_distance_matrix function. This function takes the matrix factorization model and the interaction matrix as input and returns an item_embedding_distance_matrix. Despite the name, its values are cosine similarities: each item has a value of 1.0 with itself, and higher values mean more similar items.

In [50]:
## Creating item-item distance matrix
item_item_dist = create_item_emdedding_distance_matrix(model = mf_model,
                                                       interactions = interactions)
## Checking item embedding distance matrix
item_item_dist.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
1,1.0,0.723447,0.39656,0.050512,0.404206,0.550951,0.277181,-0.172806,-0.04648,0.683943,...,-0.752654,-0.562824,-0.663769,-0.704766,-0.685246,-0.677348,-0.691598,-0.741992,-0.692034,-0.160309
2,0.723447,1.0,0.543204,0.249388,0.596315,0.521,0.412073,0.021579,0.117613,0.659097,...,-0.537373,-0.378787,-0.467492,-0.527341,-0.50739,-0.485915,-0.503944,-0.551678,-0.558859,-0.204803
3,0.39656,0.543204,1.0,0.407931,0.637342,0.359759,0.559229,0.066711,0.397449,0.605802,...,-0.445049,-0.403174,-0.460357,-0.467931,-0.480889,-0.430935,-0.509483,-0.521553,-0.537097,-0.611855
4,0.050512,0.249388,0.407931,1.0,0.532424,0.108768,0.639103,0.220541,0.459687,0.077649,...,-0.130509,-0.117315,-0.141524,-0.110657,-0.150576,-0.206342,-0.144814,-0.120778,-0.127819,-0.355229
5,0.404206,0.596315,0.637342,0.532424,1.0,0.191461,0.605178,0.120899,0.348679,0.386718,...,-0.345075,-0.237765,-0.298857,-0.32025,-0.333892,-0.289534,-0.309297,-0.298172,-0.319273,-0.40452


<p> As we can see, the matrix has movies as both rows and columns, and each value is the cosine similarity between the two movies' embeddings. The next step is to use the item_item_recommendation function to get the top N items for a given item_id. This function takes the item embedding distance matrix, an item_id, the item dictionary and the number of items to recommend as input and returns the list of similar items as output. </p>

<p> For “Godfather: Part II”, the top recommendation is the first Godfather film, followed by other films that the same users tend to like, several of them from a similar era (for example Alien and Blade Runner). </p>

In [113]:
## Requesting 10 recommended items for item id 1221
rec_list = item_item_recommendation(item_emdedding_distance_matrix = item_item_dist,
                                    item_id =1221,
                                    item_dict = movies_dict,
                                    n_items = 10)

Item of interest :Godfather: Part II, The (1974)
Item similar to the above item:
1- Godfather, The (1972)
2- Reservoir Dogs (1992)
3- Goodfellas (1990)
4- 2001: A Space Odyssey (1968)
5- Matrix, The (1999)
6- Usual Suspects, The (1995)
7- Alien (1979)
8- Gladiator (2000)
9- Blade Runner (1982)
10- Saving Private Ryan (1998)
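
The two steps in this use case, building the similarity matrix and then reading off the nearest neighbours of one item, can be sketched with NumPy as follows. The function names and toy embeddings are hypothetical and only illustrate the technique; the helper functions in the module may be implemented differently.

```python
import numpy as np

def cosine_similarity_matrix(item_vecs):
    """Pairwise cosine similarity between the rows of item_vecs.

    Assumes no embedding is the zero vector (its norm would be 0).
    """
    norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
    unit = item_vecs / norms
    return unit @ unit.T

def most_similar(sim, item_idx, n_items):
    """Positions of the n_items most similar items, excluding the item itself."""
    order = np.argsort(-sim[item_idx])   # highest similarity first
    return [int(i) for i in order if i != item_idx][:n_items]

# Toy item embeddings in a 2-dimensional latent space.
toy_item_vecs = np.array([[1.0, 0.0],
                          [2.0, 0.1],
                          [0.0, 3.0],
                          [-1.0, 0.0]])

sim = cosine_similarity_matrix(toy_item_vecs)
print(np.round(sim, 3))
print(most_similar(sim, item_idx=0, n_items=2))  # [1, 2]
```

Cosine similarity ignores the length of the embedding vectors, so item 1 is the closest neighbour of item 0 even though its vector is roughly twice as long.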


## Are there any trade-offs to expect with this approach (inference speed, performance, etc)?

There are 3 major issues to deal with when deploying this model to production:
<ol> 
    <li> <b>Data issues:</b> The hybrid model is more accurate with fewer data points and tends to perform badly as the volume of data to be processed increases. This can be handled in production by setting a threshold of k data points per user and pulling only each user's latest k interactions when retraining the model. This not only addresses the data issue but also keeps the model trained on the user's latest trends and behaviours. </li>
    <li> <b>Trust:</b> Although some user behaviour can be modeled, there are many users whose actual behaviour and preferences are very different from the recommendations provided, because sparse data points and limited interaction with the system create bias. The recommendations can also be manipulated with false reviews of certain products, pushing those products to the top of all charts. </li>
    <li> <b>Lack of Innovation:</b> When the model is not retrained with new data, it can develop a bias that hurts performance. For example, when the Disney movie Frozen was released, a model that had not been updated recommended frozen food to users because it could not distinguish the movie from other frozen items. </li>

</ol>
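
The "latest k interactions per user" idea from the first point can be sketched with pandas as below. The function name `latest_k_per_user` is hypothetical, and the column names are assumed to match ratings.csv.

```python
import pandas as pd

def latest_k_per_user(ratings, k, user_col="userId", time_col="timestamp"):
    """Keep only each user's k most recent interactions."""
    return (ratings.sort_values(time_col)          # oldest first
                   .groupby(user_col)
                   .tail(k)                        # last k rows per user = most recent
                   .sort_values([user_col, time_col])
                   .reset_index(drop=True))

toy = pd.DataFrame({
    "userId":    [1, 1, 1, 2, 2],
    "movieId":   [10, 20, 30, 10, 40],
    "rating":    [4.0, 3.0, 5.0, 2.0, 4.5],
    "timestamp": [100, 300, 200, 50, 60],
})

recent = latest_k_per_user(toy, k=2)
print(recent)
```

The filtered frame can then be passed to the interaction-matrix step before retraining, so the training set grows with the number of users rather than with the full interaction history.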

## How do you plan to validate that the models are “performing well enough” both at train time and once deployed in the wild?

The validation of the model can be performed with the below strategies:
<ul>
<li>The data can be split into K folds for K-fold cross-validation to understand how the model performs with different training and validation sets </li>
<li>Two useful evaluation metrics for such a model are RMSE and R^2, as they capture how well the model handles bias and variance in the data </li>

<li>The <b>root mean squared error (RMSE)</b> measures the differences between the values predicted by the model and the values observed in the test dataset. Technically, it is the square root of the average of the squared errors. The lower it is, the better the model. </li>
<li><b>R Squared</b> indicates how well the model fits the data. A value of 1 means that the model matches the data exactly, a value of 0 means that it does no better than always predicting the mean, and the score can be negative for a model that does worse than that. You want your R Squared score to be as close to 1 as possible. </li>
</ul>
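
As a small worked example of the two metrics, the sketch below computes RMSE and R squared for a handful of held-out ratings. The numbers are made up for illustration, and the sketch assumes the model's predictions are on the same scale as the ratings.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

held_out  = [4.0, 3.0, 5.0, 2.0]   # observed ratings in a validation fold
predicted = [3.5, 3.0, 4.5, 2.5]   # hypothetical model predictions

print(round(rmse(held_out, predicted), 4))       # 0.433
print(round(r_squared(held_out, predicted), 4))  # 0.85
```

In a K-fold setup these two functions would be evaluated once per fold and the results averaged.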
<p> Once the model is in the wild, we can continuously measure its performance with real-time graphs and visualizations, and set triggers and thresholds to ensure that the model maintains its performance. </p>

<p> We need to retrain the model with new data to counter <b>Concept Drift</b>, that is, changes in customer behaviour due to natural causes or the passage of time. Retraining can be triggered by a drop in model performance, as in situations such as Covid-19 that changed the way users behave, or it can run at regular intervals to ensure we capture the latest user trends. </p>
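
A performance-based retraining trigger of this kind can be sketched in plain Python. The class name `DriftMonitor`, the window size, the threshold and the daily metric values are all hypothetical; the metric could be any score where higher is better.

```python
from collections import deque

class DriftMonitor:
    """Signal a retrain when the rolling mean of a metric falls below a threshold."""

    def __init__(self, window, threshold):
        self.values = deque(maxlen=window)   # keeps only the latest `window` values
        self.threshold = threshold

    def update(self, metric_value):
        """Record a new measurement and return True if a retrain should be triggered."""
        self.values.append(metric_value)
        window_full = len(self.values) == self.values.maxlen
        rolling_mean = sum(self.values) / len(self.values)
        return window_full and rolling_mean < self.threshold

monitor = DriftMonitor(window=3, threshold=0.30)
daily_metric = [0.40, 0.38, 0.35, 0.25, 0.20]   # made-up daily scores
triggers = [monitor.update(value) for value in daily_metric]
print(triggers)  # [False, False, False, False, True]
```

Averaging over a window avoids retraining on a single bad day, at the cost of reacting a little later to a genuine shift.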


## What are the deployment options for the solution (i.e. how will the engineering team be able to use them)?

One possible deployment option is listed below:

The core of the system is a Flask app that receives a user ID and returns the relevant items for that user. It will (re)load the LightFM model and query a Redis instance for item and/or user features.

We’ll assume that user and item features are stored and serialised in a Redis database and can be retrieved by the Flask app at any time.

All applications will be deployed as microservices via Docker containers.

<img src="Capture2.png" alt="ML Infrastructure">
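
A minimal sketch of such a service is shown below. It is not a production implementation: the route, the `FEATURE_STORE` dictionary (standing in for Redis) and the `score_items` placeholder (standing in for the loaded LightFM model) are all assumptions made so that the example runs on its own.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the Redis feature store described above (hypothetical contents).
FEATURE_STORE = {"user:2": {"likes_action": 1.0}}

def score_items(user_features, n_items):
    """Placeholder for the model call; returns hypothetical item IDs."""
    catalogue = [101, 102, 103, 104, 105]
    return catalogue[:n_items]

@app.route("/recommendations/<int:user_id>")
def recommendations(user_id):
    """Look up the user's features and return the recommended item IDs as JSON."""
    features = FEATURE_STORE.get(f"user:{user_id}")
    if features is None:
        return jsonify({"error": "unknown user"}), 404
    return jsonify({"user_id": user_id,
                    "items": score_items(features, n_items=3)})
```

In a real deployment the dictionary lookup would become a Redis query and `score_items` would call the model's prediction method, with the app packaged in its own Docker container.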
