<small><i>Updated January 2020 - This notebook was created by [Santi Seguí](https://ssegui.github.io/). </i></small>

<div class="alert alert-info" style = "border-radius:10px;border-width:3px;border-color:darkblue;font-family:Verdana,sans-serif;">
<h1>Recommenders Systems</h1>
<h3>Item-Based Recommender System</h3>
</div>
<br>

Item-Based Recommmender Systems is a special type of CF-Recommender Systems.
Instead on relying on the user similarity, prediction can rely on item similarities.

<img src="images/neighbourhood.png" width=600>


The similarity between two users is measured as a tendecy of measuring items simiarly. Pearson or Cosine distance are usually used.
### Similarity Functions

The computation of the similarity between the items is critical. Similar to the procedure done when comparing users. The similarity computation between two items <i>p</i> and <i>q</i> can be obtained by first isolating the users that have both rated the items (set <i>U</i>), and then then apply a distance or correlation function.

<ul>
    <li>Euclidean distance</li>
    $$sim(p,q) = \sqrt{\sum_{u \in P}{(r_{u,p} - r_{u,q})^2}}$$
    <br>
    <li>Pearson Correlation</li>
    $$sim(p,q) = \frac{\sum_{u\in U} (r_{u,p}-\bar{r_p})(r_{u,q}-\bar{r_q})}{\sqrt{\sum_{u \in U}(r_{u,p}-\bar{r_p})^2}\sqrt{\sum_{u\in U}(r_{u,q}-\bar{r_q})^2}}$$
    <br>
    <li>Cosine distance</li>
    $$ sim(p,q) = \frac{\vec{p}· \vec{q}}{|\vec{p}| * |\vec{q}|}$$
    <br>
    </ul>
<br>
Where: 

* $sim(p,q)$ is the similarity between items "p" and user "q"
* $U$ is the set of users that have both rated "p" and "q"
* $r_{u,p}$ is the rating of movie "p" by user "u"
* $\bar{r_p}$ is the mean rating of product "p"

<br>

### WARNING : 
Computing similarity using basic cosine measure in item-based case has one important drawback: The differences in rating scale between different users are not taken into account.
The Adjusted Cosine Similarity offsets this drawback by subtracting the corresponding user average from each co-rated pair:
$$ sim(p,q) = \frac{\sum_{u \in U} (r_{u,p} - \bar{r}_u) (r_{u,j} -\bar{r}_u)}{\sqrt{ \sum_{u \in U}  (r_{u,p} - \bar{r}_u)^2}\sqrt{ \sum_{u \in U}  (r_{u,q} - \bar{r}_u)^2} } $$

### Prediction Function

$$pred(u,p) =  \frac{\sum_{q \in M}{sim(p,q)*(r_{u,q})}}{\sum_{q \in M}{sim(p,q)}} $$

In [1]:
#NETFLIX REAL 50.000.000 usuaris and 100.000 items
%autosave 150
%matplotlib inline
import pandas as pd
import numpy as np
import math
import matplotlib.pylab as plt

# Load Data set
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols)

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=r_cols)

# the movies file contains columns indicating the movie's genres
# let's only load the first three columns of the file with usecols
m_cols = ['movie_id', 'title', 'release_date']
movies = pd.read_csv('ml-100k/u.item', sep='|', names=m_cols, usecols=range(3), encoding='latin-1')

# Construcció del DataFrame
data = pd.merge(pd.merge(ratings, users), movies)
data = data[['user_id','title', 'movie_id','rating','release_date','sex','age']]


print("La BD has "+ str(data.shape[0]) +" ratings")
print("La BD has ", data.user_id.nunique()," users")
print("La BD has ", data.movie_id.nunique(), " movies")
data.head()



# let's define some functions
def compute_rmse(y_pred, y_true):
    """ Compute Root Mean Squared Error. """
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))

def assign_to_set(df):
    sampled_ids = np.random.choice(df.index,
                                   size=np.int64(np.ceil(df.index.size * 0.2)),
                                   replace=False)
    df.loc[sampled_ids, 'for_testing'] = True
    return df

def create_train_test(data,key = 'user_id'):
    data['for_testing'] = False
    grouped = data.groupby(key, group_keys=False).apply(assign_to_set)
    # dataframe used to train our model
    data_train = data[grouped.for_testing == False]
    # dataframe used to evaluate our model
    data_test = data[grouped.for_testing == True]
    return data_train, data_test

def evaluate(estimate_f,data_train,data_test):
    """ RMSE-based predictive performance evaluation with pandas. """
    ids_to_estimate = zip(data_test.user_id, data_test.movie_id)
    estimated = np.array([estimate_f(u,i) if u in data_train.user_id else 3 for (u,i) in ids_to_estimate ])
    real = data_test.rating.values
    return compute_rmse(estimated, real)


# In order to speed up the test let's create a tiny dataset
dataSmall = data[data['movie_id']<100]
print(dataSmall.shape)

dataSmall_train, dataSmall_test =  create_train_test(data)

print('# Users:', dataSmall.user_id.nunique())
print('# Movies:',dataSmall.movie_id.nunique())

Autosaving every 150 seconds
La BD has 100000 ratings
La BD has  943  users
La BD has  1682  movies
(14384, 7)
# Users: 805
# Movies: 99


<div class="alert alert-success">
**EXERCISE: Create an item-based recommender system. **
<br>Instead of computing similiarity between users we need to compute similarities between items. 
</div>

In [2]:
from scipy.stats import pearsonr

# Returns a pearsonCorrealation-based similarity score for person1 and person2
def SimPearson(DataFrame,User1,User2,min_common_items=1):
    # GET MOVIES OF USER1
    movies_user1=DataFrame[DataFrame['user_id'] ==User1 ]
    # GET MOVIES OF USER2
    movies_user2=DataFrame[DataFrame['user_id'] ==User2 ]
    
    # FIND SHARED FILMS
    rep=pd.merge(movies_user1 ,movies_user2,on='movie_id',)
    if len(rep)==0:
        return 0    
    if(len(rep)<min_common_items):
        return 0    
    res=pearsonr(rep['rating_x'],rep['rating_y'])[0]
    if(isnan(res)):
        return 0
    return res

# Returns a pearsonCorrealation-based similarity score for movie1 and movie2
def SimPearsonItem(DataFrame,movie1,movie2,min_common_items=1):
    # GET USER THAT HAVE SEEN MOVIE1

    # GET USER THAT HAVE SEEN MOVIE2
    
    # FIND USERS THAT HAVE SEEN BOTH OF THEM
    
    # COMPUTE AND RETURN ITEM SIMILARITY
    return 0
    

In [3]:
import tqdm as tq # conda install -y tqdm

class CollaborativeFilteringItemBased:
    """ Item-Based Collaborative filtering using a custom sim(p,p'). """
    
    def __init__(self,DataFrame, similarity=SimPearsonItem, min_common_items=10, max_sim_movies=10):
        """ Constructor """
        self.sim_method=similarity# Gets recommendations for a person by using a weighted average
        self.df=DataFrame
        self.sim = pd.DataFrame(np.sum([0]), columns = DataFrame.movie_id.unique(), index = DataFrame.movie_id.unique())
        self.min_common_items=min_common_items
        self.max_sim_movies=max_sim_movies

    def learn(self):
        """ Prepare data structures for estimation. Similarity matrix for users """
        allMovies=set(self.df['movie_id'])
        self.sim = {}
        pbar = tq.tqdm(total=len(allMovies))
        for movie1 in allMovies:
            self.sim.setdefault(movie1, {})
            a=self.df[self.df['movie_id']==movie1][['user_id']]
            data_reduced=pd.merge(self.df,a,on='user_id')
            for movie2 in allMovies:
                # no es comparem am nosalres mateixos
                if movie1==movie2: continue
                self.sim.setdefault(movie2, {})
                if(movie1 in self.sim[movie2]):continue # since is a simetric matrix
                sim=self.sim_method(data_reduced,movie1,movie2,self.min_common_items)
                #print movie1,movie2,sim
                if(sim<0):
                    self.sim[movie1][movie2]=0
                    self.sim[movie2][movie1]=0
                else:
                    self.sim[movie1][movie2]=sim
                    self.sim[movie2][movie1]=sim  
            pbar.update(1)
        pbar.close()
                
    def estimate(self, user_id, movie_id):
        if movie_id not in self.df.movie_id.unique(): # If the movie was not in the training set
            return 3.5
        totals={}
        movie_users=self.df[self.df['user_id'] ==user_id]
        rating_num=0.0
        rating_den=0.0
        allMovies=set(movie_users['movie_id'])
        
        listOrdered=sorted([(self.sim[movie_id][other],other) for other in allMovies if movie_id!=other],reverse=True)
        
        for item in range(min(len(listOrdered),self.max_sim_movies)):
            other=listOrdered[item][1]
            rating_num += self.sim[movie_id][other] * (float(movie_users[movie_users['movie_id']==other]['rating']))
            rating_den += self.sim[movie_id][other]
        if rating_den==0: 
            if self.df.rating[self.df['movie_id']==movie_id].mean()>0:
                # return the mean movie rating if there is no similar for the computation
                return self.df.rating[self.df['movie_id']==movie_id].mean()
            else:
                # else return mean user rating 
                return self.df.rating[self.df['user_id']==user_id].mean()
        return rating_num/rating_den

In [4]:
reco = CollaborativeFilteringItemBased(dataSmall_train,similarity=SimPearsonItem,min_common_items=1,max_sim_movies=10)
reco.learn()
reco.estimate(user_id=2,movie_id=55)

100%|██████████| 1651/1651 [00:12<00:00, 127.16it/s]


3.745614035087719

In [None]:
print('RMSE for Collaborative Recomender: %s' % evaluate(reco.estimate,dataSmall_train,dataSmall_test))

RMSE for Collaborative Recomender: 1.0794258283104319


### Advantatges

+ Item similarity used to be <b>more stable</b> than user-similarity, 
+ More scalable. The update frequency of the items similarity is not as critical than user-similarity since it is more stable
+ It uses to perform better in RMSE terms, while being also more scalable. <br>

Both advantages are related with the fact that there is usually less users than item.<br><br>


### Disadvantages
* It tends to recommend obvoius items. It is very difficult for the item-based method to discover highly different items to recommend. 
+ Why: Similarities are computed with more data. The evidence of non-popular items its really complex since there is no evidence on the similarity.
 + May be frustating for browsing/enterteiment 
 + Can be good for shopping, consumption taks

<div class="alert alert-info" style = "border-radius:10px;border-width:3px;border-color:darkblue;font-family:Verdana,sans-serif;">
<h3>Latent Factor Models.</h3>
</div>

Singular Value Decomposition (SVD) is another well known method for recommender systems.
The key idea fo the SVD model is to factorize the user-item matrix rating matrix into two lower rank matrices, one containing the user factors, while the other containing the  item factors.



### Dimensionality Reduction and Neighbourhood Methods

Dimensionality reduction can improve neighborhood methods in terms of accuracy and also in terms of efficiency.


* A reduced representation of the data can be created in terms of either row-wise latent factors or column-wise latent factors. In other, words, the reduced representation will compress items or users into latent factors. 
* Can alleviate the sparsity problems for neigbborhood-based models.
* Depending on which represetations is compressed, it can be used as user-based or item-based neigborhood model.
* The latent representation can be also computed in <b>both</b> dimensions simultaneously. The full $m \times n$ matrix is obtained in one shot without the use of neigborhood-based methods.



**Latent Factor Models**:
![alt svd](images/svd.png)

$$ R = UV^T$$
$$ \hat{R} \approx  UV^T$$
$$ \hat{r}_{i,j} \approx  \hat{u}_i\cdot\hat{v}_j$$
$$\sum_{s=1}^{k}\text{(Affinity of user i to concept S)}\cdot\text{(Affinity of item j to concept S)}$$



In other words, the idea is to found topics as follows:
![alt svd](images/svd2.png)
This representation can be computed using standard methods like <b>PCA</b> or <b>SVD</b>.


For the purpose of the recommendation systems, we can also be interested in the matrix factorization that keeps the same dimensionality. It can be understood as a model-based recommender system.

The matrix factorization is done on the user-item ratings matrix $R$. From a high level, matrix factorization can be thought of as finding 2 matrices whose product is the original matrix:
$$ \hat{R} = Q^TP$$
where:
$$ r_{u,i} = q^T_ip_u$$


The goal of the method consist of optimizing the matrices $Q$ and  $P$. It can be done as follows:
$$ minimize_{Q,P} \sum_{u,i \in K} (r_{u,i} - q^t_ip_u )^2$$

To generalize well and not over-fit the training set, a penalty term is usually introduced into the minimization equation. This is represented by a regularization factor $\lambda$ multiplied by the square sum of the magnitudes of user and item vectors.

$$ minimize_{Q,P} \sum_{u,i \in K} (r_{u,i} - q^t_ip_u )^2 + \lambda(|| q_i||^2 + || p_u||^2)$$



Ok, but... what should we do with the huge amount of unknow values? 
* Traditionally, the average rating of the movies was used for all unkown items.

And, what happens if the number of users and item is large? 
* The factorization of the matrix becomes harder and the solution is not ensured to be the optimal. 

### SVD
![alt svd](images/svd4.png)
* Columns of U and V are constrained to be mutually orthogonal. 
* Mutual orthogonality has the advantage that the concepts can be completely independent of one another. Can be interpreted in scatterplots

Studies showed that if the evaluation is done with <b>Top-N recommendations</b> metric unkown values can be set to 0. In this way we obtain a sparse user-item matrix. This is an important point since the comutation of the SVD is easier and faster on sparse matrices.

While similarities are hard to be computed in huge dimensional sparse rating matrices, these similarities between user/items in this reduced representation are more robust because the new low-dimensional representation is fully specified (and faster!). It can be computed, using the simple cosine or dot product. 

In [None]:
#Since this code is quite fast, lets work with the 1M dataset (results will be nicer!)
data = pd.io.parsers.read_csv('ml-1m/ratings.dat', 
    names=['user_id', 'movie_id', 'rating', 'time'],
    engine='python', delimiter='::')
movies = pd.io.parsers.read_csv('ml-1m/movies.dat',
    names=['movie_id', 'title', 'genre'],
    engine='python', delimiter='::')

In [None]:
#Lets create a rating Matrix of size M x N
num_movies = np.max(data.movie_id.values)
num_users = np.max(data.user_id.values)

ratings_mat = np.ndarray(shape=(num_movies, 
                                num_users),
                         dtype=np.uint8)
ratings_mat[data.movie_id.values-1, data.user_id.values-1] = data.rating.values

In [None]:
from scipy.sparse.linalg import svds

#Normalise matrix (subtract mean off)
normalised_mat = ratings_mat - np.asarray([(np.mean(ratings_mat, 1))]).T
A = normalised_mat.T /np.sqrt(num_movies -1)
U, S, V = np.linalg.svd(A)

In [None]:
#Let's check the shape of the matices
print(U.shape, S.shape,V.shape)

In [None]:
#Calculate cosine similarity, sort by most similar and return the top N.
def top_cosine_similarity(data, movie_id, top_n=10):
    index = movie_id - 1 # Movie id starts from 1
    movie_row = data[index, :]
    magnitude = np.sqrt(np.einsum('ij, ij -> i', data, data))
    similarity = np.dot(movie_row, data.T) / (magnitude[index] * magnitude)
    sort_indexes = np.argsort(-similarity)
    return sort_indexes[:top_n]

# Helper function to print top N similar movies
def print_similar_movies(movie_data, movie_id, top_indexes):
    print('Recommendations for {0}: \n'.format(
    movie_data[movie_data.movie_id == movie_id].title.values[0]))
    for id in top_indexes + 1:
        print(movie_data[movie_data.movie_id == id].title.values[0])

In [None]:
k = 50
movie_id = 1 # Let's check for one movie
top_n = 5

sliced = V.T[:, :k] # representative data
indexes = top_cosine_similarity(sliced, movie_id, top_n)
print_similar_movies(movies, movie_id, indexes)

<div class="alert alert-success">
**EXERCISE: Create a new method recommender method using the SVD.** </div>

In [None]:
from sklearn.decomposition import TruncatedSVD
class SVD_CollaborativeFiltering:
    """ Collaborative filtering using a custom sim(u,u'). """
    
    def __init__(self,DataFrame, similarity=SimPearson,num_components=10):
        """ Constructor """
    def learn(self):
        """ Prepare data structures for estimation. Similarity matrix for users """
                
    def estimate(self, user_id, movie_id):
        return 3.5

In [None]:
reco = SVD_CollaborativeFiltering(dataSmall_train,num_components=80)
reco.learn()
reco.estimate(user_id=2,movie_id=1)

In [None]:
print('RMSE for Collaborative Recomender: %s' % evaluate(reco.estimate,dataSmall_train,dataSmall_test))

### Let's think about the problem! 
![alt Netflix vs Spotify](images/NetFlixVsSpotify.png)


In [None]:
# What do you think?

### Important Challenges on Recommender Music:
* Cold Start Problem
 * Really sparse problem
 * Cross domain techniques
* Automatic PlayList Generation
 * Contextual features are really important: from time, mode or location to weather
* Evaluation
 * How do we know that was a good track for the user?

### RecSys 2018
Interesting Challenge on Recommender Systms:
http://www.recsyschallenge.com/2018/

This challenge, in collaboration with Spotify, focuses on music recommendation, specifically the challenge of automatic playlist continuation. By suggesting appropriate songs to add to a playlist, a Recommender System can increase user engagement by making playlist creation easier, as well as extending listening beyond the end of existing playlists.

Spotify have released a public dataset of playlists, consisting of a large number of playlist titles and associated track listings. For the evaluation purpose, there is a set of playlists from which a number of tracks have been withheld. The task will be to predict the missing tracks in those playlists.


![alt recsys](images/recsys.png)

<div class="alert alert-info" style = "border-radius:10px;border-width:3px;border-color:darkblue;font-family:Verdana,sans-serif;font-size:16px;">
<h3>Evaluating Recommender systems.</h3>
</div>


Can we evaluate if  our recommender is <b>good</b>?

If we look at this image: 
![alt topAmazon](images/topAmazon.png)
One may end with the following questions:

1st Question: <b> Which prodcuts should be in this list?</b>

2nd Question: <b> How do we know that these are good recommendations?</b>

3rd Question: <b> Good in terms of what</b>
* Data Scientist vs. Marketing vs. Economist


What does <b>Good</b> means??
 * Recommendation accuracy?
 * Recommendation quality?
 * System usability?
 * System  satisfaction?
 



##### There is three main ways to evaluate a Recommender Systems
The evaluation of the recommender systems is one the <b>most critical</b> steps when building a recommender system.
* A single criterion cannot capture many goals of the designer.

1) Offline Evaluation
 + Historical, such as ratings, are used. 
 + In some cases, temporal information is also provided with the ratings, such as the time-stamp at which the information was obtained.

2) User Studies
 + Test subjects are actively recruited, and asked to interact
 + with the recommendation system to perform some actions. Example: satisfaction questionaries


3) Online Evaluation
 + Online evaluation also leverage user studies except that users are real users of fully developed  or commercial  system. 
 + The user directly plays with the system,  usually different methods are compared with different random uses
 
![alt AB-testing](images/ab-testing.png)

There exist several ways to evaluate a recommender system. Ones of the most populars are RMSE and MAE. However, <b>these metrics do not really measure the user experience</b>. 

Top-N performance metric can be also used and in fact, the evaluation with this metric is more closer to what is important for the user, just those top recommendations.


<div class="alert alert-success">
**EXERCISE:**
Create a new method in the recommendetion class that returns the TOP N recomendations for a user.</div>

In [None]:
def getTopMovies(reco,user_id,N = 10):
    # Returns the N best (and not seen) movies for the user_id. 
    return 0

### How to compute the precision-recall?
We can use the strategy followed by P. Cremonesi et.al. in http://dl.acm.org/citation.cfm?id=1864721 <br>
In order to measure the precision recall, first the models is trained using the training data, and then, for each item $i$ rated with 5 stars in the test data set:
* A set of 100 random unseen movies for the user of the item $i$ are seleted. We assume that these random movies will not be at the same interest than the 5 star movie
* We predict the rating of the movie of item $i$ and 100 random unseen movies.
* We form a rank list by ordering all the 101 item according to the predicted rating. Let denote $p$ the rank of the test item $i$ within the list. The best results correspondes to the case the test item $i$ precedes all the random items (i.e., p=1).
* A top-N recommendation list by piking the N top ranked items from the list. If $p \leq N$ we have a hit. Otherwise we have a miss. Chanches of hit incresases as N is higher.



<br><br>
<div class="alert alert-success">
**EXERCISE:**
Create a new method for the evaluation of the precision/recall curve using Top-N recomendations output.</div>

In [None]:
def evaluateTop(estimate_f,data_train,data_test, N = 10):
    """ Precision-Recall evaluation bas on top Recommendations """

In [None]:
precision,recall = evaluateTop(reco.estimate,dataSmall_train,dataSmall_test,N=10)


In [None]:
plt.plot( range(0,len(recall)),recall)
plt.plot( range(0,len(recall1)),recall1)
plt.plot( range(0,len(recall2)),recall2)
plt.xlabel('N')
plt.ylabel('recall')
plt.show()
plt.plot(recall, precision)
plt.plot(recall1, precision1)
plt.plot(recall2, precision2)
plt.ylabel('precision')
plt.xlabel('recall')
plt.show()

<div class="alert alert-info" style = "border-radius:10px;border-width:3px;border-color:darkblue;font-family:Verdana,sans-serif;">
<h3>Other Popular Methods</h3>
</div>

### Restricted Boltzmann Machines for Collaborative Filtering  (2007)
http://www.cs.toronto.edu/~fritz/absps/netflix.pdf

### Method: SLIM - Sparse Linear Methods for Tope-N recommender systems (2011)

Computes the item-item relations, by estimating an $item \times item$ sparse aggregation coefficient matrix R.
The recommendation score of an unrated item $i$ for a user $u$ is:
$$ \hat{r}_{u,i} = r^T_u s_i$$
and the minimization problem is defined as:
$$\underset{S}{\text{minimize}} \frac{1}{2} \sum_{u,i}(r_{u,i} - \hat{r}_{u,i}^2) + \frac{\beta}{2}||S||^2_F  + \lambda||S||_1$$
$$ \text{subject to  }  R\geq 0, \text{and } Diag(S) = 0$$
![alt slim](images/slim.png)

http://ieeexplore.ieee.org/document/6137254/ 

