# **Collaborative recommender module**

## **Introduction**

The collaborative recommender module makes personalized recommendations by suggesting restaurants that were rated highly by users with similar profiles.

The Yelp dataset contains about 200K businesses, 2M users and 8M reviews, so the user-restaurant interaction matrix is very sparse. Matrix factorization algorithms are therefore used to approximate the matrix and provide recommendations.
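To make the sparsity concrete, the back-of-the-envelope calculation below uses only the approximate counts quoted above. The variable names are illustrative, and the exact counts after cleaning will differ somewhat.

```python
# Approximate counts quoted in the introduction (not exact dataset sizes)
n_users = 2_000_000
n_businesses = 200_000
n_reviews = 8_000_000

# Each review fills at most one cell of the user x business matrix
n_cells = n_users * n_businesses
density = n_reviews / n_cells

print(f"Matrix cells: {n_cells:.1e}")
print(f"Filled cells: {density:.4%}")
```

With these rough numbers, only about 0.002% of the cells would hold a rating, which is why a dense representation is impractical.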

## **Implementation Strategy**

### **1 Ranking**
#### **1.1 SVD (Singular Value Decomposition)**
#### **1.1.1 SVD without bias** 
The user-restaurant matrix is factorized into a user latent feature matrix and a restaurant latent feature matrix using the SVD algorithm.
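As a rough illustration of the idea, the sketch below factorizes a small, fully observed toy matrix with `np.linalg.svd` and keeps the top `k` factors. The ratings and the value of `k` are made up. Recommender libraries such as `surprise` learn the factors by stochastic gradient descent on the observed ratings only, so this shows the low-rank idea rather than the training procedure used in this notebook.

```python
import numpy as np

# Toy, fully observed user x restaurant rating matrix (hypothetical values)
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

k = 2  # number of latent features to keep (assumption)

# Thin SVD: ratings = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Split the singular values evenly between the two factor matrices
user_latent = U[:, :k] * np.sqrt(s[:k])        # shape (n_users, k)
item_latent = Vt[:k, :].T * np.sqrt(s[:k])     # shape (n_items, k)

# Rank-k reconstruction of the rating matrix
approx = user_latent @ item_latent.T
print(np.round(approx, 2))
```

Each row of `user_latent` and `item_latent` is a `k`-dimensional description of a user or a restaurant; their dot product approximates the corresponding rating.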

#### **1.1.2 SVD with bias**
User bias and restaurant bias terms are added to the original SVD factors.

`pred_rating = user latent features X restaurants latent features + user bias + restaurants bias + global_mean `
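Written in common matrix factorization notation (the symbols are introduced here for illustration and are not used elsewhere in this notebook), the prediction rule above reads:

$$
\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} q_i
$$

where $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and restaurant biases, and $p_u$ and $q_i$ are the latent feature vectors of user $u$ and restaurant $i$. Dropping $\mu$, $b_u$ and $b_i$ leaves only the dot product, which corresponds to the variant without bias in 1.1.1.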

#### **1.2 NMF (Non negative matrix factorization)**
#### **1.2.1 NMF without bias**
The user-restaurant matrix is factorized into a non-negative user latent feature matrix and a non-negative restaurant latent feature matrix using the NMF algorithm.
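A minimal sketch of non-negative factorization, assuming a small, fully observed toy matrix and the classic multiplicative update rules for the squared error. The function name `nmf_multiplicative`, the toy ratings and all parameter values are hypothetical. The library implementations used for prototyping may optimize differently and work on observed ratings only.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative V (m x n) into W (m x k) and H (k x n), both >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy, fully observed user x restaurant ratings (hypothetical values)
V = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

W, H = nmf_multiplicative(V, k=2)
print(np.round(W @ H, 2))
```

Because both factors stay non-negative, each latent feature can only add to a predicted rating, which tends to make the features easier to interpret than those of an unconstrained factorization.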

#### **1.2.2 NMF with bias**
User bias and restaurant bias terms are added to the original NMF factors.

The `scikit-surprise` and `scikit-learn` packages are used for prototyping these algorithms. The prototyping results indicate that, among the four algorithms, SVD with bias and NMF without bias provide acceptable rating predictions. Being flexible and the best performing, SVD with bias is used for the module implementation.

### **2 Implementation**

#### **2.1 Optimization**
Based on the prototyping results, the SVD with bias algorithm is optimized via grid search cross-validation.
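The optimization itself was run in a separate notebook, so the following is only a sketch of how such a search could look with `GridSearchCV` from `surprise`. The grid values, the number of folds and the helper name `run_grid_search` are assumptions; only the combination reported as optimal later in this notebook is taken from the source.

```python
# Hypothetical search grid; only the reported optimum is taken from this notebook
param_grid = {
    'n_factors': [10, 50, 100],
    'n_epochs': [20, 50],
    'lr_all': [0.002, 0.005],
    'biased': [True],
}

def run_grid_search(data, param_grid, cv=3):
    """Return the parameters and score with the lowest cross-validated RMSE."""
    # Imported lazily so the grid above can be inspected without scikit-surprise
    from surprise import SVD
    from surprise.model_selection import GridSearchCV

    gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=cv, n_jobs=-1)
    gs.fit(data)    # data: a surprise Dataset, e.g. built with Dataset.load_from_df
    return gs.best_params['rmse'], gs.best_score['rmse']
```

Every combination in the grid is trained `cv` times, so the run time grows quickly with the grid size; this toy grid already requires 12 x 3 model fits.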

#### **2.2 Evaluation**
**RMSE** <br>
The RMSE values of the model are ----------- for the test set with new users or restaurants, the test set with no new users or restaurants, and the test set with only users or restaurants that have more than 5 ratings.
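A minimal sketch of how RMSE can be compared across such test subsets, assuming hypothetical test rows and training ids. Only the split into "all rows" and "rows with known users and restaurants" is shown; the same filtering pattern would apply to a minimum rating count.

```python
import numpy as np

def rmse(true_ratings, predicted_ratings):
    """Root mean squared error between two equally long rating sequences."""
    diff = np.asarray(true_ratings, dtype=float) - np.asarray(predicted_ratings, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical (user, restaurant, true rating, predicted rating) test rows
test_rows = [
    ('u1', 'r1', 5.0, 4.5),
    ('u1', 'r2', 3.0, 3.5),
    ('u2', 'r1', 1.0, 3.0),   # u2 never appears in the training data
]
train_users = {'u1'}
train_items = {'r1', 'r2'}

# Keep only rows whose user and restaurant were both seen during training
known = [row for row in test_rows if row[0] in train_users and row[1] in train_items]

rmse_all = rmse([r[2] for r in test_rows], [r[3] for r in test_rows])
rmse_known = rmse([r[2] for r in known], [r[3] for r in known])
print(rmse_all, rmse_known)
```

In this toy data the unseen user contributes the largest error, so the RMSE on known users and restaurants is lower than on the full test set.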

**NDCG** <br>

#### 2.3 **Module Development**
1. All relevant restaurants are filtered for the collaborative recommendations module.
2. The latent matrices and bias vectors for users and items, extracted from the trained, optimized SVD algorithm, are fed to the class.
3. For a given user id, ratings are predicted for all filtered restaurants by multiplying the user latent and item latent features and adding the user bias, the item bias and the global mean rating.
4. Predicted ratings are paired with the corresponding restaurants, and the list is filtered to restaurants not yet rated by the user.
5. Recommendations from this module are merged with the content-based recommender module's recommendations; after sorting the list by predicted rating, the final recommendations are displayed.
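Steps 3 and 4, plus the sorting part of step 5, can be sketched with plain NumPy as follows. All factor values, ids and the helper name `recommend` are hypothetical; the actual module reads the factors from the pickled model information.

```python
import numpy as np

# Hypothetical trained factors for 2 users and 4 restaurants (k = 2)
global_mean = 3.5
user_latent = np.array([[0.5, -0.2], [0.1, 0.4]])
item_latent = np.array([[0.6, 0.1], [-0.3, 0.8], [0.2, 0.2], [0.9, -0.5]])
user_bias = np.array([0.2, -0.1])
item_bias = np.array([0.3, -0.4, 0.0, 0.1])
business_ids = np.array(['a', 'b', 'c', 'd'])

def recommend(u_idx, rated_ids, n=2):
    """Predict all ratings for one user, drop rated restaurants, return the top n."""
    # Step 3: latent dot product plus user bias, item biases and global mean
    pred = global_mean + user_bias[u_idx] + item_bias + item_latent @ user_latent[u_idx]
    # Step 4: pair predictions with restaurants and keep the unrated ones
    unrated = ~np.isin(business_ids, list(rated_ids))
    candidates = list(zip(business_ids[unrated], pred[unrated]))
    # Step 5 (sorting part only): highest predicted rating first
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]

print(recommend(u_idx=0, rated_ids={'a'}))
```

Computing all predictions for one user is a single matrix-vector product, so scoring every restaurant stays cheap even for a large catalogue.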

#### **2.4 Testing**
Different test cases are implemented to check completeness and computing time.

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import pickle

# For loop visualization
from tqdm import tqdm

# For graphical representation
%matplotlib inline
import matplotlib.pyplot as plt

# For algorithm, optimization, evaluation
from surprise import Reader, Dataset
from surprise.model_selection import train_test_split
from surprise import SVD, accuracy, NMF

In [2]:
business = pd.read_csv('clean_business.csv')
review = pd.read_csv('clean_review.csv')

In [3]:
print('Number of restaurants in business dataset: ', len(business.business_id.unique()))
print('Number of restaurants in review dataset: ', len(review.business_id.unique()))

set_bus = set(business.business_id.unique())
set_rev = set(review.business_id.unique())

if len(set_bus) == len(set_bus.intersection(set_rev)):
    print('\nAll business_id from business dataset can be found in review dataset')
else:
    print('\nNot all business_id from business dataset can be found in review dataset')

Number of restaurants in business dataset:  44202
Number of restaurants in review dataset:  209394

All business_id from business dataset can be found in review dataset


In [4]:
# Reduced review by removing the duplicated user, restaurant rating combinations

review_r = review[~review.duplicated(['user_id','business_id'], keep='first')]

# Keep only required columns
review_r = review_r[['user_id', 'business_id', 'stars']]
review_r = review_r.dropna(axis='index')
review_r.reset_index(inplace=True, drop=True)

print('Original dataset length: ', len(review)) 
print('Reduced dataset length: ', len(review_r))

Original dataset length:  8021124
Reduced dataset length:  7735089


In [5]:
review_r.head()

| | user_id | business_id | stars |
|---|---|---|---|
| 0 | OwjRMXRC0KyPrIlcjaXeFQ | -MhfebM0QIsKt87iDN-FNw | 2.0 |
| 1 | nIJD_7ZXHq-FX8byPMOkMQ | lbrU8StCq3yDfr-QMnGrmQ | 1.0 |
| 2 | V34qejxNsCbcgD8C0HVk-Q | HQl28KMwrEKHqhFrrDqVNQ | 5.0 |
| 3 | ofKDkJKXSKZXu5xJNGiiBQ | 5JxlZaqCnk1MnbgRirs40Q | 1.0 |
| 4 | UgMW8bLE0QMJDCkQ1Ax5Mg | IS4cv902ykd8wj1TR0N3-A | 4.0 |


### 1 **Ranking**
#### 1.1 **SVD (Singular Value Decomposition)**
#### 1.1.1 **SVD without bias**

#### 1.1.2 **SVD with bias**

#### 1.2 NMF (Non Negative Matrix Factorization)
#### 1.2.1 NMF without bias

#### 1.2.2 NMF with bias

### 2 **Implementation**
#### 2.1 **Optimization**

#### **Optimized parameters = {'n_factors': 10, 'n_epochs': 50, 'lr_all': 0.005, 'biased': True}**
(This optimization task was run in a separate notebook and took 5 hours on an 8-core CPU, 32 GB VM.)

#### 2.2 **Evaluation**
#### **NDCG** <br>

In [6]:
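def dcg_r(r, k):
    """Discounted cumulative gain of the relevance scores r, truncated at rank k."""
    r = np.asarray(r, dtype=float)[:k]                                     # np.asfarray was removed in NumPy 2.0
    if r.size:
        # The score at rank i (1-based) is discounted by log2(i + 1)
        return np.sum(r / np.log2(np.arange(2, r.size + 2)))
    return None

def ndcg_r(r, k):
    """DCG normalized by the ideal DCG (scores sorted in descending order)."""
    idcg = dcg_r(sorted(r, reverse=True), k)
    dcg = dcg_r(r, k)
    if idcg is None or dcg is None or idcg == 0:
        return None
    return dcg / idcg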
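For a quick sanity check of the metric, the standalone snippet below re-implements the same linear-gain DCG in a compact form and evaluates it on three made-up ratings. The function names `dcg` and `ndcg` and the input values are illustrative only.

```python
import numpy as np

def dcg(r, k):
    """Linear-gain DCG: the rating at rank i is divided by log2(i + 1)."""
    r = np.asarray(r, dtype=float)[:k]
    return float(np.sum(r / np.log2(np.arange(2, r.size + 2))))

def ndcg(r, k):
    """DCG of the given order divided by the DCG of the ideal (descending) order."""
    ideal = dcg(sorted(r, reverse=True), k)
    return dcg(r, k) / ideal if ideal > 0 else None

# True ratings listed in the order the model ranked the restaurants (made-up values)
ranked_true_ratings = [5, 1, 5]

print(round(ndcg(ranked_true_ratings, k=3), 4))   # roughly 0.94
print(ndcg([5, 5, 1], k=3))                       # already in ideal order -> 1.0
```

The score drops below 1 because a 1-star restaurant was ranked above a 5-star one; the closer the ranking is to the ideal order, the closer NDCG gets to 1.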

In [7]:
%%time

# Reader object with ratings scale
reader = Reader(rating_scale=(1, 5))

# Load trainset, NOTE: the columns must correspond to user id, item id and ratings in the exact order
data = Dataset.load_from_df(review_r, reader)

# Build training, testing data
trainset, testset = train_test_split(data, test_size=0.20)

# Retrain with optimized parameters
algo_optimized = SVD(n_factors=10, n_epochs=50, biased=True)
algo_optimized.fit(trainset)
predictions = algo_optimized.test(testset)
accuracy.rmse(predictions)

# Gather useful info from model
mean_bias = algo_optimized.trainset.global_mean 
user_latent_bias, item_latent_bias = algo_optimized.pu, algo_optimized.qi
user_bias, item_bias = algo_optimized.bu, algo_optimized.bi

RMSE: 1.3398
CPU times: user 9min 10s, sys: 3.1 s, total: 9min 13s
Wall time: 9min 13s


In [8]:
rating_predict_1 = pd.DataFrame(index=np.arange(len(predictions)),columns=['user_id','business_id','stars','rating_predict'])

i=0
for entry in tqdm(predictions):
    rating_predict_1.iloc[i,:] = entry.uid, entry.iid, entry.r_ui, entry.est
    i += 1    
assert i == len(predictions)

100%|██████████| 1547018/1547018 [01:52<00:00, 13722.37it/s]


In [9]:
%%time

# Particular user_id as an example
user_id = '---1lKK3aKOuomHnwAkAow'                                              # 12 review ratings available in 'testset'

# Rank the 'rating_predict' dataframe by the predicted ratings in descending order
rating_predict_1 = rating_predict_1.sort_values('rating_predict', ascending=False)

# Filter to the user_id of interest only
rec = rating_predict_1[rating_predict_1.user_id == user_id].set_index('business_id')[['rating_predict','stars']]
print('Ranking by predicted ratings:\n', rec)

# NDCG @top 10 and @top 5
NDCG_10 = ndcg_r(r=rec.stars.values, k=10)
NDCG_5 = ndcg_r(r=rec.stars.values, k=5)

print('\nNormalized discounted cumulative gain achieved at top-10 based on testset:\n', NDCG_10)
print('\nNormalized discounted cumulative gain achieved at top-5 based on testset:\n', NDCG_5)

Ranking by predicted ratings:
                        rating_predict stars
business_id                                
N8Rwk4XrKaHYXXninuxg9Q              5   5.0
--9e1ONYQuAa-CB_Rrw7Tw              5   4.0
mz9ltimeAIy2c2qf5ctljw              5   5.0
dM8Yp8StA1NdusK5Ta_j-g              5   3.0
OicpDroqnfmbtw5jSgf4lQ              5   5.0
RhTBGAHFqnFTgSUDJtBuIQ       4.914954   5.0
kNc-qG_AarowPl82M7KFLw       4.826665   5.0
yp2nRId4v-bDtrYl5A3F-g       4.819657   1.0
Od2VpwoOBxycNyQNOMJ6eQ        4.59364   1.0
rXEQbezXp1GadjEuWj6c1g       4.444073   5.0
ZgPnRzWjQR5NtiauGBww7g       4.387538   5.0
edV_IqWqz5KVSGLrsru5EQ       4.327611   5.0
alaiXlogA286nuM8tj9ghw       4.280578   5.0
WOO81gScY3_VpaIfXFAKpw       4.273429   4.0
dZB5VuI4mCVRz8qQUwUgCg        4.26523   1.0
_PJas5ctpmJDFp3lTkSq1A       4.247842   5.0
UutHMmZx1CQcjiyfmVa_7g       4.186887   5.0
ow5ku7hfMqU94mylTd3WlQ       3.944067   5.0
2Cs9bSN-fMnY3H-0pFP1mg       3.636469   5.0
DV13F0bhe55dV1AhwoO50g       3.513851   5.0

In [10]:
# Store useful information from trained model

useful_info = {'mean_rating': algo_optimized.trainset.global_mean,
               'user_latent': algo_optimized.pu,
               'item_latent': algo_optimized.qi,
               'user_bias': algo_optimized.bu,
               'item_bias': algo_optimized.bi,
               'userid_to_index': algo_optimized.trainset._raw2inner_id_users,
               'itemid_to_index': algo_optimized.trainset._raw2inner_id_items
              }

In [11]:
# Pickle useful info 
with open('svd_algo_trained_info', 'wb') as f:
    pickle.dump(useful_info, f)
    
# Pickle trained model
with open('svd_bias_algo_trained', 'wb') as f:
    pickle.dump(algo_optimized, f)

In [12]:
business = pd.read_csv('clean_business.csv')
review = pd.read_csv('clean_review.csv')

mean_global = ((business.stars * business.review_count).sum() / (business.review_count.sum()))
k = 30                                                                  # 50% quantile of the review counts 

business['stars_adj'] = ((business.review_count * business.stars) + (k * mean_global)) / (business.review_count + k)
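The adjusted rating above is a weighted average of a restaurant's own rating and the global mean, where `k` behaves like `k` extra reviews that all carry the global mean. The small sketch below shows the effect on two made-up restaurants; the function name `adjusted_stars` and the value of `mean_global` are hypothetical.

```python
def adjusted_stars(stars, review_count, mean_global, k=30):
    """Shrink a restaurant's average rating toward the global mean.

    k acts like k extra "virtual" reviews that all carry the global mean rating.
    """
    return (review_count * stars + k * mean_global) / (review_count + k)

mean_global = 3.7   # hypothetical global mean, for illustration only

# A 5-star restaurant with 3 reviews is pulled strongly toward the mean ...
print(round(adjusted_stars(5.0, 3, mean_global), 2))      # 3.82
# ... while one with 3000 reviews barely moves
print(round(adjusted_stars(5.0, 3000, mean_global), 2))   # 4.99
```

This keeps restaurants with only a handful of perfect ratings from outranking well-established ones in the default ranking.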

In [13]:
class Recommender_Engine:
    
    def __init__(self, n=10, stars_original=False):
        """
        Instantiate the object. Default setting for ranking would be stars_adj with top 10 recommendations.
        """
        
        self.n = n                                                     # Number of recommendations
        self.stars_original = stars_original                           # Boolean for ranking method                            
        self.display_columns = ['name', 'address', 'city','state',\
                                'attributes.RestaurantsPriceRange2',\
                                'review_count','stars','stars_adj',\
                                'cuisine','style']                    # List of columns to be displayed in the results
        
        if self.stars_original:
            score = 'stars'
        else:
            score = 'stars_adj'
            
        self.recommendation = business[business.is_open == 1].sort_values(score, ascending=False)
                                                                      # Filter only open restaurants
    
    def display(self):
        
        if len(self.recommendation) == 0:
            print("Sorry, there are no matching recommendations.")
        elif self.n < len(self.recommendation):
            print("Below is the list of the top {} recommended restaurants for you: ".format(self.n))
            print(self.recommendation.iloc[:self.n][self.display_columns])
        else:
            print("Below is the list of the top {} recommended restaurants for you: ".format(len(self.recommendation)))
            print(self.recommendation[self.display_columns])          # Select columns by label, not by position
    
            
    def collaborative_filtering(self, user_id=None):
        self.user_id = user_id
        if self.user_id is None:
            print('User ID is not provided')
            return None
        if len(user_id) != 22:                                        # Sanity check on length of user id
            print('Invalid user ID')
            return None
        
        # Work on a copy so the module-level `business` frame is never modified in place
        self.recommendation = business[business.is_open == 1].copy()
        if 'stars_pred' in self.recommendation.columns:
            self.recommendation = self.recommendation.drop(columns='stars_pred')
            
        self.display_columns = ['name', 'address', 'city','state',\
                                'attributes.RestaurantsPriceRange2',\
                                'review_count','stars','stars_adj',\
                                'cuisine','style']
            
        with open('svd_algo_trained_info', 'rb') as f:
            useful_info = pickle.load(f)
            
        mean_rating = useful_info['mean_rating']
        user_latent = useful_info['user_latent']
        item_latent = useful_info['item_latent']
        user_bias = useful_info['user_bias']
        item_bias = useful_info['item_bias']
        userid_idx = useful_info['userid_to_index']
        itemid_idx = useful_info['itemid_to_index']
        
        # Recommendations
        if self.user_id in userid_idx:
            u_idx = userid_idx[self.user_id]
            pred = mean_rating + user_bias[u_idx] + item_bias + np.dot(user_latent[u_idx,:], item_latent.T)
        else:
            print('Sorry, no personalized recommendations yet!')
            print('\nHere are generic recommendations: ')
            
            pred = mean_rating + item_bias
            
        prediction = pd.DataFrame(data=pred, index=list(itemid_idx.values()), columns=['stars_pred'])
        prediction.index.name = 'matrix_item'                         # Inner (matrix) index of each item
        assert len(prediction) == len(pred)
        prediction['business_id'] = list(itemid_idx.keys())
        
        # Filter to unrated business by user
        if self.user_id in userid_idx:
            rated_bus = review[review.user_id == self.user_id].business_id.unique()
            prediction = prediction[~prediction.business_id.isin(rated_bus)]
            
        self.recommendation = self.recommendation.merge(prediction, on='business_id', how='inner')
        self.recommendation = self.recommendation.sort_values('stars_pred', ascending=False).reset_index(drop=True)
        self.display_columns.insert(0, 'stars_pred')
        self.display()
        
        return self.recommendation

In [14]:
%%time

# Instantiate the object
results = Recommender_Engine();

# Test case 1: Display results
print('Test case 1: *****------------*****\n');
results.display();

# Test case 2: No user_id input
print('Test case 2: *****------------*****\n');
results.collaborative_filtering();

# test 3: User with no previous user data
print('Test case 3: *****------User with no previous user data------*****\n')
results.collaborative_filtering(user_id='-NzChtoNOw706kps82x0Kg');

# test 4: User with few restaurants reviews
print('Test case 4: *****------User with few restaurants reviews------*****\n')
results.collaborative_filtering(user_id='---89pEy_h9PvHwcHNbpyg');

# test 5: User with more than 100 restaurants reviews
print('Test case 5: *****------User with more than 100 restaurants reviews------*****\n')
results.collaborative_filtering(user_id='---1lKK3aKOuomHnwAkAow');

Test case 1: *****------------*****

Below is the list of the top 10 recommended restaurants for you: 
                          name                           address        city  \
29761          Little Miss BBQ              4301 E University Dr     Phoenix   
2648              Brew Tea Bar      7380 S Rainbow Blvd, Ste 101   Las Vegas   
33734          Cocina Madrigal                    4044 S 16th St     Phoenix   
35172  Green Corner Restaurant        1038 W Southern Ave, Ste 1        Mesa   
3590            Worth Takeaway                     218 W Main St        Mesa   
9839            Zenaida's Cafe      3430 E Tropicana Ave, Ste 32   Las Vegas   
34397          Kodo Sushi Sake  15040 N Northsight Blvd, Ste 104  Scottsdale   
21628  Bajamar Seafood & Tacos             1615 S Las Vegas Blvd   Las Vegas   
11825                   Karved              3957 S Maryland Pkwy   Las Vegas   
34603    Not Your Typical Deli    1166 South Gilbert Rd, Ste 101     Gilbert   

      state attr

Unnamed: 0,city,attributes.GoodForMeal,attributes.Smoking,attributes.BusinessAcceptsBitcoin,address,attributes.BYOBCorkage,attributes.WheelchairAccessible,attributes.RestaurantsDelivery,state,attributes.OutdoorSeating,...,hours.Monday,attributes.CoatCheck,hours,hours.Friday,attributes.BusinessAcceptsCreditCards,attributes.RestaurantsTableService,cuisine,style,stars_adj,stars_pred
0,Las Vegas,,,,3708 Las Vegas Blvd S,,,True,NV,True,...,0:0-0:0,,"{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W...",0:0-0:0,True,,,"casinos,restaurants",3.998499,7.322162
1,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,2880 S Las Vegas Blvd,'yes_free',,False,NV,False,...,16:0-22:0,,"{'Monday': '16:0-22:0', 'Tuesday': '16:0-22:0'...",16:0-22:0,True,True,,restaurants,3.991027,7.219524
2,Surprise,"{'dessert': False, 'latenight': False, 'lunch'...",,,"17191 N Litchfield Rd, Ste 40",,,True,AZ,False,...,11:0-22:0,,"{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'...",11:0-23:0,True,True,"sushi bars,japanese","bars,restaurants,nightlife",3.978878,7.209917
3,Phoenix,"{'dessert': False, 'latenight': False, 'lunch'...",,,"15414 N 19th Ave, Ste K",'no',,False,AZ,False,...,11:0-21:0,,"{'Monday': '11:0-21:0', 'Tuesday': '11:0-21:0'...",11:0-21:0,True,,"latin american,chinese,asian fusion,mexican",restaurants,3.996104,7.136144
4,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,"1350 E Flamingo Rd, Ste 18",,,False,NV,False,...,11:0-23:0,,"{'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0'...",11:0-23:0,True,True,"sushi bars,seafood,japanese",restaurants,3.996602,7.065306
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30309,Phoenix,"{'dessert': False, 'latenight': False, 'lunch'...",,,4804 E Chandler Blvd,,,True,AZ,True,...,11:0-23:0,,"{'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0'...",11:0-2:0,True,,american (traditional),"sports bars,bars,restaurants,nightlife",3.101509,0.368630
30310,Henderson,,,,"11125 S Eastern Ave, Ste 100",,,,NV,,...,,,"{'Tuesday': '10:0-19:0', 'Wednesday': '10:0-19...",10:0-19:0,True,,pizza,restaurants,2.831926,0.285485
30311,Las Vegas,"{'dessert': None, 'latenight': None, 'lunch': ...",,,4100 Paradise Rd,,,False,NV,True,...,0:0-0:0,,"{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W...",0:0-0:0,True,True,american (traditional),"restaurants,casinos",2.579830,0.206903
30312,Phoenix,"{'dessert': False, 'latenight': False, 'lunch'...",,,3647 E Indian School Rd,,,True,AZ,True,...,0:0-0:0,,"{'Monday': '0:0-0:0', 'Tuesday': '11:0-0:30', ...",11:0-2:0,True,,american (traditional),"sports bars,bars,nightlife,restaurants",3.079304,0.053577
