**Recommender system:**<br>
* Module 1 - non-personalized keyword-filtering recommender:<br>
Build a keyword-based restaurant recommender module that filters the catalog by keyword. Keywords can include, for instance, location information (zip code, longitude, latitude) and restaurant features (cuisine, style). 
The restaurant inventory is filtered by keyword first, then ranked by either its average rating or a weighted "smart" rating that takes popularity into account (depending on the user’s choice). The top-k restaurants from the ranked list are returned as the recommendations.<br>
* Module 2 - personalized content-based recommender:<br>
Given a user ID and the restaurants’ metadata, build a content-based recommender module that recommends restaurants similar to the user’s preferences, as inferred from the user’s past ratings. More specifically, pairwise similarity scores (e.g. cosine similarity) are computed between restaurants from vectorized feature representations extracted with CountVectorizer or TfidfVectorizer, and restaurants are recommended by ranking the weighted similarity scores. The important restaurant metadata to consider include categories, attributes and location.<br>
* Module 3 - personalized collaborative recommender:<br>
With the user_id x restaurant_id rating matrix, build a collaborative recommender module. Recall that the dataset has a total of 1,518,169 users, 188,593 businesses and 5,996,995 reviews. The user_id x business_id matrix is therefore very sparse: at most about 0.002% of its entries are non-empty (5,996,995 / (1,518,169 x 188,593)). Matrix factorization algorithms will be used to complete this highly sparse matrix and generate recommendations.<br>
* Metrics chosen for evaluating and optimizing the ‘goodness’ of the algorithms:<br>
a) measure prediction accuracy: RMSE (root mean squared error)<br>
b) measure ranking effectiveness: NDCG (Normalized Discounted Cumulative Gain) at top-k<br>
* Integration - combine the above modules to build a hybrid recommendation engine:<br>
To combine the above modules, a few simple interactive questions will be added:<br>
a) “Want customized recommendations based on your user history by providing your user ID?”  If no, activate the simple recommender module to provide base-case recommendations using location information and/or optional keywords<br>
b) If yes, ask a follow-up question: “do you want to try something new based on people like you?” If yes, activate the collaborative filtering module to recommend new restaurants based on similar peers; otherwise, use the content-based module to recommend restaurants similar to the user’s past preferences. <br>
* Other improvements:<br>
Optimize restaurant ranking by weighting the average rating by the total number of ratings (popularity), weighting individual ratings by their recency, etc. With a quick interactive question (“want smart rating instead?”), the alternative ranking method based on these weighted scores is activated and used instead of the simple average rating.<br>
* Potential caveats - cold start problem:<br>
a) new restaurant → content-based recommendation module will be able to use the features (metadata) of the new restaurant and include it when generating recommendations.<br>
b) new user → will be treated as if no user ID is available (in both cases there is no user history), and the simple recommender module will be used to recommend restaurants based on location, keywords, popularity, etc. 
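
The two evaluation metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the evaluation code used in this project: the function names (`rmse`, `dcg_at_k`, `ndcg_at_k`) and the toy ratings are assumptions, and the gain is taken to be the raw rating (some NDCG variants use `2**rating - 1` instead).

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevances, taken in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(true_ratings, predicted_scores, k):
    """NDCG@k: DCG of the predicted ordering divided by DCG of the ideal ordering."""
    # rank items by descending predicted score, then read off their true ratings
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_ratings[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(true_ratings, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# toy data (hypothetical): true ratings and model predictions for four restaurants
true_ratings = [5, 3, 4, 1]
predicted = [4.5, 3.5, 4.0, 2.0]
print(rmse(true_ratings, predicted))
print(ndcg_at_k(true_ratings, predicted, k=3))
```

RMSE penalizes errors in the predicted star values themselves, while NDCG only looks at the order of the top-k items, so a model can score well on one and poorly on the other.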

**Note:**<br> 
Only a subset of Yelp restaurants from a few selected states is available in this dataset. Among them, only Arizona, Nevada, Ohio, North Carolina and Pennsylvania have a rich catalog of over 5000 restaurants, and only the top two states, Arizona and Nevada, have over 10000 restaurants. 

# 1. All necessary imports and functions

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np

from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut
import pickle
import os.path
from sklearn.metrics.pairwise import linear_kernel

In [2]:
# load the datasets needed to power the recommender modules
business = pd.read_csv('business_clean.csv')  # contains business data including location data, attributes and categories
business['postal_code'] = business.postal_code.astype(str) # update the data type of the 'postal_code' column to string
review = pd.read_csv('review_clean.csv') # contains full review text data including the user_id that wrote the review and the business_id the review is written for
# extract a subset of reviews related to restaurants, since we are only interested in restaurant-type business
review_s = review[review.business_id.isin(business.business_id.unique())] 

In [3]:
def great_circle_mile(lat1, lon1, lat2, lon2):
    """
    Compute geodesic distances (great-circle distance) of two points on the globe given their coordinates. 
    The function returns the distance in miles. 
    Note: 1. Calculation uses the earth's mean radius of 6371.009 km, 
    2. The central subtended angle is calculated by formula: 
    alpha = arccos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2)*cos(lon1-lon2))
    """
    
    from math import sin, cos, acos, radians
    
    lat1, lon1, lat2, lon2 = radians(lat1), radians(lon1), radians(lat2), radians(lon2) # convert degrees to radians
    earth_radius = 6371.009  # use earth's mean radius in kilometers
    alpha = acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2)*cos(lon1-lon2)) # alpha is in radians
    dis_km = alpha * earth_radius
    dis_mile = dis_km * 0.621371   # convert kilometer to mile
    
    return dis_mile
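
One practical caveat with the arccos form: for two (nearly) identical points, floating-point rounding can push the argument of `acos` marginally above 1, which raises `ValueError: math domain error`. The variant below is a sketch of one way to guard against that by clamping the argument to [-1, 1]. The name `great_circle_mile_clamped` and the sample coordinates (approximate Phoenix and Las Vegas city centers) are assumptions for illustration only.

```python
from math import sin, cos, acos, radians

def great_circle_mile_clamped(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, with the acos argument clamped to [-1, 1]."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    cos_alpha = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lon1 - lon2)
    cos_alpha = max(-1.0, min(1.0, cos_alpha))  # guard against rounding outside acos' domain
    earth_radius_km = 6371.009                  # earth's mean radius, as in the cell above
    return acos(cos_alpha) * earth_radius_km * 0.621371  # kilometers to miles

# identical points: the distance should be (numerically) zero
print(great_circle_mile_clamped(33.45, -112.07, 33.45, -112.07))
# approximate Phoenix and Las Vegas coordinates (illustrative values)
print(great_circle_mile_clamped(33.4484, -112.0740, 36.1699, -115.1398))
```

The haversine formula is another common choice for short distances, since it is better conditioned than arccos when the two points are close together.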

In [4]:
# add 'adjusted_score' to the 'business' dataset, which adjusts each restaurant's average star rating by the number of ratings it has

globe_mean = ((business.stars * business.review_count).sum())/(business.review_count.sum())
k = 22 # set strength k to 22, which is the 50% quantile of the review counts for all businesses
business['adjusted_score'] = (business.review_count * business.stars + k * globe_mean)/(business.review_count + k)
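
To see what the adjustment does, here is a small self-contained sketch of the same damped-mean formula on three made-up restaurants. The names, star ratings and review counts are invented for illustration, and the global mean is computed from this toy data rather than from the real `business` table.

```python
K = 22  # damping strength, same value as k in the cell above

# (name, average stars, review count): made-up values for illustration only
restaurants = [("A", 5.0, 3), ("B", 4.5, 500), ("C", 3.0, 40)]

# review-count-weighted global mean, mirroring how globe_mean is computed above
total_reviews = sum(count for _, _, count in restaurants)
global_mean = sum(stars * count for _, stars, count in restaurants) / total_reviews

def adjusted_score(stars, count, prior_mean, k=K):
    """Pull the average rating toward the global mean; the pull weakens as count grows."""
    return (count * stars + k * prior_mean) / (count + k)

scores = {name: adjusted_score(stars, count, global_mean)
          for name, stars, count in restaurants}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

With raw stars, restaurant A (5.0 stars from 3 reviews) would rank first. After the adjustment its score is pulled strongly toward the global mean, and B (4.5 stars from 500 reviews) moves ahead of it.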

# 2. Building hybrid recommendation engine

## 2.1 Implementation

In [5]:
class Recommender:
    
    def __init__(self, n=5, original_score=False, personalized=False):
        """initiate a Recommender object.
        ---
        Optional keyword arguments to be passed are:
        1. the desired number of recommendations to make ('n'), the default number is 5.
        2. the score for ranking the recommendations ('original_score'): by default, the adjusted score will be used for ranking; 
            To rank by the original average rating of the restaurant, pass original_score=True
        3. 'personalized': a boolean to indicate if additional data needs to be loaded to compute personalized recommendations. 
        ---
        In addition, a few class variables will be initiated upon creation for internal use:        
        1. the class variable '.module' is used to keep track of whether a personalized recommendation is available or not.
            it only takes one of the following values with the default being 0
            0: no personalization yet
            1: a personalized recommendation has been computed using the collaborative module
            2: a personalized recommendation has been computed using the content-based module
        2. the class variable '.column_to_display' is used to keep track of a list of column names to display in the recommendation results.
            the list will be updated based on the modules being called.
        3. the class variable '.recomm' is used to store the current list of recommendations
        """
        
        self.n = n # number of recommendations to make, default is 5
        self.original_score = original_score # boolean indicating whether the original average rating or the adjusted score is used
        self.module = 0 # variable indicating which recommender module is used, default is 0
        self.column_to_display = ['state','city','name','address','attributes.RestaurantsPriceRange2','cuisine',\
                                  'style','review_count','stars','adjusted_score'] # initiate a list of columns to display in the recommendation results
        
        # upon class creation, initiate the recommendation to be all the open restaurants from the entire catalog of 'business' dataset sorted by the score of interest
        if self.original_score:  # set sorting criteria to the original star rating
            score = 'stars'
        else:  # set sorting criteria to the adjusted score
            score = 'adjusted_score'
        self.recomm = business[business.is_open == 1].sort_values(score, ascending=False)
        
        # pre-load additional information if personalized modules are desired
        if personalized:
            
            # load information for collaborative module
            with open('svd_trained_info.pkl', 'rb') as f:
                self.svd_trained_info = pickle.load(f)
                
            # load information for content-based module
            with open('rest_pcafeature_all.pkl', 'rb') as f: 
                self.rest_pcafeature = pickle.load(f)   # load the saved restaurant pcafeature vectors
            max_bytes = 2**31 - 1
            bytes_in = bytearray(0)
            input_size = os.path.getsize('user_pcafeature_all.pkl')
            with open('user_pcafeature_all.pkl','rb') as f: 
                for _ in range(0, input_size, max_bytes):
                    bytes_in += f.read(max_bytes)
                self.user_pcafeature = pickle.loads(bytes_in)  # load the saved user pcafeature vectors
           
    def _filter_by_location(self):
        """Filter and update the dataframe of recommendations by the matching location of interest.
        A combination of state, city and zipcode is used as the location information, partially missing information can be handled. 
        Matching restaurant is defined as the restaurant within the acceptable distance (max_distance) of the location of interest.
        note: this hidden method should only be called within the method 'keyword'
        """       
        geolocator = Nominatim(user_agent="yelp_recommender") # use geopy.geocoders to make geolocation queries
        address = [self.city, self.state, self.zipcode]
        address = ",".join([str(i) for i in address if i is not None])  # skip the missing parts of the address
        # use geolocate query to find the coordinate for the location of interest
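        try:
            location = geolocator.geocode(address, timeout=10)
        except GeocoderTimedOut as e:
            print("Error: geocode failed to locate the address of interest {} with message {}".format(address, e))
            location = None
        if location is None:  # the query timed out or returned no match
            self.recomm = self.recomm.iloc[0:0]  # empty the result so the caller reports no matching location
            return
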

        # calculate the geodesic distance between each restaurant and the location of interest and add it as a new column 'distance_to_interest'
        self.recomm['distance_to_interest'] = self.recomm.apply(lambda row: great_circle_mile(row.latitude, row.longitude, location.latitude, location.longitude), axis=1)
        # add the new column 'distance_to_interest' to the list of columns to display in the recommendation result
        self.column_to_display.insert(0, 'distance_to_interest')
        # filter by the desired distance
        self.recomm = self.recomm[self.recomm.distance_to_interest <= self.max_distance]

    def _filter_by_state(self):
        """ Filter and update the dataframe of recommendations by the matching state.
        note: this hidden method should only be called within the method 'keyword'
        """
        self.recomm = self.recomm[self.recomm.state == self.state.upper()]
    
    def _filter_by_cuisine(self):
        """ Filter and update the dataframe of recommendations by the matching cuisine of interest. 
        note: this hidden method should only be called within the method 'keyword'
        """                         
        idx = []
        for i in self.recomm.index: 
            if pd.notna(self.recomm.loc[i,'cuisine']):  # skip restaurants with a missing 'cuisine' entry
                entries = self.recomm.loc[i,'cuisine'].split(',')
                if self.cuisine in entries:
                    idx.append(i)
        self.recomm = self.recomm.loc[idx]
    
    def _filter_by_style(self):  
        """ Filter and update the dataframe of recommendations by the matching style of interest. 
        note: this hidden method should only be called within the method 'keyword'
        """
        idx = []
        for i in self.recomm.index: 
            if pd.notna(self.recomm.loc[i,'style']):  # skip restaurants with a missing 'style' entry
                entries = self.recomm.loc[i,'style'].split(',')
                if self.style in entries:
                    idx.append(i)
        self.recomm = self.recomm.loc[idx]
        
    def _filter_by_price(self):
        """Filter and update the dataframe of recommendations by the matching price range of interest. 
        note: this hidden method should only be called within the method 'keyword'
        """
        self.recomm = self.recomm[self.recomm['attributes.RestaurantsPriceRange2'].isin(self.price)]
    
    def display_recommendation(self, n=5):
        """ Display the list of top n recommended restaurants
        """
        self.n = n # update the number of recommendations to display
        if len(self.recomm) == 0:
            print("Sorry, there is no matching recommendations.")
        elif self.n < len(self.recomm):  # display only the top n from the recommendation list
            print("Below is a list of the top {} recommended restaurants for you: ".format(self.n))
            print(self.recomm.iloc[:self.n][self.column_to_display])
        else:  # display all if # of recommendations is less than self.n
            print("Below is a list of all {} recommended restaurants for you: ".format(len(self.recomm)))
            print(self.recomm[self.column_to_display])
     
    #---------------------------------------------------------------
    # non-personalized keyword filtering-based recommender module
    def keyword(self, df=business[business.is_open == 1], zipcode=None, city=None, state=None, max_distance=10, cuisine=None, style=None, price=None, personalized=False, original_score=False):
        """Non-personalized recommendation by keyword filtering: 
        Support filtering by the distance and location (zipcode, city, state) of interest, 
        by the desired cuisine, by the desired style, and by the desired price range. 
        The module supports multiple price range inputs separated by comma.
        ---
        Note:
        df: the default restaurant catalog is all the open restaurants in the 'business' dataset, 
            if a subset is preferred, e.g. a previously filtered result, the subset can be passed via keyword argument 'df'
        state: needs to be the upper case of the state abbreviation, e.g.: 'NV', 'CA'
        max_distance: the max acceptable distance between the restaurant and the location of interest, unit is in miles, default is 10
        """
        
        # re-initiate the following variables every time the module is called so that the recommendation starts fresh
        self.recomm = df.copy() # start with a copy of the desired restaurant catalog so the caller's dataframe is not modified
        self.recomm['distance_to_interest'] = np.nan # reset the distance between each restaurant and the location of interest
        self.column_to_display = ['state','city','name','address','attributes.RestaurantsPriceRange2','cuisine','style','review_count','stars','adjusted_score'] # reset the columns to display
        self.original_score = original_score
        
        # assign variables based on user's keyword inputs
        self.zipcode, self.city, self.state, self.max_distance = zipcode, city, state, max_distance
        self.cuisine, self.style, self.price = cuisine, style, price
        
        # check self.module and column names to see a personalized score is available for ranking and displaying personalized recommendations
        if personalized:
            if (self.module == 0) or ('predicted_stars' not in self.recomm.columns and 'similarity_score' not in self.recomm.columns):
                print("no personalized list of recommendations is generated yet!")
                print("please first run the collaborative recommender module or content-based recommender module for a personalized recommendations.")
                return None
        
        # filter by restaurant location
        if (self.zipcode is not None) or (self.city is not None) or (self.state is not None):
            if (self.zipcode is not None) or (self.city is not None): # use zipcode and/or city whenever available
                self._filter_by_location()
            else: # filter by state if state is the only location information available 
                self._filter_by_state()
            if len(self.recomm) == 0:
                print("no restaurant found for the matching location of interest.")
                return None
        
        # filter by restaurant 'cuisine'
        if self.cuisine != None:
            self._filter_by_cuisine()
            if len(self.recomm) == 0:
                print("no restaurant found for the matching cuisine of {}".format(self.cuisine))
                return None
    
        # filter by restaurant 'style'
        if self.style != None:
            self._filter_by_style() 
            if len(self.recomm) == 0:
                print("no restaurant found for the matching style of {}".format(self.style))
                return None
        
        # filter by restaurant price range
        if self.price != None:
            self.price = [i.strip() for i in price.split(',')] #extract multiple inputs of price range
            self._filter_by_price()
            if len(self.recomm) == 0:
                print("no restaurant found for the matching price of {}".format(self.price))
                return None
        
        # sort the matching list of restaurants by the score of interest
        if personalized:
            if self.module == 1:
                score = 'predicted_stars'
                self.column_to_display.insert(0, 'predicted_stars')  # add 'predicted_stars' to the list of columns to display
            elif self.module == 2:
                score = 'similarity_score'
                self.column_to_display.insert(0, 'similarity_score')  # add 'similarity_score' to the list of columns to display
        elif self.original_score:  # set sorting criteria to the original star rating
            score = 'stars'
        else:  # set sorting criteria to the adjusted score
            score = 'adjusted_score'
        self.recomm = self.recomm.sort_values(score, ascending=False)
        
        # display the list of top n recommendations
        self.display_recommendation()
        
        return self.recomm
    
    #------------------------------------------------------------
    # personalized collaborative recommender module
    def collaborative(self, user_id=None):
        """Personalized recommendation by collaborative filtering: 
        Recommendation is generated based on the predicted ratings from user x restaurant matrix factorization.
        ---
        note:
        Passing of user_id is required for the collaborative personalized module. If user's history is not available,
        a generic recommendation will be computed and returned based on all users' history in the database. 
        ---
        """
        
        self.user_id = user_id # user_id for personalized recommendation using collaborative module 
        if self.user_id is None:
            print("no user_id is provided!")
            return None
        if len(self.user_id) != 22:
            print("invalid user id!")
            return None
        
        # initiate every time the module is called
        self.recomm = business[business.is_open ==1] # start with all open restaurants from the entire 'business' catalog
        self.column_to_display = ['state','city','name','address','attributes.RestaurantsPriceRange2',\
                                  'cuisine','style','review_count','stars','adjusted_score'] # reset the columns to display
        if 'predicted_stars' in self.recomm.columns:
            self.recomm.drop('predicted_stars', axis=1, inplace=True) # delete the column of 'predicted_stars' if already present
        
        # extract all necessary information saved from the matrix factorization algorithm
        user_latent, item_latent = self.svd_trained_info['user_latent'], self.svd_trained_info['item_latent']
        user_bias, item_bias = self.svd_trained_info['user_bias'], self.svd_trained_info['item_bias']
        r_mean = self.svd_trained_info['mean_rating'] # global mean of all ratings
        userid_to_idx, itemid_to_idx = self.svd_trained_info['userid_to_index'], self.svd_trained_info['itemid_to_index']        
        
        # predict personalized restaurant ratings for the user_id of interest
        if self.user_id in userid_to_idx:
            u_idx = userid_to_idx[self.user_id]
            pred = r_mean + user_bias[u_idx] + item_bias + np.dot(user_latent[u_idx,:],item_latent.T)
        else: 
            print("sorry, no personal data available for this user_id yet!")
            print("Here is the generic recommendation computed from all the users in our database:")
            pred = r_mean + item_bias
        
        # pairing the predicted ratings with the business_id by matching the corresponding matrix indices of the business_id
        prediction = pd.DataFrame(data=pred, index=itemid_to_idx.values(), columns=['predicted_stars']) 
        assert len(prediction) == len(pred)
        prediction['business_id'] = list(itemid_to_idx.keys())
        
        # filter to unrated business_id only by the user_id of interest if a personal history is available
        if self.user_id in userid_to_idx:       
            busi_rated = review[review.user_id == self.user_id].business_id.unique()
            prediction = prediction[~prediction.business_id.isin(busi_rated)]
        
        # inner-join the prediction dataframe with the recommendation catalog on 'business_id' to retrieve all relevant business information
        # note: the .merge step needs to be performed prior to extracting the top n, because many businesses in 'review' dataset are not restaurant-related, therefore not present in 'business' dataset
        self.recomm = self.recomm.merge(prediction, on='business_id', how='inner') 
        
        # sort the prediction by the predicted ratings in descending order
        self.recomm = self.recomm.sort_values('predicted_stars', ascending=False).reset_index(drop=True)
        
        # add 'predicted_stars' to the list of columns to display and update self.module to 1
        self.column_to_display.insert(0, 'predicted_stars') 
        self.module = 1
        
        # display the list of top n recommendations
        self.display_recommendation()
        
        return self.recomm
    
    
    #------------------------------------------------------------
    # personalized content-based recommender module
    def content(self, user_id=None):
        """Personalized recommendation by content-based filtering based on restaurant reviews: 
        Recommendation is generated based on cosine similarity scores between user and restaurant feature vectors. 
        The feature vector space is extracted based on all the restaurant reviews.
        ---
        note:
        Passing of user_id is required for the content-based personalized module. 
        If user's history is not available, an empty dataframe will be returned along with a warning message. 
        ---
        """
        
        self.user_id = user_id # user_id for personalized recommendation using content-based module
        if self.user_id is None:
            print("no user_id is provided!")
            return None
        if len(self.user_id) != 22:
            print("invalid user id!")
            return None
        if self.user_id not in review_s.user_id.unique(): # check if previous restaurant rating/review history is available for the user_id of interest
            print("sorry, no personal data available for this user_id yet!")
            return None
        
        # initiate every time the module is called
        self.recomm = business[business.is_open ==1] # start with all open restaurants from the entire 'business' catalog
        self.column_to_display = ['state','city','name','address','attributes.RestaurantsPriceRange2',\
                                  'cuisine','style','review_count','stars','adjusted_score'] # reset the columns to display
        if 'similarity_score' in self.recomm.columns:
            self.recomm.drop('similarity_score', axis=1, inplace=True) # delete the column of 'similarity_score' if already present
        
        # predict personalized cosine similarity scores for the user_id of interest
        sim_matrix = linear_kernel(self.user_pcafeature.loc[user_id].values.reshape(1, -1), self.rest_pcafeature)
        sim_matrix = sim_matrix.flatten()
        sim_matrix = pd.Series(sim_matrix, index = self.rest_pcafeature.index)
        sim_matrix.name = 'similarity_score'
        
        # pairing the computed cosine similarity score with the business_id by matching the corresponding matrix indices of the business_id
        self.recomm = pd.concat([sim_matrix, self.recomm.set_index('business_id')], axis=1, join='inner').reset_index()
        
        # filter to unrated business_id only by the user_id of interest if a personal history is available      
        busi_rated = review_s[review_s.user_id == self.user_id].business_id.unique()
        self.recomm = self.recomm[~self.recomm.business_id.isin(busi_rated)]
               
        # sort the recommendation by the cosine similarity score in descending order
        self.recomm = self.recomm.sort_values('similarity_score', ascending=False).reset_index(drop=True)
           
        # add 'similarity_score' to the list of columns to display and update self.module to 2
        self.column_to_display.insert(0, 'similarity_score') 
        self.module = 2
        
        # display the list of top n recommendations
        self.display_recommendation()
        
        return self.recomm
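
A note on the similarity score used in `content`: `linear_kernel` computes plain dot products, which equal cosine similarities only when the feature vectors have unit L2 norm. Whether the saved PCA feature vectors were normalized beforehand is not shown in this section, so the sketch below only illustrates the difference on made-up 2-D vectors. The helper names (`dot`, `l2_normalize`, `cosine`) are assumptions, written in plain Python to stay self-contained.

```python
import math

def dot(u, v):
    """Plain dot product, which is what a linear kernel computes."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)

def cosine(u, v):
    """Cosine similarity: dot product of the L2-normalized vectors."""
    return dot(l2_normalize(u), l2_normalize(v))

# made-up feature vectors for one user and two restaurants
user = [2.0, 0.0]
rest_a = [1.0, 0.0]  # same direction as the user, small magnitude
rest_b = [3.0, 3.0]  # 45 degrees away from the user, large magnitude

print(dot(user, rest_a), dot(user, rest_b))        # the raw dot product favors rest_b
print(cosine(user, rest_a), cosine(user, rest_b))  # cosine similarity favors rest_a
```

If the stored vectors are not unit length, normalizing them once before saving, or using `sklearn.metrics.pairwise.cosine_similarity` in place of `linear_kernel`, would make the ranking independent of vector magnitude.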

## 2.2 Testing

### 2.2.1 Testing of the non-personalized keyword filtering recommender module

In [8]:
%%time
# initiate a Recommender object
recommender = Recommender(n=3)

CPU times: user 47.8 ms, sys: 10.1 ms, total: 57.9 ms
Wall time: 60.5 ms


In [9]:
%%time

# test0: display only (same as no keywords)
print("------\nresult from test0 (display only): ")
recommender.display_recommendation()

# test1: no keywords
print("------\nresult from test1 (no keywords): ")
recommender.keyword();

# test 2: a combination of city, state and zipcode
print("------\nresult from test2 (a combination of city, state and zipcode): ")
recommender.keyword(city='Phoenix', state='AZ', zipcode='85023');

# test 3: a combination of cuisine and style
print("------\nresult from test3 (a combination of cuisine and style): ")
recommender.keyword(cuisine='barbeque', style='restaurants');

# test 4: a combination of state, cuisine and style
print("------\nresult from test4 (a combination of state, cuisine and style): ")
recommender.keyword(state='NV', cuisine='desserts', style='restaurants');

# test 5: no matching location
print("------\nresult from test5 (no matching location): ")
recommender.keyword(city='milpitas', zipcode='95035');

# test 6: no matching 'cuisine'
print("------\nresult from test6 (no matching cuisine): ")
recommender.keyword(cuisine='abc');

# test 7: no matching 'style'
print("------\nresult from test7 (no matching style): ")
recommender.keyword(style='abc');

# test 8: a combination of location, cuisine and style
print("------\nresult from test8 (a combination of location, cuisine and style): ")
recommender.keyword(city='Phoenix', zipcode='85023',cuisine='barbeque', style='restaurants');

# test 9: a combination of price range, cuisine and style
print("------\nresult from test9 (a combination of price range, cuisine and style): ")
recommender.keyword(price='1', cuisine='barbeque', style='restaurants');

# test 10: a combination of two price ranges, location, cuisine and style
print("------\nresult from test10 (a combination of two price ranges, location, cuisine and style): ")
recommender.keyword(price='1, 2', zipcode='85023',cuisine='barbeque', style='restaurants');

# test 11: use the original average rating and return top 10 recommendations
print("------\nresult from test11 (top 10 recommendations ranked by original average rating): ")
recommender = Recommender(n=10, original_score=True, personalized=False)
recommender.keyword(city='Phoenix', zipcode='85023',cuisine='barbeque', style='restaurants');

------
result from test0 (display only): 
Below is a list of the top 5 recommended restaurants for you: 
      state             city                name  \
7464     AZ          Phoenix     Little Miss BBQ   
31910    NV        Las Vegas        Brew Tea Bar   
45401    NV        Las Vegas          Gelatology   
7784     NV  North Las Vegas        Poke Express   
28162    NV        Las Vegas  Meráki Greek Grill   

                            address  attributes.RestaurantsPriceRange2  \
7464           4301 E University Dr                                2.0   
31910  7380 S Rainbow Blvd, Ste 101                                1.0   
45401  7910 S Rainbow Blvd, Ste 110                                1.0   
7784        655 W Craig Rd, Ste 118                                2.0   
28162  4950 S Rainbow Blvd, Ste 160                                2.0   

                                   cuisine               style  review_count  \
7464                              barbeque         restau

no restaurant found for the matching location of interest.
------
result from test6 (no matching cuisine): 
no restaurant found for the matching cuisine of abc
------
result from test7 (no matching style): 
no restaurant found for the matching style of abc
------
result from test8 (a combination of location, cuisine and style): 
Below is a list of the top 5 recommended restaurants for you: 
       distance_to_interest state     city  \
9236               4.579892    AZ  Phoenix   
23589              9.612561    AZ  Phoenix   
25730              7.135294    AZ  Phoenix   
44933              1.051736    AZ  Phoenix   
15502              6.433504    AZ  Phoenix   

                                         name                  address  \
9236                                  Bobby Q          8501 N 27th Ave   
23589                         Reathrey Sekong  1312 E Indian School Rd   
25730                   Papa Joe's Fish-N-Que   2019 W Bethany Home Rd   
44933                 Pork on a F

As shown, 11 tests (11 queries) are performed with a total CPU time of 10 seconds and an elapsed time of 15 seconds. This averages to roughly 1-2 seconds per query, which is reasonable in practice.

### 2.2.2 Testing of the personalized collaborative recommender module

In [10]:
%%time

# initiate a Recommender object
recommender = Recommender(n=5, personalized=True)

CPU times: user 3.83 s, sys: 12.3 s, total: 16.1 s
Wall time: 32.3 s


In [11]:
%%time

# test0: display only (same as no keywords)
print("------\nresult from test0 (display only): ")
recommender.display_recommendation()

# test1: no user id input
print("------\nresult from test1 (no user id input): ")
recommender.collaborative();

# test 2: invalid user id input
print("------\nresult from test2 (invalid user id input): ")
recommender.collaborative(user_id='928402');

------
result from test0 (display only): 
Below is a list of the top 5 recommended restaurants for you: 
      state             city                name  \
7464     AZ          Phoenix     Little Miss BBQ   
31910    NV        Las Vegas        Brew Tea Bar   
45401    NV        Las Vegas          Gelatology   
7784     NV  North Las Vegas        Poke Express   
28162    NV        Las Vegas  Meráki Greek Grill   

                            address  attributes.RestaurantsPriceRange2  \
7464           4301 E University Dr                                2.0   
31910  7380 S Rainbow Blvd, Ste 101                                1.0   
45401  7910 S Rainbow Blvd, Ste 110                                1.0   
7784        655 W Craig Rd, Ste 118                                2.0   
28162  4950 S Rainbow Blvd, Ste 160                                2.0   

                                   cuisine               style  review_count  \
7464                              barbeque         restau

In [12]:
%%time

# test 3: valid user id (no user data)
print("------\nresult from test3 (valid user id --- no user review data): ")
recommender.collaborative(user_id='-NzChtoNOw706kps82x0Kg');

------
result from test3 (valid user id --- no user review data): 
sorry, no personal data available for this user_id yet!
Here is the generic recommendation computed from all the users in our database:
Below is a list of the top 5 recommended restaurants for you: 
   predicted_stars state       city                              name  \
0         4.972518    AZ      Tempe  Affordable Party & Event Rentals   
1         4.952788    NV  Henderson                        Party Pros   
2         4.939452    NV  Henderson                    Firelight Barn   
3         4.937603    AZ    Phoenix         La Parilla Villa Catering   
4         4.930186    WI    Madison           The Conscious Carnivore   

                         address  attributes.RestaurantsPriceRange2  \
0         510 S 52nd St, Ste 105                                NaN   
1              1153 Enchanted Ct                                NaN   
2  133 W Lake Mead Pkwy, Ste 140                                2.0   
3          

In [15]:
%%time

# test 4: valid user id (user has only one review)
print("------\nresult from test4 (valid user id --- user has only one review): ")
recommender.collaborative(user_id='---89pEy_h9PvHwcHNbpyg');

------
result from test4 (valid user id --- user has only one review): 
Below is a list of the top 5 recommended restaurants for you: 
   predicted_stars state        city                              name  \
0         5.162014    AZ  Scottsdale                    Aloha Cakes AZ   
1         5.158219    AZ       Tempe  Affordable Party & Event Rentals   
2         5.157192    NV   Las Vegas                  CHEFit Meal Prep   
3         5.152490    NV   Henderson                    Firelight Barn   
4         5.144190    NV   Henderson                        Party Pros   

                         address  attributes.RestaurantsPriceRange2  \
0                            NaN                                2.0   
1         510 S 52nd St, Ste 105                                NaN   
2                6235 S Pecos Rd                                2.0   
3  133 W Lake Mead Pkwy, Ste 140                                2.0   
4              1153 Enchanted Ct                                N

As shown, it takes only about 1 second to return the personalized ranking. Because the user's preference history is limited to a single review, however, the recommendations closely resemble the generic list returned for unseen users.
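
One plausible reason a single-review user gets a near-generic list can be illustrated with a toy model. The sketch below assumes a biased matrix-factorization predictor of the form global mean + user bias + item bias + dot product of latent factors; the actual model inside `Recommender` is not shown in this section, and every number and name here is hypothetical.

```python
# Toy illustration, NOT the notebook's model: all values are made up.
GLOBAL_MEAN = 3.7

# item name -> (item bias, item latent factors)
ITEMS = {
    "A": (0.9, [0.5, -0.2]),
    "B": (0.7, [-0.4, 0.6]),
    "C": (0.2, [0.8, 0.9]),
}

def predict(user_bias, user_factors, item_bias, item_factors):
    """Biased matrix-factorization prediction (unclipped)."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    return GLOBAL_MEAN + user_bias + item_bias + dot

def rank(user_bias, user_factors):
    """Return item names sorted by predicted score, best first."""
    scores = {
        name: predict(user_bias, user_factors, bias, factors)
        for name, (bias, factors) in ITEMS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

generic = rank(0.0, [0.0, 0.0])      # unseen user: item biases decide the order
sparse = rank(0.2, [0.01, -0.01])    # one review: factors barely moved from zero
rich = rank(0.2, [0.5, 1.0])         # long history: factors reshape the order

print(generic, sparse, rich)
```

In this toy setup the sparse user's ranking matches the generic one, with every score shifted by the user bias, while the rich-history user gets a different order. The unclipped scores can also exceed 5, which may be related to the `predicted_stars` values above 5 in the outputs of this section, though that is only a guess.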

In [14]:
%%time

# test 5: valid user id (user has over 100 reviews)
print("------\nresult from test5 (valid user id --- user has over 100 reviews): ")
recommender.collaborative(user_id='---1lKK3aKOuomHnwAkAow');

------
result from test5 (valid user id --- user has over 100 reviews): 
Below is a list of the top 5 recommended restaurants for you: 
   predicted_stars state             city                     name  \
0         6.505740    NV        Las Vegas             Kabob N More   
1         6.396645    NV        Las Vegas              Tasty Grill   
2         6.306595    NV        Las Vegas           Tacos N' Ritas   
3         6.051560    NV        Las Vegas  KUMI by Chef Akira Back   
4         6.046227    NV  North Las Vegas  Amazing Thai Restaurant   

                          address  attributes.RestaurantsPriceRange2  \
0           3049 S Las Vegas Blvd                                2.0   
1               4140 S Durango Dr                                1.0   
2  MGM Grand, 3799 Las Vegas Blvd                                2.0   
3           3950 Las Vegas Blvd S                                3.0   
4          3000 W Ann Rd, Ste 109                                2.0   

          

As shown, even for a user with a long review history, where the module must filter out all previously rated restaurants from the recommendation list, it takes only about 1 second to return the personalized ranking. Thanks to the rich preference history, the recommendations are clearly personalized: in this case, they suggest that the user prefers popular restaurants with many reviews, reasonable-to-good ratings (3.5-4.5), and a lower price range (\$-\$\$).
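
The filtering step mentioned here (dropping restaurants the user has already rated before taking the top of the list) can be sketched as follows. This is a minimal illustration with hypothetical names and data, not the code used inside `Recommender.collaborative`:

```python
def top_n_unrated(predicted, rated, n=5):
    """Rank items by predicted score, skipping ones the user already rated.

    predicted: dict mapping business id -> predicted star rating
    rated:     set of business ids the user has already reviewed
    """
    candidates = [
        (business_id, score)
        for business_id, score in predicted.items()
        if business_id not in rated
    ]
    # Highest predicted score first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:n]

# Hypothetical example data.
predicted = {"b1": 4.8, "b2": 4.9, "b3": 3.2, "b4": 4.1}
rated = {"b2"}
top2 = top_n_unrated(predicted, rated, n=2)
print(top2)
```

Using a set for `rated` keeps the membership test cheap even when a user has hundreds of reviews.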

In [16]:
%%time

# test 6: valid user id (user has over 100 reviews)
print("------\nresult from test6 (valid user id --- user has over 100 reviews): ")
rec = recommender.collaborative(user_id='---1lKK3aKOuomHnwAkAow');

# filter the personalized recommendation with keywords
print("------\nfurther filtering the personalized recommendations by keywords:")
rec = recommender.keyword(df=rec, city='Phoenix', personalized=True)

------
result from test6 (valid user id --- user has over 100 reviews): 
Below is a list of the top 5 recommended restaurants for you: 
   predicted_stars state             city                     name  \
0         6.505740    NV        Las Vegas             Kabob N More   
1         6.396645    NV        Las Vegas              Tasty Grill   
2         6.306595    NV        Las Vegas           Tacos N' Ritas   
3         6.051560    NV        Las Vegas  KUMI by Chef Akira Back   
4         6.046227    NV  North Las Vegas  Amazing Thai Restaurant   

                          address  attributes.RestaurantsPriceRange2  \
0           3049 S Las Vegas Blvd                                2.0   
1               4140 S Durango Dr                                1.0   
2  MGM Grand, 3799 Las Vegas Blvd                                2.0   
3           3950 Las Vegas Blvd S                                3.0   
4          3000 W Ann Rd, Ste 109                                2.0   

          

In [17]:
%%time

# test 7: try to run keyword filtering on personalized recommendation directly
print("------\nresult from test7 (run keyword filtering on personalized recommendations directly):")
recommender.keyword(city='Phoenix', personalized=True)

------
result from test7 (run keyword filtering on personalized recommendations directly):
no personalized list of recommendations is generated yet!
please first run the collaborative recommender module or content-based recommender module for a personalized recommendations.
CPU times: user 43.7 ms, sys: 78.4 ms, total: 122 ms
Wall time: 128 ms


### 2.2.3 Testing of the personalized content-based recommender module

In [18]:
%%time

# initiate a Recommender object
recommender = Recommender(n=10, personalized=True)

CPU times: user 3.82 s, sys: 12.6 s, total: 16.4 s
Wall time: 30.9 s


In [19]:
%%time

# test0: display only (same as no keywords)
print("------\nresult from test0 (display only): ")
recommender.display_recommendation()

# test1: no user id input
print("------\nresult from test1 (no user id input): ")
recommender.content();

# test 2: invalid user id input
print("------\nresult from test2 (invalid user id input): ")
recommender.content(user_id='928402');

# test 3: valid user id (no user data)
print("------\nresult from test3 (valid user id --- no user review data): ")
recommender.content(user_id='-NzChtoNOw706kps82x0Kg');

------
result from test0 (display only): 
Below is a list of the top 5 recommended restaurants for you: 
      state             city                name  \
7464     AZ          Phoenix     Little Miss BBQ   
31910    NV        Las Vegas        Brew Tea Bar   
45401    NV        Las Vegas          Gelatology   
7784     NV  North Las Vegas        Poke Express   
28162    NV        Las Vegas  Meráki Greek Grill   

                            address  attributes.RestaurantsPriceRange2  \
7464           4301 E University Dr                                2.0   
31910  7380 S Rainbow Blvd, Ste 101                                1.0   
45401  7910 S Rainbow Blvd, Ste 110                                1.0   
7784        655 W Craig Rd, Ste 118                                2.0   
28162  4950 S Rainbow Blvd, Ste 160                                2.0   

                                   cuisine               style  review_count  \
7464                              barbeque         restau

In [21]:
%%time

# test 4: valid user id (user has only one review)
print("------\nresult from test4 (valid user id --- user has only one review): ")
recommender.content(user_id='---89pEy_h9PvHwcHNbpyg');

------
result from test4 (valid user id --- user has only one review): 
Below is a list of the top 5 recommended restaurants for you: 
   similarity_score state       city                               name  \
0          0.920511    NV  Henderson      The Bar At Bermuda & St. Rose   
1          0.913473    NV  Las Vegas  The Bar @ Las Vegas Blvd & Wigwam   
2          0.912803    NV  Las Vegas      The Bar @ Tropicana & Durango   
3          0.859365    NV  Las Vegas            The Bar @Trails Village   
4          0.848265    NV  Las Vegas              Distill - A Local Bar   

                          address  attributes.RestaurantsPriceRange2  \
0                11624 Bermuda Rd                                2.0   
1           8560 S Las Vegas Blvd                                2.0   
2         4705 S Durango, Ste 140                                2.0   
3  1916 Village Center Cir, Ste 6                                2.0   
4      6430 N Durango Dr, Ste 110                     

The total time to return the recommendations is only around 1 second. As shown, the personalized results feature highly rated, mid-price-range bars in and around Las Vegas. These recommendations closely reflect the user's only review: a 5-star review, written in strongly positive words, of a 4-star nightlife bar in the mid-price range located in Las Vegas, which is a clear indication of his/her preference.

In [22]:
%%time

# test 5: valid user id (user has over 100 reviews)
print("------\nresult from test5 (valid user id --- user has over 100 reviews): ")
recommender.content(user_id='Ox89nMY8HpT0vxfKGqDPdA');

------
result from test5 (valid user id --- user has over 100 reviews): 
Below is a list of the top 5 recommended restaurants for you: 
   similarity_score state     city                          name  \
0          0.656317    AZ  Gilbert              Joe's Farm Grill   
1          0.613937    AZ     Mesa                  Orchard Eats   
2          0.583544    AZ  Phoenix                 Welcome Diner   
3          0.575226    AZ  Phoenix  Wally's American Pub N Grill   
4          0.574281    AZ  Phoenix    Phoenix Public Market Cafe   

                    address  attributes.RestaurantsPriceRange2  \
0     3000 E Ray Rd, Bldg 1                                2.0   
1  2036 N Gilbert Rd, Ste 6                                1.0   
2           929 E Pierce St                                2.0   
3            5029 N 44th St                                2.0   
4            14 E Pierce St                                2.0   

                                             cuisine  \
0 

As shown, the personalized recommendation list features popular restaurants (over 150 reviews) in the low-to-mid price range that serve American-style cuisine (pizza, burgers, sandwiches) and are located in Arizona. These recommendations are highly personalized, reflecting the user's history of 120 restaurant reviews.
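
To make the idea behind the `similarity_score` column concrete, here is a minimal, standard-library sketch of content-based ranking. It assumes the user profile is a rating-weighted sum of bag-of-words vectors for the restaurants the user reviewed and that candidates are ranked by cosine similarity to that profile; the notebook's own feature extraction and weighting may differ, and all names and data below are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a restaurant's metadata string."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def user_profile(ratings, item_vectors):
    """Rating-weighted sum of the vectors of the items the user rated."""
    profile = Counter()
    for item, stars in ratings.items():
        for term, count in item_vectors[item].items():
            profile[term] += stars * count
    return profile

def recommend(ratings, item_vectors, n=5):
    """Rank unrated items by similarity to the user's profile."""
    profile = user_profile(ratings, item_vectors)
    scores = [
        (item, cosine(profile, vec))
        for item, vec in item_vectors.items()
        if item not in ratings
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]

# Hypothetical metadata: style, category and state tokens per restaurant.
metadata = {
    "bar_a": "bars nightlife nv",
    "bar_b": "bars nightlife nv lounges",
    "diner": "diners burgers az",
    "sushi": "sushi japanese nv",
}
item_vectors = {name: vectorize(text) for name, text in metadata.items()}
result = recommend({"bar_a": 5}, item_vectors, n=3)
print(result)
```

With a single 5-star review of a bar, the most similar unrated item is the other bar, which mirrors the behavior seen in test4 above.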

In [23]:
%%time

# test 6: valid user id (user has over 100 reviews)
print("------\nresult from test6 (valid user id --- user has over 100 reviews): ")
rec = recommender.content(user_id='---1lKK3aKOuomHnwAkAow');

# filter the personalized recommendation with keywords
print("------\nfurther filtering the personalized recommendations by keywords:")
rec = recommender.keyword(df=rec, city='Phoenix', personalized=True)

------
result from test6 (valid user id --- user has over 100 reviews): 
Below is a list of the top 5 recommended restaurants for you: 
   similarity_score state       city                      name  \
0          0.642374    NV  Las Vegas                   Firefly   
1          0.616575    NV  Las Vegas      Julian Serrano Tapas   
2          0.610763    NV  Las Vegas                   Sinatra   
3          0.604054    NV  Las Vegas  Trevi Italian Restaurant   
4          0.602714    NV  Las Vegas       Eatt Gourmet Bistro   

                            address  attributes.RestaurantsPriceRange2  \
0                  3824 Paradise Rd                                2.0   
1             3730 Las Vegas Blvd S                                3.0   
2             3131 Las Vegas Blvd S                                4.0   
3             3500 S Las Vegas Blvd                                2.0   
4  7865 W Sahara Ave, Ste 104 & 105                                2.0   

                      

In [24]:
%%time

# test 7: try to run keyword filtering on personalized recommendation directly
print("------\nresult from test7 (run keyword filtering on personalized recommendations directly):")
recommender.keyword(city='Phoenix', personalized=True)

------
result from test7 (run keyword filtering on personalized recommendations directly):
no personalized list of recommendations is generated yet!
please first run the collaborative recommender module or content-based recommender module for a personalized recommendations.
CPU times: user 49.4 ms, sys: 89.2 ms, total: 139 ms
Wall time: 159 ms


# 3. Build the user interface of the hybrid recommendation engine

## 3.1 Implementation

In [35]:
def recommender_interface():
    """
    User interface for the recommendation engine.
    """
    
    print("Hey, welcome to the Hybrid Yelp Recommender!")
    # initiation
    boolean = True
    personalized = False
    original_score = False
    n = 5

    # initiate a recommender object
    print("Please wait while we initiate the recommendation engine\n loading...")
    recommender = Recommender(personalized=True)
    print("Yeah, recommendation engine is ready to use!")      
    
    # def nested function for obtaining user keywords for filtering the recommendations
    def _keywords():
        # initiate
        zipcode, city, state = None, None, None
        max_distance = 10
        cuisine, style, price = None, None, None
        # list of supporting cuisines and styles
        cuisines = ['mexican','italian','chinese','japanese','thai','indian','american (new)','american (traditional)',\
                    'french','middle eastern','korean','mediterranean','vietnamese','cajun','greek','hawaiian',\
                    'asian fusion','vegetarian','vegan','steakhouse','barbeque','sushi bars','tex-mex','specialty food',\
                    'gluten-free','coffee & tea','desserts','seafood','ice cream & frozen yogurt','bakeries','beer',\
                    'wine & spirits','soup','pizza','hot dogs','burgers','donuts','cupcakes','salad','tacos',\
                    'chicken wings','sandwiches','bubble tea','tapas/small plates','shaved ice','bagels','southern',\
                    'local flavor','latin american','custom cakes','ethinic food']
        styles = ['restaurants','fast food','food stands','street vendors','nightlife','buffets','bars','food trucks',\
                  'breakfast & brunch','diners','cocktail bars','pubs','sports bars','wine bars','beer bars',\
                  'casinos','juice bars & smoothies','caterers','delis','cafes','lounges','music venues',\
                  'performing arts','food delivery services','dive bars','dance clubs','breweries']
    
        r = input("What would you like to filter by? \n1 location (zipcode, city, state);\n2 cuisine;\n3 style;\n4 price range\nPlease enter the corresponding numbers. Multiple filtering criteria are supported, please separate the corresponding numbers by comma.\n")
        if len(r) > 0:
            print("Great! Now let's gather your filtering criteria.")
            kws = r.split(',')
            for kw in kws:
                try:
                    kw = int(kw)
                except ValueError:
                    # non-numeric menu choice: report it and move on to the next one
                    print("Ooops, invalid input of '{}' skipped".format(kw))
                    continue
                if kw == 1:
                    print("Please follow the instructions to enter your location of interest or use the ENTER/RETURN key to skip.")
                    r = input("Please enter the zipcode of interest or use the ENTER/RETURN key to skip\n")
                    if len(r) > 0:
                        zipcode = r
                    r = input("Please enter the city of interest or use the ENTER/RETURN key to skip\n")
                    if len(r) > 0:
                        city = r
                    r = input("Please enter the state of interest or use the ENTER/RETURN key to skip\n")
                    if len(r) > 0:
                        state = r
                    r = input("Please enter the max distance allowed between the restaurant and your location of interest or use the ENTER/RETURN key to skip\n")
                    if len(r) > 0:
                        try:
                            max_distance = int(r)
                        except ValueError:
                            # keep the default max_distance of 10 miles
                            print("Ooops, invalid number! The max distance is set to the default 10 miles.")
                elif kw == 2: 
                    r = input("Please select one from the following cuisines as your interest or use the ENTER/RETURN key to skip:\n{}\n".format(cuisines))
                    if len(r) > 0:
                        cuisine = r
                elif kw == 3: 
                    r = input("Please select one from the following styles as your interest or use the ENTER/RETURN key to skip:\n{}\n".format(styles))
                    if len(r) > 0:
                        style = r
                elif kw == 4:
                    r = input("Please indicate your price range of interest: \n1 cheap ($);\n2 medium ($$);\n3 expensive ($$$);\n4 most expensive($$$$)\nPlease enter the corresponding number(s) separated by comma\n")
                    if len(r) > 0:
                        price = r
                else:
                    print("Ooops, invalid input of '{}' skipped".format(kw))
        return zipcode, city, state, max_distance, cuisine, style, price
    
    while boolean:
        
        r = input("Want to try a customized recommendation based on your Yelp user history? yes/no\n")   
        # reset on every round so that a restart can switch back to keyword mode
        personalized = r.startswith(('Y', 'y'))
    
        if personalized: 
            print("Awesome! Let's start your personalized recommendation. ")
            r = input("To retrieve your user history, please enter your Yelp User ID (length of 22 characters):\n")
            if len(r) == 0:
                print("Ooops, no user id is provided! Let's give it another try.")
                continue
            elif len(r) != 22:
                print("Ooops, it seems to be an invalid user id! Let's give it another try.")
                continue
            else:
                user_id = r
                print("Great! Valid user id fetched! Just one more question before generating your recommendations")
                r = input("Which personalized recommendation would you prefer? \n1. Something new based on people like you; \n2. Something similar to your favorate restaurants; \nPlease enter 1 or 2\n")
                try: 
                    r = int(r)
                    if r not in [1,2]:
                        print("Ooops, invalid input! Let's give it another try.")
                        continue
                    else:
                        print("Awesome, All set! Here is your personalized recommendations:\n")
                        if r == 1: 
                            result = recommender.collaborative(user_id=user_id)
                        else:
                            result = recommender.content(user_id=user_id)
                except:
                    print("Ooops, invalid input! Let's give it another try.")
                    continue
        else: 
            print("That's cool! Let's filter by keywords and generate your recommendations!")
            zipcode, city, state, max_distance, cuisine, style, price = _keywords()
            print("Great! Filtering criteria fetched! Just one more question before generating your recommendations")
            r = input("Wanna rank your recommendations by 'smart' ratings?\n'smart' rating adjusts the original restaurant average star rating by the number of ratings it receives.\nEnter no to deactivate smart ratings or any other key to continue\n")
            # reset on every round so that an earlier 'no' does not carry over after a restart
            original_score = r.startswith(('N', 'n'))
            print("Awesome, all set! Here is your recommendations:\n")
            result = recommender.keyword(zipcode=zipcode, city=city, state=state, max_distance=max_distance, cuisine=cuisine, style=style, price=price, original_score=original_score)
     
        # refine the recommendation results
        if result is not None and len(result) > 0:
            r = input("Would you like to display more/less recommendation results? Enter the desired number to continue or any other key to skip:\n")
            try:
                n = int(r)
                recommender.display_recommendation(n=n)
            except:
                pass
            r = input("Would you like to further filter your recommendation results by keywords? Enter yes to continue or any other key to skip:\n")
            if r.startswith('Y') or r.startswith('y'):
                zipcode, city, state, max_distance, cuisine, style, price = _keywords()
                result = recommender.keyword(df=result, zipcode=zipcode, city=city, state=state, max_distance=max_distance, cuisine=cuisine, style=style, price=price, personalized=personalized)
        
        # quit or restart
        print("Awesome, all done!")
        r = input("Please enter q to quit the recommender engine, or enter c to restart with another recommendation\n")
        if len(r) == 0 or r.startswith('Q') or r.startswith('q'):
            boolean = False
            print("Enjoy your recommendations! See you next time!")
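
The interface offers ranking by a 'smart' rating that adjusts a restaurant's average star rating by the number of ratings it has received. The exact formula is not shown in this section. One common choice for this kind of adjustment is a Bayesian-style weighted average that pulls sparsely reviewed restaurants toward the global mean; the sketch below uses that formula purely as an assumption, with hypothetical parameter names and values.

```python
def weighted_rating(avg_stars, review_count, global_mean, min_reviews):
    """Shrink a restaurant's average rating toward the global mean.

    Assumed formula (not confirmed by the notebook):
        (v / (v + m)) * R + (m / (v + m)) * C
    where R is the restaurant's average, v its review count,
    C the global mean rating and m a smoothing threshold.
    """
    v, m = review_count, min_reviews
    return (v / (v + m)) * avg_stars + (m / (v + m)) * global_mean

# Hypothetical values: a 5-star place with 3 reviews vs. a 4.5-star place with 1000.
few_reviews = weighted_rating(5.0, 3, global_mean=3.5, min_reviews=50)
many_reviews = weighted_rating(4.5, 1000, global_mean=3.5, min_reviews=50)
print(round(few_reviews, 3), round(many_reviews, 3))
```

Under this assumption the heavily reviewed restaurant ranks above the one with a perfect average from only a handful of reviews, which is the kind of popularity adjustment the prompt describes.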

## 3.2 Testing

### Non-personalized recommendation

In [36]:
recommender_interface()

Hey, welcome to the Hybrid Yelp Recommender!
Please wait while we initiate the recommendation engine
 loading...
Yeah, recommendation engine is ready to use!
Want to try a customized recommendation based on your Yelp user history? yes/no
no
That's cool! Let's filter by keywords and generate your recommendations!
What would you like to filter by? 
1 location (zipcode, city, state);
2 cuisine;
3 style;
4 price range
Please enter the corresponding numbers. Multiple filtering criteria are supported, please separate the corresponding numbers by comma.
1,2,3,4
Great! Now let's gather your filtering criteria.
Please follow the instructions to enter your location of interest or use the ENTER/RETURN key to skip.
Please enter the zipcode of interest or use the ENTER/RETURN key to skip

Please enter the city of interest or use the ENTER/RETURN key to skip
las vegas
Please enter the state of interest or use the ENTER/RETURN key to skip

Please enter the max distance allowed between the restaurant 

Would you like to further filter your recommendation results by keywords? Enter yes to continue or any other key to skip:

Awesome, all done!
Please enter q to quit the recommender engine, or enter c to restart with another recommendation
q
Enjoy your recommendations! See you next time!


### Personalized collaborative recommendation

In [37]:
recommender_interface()

Hey, welcome to the Hybrid Yelp Recommender!
Please wait while we initiate the recommendation engine
 loading...
Yeah, recommendation engine is ready to use!
Want to try a customized recommendation based on your Yelp user history? yes/no
yes
Awesome! Let's start your personalized recommendation. 
To retrieve your user history, please enter your Yelp User ID (length of 22 characters):
---1lKK3aKOuomHnwAkAow
Great! Valid user id fetched! Just one more question before generating your recommendations
Which personalized recommendation would you prefer? 
1. Something new based on people like you; 
2. Something similar to your favorate restaurants; 
Please enter 1 or 2
1
Awesome, All set! Here is your personalized recommendations:

Below is a list of the top 5 recommended restaurants for you: 
   predicted_stars state             city                     name  \
0         6.505740    NV        Las Vegas             Kabob N More   
1         6.396645    NV        Las Vegas              Tasty G

### Personalized content-based recommendation

In [38]:
recommender_interface()

Hey, welcome to the Hybrid Yelp Recommender!
Please wait while we initiate the recommendation engine
 loading...
Yeah, recommendation engine is ready to use!
Want to try a customized recommendation based on your Yelp user history? yes/no
yes
Awesome! Let's start your personalized recommendation. 
To retrieve your user history, please enter your Yelp User ID (length of 22 characters):
---1lKK3aKOuomHnwAkAow
Great! Valid user id fetched! Just one more question before generating your recommendations
Which personalized recommendation would you prefer? 
1. Something new based on people like you; 
2. Something similar to your favorate restaurants; 
Please enter 1 or 2
2
Awesome, All set! Here is your personalized recommendations:

Below is a list of the top 5 recommended restaurants for you: 
   similarity_score state       city                      name  \
0          0.642374    NV  Las Vegas                   Firefly   
1          0.616575    NV  Las Vegas      Julian Serrano Tapas   
2   