# Keyword search recommender module

## Introduction

Keyword search recommender filtering is constructed to provide recommendations with seacrhed keywords. 

It consist of two sub-modules:
1. Location based keywords such as zip code, longitude, latitude
2. Feature based keywords such as cuisine, style, price

## Implementation Strategy

### 1 Pre computation

#### 1.1 Distnace calculation
To calculate distance between recommended restaurant and user location, `the great circle distance calculated assuming a spherical model of the earth` is implemented as it is more accurate and faster in terms of computing power when compared to other methods. 

#### 1.2 Adjusted ratings
Adjusted ratings is calculated to replace restaurant's average star ratings.

The adjusted score uses the mechanism of the damped mean to regulate the extreme cases of having only a few extreme ratings. k controls the strength of the damping effect: the larger k is, the more actual ratings are required to overcome the global mean. In this case, k is set to 30 (50% quantile of the review counts for all businesses), but it can be tuned according to various business considerations.

Adjusted ratings improves original ratings by considering:
1. Average ratings of the restaurant
2. Number of ratings by users

### 2 Implementation

#### 2.1 Development
Restaurants are filtered by all the user's keywords then ranked by user selected ranking critetia. At the end, top n restaurants returned (top n configuarble by user's preference). 

#### 2.2 Testing
Different test cases are implemented to see completeness and computing time.

In [1]:
import pandas as pd
import numpy as np

# for distance calculations
from geopy.distance import great_circle   
from geopy.geocoders import Nominatim

import warnings
warnings.filterwarnings('ignore')

### 1 Pre computation

#### 1.1 Distance calculation

In [2]:
business = pd.read_csv('clean_business.csv')
business['postal_code'] = business.postal_code.astype(str)
business.head()

Unnamed: 0,city,attributes.GoodForMeal,attributes.Smoking,attributes.BusinessAcceptsBitcoin,address,attributes.BYOBCorkage,attributes.WheelchairAccessible,attributes.RestaurantsDelivery,state,attributes.OutdoorSeating,...,hours.Tuesday,attributes.BusinessParking,hours.Monday,attributes.CoatCheck,hours,hours.Friday,attributes.BusinessAcceptsCreditCards,attributes.RestaurantsTableService,cuisine,style
0,Champaign,,,,404 E Green St,,,True,IL,False,...,11:30-14:30,,11:30-14:30,,"{'Monday': '11:30-14:30', 'Tuesday': '11:30-14...",11:30-14:30,True,,"ethnic food,specialty food","food trucks,restaurants"
1,Charlotte,,,,4508 E Independence Blvd,,,,NC,False,...,,"{'garage': False, 'street': False, 'validated'...",,,,,True,,middle eastern,restaurants
2,Mesa,,,,1255 W Main St,,,,AZ,,...,0:0-0:0,"{'garage': False, 'street': False, 'validated'...",0:0-0:0,,"{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W...",0:0-0:0,True,,"donuts,coffee & tea",juice bars & smoothies
3,Phoenix,,,,"4550 East Cactus Rd, #KSFC-4",,,False,AZ,False,...,10:0-21:0,"{'garage': False, 'street': False, 'validated'...",10:0-21:0,,"{'Monday': '10:0-21:0', 'Tuesday': '10:0-21:0'...",10:0-21:0,True,,bakeries,"fast food,restaurants"
4,Las Vegas,,,,9595 W Tropicana Ave,,,False,NV,False,...,,"{'garage': False, 'street': False, 'validated'...",,,,,True,,mexican,"restaurants,fast food"


In [3]:
delhi = (28.644800, 77.216721)
mumbai = (19.076090, 72.877426)

print(great_circle(delhi, mumbai).miles)

715.5398883895449


In [4]:
## Get geolocation of a query

geolocator = Nominatim(user_agent="Recommendation")
address = "175 5th Avenue NYC"
location = geolocator.geocode(address)

print(location.address)
print((location.latitude, location.longitude))

Flatiron Building, 175, 5th Avenue, Flatiron District, Manhattan Community Board 5, Manhattan, New York County, New York, 10010, United States
(40.741059199999995, -73.98964162240998)


#### 1.2 Adjusted ratings

In [5]:
# Global mean ratings

mean_global = ((business.stars * business.review_count).sum() / (business.review_count.sum()))
print("Global mean ratings is ", mean_global)

Global mean ratings is  3.7613193324997347


In [6]:
# check review_count quantile

print(business.review_count.quantile([0.25, 0.5, 0.75]))

0.25    10.0
0.50    30.0
0.75    92.0
Name: review_count, dtype: float64


In [7]:
# set k = 30 as it is 50% quantile
k = 30

business['stars_adj'] = ((business.review_count * business.stars) + (k * mean_global)) / (business.review_count + k)

# Display changed metrics 

print('\n Rank by original stars:')
print(business[['review_count', 'stars', 'stars_adj']].sort_values('stars', ascending=False).head())

print('\n Rank by adjusted stars:')
print(business[['review_count', 'stars', 'stars_adj']].sort_values('stars_adj', ascending=False).head())


 Rank by original stars:
       review_count  stars  stars_adj
15052             3    5.0   3.873927
4334              3    5.0   3.873927
28927            22    5.0   4.285377
40392             3    5.0   3.873927
11821            87    5.0   4.682390

 Rank by adjusted stars:
       review_count  stars  stars_adj
29761          2329    5.0   4.984247
2648           1827    5.0   4.979989
33734          1107    5.0   4.967317
35172           858    5.0   4.958153
3590            842    5.0   4.957385


### 2 Implementation

#### 2.1 Development

In [11]:
class Recommender_Engine:    
    
    def __init__(self, n=5, stars_original=False):
        """
        Instantiate the object. Default setting for ranking would be stars_adj, to rank by original stars set stars_original=True.
        """
        self.n = n                                                     # Number of recommendations
        self.stars_original = stars_original                           # Boolean for ranking method                            
        self.disply_columns = ['name', 'address', 'city','state',\
                               'attributes.RestaurantsPriceRange2',\
                               'review_count','stars','stars_adj',\
                               'cuisine','style']                    # List of columns to be displayed in the results
        
        if self.stars_original:
            score = 'stars'
        else:
            score = 'stars_adj'
            
        self.recommendation = business[business.is_open == 1].sort_values(score, ascending=False)
                                                                      # Filter only open restaurants
            
    def filter_location(self):
        """
        Filter recommendations by user's location. Matching restaurant is the restaurant within the acceptable distance of the location of interest.
        """
        from geopy.geocoders import Nominatim
        from geopy.distance import great_circle
        
        geolocator = Nominatim(user_agent="Recommendation")
        address = [self.city, self.state, self.zipcode]
        address = ",".join([str(i) for i in address if i != None])
        location = geolocator.geocode(address, timeout=10)
        
        # Calculate recommendations distance and append a column
        self.recommendation['distance_recommendations'] = self.recommendation.apply\
        (lambda row: (great_circle((row.latitude, row.longitude), (location.latitude, location.longitude)).miles), axis=1)
        
        self.disply_columns.insert(0, 'distance_recommendations')
        self.recommendation = self.recommendation[self.recommendation.distance_recommendations <= self.distance_max]
        
    def filter_state(self):
        self.recommendation = self.recommendation[self.recommendation.state == self.state]
        
    def filter_price(self):
        self.recommendation = self.recommendation[self.recommendation\
                            ['attributes.RestaurantsPriceRange2'].isin(self.price)]
    
    def filter_cuisine(self):
        
        idx = []
        for i in self.recommendation.index:
            if self.recommendation.loc[i, 'cuisine'] is not np.nan:
                entry = self.recommendation.loc[i, 'cuisine']
                entry = str(entry).split(',')
                if self.cuisine in entry:
                    idx.append(i)
        self.recommendation = self.recommendation.loc[idx]
         
    def filter_style(self):
        
        idx = []
        for i in self.recommendation.index:
            if self.recommendation.loc[i, 'style'] is not np.nan:
                entry = self.recommendation.loc[i, 'style']
                entry = str(entry).split(',')
                if self.style in entry:
                    idx.append(i)
        self.recommendation = self.recommendation.loc[idx]
        
    def display(self):
        
        if len(self.recommendation) == 0:
            print("Sorry, there are no matching recommendations.")
        elif self.n < len(self.recommendation):
            print("Below is the list of the top {} recommended restaurants for you: ".format(self.n))
            print(self.recommendation.iloc[:self.n][self.disply_columns])
        else:
            print("Below is the list of the top {} recommended restaurants for you: ".format(len(self.recommendation)))
            print(self.recommendation.iloc[self.disply_columns]) 
        
    def keyword_filtering(self, catalog=business[business.is_open == 1], price=None,\
                          zipcode=None, city=None, state=None, distance_max=10, cuisine=None, style=None):
        
        
        self.recommendation = catalog                                  # Set restaurant catalog
        self.recommendation['distance__recommendations'] = np.nan      # Reset distance
        self.disply_columns = ['name', 'address', 'city','state',\
                               'attributes.RestaurantsPriceRange2',\
                               'review_count','stars','stars_adj',\
                               'cuisine','style']                    # Reset columns
        
        # Based on keyword search
        self.zipcode = zipcode
        self.city = city
        self.state = state
        self.distance_max = distance_max
        self.cuisine = cuisine
        self.style = style
        self.price = price
        
        # Filter_location
        if (self.zipcode != None) or (self.city != None) or (self.state != None):      
            if (self.zipcode != None) or (self.city != None):
                self.filter_location()
            elif (self.state != None):
                self.filter_state()
            if len(self.recommendation) == 0:
                print("Sorry, there are no matching recommendations.")
                
        # Filter_price
        if self.price != None:
            self.price = [i.strip() for i in price.split(',')]        # Multiple inputs 
            self.filter_price()
            if len(self.recommendation) == 0:
                print("Sorry, there are no matching recommendations.")
             
        # Filter_cuisine
        if self.cuisine != None:
            self.filter_cuisine()
            if len(self.recommendation) == 0:
                print("Sorry, there are no matching recommendations.")
        
        # Filter_style
        if self.style != None:
            self.filter_style()
            if len(self.recommendation) == 0:
                print("Sorry, there are no matching recommendations.")
                
                
        # Sort recommendations by user input for ranking method
        if self.stars_original:
            score = 'stars'
        else:
            score = 'stars_adj'
            
        self.recommendation = self.recommendation.sort_values(score, ascending=False)
        
        # Display recommendations
        self.display()
        
        return self.recommendation
        
                

#### 2.2 Testing

In [13]:
%%time

# Instantiate the object
results = Recommender_Engine();

# Test case 1: Display results
print('Test case 1: *****------------*****\n');
results.display();

# Test case 2: Keyword with city
print('Test case 2: *****------------*****\n')
results.keyword_filtering(city='Phoenix');

# Test case 3: Keyword with state
print('Test case 3: *****------------*****\n')
results.keyword_filtering(state='NV');

# Test case 4: Keyword with zipcode
print('Test case 4: *****------------*****\n')
results.keyword_filtering(zipcode='85023');

# Test case 5: Keyword combination of city, state and zipcode
print('Test case 5: *****------------*****\n')
results.keyword_filtering(city='Las Vegas', state='NV', zipcode='89101');

# Test case 6: Keyword with cuisine
print('Test case 6: *****------------*****\n')
results.keyword_filtering(cuisine='mexican');

# Test case 7: Keyword with style
print('Test case 7: *****------------*****\n')
results.keyword_filtering(style='food trucks');

# Test case 8: Keyword combination of cuisine, style
print('Test case 8: *****------------*****\n')
results.keyword_filtering(cuisine='barbeque', style='restaurants');

# Test case 9: Keyword combination of location, cuisine, style
print('Test case 9: *****------------*****\n')
results.keyword_filtering(city='Las Vegas', state='NV', cuisine='mexican', style='restaurants');

# Test case 10: Keyword combination of price range, location, cuisine, style
print('Test case 10: *****------------*****\n')
results.keyword_filtering(price='1,2', city='Las Vegas', state='NV', cuisine='mexican', style='restaurants');

Test case 1: *****------------*****

Below is the list of the top 5 recommended restaurants for you: 
                          name                       address       city state  \
29761          Little Miss BBQ          4301 E University Dr    Phoenix    AZ   
2648              Brew Tea Bar  7380 S Rainbow Blvd, Ste 101  Las Vegas    NV   
33734          Cocina Madrigal                4044 S 16th St    Phoenix    AZ   
35172  Green Corner Restaurant    1038 W Southern Ave, Ste 1       Mesa    AZ   
3590            Worth Takeaway                 218 W Main St       Mesa    AZ   

      attributes.RestaurantsPriceRange2  review_count  stars  stars_adj  \
29761                                 2          2329    5.0   4.984247   
2648                                  1          1827    5.0   4.979989   
33734                                 2          1107    5.0   4.967317   
35172                                 2           858    5.0   4.958153   
3590                                

Unnamed: 0,city,attributes.GoodForMeal,attributes.Smoking,attributes.BusinessAcceptsBitcoin,address,attributes.BYOBCorkage,attributes.WheelchairAccessible,attributes.RestaurantsDelivery,state,attributes.OutdoorSeating,...,attributes.CoatCheck,hours,hours.Friday,attributes.BusinessAcceptsCreditCards,attributes.RestaurantsTableService,cuisine,style,stars_adj,distance__recommendations,distance_recommendations
21628,Las Vegas,"{'dessert': None, 'latenight': False, 'lunch':...",,False,1615 S Las Vegas Blvd,,True,False,NV,False,...,,"{'Monday': '0:0-0:0', 'Tuesday': '11:0-21:0', ...",11:0-21:0,True,True,"tacos,seafood,mexican","fast food,dive bars,bars,nightlife,restaurants",4.945988,,1.087552
26503,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,"3250 N Tenaya Way, Ste 104",,True,True,NV,False,...,,"{'Monday': '0:0-0:0', 'Tuesday': '10:0-18:0', ...",10:0-18:0,True,False,"coffee & tea,mexican,sandwiches,french,bakerie...","fast food,cafes,restaurants",4.755524,,6.758736
3100,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,6030 W Windmill Ln,,,False,NV,False,...,,"{'Monday': '0:0-0:0', 'Tuesday': '10:0-15:0', ...",10:0-21:0,True,False,mexican,restaurants,4.724738,,9.599407
9063,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,"3355 Spring Mountain Rd, Ste 35",,True,True,NV,True,...,,"{'Monday': '0:0-0:0', 'Tuesday': '17:0-22:0', ...",17:0-22:0,True,True,"seafood,latin american,mexican,tapas/small plates","bars,wine bars,restaurants,nightlife",4.695406,,3.499963
21366,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,5717 Merrill Ave,,,True,NV,True,...,,"{'Monday': '10:0-20:0', 'Tuesday': '10:0-20:0'...",10:0-20:0,True,,"mexican,american (traditional)",restaurants,4.642688,,7.011026
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16968,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,2880 S Las Vegas Blvd,,,False,NV,False,...,,"{'Monday': '12:0-22:0', 'Thursday': '12:0-22:0...",12:0-22:0,True,,"american (traditional),mexican","fast food,restaurants",2.312660,,2.252226
36137,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,5757 Wayne Newton Blvd,,,False,NV,False,...,,,,True,False,mexican,restaurants,2.245765,,5.833564
40482,Las Vegas,"{'dessert': False, 'latenight': False, 'lunch'...",,,5757 Wayne Newton Blvd,,,False,NV,False,...,,"{'Monday': '7:0-21:0', 'Tuesday': '7:0-21:0', ...",7:0-21:0,True,,mexican,restaurants,2.096072,,5.993529
6894,North Las Vegas,"{'dessert': True, 'latenight': False, 'lunch':...",,,6160 N Decatur Blvd,,,True,NV,False,...,,"{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W...",0:0-0:0,True,False,mexican,"restaurants,fast food",2.056062,,7.913717


As observed above, 10 queries took about 9 seconds of time. This comes about 1 second per query which is reasonable computation time