# Yelp Recommendation Engine

Yelp releases its data for academic purposes and challenges the student community to come up with ideas that can help the company evolve. We take a small subset of the dataset and build a recommendation system that helps a user choose a restaurant in a particular city, based on a few of their preferences.

## Exploratory Analysis

### For Business

#### We need to narrow the data to a manageable range: the downloaded Yelp dataset contains 10,000,000 rows, and we do not have the capacity to run analysis on that many rows.

* Identify the right Category
* Identify the right City
* Identify the relevant reviews


**Step 1:** Importing Libraries for exploration

In [1]:
import pandas as pd
import numpy as np

We have decided that we want to recommend a restaurant to the end user. Hence the category will be **Restaurants**.

Importing the Yelp business dataset and finding the relevant **City**

In [2]:
rest = pd.read_json("yelp_academic_dataset_business.json", lines=True)

In [3]:
rest.loc[0].attributes['GoodForKids']

'False'
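The output above is the string `'False'`, not a boolean, so attribute values need parsing before they can be used as filters. A minimal sketch of one way to do that, using toy rows rather than real Yelp data; the helper `get_flag` is a hypothetical name introduced here for illustration:

```python
import ast

import pandas as pd

# Toy rows mimicking the string-encoded `attributes` values (illustrative only).
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "attributes": [
        {"GoodForKids": "False", "WiFi": "u'free'"},
        {"GoodForKids": "True"},
        None,  # some businesses have no attributes at all
    ],
})


def get_flag(attrs, key):
    """Return True/False for a string-encoded flag, or None when it is missing."""
    if not isinstance(attrs, dict) or key not in attrs:
        return None
    try:
        value = ast.literal_eval(attrs[key])  # 'False' -> False, 'True' -> True
    except (ValueError, SyntaxError):
        return None
    # Anything that is not a real boolean is treated as "unknown".
    return value if isinstance(value, bool) else None


toy["good_for_kids"] = [get_flag(a, "GoodForKids") for a in toy["attributes"]]
print(toy[["name", "good_for_kids"]])
```

Returning `None` for missing or non-boolean values keeps "unknown" separate from an explicit `False`.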

Looking at a single row to identify the necessary variables

In [4]:
rest.loc[1]

business_id                                QXAEGFB4oINsVuTFxEYKFQ
name                                   Emerald Chinese Restaurant
address                                      30 Eglinton Avenue W
city                                                  Mississauga
state                                                          ON
postal_code                                               L5R 3E7
latitude                                                  43.6055
longitude                                                -79.6523
stars                                                         2.5
review_count                                                  128
is_open                                                         1
attributes      {'RestaurantsReservations': 'True', 'GoodForMe...
categories      Specialty Food, Restaurants, Dim Sum, Imported...
hours           {'Monday': '9:0-0:0', 'Tuesday': '9:0-0:0', 'W...
Name: 1, dtype: object

The chosen category, **Restaurants**, has about 60,000 rows. This is manageable, but we can drill down further to a single city. Hence we will look into the available cities.

In [5]:
len(rest[rest['categories'].str.contains('Restaurants', na=False)])

59371
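Substring matching keeps every row whose `categories` string contains the text `Restaurants` anywhere. The sketch below, on toy rows, compares that with an exact match on the comma-separated category tokens. The label `Restaurants Supplies` is a made-up value used only to show the difference, and whether such labels occur in the real data is not checked here.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "categories": [
        "Specialty Food, Restaurants, Dim Sum",
        "Hair Salons, Barbers",
        None,                    # businesses without categories
        "Restaurants Supplies",  # hypothetical label that only contains the substring
    ],
})

# Substring match: na=False turns missing categories into False.
substring_mask = toy["categories"].str.contains("Restaurants", na=False)

# Exact match: split "a, b, c" into tokens and look for the exact category name.
exact_mask = (
    toy["categories"]
    .fillna("")
    .str.split(", ")
    .apply(lambda tags: "Restaurants" in tags)
)

print(int(substring_mask.sum()), int(exact_mask.sum()))
```

If the two counts differ on the real data, the extra rows are worth inspecting before deciding which mask to keep.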

#### Listing the Cities Available

In [6]:
list(rest.loc[rest['state'] == 'CA', 'city'].unique())

['Los Angeles',
 'Las Vegas',
 'Monterey Park',
 'Las Vegas Nv',
 'Chandler',
 'San Diego',
 'Antioch',
 'Temecula',
 'Peninsula',
 'Huntington Beach',
 'Surprise',
 'Scottadale',
 'Nationwide',
 'Gilbert',
 'Morgan Hill',
 'Sacramento',
 'Dublin']
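The list above shows that city labels are not clean: 'Las Vegas Nv' appears next to 'Las Vegas', and 'Scottadale' looks like a misspelling. A minimal sketch of normalizing such labels before counting, on toy values; the `fixes` mapping is a hypothetical example and would need to be built by inspecting the real labels:

```python
import pandas as pd

# Toy city labels echoing the kinds of variants seen in the output above.
cities = pd.Series(["Las Vegas", "las vegas ", "Las Vegas Nv", "Scottadale", "Las Vegas"])

# Hypothetical manual corrections; extend after inspecting the real data.
fixes = {"Las Vegas Nv": "Las Vegas", "Scottadale": "Scottsdale"}

# Trim whitespace, unify capitalization, then apply the manual corrections.
cleaned = cities.str.strip().str.title().replace(fixes)

print(cleaned.value_counts())
```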

In [7]:
rest[rest['city'].str.contains('Diego')]

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
56948,oBEFhUe7yEH1PK25bImCWA,Brooks Photography,"4372 West Point Loma Blvd, Ste A",San Diego,CA,92107,36.175,-115.136389,1.5,35,1,{'BusinessAcceptsCreditCards': 'True'},"Session Photography, Event Photography, Event ...",


In [8]:
rest[rest['city'].str.contains('Angeles')]

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
19062,YMeWjOd1svHDGdDCKoiGgg,Electric Daisy Carnival,7000 N Las Vegas Blvd,Los Angeles,CA,90037,36.27326,-115.00943,4.5,36,0,"{'OutdoorSeating': 'True', 'GoodForKids': 'Fal...","Local Flavor, Festivals, Arts & Entertainment,...",
137673,gavl0UJkI0Z5Dzs_tHXQ9A,Rebecca Vinacour Photography,,Los Angeles,CA,90001,33.457453,-112.060988,4.5,11,1,"{'BusinessAcceptsCreditCards': 'True', 'Busine...","Session Photography, Event Planning & Services...",


In [9]:
rest[rest['city'].str.contains('Vegas')]

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
7,gbQN7vr_caG_A1ugSmGhWg,Supercuts,"4545 E Tropicana Rd Ste 8, Tropicana",Las Vegas,NV,89121,36.099872,-115.074574,3.5,3,1,"{'RestaurantsPriceRange2': '3', 'GoodForKids':...","Hair Salons, Hair Stylists, Barbers, Men's Hai...","{'Monday': '10:0-19:0', 'Tuesday': '10:0-19:0'..."
17,PZ-LZzSlhSe9utkQYU8pFg,Carluccio's Tivoli Gardens,"1775 E Tropicana Ave, Ste 29",Las Vegas,NV,89119,36.100016,-115.128529,4.0,40,0,"{'OutdoorSeating': 'False', 'BusinessAcceptsCr...","Restaurants, Italian",
18,nh_kQ16QAoXWwqZ05MPfBQ,Myron Hensel Photography,,Las Vegas,NV,89121,36.116549,-115.088115,5.0,21,1,{'BusinessAcceptsCreditCards': 'True'},"Event Planning & Services, Photographers, Prof...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
20,dFMxzHygTy6F873843dHAA,Fremont Arcade,"450 Fremont St, Ste 179",Las Vegas,NV,89101,36.169993,-115.140685,4.5,38,1,{'GoodForKids': 'True'},"Arcades, Arts & Entertainment","{'Monday': '11:0-0:0', 'Tuesday': '11:0-0:0', ..."
21,lxnuq9wJiwLOPJ4uZU2ljg,Las Vegas Motorcars,"3650 N 5th, Ste 100",North Las Vegas,NV,89032,36.225851,-115.132800,3.5,3,1,{'BusinessAcceptsCreditCards': 'True'},"Automotive, Car Dealers","{'Monday': '9:0-18:0', 'Tuesday': '9:0-18:0', ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
192584,XlRGtOjPuiEfZsKuo2fRdw,Costco Hearing Aid Center,801 S Pavilion Dr,Las Vegas,NV,89101,36.163000,-115.332709,5.0,3,1,,,
192586,6tOKoZX1Gj3Uzjc1-JzYNQ,Premier Landscape Maintenance,,Las Vegas,NV,89107,36.168824,-115.218557,1.0,3,1,"{'BusinessAcceptsCreditCards': 'True', 'Busine...","Landscape Architects, Home Services, Gardeners...","{'Monday': '0:0-0:0', 'Tuesday': '7:0-14:0', '..."
192598,vIAEWbTJc657yN8I4z7whQ,Starbucks,"8164 S. Las Vegas Blvd., #100",Las Vegas,NV,89123,36.041407,-115.171698,3.0,138,1,"{'OutdoorSeating': 'True', 'WiFi': 'u'free'', ...","Food, Coffee & Tea","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
192604,nqb4kWcOwp8bFxzfvaDpZQ,Sanderson Plumbing,,North Las Vegas,NV,89032,36.213732,-115.177059,5.0,9,1,{'BusinessAcceptsCreditCards': 'True'},"Water Purification Services, Water Heater Inst...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."


San Diego and Los Angeles look relevant, but on closer inspection each contains **<10 rows**. Hence we are moving on to 'Las Vegas', which contains about **7000** restaurants.
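The comparison between candidate cities can be made explicit by counting restaurant rows per city keyword. This is a rough sketch on toy rows, not the real counts. Note that the keyword 'Vegas' also matches 'North Las Vegas', as the output above shows.

```python
import pandas as pd

toy = pd.DataFrame({
    "city": ["Las Vegas", "Las Vegas", "North Las Vegas", "San Diego", "Los Angeles", "Las Vegas"],
    "categories": [
        "Restaurants, Italian",
        "Hair Salons",
        "Restaurants, Burgers",
        "Session Photography",
        "Festivals",
        None,
    ],
})

is_restaurant = toy["categories"].str.contains("Restaurants", na=False)

# Count restaurant rows whose city label contains each keyword.
counts = {
    keyword: int((is_restaurant & toy["city"].str.contains(keyword, na=False)).sum())
    for keyword in ["Diego", "Angeles", "Vegas"]
}
print(counts)
```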

#### We will focus only on these businesses from now on.

Storing the subset in a JSON file for easy access and loading in the next step

In [10]:
rest_focus = rest[rest['categories'].str.contains('Restaurants', na=False) & rest['city'].str.contains('Vegas', na=False)]

In [None]:
rest_focus.to_json("Yelp Chosen Restaruants.json")
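As an optional alternative, the subset could be written in the same line-delimited layout as the original Yelp files, so it reloads with the same `lines=True` call used earlier. A small round-trip sketch on toy data; the file name is hypothetical, and the notebook's default `to_json` call also works:

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({
    "business_id": ["a1", "b2"],
    "name": ["Cafe A", "Diner B"],
    "stars": [4.5, 3.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_restaurants.json")  # hypothetical file name
    # One JSON object per line, like the Yelp source files.
    toy.to_json(path, orient="records", lines=True)
    restored = pd.read_json(path, lines=True)

print(restored)
```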


## Exploratory Analysis

### For Reviews

We want only the relevant reviews. Hence we will load the complete review data, keep the reviews whose **'business_id'** matches one of the chosen businesses, and save them into another JSON file:

In [12]:
rev = pd.read_json("yelp_academic_dataset_review.json", lines = True)
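Loading the whole review file at once needs a lot of memory. If that becomes a problem, one option is to read the file in chunks and keep only the rows for the chosen businesses. The sketch below shows the pattern on a tiny toy file. The chunk size and file name are placeholders, and `wanted_ids` stands in for the `business_id` values of the chosen restaurants.

```python
import os
import tempfile

import pandas as pd

# Toy stand-ins for the review file and the chosen business ids.
reviews = pd.DataFrame({
    "review_id": ["r1", "r2", "r3", "r4", "r5"],
    "business_id": ["a1", "zz", "b2", "a1", "yy"],
    "stars": [5, 1, 4, 3, 2],
})
wanted_ids = {"a1", "b2"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_reviews.json")  # hypothetical file name
    reviews.to_json(path, orient="records", lines=True)

    kept = []
    # With chunksize set, read_json yields DataFrames of at most that many rows.
    with pd.read_json(path, lines=True, chunksize=2) as reader:
        for chunk in reader:
            kept.append(chunk[chunk["business_id"].isin(wanted_ids)])

filtered = pd.concat(kept, ignore_index=True)
print(filtered)
```

Only the matching rows of each chunk are kept, so peak memory depends on the chunk size rather than on the full file.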

In [15]:
rev.loc[0]

review_id                                 Q1sbwvVQXV2734tPgoKj4Q
user_id                                   hG7b0MtEbXx5QzbzE6C_VA
business_id                               ujmEBvifdJM6h6RLv4wQIg
stars                                                          1
useful                                                         6
funny                                                          1
cool                                                           0
text           Total bill for this horrible service? Over $8G...
date                                         2013-05-07 04:34:36
Name: 0, dtype: object

In [23]:
len(rev)

6685900

In [19]:
useful_review = pd.merge(rev, rest_focus, on='business_id')

In [20]:
len(useful_review)

1269770
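Both tables have a `stars` column, so the merge renames them with the default suffixes `_x` (the review's rating) and `_y` (the business's average rating). A hedged sketch of the same inner join on toy data, with explicit suffixes and a uniqueness check on the business side; the suffix names are illustrative and are not what the notebook uses:

```python
import pandas as pd

reviews = pd.DataFrame({
    "review_id": ["r1", "r2", "r3"],
    "business_id": ["a1", "zz", "a1"],
    "stars": [5, 1, 3],
})
businesses = pd.DataFrame({
    "business_id": ["a1", "b2"],
    "name": ["Cafe A", "Diner B"],
    "stars": [4.0, 3.5],
})

merged = pd.merge(
    reviews,
    businesses,
    on="business_id",
    how="inner",                         # keep only reviews of chosen businesses
    suffixes=("_review", "_business"),   # readable names instead of _x / _y
    validate="many_to_one",              # raise if a business_id repeats on the right
)
print(merged)
```

The `validate` argument guards against duplicated business rows, which would silently multiply the reviews.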

In [24]:
useful_review.loc[0]

review_id                                  kbtscdyz6lvrtGjD1quQTg
user_id                                    FIk4lQQu1eTe2EpzQ4xhBA
business_id                                8mIrX_LrOnAqWsB5JrOojQ
stars_x                                                         4
useful                                                          0
funny                                                           0
cool                                                            0
text            Like walking back in time, every Saturday morn...
date                                          2011-11-30 02:11:15
name                                         Pinball Hall Of Fame
address                                      1610 E Tropicana Ave
city                                                    Las Vegas
state                                                          NV
postal_code                                                 89119
latitude                                                  36.1014
longitude 

In [25]:
useful_review[useful_review['user_id'] == 'FIk4lQQu1eTe2EpzQ4xhBA']

Unnamed: 0,review_id,user_id,business_id,stars_x,useful,funny,cool,text,date,name,...,state,postal_code,latitude,longitude,stars_y,review_count,is_open,attributes,categories,hours
0,kbtscdyz6lvrtGjD1quQTg,FIk4lQQu1eTe2EpzQ4xhBA,8mIrX_LrOnAqWsB5JrOojQ,4,0,0,0,"Like walking back in time, every Saturday morn...",2011-11-30 02:11:15,Pinball Hall Of Fame,...,NV,89119,36.101449,-115.130511,4.5,1258,1,"{'RestaurantsGoodForGroups': 'True', 'Restaura...","Performing Arts, Amusement Parks, Museums, Arc...","{'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0'..."
37061,63CwPLQFB6azpxnDcdWwzg,FIk4lQQu1eTe2EpzQ4xhBA,Wxxvi3LZbHNIDwJ-ZimtnA,5,0,0,0,"Love this place, walking into the front desk a...",2011-11-29 17:57:46,The Venetian Las Vegas,...,NV,89109,36.121189,-115.169657,4.0,3499,1,"{'BusinessAcceptsCreditCards': 'True', 'Restau...","Shopping Centers, Resorts, Arts & Entertainmen...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
40422,zVZtAp5z1Nm2XgKoScIM6g,FIk4lQQu1eTe2EpzQ4xhBA,tstimHoMcYbkSC4eBA1wEg,4,0,0,0,Quiet small place only three tables with patro...,2013-10-31 02:18:37,Maria's Mexican Restaurant & Bakery,...,NV,89156,36.195615,-115.040529,4.5,184,1,"{'BikeParking': 'True', 'BusinessParking': '{'...","Mexican, Restaurants, Patisserie/Cake Shop, Fo...","{'Monday': '11:0-21:0', 'Tuesday': '10:0-21:0'..."
40434,-YEhLnvihXC8NvjaGvsxww,FIk4lQQu1eTe2EpzQ4xhBA,tstimHoMcYbkSC4eBA1wEg,4,1,1,1,"Stopped in after a bit of dinner next store, l...",2013-10-31 02:12:20,Maria's Mexican Restaurant & Bakery,...,NV,89156,36.195615,-115.040529,4.5,184,1,"{'BikeParking': 'True', 'BusinessParking': '{'...","Mexican, Restaurants, Patisserie/Cake Shop, Fo...","{'Monday': '11:0-21:0', 'Tuesday': '10:0-21:0'..."
43167,gY_ZZsmx3k5IF5qXEH-hnQ,FIk4lQQu1eTe2EpzQ4xhBA,qG_WEgPa8MBo1dPUOkTMlw,4,0,0,0,I have relatives that talk about in and out as...,2011-11-30 03:13:42,In-N-Out Burger,...,NV,89147,36.100646,-115.302042,4.5,372,1,"{'BusinessParking': '{'garage': False, 'street...","Burgers, Fast Food, Restaurants","{'Monday': '10:30-1:0', 'Tuesday': '10:30-1:0'..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1262689,E1ARJq-Ln7kTqmdOZFvW2g,FIk4lQQu1eTe2EpzQ4xhBA,8fiZjbPMgg4qDSLIdZsWZg,4,2,1,2,"Think bright yellow and orange booths, daisy l...",2013-12-30 02:06:10,Sunshine Cafe,...,NV,89108,36.187877,-115.206286,4.0,39,0,"{'RestaurantsPriceRange2': '1', 'OutdoorSeatin...","Restaurants, Diners, Breakfast & Brunch","{'Monday': '7:0-15:0', 'Tuesday': '7:0-15:0', ..."
1266683,4zRNR2Xfz8sbZ6hxiLc40w,FIk4lQQu1eTe2EpzQ4xhBA,lF4pEu4_55SSFEo6Q58ftQ,4,2,1,2,My husband had been reading about this place f...,2012-07-13 17:04:44,German Bread Bakery,...,NV,89134,36.202011,-115.282528,4.5,178,1,"{'OutdoorSeating': 'False', 'Caters': 'True', ...","Restaurants, Desserts, Pretzels, Food, Bakerie...","{'Monday': '7:30-16:30', 'Tuesday': '7:30-16:3..."
1266878,1PE_WpQzWdo9uZkdE5SZ3Q,FIk4lQQu1eTe2EpzQ4xhBA,CZJ-s3Io3TQ0Jr2r_tm51g,3,1,0,0,This place is now bigger and free standing bui...,2016-01-04 02:06:08,Popeyes Louisiana Kitchen,...,NV,89146,36.143640,-115.225597,2.5,33,0,"{'BikeParking': 'True', 'BusinessParking': '{'...","Chicken Wings, Fast Food, Restaurants","{'Monday': '10:0-22:0', 'Tuesday': '10:0-22:0'..."
1267261,0z-Xe5xL-XgkqePG0iwCxg,FIk4lQQu1eTe2EpzQ4xhBA,TUpyKJFqL_ySZMo54pT-Sw,4,0,0,0,Mostly icream and other sweets but they have s...,2013-10-31 02:07:50,La Flor De Michoacan Restaurant,...,NV,89156,36.195615,-115.040529,3.5,52,1,"{'BikeParking': 'True', 'RestaurantsReservatio...","Juice Bars & Smoothies, American (New), Food, ...","{'Monday': '10:0-22:0', 'Tuesday': '10:0-22:0'..."


#### Shortlisted reviews

When we checked a single user, we found that this person has rated 369 restaurants. This is great for training our recommendation model. We are saving this subset into another JSON file.
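A user can review the same business more than once, as the rows displayed above show for one restaurant. The number of review rows and the number of distinct restaurants per user can therefore differ. A short sketch on toy data that computes both; the column names follow the merged frame, and the values are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "business_id": ["a1", "a1", "b2", "a1"],
    "stars_x": [4, 4, 5, 2],
})

# Per user: total review rows and number of distinct businesses reviewed.
per_user = toy.groupby("user_id").agg(
    n_reviews=("business_id", "size"),
    n_restaurants=("business_id", "nunique"),
)
print(per_user)
```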

In [None]:
useful_review.to_json("Yelp Useful Reviews.json")