# Business Understanding

## Introduction
Savor Space is a growing tourism and travel agency offering a wide range of services, from booking accommodations to providing tour guides and creating personalized travel experiences. Our mission is to ensure that tourists have an enriching and seamless journey so that they can fully enjoy their destination.

We believe that traveling is about more than just sightseeing; it's also about savoring local cuisines that suit individual tastes. Whether a tourist is looking for traditional dishes, vegan options, or a fine dining experience, finding their preferred restaurant guarantees a satisfying experience. To enhance this specific aspect of their journey, Savor Space is developing a restaurant recommendation system designed to help tourists discover dining spots that align with their preferences, ensuring a memorable and enjoyable culinary experience.








## Business Problem

It's not easy for most tourists to find restaurants that align with their tastes, especially when visiting unfamiliar locations. Without personalized recommendations, they may rely on random reviews, which do not always reflect their preferences or specific dining needs, and this can negatively impact their overall experience. The problem Savor Space seeks to solve is how to provide accurate and personalized restaurant recommendations based on individual preferences, allowing tourists to easily find eateries they will enjoy.

#### Stakeholders
* **Savor Space Management:** Interested in offering personalized and innovative services that boost customer satisfaction and retention, thus positively impacting ROI.

* **Tourists:** Seeking personalized restaurant recommendations that match their tastes hence improving their travel experience.

* **Restaurant Owners:** Although they are not directly involved, they may benefit from increased traffic when their restaurant is recommended in alignment with customer preferences.

#### Objectives

**1. To develop a robust restaurant recommendation system** that offers personalized restaurant suggestions based on user preferences, past behavior, and restaurant reviews.

**2. To improve tourist satisfaction** by enabling them to discover restaurants that match their individual tastes, enhancing their overall travel experience.

**3. To leverage data science techniques**, including Natural Language Processing (NLP) and recommendation algorithms (content-based and collaborative filtering), to ensure accurate and reliable recommendations.

**4. To evaluate and improve the recommendation system** by using advanced models and performance metrics such as RMSE to optimize the system's accuracy.
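
Objective 4 names RMSE (root mean squared error) as a performance metric. As a minimal sketch of what that metric measures, the snippet below computes it by hand for a handful of made-up star ratings. The `rmse` helper and the rating values are illustrative assumptions, not part of the project's pipeline.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    # Average the squared differences, then take the square root
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical star ratings, for illustration only
true_stars = [5.0, 3.0, 4.0, 1.0]
predicted_stars = [4.0, 3.0, 5.0, 3.0]

example_rmse = rmse(true_stars, predicted_stars)
print(f"RMSE: {example_rmse:.3f}")
```

A lower RMSE means the predicted ratings sit closer to the ratings users actually gave; because errors are squared, a few large misses weigh more heavily than many small ones.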


## Data Understanding

To build the restaurant recommendation system, we opted to use real-world data from Yelp.com. The data was extracted through web scraping, and a total of five JSON datasets were obtained.

**Files Obtained:**
1) business.json
2) checkin.json
3) review.json
4) tips.json 
5) user.json

This restaurant recommendation system leverages two primary datasets, **business.json** and **review.json**, as they had the relevant information required to develop the recommendation system.

**1) Business Dataset**

This dataset includes essential information about a variety of restaurants. The columns found in this dataset include:
 
 * business_id	
 * name	
 * address	
 * city	
 * state	
 * postal_code	
 * latitude	
 * longitude	
 * stars	
 * review_count	
 * is_open 	
 * attributes	
 * categories	
 * hours

**2) Review Dataset**

This dataset provides insights into user preferences and their dining experiences. The columns in this dataset include:

* review_id
* user_id
* business_id	
* stars	
* date	
* text	
* useful	
* funny	
* cool


In [13]:
# Import the necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json 
import csv
import folium
import string
import collections
import pickle

from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from surprise import Reader, Dataset
from tabulate import tabulate
from surprise.prediction_algorithms import SVD
from surprise.prediction_algorithms import KNNWithMeans, KNNBasic, KNNBaseline
from surprise.model_selection import GridSearchCV
from surprise.model_selection import cross_validate
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import RegexpTokenizer
from tensorflow.keras import models, layers, optimizers, losses, regularizers, metrics
from wordcloud import WordCloud

plt.style.use("fivethirtyeight")
%matplotlib inline




Because the full data is too large to load into the notebook comfortably, we extract only a section of it: the first 54,380 rows of each dataset.
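
The idea of reading only the first rows of a large JSON Lines file can be sketched with the standard library alone. The helper name `read_first_records` and the in-memory stand-in file below are assumptions made for illustration; the real files are far larger and live on disk.

```python
import io
import json
from itertools import islice


def read_first_records(lines, n_rows):
    """Parse at most n_rows JSON objects from an iterable of JSON Lines text."""
    # islice stops after n_rows lines, so the rest of the file is never read
    return [json.loads(line) for line in islice(lines, n_rows) if line.strip()]


# Tiny in-memory stand-in for a large JSON Lines file (made-up records)
fake_file = io.StringIO(
    '{"business_id": "a1", "stars": 4.0}\n'
    '{"business_id": "b2", "stars": 2.5}\n'
    '{"business_id": "c3", "stars": 5.0}\n'
)

sample = read_first_records(fake_file, n_rows=2)
print(sample)
```

Note that taking the first rows of a file is not a random sample: if the file is ordered in some way (by date or location, for example), the extracted section may not be representative of the whole dataset.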

##### **Loading the datasets**


The review and business datasets are loaded into the notebook in that order; the sampled review rows are first written to a CSV file, while the business rows are read directly into a DataFrame. Next, we preview and explore the columns using methods such as **.head()** and **.info()**. After confirming the data, we merge the two datasets on `business_id` and name the new DataFrame `restaurant_df`. We then move on to the next step, which is data preparation.
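
Before running the merge on the real data, it can help to see how a left merge on `business_id` behaves on a toy example. The sketch below uses made-up frames and is only an illustration: the `suffixes`, `validate`, and `indicator` arguments are optional extras of `pd.merge` that the notebook itself does not use (it relies on the default `_x`/`_y` suffixes, which is why the merged data has `stars_x` and `stars_y`).

```python
import pandas as pd

# Made-up review and business rows, for illustration only
reviews = pd.DataFrame({
    "review_id": ["r1", "r2", "r3"],
    "business_id": ["a1", "a1", "zz"],
    "stars": [5.0, 3.0, 4.0],
})
businesses = pd.DataFrame({
    "business_id": ["a1", "b2"],
    "name": ["Cafe A", "Diner B"],
    "stars": [4.0, 3.5],
})

merged = pd.merge(
    left=reviews,
    right=businesses,
    how="left",                          # keep every review row
    on="business_id",
    suffixes=("_review", "_business"),   # readable names instead of _x / _y
    validate="many_to_one",              # fail if a business_id repeats on the right
    indicator=True,                      # adds a _merge column showing match status
)

# Reviews whose business_id was not found in the business frame
unmatched = int((merged["_merge"] == "left_only").sum())
print(merged)
print("Unmatched reviews:", unmatched)
```

With a left merge, unmatched reviews are kept and their business columns are filled with missing values, so counting `left_only` rows is a quick way to check how much of the review sample actually has business information attached.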

In [50]:

# Number of review rows to copy into the CSV sample
max_rows = 54380

# Columns to keep, in the order they appear in the CSV
fieldnames = ['review_id', 'user_id', 'business_id', 'stars', 'date', 'text', 'useful', 'funny', 'cool']

# Open the Yelp review file (one JSON object per line) and the output CSV
with open('C:\\Users\\Administrator\\Downloads\\yelp_dataset\\yelp_academic_dataset_review.json', 'r', encoding='utf-8') as f:
    with open('yelp_review_sample.csv', 'w', newline='', encoding='utf-8') as csvfile:
        # DictWriter writes each parsed review (a dict) as one CSV row
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for count, line in enumerate(f, start=1):
            data = json.loads(line)  # Parse one line of the JSON file into a dict
            writer.writerow(data)
            if count >= max_rows:  # Stop after the required number of rows
                break

# Load the generated CSV file into a pandas DataFrame
review_df = pd.read_csv('yelp_review_sample.csv')

# Preview the first five rows of the DataFrame
review_df.head()

| | review_id | user_id | business_id | stars | date | text | useful | funny | cool |
|---|---|---|---|---|---|---|---|---|---|
| 0 | KU_O5udG6zpxOg-VcAEodg | mh_-eMZ6K5RLWhZyISBhwA | XQfwVwDr-v0ZS3_CbbE5Xw | 3.0 | 2018-07-07 22:09:11 | If you decide to eat here, just be aware it is... | 0 | 0 | 0 |
| 1 | BiTunyQ73aT9WBnpR9DZGw | OyoGAe7OKpv6SyGZT5g77Q | 7ATYjTIgM3jUlt4UM3IypQ | 5.0 | 2012-01-03 15:28:18 | I've taken a lot of spin classes over the year... | 1 | 0 | 1 |
| 2 | saUsX_uimxRlCVr67Z4Jig | 8g_iMtfSiwikVnbP2etR0A | YjUWPpI6HXG530lwP-fb2A | 3.0 | 2014-02-05 20:30:30 | Family diner. Had the buffet. Eclectic assortm... | 0 | 0 | 0 |
| 3 | AqPFMleE6RsU23_auESxiA | _7bHUi9Uuf5__HHc_Q8guQ | kxX2SOes4o-D3ZQBkiMRfA | 5.0 | 2015-01-04 00:01:03 | Wow!  Yummy, different,  delicious.   Our favo... | 1 | 0 | 1 |
| 4 | Sx8TMOWLNuJBWer-0pcmoA | bcjbaE6dDog4jkNY91ncLQ | e4Vwtrqf-wpJfwesgvdgxQ | 4.0 | 2017-01-14 20:54:15 | Cute interior and owner (?) gave us tour of up... | 1 | 0 | 1 |


In [51]:
# initialize an empty list for data storage
json_data= []

# Define the number of lines to read 
lines_to_read= 54380

# Open the JSON file and read the specified number of lines
with open(r'C:\Users\Administrator\Downloads\yelp_academic_dataset_business.json', 'r', encoding='utf-8') as file:
    for i, line in enumerate(file):
        if i >= lines_to_read:
            break  # Stop once the required number of lines has been read

        # Parse each line as a JSON object
        json_object = json.loads(line)
        json_data.append(json_object)

# Convert the json_data list into a pandas DataFrame
business_df= pd.DataFrame(json_data)

# Display the first few rows of the DataFrame
business_df.head(5)

| | business_id | name | address | city | state | postal_code | latitude | longitude | stars | review_count | is_open | attributes | categories | hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pns2l4eNsfO8kk83dixA6A | Abby Rappoport, LAC, CMQ | 1616 Chapala St, Ste 2 | Santa Barbara | CA | 93101 | 34.426679 | -119.711197 | 5.0 | 7 | 0 | {'ByAppointmentOnly': 'True'} | Doctors, Traditional Chinese Medicine, Naturop... | |
| 1 | mpf3x-BjTdTEA3yCZrAYPw | The UPS Store | 87 Grasso Plaza Shopping Center | Affton | MO | 63123 | 38.551126 | -90.335695 | 3.0 | 15 | 1 | {'BusinessAcceptsCreditCards': 'True'} | Shipping Centers, Local Services, Notaries, Ma... | {'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ... |
| 2 | tUFrWirKiKi_TAnsVWINQQ | Target | 5255 E Broadway Blvd | Tucson | AZ | 85711 | 32.223236 | -110.880452 | 3.5 | 22 | 0 | {'BikeParking': 'True', 'BusinessAcceptsCredit... | Department Stores, Shopping, Fashion, Home & G... | {'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ... |
| 3 | MTSW4McQd7CbVtyjqoe9mw | St Honore Pastries | 935 Race St | Philadelphia | PA | 19107 | 39.955505 | -75.155564 | 4.0 | 80 | 1 | {'RestaurantsDelivery': 'False', 'OutdoorSeati... | Restaurants, Food, Bubble Tea, Coffee & Tea, B... | {'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ... |
| 4 | mWMc6_wTdE0EUBKIGXDVfA | Perkiomen Valley Brewery | 101 Walnut St | Green Lane | PA | 18054 | 40.338183 | -75.471659 | 4.5 | 13 | 1 | {'BusinessAcceptsCreditCards': 'True', 'Wheelc... | Brewpubs, Breweries, Food | {'Wednesday': '14:0-22:0', 'Thursday': '16:0-2... |


In [47]:
# Explore the business dataset's information
business_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   business_id   5 non-null      object 
 1   name          5 non-null      object 
 2   address       5 non-null      object 
 3   city          5 non-null      object 
 4   state         5 non-null      object 
 5   postal_code   5 non-null      object 
 6   latitude      5 non-null      float64
 7   longitude     5 non-null      float64
 8   stars         5 non-null      float64
 9   review_count  5 non-null      int64  
 10  is_open       5 non-null      int64  
 11  attributes    5 non-null      object 
 12  categories    5 non-null      object 
 13  hours         4 non-null      object 
dtypes: float64(3), int64(2), object(9)
memory usage: 692.0+ bytes


In [52]:
# Explore the review dataset's information
review_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54380 entries, 0 to 54379
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   review_id    54380 non-null  object 
 1   user_id      54380 non-null  object 
 2   business_id  54380 non-null  object 
 3   stars        54380 non-null  float64
 4   date         54380 non-null  object 
 5   text         54380 non-null  object 
 6   useful       54380 non-null  int64  
 7   funny        54380 non-null  int64  
 8   cool         54380 non-null  int64  
dtypes: float64(1), int64(3), object(5)
memory usage: 3.7+ MB


In [56]:
# Merge the two datasets on business_id, keeping every review row
restaurant_df = pd.merge(left=review_df, right=business_df, how='left', on='business_id')

# Preview the new merged dataset
restaurant_df.head()


| | review_id | user_id | business_id | stars_x | date | text | useful | funny | cool | name | ... | state | postal_code | latitude | longitude | stars_y | review_count | is_open | attributes | categories | hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KU_O5udG6zpxOg-VcAEodg | mh_-eMZ6K5RLWhZyISBhwA | XQfwVwDr-v0ZS3_CbbE5Xw | 3.0 | 2018-07-07 22:09:11 | If you decide to eat here, just be aware it is... | 0 | 0 | 0 | Turning Point of North Wales | ... | PA | 19454 | 40.210196 | -75.223639 | 3.0 | 169 | 1 | {'NoiseLevel': 'u'average'', 'HasTV': 'False',... | Restaurants, Breakfast & Brunch, Food, Juice B... | {'Monday': '7:30-15:0', 'Tuesday': '7:30-15:0'... |
| 1 | BiTunyQ73aT9WBnpR9DZGw | OyoGAe7OKpv6SyGZT5g77Q | 7ATYjTIgM3jUlt4UM3IypQ | 5.0 | 2012-01-03 15:28:18 | I've taken a lot of spin classes over the year... | 1 | 0 | 1 | Body Cycle Spinning Studio | ... | PA | 19119 | 39.952103 | -75.172753 | 5.0 | 144 | 0 | {'BusinessAcceptsCreditCards': 'True', 'GoodFo... | Active Life, Cycling Classes, Trainers, Gyms, ... | {'Monday': '6:30-20:30', 'Tuesday': '6:30-20:3... |
| 2 | saUsX_uimxRlCVr67Z4Jig | 8g_iMtfSiwikVnbP2etR0A | YjUWPpI6HXG530lwP-fb2A | 3.0 | 2014-02-05 20:30:30 | Family diner. Had the buffet. Eclectic assortm... | 0 | 0 | 0 | Kettle Restaurant | ... | AZ | 85713 | 32.207233 | -110.980864 | 3.5 | 47 | 1 | {'RestaurantsReservations': 'True', 'BusinessP... | Restaurants, Breakfast & Brunch | |
| 3 | AqPFMleE6RsU23_auESxiA | _7bHUi9Uuf5__HHc_Q8guQ | kxX2SOes4o-D3ZQBkiMRfA | 5.0 | 2015-01-04 00:01:03 | Wow!  Yummy, different,  delicious.   Our favo... | 1 | 0 | 1 | Zaika | ... | PA | 19114 | 40.079848 | -75.02508 | 4.0 | 181 | 1 | {'Caters': 'True', 'Ambience': '{'romantic': F... | Halal, Pakistani, Restaurants, Indian | {'Tuesday': '11:0-21:0', 'Wednesday': '11:0-21... |
| 4 | Sx8TMOWLNuJBWer-0pcmoA | bcjbaE6dDog4jkNY91ncLQ | e4Vwtrqf-wpJfwesgvdgxQ | 4.0 | 2017-01-14 20:54:15 | Cute interior and owner (?) gave us tour of up... | 1 | 0 | 1 | Melt | ... | LA | 70119 | 29.962102 | -90.087958 | 4.0 | 32 | 0 | {'BusinessParking': '{'garage': False, 'street... | Sandwiches, Beer, Wine & Spirits, Bars, Food, ... | {'Monday': '0:0-0:0', 'Friday': '11:0-17:0', '... |


In [57]:
# Explore the new merged dataset's information
restaurant_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54380 entries, 0 to 54379
Data columns (total 22 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   review_id     54380 non-null  object 
 1   user_id       54380 non-null  object 
 2   business_id   54380 non-null  object 
 3   stars_x       54380 non-null  float64
 4   date          54380 non-null  object 
 5   text          54380 non-null  object 
 6   useful        54380 non-null  int64  
 7   funny         54380 non-null  int64  
 8   cool          54380 non-null  int64  
 9   name          54380 non-null  object 
 10  address       54380 non-null  object 
 11  city          54380 non-null  object 
 12  state         54380 non-null  object 
 13  postal_code   54380 non-null  object 
 14  latitude      54380 non-null  float64
 15  longitude     54380 non-null  float64
 16  stars_y       54380 non-null  float64
 17  review_count  54380 non-null  int64  
 18  is_open       54380 non-nu

In [58]:
# Get a summary of the numerical columns
restaurant_df.describe()

| | stars_x | useful | funny | cool | latitude | longitude | stars_y | review_count | is_open |
|---|---|---|---|---|---|---|---|---|---|
| count | 54380.0 | 54380.0 | 54380.0 | 54380.0 | 54380.0 | 54380.0 | 54380.0 | 54380.0 | 54380.0 |
| mean | 3.84498 | 0.890438 | 0.253567 | 0.346396 | 36.050556 | -89.00535 | 3.769796 | 389.177014 | 0.766532 |
| std | 1.352256 | 1.866532 | 1.035998 | 1.073067 | 5.289909 | 14.446695 | 0.67134 | 628.925711 | 0.423041 |
| min | 1.0 | 0.0 | 0.0 | 0.0 | 27.5843 | -120.026076 | 1.0 | 5.0 | 0.0 |
| 25% | 3.0 | 0.0 | 0.0 | 0.0 | 29.967159 | -90.239235 | 3.5 | 61.0 | 1.0 |
| 50% | 4.0 | 0.0 | 0.0 | 0.0 | 38.612534 | -86.252569 | 4.0 | 170.0 | 1.0 |
| 75% | 5.0 | 1.0 | 0.0 | 0.0 | 39.946685 | -75.325252 | 4.0 | 430.0 | 1.0 |
| max | 5.0 | 91.0 | 98.0 | 49.0 | 53.644501 | -74.658572 | 5.0 | 4554.0 | 1.0 |
