# Exploratory Data Analysis


Check the data types and look for null, incomplete, and outlier values.

In [1]:
import pandas as pd 

df = pd.read_csv('/Volumes/FLASHDRIVE/restaurant_data_6000.csv')

In [2]:
df.head()


,Unnamed: 0,business_status,name,place_id,price_level,rating,user_ratings_total
0,0,OPERATIONAL,La Tonalteca - Scranton PA,ChIJJ_Dvlq_ZxIkRkmPo2AAn-78,2.0,4.3,1043.0
1,1,OPERATIONAL,Applebee's Grill + Bar,ChIJeeoml6_ZxIkR4gpya51wqFQ,2.0,4.2,1027.0
2,2,OPERATIONAL,Perkins Restaurant & Bakery,ChIJx-L7LrvZxIkRErNgliMbcPM,2.0,4.1,1133.0
3,3,OPERATIONAL,Sun Gourmet,ChIJ25Eel6_ZxIkRwFi18_6mjwc,,4.3,51.0
4,4,OPERATIONAL,Buffalo Wild Wings,ChIJJ_JfmKXZxIkRdHRRNNTMotQ,2.0,3.7,792.0


In [3]:
df.describe()

,Unnamed: 0,price_level,rating,user_ratings_total
count,27663.0,20563.0,26765.0,26765.0
mean,13831.0,1.524583,4.193241,363.376686
std,7985.764585,0.580556,0.461453,592.919213
min,0.0,1.0,1.0,1.0
25%,6915.5,1.0,4.0,82.0
50%,13831.0,1.0,4.3,191.0
75%,20746.5,2.0,4.5,423.0
max,27662.0,4.0,5.0,17006.0
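The gap between the 75th percentile (423) and the maximum (17006) of user_ratings_total suggests a long right tail. One common way to put a number on "outlier" is Tukey's 1.5 × IQR rule. This is an assumption for illustration, not a method this notebook uses, and on heavily skewed data it tends to flag many legitimate rows. The sketch below takes the quartiles from the summary above and applies the rule to a few review counts that appear in this notebook's outputs.

```python
import pandas as pd

# Quartiles of user_ratings_total copied from the df.describe() output above.
q1, q3 = 82.0, 423.0
iqr = q3 - q1                    # interquartile range: 341.0
upper_fence = q3 + 1.5 * iqr     # 423 + 511.5 = 934.5

# A handful of rows taken from this notebook's outputs (a stand-in for the full data).
sample = pd.DataFrame(
    {
        "name": [
            "La Tonalteca - Scranton PA",
            "Sun Gourmet",
            "Buffalo Wild Wings",
            "Westfield Garden State Plaza",
        ],
        "user_ratings_total": [1043.0, 51.0, 792.0, 17006.0],
    }
)

# Flag rows whose review count lies above the upper fence.
sample["is_high_outlier"] = sample["user_ratings_total"] > upper_fence
print(sample)
```

Under this rule an ordinary chain restaurant with about a thousand reviews is flagged alongside a shopping mall, so the fence works better as a prompt for manual inspection than as an automatic filter.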


In [4]:
df = df.drop(columns='Unnamed: 0')

In [5]:
df.isna().sum()

business_status         20
name                     0
place_id                 0
price_level           7100
rating                 898
user_ratings_total     898
dtype: int64
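The checklist at the top of this section also mentions data types, which the cells so far do not inspect directly. A minimal sketch of that check, built from two rows copied from the `df.head()` output rather than from the full CSV:

```python
import pandas as pd

# Two rows copied from the df.head() output; a stand-in for the full CSV.
sample = pd.DataFrame(
    {
        "business_status": ["OPERATIONAL", "OPERATIONAL"],
        "name": ["La Tonalteca - Scranton PA", "Sun Gourmet"],
        "place_id": ["ChIJJ_Dvlq_ZxIkRkmPo2AAn-78", "ChIJ25Eel6_ZxIkRwFi18_6mjwc"],
        "price_level": [2.0, None],  # missing value becomes NaN
        "rating": [4.3, 4.3],
        "user_ratings_total": [1043.0, 51.0],
    }
)

# One dtype per column; columns that contain NaN are stored as float64.
print(sample.dtypes)

# Names of the numeric columns, i.e. the ones df.describe() summarizes by default.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

On the real data, `df.dtypes` or `df.info()` gives the same information. The trailing `.0` on price_level and user_ratings_total in the outputs above is consistent with float columns that hold missing values.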

In [6]:
df.head()

,business_status,name,place_id,price_level,rating,user_ratings_total
0,OPERATIONAL,La Tonalteca - Scranton PA,ChIJJ_Dvlq_ZxIkRkmPo2AAn-78,2.0,4.3,1043.0
1,OPERATIONAL,Applebee's Grill + Bar,ChIJeeoml6_ZxIkR4gpya51wqFQ,2.0,4.2,1027.0
2,OPERATIONAL,Perkins Restaurant & Bakery,ChIJx-L7LrvZxIkRErNgliMbcPM,2.0,4.1,1133.0
3,OPERATIONAL,Sun Gourmet,ChIJ25Eel6_ZxIkRwFi18_6mjwc,,4.3,51.0
4,OPERATIONAL,Buffalo Wild Wings,ChIJJ_JfmKXZxIkRdHRRNNTMotQ,2.0,3.7,792.0


In [7]:
df.sort_values(by=['user_ratings_total'],ascending=False)

,business_status,name,place_id,price_level,rating,user_ratings_total
6239,OPERATIONAL,Westfield Garden State Plaza,ChIJ51pKtVL6wokRcgTPdEqH1ME,,4.5,17006.0
6780,OPERATIONAL,Westfield Garden State Plaza,ChIJ51pKtVL6wokRcgTPdEqH1ME,,4.5,17006.0
11427,OPERATIONAL,Hard Rock Cafe,ChIJh3tl5lRYwokRtY1QuaZADu0,2.0,4.3,16687.0
16154,OPERATIONAL,Kings Plaza Shopping Center,ChIJ30o8q0NDwokRBrGdmsRwc_s,,4.3,16006.0
14101,OPERATIONAL,Resorts World Casino New York City,ChIJIdDbQENnwokRbCypkvjzglk,,3.7,12364.0
...,...,...,...,...,...,...
27508,OPERATIONAL,SUBJECT PROPERTY,ChIJ3yld_HexxokRkruY5QzUR90,,,
27607,OPERATIONAL,Maite Mendoza,ChIJVXb_dXBNwYkRUxgK8Qu7vrM,,,
27620,OPERATIONAL,Cool Hummus,ChIJcYvZYrNNwYkR80pmWcINUi4,,,
27622,OPERATIONAL,King's China,ChIJpWLtIe5NwYkRWyJIro3ltHQ,,,


Some establishments, such as plazas, shopping centers, and resorts, have been categorized by Google's API
as restaurants. Either include an extra parameter in the Google Places API request to ensure the establishment's sole
purpose is food retail, or filter these rows out afterwards. Users cannot make a decision without knowing a restaurant's
price level, so deleting all establishments with NaN values in the price_level column should filter out non-restaurant establishments.

In [8]:
df = df.dropna(subset=['price_level'])
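To see what this filter keeps and removes, here is a small sketch that uses only rows copied from the outputs earlier in this notebook. It is a spot check on a hand-picked sample, not an evaluation of the filter on the full dataset.

```python
import pandas as pd

# Rows copied from the df.head() and sort_values() outputs in this notebook.
sample = pd.DataFrame(
    {
        "name": [
            "Westfield Garden State Plaza",
            "Hard Rock Cafe",
            "Kings Plaza Shopping Center",
            "Resorts World Casino New York City",
            "Buffalo Wild Wings",
            "Sun Gourmet",
        ],
        "price_level": [None, 2.0, None, None, 2.0, None],
        "user_ratings_total": [17006.0, 16687.0, 16006.0, 12364.0, 792.0, 51.0],
    }
)

# Same filter as the cell above, applied to the sample.
kept = sample.dropna(subset=["price_level"])
dropped = sample[sample["price_level"].isna()]

print("kept:   ", kept["name"].tolist())
print("dropped:", dropped["name"].tolist())
```

The mall, shopping center, and casino rows are removed as intended. Sun Gourmet is removed too, even though it has a rating and 51 reviews, so the filter may also discard some genuine restaurants that simply have no price level recorded.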

In [9]:
df.describe()

,price_level,rating,user_ratings_total
count,20563.0,20528.0,20528.0
mean,1.524583,4.169242,434.964244
std,0.580556,0.400526,576.144997
min,1.0,1.0,1.0
25%,1.0,4.0,130.0
50%,1.0,4.2,256.0
75%,2.0,4.4,516.25
max,4.0,5.0,16687.0


In [10]:
df.isna().sum()

business_status        7
name                   0
place_id               0
price_level            0
rating                35
user_ratings_total    35
dtype: int64

Restaurants without a rating or a user rating total could also be removed, but these establishments
may simply lack a digital presence. If so, they may rely solely on word of mouth for marketing, which could
mean either a successful or a disastrous business.

In [11]:
# Rows with no rating (missing values rather than statistical outliers)
unrated = df.loc[df['rating'].isna()]
unrated.sort_values(by=['price_level'], ascending=False)

,business_status,name,place_id,price_level,rating,user_ratings_total
12460,OPERATIONAL,Market Plate,ChIJc_1NKbWtw4kRIhunlTajisg,3.0,,
25603,OPERATIONAL,Market Burger,ChIJybEMSe2HwYkRmyRkyKjaiAQ,3.0,,
5362,OPERATIONAL,Market Plate,ChIJl6qqx6nkwokRnLqDrtnfz-Y,3.0,,
9721,OPERATIONAL,West Orange Sandwich,ChIJ82Dq0-iqw4kRN9YCFvTVqKs,3.0,,
9730,OPERATIONAL,West Orange Sandwich,ChIJ82Dq0-iqw4kRN9YCFvTVqKs,3.0,,
12974,OPERATIONAL,Wing Street,ChIJ2c1x8JRtxIkRZ3Un0eZFRHY,2.0,,
22496,OPERATIONAL,Olive Garden Italian Restaurant,ChIJxS-WiXQvwokR90pgfwLf3rs,2.0,,
23386,OPERATIONAL,Vault Taproom,ChIJOziEo-ZXwYkRe5bEtWidHbI,2.0,,
23406,OPERATIONAL,Vault Taproom,ChIJOziEo-ZXwYkRe5bEtWidHbI,2.0,,
2825,OPERATIONAL,Mumbai Express,ChIJCzSrAjTDwokRNgOkAjYC05E,2.0,,
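
Since removing unrated restaurants might discard viable businesses, one alternative is to keep them and mark them with a flag so later steps can treat them separately. The sketch below assumes a hypothetical `has_rating` column (not part of the original data) and uses a few rows copied from the outputs in this notebook.

```python
import pandas as pd

# Rows copied from this notebook's outputs; a stand-in for the full data.
rows = pd.DataFrame(
    {
        "name": ["Market Plate", "Olive Garden Italian Restaurant", "Hard Rock Cafe"],
        "price_level": [3.0, 2.0, 2.0],
        "rating": [None, None, 4.3],
        "user_ratings_total": [None, None, 16687.0],
    }
)

# Hypothetical helper column: True when the restaurant has a rating.
rows["has_rating"] = rows["rating"].notna()

# The unrated rows stay in the data and can be selected when needed.
unrated_rows = rows.loc[~rows["has_rating"]]
print(unrated_rows[["name", "price_level"]])
```

Keeping the rows and filtering on the flag leaves the decision about unrated restaurants to whichever analysis step actually needs ratings.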
