## Wake County - Restaurant Food Inspections Analysis

In [1]:
# import pandas, numpy, matplotlib, seaborn 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# importing the requests library
import requests

### Resources
 1. [Restaurants in Wake County Data Info](https://www.arcgis.com/home/item.html?id=124c2187da8c41c59bde04fa67eb2872)
 2. [Wake County Open Data](https://data-wake.opendata.arcgis.com/search?tags=restaurants)
 3. [Food Inspection Violations Data Info](https://data.wakegov.com/datasets/Wake::food-inspection-violations/about)
 4. [Wake County Yelp Initiative](https://ash.harvard.edu/news/wake-county-yelp-initiative)
 5. [Yelp LIVES data](https://www.yelp.com/healthscores/feeds)

In [3]:
# If this import fails, install the helper package first: pip install ipynb
# The first time you run this cell it executes the source notebooks; re-run it if you want to execute them again
# Warning: passing arguments to the imported functions doesn't work reliably, so call the no-arg versions to pull data
from ipynb.fs.full.RestaurantInspectionsData import getFoodInspectionsDf, preprocess_inspections
from ipynb.fs.full.RestaurantsData import getRestaurantsDf, preprocess_restaurants, preprocess_restaurants_yelp
from ipynb.fs.full.RestaurantViolationsData import getViolationsDf, preprocess_violations
from ipynb.fs.full.WeatherData import getWeatherData, preprocess_weatherdata

Using pre-fetched inspections data
(20956, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20956 entries, 0 to 20955
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   OBJECTID     20956 non-null  int64  
 1   HSISID       20956 non-null  int64  
 2   SCORE        20956 non-null  float64
 3   DATE         20956 non-null  object 
 4   DESCRIPTION  16880 non-null  object 
 5   TYPE         20956 non-null  object 
 6   INSPECTOR    20956 non-null  object 
 7   PERMITID     20956 non-null  int64  
dtypes: float64(1), int64(3), object(4)
memory usage: 1.3+ MB


None

{'OBJECTID': 20956,
 'HSISID': 3875,
 'SCORE': 48,
 'DATE': 782,
 'DESCRIPTION': 6030,
 'TYPE': 2,
 'INSPECTOR': 47,
 'PERMITID': 3875}

Using pre-fetched restaurants data
restaurants df shape: (3641, 15)


Unnamed: 0,OBJECTID,HSISID,NAME,ADDRESS1,ADDRESS2,CITY,STATE,POSTALCODE,PHONENUMBER,RESTAURANTOPENDATE,FACILITYTYPE,PERMITID,X,Y,GEOCODESTATUS
0,1891530,4092016487,PEACE CHINA,13220 Strickland RD,Ste 167,RALEIGH,NC,27613,(919) 676-9968,2013-08-14T04:00:00Z,Restaurant,2,-78.725938,35.908783,M
1,1891531,4092018622,Northside Bistro & Cocktails,832 SPRING FOREST RD,,RALEIGH,NC,27609,(919) 890-5225,2021-05-13T04:00:00Z,Restaurant,22,-78.622635,35.866275,M
2,1891532,4092016155,DAILY PLANET CAFE,11 W JONES ST,STE 1509,RALEIGH,NC,27601,(919) 707-8060,2012-04-12T04:00:00Z,Restaurant,26,-78.639431,35.782205,M
3,1891533,4092016161,HIBACHI 88,3416 POOLE RD,,RALEIGH,NC,27610,(919) 231-1688,2012-04-18T04:00:00Z,Restaurant,28,-78.579533,35.767246,M
4,1891534,4092017180,BOND BROTHERS BEER COMPANY,202 E CEDAR ST,,CARY,NC,27511,(919) 459-2670,2016-03-11T05:00:00Z,Restaurant,29,-78.778021,35.787986,M



Display Raw Data Info------------------------------

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3641 entries, 0 to 3640
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   OBJECTID            3641 non-null   int64  
 1   HSISID              3641 non-null   int64  
 2   NAME                3641 non-null   object 
 3   ADDRESS1            3641 non-null   object 
 4   ADDRESS2            485 non-null    object 
 5   CITY                3641 non-null   object 
 6   STATE               3641 non-null   object 
 7   POSTALCODE          3641 non-null   object 
 8   PHONENUMBER         3487 non-null   object 
 9   RESTAURANTOPENDATE  3641 non-null   object 
 10  FACILITYTYPE        3641 non-null   object 
 11  PERMITID            3641 non-null   int64  
 12  X                   3641 non-null   float64
 13  Y                   3641 non-null   float64
 14  GEOCODESTATUS       3641 non-null   object 
dtypes

None


---------------------------------------------------



{'OBJECTID': 3641,
 'HSISID': 3641,
 'NAME': 3507,
 'ADDRESS1': 3164,
 'ADDRESS2': 298,
 'CITY': 45,
 'STATE': 1,
 'POSTALCODE': 565,
 'PHONENUMBER': 3127,
 'RESTAURANTOPENDATE': 2250,
 'FACILITYTYPE': 10,
 'PERMITID': 3641,
 'X': 2154,
 'Y': 2154,
 'GEOCODESTATUS': 3}


Preprocessing--------------------------------------

Dropping columns with more than 25% missing values: Index(['ADDRESS2'], dtype='object')
OBJECTID              0.0
HSISID                0.0
NAME                  0.0
ADDRESS1              0.0
CITY                  0.0
POSTALCODE            0.0
RESTAURANTOPENDATE    0.0
PERMITID              0.0
X                     0.0
Y                     0.0
GEOCODESTATUS         0.0
dtype: float64
(2385, 11)

Display--------------------------------------------



Unnamed: 0,OBJECTID,HSISID,NAME,ADDRESS1,CITY,POSTALCODE,RESTAURANTOPENDATE,PERMITID,X,Y,GEOCODESTATUS
0,1891530,4092016487,PEACE CHINA,13220 Strickland RD,RALEIGH,27613,2013-08-14,2,-78.725938,35.908783,M
1,1891531,4092018622,Northside Bistro & Cocktails,832 SPRING FOREST RD,RALEIGH,27609,2021-05-13,22,-78.622635,35.866275,M
2,1891532,4092016155,DAILY PLANET CAFE,11 W JONES ST,RALEIGH,27601,2012-04-12,26,-78.639431,35.782205,M
3,1891533,4092016161,HIBACHI 88,3416 POOLE RD,RALEIGH,27610,2012-04-18,28,-78.579533,35.767246,M
4,1891534,4092017180,BOND BROTHERS BEER COMPANY,202 E CEDAR ST,CARY,27511,2016-03-11,29,-78.778021,35.787986,M


Fetching restaurant violations data...
violations df shape: (1685520, 18)
Done


Unnamed: 0,OBJECTID,HSISID,INSPECTDATE,CATEGORY,STATECODE,CRITICAL,QUESTIONNO,VIOLATIONCODE,SEVERITY,SHORTDESC,INSPECTEDBY,COMMENTS,POINTVALUE,OBSERVATIONTYPE,VIOLATIONTYPE,CDCRISKFACTOR,CDCDATAITEM,PERMITID
0,188572555,4092015776,2012-12-14T05:00:00Z,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,Christy Klaus,3-201.11-Packaged food shall be labeled as spe...,1.0,Out,VR,,Food shall be obtained from sources that compl...,14516
1,188572556,4092040137,2013-03-18T04:00:00Z,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,Lisa McCoy,Chicken kabobs are not approved to be on this ...,1.0,Out,,,Food shall be obtained from sources that compl...,20186
2,188572557,4092015740,2013-03-19T04:00:00Z,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,Karla Crowder,3-201.11 Provide documentation (receipts) for ...,0.0,In,,,Food shall be obtained from sources that compl...,11367
3,188572558,4092016206,2013-03-27T04:00:00Z,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,Melissa Harrison,Pf - 3-201.11 - Habash Shawerma Spices from Ha...,0.0,Out,CDI,,Food shall be obtained from sources that compl...,577
4,188572559,4092014578,2013-04-23T04:00:00Z,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,Melissa Harrison,Pf - 3-201.11 - Packaged frozen banana popsicl...,0.0,Out,CDI,,Food shall be obtained from sources that compl...,2036


(1685520, 18)


OBJECTID           0.000000
HSISID             0.000000
INSPECTDATE        0.000000
CATEGORY           0.000000
STATECODE          0.000000
CRITICAL           0.046407
QUESTIONNO         0.000000
VIOLATIONCODE      0.000000
SEVERITY           0.046407
SHORTDESC          0.000000
INSPECTEDBY        0.000000
COMMENTS           0.000927
POINTVALUE         0.000000
OBSERVATIONTYPE    0.000000
VIOLATIONTYPE      0.422790
CDCDATAITEM        0.014618
PERMITID           0.000000
dtype: float64

(127309, 17)


Unnamed: 0,OBJECTID,HSISID,INSPECTDATE,CATEGORY,STATECODE,CRITICAL,QUESTIONNO,VIOLATIONCODE,SEVERITY,SHORTDESC,INSPECTEDBY,COMMENTS,POINTVALUE,OBSERVATIONTYPE,VIOLATIONTYPE,CDCDATAITEM,PERMITID
15,188572810,4092017322,2020-07-10,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,Lauren Harden,3-201.11; PIC states that bakery items in disp...,0.0,Out,,Food shall be obtained from sources that compl...,41
26,188572821,4092110158,2019-02-20,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Kaitlyn Yow,3-202.11;,0.0,N/O,,Refrigerated food shall be at a temperature of...,11426
27,188572822,4092014259,2019-09-23,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Laura McNeill,3-202.11; upon arrival the manager had receive...,0.0,Out,,Refrigerated food shall be at a temperature of...,11599
28,188572823,4092014045,2020-10-13,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Ursula Gadomski,3-202.11; Priority; Shredded cabbage was recei...,1.0,Out,CDI,Refrigerated food shall be at a temperature of...,12939
29,188572824,4092050030,2021-01-21,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Laura McNeill,"3-202.11; Ground beef wrap, rice bowl, and min...",1.0,Out,CDI,Refrigerated food shall be at a temperature of...,19


{'OBJECTID': 127309,
 'HSISID': 4534,
 'INSPECTDATE': 793,
 'CATEGORY': 25,
 'STATECODE': 19,
 'CRITICAL': 3,
 'QUESTIONNO': 56,
 'VIOLATIONCODE': 323,
 'SEVERITY': 4,
 'SHORTDESC': 92,
 'INSPECTEDBY': 51,
 'COMMENTS': 117913,
 'POINTVALUE': 7,
 'OBSERVATIONTYPE': 6,
 'VIOLATIONTYPE': 4,
 'CDCDATAITEM': 273,
 'PERMITID': 4535}

[{'elevation': 113.1, 'mindate': '2007-08-28', 'maxdate': '2021-11-04', 'latitude': 35.969613, 'name': 'RALEIGH 10.3 N, NC US', 'datacoverage': 0.7617, 'id': 'GHCND:US1NCWK0001', 'elevationUnit': 'METERS', 'longitude': -78.688719}, {'elevation': 110, 'mindate': '2007-09-18', 'maxdate': '2021-11-04', 'latitude': 35.805725, 'name': 'RALEIGH 1.5 SW, NC US', 'datacoverage': 0.9372, 'id': 'GHCND:US1NCWK0002', 'elevationUnit': 'METERS', 'longitude': -78.675888}, {'elevation': 73.2, 'mindate': '2007-08-25', 'maxdate': '2021-10-30', 'latitude': 35.814267, 'name': 'RALEIGH 6.2 E, NC US', 'datacoverage': 0.9635, 'id': 'GHCND:US1NCWK0003', 'elevationUnit': 'METERS', 'longitude': -78.547817}, {'elevation': 125, 'mindate': '2007-08-30', 'maxdate': '2021-11-03', 'latitude': 35.71335, 'name': 'APEX 3.4 ESE, NC US', 'datacoverage': 0.8807, 'id': 'GHCND:US1NCWK0004', 'elevationUnit': 'METERS', 'longitude': -78.7854}, {'elevation': 123.1, 'mindate': '2007-09-04', 'maxdate': '2021-11-04', 'latitude': 35.

[{'date': '2019-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 66.0}, {'date': '2019-01-02T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 52.0}, {'date': '2019-01-03T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 51.0}, {'date': '2019-01-04T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 50.0}, {'date': '2019-01-05T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 54.0}, {'date': '2019-01-06T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 52.0}, {'date': '2019-01-07T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 46.0}, {'date': '2019-01-08T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 53.0}, {'date': '2019-

[{'date': '2020-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 45.0}, {'date': '2020-01-02T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 45.0}, {'date': '2020-01-03T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 50.0}, {'date': '2020-01-04T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 58.0}, {'date': '2020-01-05T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 45.0}, {'date': '2020-01-06T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 44.0}, {'date': '2020-01-07T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 42.0}, {'date': '2020-01-08T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 42.0}, {'date': '2020-

[{'date': '2021-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 47.0}, {'date': '2021-01-02T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 47.0}, {'date': '2021-01-03T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 51.0}, {'date': '2021-01-04T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 44.0}, {'date': '2021-01-05T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 37.0}, {'date': '2021-01-06T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 42.0}, {'date': '2021-01-07T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 37.0}, {'date': '2021-01-08T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00013722', 'attributes': 'H,,S,', 'value': 38.0}, {'date': '2021-

datatype
TAVG    0.0
dtype: float64

(1025, 1)


datatype,TAVG
date,Unnamed: 1_level_1
2019-01-01,66.0
2019-01-02,52.0
2019-01-03,51.0
2019-01-04,50.0
2019-01-05,54.0


Using pre-fetched weather data
weather df shape: (1025, 5)


datatype,TAVG
date,Unnamed: 1_level_1
2019-01-01,66.0
2019-01-02,52.0
2019-01-03,51.0
2019-01-04,50.0
2019-01-05,54.0


datatype,TAVG
date,Unnamed: 1_level_1
2021-10-17,58.0
2021-10-18,56.0
2021-10-19,57.0
2021-10-20,60.0
2021-10-21,62.0


## Fetch Inspections

In [41]:
food_inspections_raw = getFoodInspectionsDf()
inspections = preprocess_inspections(food_inspections_raw.copy())
inspections.head()

Unnamed: 0,OBJECTID,HSISID,SCORE,DATE,DESCRIPTION,TYPE,INSPECTOR,PERMITID
0,21950255,4092017542,93.0,2019-04-04,"*NOTICE* AS OF JANUARY 1, 2019, THE NC FOOD CO...",Inspection,Joanne Rutkofske,33
1,21950256,4092017542,93.5,2019-10-07,Follow-Up: 10/17/2019,Inspection,Naterra McQueen,33
2,21950257,4092017542,92.5,2020-05-19,"*NOTICE* AS OF JANUARY 1, 2019, THE NC FOOD CO...",Inspection,Naterra McQueen,33
3,21950258,4092017542,94.0,2020-10-09,PIC cannot sign due to COVID-19 concerns.,Inspection,Nicole Millard,33
4,21950259,4092017542,94.0,2021-03-24,PIC cannot sign due to COVID-19 concerns.,Inspection,Nicole Millard,33


## Fetch Restaurants

In [42]:
restaurants_raw = getRestaurantsDf()
restaurants = preprocess_restaurants(restaurants_raw.copy())
restaurants.head()

Using pre-fetched restaurants data
restaurants df shape: (3641, 15)
Dropping columns with more than 25% missing values: Index(['ADDRESS2'], dtype='object')
OBJECTID              0.0
HSISID                0.0
NAME                  0.0
ADDRESS1              0.0
CITY                  0.0
POSTALCODE            0.0
RESTAURANTOPENDATE    0.0
PERMITID              0.0
X                     0.0
Y                     0.0
GEOCODESTATUS         0.0
dtype: float64


Unnamed: 0,OBJECTID,HSISID,NAME,ADDRESS1,CITY,POSTALCODE,RESTAURANTOPENDATE,PERMITID,X,Y,GEOCODESTATUS
0,1891530,4092016487,PEACE CHINA,13220 Strickland RD,RALEIGH,27613,2013-08-14,2,-78.725938,35.908783,M
1,1891531,4092018622,Northside Bistro & Cocktails,832 SPRING FOREST RD,RALEIGH,27609,2021-05-13,22,-78.622635,35.866275,M
2,1891532,4092016155,DAILY PLANET CAFE,11 W JONES ST,RALEIGH,27601,2012-04-12,26,-78.639431,35.782205,M
3,1891533,4092016161,HIBACHI 88,3416 POOLE RD,RALEIGH,27610,2012-04-18,28,-78.579533,35.767246,M
4,1891534,4092017180,BOND BROTHERS BEER COMPANY,202 E CEDAR ST,CARY,27511,2016-03-11,29,-78.778021,35.787986,M
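The log above reports that columns with more than 25% missing values are dropped (here only `ADDRESS2`). The body of `preprocess_restaurants` lives in another notebook and is not shown, so the following is only a minimal sketch of how such a rule could be written. The function name `drop_sparse_columns`, the `max_missing` parameter, and the toy rows are assumptions for illustration.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.25) -> pd.DataFrame:
    """Drop every column whose fraction of missing values exceeds max_missing.

    Hypothetical helper: the real preprocess_restaurants may differ.
    """
    # isna().mean() gives the fraction of missing values per column
    missing_frac = df.isna().mean()
    to_drop = missing_frac[missing_frac > max_missing].index
    print(f"Dropping columns with more than {max_missing:.0%} missing values: {list(to_drop)}")
    return df.drop(columns=to_drop)


# Toy frame (made-up values): ADDRESS2 is 75% missing, PHONENUMBER is exactly 25% missing
toy = pd.DataFrame({
    "NAME": ["A", "B", "C", "D"],
    "ADDRESS2": [None, None, "Ste 1", None],
    "PHONENUMBER": ["1", None, "3", "4"],
})
cleaned = drop_sparse_columns(toy)
```

Because the comparison is strict (`>`), a column sitting exactly at the threshold is kept; whether the project wants `>` or `>=` is a choice to confirm against the real preprocessing code.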


## Fetch Violations

In [43]:
violations_raw = getViolationsDf()
violations = preprocess_violations(violations_raw.copy())
violations.head()

Using pre-fetched violations data
violations df shape: (1681260, 18)


Unnamed: 0,OBJECTID,HSISID,INSPECTDATE,CATEGORY,STATECODE,CRITICAL,QUESTIONNO,VIOLATIONCODE,SEVERITY,SHORTDESC,INSPECTEDBY,COMMENTS,POINTVALUE,OBSERVATIONTYPE,VIOLATIONTYPE,CDCDATAITEM,PERMITID
26,186468705,4092025252,2020-01-24,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,David Adcock,3-201.11; Some of the lamb was purchased from ...,1.0,Out,,Food shall be obtained from sources that compl...,18067
27,186468706,4092030492,2021-06-14,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,David Adcock,3-201.11;(B)Employees stated that when they ne...,1.0,Out,,Food shall be obtained from sources that compl...,15779
36,186468715,4092110158,2019-02-20,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Kaitlyn Yow,3-202.11;,0.0,N/O,,Refrigerated food shall be at a temperature of...,11926
37,186468716,4092010218,2019-05-24,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Jackson Hooton,3-202.11; Priority; Box of diced tomatoes was ...,0.0,Out,CDI,Refrigerated food shall be at a temperature of...,4038
38,186468717,4092014259,2019-09-23,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Laura McNeill,3-202.11; upon arrival the manager had receive...,0.0,Out,,Refrigerated food shall be at a temperature of...,14777
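The raw violations feed stores `INSPECTDATE` as ISO timestamps such as `2012-12-14T05:00:00Z`, while the preprocessed rows above show plain dates from 2019 onward. `preprocess_violations` is not shown in this notebook, so this is a hedged sketch of one way to get from the first form to the second. The helper name and the `2019-01-01` cutoff are assumptions, not the confirmed logic.

```python
import pandas as pd


def parse_and_filter_dates(df: pd.DataFrame,
                           date_col: str = "INSPECTDATE",
                           start: str = "2019-01-01") -> pd.DataFrame:
    """Parse ISO timestamps to plain dates and keep rows on or after `start`.

    Hypothetical helper; the cutoff date is an assumption.
    """
    out = df.copy()
    # Parse as UTC, drop the timezone, then truncate to midnight (date only)
    parsed = pd.to_datetime(out[date_col], utc=True)
    out[date_col] = parsed.dt.tz_localize(None).dt.normalize()
    return out[out[date_col] >= pd.Timestamp(start)]


# Toy rows using the timestamp format seen in the raw feed
toy = pd.DataFrame({
    "HSISID": [4092015776, 4092110158, 4092050030],
    "INSPECTDATE": ["2012-12-14T05:00:00Z", "2019-02-20T05:00:00Z", "2021-01-21T05:00:00Z"],
})
recent = parse_and_filter_dates(toy)
```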


## Fetch Weather Data

In [52]:
weatherdata_raw = getWeatherData()
weatherdata = preprocess_weatherdata(weatherdata_raw.copy())
weatherdata.head()

Using pre-fetched weather data
weather df shape: (1025, 5)


datatype,TAVG
date,Unnamed: 1_level_1
2019-01-01,66.0
2019-01-02,52.0
2019-01-03,51.0
2019-01-04,50.0
2019-01-05,54.0
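The weather table above is indexed by `date` with one column per `datatype`, which is the shape a pivot of the raw weather records would produce. `preprocess_weatherdata` is not shown here, so the sketch below is an assumption about how that reshaping could be done. The three records reuse the first values from the raw output printed earlier.

```python
import pandas as pd

# Records in the same shape as the raw weather output printed earlier
records = [
    {"date": "2019-01-01T00:00:00", "datatype": "TAVG",
     "station": "GHCND:USW00013722", "attributes": "H,,S,", "value": 66.0},
    {"date": "2019-01-02T00:00:00", "datatype": "TAVG",
     "station": "GHCND:USW00013722", "attributes": "H,,S,", "value": 52.0},
    {"date": "2019-01-03T00:00:00", "datatype": "TAVG",
     "station": "GHCND:USW00013722", "attributes": "H,,S,", "value": 51.0},
]

raw = pd.DataFrame(records)
raw["date"] = pd.to_datetime(raw["date"])

# One row per date, one column per datatype (only TAVG in this sample);
# pivot_table averages duplicates, which matters if several stations report
weather = raw.pivot_table(index="date", columns="datatype", values="value")
```

With this layout, joining weather onto inspections later is a plain merge on the inspection date.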


## Fetch Yelp Ratings Data

In [4]:
# Get the restaurant list with all relevant fields, keeping the phone number after preprocessing
restaurants_yelp_raw = getRestaurantsDf()
restaurants_yelp = preprocess_restaurants_yelp(restaurants_yelp_raw.copy())
restaurants_yelp.head()


Using pre-fetched restaurants data
restaurants df shape: (3641, 15)
Dropping columns with more than 25% missing values: Index(['ADDRESS2'], dtype='object')
OBJECTID              0.000000
HSISID                0.000000
NAME                  0.000000
ADDRESS1              0.000000
CITY                  0.000000
POSTALCODE            0.000000
PHONENUMBER           0.038155
RESTAURANTOPENDATE    0.000000
PERMITID              0.000000
X                     0.000000
Y                     0.000000
GEOCODESTATUS         0.000000
dtype: float64


Unnamed: 0,OBJECTID,HSISID,NAME,ADDRESS1,CITY,POSTALCODE,PHONENUMBER,RESTAURANTOPENDATE,PERMITID,X,Y,GEOCODESTATUS
0,1891530,4092016487,PEACE CHINA,13220 Strickland RD,RALEIGH,27613,(919) 676-9968,2013-08-14,2,-78.725938,35.908783,M
1,1891531,4092018622,Northside Bistro & Cocktails,832 SPRING FOREST RD,RALEIGH,27609,(919) 890-5225,2021-05-13,22,-78.622635,35.866275,M
2,1891532,4092016155,DAILY PLANET CAFE,11 W JONES ST,RALEIGH,27601,(919) 707-8060,2012-04-12,26,-78.639431,35.782205,M
3,1891533,4092016161,HIBACHI 88,3416 POOLE RD,RALEIGH,27610,(919) 231-1688,2012-04-18,28,-78.579533,35.767246,M
4,1891534,4092017180,BOND BROTHERS BEER COMPANY,202 E CEDAR ST,CARY,27511,(919) 459-2670,2016-03-11,29,-78.778021,35.787986,M


In [5]:
#write to csv
restaurants_yelp.to_csv('restaurants_yelp.csv')

In [6]:
#read in processed yelp data
yelpmatch_phone = pd.read_csv('yelpmatch_phone.csv')
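The file name `yelpmatch_phone.csv` suggests the Yelp match was done by phone number outside this notebook. That step is not shown, so the following is only a sketch of the normalization such a match would likely need: turning `(919) 676-9968` into a digits-only form with a country code. The function name `to_e164_us` and the assumption that all numbers are US numbers are mine.

```python
import re


def to_e164_us(phone):
    """Convert a US phone string like '(919) 676-9968' to '+19196769968'.

    Hypothetical helper. Returns None for missing or malformed numbers.
    """
    if phone is None:
        return None
    # Keep digits only; str() also handles NaN, which yields no digits
    digits = re.sub(r"\D", "", str(phone))
    # Accept an optional leading country code of 1
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return f"+1{digits}" if len(digits) == 10 else None


example = to_e164_us("(919) 676-9968")
```

It could be applied with `restaurants_yelp["PHONENUMBER"].apply(to_e164_us)`. Rows that come back as `None` (about 4% of `PHONENUMBER` is missing according to the output above) would need a different matching key, such as name and address.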

In [7]:
yelpmatch_phone.head()

Unnamed: 0.1,Unnamed: 0,OBJECTID,HSISID,NAME,ADDRESS1,CITY,POSTALCODE,PHONENUMBER,RESTAURANTOPENDATE,PERMITID,X,Y,GEOCODESTATUS,id,review_count,categories,rating,price
0,0,1891530,4092016487,PEACE CHINA,13220 Strickland RD,RALEIGH,27613,(919) 676-9968,2013-08-14,2,-78.725938,35.908783,M,RjELMSrh2DuTBJQ4YpzXUA,63.0,"[{'alias': 'chinese', 'title': 'Chinese'}]",3.5,$
1,4,1891531,4092018622,Northside Bistro & Cocktails,832 SPRING FOREST RD,RALEIGH,27609,(919) 890-5225,2021-05-13,22,-78.622635,35.866275,M,wDZG-Ry6IcC_QITBLBPHxQ,22.0,"[{'alias': 'newamerican', 'title': 'American (...",4.5,
2,9,1891532,4092016155,DAILY PLANET CAFE,11 W JONES ST,RALEIGH,27601,(919) 707-8060,2012-04-12,26,-78.639431,35.782205,M,-qCrGWYePySXmcngRhal4Q,89.0,"[{'alias': 'cafes', 'title': 'Cafes'}, {'alias...",4.0,$$
3,15,1891533,4092016161,HIBACHI 88,3416 POOLE RD,RALEIGH,27610,(919) 231-1688,2012-04-18,28,-78.579533,35.767246,M,21FAnridQkQCJMM_PfyfcA,46.0,"[{'alias': 'japanese', 'title': 'Japanese'}, {...",3.5,$
4,21,1891534,4092017180,BOND BROTHERS BEER COMPANY,202 E CEDAR ST,CARY,27511,(919) 459-2670,2016-03-11,29,-78.778021,35.787986,M,eG8mLFHm6BQ9GHsemO-ShQ,179.0,"[{'alias': 'beergardens', 'title': 'Beer Garde...",4.0,$$
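In the matched table, `categories` comes back from the CSV as a string that looks like a Python list of dicts, and `price` is a run of dollar signs. Before modeling, both probably need to be turned into usable features. A minimal sketch, assuming the string format shown above holds for every row; the column names `category_titles` and `price_level` are made up for illustration.

```python
import ast

import pandas as pd


def category_titles(raw):
    """Parse "[{'alias': 'chinese', 'title': 'Chinese'}]" into ['Chinese'].

    Hypothetical helper. Missing values become an empty list.
    """
    if not isinstance(raw, str):
        return []
    # literal_eval safely parses the Python-literal string (no code execution)
    return [entry["title"] for entry in ast.literal_eval(raw)]


# Toy rows mirroring the format of the matched table
toy = pd.DataFrame({
    "categories": ["[{'alias': 'chinese', 'title': 'Chinese'}]", None],
    "price": ["$", None],
})
toy["category_titles"] = toy["categories"].apply(category_titles)
# Number of dollar signs as a numeric price level; missing stays missing
toy["price_level"] = toy["price"].str.len()
```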


## Next Steps (we have T-minus 2 weeks!!!! FREAK OUT!!!!)
0. Pull the police incidents/crime data and possibly census-tract income by location - Hearsch & Shyamal
1. Make sure Yelp data is sourced in this main notebook (minimal data points: ratings, dollar signs, type/cuisine, review metadata, other features at Ms. Park's discretion)
2. Clean & validate the data as part of data prep and EDA. Join tables by inspection. We want historical data per inspection, and then we want to predict risk scores so we can flag restaurants at high risk in future inspections. Note that although we have inspection data by date, we don't really want to do time series forecasting, bc time series forecasting sucks!
3. Deal with missing values and encode variables 
4. Feature engineering 
5. Baseline model
6. More complicated model
7. Datasheets for datasets (ask Jon in the next class whether we need a datasheet for every table or for every source) - Christine
8. Ethical checklist - Hearsch
9. Visualizations and storytelling!
10. Get started on a slideshow (FUN PART)
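
For step 2, "join tables by inspection" could look roughly like the sketch below: collapse violations to one row per (`HSISID`, `INSPECTDATE`), left-join that onto inspections, attach restaurant attributes, and add the previous score as a simple history feature. The column names come from the tables above; the toy values, the aggregate names `n_violations` and `points_lost`, and the `prev_score` feature are assumptions, not a settled design.

```python
import pandas as pd

# Toy stand-ins for the three preprocessed tables (values are made up)
inspections = pd.DataFrame({
    "HSISID": [1, 1, 2],
    "DATE": pd.to_datetime(["2019-04-04", "2019-10-07", "2020-05-19"]),
    "SCORE": [93.0, 93.5, 92.5],
})
violations = pd.DataFrame({
    "HSISID": [1, 1, 2],
    "INSPECTDATE": pd.to_datetime(["2019-04-04", "2019-04-04", "2020-05-19"]),
    "POINTVALUE": [1.0, 0.5, 2.0],
})
restaurants = pd.DataFrame({"HSISID": [1, 2], "CITY": ["RALEIGH", "CARY"]})

# One row per inspection: violation count and total points deducted
per_inspection = violations.groupby(["HSISID", "INSPECTDATE"], as_index=False).agg(
    n_violations=("POINTVALUE", "size"),
    points_lost=("POINTVALUE", "sum"),
)

# Left joins keep every inspection, even those with no recorded violations
joined = (
    inspections
    .merge(per_inspection, how="left",
           left_on=["HSISID", "DATE"], right_on=["HSISID", "INSPECTDATE"])
    .drop(columns="INSPECTDATE")
    .merge(restaurants, how="left", on="HSISID")
)
joined[["n_violations", "points_lost"]] = joined[["n_violations", "points_lost"]].fillna(0)

# History feature: the same restaurant's score at its previous inspection
joined = joined.sort_values(["HSISID", "DATE"])
joined["prev_score"] = joined.groupby("HSISID")["SCORE"].shift(1)
```

Using lagged features like `prev_score` keeps the historical signal without framing the problem as time series forecasting. Each row stays an independent inspection for a standard supervised model.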