## Wake County - Restaurant Food Inspections Analysis

In [51]:
# import pandas, numpy, matplotlib, seaborn 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# importing the requests library
import requests

In [52]:
%pip install ipynb

Note: you may need to restart the kernel to use updated packages.


### Resources
 1. [Restaurants in Wake County Data Info](https://www.arcgis.com/home/item.html?id=124c2187da8c41c59bde04fa67eb2872)
 2. [Wake County Open Data](https://data-wake.opendata.arcgis.com/search?tags=restaurants)
 3. [Food Inspection Violations Data Info](https://data.wakegov.com/datasets/Wake::food-inspection-violations/about)
 4. [Wake County Yelp Initiative](https://ash.harvard.edu/news/wake-county-yelp-initiative)
 5. [Yelp LIVES data](https://www.yelp.com/healthscores/feeds)

In [53]:
# Importing through ipynb executes the source notebooks the first time this cell runs; re-run the cell to re-import.
# Warning: passing arguments to the imported functions does not work, so call the no-argument fetch functions.
from ipynb.fs.full.RestaurantInspectionsData import getFoodInspectionsDf, preprocess_inspections
from ipynb.fs.full.WeatherData import getWeatherData, preprocess_weatherdata
from ipynb.fs.full.YelpData import fetchYelpDataByPhone, preprocess_yelpdata
from ipynb.fs.full.CrimeData import getCrimeDataDf, preprocess_crimedata

## Fetch Inspections

In [54]:
food_inspections_raw = getFoodInspectionsDf()
inspections = preprocess_inspections(food_inspections_raw.copy())
inspections.head()

Using pre-fetched inspections data


Unnamed: 0,OBJECTID,HSISID,SCORE,DATE,DESCRIPTION,TYPE,INSPECTOR,PERMITID
0,21950255,4092017542,93.0,2019-04-04,"*NOTICE* AS OF JANUARY 1, 2019, THE NC FOOD CO...",Inspection,Joanne Rutkofske,33
1,21950256,4092017542,93.5,2019-10-07,Follow-Up: 10/17/2019,Inspection,Naterra McQueen,33
2,21950257,4092017542,92.5,2020-05-19,"*NOTICE* AS OF JANUARY 1, 2019, THE NC FOOD CO...",Inspection,Naterra McQueen,33
3,21950258,4092017542,94.0,2020-10-09,PIC cannot sign due to COVID-19 concerns.,Inspection,Nicole Millard,33
4,21950259,4092017542,94.0,2021-03-24,PIC cannot sign due to COVID-19 concerns.,Inspection,Nicole Millard,33


## Fetch Restaurants

In [55]:
restaurants = pd.read_csv('preprocessed_restaurants.csv', dtype={'PHONENUMBER': str})
restaurants.head()

Unnamed: 0,OBJECTID,HSISID,NAME,ADDRESS1,CITY,POSTALCODE,PHONENUMBER,RESTAURANTOPENDATE,PERMITID,X,Y,GEOCODESTATUS
0,1891530,4092016487,PEACE CHINA,13220 Strickland RD,RALEIGH,27613,19196769968,2013-08-14,2,-78.725938,35.908783,M
1,1891531,4092018622,Northside Bistro & Cocktails,832 SPRING FOREST RD,RALEIGH,27609,19198905225,2021-05-13,22,-78.622635,35.866275,M
2,1891532,4092016155,DAILY PLANET CAFE,11 W JONES ST,RALEIGH,27601,19197078060,2012-04-12,26,-78.639431,35.782205,M
3,1891533,4092016161,HIBACHI 88,3416 POOLE RD,RALEIGH,27610,19192311688,2012-04-18,28,-78.579533,35.767246,M
4,1891534,4092017180,BOND BROTHERS BEER COMPANY,202 E CEDAR ST,CARY,27511,19194592670,2016-03-11,29,-78.778021,35.787986,M


## Fetch Violations

In [56]:
violations = pd.read_csv('preprocessed_violations.csv')
violations.head()

Unnamed: 0,OBJECTID,HSISID,INSPECTDATE,CATEGORY,STATECODE,CRITICAL,QUESTIONNO,VIOLATIONCODE,SEVERITY,SHORTDESC,INSPECTEDBY,COMMENTS,POINTVALUE,OBSERVATIONTYPE,VIOLATIONTYPE,CDCDATAITEM,PERMITID
0,188572810,4092017322,2020-07-10,Approved Source,".2653,.2655",,9,3-201.11,,Food obtained from approved source,Lauren Harden,3-201.11; PIC states that bakery items in disp...,0.0,Out,,Food shall be obtained from sources that compl...,41
1,188572821,4092110158,2019-02-20,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Kaitlyn Yow,3-202.11;,0.0,N/O,,Refrigerated food shall be at a temperature of...,11426
2,188572822,4092014259,2019-09-23,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Laura McNeill,3-202.11; upon arrival the manager had receive...,0.0,Out,,Refrigerated food shall be at a temperature of...,11599
3,188572823,4092014045,2020-10-13,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Ursula Gadomski,3-202.11; Priority; Shredded cabbage was recei...,1.0,Out,CDI,Refrigerated food shall be at a temperature of...,12939
4,188572824,4092050030,2021-01-21,Approved Source,".2653,.2655",,10,3-202.11,,Food received at proper temperature,Laura McNeill,"3-202.11; Ground beef wrap, rice bowl, and min...",1.0,Out,CDI,Refrigerated food shall be at a temperature of...,19


## Fetch Weather Data

In [57]:
weatherdata_raw = getWeatherData()
weatherdata = preprocess_weatherdata(weatherdata_raw.copy())
weatherdata.head()

Using pre-fetched weather data
weather df shape: (1390, 5)


date,TAVG
2018-01-01,22.0
2018-01-02,20.0
2018-01-03,21.0
2018-01-04,26.0
2018-01-05,21.0


## Fetch Yelp Ratings Data

In [58]:
# read in yelp and restaurant data
restdf = restaurants.copy()
restdf['PHONENUMBER'] = restdf['PHONENUMBER'].str.split('+').str[1]
yelpdatadf = preprocess_yelpdata(fetchYelpDataByPhone())

# match data
yelpdatadf['phone'] = yelpdatadf['phone'].astype(str)
yelpmatch_phone = pd.merge(restdf, yelpdatadf, how='right', left_on='PHONENUMBER', right_on='phone')
yelpmatch_phone.drop_duplicates(subset=['HSISID'], inplace=True)
yelpmatch_phone.head()

Using pre-fetched yelp data
yelpdataraw df shape: (2145, 15)


Unnamed: 0,OBJECTID,HSISID,NAME,ADDRESS1,CITY,POSTALCODE,PHONENUMBER,RESTAURANTOPENDATE,PERMITID,X,...,is_closed,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone
0,1891530,4092016487,PEACE CHINA,13220 Strickland RD,RALEIGH,27613,19196769968,2013-08-14,2,-78.725938,...,True,63,"[{'alias': 'chinese', 'title': 'Chinese'}]",3.5,"{'latitude': 35.90946322502375, 'longitude': -...","['delivery', 'pickup']",$,"{'address1': '13220 Strickland Rd', 'address2'...",19196769968,(919) 676-9968
2,1891531,4092018622,Northside Bistro & Cocktails,832 SPRING FOREST RD,RALEIGH,27609,19198905225,2021-05-13,22,-78.622635,...,False,23,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 35.86631037241957, 'longitude': -...",[],,"{'address1': '832 Spring Forest Rd', 'address2...",19198905225,(919) 890-5225
3,1891532,4092016155,DAILY PLANET CAFE,11 W JONES ST,RALEIGH,27601,19197078060,2012-04-12,26,-78.639431,...,False,89,"[{'alias': 'cafes', 'title': 'Cafes'}, {'alias...",4.0,"{'latitude': 35.7823492580703, 'longitude': -7...",['delivery'],$$,"{'address1': '121 W Jones St', 'address2': '',...",19197078060,(919) 707-8060
4,1891533,4092016161,HIBACHI 88,3416 POOLE RD,RALEIGH,27610,19192311688,2012-04-18,28,-78.579533,...,False,46,"[{'alias': 'japanese', 'title': 'Japanese'}, {...",3.5,"{'latitude': 35.76724, 'longitude': -78.57953}","['delivery', 'pickup']",$,"{'address1': '3416-100 Poole Rd', 'address2': ...",19192311688,(919) 231-1688
5,1891534,4092017180,BOND BROTHERS BEER COMPANY,202 E CEDAR ST,CARY,27511,19194592670,2016-03-11,29,-78.778021,...,False,179,"[{'alias': 'beergardens', 'title': 'Beer Garde...",4.0,"{'latitude': 35.7881885091418, 'longitude': -7...",['delivery'],$$,"{'address1': '202 E Cedar St', 'address2': '',...",19194592670,(919) 459-2670
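
The phone match above depends on both tables storing numbers in exactly the same form, and `str.split('+').str[1]` yields a missing value for any number that has no leading `+`. A minimal sketch of a more defensive normalizer follows. The helper name `normalize_phone` is hypothetical, and the rule that valid numbers are 10-digit US numbers (optionally prefixed with country code 1) is an assumption, not something the source data guarantees.

```python
import re
from typing import Optional


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Return an 11-digit string starting with '1', or None if unusable.

    Assumption: every valid number is a US number, written either with
    10 digits or with 11 digits beginning with the country code 1.
    """
    if raw is None:
        return None
    # Keep digits only, dropping '+', spaces, parentheses and dashes.
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) == 10:
        digits = "1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return digits
    return None
```

Applied to both `restdf['PHONENUMBER']` and `yelpdatadf['phone']` with `.map(normalize_phone)` before the merge, this would put the two key columns in one format; rows that normalize to `None` should be dropped before merging so they cannot match each other.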


## Fetch Crime Proxy Data

In [60]:
crime_data_raw = getCrimeDataDf()
crimedatadf = preprocess_crimedata(crime_data_raw)
crimedatadf.head()

Using pre-fetched crime data
crime df shape: (1341, 25)
Dropping columns with more than 25% missing values: Index(['crime_type', 'reported_block_address'], dtype='object')


Unnamed: 0,OBJECTID,crime_category,crime_code,crime_description,district,reported_date,reported_year,reported_month,reported_day,reported_dayofwk,latitude,longitude
0,529022,LARCENY,35D,Larceny/Theft from Building,North,2020-02-17T05:15:00Z,2020,2,17,Monday,35.874504,-78.622925
1,529023,FRAUD,56B,Fraud/Credit Card-ATM Fraud,North,2020-02-17T05:15:00Z,2020,2,17,Monday,35.874504,-78.622925
2,529035,ALL OTHER OFFENSES,80A,All Other/All Other Offenses,Downtown,2020-02-17T05:48:00Z,2020,2,17,Monday,35.778107,-78.6342
3,529038,MISCELLANEOUS,81F,Miscellaneous/Mental Commitment,Northwest,2020-02-17T06:35:00Z,2020,2,17,Monday,0.0,0.0
4,529046,MISCELLANEOUS,81C,Miscellaneous/Found Property,Downtown,2020-02-17T08:11:00Z,2020,2,17,Monday,0.0,0.0
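
Two of the rows above carry `latitude` and `longitude` of exactly 0.0, which is not a location in Wake County. The sketch below treats such pairs as missing so they do not distort any distance-based feature. That (0.0, 0.0) is a placeholder for "not geocoded" is an assumption, and `mask_zero_coordinates` is a hypothetical helper, not part of `CrimeData`.

```python
import numpy as np
import pandas as pd


def mask_zero_coordinates(df, lat_col="latitude", lon_col="longitude"):
    """Return a copy of df where (0.0, 0.0) coordinate pairs become NaN."""
    out = df.copy()
    # Assumption: a pair of exact zeros is a placeholder, not a real location.
    placeholder = (out[lat_col] == 0.0) & (out[lon_col] == 0.0)
    out.loc[placeholder, [lat_col, lon_col]] = np.nan
    return out


# Two rows copied from the crime data preview above.
sample = pd.DataFrame({
    "OBJECTID": [529022, 529038],
    "latitude": [35.874504, 0.0],
    "longitude": [-78.622925, 0.0],
})
cleaned = mask_zero_coordinates(sample)
```

After this step, `cleaned['latitude'].isna().mean()` gives the share of incidents without usable coordinates, which is worth checking before relying on crime counts near a restaurant.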


## Brainstorm Features

1.  Restaurant Data: POSTALCODE, RESTAURANTOPENDATE, X, Y, CITY
2.  Weather Data: date (index), average daily temperature (TAVG)
3.  Restaurant Violations: all features for now
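
As one way to attach the weather feature listed above, the sketch below left-joins a date-indexed `TAVG` column onto inspections by inspection date. The two small frames stand in for `inspections` and `weatherdata`; the temperature values are made up for illustration, and the sketch assumes the weather index and the `DATE` column can both be parsed as dates.

```python
import pandas as pd

# Stand-in for `inspections`: two rows from the preview above.
inspections_sample = pd.DataFrame({
    "HSISID": [4092017542, 4092017542],
    "SCORE": [93.0, 93.5],
    "DATE": ["2019-04-04", "2019-10-07"],
})

# Stand-in for `weatherdata`: date index, TAVG column (values are made up).
weather_sample = pd.DataFrame(
    {"TAVG": [55.0, 68.0]},
    index=pd.to_datetime(["2019-04-04", "2019-10-07"]),
)
weather_sample.index.name = "date"

# Parse inspection dates so both join keys share the datetime64 dtype.
inspections_sample["DATE"] = pd.to_datetime(inspections_sample["DATE"])

# Left join keeps every inspection, even on days without a weather reading.
with_weather = inspections_sample.merge(
    weather_sample, how="left", left_on="DATE", right_index=True
)
```

A left join is used so that inspections on days with no weather reading are kept with a missing `TAVG` rather than silently dropped.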

## Next Steps (we have T-minus 1 week!)
0. Pull the police incidents/crime data and possibly census-tract income by location - Hearsch & Shyamal
1. Make sure the Yelp data is sourced in this main notebook (minimum data points: ratings, dollar signs, type/cuisine, review metadata, other features at Ms. Park's discretion)
2. Clean and validate the data as part of data prep and EDA. Join tables by inspection. We want historical data per inspection, and then we want to predict risk scores to flag restaurants at high risk in future inspections. Note that although the inspection data is dated, we do not plan to treat this as a time series forecasting problem.
3. Deal with missing values and encode variables 
4. Feature engineering 
5. Baseline model
6. More complicated model
7. Datasheets for datasets (ask Jon in the next class whether we need a datasheet for every table or for every source) - Christine
8. Hearsch - Ethical checklist 
9. Visualizations and storytelling!
10. Get started on a slideshow (FUN PART)
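
The "join tables by inspection" step above could be sketched as follows: roll violations up to one row per inspection, then left-join the result onto inspections. This is a sketch under assumptions, not the team's final method. It assumes that `HSISID` plus the inspection date identifies one inspection, that `INSPECTDATE` in violations lines up with `DATE` in inspections, and that an inspection with no violation rows had zero violations. The sample values and the output column names `n_violations` and `points_total` are made up for illustration.

```python
import pandas as pd

# Made-up stand-ins that reuse the column names of the real tables.
violations_sample = pd.DataFrame({
    "HSISID": [1, 1, 1, 2],
    "INSPECTDATE": ["2020-07-10", "2020-07-10", "2021-01-21", "2020-07-10"],
    "POINTVALUE": [0.0, 1.0, 1.0, 0.0],
})
inspections_sample = pd.DataFrame({
    "HSISID": [1, 1, 2, 3],
    "DATE": ["2020-07-10", "2021-01-21", "2020-07-10", "2020-07-10"],
    "SCORE": [93.0, 94.0, 92.5, 99.0],
})

# One row per (restaurant, inspection date): violation count and point total.
per_inspection = (
    violations_sample
    .groupby(["HSISID", "INSPECTDATE"], as_index=False)
    .agg(n_violations=("POINTVALUE", "size"),
         points_total=("POINTVALUE", "sum"))
    .rename(columns={"INSPECTDATE": "DATE"})
)

# Left join keeps inspections that have no violation rows at all.
features = inspections_sample.merge(
    per_inspection, how="left", on=["HSISID", "DATE"]
)

# Assumption: no violation rows means zero violations for that inspection.
count_cols = ["n_violations", "points_total"]
features[count_cols] = features[count_cols].fillna(0)
```

Keeping the aggregation separate from the join makes it easy to add further per-inspection features later (for example counts by `CATEGORY`) without changing the join itself.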