## Food Inspection Modelling

### Potential Features  

The initial plan is to consider the features listed below.  Basic data cleaning was performed in the `feature_extraction.ipynb` notebook.  The data loaded into this notebook combines the City of Chicago Food Inspection and Business License datasets.  These datasets were combined to fill in some of the missing values and to add a few features.  In the future, additional data will be added to fill in more of the missing values and to add features such as cuisine type and the number of Starbucks locations within a certain distance.

1.  name  
1.  license number  
1.  result (pass/fail) \<target>
1.  business age (default start from 2010 inspection date)  
1.  number of chains / is_chain boolean 
1.  risk  
1.  ward / neighborhood
1.  license code  
1.  renew  
1.  conditional approved  
1.  business activity
1.  \*number of (pass/fail) inspections during the 1st, 2nd, 3rd, and 4th most recent license periods
    * license periods can contain errors, so counting years from `year_min` may be a simpler alternative
1.  \*relative location of other popular food service businesses (e.g., Starbucks)

\*These features will be added in the future.
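
As a rough illustration of how a few of the listed features could be derived, the sketch below builds a chain flag, a binary target, and a business age from a tiny hand-made frame. The column names `number_of_chains`, `results`, `inspect_date`, and `year_min` match the combined dataset loaded later in this notebook. The helper `add_basic_features`, the new column names `is_chain`, `target_pass`, and `bus_age_years`, and the rule that only an exact `Pass` counts as passing are assumptions made for illustration, not the project's settled definitions.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three illustrative feature columns added."""
    out = df.copy()
    # Assumption: a business with more than one site counts as a chain.
    out["is_chain"] = out["number_of_chains"] > 1
    # Assumption: only an exact "Pass" result is a pass (1); anything else is 0.
    out["target_pass"] = (out["results"] == "Pass").astype(int)
    # Business age in years: inspection year minus the first year on record.
    out["bus_age_years"] = out["inspect_date"].dt.year - out["year_min"]
    return out


# Tiny made-up sample, not real inspection records.
sample = pd.DataFrame(
    {
        "number_of_chains": [5, 1, 1],
        "results": ["Pass", "Fail", "Pass w/ Conditions"],
        "inspect_date": pd.to_datetime(["2023-08-09", "2023-08-08", "2023-08-04"]),
        "year_min": [2019.0, 2011.0, 2001.0],
    }
)
features = add_basic_features(sample)
```

How results other than `Pass` and `Fail` should be encoded is a modelling decision; the sketch simply maps them to 0.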

In [9]:
# Import required libraries

# # Code formatter
# # !pip3 install nb_black
# %load_ext nb_black

# eda tools
import numpy as np
import math
import pandas as pd

# visualization dependencies
import matplotlib.pyplot as plt  
import seaborn as sns

# hide jupyter lab warnings
import warnings
warnings.filterwarnings('ignore')

# expand the number of dataframe columns visible
pd.options.display.max_columns = 100

# make sound when this code executes: Audio(sound_file, autoplay=True)
from IPython.display import Audio
sound_file = './sound/chord.wav'

# display package information
# !conda install -c conda-forge session-info
import session_info
session_info.show()

### Read Dataset

In [10]:
# Read data
restaurant_df = pd.read_csv('./data/manipulated/combined_data.csv', parse_dates=['inspect_date', 'approx_start_date'])

# Rename the first column: it is the original index, which was written to the CSV during the export from `feature_extraction.ipynb`.
restaurant_df.rename(columns={'Unnamed: 0':'original_index'}, inplace=True)
restaurant_df.head()

Unnamed: 0,original_index,inspect_id,aka_name_x,license_num,facility_type,risk,address_x,city,state,zipcode,inspect_date,inspect_type,results,violations,lat,lon,violations_list,violation_count,violation_number,vl_must_comply_list,vl_must_comply_count,vl_instructed_comply_list,vl_instructed_comply_count,vl_not_cited_list,vl_citation_count,year,month,aka_name_y,license_id,address_y,ward,precint,police_district,license_code,license_description,bus_activity_id,bus_activity,application_type,conditional_approval,approx_start_date,year_min,year_max,bus_age,license_alias,number_of_chains,week,inspections_per_year,type_of_restaurant
0,3,2579789,SENOR PAN CAFE,2659113.0,Restaurant,Risk 1 (High),4612 W FULLERTON AVE,CHICAGO,IL,60639.0,2023-08-09,Canvass Re-Inspection,Pass,-99. No violations reported,41.924402,-87.742318,['-99. No violations reported'],1,['99'],[False],0,[False],0,[False],1,2023,8,SENOR PAN CAFE,2899694.0,4612 W FULLERTON AVE 1,31.0,29.0,25.0,1006.0,Retail Food Establishment,735,Preparation of Food and Dining on Premise With...,RENEW,N,2023-05-16,2019.0,2023.0,4.0,14065,5,32,12,2-5 Sites
1,4,2579779,TAQUERIA CD HIDALGO,2078867.0,Restaurant,Risk 1 (High),7104 N CLARK ST,CHICAGO,IL,60626.0,2023-08-08,Canvass,Pass,-99. No violations reported,42.011581,-87.674614,['-99. No violations reported'],1,['99'],[False],0,[False],0,[False],1,2023,8,TAQUERIA CD HIDALGO,2884398.0,7104 N CLARK ST 1ST,49.0,3.0,24.0,1006.0,Retail Food Establishment,775,Retail Sales of Perishable Foods,RENEW,N,2023-02-16,2011.0,2023.0,12.0,15894,1,32,24,Single Site
2,7,2579723,BAR COCINA,2334704.0,Restaurant,Risk 1 (High),2901 N SHEFFIELD AVE,CHICAGO,IL,60657.0,2023-08-08,Canvass Re-Inspection,Pass,-99. No violations reported,41.934501,-87.653775,['-99. No violations reported'],1,['99'],[False],0,[False],0,[False],1,2023,8,BAR COCINA,2898722.0,2901 N SHEFFIELD AVE 1,44.0,29.0,19.0,1006.0,Retail Food Establishment,735,Preparation of Food and Dining on Premise With...,RENEW,N,2023-05-16,2015.0,2023.0,8.0,1121,1,32,17,Single Site
3,8,2579643,Itoko / GG'S Chicken Shop / Little Goat Diner,2872999.0,Restaurant,Risk 1 (High),3323-3325 N SOUTHPORT AVE,CHICAGO,IL,60657.0,2023-08-07,Complaint Re-Inspection,Pass,-99. No violations reported,41.942334,-87.663739,['-99. No violations reported'],1,['99'],[False],0,[False],0,[False],1,2023,8,Itoko / GG's Chicken Shop / Little Goat Diner,2872999.0,3323-3325 N SOUTHPORT AVE LL-2,44.0,,,1006.0,Retail Food Establishment,735,Preparation of Food and Dining on Premise With...,ISSUE,Y,2023-02-21,2023.0,2023.0,0.0,7640,1,32,6,Single Site
4,10,2579561,JIN JU,27137.0,Restaurant,Risk 1 (High),5203 N CLARK ST,CHICAGO,IL,60640.0,2023-08-04,Canvass,Pass,10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLI...,41.976301,-87.668276,['10. ADEQUATE HANDWASHING SINKS PROPERLY SUPP...,4,"['10', '47', '55', '58']","[False, False, False, False]",0,"[False, False, False, False]",0,"[False, False, False, False]",4,2023,8,JIN JU,2907977.0,5203 N CLARK ST 1ST,48.0,10.0,20.0,1006.0,Retail Food Establishment,775,Retail Sales of Perishable Foods,RENEW,N,2023-07-16,2001.0,2023.0,22.0,8059,1,31,19,Single Site
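
The preview shows that columns such as `violation_number` hold list-like values, for example `['10', '47', '55', '58']`. Because `pd.read_csv` was called without converters, these values arrive as plain strings rather than Python lists. A minimal sketch of one way to convert them, using `ast.literal_eval` on a made-up sample; the helper name `parse_list_column` is hypothetical.

```python
import ast

import pandas as pd


def parse_list_column(series: pd.Series) -> pd.Series:
    """Convert string representations of lists into real Python lists."""
    # literal_eval only evaluates Python literals; non-string values pass through unchanged.
    return series.apply(lambda v: ast.literal_eval(v) if isinstance(v, str) else v)


# Made-up sample mimicking the `violation_number` column in the preview.
sample = pd.DataFrame({"violation_number": ["['99']", "['10', '47', '55', '58']"]})
sample["violation_number"] = parse_list_column(sample["violation_number"])

# With real lists, per-row operations such as counting violations become straightforward.
violation_counts = sample["violation_number"].apply(len)
```

The same helper could be applied to the other list-like columns if they are needed as lists during modelling.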
