# New-York-Restaurant-Guide

## Abstract

Every restaurant in New York City is inspected for food safety at least once a year, and more frequently where violations are found. Health inspections are a major source of worry for owners and managers, and a significant source of costs to a food service establishment in the form of fines and lost business when violations are found and a poor rating is given. This project takes publicly available data published by the Department of Health through the New York Open Data program and applies Exploratory Data Analysis to extract trends across three areas of business characteristics: Location, Type and History.

Machine Learning models are created to predict health inspection outcomes "sight unseen". Using the Location, Type and History characteristics of a restaurant, it is possible to determine with ~65% accuracy whether or not that establishment will receive an A on its health inspection. Similar levels of accuracy are achieved using a larger sparse matrix of 153 categorical features and using a much smaller array of 9 numerical summary statistics, with the latter training much faster. However, it was not possible to achieve accurate results by training a model only on the subset of restaurants based in Manhattan and extrapolating it to all of New York. These models can be used as a predictive tool by "at risk" restaurants concerned about the possibility of losing their A rating, or by restaurant industry consultants attempting to attract new clients. Additionally, the models can be used as a public safety tool for directing resources towards neighbourhoods or restaurant types that are outliers for food safety, to encourage better practices.

Future work on this model could include additional features such as time-based effects on ratings (seasonal or day of week) and longer historical effects, or incorporate additional data sources such as Yelp reviews.

## Project Motivation
The New York City Department of Health and Mental Hygiene inspects all food service establishments to make sure they meet Health Code requirements. Adherence to Health Code food safety requirements is necessary to prevent foodborne illness: after letter grading was introduced in 2011, instances of salmonella in New York City fell by 14% in the first year (source: https://ny.eater.com/2019/6/28/18761345/department-of-health-letter-grades-nyc-restaurant-rules). However, inspections also generate millions of dollars in fines for restaurants. Restaurant fines from Dept. of Health inspections top \$30 million annually and reached \$54 million in 2012 (source: https://comptroller.nyc.gov/reports/new-york-city-fine-revenues-update/).

Penalties for Health Code violations are effectively twofold. First, where violations are found, a Notice of Violation is issued and a hearing date is set at the Health Hearing Division of the Office of Administrative Trials and Hearings (OATH). Violations can be contested, and fines are determined at the OATH hearing. Fines can range from \$200 to \$2,000, and higher for repeat offences. The second form of penalty occurs if a restaurant fails to achieve an A grade. According to a 2012 survey conducted by Baruch College at the City University of New York, 88% of New Yorkers used the letter grades in making their dining decisions, and 76% felt more confident eating in an "A" grade restaurant (source: https://www1.nyc.gov/site/doh/about/press/pr2017/pr031-17.page). Failing to achieve or maintain an A grade can result in damage to a restaurant's reputation, negative press coverage (upscale restaurant Per Se's poor 2014 inspection result was reported by Business Insider: https://www.businessinsider.com/per-se-grade-pending-2014-3) and a loss of customer confidence that leads to a drop in foot traffic, sales and profits.

The importance of an A grade has restaurants bending over backwards to beat the inspectors (source: https://ny.eater.com/2019/6/28/18761345/department-of-health-letter-grades-nyc-restaurant-rules) and has led to a booming industry of restaurant consultants who perform mock inspections, and of legal services that assist in challenging violations at the OATH hearing. The ability to predict, prior to inspection, whether a restaurant is likely to receive the highly coveted A is an invaluable tool in preparing for the inspection event.

## Inspection Process
<img src='inspection_process.PNG'>


source: [Inspection Cycle Overview](https://www1.nyc.gov/assets/doh/downloads/pdf/rii/inspection-cycle-overview.pdf) 

<br>
Inspections that do not result in an immediate A are subject to re-inspection and/or OATH hearings and may still obtain an A rating, but the stigma associated with displaying a "Grade Pending" sign, the tribunal hearings and the process changes made in preparation for re-inspection are all costly. The ideal outcome is to receive 13 points or fewer during the inspection and immediately be able to display an A rating. Additionally, receiving an A on the initial inspection gives the restaurant a longer period between inspections (11-13 months) than is possible even if it later receives an A through re-inspection or an OATH hearing (5-7 or 3-5 months).

Every food service establishment receives an unannounced, onsite inspection at least once a year. The inspector may visit anytime the restaurant is receiving or preparing food or drink, or is open to the public. The inspector records observed violations in a handheld computer during the inspection. Each violation is associated with a range of points depending on the type and extent of the violation, and the risk it poses to the public. At the end of the inspection, the points are added together for an inspection score. Lower inspection scores indicate better compliance with the Health Code.

Health Code violations are classified as 'Critical' or 'General'. Critical violations are more likely to contribute to foodborne illnesses and are a substantial risk to public health.

A score of 13 points or fewer results in an A rating. Receiving a Critical violation does not automatically put a restaurant over the 13-point limit for an A grade and, likewise, it is possible to receive more than 13 points from General violations alone.
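The scoring rule described above can be illustrated with a minimal sketch. The point values used here are hypothetical and chosen only for illustration (the actual point ranges per violation are set out in the Blue Book), and the function names are not taken from the project notebooks.

```python
# Minimal sketch of the scoring rule: points from all observed violations
# are added together, and 13 points or fewer earns an immediate A.
A_THRESHOLD = 13  # from the text: 13 points or fewer results in an A


def inspection_score(violation_points):
    """Sum the points of every violation recorded during one inspection."""
    return sum(violation_points)


def earns_immediate_a(violation_points):
    """Return True if the inspection score is within the A threshold."""
    return inspection_score(violation_points) <= A_THRESHOLD


# Hypothetical point values, for illustration only.
one_critical_violation = [7]         # a single Critical violation
general_violations_only = [5, 5, 4]  # three General violations, 14 points

print(earns_immediate_a(one_critical_violation))   # True
print(earns_immediate_a(general_violations_only))  # False
```

The two cases mirror the point made in the text: a Critical violation on its own need not cost a restaurant its A, while General violations alone can.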

A detailed description of all Critical and General Violations can be found here: [The Blue Book](https://www1.nyc.gov/assets/doh/downloads/pdf/rii/blue-book.pdf)

## The Data
The New York City Open Data Project publishes records of every violation citation up to three years prior to the most recent inspection for all active restaurants within the city.
source: https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

Data for this project was pulled on the 17th of August 2019 and is current up to that date.

Github Notebook: https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Inspection%20Dataframe%20construction.ipynb

The full dataset records individual violation citations for all Dept of Health inspections, including those that are not related to food safety and are not graded/scored. Each violation is a row within the dataset, recording the violation code and description along with features of the restaurant (CAMIS: unique identifier, name, location, cuisine type) and of the inspection (type of inspection, date, total inspection score).
To perform Exploratory Data Analysis, the data was cleaned in the following steps:
- The dataset was reconfigured in Pandas to group violations into inspection events based on CAMIS identifier and date.  
- "Initial Inspections" were extracted. These are the regular Dept of Health food safety inspections
- The inspection Score was determined from the violations
- Violations were encoded categorically 
- Grade was missing in many cases (for various reasons) and was determined from inspection Score:
    - A <=13
    - B 14 to 28
    - C > 28
- Critical flags were summed for each inspection based on the violations recorded and encoded as a feature
- Inspections from before mid-2016 were dropped, as these are very sparse compared to the main set

<img src='ins_fulltime.png' height = 400 width=500>

- Inspections from after the 20th of July 2019 were also dropped, as these had not yet been finalised by OATH hearing, leading to a significantly higher average score
<img src='ins_croptime.png'>

Some cyclical effects on the inspection average were observed, but the time span of the data is too short to extract annual cyclical effects. This was not explored further in this project and is left for future work.

All further analysis was performed on this initial inspection data.
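The cleaning steps listed above can be sketched in Pandas roughly as follows. This is an illustrative sketch on toy data, not the code from the project notebook. The column names, the `"Initial Inspection"` and `"Critical"` labels, the helper name `build_initial_inspections` and the exact "mid 2016" cut-off date (taken here as 1 July 2016) are all assumptions made for the example.

```python
import pandas as pd

# Toy data: one row per violation citation. Column names and label values
# are assumptions for this sketch, not the real dataset schema.
rows = [
    (1001, "2018-03-01", "Initial Inspection", "04L", "Critical", 12),
    (1001, "2018-03-01", "Initial Inspection", "10F", "Not Critical", 12),
    (1002, "2018-05-10", "Initial Inspection", "02B", "Critical", 21),
    (1002, "2018-05-10", "Initial Inspection", "04N", "Critical", 21),
    (1002, "2018-06-20", "Re-inspection", "10F", "Not Critical", 7),
    (1003, "2015-01-15", "Initial Inspection", "06D", "Critical", 30),
    (1004, "2019-08-01", "Initial Inspection", "08A", "Not Critical", 35),
]
columns = ["camis", "inspection_date", "inspection_type",
           "violation_code", "critical_flag", "score"]
violations = pd.DataFrame(rows, columns=columns)
violations["inspection_date"] = pd.to_datetime(violations["inspection_date"])


def build_initial_inspections(violations, start="2016-07-01", end="2019-07-20"):
    """Collapse violation rows into one row per initial inspection."""
    # Keep only the regular food safety inspections.
    initial = violations[violations["inspection_type"] == "Initial Inspection"]

    # Group violations into inspection events by restaurant and date,
    # keeping the total score and counting the Critical flags.
    inspections = (
        initial.assign(is_critical=initial["critical_flag"].eq("Critical"))
        .groupby(["camis", "inspection_date"], as_index=False)
        .agg(score=("score", "first"),
             critical_flags=("is_critical", "sum"),
             n_violations=("violation_code", "count"))
    )

    # Drop the sparse early period and the not-yet-finalised recent period.
    in_window = inspections["inspection_date"].between(
        pd.Timestamp(start), pd.Timestamp(end))
    inspections = inspections[in_window].copy()

    # Derive the grade from the score: A <= 13, B 14 to 28, C > 28.
    inspections["grade"] = pd.cut(
        inspections["score"],
        bins=[-float("inf"), 13, 28, float("inf")],
        labels=["A", "B", "C"],
    )
    return inspections


inspections = build_initial_inspections(violations)
print(inspections)
```

In the toy data, the re-inspection row and the inspections outside the date window are discarded, leaving one row per remaining initial inspection with its score, Critical flag count and derived grade.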

## Initial Inspection Analysis
Github Notebook https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Initial%20inspections%20analysis.ipynb


## 3 Categories Breakdown 

### Location
#### Community Board
Github Notebook https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Community%20Board%20Investigation.ipynb



### Restaurant type
Github Notebook https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Restaurant%20Data%20Isolation.ipynb
#### Cuisine
#### Chain

### History
Github Notebook https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Twice.ipynb
#### Previous Inspection Score
#### Previous Inspection Critical Flags

## Machine Learning
### Categorical Matrix
Github Notebook https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Sparse%20matrix%20classifier%202.ipynb

### Summary Statistics 
Github Notebook https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Summary%20Stats%20Classifier%20Model.ipynb

### Manhattan Only Statistics
Github Notebook https://github.com/ktzioumis/New-York-Restaurant-Guide/blob/master/Manhattan%20model%20Classifier.ipynb