# Raw Data Ingestion

This workshop uses two datasets from the City of Chicago Open Data Portal (<https://data.cityofchicago.org>):

1. Business license data: <https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr>
2. Food inspections: <https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>

## Setup

In [2]:
from urllib.parse import urlencode
import pins
import pandas as pd

In [3]:
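# Set up the board. With no arguments, board_connect() reads the server URL
# and API key from the CONNECT_SERVER and CONNECT_API_KEY environment variables.
board = pins.board_connect()
# Connect user name; used as the prefix of every pin name below.
user_name = "sam.edwardes"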

## Data set (1): Business License Data

<https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr>

**Step 1:** Get the raw data from the data portal

In [4]:
base_url = "https://data.cityofchicago.org/resource/r5kz-chrr.csv"
params = {
    "$order": "id", 
    "$limit": 100
}
url = f"{base_url}?{urlencode(params)}"
print(url)


https://data.cityofchicago.org/resource/r5kz-chrr.csv?%24order=id&%24limit=100
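The request above fetches only the first 100 rows. If more rows are needed, one option is to page through the dataset by combining `$limit` with the `$offset` query parameter. The following is a minimal sketch of that idea, not part of the original notebook: the helper name `build_page_urls` and the page counts are illustrative assumptions, and it only builds the URLs without making any network calls.

```python
from urllib.parse import urlencode


def build_page_urls(base_url, page_size, n_pages, order_by="id"):
    """Build one URL per page (hypothetical helper, for illustration only)."""
    urls = []
    for page in range(n_pages):
        params = {
            "$order": order_by,  # a stable ordering keeps pages from overlapping
            "$limit": page_size,
            "$offset": page * page_size,
        }
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


page_urls = build_page_urls(
    "https://data.cityofchicago.org/resource/r5kz-chrr.csv",
    page_size=100,
    n_pages=3,
)
for page_url in page_urls:
    print(page_url)
```

Each URL could then be passed to `pd.read_csv` and the resulting frames combined with `pd.concat`.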


In [5]:
business_license_data = pd.read_csv(url)
business_license_data

Unnamed: 0,id,license_id,account_number,site_number,legal_name,doing_business_as_name,address,city,state,zip_code,...,license_start_date,expiration_date,license_approved_for_issuance,date_issued,license_status,license_status_change_date,ssa,latitude,longitude,location
0,1000000-20020221,1000000,200001,1,MARK BOSTON,COLORS IN MOTION,6421 N DAMEN AVE,CHICAGO,IL,60645,...,2002-02-21T00:00:00.000,2002-11-15T00:00:00.000,2002-02-21T00:00:00.000,2002-02-22T00:00:00.000,AAI,,,41.998514,-87.680011,"\n, \n(41.99851437112669, -87.68001090539342)"
1,1000049-20010816,1162772,200068,1,ANTONIA CASTREJON,ILLUSIONS HAIR DESIGN,3800 W DIVERSEY AVE,CHICAGO,IL,60647,...,2001-08-16T00:00:00.000,2002-08-15T00:00:00.000,2001-08-20T00:00:00.000,2002-04-30T00:00:00.000,AAI,,,41.931960,-87.722150,"\n, \n(41.931960332638006, -87.72215036594574)"
2,1000049-20020516,1233615,10141,2,"PEPE""S RETAIL MEATS, INC.",PEREZ MEXICAN FOOD,853-855 W RANDOLPH ST 1ST,CHICAGO,IL,60607,...,2002-05-16T00:00:00.000,2003-05-15T00:00:00.000,2002-04-17T00:00:00.000,2002-04-18T00:00:00.000,AAI,,,41.884261,-87.649534,"\n, \n(41.88426142200001, -87.6495341312589)"
3,1000049-20020816,1265665,200068,1,ANTONIA CASTREJON,ILLUSIONS HAIR DESIGN,3800 W DIVERSEY AVE,CHICAGO,IL,60647,...,2002-08-16T00:00:00.000,2003-08-15T00:00:00.000,2002-08-13T00:00:00.000,2002-08-14T00:00:00.000,AAI,,,41.931960,-87.722150,"\n, \n(41.931960332638006, -87.72215036594574)"
4,1000049-20030516,1342680,10141,2,"PEPE""S RETAIL MEATS, INC.",PEREZ MEXICAN FOOD,853-855 W RANDOLPH ST 1ST,CHICAGO,IL,60607,...,2003-05-16T00:00:00.000,2004-05-15T00:00:00.000,2003-04-17T00:00:00.000,2003-04-18T00:00:00.000,AAI,,,41.884261,-87.649534,"\n, \n(41.88426142200001, -87.6495341312589)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,1000201-20060216,1663561,200313,1,"KAPPIL STORE, INC",KAPPIL STORE,5952 W DIVERSEY AVE,CHICAGO,IL,60639,...,2006-02-16T00:00:00.000,2007-02-15T00:00:00.000,2006-02-09T00:00:00.000,2006-02-10T00:00:00.000,AAI,,,41.931252,-87.775672,"\n, \n(41.931252123128445, -87.7756722749983)"
96,1000-20130516,2241648,24139,1,600 SOUTH CICERO INC,BELMONTE LIQUORS,600 S CICERO AVE 1ST,CHICAGO,IL,60644,...,2013-05-16T00:00:00.000,2015-05-15T00:00:00.000,2013-03-31T00:00:00.000,2013-04-01T00:00:00.000,AAI,,,41.873095,-87.745144,"\n, \n(41.873094832519136, -87.74514386884542)"
97,1000-20150516,2386960,24139,1,600 SOUTH CICERO INC,BELMONTE LIQUORS,600 S CICERO AVE 1ST,CHICAGO,IL,60644,...,2015-05-16T00:00:00.000,2017-05-15T00:00:00.000,2015-04-05T00:00:00.000,2015-04-06T00:00:00.000,AAI,,,41.873095,-87.745144,"\n, \n(41.873094832519136, -87.74514386884542)"
98,1000-20170516,2518597,24139,1,600 SOUTH CICERO INC,BELMONTE LIQUORS,600 S CICERO AVE 1ST,CHICAGO,IL,60644,...,2017-05-16T00:00:00.000,2019-05-15T00:00:00.000,2017-03-28T00:00:00.000,2017-03-29T00:00:00.000,AAI,,,41.873095,-87.745144,"\n, \n(41.873094832519136, -87.74514386884542)"


**Step 2:** Save as a pin to Connect

In [6]:
# Pin the data to Connect
pin_name = f"{user_name}/chicago-business-license-data-raw"
board.pin_write(
    business_license_data, 
    name=pin_name, 
    type="csv",
    versioned=True,
    title="City of Chicago - Business License Data (RAW)"
)


Writing pin:
Name: 'sam.edwardes/chicago-business-license-data-raw'
Version: 20230614T070841Z-02670


Meta(title='City of Chicago - Business License Data (RAW)', description=None, created='20230614T070841Z', pin_hash='02670584c34dbd1f', file='chicago-business-license-data-raw.csv', file_size=42283, type='csv', api_version=1, version=VersionRaw(version='75868'), tags=None, name='sam.edwardes/chicago-business-license-data-raw', user={}, local={})

In [7]:
board.pin_versions(pin_name)

Unnamed: 0,version
0,75833
1,75868


## Data set (2): Food Inspection Data

<https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>

**Step 1:** Get the raw data from the data portal

In [8]:
base_url = "https://data.cityofchicago.org/resource/4ijn-s7e5.csv"
params = {
    "$order": "inspection_date",
    "$limit": 500_000
}
url = f"{base_url}?{urlencode(params)}"
print(url)

https://data.cityofchicago.org/resource/4ijn-s7e5.csv?%24order=inspection_date&%24limit=500000


In [9]:
food_inspection_data = pd.read_csv(url)
food_inspection_data

Unnamed: 0,inspection_id,dba_name,aka_name,license_,facility_type,risk,address,city,state,zip,inspection_date,inspection_type,results,violations,latitude,longitude,location
0,67757,DUNKIN DONUTS/BASKIN-ROBBINS,DUNKIN DONUTS/BASKIN-ROBBINS,1380279.0,Restaurant,Risk 2 (Medium),100 W RANDOLPH ST,CHICAGO,IL,60601.0,2010-01-04T00:00:00.000,Tag Removal,Pass,,41.884586,-87.631010,"(41.88458626715456, -87.63101044588599)"
1,67732,WOLCOTT'S,TROQUET,1992039.0,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961606,-87.675967,"(41.961605669949854, -87.67596676683779)"
2,67738,MICHAEL'S ON MAIN CAFE,MICHAEL'S ON MAIN CAFE,2008948.0,Restaurant,Risk 1 (High),8750 W BRYN WAWR AVE,CHICAGO,IL,60631.0,2010-01-04T00:00:00.000,License,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,,,
3,104236,TEMPO CAFE,TEMPO CAFE,80916.0,Restaurant,Risk 1 (High),6 E CHESTNUT ST,CHICAGO,IL,60611.0,2010-01-04T00:00:00.000,Canvass,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,41.898431,-87.628009,"(41.89843137207629, -87.6280091630558)"
4,67733,WOLCOTT'S,TROQUET,1992040.0,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961606,-87.675967,"(41.961605669949854, -87.67596676683779)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
255247,2577228,IMEE'S KITCHEN,IMEE'S KITCHEN,2856209.0,Restaurant,Risk 1 (High),171 N WELLS ST,CHICAGO,IL,60606.0,2023-06-13T00:00:00.000,Canvass,Pass w/ Conditions,60. PREVIOUS CORE VIOLATION CORRECTED - Commen...,41.885137,-87.633766,"(41.88513652775062, -87.63376649625414)"
255248,2577206,DRAGON SUSHI BAR LLC,DRAGON SUSHI BAR LLC,2906556.0,Restaurant,Risk 1 (High),39 N WELLS ST,CHICAGO,IL,60606.0,2023-06-13T00:00:00.000,License,Pass,"53. TOILET FACILITIES: PROPERLY CONSTRUCTED, S...",41.882946,-87.633713,"(41.882945805288045, -87.63371259871252)"
255249,2577222,VINTAGE LOUNGE,VINTAGE LOUNGE,2304532.0,Restaurant,Risk 1 (High),1447-1449 W TAYLOR ST,CHICAGO,IL,60607.0,2023-06-13T00:00:00.000,Canvass Re-Inspection,Pass,"38. INSECTS, RODENTS, & ANIMALS NOT PRESENT - ...",41.869199,-87.663550,"(41.869198989000274, -87.66354973833046)"
255250,2577208,KASHMIR,KASHMIR,2882790.0,TAVERN,Risk 3 (Low),1436 W RANDOLPH ST,CHICAGO,IL,60607.0,2023-06-13T00:00:00.000,License,Pass w/ Conditions,10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLI...,41.884385,-87.663480,"(41.88438512621228, -87.66347994097463)"


In [14]:
food_inspection_data["inspection_date"].min()

'2010-01-04T00:00:00.000'

In [15]:
food_inspection_data["inspection_date"].max()

'2023-06-13T00:00:00.000'
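The minimum and maximum above are strings, because `pd.read_csv` does not parse the date column by default. The comparison still gives the right answer here since ISO 8601 timestamps sort lexicographically, but converting to a datetime type makes later date arithmetic safer. A minimal sketch on a small made-up sample (the `sample` frame is an assumption, not the real data):

```python
import pandas as pd

# Three timestamps in the same format as the inspection_date column.
sample = pd.DataFrame(
    {
        "inspection_date": [
            "2010-01-04T00:00:00.000",
            "2023-06-13T00:00:00.000",
            "2015-07-01T00:00:00.000",
        ]
    }
)

# Convert the string column to datetime64 values.
sample["inspection_date"] = pd.to_datetime(sample["inspection_date"])

earliest = sample["inspection_date"].min()
latest = sample["inspection_date"].max()
print(earliest.date(), latest.date())
```

The same conversion could be applied to `food_inspection_data["inspection_date"]`, or done at load time with the `parse_dates` argument of `pd.read_csv`.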

**Step 2:** Save as a pin to Connect

In [9]:
pin_name = f"{user_name}/chicago-food-inspection-data-raw"
board.pin_write(
    food_inspection_data, 
    name=pin_name, 
    type="csv",
    versioned=True,
    title="City of Chicago - Food Inspection Data (RAW)"
)

Writing pin:
Name: 'sam.edwardes/chicago-food-inspection-data-raw'
Version: 20230613T110821Z-0186e


Meta(title='City of Chicago - Food Inspection Data (RAW)', description=None, created='20230613T110821Z', pin_hash='0186eb52c3bac5e9', file='chicago-food-inspection-data-raw.csv', file_size=145330, type='csv', api_version=1, version=VersionRaw(version='75834'), tags=None, name='sam.edwardes/chicago-food-inspection-data-raw', user={}, local={})

In [10]:
board.pin_versions(pin_name)

Unnamed: 0,version
0,75834
