# Raw Data Ingestion

This workshop uses data from the City of Chicago Open Data Portal: <https://data.cityofchicago.org>. It draws on two datasets:

1. Business license data: <https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr>
2. Food inspections: <https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>

## Setup

In [1]:
from urllib.parse import urlencode
import pins
import pandas as pd

In [None]:
pd.options.display.max_columns = 999

In [2]:
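# Set up the board (by default, board_connect reads CONNECT_SERVER and CONNECT_API_KEY from the environment)
board = pins.board_connect()
# Connect username; pins are stored as "<user_name>/<pin name>"
user_name = "sam.edwardes"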

## Data set (1): Business License Data

<https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr>

**Step 1:** Get the raw data from the data portal

In [3]:
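# CSV endpoint for the Business Licenses dataset
base_url = "https://data.cityofchicago.org/resource/r5kz-chrr.csv"
# Query parameters: sort by id and return only the first 1,000 rows
params = {
    "$order": "id",
    "$limit": 1_000
}
# urlencode percent-encodes the "$" prefix as %24
url = f"{base_url}?{urlencode(params)}"
print(url)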


https://data.cityofchicago.org/resource/r5kz-chrr.csv?%24order=id&%24limit=1000
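
The `$limit` of 1,000 keeps the workshop download small. The Socrata API behind the portal also accepts an `$offset` parameter, so a larger extract could be requested page by page. A minimal sketch, assuming a stable `$order` column; the helper `build_page_urls` is hypothetical and not part of the workshop code:

```python
from urllib.parse import urlencode


def build_page_urls(base_url, order_by, page_size, n_pages):
    """Return one URL per page, using $limit and $offset for paging."""
    urls = []
    for page in range(n_pages):
        params = {
            "$order": order_by,  # a stable sort keeps pages from overlapping
            "$limit": page_size,
            "$offset": page * page_size,
        }
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


page_urls = build_page_urls(
    "https://data.cityofchicago.org/resource/r5kz-chrr.csv",
    order_by="id",
    page_size=1_000,
    n_pages=3,
)
for page_url in page_urls:
    print(page_url)
```

Each URL can be passed to `pd.read_csv`, and the resulting frames combined with `pd.concat`.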


In [4]:
business_license_data = pd.read_csv(url)
business_license_data

,id,license_id,account_number,site_number,legal_name,doing_business_as_name,address,city,state,zip_code,...,license_start_date,expiration_date,license_approved_for_issuance,date_issued,license_status,license_status_change_date,ssa,latitude,longitude,location
0,1000000-20020221,1000000,200001,1,MARK BOSTON,COLORS IN MOTION,6421 N DAMEN AVE,CHICAGO,IL,60645,...,2002-02-21T00:00:00.000,2002-11-15T00:00:00.000,2002-02-21T00:00:00.000,2002-02-22T00:00:00.000,AAI,,,41.998514,-87.680011,"\n, \n(41.99851437112669, -87.68001090539342)"
1,1000049-20010816,1162772,200068,1,ANTONIA CASTREJON,ILLUSIONS HAIR DESIGN,3800 W DIVERSEY AVE,CHICAGO,IL,60647,...,2001-08-16T00:00:00.000,2002-08-15T00:00:00.000,2001-08-20T00:00:00.000,2002-04-30T00:00:00.000,AAI,,,41.931960,-87.722150,"\n, \n(41.931960332638006, -87.72215036594574)"
2,1000049-20020516,1233615,10141,2,"PEPE""S RETAIL MEATS, INC.",PEREZ MEXICAN FOOD,853-855 W RANDOLPH ST 1ST,CHICAGO,IL,60607,...,2002-05-16T00:00:00.000,2003-05-15T00:00:00.000,2002-04-17T00:00:00.000,2002-04-18T00:00:00.000,AAI,,,41.884261,-87.649534,"\n, \n(41.88426142200001, -87.6495341312589)"
3,1000049-20020816,1265665,200068,1,ANTONIA CASTREJON,ILLUSIONS HAIR DESIGN,3800 W DIVERSEY AVE,CHICAGO,IL,60647,...,2002-08-16T00:00:00.000,2003-08-15T00:00:00.000,2002-08-13T00:00:00.000,2002-08-14T00:00:00.000,AAI,,,41.931960,-87.722150,"\n, \n(41.931960332638006, -87.72215036594574)"
4,1000049-20030516,1342680,10141,2,"PEPE""S RETAIL MEATS, INC.",PEREZ MEXICAN FOOD,853-855 W RANDOLPH ST 1ST,CHICAGO,IL,60607,...,2003-05-16T00:00:00.000,2004-05-15T00:00:00.000,2003-04-17T00:00:00.000,2003-04-18T00:00:00.000,AAI,,,41.884261,-87.649534,"\n, \n(41.88426142200001, -87.6495341312589)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,1000966-20020216,1215943,200860,1,CHAUNDEL D SAMMARCO,CHAUNDEL D.SAMMARCO,3705 WONDERLAKE RD,WONDER LAKE,IL,60007,...,2002-02-16T00:00:00.000,2003-02-15T00:00:00.000,,2003-02-20T00:00:00.000,AAI,,,,,
996,1000966-20030216,1334986,200860,1,CHAUNDEL D SAMMARCO,CHAUNDEL D.SAMMARCO,3705 WONDERLAKE RD,WONDER LAKE,IL,60007,...,2003-02-16T00:00:00.000,2004-02-15T00:00:00.000,,2003-12-22T00:00:00.000,AAI,,,,,
997,1000966-20040216,1477145,200860,1,CHAUNDEL D SAMMARCO,CHAUNDEL D.SAMMARCO,3705 WONDERLAKE RD,WONDER LAKE,IL,60007,...,2004-02-16T00:00:00.000,2005-02-15T00:00:00.000,,2004-02-17T00:00:00.000,AAI,,,,,
998,1000966-20050216,1562532,200860,1,CHAUNDEL D SAMMARCO,CHAUNDEL D.SAMMARCO,3705 WONDERLAKE RD,WONDER LAKE,IL,60007,...,2005-02-16T00:00:00.000,2006-02-15T00:00:00.000,,2005-02-14T00:00:00.000,AAI,,,,,


**Step 2:** Save as a pin to Connect

In [5]:
# Pin the data to Connect
pin_name = f"{user_name}/chicago-business-license-data-raw"
board.pin_write(
    business_license_data, 
    name=pin_name, 
    type="csv",
    versioned=True,
    title="City of Chicago - Business License Data (RAW)"
)


Writing pin:
Name: 'sam.edwardes/chicago-business-license-data-raw'
Version: 20230614T073813Z-3077d


Meta(title='City of Chicago - Business License Data (RAW)', description=None, created='20230614T073813Z', pin_hash='3077d25fefb568da', file='chicago-business-license-data-raw.csv', file_size=425766, type='csv', api_version=1, version=VersionRaw(version='75871'), tags=None, name='sam.edwardes/chicago-business-license-data-raw', user={}, local={})

In [6]:
board.pin_versions(pin_name)

,version
0,75833
1,75868
2,75871


## Data set (2): Food inspections

<https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>

**Step 1:** Get the raw data from the data portal

In [7]:
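# CSV endpoint for the Food Inspections dataset
base_url = "https://data.cityofchicago.org/resource/4ijn-s7e5.csv"
# Query parameters: sort by inspection date and return only the first 1,000 rows
params = {
    "$order": "inspection_date",
    "$limit": 1_000
}
url = f"{base_url}?{urlencode(params)}"
print(url)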

https://data.cityofchicago.org/resource/4ijn-s7e5.csv?%24order=inspection_date&%24limit=1000
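
Ordering by `inspection_date` sorts ascending, so the first 1,000 rows are the oldest inspections (January 2010 in the output below). If the most recent inspections are wanted instead, the `$order` parameter accepts a `DESC` suffix. A small sketch; the names `latest_params` and `latest_url` are illustrative and not part of the workshop code:

```python
from urllib.parse import urlencode

base_url = "https://data.cityofchicago.org/resource/4ijn-s7e5.csv"
latest_params = {
    "$order": "inspection_date DESC",  # newest inspections first
    "$limit": 1_000,
}
# urlencode encodes the space before DESC as "+"
latest_url = f"{base_url}?{urlencode(latest_params)}"
print(latest_url)
```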


In [8]:
food_inspection_data = pd.read_csv(url)
food_inspection_data

,inspection_id,dba_name,aka_name,license_,facility_type,risk,address,city,state,zip,inspection_date,inspection_type,results,violations,latitude,longitude,location
0,67757,DUNKIN DONUTS/BASKIN-ROBBINS,DUNKIN DONUTS/BASKIN-ROBBINS,1380279,Restaurant,Risk 2 (Medium),100 W RANDOLPH ST,CHICAGO,IL,60601.0,2010-01-04T00:00:00.000,Tag Removal,Pass,,41.884586,-87.631010,"(41.88458626715456, -87.63101044588599)"
1,67732,WOLCOTT'S,TROQUET,1992039,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961606,-87.675967,"(41.961605669949854, -87.67596676683779)"
2,67738,MICHAEL'S ON MAIN CAFE,MICHAEL'S ON MAIN CAFE,2008948,Restaurant,Risk 1 (High),8750 W BRYN WAWR AVE,CHICAGO,IL,60631.0,2010-01-04T00:00:00.000,License,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,,,
3,104236,TEMPO CAFE,TEMPO CAFE,80916,Restaurant,Risk 1 (High),6 E CHESTNUT ST,CHICAGO,IL,60611.0,2010-01-04T00:00:00.000,Canvass,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,41.898431,-87.628009,"(41.89843137207629, -87.6280091630558)"
4,67733,WOLCOTT'S,TROQUET,1992040,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961606,-87.675967,"(41.961605669949854, -87.67596676683779)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,78279,55TH MAXWELL,55TH MAXWELL,1947631,Restaurant,Risk 2 (Medium),323 E GARFIELD BLVD,CHICAGO,IL,60637.0,2010-01-25T00:00:00.000,Complaint,Fail,2. FACILITIES TO MAINTAIN PROPER TEMPERATURE -...,41.794329,-87.618291,"(41.794329105990776, -87.61829143863345)"
996,78278,RAINBOW BEACH NURSING CENT INC,RAINBOW BEACH NURSING CENT INC,13682,,Risk 1 (High),7325 S EXCHANGE AVE,CHICAGO,IL,60649.0,2010-01-25T00:00:00.000,Out of Business,Fail,,41.762169,-87.562641,"(41.762168766250305, -87.56264124442326)"
997,58273,BITE,BITE,51992,Restaurant,Risk 1 (High),1039 N WESTERN AVE,CHICAGO,IL,60622.0,2010-01-25T00:00:00.000,Canvass,Fail,29. PREVIOUS MINOR VIOLATION(S) CORRECTED 7-42...,41.900528,-87.686816,"(41.90052793259093, -87.6868161338336)"
998,98349,TRASPASADA RESTAURANT,TRASPASADA RESTAURANT,1334442,Restaurant,Risk 1 (High),3144 N CALIFORNIA AVE,CHICAGO,IL,60618.0,2010-01-25T00:00:00.000,Complaint,Pass,33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSI...,41.938946,-87.697956,"(41.938946248449454, -87.69795617571842)"


**Step 2:** Save as a pin to Connect

In [9]:
# Pin the data to Connect
pin_name = f"{user_name}/chicago-food-inspection-data-raw"
board.pin_write(
    food_inspection_data, 
    name=pin_name, 
    type="csv",
    versioned=True,
    title="City of Chicago - Food Inspection Data (RAW)"
)

Writing pin:
Name: 'sam.edwardes/chicago-food-inspection-data-raw'
Version: 20230614T073820Z-72e74


Meta(title='City of Chicago - Food Inspection Data (RAW)', description=None, created='20230614T073820Z', pin_hash='72e740edfdc4f318', file='chicago-food-inspection-data-raw.csv', file_size=1326352, type='csv', api_version=1, version=VersionRaw(version='75872'), tags=None, name='sam.edwardes/chicago-food-inspection-data-raw', user={}, local={})

In [10]:
board.pin_versions(pin_name)

,version
0,75834
1,75872
