# Raw Data Ingestion

This workshop uses two datasets from the City of Chicago Open Data Portal (<https://data.cityofchicago.org>):

1. Business license data: <https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr>
2. Food inspections: <https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>

## Setup

In [1]:
from urllib.parse import urlencode
import pins
import pandas as pd

In [2]:
pd.options.display.max_columns = 999

In [3]:
# Set up the board
board = pins.board_connect()
user_name = "sam.edwardes"

## Data set (1): Business License Data

<https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr>

**Step 1:** Get the raw data from the data portal

In [4]:
base_url = "https://data.cityofchicago.org/resource/r5kz-chrr.csv"
params = {
    "$order": "id", 
    "$limit": 5_000_000
}
url = f"{base_url}?{urlencode(params)}"
print(url)


https://data.cityofchicago.org/resource/r5kz-chrr.csv?%24order=id&%24limit=5000000
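The request above pulls everything in one call by setting a very large `$limit`. If a single download turns out to be too slow or unreliable, the same URL pattern can be split into pages. The following is a minimal sketch, assuming the portal's `$offset` parameter is used alongside `$limit` and a stable `$order`; the helper name `page_urls` and the page size are illustrative and are not part of the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.cityofchicago.org/resource/r5kz-chrr.csv"


def page_urls(base_url, order, page_size, n_pages):
    """Build one URL per page using $limit and $offset (hypothetical helper)."""
    urls = []
    for page in range(n_pages):
        params = {
            "$order": order,              # a stable sort keeps pages from overlapping
            "$limit": page_size,          # rows per page
            "$offset": page * page_size,  # rows to skip before this page
        }
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


# Assumed values for illustration only: three pages of 50,000 rows each.
urls = page_urls(BASE_URL, order="id", page_size=50_000, n_pages=3)
for u in urls:
    print(u)
```

Each URL could then be passed to `pd.read_csv` and the resulting frames combined with `pd.concat`, stopping once a page comes back with fewer rows than the page size.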


In [5]:
business_license_data = pd.read_csv(url)
business_license_data


Unnamed: 0,id,license_id,account_number,site_number,legal_name,doing_business_as_name,address,city,state,zip_code,ward,precinct,ward_precinct,police_district,license_code,license_description,business_activity_id,business_activity,license_number,application_type,application_created_date,application_requirements_complete,payment_date,conditional_approval,license_start_date,expiration_date,license_approved_for_issuance,date_issued,license_status,license_status_change_date,ssa,latitude,longitude,location
0,1000000-20020221,1000000,200001,1,MARK BOSTON,COLORS IN MOTION,6421 N DAMEN AVE,CHICAGO,IL,60645,50.0,28.0,50-28,24.0,1011,Home Repair,,,1000000.0,ISSUE,2000-06-19T00:00:00.000,2002-02-15T00:00:00.000,2002-02-15T00:00:00.000,N,2002-02-21T00:00:00.000,2002-11-15T00:00:00.000,2002-02-21T00:00:00.000,2002-02-22T00:00:00.000,AAI,,,41.998514,-87.680011,"\n, \n(41.99851437112669, -87.68001090539342)"
1,1000049-20010816,1162772,200068,1,ANTONIA CASTREJON,ILLUSIONS HAIR DESIGN,3800 W DIVERSEY AVE,CHICAGO,IL,60647,31.0,999.0,31-999,25.0,1010,Limited Business License,,,1000049.0,RENEW,,2001-06-25T00:00:00.000,2001-08-20T00:00:00.000,N,2001-08-16T00:00:00.000,2002-08-15T00:00:00.000,2001-08-20T00:00:00.000,2002-04-30T00:00:00.000,AAI,,,41.931960,-87.722150,"\n, \n(41.931960332638006, -87.72215036594574)"
2,1000049-20020516,1233615,10141,2,"PEPE""S RETAIL MEATS, INC.",PEREZ MEXICAN FOOD,853-855 W RANDOLPH ST 1ST,CHICAGO,IL,60607,27.0,1.0,27-1,12.0,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,1000049.0,RENEW,,2002-03-27T00:00:00.000,2002-04-17T00:00:00.000,N,2002-05-16T00:00:00.000,2003-05-15T00:00:00.000,2002-04-17T00:00:00.000,2002-04-18T00:00:00.000,AAI,,,41.884261,-87.649534,"\n, \n(41.88426142200001, -87.6495341312589)"
3,1000049-20020816,1265665,200068,1,ANTONIA CASTREJON,ILLUSIONS HAIR DESIGN,3800 W DIVERSEY AVE,CHICAGO,IL,60647,31.0,999.0,31-999,25.0,1010,Limited Business License,,,1000049.0,RENEW,,2002-06-28T00:00:00.000,2002-08-13T00:00:00.000,N,2002-08-16T00:00:00.000,2003-08-15T00:00:00.000,2002-08-13T00:00:00.000,2002-08-14T00:00:00.000,AAI,,,41.931960,-87.722150,"\n, \n(41.931960332638006, -87.72215036594574)"
4,1000049-20030516,1342680,10141,2,"PEPE""S RETAIL MEATS, INC.",PEREZ MEXICAN FOOD,853-855 W RANDOLPH ST 1ST,CHICAGO,IL,60607,27.0,1.0,27-1,12.0,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,1000049.0,RENEW,,2003-03-25T00:00:00.000,2003-04-17T00:00:00.000,N,2003-05-16T00:00:00.000,2004-05-15T00:00:00.000,2003-04-17T00:00:00.000,2003-04-18T00:00:00.000,AAI,,,41.884261,-87.649534,"\n, \n(41.88426142200001, -87.6495341312589)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1103959,9999-20140916,2343163,26256,1,CHURCH & CHAPEL METAL ARTS INC,CHURCH & CHAPEL METAL ARTS INC,2616 W GRAND AVE 1ST,CHICAGO,IL,60612,36.0,17.0,36-17,12.0,1010,Limited Business License,,,9999.0,RENEW,,2014-07-15T00:00:00.000,2014-12-26T00:00:00.000,N,2014-09-16T00:00:00.000,2016-09-15T00:00:00.000,2014-12-26T00:00:00.000,2014-12-29T00:00:00.000,AAI,,,41.892721,-87.692332,"\n, \n(41.89272080716665, -87.69233175444906)"
1103960,9999-20160916,2478055,26256,1,CHURCH & CHAPEL METAL ARTS INC,CHURCH & CHAPEL METAL ARTS INC,2616 W GRAND AVE 1ST,CHICAGO,IL,60612,36.0,17.0,36-17,12.0,1010,Limited Business License,,,9999.0,RENEW,,2016-07-15T00:00:00.000,2016-09-08T00:00:00.000,N,2016-09-16T00:00:00.000,2018-09-15T00:00:00.000,2016-09-08T00:00:00.000,2016-09-09T00:00:00.000,AAI,,,41.892721,-87.692332,"\n, \n(41.89272080716665, -87.69233175444906)"
1103961,9999-20180916,2610578,26256,1,CHURCH & CHAPEL METAL ARTS INC,CHURCH & CHAPEL METAL ARTS INC,2616 W GRAND AVE 1ST,CHICAGO,IL,60612,36.0,17.0,36-17,12.0,1010,Limited Business License,,,9999.0,RENEW,,2018-07-15T00:00:00.000,2018-09-10T00:00:00.000,N,2018-09-16T00:00:00.000,2020-09-15T00:00:00.000,2018-09-10T00:00:00.000,2018-09-11T00:00:00.000,AAI,,,41.892721,-87.692332,"\n, \n(41.89272080716665, -87.69233175444906)"
1103962,9999-20200916,2739432,26256,1,CHURCH & CHAPEL METAL ARTS INC,CHURCH & CHAPEL METAL ARTS INC,2616 W GRAND AVE 1ST,CHICAGO,IL,60612,36.0,17.0,36-17,12.0,1010,Limited Business License,,,9999.0,RENEW,,2020-07-15T00:00:00.000,2020-08-05T00:00:00.000,N,2020-09-16T00:00:00.000,2022-09-15T00:00:00.000,2020-08-05T00:00:00.000,2020-08-06T00:00:00.000,AAI,,,41.892721,-87.692332,"\n, \n(41.89272080716665, -87.69233175444906)"
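In the output above, identifier-like columns such as `ward` and `license_number` are displayed as floats (`50.0`, `1000000.0`), and the date columns appear as plain strings. This is most likely because pandas infers column types on its own. A minimal sketch of passing explicit types to `pd.read_csv` follows; the sample is a trimmed copy of the first two rows shown above, and the particular dtype choices are assumptions rather than the workshop's own settings.

```python
from io import StringIO

import pandas as pd

# A trimmed copy of two rows from the output above (four columns only).
sample_csv = StringIO(
    "id,zip_code,ward,license_start_date\n"
    "1000000-20020221,60645,50,2002-02-21T00:00:00.000\n"
    "1000049-20010816,60647,31,2001-08-16T00:00:00.000\n"
)

sample = pd.read_csv(
    sample_csv,
    dtype={
        "id": "string",        # keep identifiers as text
        "zip_code": "string",  # ZIP codes are labels, not numbers
        "ward": "Int64",       # nullable integer, tolerates missing values
    },
    parse_dates=["license_start_date"],  # convert timestamps to datetimes
)
print(sample.dtypes)
```

The same `dtype` and `parse_dates` arguments could be passed when reading from the portal URL. Since this notebook is about keeping a raw copy, type cleanup may be better left to a later step.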


**Step 2:** Save as a pin to Connect

In [6]:
# Pin the data to Connect
pin_name = f"{user_name}/chicago-business-license-data-raw"
board.pin_write(
    business_license_data, 
    name=pin_name, 
    type="csv",
    versioned=True,
    title="City of Chicago - Business License Data (RAW)"
)


Writing pin:
Name: 'sam.edwardes/chicago-business-license-data-raw'
Version: 20230623T081519Z-6f53f


Meta(title='City of Chicago - Business License Data (RAW)', description=None, created='20230623T081519Z', pin_hash='6f53fa77d317f57c', file='chicago-business-license-data-raw.csv', file_size=476524752, type='csv', api_version=1, version=VersionRaw(version='76320'), tags=None, name='sam.edwardes/chicago-business-license-data-raw', user={}, local={})

In [7]:
board.pin_versions(pin_name)

Unnamed: 0,version
0,75833
1,75868
2,75871
3,76320


## Data set (2): Food inspections

<https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>

**Step 1:** Get the raw data from the data portal

In [8]:
base_url = "https://data.cityofchicago.org/resource/4ijn-s7e5.csv"
params = {
    "$order": "inspection_date", 
    "$limit": 5_000_000
}
url = f"{base_url}?{urlencode(params)}"
print(url)

https://data.cityofchicago.org/resource/4ijn-s7e5.csv?%24order=inspection_date&%24limit=5000000
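Because the inspections are ordered by `inspection_date`, a later refresh may only need the most recent rows rather than the full history. The sketch below adds a `$where` filter to the same URL pattern. It assumes the portal accepts a date comparison in `$where` for this column, and the cutoff value `since` is a hypothetical example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "https://data.cityofchicago.org/resource/4ijn-s7e5.csv"
since = "2023-06-01T00:00:00"  # hypothetical cutoff, not from the notebook

params = {
    "$where": f"inspection_date >= '{since}'",  # assumed filter syntax
    "$order": "inspection_date",
    "$limit": 5_000_000,
}
incremental_url = f"{base_url}?{urlencode(params)}"
print(incremental_url)

# Decode the query string again to confirm the filter survived encoding.
decoded = parse_qs(urlsplit(incremental_url).query)
print(decoded["$where"][0])
```

Note that `urlencode` escapes the spaces, quotes, and comparison operator, so the filter can be written as a plain string.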


In [9]:
food_inspection_data = pd.read_csv(url)
food_inspection_data

Unnamed: 0,inspection_id,dba_name,aka_name,license_,facility_type,risk,address,city,state,zip,inspection_date,inspection_type,results,violations,latitude,longitude,location
0,70269,mr.daniel's,mr.daniel's,1899292.0,Restaurant,Risk 1 (High),5645 W BELMONT AVE,CHICAGO,IL,60634.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.938443,-87.768318,"(41.93844282365204, -87.76831838068422)"
1,67732,WOLCOTT'S,TROQUET,1992039.0,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961606,-87.675967,"(41.961605669949854, -87.67596676683779)"
2,67733,WOLCOTT'S,TROQUET,1992040.0,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961606,-87.675967,"(41.961605669949854, -87.67596676683779)"
3,52234,Cafe 608,Cafe 608,2013328.0,Restaurant,Risk 1 (High),608 W BARRY AVE,CHICAGO,IL,60657.0,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.938007,-87.644755,"(41.938006880423615, -87.6447545707008)"
4,67738,MICHAEL'S ON MAIN CAFE,MICHAEL'S ON MAIN CAFE,2008948.0,Restaurant,Risk 1 (High),8750 W BRYN WAWR AVE,CHICAGO,IL,60631.0,2010-01-04T00:00:00.000,License,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
255658,2577684,CHICAGO EATS SWEET HONEY TEA & LEMONADE,CHICAGO EATS SWEET HONEY TEA & LEMONADE,2812716.0,Restaurant,Risk 2 (Medium),4100 W MADISON ST,CHICAGO,IL,60624.0,2023-06-22T00:00:00.000,Canvass,Pass,"55. PHYSICAL FACILITIES INSTALLED, MAINTAINED ...",41.880804,-87.728244,"(41.880804158176524, -87.7282444216966)"
255659,2577672,HAPPY LEMON,HAPPY LEMON,2862830.0,Restaurant,Risk 2 (Medium),1411 W TAYLOR ST,CHICAGO,IL,60607.0,2023-06-22T00:00:00.000,License Re-Inspection,Pass,45. SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPER...,41.869223,-87.662097,"(41.86922282497486, -87.66209718450791)"
255660,2577669,AVY'S PIZZA,AVY'S PIZZA,2113336.0,Restaurant,Risk 1 (High),9917 S EWING AVE,CHICAGO,IL,60617.0,2023-06-22T00:00:00.000,Canvass Re-Inspection,Pass,37. FOOD PROPERLY LABELED; ORIGINAL CONTAINER ...,41.715244,-87.535130,"(41.71524399093076, -87.53513029319586)"
255661,2577674,BIRRIERIA DON LUIS,BIRRIERIA DON LUIS,2796827.0,Restaurant,Risk 1 (High),3544 E 106TH ST,CHICAGO,IL,60617.0,2023-06-22T00:00:00.000,Canvass Re-Inspection,Pass,39. CONTAMINATION PREVENTED DURING FOOD PREPAR...,41.702852,-87.537139,"(41.70285190603626, -87.537139292445)"


**Step 2:** Save as a pin to Connect

In [10]:
pin_name = f"{user_name}/chicago-food-inspection-data-raw"
board.pin_write(
    food_inspection_data, 
    name=pin_name, 
    type="csv",
    versioned=True,
    title="City of Chicago - Food Inspection Data (RAW)"
)

Writing pin:
Name: 'sam.edwardes/chicago-food-inspection-data-raw'
Version: 20230623T081632Z-e4de3


Meta(title='City of Chicago - Food Inspection Data (RAW)', description=None, created='20230623T081632Z', pin_hash='e4de384f69e9216d', file='chicago-food-inspection-data-raw.csv', file_size=290067766, type='csv', api_version=1, version=VersionRaw(version='76321'), tags=None, name='sam.edwardes/chicago-food-inspection-data-raw', user={}, local={})

In [11]:
board.pin_versions(pin_name)

Unnamed: 0,version
0,75834
1,75872
2,76321
